Matthew Skelton - Accidental Architects: how HR designs software systems (EN)

[00:00:14] So, my name's Matthew Skelton. I'm the co-author of the book Team Topologies: Organizing Business and Technology Teams for Fast Flow. Today I'd like to share some ideas about how HR, the human resources or people department, can end up being the accidental architects of our software systems. This sounds really strange, right? But let me share with you why this might be the case.
[00:00:48] The book Team Topologies came out just over a year ago, in September 2019. It was published by IT Revolution Press, the home of books like The DevOps Handbook, The Phoenix Project, Project to Product, and Accelerate. It's a great family of books. Before we go any further, let's look at this word "topology", because it's not very familiar.
[00:01:09] It comes from the Greek word for "place", meaning roughly where things are placed. The way we interpret it these days is the way in which constituent parts are interrelated or arranged. So team topologies are the ways in which teams are interrelated or arranged, in the context of a fast flow of change when building software systems.
[00:01:34] The ideas in the book have been really useful for lots of organizations around the world. The ideas we were working on before the book was published were used by companies like Netflix. Phillip Fisher-Ogden, director of engineering at Netflix, said: "Thanks for your insightful articulations of DevOps Topologies. They inspired many discussions and helped us to think about what model Netflix teams could be using." They were also used at places like Condé Nast, the big publishing house. Crystal Hirschorn, who was director of engineering there until recently, said: "Your topological model has resonated extremely well with both the development and the operations sides," and she liked the balanced arguments for the different perspectives in each pattern. The book has been described as offering "innovative tools and concepts for structuring a next generation digital operating model."
[00:02:32] So these are strategically important concepts and ideas that we can apply at least inside IT, and in practice probably in other parts of the organization too.
[00:02:45] Okay, so let's think for a moment about who designs the architecture of our software systems.
[00:02:52] Is it software developers? It can be. Is it software architects? Could be. By the way, this is an official picture of a software architect; they tend to look exactly like this, really happy all the time. I'm joking. Or could it actually be the HR department?
[00:03:14] That sounds odd.
[00:03:17] But actually, the HR department can act as accidental architects of the software system.
[00:03:24] This is an actual photograph of me when I first realized this a few years ago. Let's explore why this might be the case. First, we need to look at something called Conway's law.
[00:03:36] In 1968, Mel Conway, who I think was then working for a large computer corporation, published an article in a journal that said: "Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure." This has become quite well known, and in the last few years it has resurfaced as a kind of guiding principle. But let's take a few minutes to check it. Is it really true? Lots of people have been quite skeptical about it, so one of the things we did when writing the Team Topologies book was to look at the academic research in this area.
[00:04:31] In 2012, MacCormack and colleagues published a research paper comparing closed source and open source software. This was one of their conclusions: "Products tend to mirror the architecture of the organizations in which they are developed. This dynamic occurs because the organization's governance structures, problem-solving routines,
[00:04:56] and communication patterns constrain the space in which it searches for solutions."
[00:05:05] They used something called a design structure matrix to analyze the boundaries and interface relationships between components across many different software products. There's lots of detail if you go and read the original paper.
[00:05:18] We're not going to go through all that now, you'll be glad to know. Their key conclusion was: "We find strong evidence to support the hypothesis that a product's architecture tends to mirror the structure of the organization in which it is developed." So that's for software. There are other studies apart from this one, but this is the one we'll look at today. Back in 2004, there was a study of the design and manufacture of jet engines for aircraft.
[00:05:48] They looked at problems related to where the boundaries between components fell inside the jet engines.
[00:05:59] In their conclusion, they said: "We provide empirical evidence that product ambiguity exists, and it is more likely to be present across organizational and system boundaries." That's effectively saying something very similar: if there is a mismatch between the organization and the product architecture, there will be problems. A further study in 2010, by Gokpinar and colleagues, looked at the design and manufacture of cars in the automotive industry. In their study, they concluded that
[00:06:35] "mismatches between product architecture and organizational structure are positively associated with quality problems." Again, that means if you have a mismatch between your organization and your product architecture,
[00:06:49] there are likely to be problems.
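To make "mismatch" a little more concrete, here is a minimal Python sketch of one simplified way to measure it. It assumes the product architecture is recorded as a list of component-to-component dependencies (roughly, the non-empty cells of a design structure matrix) and the organization as a set of team pairs that communicate regularly. The function names, components, and teams are hypothetical, and this illustrates the idea only; it is not the measurement method used in the studies above.

```python
def unmatched_interfaces(deps, owner, team_links):
    """Return the component dependencies that are not matched by team communication.

    deps:       list of (component_a, component_b) design dependencies
    owner:      dict mapping each component to the team that owns it
    team_links: set of frozenset({team_x, team_y}) pairs that talk regularly
    """
    unmatched = []
    for a, b in deps:
        team_a, team_b = owner[a], owner[b]
        if team_a == team_b:
            # A dependency inside one team is covered by that team's own communication.
            continue
        if frozenset((team_a, team_b)) not in team_links:
            unmatched.append((a, b))
    return unmatched


def mismatch_ratio(deps, owner, team_links):
    """Fraction of all design dependencies that have no matching team communication."""
    if not deps:
        return 0.0
    return len(unmatched_interfaces(deps, owner, team_links)) / len(deps)


# Hypothetical product and organization.
owner = {"checkout": "Team1", "payments": "Team2", "catalog": "Team3", "search": "Team3"}
deps = [("checkout", "payments"), ("catalog", "search"), ("checkout", "catalog")]
team_links = {frozenset(("Team1", "Team2"))}

print(unmatched_interfaces(deps, owner, team_links))      # [('checkout', 'catalog')]
print(round(mismatch_ratio(deps, owner, team_links), 2))  # 0.33
```

Here the checkout-to-catalog dependency crosses a boundary between two teams that never talk, which is the kind of unmatched interface the studies above associate with problems.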
[00:06:53] Now, we can take advantage of this mirroring effect of Conway's law
[00:06:59] by using something called the reverse Conway maneuver.
[00:07:05] Now, to be very clear, this is not something you do on the highway with your car; it's not some strange driving maneuver. What we're really saying is that we should design the organization to mirror the desired system architecture. If we know the kind of system architecture that would help meet business and customer needs, how can we set up our organization so that its communication pathways mirror that desired architecture? If you think about it, this is basically heresy. If I'd said this in the Middle Ages, I'd have been burned as a witch. It's heretical compared to traditional software architecture,
[00:07:54] where we design the architecture very carefully without really thinking about the organization, and then have people build this wonderful architecture. It turns out that we actually need to design our organization and our software architecture at the same time. When we're moving at speed, it's effectively the same process.
[00:08:14] What does this mean in practice? Let's look at a simplified example of Conway's law in practice.
[00:08:25] Imagine to start with that we've got four software teams, A, B, C, and D. Each has front end and back end developers, with front end in blue and back end in green. And there's a single database team, DBA, in orange.
[00:08:43] Within each team, the front end developer does some work and passes it to the back end developer, who does some work and passes it to the database team, saying, "Hey, can you update the database?" If this is the communication pathway in the organization, let's take this diagram and turn it through 90 degrees. The likely software architecture would look like this:
[00:09:05] We'd have four separate applications, all sharing the same database. If that's what you want, that's fine.
[00:09:15] But these days we probably don't want an architecture like that, because it limits flow. The shared database will tend to cause bottlenecks and conflicts. We'd want an architecture that looks a bit more like this, with independent databases for different applications or services. If we want this kind of architecture, then let's change our organization's architecture to match it. So we take this diagram and turn it back through 90 degrees, and there we've got the organization architecture: a database or data capability inside each team. This is a very simplistic example, but it tries to illustrate what we mean by Conway's law in practice.
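The "turn the diagram through 90 degrees" step can be sketched in a few lines of Python, under the simplifying assumption that each stream of work is described by the list of teams it passes through. Any team that more than one stream hands work to is a likely candidate to show up as a shared component, such as the shared database in the example. The function and data are hypothetical illustrations of that example, not a tool from the book.

```python
from collections import Counter


def shared_dependencies(handoffs):
    """Return the teams that more than one stream hands work to.

    handoffs: dict mapping each stream team to the list of teams its work
    passes through. Under Conway's law, teams shared across streams tend to
    appear as shared components (for example, a shared database).
    """
    counts = Counter()
    for stream, chain in handoffs.items():
        # Count each external team once per stream, however often it appears.
        for team in set(chain):
            if team != stream:
                counts[team] += 1
    return {team for team, n in counts.items() if n > 1}


# Original organization: front end -> back end inside each team, then a
# handoff to the single DBA team.
before = {t: [t, "DBA"] for t in "ABCD"}
# Redesigned organization: each team has its own data capability.
after = {t: [t] for t in "ABCD"}

print(shared_dependencies(before))  # {'DBA'}
print(shared_dependencies(after))   # set()
```

In the first organization the DBA team sits on every stream's path, which mirrors the single shared database; once each team has its own data capability, no shared component is implied.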
[00:10:03] Ruth Malan, who has won awards for her work in software architecture, puts it like this: "If the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins." So she puts it even more strongly than Mel Conway did in the original article.
[00:10:22] The key thing to realize here is that the design of the organization constrains the kinds of solutions we can find. Organization design is a constraint on the solution search space. That's potentially a strategic problem. If we get the design wrong, if our organization has been set up in a way that may have had good intentions but works against the architectures we need, then it's potentially constraining the entire business from finding the right kinds of solutions.
[00:11:06] So what principles should we use in 2020 and beyond when designing an organization, especially when software is a key aspect of how the organization works? These days, software is a key aspect of many, many organizations, even those that aren't building or selling software. Our starting point in Team Topologies is a rapid flow of change. We need a rapid flow of changes to the software system on an ongoing basis. We are not building a piece of software and then shipping it; it's not discrete. It's a continuous flow of change. But it's not just change going in one direction.
[00:12:00] We need to listen to what's happening in the live environment, in production, and bring that information back into the organization so we can change what we're doing. So the old model of development, doing some work and then handing it over to an operations or infrastructure team, doesn't work, because things go wrong at that handover point. In fact, many organizations used to have many separate functions:
[00:12:35] development, testing, some kind of transition or release, and live. Typically, something would go wrong at each of these stages. Equally importantly, many handoffs cause delays. They waste time because, effectively, we've got many different queues inside the organization, and that causes huge waiting times when we're trying to get something done. These handovers can kill the flow of change, so what we need to do is kill the handovers. What we need is a team
[00:13:14] that has end-to-end responsibility for all aspects of a piece of the software system. That includes not just the features, what the software does and provides, but other aspects too: our ability to release it, to test it, to make it work well in production, and to support it. Releasability, testability, operability, and supportability all need to be addressed by this team end to end.
[00:13:46] This is sometimes called a product team, and it has this end-to-end responsibility. It has all the skills and capabilities it needs to take something from idea, build it, run it, listen to the signals from the live environment, and bring that information back into the team to improve things.
[00:14:05] There are no handovers now. We're not handing this work to another team, which allows a fast flow of change. In the Team Topologies book, we call these teams stream-aligned: they are aligned to a stream of change in the organization.
[00:14:23] And that's the starting point.
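The cost of handovers can be illustrated with some simple arithmetic. The Python sketch below compares lead time and flow efficiency (the share of lead time spent actively working rather than waiting) for a change that passes through several functions with a queue before each one, versus a single stream-aligned team doing the same work. The stage names and day counts are made-up numbers chosen for illustration, and the sketch assumes the total hands-on work stays the same in both cases.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    work_days: float  # time someone is actively working on the change
    wait_days: float  # time the change sits in a queue before this stage starts


def lead_time(stages):
    """Total elapsed time from starting the change to it being live."""
    return sum(s.work_days + s.wait_days for s in stages)


def flow_efficiency(stages):
    """Fraction of the lead time spent actively working rather than waiting."""
    return sum(s.work_days for s in stages) / lead_time(stages)


# Hypothetical numbers: separate functions, each with its own queue.
handoffs = [
    Stage("development", work_days=3, wait_days=0),
    Stage("testing", work_days=1, wait_days=5),
    Stage("release", work_days=0.5, wait_days=10),
    Stage("live", work_days=0.5, wait_days=4),
]
# Hypothetical numbers: one stream-aligned team doing the same work end to end.
end_to_end = [Stage("build, test, release, run", work_days=5, wait_days=1)]

for label, stages in [("handoffs", handoffs), ("end to end", end_to_end)]:
    print(f"{label}: {lead_time(stages):g} days, {flow_efficiency(stages):.0%} efficient")
# handoffs: 24 days, 21% efficient
# end to end: 6 days, 83% efficient
```

The work itself hardly changes between the two cases; almost all of the difference comes from the queues in front of each handover.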
[00:14:28] If a team like this can build and run a system entirely by itself, then that's all we need. But typically, in a larger organization, we end up having what's called a platform, which gives us certain underlying capabilities. These might relate to data, design, infrastructure, automation, and so on. That way the stream-aligned team can focus on the business domain, its main focus, and doesn't have to worry about the details of things that don't relate to it.
[00:15:06] Now we potentially end up with multiple stream-aligned teams like this, sharing some aspects of a platform.
[00:15:15] But these stream-aligned teams are substantially independent; they don't depend on each other. We've aligned the software architecture to have multiple independent, loosely coupled applications and services with a well-defined interface to a platform. So this is our organizational architecture and our software architecture at the same time: effectively, the same diagram represents both.
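One way to keep an eye on this property is a simple dependency check. The Python sketch below assumes you keep a hypothetical map from each team to the teams (or platform) it depends on, and flags any direct dependency from one stream-aligned team to another; in the model described above, stream-aligned teams should depend on the platform rather than on each other. The team names and function name are illustrative only.

```python
def cross_stream_dependencies(depends_on, stream_teams):
    """Return sorted (team, dependency) pairs where one stream-aligned team
    depends directly on another stream-aligned team.

    depends_on:   dict mapping each team to the set of teams it depends on
    stream_teams: set of team names that are stream-aligned
    """
    return sorted(
        (team, dep)
        for team, deps in depends_on.items()
        if team in stream_teams
        for dep in deps
        if dep in stream_teams and dep != team
    )


# Hypothetical organization: three stream-aligned teams and one platform.
stream_teams = {"checkout", "search", "accounts"}
depends_on = {
    "checkout": {"platform"},
    "search": {"platform"},
    "accounts": {"platform", "checkout"},  # a direct stream-to-stream dependency
    "platform": set(),
}

print(cross_stream_dependencies(depends_on, stream_teams))  # [('accounts', 'checkout')]
```

An empty result matches the target shape; any pair it reports is a coupling point that, following Conway's law, is likely to show up in the software as well.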
[00:15:54] Now, there's a limit to the size of these teams. And that's because
[00:16:00] there are natural human limits to the trust we can have with other people. Robin Dunbar is a British anthropologist who has done lots of research in this space. He talks about an individual's social network, and we're not just talking about online media like Twitter; we're talking about relationships with other people in the real world. His finding, which is quite famous, is that an individual's social network is typically around 100 to 200 people, and the number 150 is usually used. So we can have meaningful relationships with about 150 people. Interestingly, Robin Dunbar has repeated this work on online social networks like LinkedIn, Twitter, and Facebook and found a similar pattern: we typically only interact with up to about 150 other people in these spaces. So there are many different group trust boundaries that have been discovered through anthropological research by people like Robin Dunbar. We can also see similar trust boundaries in the military. There's a very interesting article called "Social Inflection Points" by Ben Ford. Ben Ford used to be in an elite British military force, so he has direct experience of what it's like to work in a high-performing team in a military context. He now works in software development, helping people make their organizations more effective and understand the dynamics of software development. He talks about these social inflection points
[00:17:55] that exist in military groupings. Armies have had hundreds, if not thousands, of years to find grouping sizes that work effectively, and he describes at least three. A group of about eight people in the British military is called a section, and that's roughly the size of a hunting party from ancient times.
[00:18:28] A group of 30 to 50 people is called a troop, and that corresponds roughly to a tribe, again from ancient times. A group of about 100 to 150 people is called a company, and that's about the size of a village. Now here's what's really interesting. We've got that number 150, the Dunbar number, the number of people we can have meaningful relationships with. In Britain, we have some very old records that go back nearly a thousand years. If you look at these records, the typical size of a village in England at the time was 150 people. There was a census, a set of records in which people were counted. So we've got data going back many hundreds of years
[00:19:20] that suggests these trust boundaries are very consistent across lots of different parts of human society. Okay, let's bring this back into a software delivery context. If there is a set of trust boundaries in place, this has strong implications for how we can group people to enable the fast flow of change. If we want the fastest flow of change possible, we need the highest trust possible, and that's a group of about eight people. That goes back to Ben Ford's social inflection points: the smallest grouping of eight, proven by the military over many hundreds of years. That's the highest trust, and it allows us to deploy change the quickest. There are also slightly larger groupings: a boundary at about 50 people, which is several teams but not very many, and another at about 150 people, which is several groups of teams, but not infinitely large. The key thing here is that trust dynamics can change when we cross one of these Dunbar boundaries or social inflection points, and we should expect different rules and behaviors to emerge. We cannot just keep growing a team; we have to understand the effect of these trust dynamics in the organization. So we cannot have, for example, HR or the people department just assigning more and more people into these groupings, because the trust dynamics will change. That will affect our flow of change, and it will also start to affect the software architecture we're building, because growing some of these teams too large directly affects the software architecture that is likely to emerge.
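As a rough illustration of why simply adding people matters, here is a Python sketch that reports which of the approximate group-size boundaries mentioned in this talk (about 8, 50, and 150 people) a change in headcount would cross. The boundary values are heuristics drawn from the anthropological and military observations above, not hard limits, and the function name is hypothetical.

```python
# Approximate group-size boundaries from the talk: about 8 (a high-trust team),
# about 50 (a handful of teams), and about 150 (Dunbar's number).
# These are rough heuristics, not hard limits.
TRUST_BOUNDARIES = (8, 50, 150)


def boundaries_crossed(current_size, new_size, boundaries=TRUST_BOUNDARIES):
    """Return the boundaries a change in group size would cross, in either direction.

    A boundary b counts as crossed when one size is at or below b and the
    other is above it.
    """
    low, high = sorted((current_size, new_size))
    return [b for b in boundaries if low <= b < high]


print(boundaries_crossed(7, 12))    # [8]
print(boundaries_crossed(6, 8))     # []
print(boundaries_crossed(40, 160))  # [50, 150]
```

For example, growing a seven-person team to twelve crosses the boundary of about eight, which is where you would expect the team's trust dynamics, and potentially the architecture it produces, to start to change.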
[00:21:20] There's a very interesting case study in the Team Topologies book from a UK company called Auto Trader. A few years ago, they were rebuilding and refitting their head office, and they wanted to take advantage of Conway's law and this kind of socio-technical mirroring. They wanted to enable a fast flow of change in each of their three business areas. For the sake of today's talk, we'll call these business areas private car sales, commercial car sales, and car leasing. They wanted the fastest flow of change in each of these business areas.
[00:22:03] So they put everybody working on private car sales on one floor, everybody working on commercial car sales on another floor, and everybody working on car leasing on another floor of the building. When I say everyone, I mean everyone: support, engineering, operations, legal, HR, marketing, sales. So everyone on a floor was focused on the flow of change in that particular business area.
[00:22:44] They did not have a group of marketing or sales people sitting together and working across these three areas; they split things by business area. It's a very interesting example of thinking about the effects of communication and trust boundaries at an organizational level when we want a fast flow of change. It also means we might need to take some care about how our digital, online communication is set up. If we're using something like Slack or Microsoft Teams, what does that digital environment feel like?
[00:23:21] Perhaps we actually need separate Slack workspaces or communication groups for different parts of the organization, because otherwise we might not have the right kind of trust inside each grouping. We might lose trust because there are too many people in the group. This is something organizations are starting to explore now. Maybe you have a separate Slack for each of these three groups. That would certainly keep them significantly separate.
[00:23:54] So, crucially, we need to co-design the organization and the system architecture. This is a joint effort between engineers, architects, and the people making decisions about teams: managers, HR people, facilities people. They're all making decisions about where people work in an organization, whether that's a physical building, a grouping inside a chat tool, or a team boundary. We have to work on these things together; it's not sensible to separate them.
[00:24:36] And fundamentally, this is a team-first approach.
[00:24:40] So that gives you a flavor of why HR might be accidentally designing software architecture. If we're not thinking about where people are placed, if we're not taking a team-first approach, we might accidentally be influencing and designing aspects of the software system. So let's not do that. Let's take a proper team-first approach, understand that we need to design these things together, and take advantage of things like the reverse Conway maneuver to help us do that.
[00:25:14] As I mentioned, the book is available online. If you go to teamtopologies.com/book, you'll find lots of bookstores all around the world selling it. We're currently working on a workbook specifically for remote teams and remote ways of working, like we find ourselves in now during the pandemic. That will hopefully come out soon. We're working on it with IT Revolution, the publisher of the book, and at least initially it will be a free PDF download.
[00:25:47] So if you'd like more information, head to our website, teamtopologies.com. There's lots of material there: learning resources, videos, slides, and all sorts of things like that.
[00:25:59] We also offer remote-friendly training, which we've been doing since March, so get in touch if that's something you're interested in.
[00:26:06] We've got some assessments as well.
[00:26:09] There are lots of resources online. We've also got a GitHub repository with templates, code, and things like that, so you can send us a pull request and we'll get things modified. It's free to use and freely available, kind of open source. So, thank you very much. It's been great to be here. That's the end of my talk, and I'm now looking forward to the questions.
