Holly Cummins

[00:00:15] Last year I had the happy talk about how to have fun in the workplace. This year I have the unhappy talk about what happens when you get things wrong. Hopefully there'll be a little bit of fun in this as well, because it's always both enjoyable and educational to hear about other people's mistakes. The starting point for this talk is that I'm a consultant in the IBM Garage. One of the things I do is go out and work with both startups and large companies, and they say, we'd really like to move to the cloud, we'd really like to get there. And then they tell me about what's happened so far and I go, oh, I wouldn't have done it like that. But as with anything interesting, there's lots of ways to get it right, and I think there's lots of ways to get it wrong as well, or maybe get an outcome that you didn't really want. So the first problem we see is just figuring out what even is cloud native. It should be an easy thing, but it turns out to be really hard. I started thinking about the definition of cloud native because I saw a tweet from Daniel Bryant, who writes for InfoQ. He had an article from someone who works at Red Hat, Bilgin Ibryam, and he said, I really like how Bilgin framed the relationship between ESB and microservices and cloud.
[00:02:01] And if you look at the actual picture, you can see that what Bilgin has done is show a progression. Twenty years ago, we all used to do SOA or enterprise service bus. Then we went on to microservices, and then we went on to cloud native. And what defines cloud native is that you have a lot of little services, and the platform is smart and the services are dumb. I looked and I thought, this just doesn't match my understanding of cloud native. It's a really great article, and Bilgin does so much good stuff. But I feel like I'm doing cloud native, and yet nothing I do has an architecture that looks anything like the one in Bilgin's article. So am I wrong? Or what's going on? In particular, there was so much emphasis on microservices. Again, if you look at the title of the article, the title is microservices, so of course it was going to be about microservices. But I still felt like the emphasis on microservices wasn't quite right. So I went away and looked at the Cloud Native Computing Foundation's definition of cloud native, and they actually have something really similar. They said it uses microservices and containers, and the important thing is that it's dynamically orchestrated. And I thought, that just doesn't feel right to me. But that puts me in a really weird position, because I'm here at FlowCon saying, I, Holly Cummins, know what cloud native is, and the Cloud Native Computing Foundation don't know what cloud native is. Of course, they're the Cloud Native Computing Foundation. They know a lot about cloud native, and yet their definition didn't seem right to me.
[00:04:18] And I feel like at any point when I say that, someone's going to say, oh, we've made a bit of a mistake inviting Holly, because she's saying the Cloud Native Computing Foundation are wrong. Maybe we could just uninvite Holly. But in fact, it isn't just me who worries about that definition of cloud native. This was a tweet I saw yesterday from Sam Newman. He's saying that an interesting side effect of the CNCF is that now he often speaks to people who think they have to be on Kubernetes in order to be cloud native. And you can see that Sam Newman doesn't agree with that either.
[00:05:07] So a few months ago, I was looking again, and I realized that the CNCF have actually changed their definition of cloud native. It used to be that you'd go to their website and it would just have a page that said, what is cloud native? But now if you go to their website, you have to go to another page, which is the cloud native definition. And if you go to that page, it's actually a Git repo. When you see Git in there, you think, hmm, this probably changes quite a lot. And you can see, yeah, there's 18 people who've been making changes to this definition. If you look at the history for the definition, there's a lot of changes; it just goes on and on and on. Some of those are things like translation, but a lot of them are actually changing the wording of the definition. And if you go even further back, to the first bit of the definition, even before it made it into Git, they had 11 drafts. I wasn't in those conversations, but I think they spent a lot of time trying to figure out what cloud native was. And that's pretty much the same for all of us in the industry. If you ask 10 people what cloud native is, they will all know what cloud native is, but you'll get 10 different answers. So, some people will say, cloud native means it was born on the cloud. It was designed to be run on the cloud.
[00:06:55] Some people say, cloud native is microservices, like the CNCF used to.
[00:07:04] Some people say, like the people talking to Sam Newman, cloud native means Kubernetes.
[00:07:11] Some people may not say it in so many words, but when they describe how cloud native works, it sounds almost exactly like the definition of how DevOps works. And some people, I think, take cloud native just to mean something that was built in the last five years, and it's modern and it's nice and it's not legacy. Sometimes I think we're so used to saying cloud native that we forget it is actually possible to say just cloud. We don't have to attach native onto the end of it. But I've kind of almost forgotten that there is a word, which is cloud.
[00:07:54] And sometimes people mean idempotent. The problem with idempotent is that when you say idempotent, everybody goes, idem-what? But idempotent really just means rerunnable. It's about that state that means you can take it down and bring it up again, and it will give you the same behavior.
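As a minimal sketch of what rerunnable means in code: the functions and resource names below are hypothetical and use plain in-memory dictionaries rather than any real cloud API. Applying the same desired state once or many times ends up in the same place, while the contrasting function adds something new on every run.

```python
def apply(actual: dict, desired: dict) -> dict:
    """Return a new state in which everything in `desired` is present.

    Hypothetical example: keys are resource names, values are their config.
    Running it again on its own output changes nothing, which is what
    makes it idempotent.
    """
    result = dict(actual)
    for name, config in desired.items():
        result[name] = config  # create or overwrite, never "add another one"
    return result


def append_server(servers: list, name: str) -> list:
    """A non-idempotent contrast: every run adds one more entry."""
    return servers + [name]


desired = {"web": {"replicas": 2}}
once = apply({}, desired)
twice = apply(once, desired)  # same result as `once`
```

Taking the system down and running `apply` again gives the same state, whereas rerunning `append_server` keeps growing the list.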
[00:08:20] So that's a whole bunch of definitions.
[00:08:28] But I think maybe what's most helpful is to think about what cloud native isn't. At least for me, cloud native is not a synonym for microservices. Many microservices applications are cloud native, but there are a lot of cloud native applications where, because of what the domain model was, or because of the size of the application, it didn't make sense to split it into microservices, and that's okay. And for me at least, if cloud native had to be a synonym for anything, it would be idempotent. And you can kind of see why I definitely need the synonym.
[00:09:14] I find it difficult to even say idempotent.
[00:09:20] So going back to that definition that we had in 2019, they talk about microservices.
[00:09:33] Then in 2020, they still talk about microservices, but it's different, because what they say is that it's about immutable infrastructure. Again, I think that's that idea of idempotency and infrastructure as code and that kind of thing. They talk about microservices, but they don't say you have to be microservices. Microservices exemplify this approach, which to me at least seems a lot more okay. But even in their 2019 definition, one thing that I really liked was that right after they say microservices, they have a why. And the why is that you want to build great products faster, which is just like, yeah, that's why we're doing this. That's why we've changed how we architect things: because we think on the cloud, we can build great products faster. In the modern version of it, they still have a why, but they're a bit more precise about it. Instead of just saying we're going to be really fast and we're going to be great, they say we make high-impact changes, so non-trivial changes. We do it frequently. And when we do those changes, things kind of work, they're predictable, and there's minimal toil, so we don't have to do a whole bunch of manual work. I really, really like that why. But with both definitions and with both whys, I think we have to go back to what problem are we even trying to solve? And that turns out to be another failure. Often, I think organizations feel they should be cloud native, but they don't really know what problem they're trying to solve.
[00:11:31] I think what problem are we trying to solve is almost always the most important question in any conversation. And for cloud native, we do see this sort of social driver to it, which is: everyone else is doing cloud native, so it's got to be a good idea, maybe. I think we assume that if we have the same architectural patterns as companies like Netflix, we will have the same engineering success as companies like Netflix. And it doesn't really work like that. So there's this sort of wishful mimicry where we take the visible rituals.
[00:12:13] But we don't actually take what I think are the most important parts.
[00:12:22] And I think in order to really figure out the why, we want to step back and figure out, well, why are we even going to the cloud?
[00:12:32] It used to be that the motivation for going to the cloud was just a cost thing. Running in the cloud was cheaper.
[00:12:39] And the reason it was cheaper was partly because there was an economy of scale in someone else managing your infrastructure. But it was also that you didn't have to pay for most of your infrastructure most of the time. You had that elasticity: you could scale up, you could scale down. So on Black Friday you can have lots of servers, the rest of the year you can have a small number of servers, and that's what really enables the cost savings.
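The elasticity argument above can be worked through with some simple arithmetic. This is only a sketch: the demand profile, the server counts, and the flat price of 1.0 per server-day are all made-up numbers chosen to keep the sums easy, not figures from any provider.

```python
def fixed_capacity_cost(peak_servers: int, days: int, cost_per_server_day: float) -> float:
    """Cost of owning enough servers for the peak, every day of the year."""
    return peak_servers * days * cost_per_server_day


def elastic_cost(servers_needed_per_day: list, cost_per_server_day: float) -> float:
    """Cost of paying only for the servers actually needed each day."""
    return sum(servers_needed_per_day) * cost_per_server_day


# Hypothetical year: 10 servers on 364 ordinary days, 100 on one peak day.
demand = [10] * 364 + [100]

fixed = fixed_capacity_cost(max(demand), len(demand), 1.0)  # 100 * 365 = 36500.0
elastic = elastic_cost(demand, 1.0)                         # 3640 + 100 = 3740.0
```

With these invented numbers, provisioning for the peak all year costs roughly ten times as much as scaling up only when the peak arrives.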
[00:13:07] Another benefit of cloud is speed. This is partly about faster hardware, but mostly about speed to market. The cloud allows us to deliver so much faster because we don't have to print CDs and then post them out to people. Another benefit is that we're starting to see exotic capabilities on the cloud, which kind of goes back to the speed as well. If you want to do machine learning and you want a whole bunch of GPUs to do that machine learning, you probably don't want to buy those for your own data center, because they're really expensive. So you want to just use them in the cloud. Similarly, if you want to do quantum computation, you definitely don't want a quantum computer in your own data center, because you have to keep it at about 0.5 Kelvin, which is colder than the space between the stars, and you don't want that in your data center. So IBM was the first to make quantum computers available on the cloud, and it's just so obviously a sensible provisioning model for that kind of computation.
[00:14:17] So that's really good.
[00:14:20] So then why do we want cloud native?
[00:14:23] Well, it turns out a lot of people, when they first moved to the cloud, got electrocuted. There are all sorts of bad things that can happen if you try and write software in the cloud the way you always wrote it. And so in 2011, the industry came up with the 12 factors.
[00:14:47] And I like to think of the 12 factors really as how to write a cloud application so you don't get electrocuted. So these are the good habits that you need to have. And we talk about them less and some don't really make sense anymore. Some we just don't even think about.
[00:15:04] So that was in 2011, but even before then, there was another idea, which was cloud native. Cloud native was first coined in 2010, so before 12 factors, and microservices was 2013, at least in that sort of distributed context. So cloud native as an idea came first. And what Paul Fremantle was talking about with cloud native was, here are the things that you need to do, again, to not get electrocuted in the cloud. To me, because I remember that, that kind of definition makes a lot of sense.
[00:15:48] And you can see other people still think about that definition as well. In the Twitter conversation yesterday, there was another developer, and I have to say that Ian is not a very young developer, and he thought cloud native means built to operate in cloud infrastructure, born on the cloud. And now sometimes we see it meaning just running on Kubernetes. Or sometimes we see an idea like lift and shift to cloud native, which to me doesn't make sense, but I see it more and more. So we've got all these definitions of cloud native.
[00:16:30] And if we try and do something with so many definitions, are we all going to agree on the goal? Well, maybe, but there's a lot of opportunity for misunderstanding.
[00:16:43] And so then we get things like, you go back to your stakeholder and you say, I've done this amazing cloud native application, and they were hoping for microservices. And then there's a disagreement. Or they were hoping for better speed to market and the cloud ended up only saving money. So we have this opportunity for problems. But I think one of the biggest problems is that wishful mimicry, and in particular for microservices.
[00:17:13] Microservices are a brilliant architectural pattern, but they are not a goal on their own. They're a means to another goal, and we need to figure out that other goal and then figure out if microservices help. We had a conversation with a bank, and they had a whole bunch of COBOL, and there's a lot of good reasons to get rid of your COBOL.
[00:17:41] It may be expensive to maintain. It may be that all of your COBOL developers are retiring, and so just to get the skills, you need to go to a newer language. But in this case, their competition was eating their lunch. They were going too slowly, and they needed to go faster. So they said, we need to get rid of COBOL and make microservices. And we thought, great, yep, we can totally help you with this. And then they added, our release board only meets twice a year. At that point, it doesn't matter how many microservices you have. If you're only allowed to release twice a year, they're all going to get released in a batch, and you may as well not have bothered with the cost of microservices, because you're not actually going to go faster until you fix the release board. And we do see this idea that, I think, is maybe almost about wanting to show what good engineers we are. We look at what other people are doing and we think, containers, that's great. The more containers I manage, the cooler I am. If I have five microservices, I'm kind of okay, but if I have 500 microservices, I'm an awesome engineer. Which is true, but only if you actually make those 500 microservices all work together. Often what we see is that instead of having a really beautiful decoupled system, we have something that is distributed, but still so coupled that it's actually a distributed monolith, and that's worse than a normal monolith. It doesn't have the compile-time checking, and it doesn't have the type safety that you get from a single codebase. And when one function calls another one, if it's distributed, that execution isn't guaranteed. The distributed computing fallacies apply, so the other thing could have just gone away, which isn't going to happen if they're all in the same process. So there is a cost to distribution.
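The point about execution not being guaranteed can be sketched in a few lines. Everything here is hypothetical: `FlakyRemote` simulates a service on the other side of a network by failing a fixed number of times, and the retry helper is only one of many possible ways to cope. The contrast is that the in-process function needs none of this handling.

```python
class FlakyRemote:
    """Stands in for a service across the network (simulated, no real I/O)."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success

    def price(self, item: str) -> int:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("service unavailable")
        return 42


def local_price(item: str) -> int:
    """Same logic in the same process: the callee cannot simply go away."""
    return 42


def call_with_retry(fn, item: str, attempts: int = 3) -> int:
    """Call fn(item), retrying on ConnectionError; expects attempts >= 1."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn(item)
        except ConnectionError as err:
            last_error = err
    raise last_error  # every attempt failed, so the caller has to deal with it


answer = call_with_retry(FlakyRemote(failures_before_success=2).price, "tea")
```

The retry loop, the failure path, and the decision about what to do when every attempt fails are all extra code and extra design work that a single-process call never needs.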
[00:19:48] And so when figuring out whether microservices make sense for you, there's a bunch of things to consider. If you're a small team, that's a good reason not to do microservices.
[00:19:58] If you're not planning to release independently, don't bother. When thinking about how these microservices are going to communicate and handle security between them, and some of those cross-cutting concerns, usually the pattern for doing that is to have a service mesh. A lot of people hear service mesh and they go, oh, I've heard bad things about service meshes. They're really expensive. They're really hard to manage. I don't want a service mesh. Instead, I'm going to write my own security protocols, and I'm going to write my own discovery protocols, and I'm going to write my own load balancing protocols. And at some point, you've just written a service mesh, and you would have been better off with an open source one. And then sometimes as well, the domain model just doesn't split nicely, and when that happens, what you get is cloud native spaghetti. I was brought into a project, and I arrived and they showed me their microservices, and they said, every time we change one microservice, another breaks.
[00:21:12] I think they sort of assumed, and I think a lot of us now assume, that if you have microservices, because you're distributed, you're decoupled. But that's not necessarily the case. You're always going to have some coupling. Anything that communicates is going to be coupled, and just because it's distributed doesn't change that. So you need to be looking at your coupling and what your architectural pattern is, because you can completely have spaghetti in the cloud. And it's scarier than normal spaghetti. This is a picture from Cloudy with a Chance of Meatballs, and I just love it because it shows the terror of cloud spaghetti.
[00:21:57] When I dug deeper into why these microservices were so coupled, I looked at the code and I saw the same code in just about all of the microservices. And I said, what's going on with this? Why didn't we pull this out to a common library?
[00:22:17] And they said, well, we didn't put it in a common library because we don't want that coupling. Which is fair, and sometimes it is definitely worth cutting and pasting code to avoid coupling.
[00:22:30] But in this case, the coupling was there because every time they changed a field name, every time they fixed a typo in one class, it broke everything else. So they would have been better off having a single source of truth. And the problem was that in an ideal world, what you have with your microservices is you have a whole bunch of microservices, and each one maps really neatly to the domain model. In this case, what had happened was that the domain model was common across all of the services.
[00:23:06] So, it wasn't good. Another illustration of coupling, a non-software one, is the Mars Climate Orbiter.
[00:23:18] It looks a bit like this. It had sort of a sad end. You can see it looks like it's been put together out of a bin liner and duct tape, and I think it was actually built more professionally than that. But despite that, instead of going around Mars, which is what was supposed to happen, it went into Mars, which is not supposed to happen. When they did the analysis to figure out what had happened, it turned out that there were two subcomponents that controlled the navigation. One was on the orbiter and the other one was on Earth. The Earth one would send occasional corrections, and then the orbiter would do the rest.
[00:24:05] The orbiter itself used metric units, like centimeters. The Earth-based one used imperial units. And so then, of course, it didn't work. The problem was that the units were close enough that it wasn't obviously wrong. It was only wrong when it crashed into Mars. So in this case, you couldn't have had a more distributed system, part of it was on Mars, but distributing did not help. The way to fix this kind of problem, the way to make sure your interactions aren't going to be broken, is that you've got to have consumer contract tests. They're pretty rare in our industry, and I think that's partly because they're hard to understand and because they take work and coordination between teams to write. But it's really worth that effort. It's worth that investment, because they're going to give you the confidence that your microservices actually work together.
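A consumer contract test can be sketched roughly as follows. This is a hand-rolled illustration under stated assumptions, not the API of any contract-testing tool: the field names, the unit strings, and both provider responses are invented for the example. The idea is that the consumer writes down what it needs, including the unit, and the provider's output is checked against that before anything ships.

```python
# What the consumer (here, the spacecraft side) needs from the provider.
# Field names and unit strings are illustrative assumptions.
CONSUMER_CONTRACT = {
    "impulse": float,
    "unit": str,
}
EXPECTED_UNIT = "N*s"  # the consumer expects metric units


def provider_response_imperial() -> dict:
    """A provider that silently answers in imperial units."""
    return {"impulse": 1.0, "unit": "lbf*s"}


def provider_response_metric() -> dict:
    """A provider that honours the consumer's contract."""
    return {"impulse": 4.45, "unit": "N*s"}


def contract_violations(response: dict) -> list:
    """Return a list of human-readable problems; empty means the contract holds."""
    problems = []
    for field, expected_type in CONSUMER_CONTRACT.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"wrong type for {field}")
    if response.get("unit") != EXPECTED_UNIT:
        problems.append(f"unexpected unit: {response.get('unit')}")
    return problems
```

Run against the provider in its own build, a check like this fails on the unit mismatch long before the two halves meet in production, which is the confidence the contract test is buying.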
[00:25:04] And when there isn't the confidence that microservices work together, then we get the not-actually-continuous continuous integration and continuous deployment system. I always worry when I talk to someone and they say, we have CI/CD, because CI, continuous integration, that's a verb.
[00:25:22] It's not something you buy, it's something you do.
[00:25:27] And so I hear things like, I'll merge my branch into our CI next week. Well, if your branches are only going in once a week, that's not continuous, no matter what the tool is called. And then sometimes there's a lot of conversation about CI/CD, CI/CD, and then they say, and we release every six months with our CI/CD. Again, if you're only doing continuous deployment every six months, that's not continuous. So the meaning of the word continuous has almost been forgotten. Of course, there is a spectrum of how often you should do these things. So think about pushing to master, which is integrating. If you integrate every character, that would be truly continuous, but it would obviously be a ridiculous way to do software; it would cause chaos. If you do it every commit and you commit several times an hour, that's pretty good. If you do it every few commits and you're doing it several times a day, I think that's also really good. Once a day is about the limit. Once a week, definitely not. Once a month, definitely, definitely not. And if you're doing it once every six months, that sounds as ridiculous as doing it every character. But when I first started at IBM, and I should say this was 20 years ago, this was not recently, I of course didn't know anything about professional software engineering. A teammate was showing me how to use the version control system. This was WebSphere, and you did not want to be on the WebSphere build call. If you broke the build, you would get summoned onto the WebSphere build call, which happened every day including Saturday. So the best way to avoid breaking the build and being on the WebSphere build call was to not integrate code. So they would save all their changes for literally six months on their laptop before pushing. I think we all know now that this is not a good thing, but I still do see changes not being committed very often. So I'm a huge advocate of trunk-based development, and the official line for trunk-based development is once a day. If you're to the left side of that line, you can say you're doing trunk-based development. And really, if you're to the right side of that line, I think it's not a best practice. My own line is around every few commits.
[00:28:04] So how often should you release? Well, same story, there's a spectrum. On this one, we see a lot of companies who really push, and they do manage to release every push, which is many times a day, which is great.
[00:28:17] And again, the D in CD is deploy, so really you need to be doing that if you're talking about CD. We used to only release once every two years; obviously, that's not good enough anymore. If you are going to be releasing many times a day, you need to have a really good handle on feature flags and that kind of thing. I think the minimum to be okay for a best practice is probably once a sprint. Anything less often than that is definitely pretty old school. Anything more often than that is great. Every push, that's pretty hardcore skills. I feel like every user story is maybe a slightly more attainable goal for most of us, and still good. So, what about continuous delivery, which is basically just the testing and staging? Really, I think you've got to be doing that every push. I don't think there's so much of a spectrum on that.
[00:29:23] And the barrier to the CD actually turning into releases is that there is a fear of release. So if a client is saying, we can't, I want to unpick it and say, why? What's stopping us
[00:29:39] doing more frequent deploys? Often it comes back to that microservices issue. The point of microservices, the goal, is that they're independently deployable. But some organizations deploy them all at the same time. Some even write their pipelines to enforce that they all get deployed at the same time.
[00:30:08] And sometimes that's because we want every feature to be complete. This is where I think we want to go back to some of those lean startup ideas and say, it's okay to get stuff out there and get the feedback. Because the point of cloud native is speed. And so if you sort of look like cloud, but you're going really slowly,
[00:30:35] what's the point? Why have this beautiful architecture that has an overhead that enables you to go faster, if you don't go faster?
[00:30:49] And when you think about the value of feedback, I think with a FlowCon audience I'm probably preaching to the converted. But you would never drive like this, because you want the feedback about who you're about to run over. Yet we often deliver software like that, and we'll work for months without getting any feedback. That feedback is good engineering.
[00:31:14] And it's not just an engineering thing. That feedback is also good business; it helps the business make more accurate decisions. And if it's too scary to ship things, then there are all these tools that we can use. We can use deferred wiring, so the code is out there but nothing talks to it. We can use feature flags to make sure that the new code is hidden. We can expose it just to a small proportion of our users to see if anything bad happens. So we can do A/B testing, we can do canary deploys. Often the reason these things don't happen is a lack of automation. And often the reason we can't release more quickly is that we have no idea if it works. So we have to do a long phase of manual QA because the tests aren't automated. Basically, if I hear, our tests aren't automated, what I actually hear is, we don't know if our code works. And of course, things are going to change, and systems are going to behave in unexpected ways. Even if your code is perfect, a dependency update can change behavior, so you need that testing.
[00:32:27] But that lack of confidence in the quality stops organizations from shipping.
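One way the flag-plus-small-proportion idea above might look in code is the sketch below. It assumes a hypothetical `new-checkout` feature and hashes the user ID into a stable bucket from 0 to 99; real feature-flag services offer far more than this, so treat it as an illustration of the mechanism only.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in or out of a percentage rollout.

    The same user always lands in the same bucket for a given feature,
    so raising `percent` only ever adds users, it never swaps them around.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable value in the range 0..99
    return bucket < percent


def checkout(user_id: str, rollout_percent: int) -> str:
    """The new code is deployed but hidden unless the flag lets the user in."""
    if in_rollout(user_id, "new-checkout", rollout_percent):
        return "new checkout"
    return "old checkout"
```

Setting the percentage to 0 ships the code dark, a small number exposes it to a canary group, and 100 completes the release, all without another deploy.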
[00:32:35] And, you know, especially if it's microservices, you need those automated integration tests and those automated contract tests.
[00:32:42] Another thing I see with automation is that the build, which should be the thing that we all care a lot about, is sort of buried, and people have to go hunt down a web page. We should be bringing our build status everywhere in a really passive way. So think about having things like build radiators, or things like traffic lights, to show us when there's a problem so we can act on it quickly. Something like this, where it's a monitor that's just always showing the build status. I really like to have that in my teams.
[00:33:14] And the problem is, if you don't have that, or if you don't act on it quickly, then the build breaks, and then it stays broken, and everybody hopes it's someone else's problem. So sometimes I look at the build and someone says, oh yeah, that build's been broken for a few weeks. And I think, well, why didn't we fix it? Because it stops giving us useful information if it's just always red.
[00:33:40] So I wanted to shift a little bit and think about again, the sort of the why and the how we use the cloud. Because often we see a cloud which has lost its cloud characteristics, it's sort of so locked down that it's not cloudy anymore.
[00:33:57] And this can happen for all sorts of reasons. I was speaking to a bank today, and of course, they're very regulated, and they do have to do some things that maybe the rest of us would prefer not to do. But it was how they had their network. They were on their journey to cloud.
[00:34:14] But you had to choose. When you had a laptop, it could either access the cloud systems or it could access their internal network. You couldn't do both. If you wanted to do both, you'd have to have two machines. And this introduced so much friction to everything. If they wanted to start coding on a new project, it would take them a week just to write that first line of code, because it would take two days to get a repo, two days to get a pipeline.
[00:34:44] Another thing I see is that sometimes we don't change the processes even though we're on the cloud. Going back to that release board, sometimes the code will be ready to ship, but the architecture board hasn't happened yet. But also, we just often put so much governance around the cloud.
[00:35:04] This was a story told to me; I didn't see it myself. IBM sold some provisioning software to a client. This was a few years ago, when provisioning was still something that was exciting and new. We told them, this software is amazing, you can provision a virtual machine in 10 minutes. And they said, oh yeah. But then they came back to us and they said, this software you sold us doesn't work, it takes us three months to provision something. And we said, oh no, there must be a technical problem in here. But when we looked, we realized that what had happened is they'd put a pre-approval process in front of the provisioning. And it was an 84-step pre-approval process. I'm amazed they managed to get it done in three months, really.
[00:35:55] So what they'd done is take this beautiful cloud and put so much governance around it. And it's just not going to work.
[00:36:07] Another story. A client had a cloud provider, and they told us, oh, we're changing our cloud provider. And then they added that the reason they were doing it was that it took them so long to provision things on the original provider because of the internal procurement process. And I thought, changing providers isn't going to fix that. The problem isn't with the provider, the problem is with your internal structures. You may have a little while before procurement catch up, but you're going to have the same problem. You need to fix something else.
[00:36:39] And so, in general, if the developers are the only ones keen on becoming cloud native, it's not going to work. And when we have this organization where everything has so much friction, there's a real cost, because developers will leave.
[00:36:59] But of course, that governance is there for a reason, which is that sometimes cloud can be expensive. What a lot of organizations see is that cloud makes it so easy to provision software; that's the joy of cloud. But of course, it doesn't actually mean the hardware is free. It's still running in a data center somewhere. And often we provision things, and then they stop being useful, but we've forgotten about them.
[00:37:33] And, you know, I've done this. When I was learning Kubernetes, I created a cluster to figure it out. But I had too much work in progress, so I forgot about it for two months. And then when I picked it back up, I realized it had been running away, and it was kind of a well-specified cluster, so it was 1,000 pounds a month, just wasted. And so, you know, I was off doing other things. Um, I think I wasn't actually on holiday, I think I was doing other work that was work in progress. But the effect is the same: that money was just evaporating while I was doing something else.
And this happens all across the industry. There was a survey in 2017 of 16,000 servers, and they found that a quarter of them were doing no useful work. And the authors of this study said, perhaps someone forgot to turn them off. And that happens to all of us and it really adds up. And of course, there's a climate impact to this, as well as a financial impact, if we have all of this waste. And that's why we get the governance.
But when we think about what's going on with the cloud, I love this quote from Peter Drucker. Because the cloud makes it so easy to provision software, but if we shouldn't have provisioned it, then that ability of the cloud is very, very useless. And a lot of organizations really, really struggle to get a handle on their cloud spend because, you know, every organization is multi-cloud. And so it's spread across different organizations and then we get the sort of Conway's law at the development level. And then it means it's really hard to get a sort of overall view of the cloud picture. So then I think this is one area where cloud actually helps, because now we're sort of starting to see cloud to manage your cloud. And things like, um, multi-cloud management will allow you to shift workloads to where it makes the most financial sense to run them, and also just get that visibility of what's going on.
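To make the "forgotten cluster" problem concrete, here is a minimal sketch of the kind of idle-resource report that such visibility tooling might produce. Everything in it is an assumption for illustration: the `Resource` fields, the `find_idle` helper, the 30-day threshold, the sample inventory, and the date are all hypothetical and do not come from any real cloud provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Resource:
    name: str
    owner: str
    monthly_cost: float      # in whatever currency the bill uses
    last_activity: datetime  # last time the resource did useful work


def find_idle(resources, now, max_idle=timedelta(days=30)):
    """Return (resource, estimated_waste) pairs for resources idle longer than max_idle."""
    report = []
    for r in resources:
        idle_for = now - r.last_activity
        if idle_for > max_idle:
            # Rough estimate: cost accrued over the idle period, using 30-day months.
            waste = r.monthly_cost * (idle_for / timedelta(days=30))
            report.append((r, waste))
    # Largest waste first, so the biggest savings are at the top.
    return sorted(report, key=lambda pair: pair[1], reverse=True)


# Hypothetical inventory: a cluster untouched for two months, and a busy service.
now = datetime(2020, 3, 1)
inventory = [
    Resource("k8s-learning-cluster", "dev-1", 1000.0, now - timedelta(days=60)),
    Resource("prod-api", "team-a", 400.0, now - timedelta(hours=1)),
]

for resource, waste in find_idle(inventory, now):
    print(f"{resource.name} (owner: {resource.owner}): ~{waste:.0f} wasted")
```

Keeping an owner on every resource is the design choice that matters here: a report is only useful if it says who to ask before something gets turned off.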
And I think we're seeing more in this area as well. So FinOps is, um, an emerging discipline, it's sort of, you know, the cousin to design and GitOps and DevOps and DevSecOps and everything. It's supposed to be about getting that accountability for the cloud costs, but I think it also helps with just figuring out who forgot to turn off their cloud. And in general, ops in the cloud, if you have a lot of cloud, if you have a lot of microservices, is going to be harder. So we need to be doing these new disciplines like SRE, um, site reliability engineering.
[00:40:28] And all of the DevOps and the SRE and all of these newish disciplines, what they're going to help us do is make releases so boring that we can do them often.
[00:40:42] And I think we sometimes imagine we can't release often because our use case is special. And there are some use cases that are genuinely special. Space is one of them. So this is Phobos 1, it's a Russian space probe. But it also had a very sad ending. Um, because what happened was there's these fins that it rotates to get solar energy. And an engineer did a deploy, and there was sort of the equivalent of linting for the machine code, but it was broken, or you know, sort of not running properly, so the engineer bypassed the linting. And they had an extra zero somewhere in the machine code that the linter would have caught. But because there was no automated testing, it went straight up to the space probe. Everything seemed to be working okay. But what they didn't realize was that those arms that were supposed to sort of rotate to follow the sun weren't rotating. And they only discovered that when it ran out of battery. Once it had run out of battery, nothing they sent to this probe was going to revive it. And it was out in space, so they couldn't just sort of, you know, go kick the server, turn it off and on again. So, it was dead.
[00:42:02] And in those cases, if there is a problem, there is no recoverability. But in most cases, there's really good recoverability, and we can be making engineering efforts to get even better recoverability. So, space probe, unrecoverable. Most other things, pretty recoverable. And so if we think about the sort of spectrum, at the worst end of the spectrum is that completely unrecoverable thing. If there's a lot of handoffs to recover a service, it's going to be bad. If it needs manual intervention, it's going to be bad. Um, if we get it back up fast but with data loss, well, that's a bit of a shame, but, you know, we're still pretty recoverable. And of course, the ideal is that we get it back in milliseconds with no data loss. So overall, handoffs bad, automation good. We want to be trying to automate for that recoverability, which gives us the confidence to release often.
So I told a whole bunch of sad stories and I just want to end with a few things that, you know, we know are going to help you succeed at cloud native. Um, the first is you have that clear goal of what you're trying to achieve with cloud native beyond just buzzword compliance. Optimize everything for feedback. And look at the whole system. So this is a great opportunity for systems thinking. So if you automate something, look at the processes around it which assume that the process that was previously manual is expensive or error-prone, because it's not anymore.
[00:43:44] Now that it's been automated.
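The recoverability spectrum from a moment ago can be written down as an ordered scale. This is only a sketch: the level names, the `classify` helper, and the threshold for "a lot of handoffs" are assumptions made for illustration, not definitions from the talk.

```python
from enum import IntEnum


class Recoverability(IntEnum):
    """Ordered from worst to best, following the spectrum described in the talk."""
    UNRECOVERABLE = 0        # e.g. the space probe
    MANY_HANDOFFS = 1        # recovery needs several teams to pass work along
    MANUAL_INTERVENTION = 2  # a person has to step in
    FAST_WITH_DATA_LOSS = 3  # back quickly, but something was lost
    FAST_NO_DATA_LOSS = 4    # back in milliseconds, nothing lost


def classify(recoverable: bool, handoffs: int, manual: bool, data_loss: bool,
             many_handoffs: int = 2) -> Recoverability:
    """Place a service on the spectrum; many_handoffs is an assumed threshold."""
    if not recoverable:
        return Recoverability.UNRECOVERABLE
    if handoffs >= many_handoffs:
        return Recoverability.MANY_HANDOFFS
    if manual:
        return Recoverability.MANUAL_INTERVENTION
    if data_loss:
        return Recoverability.FAST_WITH_DATA_LOSS
    return Recoverability.FAST_NO_DATA_LOSS


space_probe = classify(recoverable=False, handoffs=0, manual=False, data_loss=False)
web_service = classify(recoverable=True, handoffs=0, manual=False, data_loss=False)
print(space_probe.name, "is worse than", web_service.name, ":", space_probe < web_service)
```

Using an ordered enum means two services can be compared directly, which makes "did this change move us up the spectrum?" an easy question to ask.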
[00:43:47] So again, when we look at cloud native, it's a really good opportunity for value stream mapping. If we look at our whole release process, if we do a whole bunch of optimization in one area, but it's just local optimization, that's not going to give us the outcomes we want. And with that, I'll stop and we can have some Q&A. Thank you.
