
Wicked Good Development Episode 28: Simon Brown on visualizing software architecture

Written by Kadi Grigg | March 03, 2023

When you interview your dream guest, the conversation is wicked good. In this episode, Kadi and co-host Sal Kimmich sit down with Simon Brown, creator of the C4 software architecture model, and Sonatype Developer Advocate Dann Conn. Topics of conversation include the ins and outs of the C4 model, how having a detailed architecture diagram can make or break you, and more.


Tune in as we discuss the intention behind the model, best practices, and why it's critical that technical and non-technical folks alike can understand your architecture.



Listen to the episode



Wicked Good Development is available wherever you find your podcasts. Visit our page on Spotify's anchor.fm

Show notes

Hosts

Kadi Grigg, Sonatype
Sal Kimmich, Director of Open Source for AI DevSecOps, EscherCloud

Panelists

Simon Brown, independent consultant and creator of the C4 model
Dann Conn, Developer Advocate, Sonatype

Relevant links

Transcript

Kadi Grigg (00:10):
Hi, my name's Kadi Grigg, and welcome to another episode of Wicked Good Development, where we talk shop with OSS innovators, experts in the industry, and dig into what's really happening in the developer community.

Sal Kimmich (00:21):
Excellent. And I am Sal Kimmich. I am the director of open source for AI DevSecOps at EscherCloud. Really excited to talk about C4 modeling with Simon Brown today.

Kadi Grigg (00:30):
Thanks, Sal, and I'm so excited that you're here as a co-host today. Hey, Simon, before we dive in, can you introduce yourself for the listeners at home today?

Simon Brown (00:43):
Yes, certainly. So for those of you who don't know me, my name's Simon Brown. I'm an independent consultant specializing mostly in software architecture. My background is as a software developer working mostly for consulting companies, so we were building software either for or with our customers. And over the years, I've done more software architecture training and software architecture workshops. Now I get to fly around the world and teach people things like my C4 model and how to do software architecture in a nice, lean, lightweight, modern fashion. So, yeah, that's me.

Kadi Grigg (01:15):
Thanks for being here. And newcomer to the pod, Dan Conn.

Dann Conn (01:19):
Hi there. My name's Dan Conn. I am a developer advocate for Sonatype. Before that I was a developer for 10 years and also had a cybersecurity interest as a hobby. I recently graduated with postgrads in security and digital forensics.

Sal Kimmich (01:34):
Excellent. As we're talking about the C4 model today, I think to understand really what it is-- I think it would be interesting to understand, Simon, was there a single situation or a set of situations which inspired you to construct this initially and what were they?

Simon Brown (01:49):
So, thanks very much for inviting me along. I don't think it was a single situation. It was really a culmination of things that happened over a number of years. So, I worked in London for a number of years, mostly for consulting companies. And if you want to grow a consulting company, you need more teams. And in order to have more teams, you need more tech leads and more architects. So, I was part of the small team of people who would often go and teach software developers how to do architecture and tech lead roles. And eventually I took that training outside of the company and started doing it as public courses in London. And during the public courses, we used to have this little design exercise where we would break the attendees up into groups of 2, 3, 4 people, give them a very simple set of requirements, and then ask them to go and design a solution and draw some pictures.

Simon Brown (02:35):
And after that exercise, we used to present the diagrams, look at the diagrams, and try and figure out whether the solutions met any of the requirements. And it turns out that I couldn't understand any of the diagrams, and neither could anyone else in this workshop. So, trying to answer the question of "did the solutions actually match the requirements" was almost impossible. So, I thought to myself, well, there must be a better way of doing this. In the late 90s and early 2000s in my career, I was a big UML user. So, we used to use UML in pretty much all our projects. But UML usage declined quite rapidly in the early to mid-2000s. So, what I did was I started teaching people the method that I used to draw architecture diagrams when I needed to write documentation for our customers, because working for a consulting company, of course, that's what you do. And that's essentially what became the C4 model. So, it was just different levels of detail, different diagrams, and the C4 model is just really a formalization of that stuff. That's it really.

Sal Kimmich (03:38):
Yeah, I think it's really interesting. So, I work a lot on observability, and for years I've had to point people to C4 modeling for exactly that situation, actually turning them away from my own consulting work because they literally-- I was like, "You've asked me if I can solve your problem, and you haven't given me enough information to know the answer to that." And so I think it's really important to think with C4 modeling particularly, having the different layers and levels of contextual information is I think what makes it unique to what was being offered at that time and what really makes it so powerful. So, diving into that a little bit, that highest level down to the lower-level architectures, can you explain a little bit about what it is to dive into those models and a little bit about what audience each level would be most appropriate for?

Simon Brown (04:26):
Yeah, certainly. So, the top level, level one, is called the "system context diagram." So, imagine you are in an organization, and you're building a product; you want a diagram that basically says: "This is the product we're building. These are our different types of users." So, users, actors, roles, personas. There's a whole bunch of ways you can think about your users and how they interact with the system you're building. And then your product, your system, is not going to run in isolation. It's always going to talk to other systems, whether it's security systems or credit card systems or knowledge management systems. So, the context diagram basically says: what's the thing you're building, who's using it, and how does that fit into the world around it. And it turns out it's such a simple diagram to draw, and many teams skip over this diagram because they think, "Oh, everybody knows the context."

Simon Brown (05:09):
And of course for simple systems, that's true. If you've got two user types and you're talking to a credit card provider, it's easy-peasy. But most enterprise systems, it's not two user types and one system. It's like 53 user types and 140 different systems, and no one person has that set of systems in their head. So, it's a great diagram for everybody, from non-technical people-- product owners, business users, sponsors, all sorts of non-technical stakeholders-- through to testers, because obviously they need to know what the system is in order to test it properly, through to us as architects, developers, infrastructure people, people looking at security and compliance concerns. So yeah, it's just a really nice, simple starting point.
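
To make that concrete, here's a minimal sketch of a level-one context model expressed in code, using Simon's Structurizr for Java library. The product, users, and neighbouring systems below are hypothetical, and the calls follow the library's getting-started examples.

```java
import com.structurizr.Workspace;
import com.structurizr.model.Model;
import com.structurizr.model.Person;
import com.structurizr.model.SoftwareSystem;
import com.structurizr.view.SystemContextView;
import com.structurizr.view.ViewSet;

public class ContextDiagramSketch {
    public static void main(String[] args) {
        Workspace workspace = new Workspace("Online Store", "A hypothetical e-commerce product.");
        Model model = workspace.getModel();

        // Who uses the system?
        Person customer = model.addPerson("Customer", "Browses the catalogue and buys products.");

        // The thing we're building.
        SoftwareSystem store = model.addSoftwareSystem("Online Store", "Lets customers browse and buy products.");

        // The other systems it talks to.
        SoftwareSystem payments = model.addSoftwareSystem("Payment Provider", "Processes credit card payments.");
        SoftwareSystem email = model.addSoftwareSystem("E-mail System", "Sends order confirmations.");

        customer.uses(store, "Browses and places orders using");
        store.uses(payments, "Takes payments using");
        store.uses(email, "Sends e-mail using");

        // Level 1: the system context view -- what we're building, who uses it, what it talks to.
        ViewSet views = workspace.getViews();
        SystemContextView contextView = views.createSystemContextView(store, "SystemContext",
                "The online store, its users, and its neighbouring systems.");
        contextView.addAllSoftwareSystems();
        contextView.addAllPeople();
    }
}
```

Rendering that one view answers exactly the questions above: what's the thing we're building, who's using it, and how it fits into the world around it.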

Sal Kimmich (05:51):
And then as you dive into it a little bit more, how does it get a little bit more interesting and a little bit more complex? And what does it mean to have that capital-V verification sitting inside of that model?

Simon Brown (06:02):
Yeah. So, let me continue the story. So, once you have that context diagram, you can now start to do the Google Maps pinch-to-zoom movement on that system box, on that product box, and then you drop down to level two, which is what I call a "container diagram." Now, there's an unfortunate clash of naming here with something people might be familiar with, which is Docker. This has nothing to do with Docker, I'm afraid. I kind of came up with this naming before Docker, so I had it first. But that's a little bit irrelevant now, I guess. So, by "container," all I'm really talking about is an application or a data store. So, I wanted to choose a generic term to represent application and data store. And of course I completely failed at that, but what's done is done now.

Simon Brown (06:48):
So, when you zoom into the system boundary, the container diagram basically shows you the set of applications and data stores that make up your system. So, if you're building a web-focused product, you might have an AngularJS frontend or a ReactJS frontend that ends up running in people's browsers as a single-page app, and that might be sending data across the internet to a backend Java Spring Boot app or a .NET app or a Ruby on Rails app. So, that's another container. And then you might be storing information in an Amazon S3 bucket or a MySQL database schema. And really that's what that second-level diagram captures. So, where the context diagram is asking the questions (What's the thing we're building? Who's using it? How does it fit into the world around it?), once you zoom into level two, now you're really looking at questions like: "What are the technology building blocks that we are going to use to build our system? What are those technologies that we're choosing? How are we partitioning responsibilities and data across those things that I'm calling containers? And how do they communicate with one another?" So, we're talking about interaction protocols and stuff like that. So yeah, by the time we get to this level, we've kind of lost all the non-technical people, and we're really focused on architects, developers, and people who want to start evaluating and verifying architectures.
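
Continuing the same hypothetical example, the level-two container view zooms inside the system boundary and names the applications, data stores, and the protocols between them. Again, this is only a sketch with Structurizr for Java, and the technology choices are placeholders.

```java
import com.structurizr.Workspace;
import com.structurizr.model.Container;
import com.structurizr.model.Model;
import com.structurizr.model.Person;
import com.structurizr.model.SoftwareSystem;
import com.structurizr.view.ContainerView;
import com.structurizr.view.ViewSet;

public class ContainerDiagramSketch {
    public static void main(String[] args) {
        Workspace workspace = new Workspace("Online Store", "A hypothetical e-commerce product.");
        Model model = workspace.getModel();

        Person customer = model.addPerson("Customer", "Browses the catalogue and buys products.");
        SoftwareSystem store = model.addSoftwareSystem("Online Store", "Lets customers browse and buy products.");

        // Level 2: the technology building blocks inside the system boundary.
        Container spa = store.addContainer("Single-Page App", "The web shop UI running in the browser.", "React");
        Container api = store.addContainer("API Application", "Handles catalogue and order requests.", "Java and Spring Boot");
        Container db = store.addContainer("Database", "Stores products, orders, and customers.", "MySQL");

        // How responsibilities and data are partitioned, and how the containers communicate.
        customer.uses(spa, "Uses", "HTTPS");
        spa.uses(api, "Makes API calls to", "JSON/HTTPS");
        api.uses(db, "Reads from and writes to", "JDBC");

        ViewSet views = workspace.getViews();
        ContainerView containerView = views.createContainerView(store, "Containers",
                "The applications and data stores that make up the online store.");
        containerView.add(customer);
        containerView.addAllContainers();
    }
}
```

Because both views are generated from the same model, the context and container diagrams stay consistent with each other as the system changes.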

Sal Kimmich (08:03):
Right. So, I think that this is so fascinating, because we're working right now on trying to help a couple of open source projects with their threat modeling. And again, it's the same situation where you've asked me to help you, but I can't help you because I can't see what you want me to help you fix. And this visibility is so essential, and it has to be the right kind of visibility for threat modeling: I need a kind of portable object that stores the information about the state of my system, so that I can then say, "All right, let's give it a new condition, a new endpoint, a new platform. Does it still work, or is there a security concern or an uptime concern?" So Dan, I'm really curious, when we're thinking about these different forms of modeling and observing architecture, when we think about it with these different levels of context diagrams, does that prove to be very useful? I would assume it would be, because it allows me to maintain the information that I need to hand off to my non-technical partners. But much more important for me is a proof of record that I have validated each of the security conditions that are possible for a specific architecture. And I find that fascinating.

Dann Conn (09:14):
It is, absolutely. I think the thing with threat models in particular is that they are used across a business. So, it's a very similar approach where you could have people that are chief technology officers, but also CISOs, information officers reporting to the CEO who may not actually have any technical background, and they need to see what the threat model, what the landscape is, because they're the ones fundamentally making the decisions on what products to buy, how to mitigate against these things, or accepting the risk. And if they accept the wrong risk, they may end up in prison one day. So, I think that's a very useful tool: having this abstracted but also very detailed approach. In particular, there's a program called Threagile, which is very similar to this approach actually, but focuses much more on threat modeling.

Dann Conn (10:02):
So, this guy called Christian Schneider basically wrote it. He presented it at DEF CON in 2020. And it's essentially a YAML-based system where you can map your data assets, you can map components at a higher level than data assets, and you can also map the communication links and then the trust boundaries, which is the much larger element. Your C-suite and board are not going to be really concerned with data assets and the communication links, but your AppSec people definitely are, as are your security researchers. But at the higher level you then have this. And I think it sounds like it would work amazingly hand-in-hand with C4, because you have, again, that layer of abstraction, but actually from a much more focused development point of view. Am I right, Simon?

Simon Brown (10:46):
Yeah, that's absolutely correct. There's a really good blog post-- if we have the ability to embed links in the podcast, I'll send you a link to that blog post. And it's written by a guy who works here in the UK for a very well-known company. And yes, they're using a combination of STRIDE and LINDDUN compressed together, because there's some overlap of course, on top of the C4 model diagrams. And in the blog post, he has an example of how he does that with his own website. But yeah, this company's doing the same thing. And you're right, it turns out that the different levels of C4 diagram allow you to see different risks. At the top level, it's very much integration-type risks. As you dive down into containers, it's much more data leakage and that sort of thing. But then you've also got the deployment architecture as well. So the C4 container diagram is deployment agnostic, which means you can then have a separate version of that for every deployment environment. So, you can now do threat modeling on top of your staging and potentially target and your production environments as well if you wanted to. So, yeah.
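
Because the container model is deployment agnostic, each environment can be described separately and given its own deployment view, which is what makes the per-environment threat modeling Simon mentions possible. Here's a hedged sketch of one environment, again using Structurizr for Java with made-up infrastructure; in practice you'd repeat the deployment nodes for staging, production, and any other environment you care about.

```java
import com.structurizr.Workspace;
import com.structurizr.model.Container;
import com.structurizr.model.DeploymentNode;
import com.structurizr.model.Model;
import com.structurizr.model.SoftwareSystem;
import com.structurizr.view.DeploymentView;
import com.structurizr.view.ViewSet;

public class DeploymentDiagramSketch {
    public static void main(String[] args) {
        Workspace workspace = new Workspace("Online Store", "A hypothetical e-commerce product.");
        Model model = workspace.getModel();

        SoftwareSystem store = model.addSoftwareSystem("Online Store", "Lets customers browse and buy products.");
        Container api = store.addContainer("API Application", "Handles catalogue and order requests.", "Java and Spring Boot");
        Container db = store.addContainer("Database", "Stores products, orders, and customers.", "MySQL");

        // One tree of deployment nodes describing where the containers run in this environment.
        DeploymentNode cloud = model.addDeploymentNode("Production Cloud Account", "The production account.", "AWS");
        DeploymentNode appServer = cloud.addDeploymentNode("Application Server", "Runs the API.", "Docker");
        appServer.add(api);
        DeploymentNode dbServer = cloud.addDeploymentNode("Database Server", "Managed database service.", "Amazon RDS");
        dbServer.add(db);

        // A deployment view for this environment; create another for staging, test, and so on.
        ViewSet views = workspace.getViews();
        DeploymentView deploymentView = views.createDeploymentView(store, "ProductionDeployment",
                "Where the online store's containers run in production.");
        deploymentView.addAllDeploymentNodes();
    }
}
```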

Sal Kimmich (11:50):
Yeah. And I just want to really hone in on that point. It's not only a best practice, it is genuinely sometimes a legal concern in corporate spaces. And it still does not cease to surprise me, when people are asked to do their security analyses, how manually they are doing those when there is an opportunity to verify those threats. Because as a manager, there's the legal threat and also the loss of developer time that comes from not having an articulate understanding of what the boundaries of my architecture are, where the high risks are sitting, and exactly how I will remediate those when the time comes. You can only do that if you can articulately observe and communicate your architecture onward. And I think that's also a very good point. Without these diagrams-- and again, making them clear with endpoints. Unless they are to the level of reality where I know the endpoints, it's probably not a real diagram. But if I have that, it also makes it much easier for me to onboard and contextualize any new engineers, which again in itself is reducing both energy and risk within that system.

Sal Kimmich (13:07):
So, I'm really curious to see what we can look at in a couple of years of maybe getting people to accept this as not just a best practice, but the only practice. Because we cannot continue. Our architectures are too large. They're made up of too many inter-operating systems for a human brain to be able to articulately understand whether or not that system is stable. And then add on my final thought here, which is always dependencies and vulnerabilities within that system. It's a constantly decaying, massive architecture that no one can keep in their head. So, putting it down into files that we can go into and make sure we know exactly where we need to tie in makes the biggest, biggest difference. So, I guess a really good question here is-- you provide a best practice to get people on that first step to digital transformation. Anecdotally, do you see a difference between teams that take this on and perform very well, and teams that maybe don't take on this practice? And are there any operational limitations that make it so that, even if you have provided them a C4 model that's validated, sometimes they don't take the next steps? I'm curious if there's a disconnect in some companies or if this developer-first, information-first approach really makes a difference.

Simon Brown (14:28):
It's kind of a hard question for me to answer, to be honest, because most of the stuff I do with organizations is-- I kind of go in there, teach some C4, maybe we do some C4 diagrams for their systems, and then I leave. Occasionally, I do go back to the same companies, but I'm going back because other parts of the organization and other teams want to learn the same stuff that the other people are doing. And often it's a comment of, "Oh, that other team does really nice diagrams. How are they doing it? Can we get that too, please?" So yeah, it's not really a question I can answer, I'm afraid. But the anecdotal evidence I've heard is that it does make some improvements, especially around things like developer onboarding and just having a much clearer view of what's going on.

Sal Kimmich (15:08):
Yeah. One statistic that I'm really fixated on from the recent software supply chain report was that, on average, managers report that they are free of vulnerabilities and good on remediation 3.5 times more often than the developers doing that labor. So, when I think about this kind of problem, I refer to it as "technical empathy." Does everyone who engages with this architecture have enough technical empathy to be able to look at and take care of any bruises that come up in that architecture? And I think that is very hygienic across the board for the end user.

Simon Brown (15:52):
And I guess on a related note, this is where you'll see many organizations focusing on software bills of materials now, because when things like the OpenSSL vulnerability or Log4Shell or the Spring thing pop up, if people don't know they're using those things, there's no action they can possibly perform to remedy them.

Sal Kimmich (16:11):
Exactly. And sometimes I'll get the argument that "automation is improving." And it is, but the reason why we still have about 30-33% of Log4j downloads being vulnerable versions right now is because they're in hidden parts of people's architecture that they literally just don't know are there. It's in a JAR, or in an uber-JAR. And that would not have happened if someone had taken the time to communicate what was built when it was built. And I think that's a real shame.

Simon Brown (16:39):
Agreed. And lots of transitive dependencies. People don't realize they're using it, but it's crept in there somewhere.

Sal Kimmich (16:45):
Yeah. And you can even remove it and it gets pulled back in, because you didn't know that it was still part of your ingestion pathway. And really you cannot get good at doing supply chain security until you get good at genuinely understanding how to do your source code security. And you can't do source code security if you don't know what your source code looks like. This sounds very simple, and I'll say it again, but observability is the foundation of good security, good development, and probably retention at the end of the day. I'm more likely to stick around and develop something if I know what I'm developing. And I can tell you that, because anecdotally, we've all worked in spaces where you jump in and maybe the two senior engineers that were there seven years ago knew how it worked. No one else does.

Simon Brown (17:33):
Yes. Yeah. I've got two really interesting stories on that note. So, story one: sometimes when I run my C4 workshops, instead of using my little case study, which is a financial risk system, I say, "Draw me some pictures of your software." And often, at first, we get these very nice-looking layered component diagrams. And all the arrows go downwards, and it looks perfect and clean. I'm like, hang on a second, this is either the best team on the planet or something's up. And normally something's up. So, what I do is I ask the developers to go get their laptops, and then we dig through the source code, and we'll try and find the things that they've drawn as boxes. Then we'll try and find the arrows, and it turns out you have to add a bunch of arrows that they forgot to draw.

Simon Brown (18:16):
And when they draw all these arrows from looking through their codebase, yeah, sure, there are some arrows going down, but most of the arrows go up, sideways, and they loop around things, and the whole thing is just an utter mess. And it's a journey of discovery for these teams, because often-- and this has actually happened to me-- a bunch of developers have stared at this horrible diagram that they've now created with all these weird dependencies. And someone will say, "Oh, when we changed that thing up there, this thing breaks all the time and we never knew that it was connected." I'm like, yeah, of course it's connected, it says so in your source code, but you have this simplified view of how your system works. So, that's story one. Story two: I got invited to do a workshop for a very well-known company, again in the UK.

Simon Brown (18:55):
And they wanted to send all of their senior architects along because they thought they'd get the most value for money, of course. And just to fill up the rest of the seats, they sent some of their junior developers, because they thought it would be good experience for them. So, we do this workshop, and again, it's the same "draw some pictures of your software." And the architects-- their diagrams were literally like three boxes. And that was all they could do. But the junior developers, although they'd not been involved in the system long, were involved in it on a day-to-day basis, and their diagrams were much, much better. So yeah, those people who had been around for seven years, they did know how it worked seven years ago. And now, because they're out of the technology, they're out of the codebase, all they can say is, "Oh, we have a system and it does stuff. Probably with .NET, but we're not quite sure."

Kadi Grigg (19:39):
That's crazy.

Simon Brown (19:40):
Yeah, it's absolutely crazy.

Sal Kimmich (19:42):
And it's a very normal story. I mean, that's most of my career.

Simon Brown (19:48):
It's terrifying. It's really terrifying.

Sal Kimmich (19:50):
Yeah. And it's some of our fundamental infrastructure that has this problem, because the turnover of the average dev is a little less than two years right now in the U.S. market.

Kadi Grigg (19:59):
Yeah, that's exactly what I was going to say-- about two years.

Sal Kimmich (20:01):
How are you going to maintain an understanding of an always-changing codebase without having that understanding external to the minds that created it? I think culturally developers are beginning to understand-- we're at that stage of maturity-- that centralized understanding, observability, communication, onboarding: we have not done that well as engineers, because it hasn't been prioritized. That has been something that people have considered a form of documentation. And if it's documentation, a developer is allergic to it, and I agree.

Simon Brown (20:39):
"This Is boring."

Sal Kimmich (20:39):
Yeah. "Documentation Is boring, stupid, stale. I'm not going to do any of those things. I am too smart, and I'm too busy, and I'd rather work on trying to get my thing to actually execute than talk about what it does." And I think the really major differences and what's so powerful about this is that it aligns-- it's the only documentation that I have never had a problem introducing to developers culturally, because we understand exactly why we're going to add any annotations onto the system. There's no extra labor if we've accurately engaged with the architecture. There's zero extra labor, zero extra time. It allows you to have your mean time to remediation be immediate, because you're not going to spend two weeks finding the bug. I was watching this hilarious Java developer joke YouTube yesterday, and I just laughed so hard. And he was like, "Oh, you want me to fix this bug for you? Not a problem." Turns around to his computer, he's like, "Give me two weeks to find it." That's still the situation that we're in. And that's such a waste.

Simon Brown (21:45):
I guess the pandemic made this worse, because when everybody's in the office, if you don't know where something is, you can stand up and shout, "Does anybody know where this thing is in the codebase?" If you're stuck at home in different time zones and things, it's much, much harder. So yeah, I think the pandemic has encouraged people to consider this topic once again, which is good.

Sal Kimmich (22:04):
Yeah, I think so. Well, I think we've covered a little bit around threat modeling, a little bit around developer culture. I'm really interested, as you're looking forward in the next three to five years, what is it that you're seeing out there on the ground? Is there anything that you're seeing that surprises you or delights you about the ways that developers are changing their mindsets? And what is it that you hope to see in the next few years?

Simon Brown (22:34):
Well, something I've seen over the past few years is people are backtracking. So, over the last five years, it's all been microservices, microservices, microservices, serverless, serverless, serverless. And I've seen lots of organizations jump on these things and go, "Oh, this must be amazing." And now they're going, "Oh, it's really hard and really complicated, and it's really easy to build horribly fragile systems." So yeah, backtracking is what I'm seeing, which I think is a good thing. I think we're finally starting to realize that perhaps we should take decisions a bit more consciously. It is very easy for software development teams to just jump on the new shiny fashion-based thing. But at the end of the day, we are building software for a specific reason, not because it looks nice on our CVs. And yeah, hopefully we'll swing the pendulum back and reduce some of the fashion-led decisions I see teams making.

Sal Kimmich (23:30):
I like that-- fashion-led development.

Simon Brown (23:33):
That's probably the biggest thing I've seen, apart from AI and ML and all the other stuff. Yeah.

Sal Kimmich (23:37):
Yeah. Well, I hope to see it. I've got my fingers crossed, because everything runs on software, and if we don't have any minds that know what that software is, we're not going to succeed. But I do think we're seeing a transition back to more monoliths, or just simpler systems. I think what's important here is to understand that we're not moving to simplicity just because it's easier, or because it's more functional and more secure; we are making the decision that's going to have the longest lifespan, so that you can get more of your time back, because I think we're tired of patching microservices. And I think that's a good move, and it allows us to have more platform interoperability as well. But I think as we move into this new space, we are going to see over the next few years-- and I'll do my part in this-- we're going to see people move to having these verified architecture diagrams.

Sal Kimmich (24:37):
Really my personal next step is to see what I can do to secure most of Kubernetes and Kubernetes-related groups. And to do that, I'm teaching them how to develop these diagrams to a standard that can be interoperable with other open source packages. And I think that's my final thought on this, is that genuinely the difference between documentation and diagrams is that I can hand my diagram off to a project that I want to be working with or to a developer team that I want to be working with and know before we spend developer time what problems we might run into, or even if it's worth trying to engineer over. And I think if managers really understood that what you were doing is saving yourself months of decision time, this would be adopted already.

Simon Brown (25:28):
Yeah, yeah, certainly. I was just going to say, on that pendulum swing thing, 20 years ago it was all about big design up-front, and teams would spend months and months and months trying to do the stuff you're doing. And then Agile came along, the pendulum swung the other way, and teams forgot all of that important stuff and started rushing into making decisions, in some cases literally just writing code from day one. So yeah, that's the other nice thing I'm seeing: there is a pendulum swing back to teams slowing down a little bit, not going back to what we were doing 20 years ago, but slowing down a little bit to figure out, is this the right direction? Do we have a good starting point? And is it a risky solution from a security or performance or scaling perspective?

Kadi Grigg (26:12):
So, Simon, I've got one last question for you. We've talked about quite a few different topics today, as Sal just quickly recapped. But I think under the lens of what we've discussed, what do you think wicked good development looks like?

Simon Brown (26:26):
What do I think good development looks like?

Kadi Grigg (26:29):
Wicked good development. What does that look like?

Simon Brown (26:31):
Oh, wicked good development?

Kadi Grigg (26:33):
<Laugh>

Simon Brown (26:33):
Wicked good development. For me, it's having a good starting point. So, this is the stuff I teach people all the time. It's about making sure you understand the important things that are driving the decisions you're making. And again, that's something we seem to have forgotten how to do over the past couple of decades. So this is your important requirements, quality attributes, constraints of the environment you're sitting in, and the principles that you want to adopt as a set of developers. So, before you start writing thousands and thousands of lines of code, for example, let's agree on how we're going to do modularization or packaging, how we're going to do structure, and how we're going to do error handling and logging, those sorts of things. Even just some of that basic stuff teams skip over, and then you've got lots of different, inconsistent approaches to solving the same problems.

Simon Brown (27:17):
So, for me, that definitely fits in. It's, "Let's make sure that as a team, we are actually working as a team, and we've agreed upon a set of stuff." And that doesn't mean we can't change that set of stuff, but let's change it in agreement, and let's change it for justified reasons. So yeah, I definitely factor all that stuff in. I want to see teams doing much better communication, hence all the stuff we've been talking about today, of course. And it's all the same sort of stuff we've seen before: teams should be really good at automated infrastructure provisioning, deployment, automated testing, and all of that good stuff. And of course, this doesn't need to be just developers. This is a wider view of a software development team, of course. So yeah, I don't think there's anything too unusual in my definition there. But really I want to pull some of the architecture stuff in and give teams a better starting point than perhaps they might have had over the past decade or so.

Kadi Grigg (28:08):
Beautiful. Well, I couldn't have wrapped that any better myself. Well, Simon, Sal, Dan, thank you so much for taking the time to be here today. Until next time.

Kadi Grigg (28:20):
Thanks for joining us for another episode of Wicked Good Development, brought to you by Sonatype. Our show was produced by me, Kadi Grigg. If you value our open source and cybersecurity content, please share it with your friends and give us a review on Apple Podcasts or Spotify. Check out our transcripts on Sonatype's blog and reach out to us directly with any questions at wickedgooddev@sonatype.com. See you next time.