Architecting the uncertain - Getting started with Agile Software Architecture

The idea of an Architecture Sprint or Iteration Zero is quite common, though somewhat controversial in agile circles. Although valuable, it’s often misused by teams new to agile to keep their waterfall-era decision processes alive. You end up with a process that’s called water-scrum-fall. But there’s much to be saved, and gained, from an agile approach to upfront Software Architecture activities. In this post I want to explore this idea.

If you are anything like me, the start of a new project will leave you with a lot of questions and not a lot of answers. The number of decisions that are yet to be made and the questions you cannot answer yet can be overwhelming at this point. But first let’s see how upfront work can benefit an agile project.

Keeping upfront work agile

I am convinced that at its heart software development is a learning process. If you don’t agree, take some time to think about the beginning of your last project. At that point your knowledge was close to zero. Now consider your current situation (assuming some time has passed). Chances are, you could rebuild the software in a fraction of the time it took you to do it in the first place. And you would probably build a better solution along the way. That’s the effect of the learning you did along the way.

All of the following techniques will help you learn important aspects about your new project faster or give you criteria on what areas are most valuable to explore early.

Please keep in mind that the learning capacity of your team is limited. It’s a good idea to start by focusing on a few areas of high reward and not spread your focus too wide. Too much work in progress will slow progress or incur nontrivial cost when diverging views eventually need to be merged back into one coherent big picture.

I have seen teams decompose the initial workload into presumably independent packages that could be decided upon independently (also known as “Divide and Conquer”). I am not a fan of this approach. Values and approaches held by different sub-teams may differ so much that the architecture will look more like patchwork than a unified big picture. I have seen this approach drive accidental complexity, as adapters and complicated merging or syncing mechanisms were needed to get different, technically driven models to work together. The initial decomposition was not quite right.

I have also seen the divide and conquer approach compartmentalise knowledge, especially if all communication of learning is done in writing. This compartmentalisation may hinder a team from achieving true cross-functionality in the long run, as some team members will invariably become the only people who understand or work on certain parts of the system.

The alternative approach, which I think fits the agile mindset much more neatly, is to start out architecting systems incrementally (focus on a few areas, don’t try to build it all at once) and iteratively (keep things simple, don’t try to get it right from the beginning). This is much more difficult, as it requires more collaboration and communication, but it reaps high rewards as the whole team participates in learning and understanding.

Illustration of iterative+incremental by Henrik Kniberg

The following is a collection of techniques and activities that I found useful in this phase of a project.

What to do in Iteration Zero?

Understand your context

What’s your situation? New projects are always born within a context. You may be part of a start-up company, where your runway (the time before you run out of cash) and the uncertainty of product-market fit will drive your project constraints. Or, on the other end of the spectrum, you may be a hired gun working for a big enterprisey company, where company politics, regulation and company policies or guidelines apply to you. You might also be in a re-write project on a sharp deadline to deliver an improved version of an existing software solution, where the minimum viable product is everything the old system that took five years to build did, and then some. These forces will drive your architectural decisions.

One way to identify and map these forces early is to get everyone involved to draw a diagram of the system context. Focus on where your system will touch other systems and people. Do you need to integrate with a legacy system? Not even for data migration? Who are your stakeholders? How are operations and security departments involved?

I like to use a set of lightweight boxes-and-arrows diagrams made popular by Simon Brown to capture these interactions. He calls his model the C4 Model. Here’s an example of a system context diagram that I took from Simon:

A C4 Model Context Diagram

C4 Context Diagram Key

I try to draw at least the context diagram, or even a container diagram, brainstorming it together with all the team members early in a project. This makes sure we’re on the same page and also helps with the next bit: identifying risks.

Identify risks

Once you have gained an initial understanding of the context in which your application will be built and run, you can start to identify and rate project risks. A format I like a lot, also recommended by Simon, is called Risk Storming. In Risk Storming, you place post-it notes on the objects from your diagram(s) that pose a risk to your project.

You could place a post-it note on the customer with “expects working software in 3 months” written on it, to indicate the hard deadline as a risk (don’t forget about this one!). Or you could place a post-it on the old legacy system you need to integrate with that reads “this thing is a pain to integrate with”. The example from Simon’s blog post looks like this:

Risk Storming results illustrated

You then rate each risk by its estimated likelihood and magnitude of impact to get a grasp on its relative importance. That’s a good starting point for shared understanding, but once you have identified and rated the risks, don’t just write them down. These risks should drive your initial decisions. For example, if you know a legacy system is a pain to integrate with, but the system integration is critically important to your project success, you might want to start early with a feature that needs data from the legacy system. By integrating this system early, you get more time to get this right and minimise the associated risk by resolving the associated uncertainties.
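To make the rating step concrete, here is a minimal sketch of how the post-its could be ranked once the workshop is over. It assumes a simple 1 to 3 scale for both likelihood and impact and multiplies the two to get a score; the scale, the `Risk` class and the third risk in the list are my own illustrative assumptions, not part of the Risk Storming format itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One post-it from the workshop (hypothetical structure)."""
    description: str
    likelihood: int  # assumed scale: 1 = low, 2 = medium, 3 = high
    impact: int      # assumed scale: 1 = low, 2 = medium, 3 = high

    @property
    def score(self) -> int:
        # Relative importance as likelihood times impact.
        return self.likelihood * self.impact


def rank(risks: list[Risk]) -> list[Risk]:
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


risks = [
    Risk("Team is new to the build tooling", likelihood=2, impact=1),  # made-up example
    Risk("Legacy system is a pain to integrate with", likelihood=2, impact=3),
    Risk("Customer expects working software in 3 months", likelihood=3, impact=3),
]

for risk in rank(risks):
    print(f"{risk.score:>2}  {risk.description}")
```

The risks at the top of the ranked list are the ones that should shape which features you start with.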

Mike Cohn also recommended an approach to track and address these risks continuously by building a Risk Burndown Chart. It sounds interesting, but I have yet to try it in practice.
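A rough sketch of the bookkeeping behind such a chart, assuming each risk's exposure is its probability multiplied by the size of the loss in days, and that the chart plots the sum of all exposures per sprint. The function names and all numbers below are made up for illustration.

```python
def exposure(probability: float, loss_days: float) -> float:
    """Exposure of a single risk: probability times size of loss (in days)."""
    return probability * loss_days


def total_exposure(risks: list[tuple[float, float]]) -> float:
    """Sum the exposure of all (probability, loss_days) pairs."""
    return sum(exposure(p, days) for p, days in risks)


# Hypothetical risk census, re-assessed at the start of every sprint.
census = {
    "Sprint 1": [(0.5, 20), (0.25, 8)],
    "Sprint 2": [(0.25, 20), (0.25, 8)],  # first risk became less likely
    "Sprint 3": [(0.25, 20)],             # second risk was resolved
}

burndown = {sprint: total_exposure(risks) for sprint, risks in census.items()}

for sprint, days in burndown.items():
    print(f"{sprint}: {days} days of exposure")
```

If the totals do not go down from sprint to sprint, that is a hint that the risks are being written down but not addressed.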

There’s also another format called the Pre-Mortem, which is basically a thought experiment of total project failure. Starting from the assumption that the project has failed utterly, a group is asked to come up with reasons for this failure. This format may yield interesting and different insights if you can include less technical, business-oriented people.

Understand the business process

Picture from an event storming workshop

If you are writing software for a living, chances are the application will support some sort of business process. You are also probably new to this business domain and not a domain expert. This will be a challenge, as all the knowledge of how the business operates will be in the heads of some domain expert(s), who might be pretty busy and hard to access. Chances are you and your developers are separated from these people by an army of business analysts, and the experts won’t talk to you directly. That’s bad, because you cannot outsource understanding to someone else, and paper won’t answer all your questions.

Alberto Brandolini came up with a genius way to get the people with questions and the people with answers to collaborate in a fun and engaging way and come up with a collective understanding of how a business process works. It’s also very well suited for even the most compartmentalised of companies.

He calls this method Event Storming, and I’ve seen it work really well if you get the right people to work together and maybe repeat this workshop format several times to incrementally and iteratively refine the collective understanding. The discussions this sort of collective modelling approach will spark in your group will show you areas of high complexity, things you can simplify, or areas where domain experts disagree and need to clarify things. If this happens, you just saved yourself a lot of trouble debating over specs!

Get testing infrastructure

Angry Tester with T-Shirt: It doesn’t work on my machine

One of the classic IT risks, so common that it became a meme, is the old excuse of “it works on my machine”. As developers we develop and test software in an environment that’s pretty different from production. Creating and using a testing environment that matches production early on is therefore an important means to mitigate the risk that your initial deployment will fail miserably and publicly the day your software is released. In big companies environment creation usually takes a long time, which is why I try to order these environments as soon as I can responsibly do so.

You should also set up version control, build servers, etc. from the beginning, but keep things simple at this point. Don’t try to find the perfect project structure or delivery pipeline yet. You will not get it right from the beginning anyway. In my experience incremental and iterative investment trumps the alternatives.

Understand quality attributes

Another driver of your application architecture will be quality attributes (formerly known as non-functional requirements). Is it crucial for your application to be very efficient, usable, flexible, reliable or secure? Is it a good idea to invest in these attributes?

A way to identify and communicate crucial qualities of a system, creating a strong vision of what makes this product unique, is to create a Product Box for it. If this sounds like something you would like to try out, check out this Product Box Workshop Blog Post.

Another great way to find out is to invite your stakeholders to an ATAM Workshop. Once you have identified and prioritised the quality attributes, you make trade-offs between them. Does the need for security outweigh the need for usability? In this meeting you also get a chance to make sure your stakeholders understand they can’t just ask for world-class software on a shoestring budget and with a strict timeline. Give them a feeling for how their wishes will affect overall cost and timeline risk, and remind them that in an agile project these things aren’t set in stone.

I like to work with quality attributes in the form of quality scenarios. Scenarios will force you to express the requirement in a quantifiable, and thus verifiable, way. What does “the application should be as user friendly as possible” mean? A scenario, on the other hand, reads: “The average user that visits our website will grasp the way it works in under 10 minutes without reading a manual.” That you can test!

Make sure your quality scenarios don’t rot in some document where no one reads them. You can add these scenarios to your definition of done (if they apply to all stories) or to the acceptance criteria of some stories (if they apply to a limited number of stories) or even create new user stories from them if this fits best.
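As a small illustration of what “quantifiable and thus verifiable” can look like, here is a sketch that turns the usability scenario from above into a check against observed data. The `QualityScenario` class and the observation values are hypothetical; in practice the numbers would come from usability sessions or monitoring.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class QualityScenario:
    """A quality scenario with a measurable limit (hypothetical structure)."""
    name: str
    limit_minutes: float

    def is_met(self, observed_minutes: list[float]) -> bool:
        # The scenario talks about the *average* user, so compare the mean.
        return mean(observed_minutes) < self.limit_minutes


onboarding = QualityScenario(
    name="Average user grasps the website without reading a manual",
    limit_minutes=10,
)

# Made-up times (in minutes) measured in four usability sessions.
observations = [6.5, 8.0, 12.0, 7.5]

print(onboarding.name, "->", "met" if onboarding.is_met(observations) else "not met")
```

A check like this can sit next to the acceptance criteria of a story, so the scenario gets re-evaluated whenever new measurements come in.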

Get to know the people

It’s likely you will be working with people you have never worked with before. As an architect I like to do a series of one-on-one interviews with my future collaborators to get to know these people better. Here are some questions I like to ask:

How do you feel about this new project? Are you optimistic, pessimistic? Do you look forward to it?
What did your old project/team do, that we should definitely do as well?
What did your old project/team do, that you wouldn’t want us to repeat?
How do you learn? Do you read books, blog articles, watch conference talks or do you learn by doing?
Would you mind if we talk like this on a regular basis? Every two weeks?
What’s your favourite sweet/savoury treat? (This is important, as you might need to bribe or apologise to this person later!)

Make up your own questions, but make sure it’s just a friendly conversation, not an interview! I try not to force myself into the personal life of my co-workers, but if the conversation gets you there organically, great! Don’t try to force the conversation back to work too quickly. The point of these talks is to build rapport and not to gather all the information you might possibly need.

For more information on the value and purpose of one-on-one meetings (although through the eyes of a more traditional “manager” role) check out this blog post by Benjamin Reitzammer. And make sure you also read the follow-up post, for even more resources.

Prepare an initial product backlog

Your product owner will usually build and prioritise the initial product backlog. But you and your team can help. With the strong insight you gained into project risks and also the way the business process works, you can contribute a lot of value to the backlog creation.

The format I like to use a lot is to run a User Story Mapping Workshop as introduced by Steve Rogalsky. You will collaboratively create a big picture of Stories that will end up in valuable product increments.

But even if you don’t collaboratively come up with a product backlog, as an Architect you would be well advised to consult the PO. You can ask for the priority of certain high-risk features to be increased, even if that means building fakes for certain parts of the system to do so. It’s worth the effort, especially when you need to rule out technical uncertainties as project risks (e.g. we’re not sure we can run this piece of software on AWS, so we’ll need to try it at some point).

Build a walking skeleton/spike

When you have identified a few good user story candidates to start with, you might want to start development by implementing one (very small) user story from end to end. This way of starting out is called a Walking Skeleton and was popularised by the book “Growing Object-Oriented Software Guided by Tests”, also known as the “GOOS Book”.

Alistair Cockburn independently described it like this: “a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel”.

The Walking Skeleton will give you most of the (technical) necessities to start with the delivery of features, without spending too much time on the “plumbing” of the application. In other words, you get to prepare some of the technical skeleton that will be necessary to the features that are at the top of your new product backlog.

But mind that if a technical decision (e.g. should we use NoSQL databases or Relational?) is not urgent, you can increase your room for learning by faking parts of the system until you have gained enough insight to make a well-founded decision. If it is crucial you get these decisions right, but they are not urgent, you might want to look into the idea of Set Based Design, where you build all the valid alternatives to reduce the overall risk of getting things wrong.
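Here is a minimal sketch of what faking a part of the system can look like in a walking skeleton. It assumes the open decision is the database: the end-to-end story talks to a small interface, and an in-memory fake stands in until the NoSQL-or-relational question is settled. All names (`OrderStore`, `InMemoryOrderStore`, `place_order`) are hypothetical and only serve the illustration.

```python
from typing import Optional, Protocol


class OrderStore(Protocol):
    """The only thing the rest of the skeleton knows about persistence."""

    def save(self, order_id: str, item: str) -> None: ...

    def load(self, order_id: str) -> Optional[str]: ...


class InMemoryOrderStore:
    """Fake store: keeps orders in a dict until a real database is chosen."""

    def __init__(self) -> None:
        self._orders: dict[str, str] = {}

    def save(self, order_id: str, item: str) -> None:
        self._orders[order_id] = item

    def load(self, order_id: str) -> Optional[str]:
        return self._orders.get(order_id)


def place_order(store: OrderStore, order_id: str, item: str) -> str:
    """One tiny end-to-end story: store the order and confirm it."""
    store.save(order_id, item)
    saved = store.load(order_id)
    return f"Order {order_id} confirmed: {saved}"


store = InMemoryOrderStore()
print(place_order(store, "42", "book"))
```

Once the database decision is made, a real implementation of the same interface can replace the fake without touching `place_order`.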

Build a learning backlog

This backlog is just for your team. Whether you are entering a new domain or rebuilding an existing application with new technologies, there will be lots of things your team might need to learn about:

Agile methods (Scrum, Kanban…)
Development practices (TDD, CI, Pair Programming …)
Technologies (Databases, Frontend Frameworks, Libraries …)
Infrastructure (Docker, Kubernetes, Vagrant …)
Tools (IDE, Version Control, Build Server …)

This backlog of things to learn will probably be quite long, but it will also make it clear that you need to reserve some project time for collective learning and training on the job.

Again, as the learning capacity of your team is limited, I would recommend putting a limit on the number of new things to learn at the same time. Start with the most pressing things. I once spent a few weeks learning OSGi for a new project, only for us to decide not to use OSGi at all.

As an Architect this also gives you a chance to lead by example and expose your ignorance. Hopefully this will make it a bit more acceptable to say “I don’t know” or ask for clarification and explanation in your team.

It’s not big design up front. It’s structured learning.

I hope I have convinced you that the best way to use this project phase of biggest uncertainty is to learn about your domain, risks, new technologies and team members, and that I have given you a few methods to try and things to think about.

Hopefully you don’t use this phase to make decisions on things you don’t need to decide immediately, needlessly painting yourself into a corner. Agility means embracing change, and you cannot embrace change when everything is pre-determined from the start. Resist the urge to decide on everything so soon. Closing with the words of the great @GeePawHill:

i genuinely believe that nearly all the woe in software development, the whole socialtechnical enterprise, derives from the belief that we can sidestep increment & iteration, "a little better now" and "we'll change it again later".
— Michael D. Hill (@GeePawHill) May 7, 2018