Sunday 19 March 2017

On the other side of Certainty

"We have to create the preconditions under which
Exaptation can happen naturally...
which actually means introducing inefficiency into systems"

"The question is no longer how systems behave ...
but how to ask for the probability distribution
of the properties that change the system"


Moving from project- into product world last year, I wanted to experience the real long-term view, because only strategy can tackle complexity. And our software grows more complex, though arguably less complicated, every year. But I also wanted to understand uncertainty, the dark matter of complexity, better. After some time in, I understand it's probably more important for an architect* to have strong operations knowledge than very strong algorithmic skills.

Resilience is sometimes mistaken as being adaptive, or agile. It just means expecting uncertainty and disruption, but also working properly under normal conditions. Just calling everything disruptive, and reacting in an agile way to every random demand, is not resilient. For instance, "The datacenter as a computer"** came to us, rather unexpected, from a relatively "boring" infrastructure level, rather than from new frameworks and startups, but also not from consortium standards and language ecosystems. Similarly, in true grassroots manner, polyglot programming and JavaScript on top of Unix principles catapulted us into the 21st century and will eventually enable domain logic to exist, as Adrian Cockcroft calls it, in functions, unaware of most technological constraints. In order to understand resilience, you need to care about your product, and want to improve it, have a real goal, a story you want to tell. Ironically, you have to be a little bit un-adaptive, inefficient and un-agile in order not to overfit, but to really improve.

We are still waiting for the 4th paradigm of programming. My guess is it's going to be more than just goal-oriented, it will be probabilistic, in the sense of an abstract goal corridor. While engineers will live inside the goal corridor, making sure its workings are predictable, specializing in certainty, architects live on the outside, in the long tail of the probability distribution, the Multiverses where the Dragon Kings live, outside predictive models, specializing in uncertainty. That’s what makes a system resilient. It implies engineers have a fairly static/discrete/fitted view of a system, the perfect snapshot, the position, whereas architects have a time-smeared continuum perspective, the story, the momentum. It's the whole story, not only the goal, that differentiates between emergence and evolution. But it's also important to understand that both of those roles are equally creative and forward-looking, none is a "higher level", they are two sides of the same coin.

Helga Nowotny's distinction between risk and uncertainty fits nicely here - risk can be computed, uncertainty not. Risk a relation between snapshots, uncertainty is the continuum in between. A weather forecast has risk, the climate is uncertainty. In that sense, a risk-driven architecture involves everyone, but uncertainty needs a different perspective. The future leaves to the architect (the systems-of-systems-carer, archeological gardener, forensic librarian, ontological cartographer or whatever you like to call her) the role of the curator, and maybe narrator, of the uncomputable. Paraphrasing what Akka Architects say: Architects don't design system interactions, they curate the context for a discourse about system interactions.

"It is not a question of establishing limits with walls,
but by other means"

Why is all of this important, why should we get used to speak in terms of probabilities, but, most importantly, tackle certainty and uncertainty differently, but with the same importance?

In my last job I learned the importance of maintainability and traceability. In every single one of my projects the first thing I introduced was proper monitoring, alerting and analytics - Robustness was core, as was accountability, to be able to become lean and agile. With new infrastructure, whether in the cloud or not, this has become the default. The battle for certainty, traceability, and robustness is won.

While we were busy fighting this battle, uncertainty has come back, as Ms. Nowotny would put it, "systemic risk", as risk deeply embedded in the complex relationships of our services: "Uncertainty switches gestalt". A cloud service going down is a risk that can be predicted - the dashboard not showing it, because it is hosted on the same cloud service is systemic risk, the type of Dragon King uncertainty which we don't expect. Soon it will be mainstream that coders will pair with AI, and operations and product teams will regularly train models rather than manually define metrics. The relationships between components will become so complex that, more often than not, it will take a long time to even recognize errors, or feature usage, let alone find the root cause or customer need.

Despite all the data we accumulate, Observability does still not mean Explainability (sometimes as beautifully visualized as our old architecture diagrams) or Introspection, the ability to rationalize the inscrutability of AI.

Most of our architecture diagrams are nothing more than Thomassons, useless depictions of a fictional state which we only keep because it took us so long to create them. But similar to code, it's actually more productive to delete them. What we need is a visualization of the statistical complexity of actual state. In my book, a good friend of mine wrote a story how we made network traffic audible in order to hear inconsistencies - that's what we need to understand the architecture of a system. A real-time explanation of what's going on, mapped to the architecturally relevant components, such as interfaces, deployment units (like functions) and, last but not least, rules inside machine learning systems.

I am looking forward to new, real-time programming and data toolkits, think Eve, Jupyter and Glitch, especially because they enable a different kind of coder to build software. And with serverless and deep learning we will be able to scale those apps and the data required quicker than ever before. But it will require architects to understand them, if something is wrong, if a use case needs to be developed or a feature is behaving unexpectedly - i.e. in operations and product development. These architects won't be able to look at diagrams anymore, they won't be certain about documentation (not that they ever were), and they won't even be able to observe the system in its entirety. Architects will indeed become archeological gardeners or forensic librarians. Most importantly, they will become like anthropologists or biologists in the early days, trying to understand evolution, with the help of experimentation and collection. We'll finally see architects developing models instead of diagrams. And with that, we will see very different people in this role too. Which is very exciting.

*) separating this from a Tech Lead here, sometimes it can make sense to combine the two roles, though
**) e.g. serverless computing platforms such as Lambda and globally distributed databases such as Spanner, with them a different approach to time, where evergreen becomes an axiom, but also a comeback of Spreadsheets through functional programming and k/v stores

No comments: