The 4 Evolutions of Your Observability Journey
https://lnkd.in/gfTxqiYr
On an observability journey, there tend to be a few concrete phases that every company goes through. Understanding how those phases unfold as you mature your observability practices can help you identify when you’ll run into certain types of challenges, and when you’ll start really wanting certain tools and practices to address them.
That said, when you’re communicating about this to others, you may find it difficult to explain how you know where you are in the journey, or to articulate the issues you’re running into. People often struggle to reach a shared understanding here, which is where mnemonics and mental models can come in handy.
The known/unknown matrix in particular can be incredibly helpful in understanding where you are on your observability journey.
In the known/unknown matrix, we have four stages: known knowns, known unknowns, unknown knowns and unknown unknowns. Each one corresponds to a different way of approaching three very crucial tasks in operating a system: asking questions, learning about the system and explaining what you learned.
Those three tasks, in a nutshell, are what almost everything we do in platform engineering, observability, site reliability engineering (SRE) work, DevOps and more can be boiled down to. So let’s go over them and see how we can use the matrix to help understand where you’re at and share that understanding with others.
Known Knowns
Known knowns, when used to describe where you are on your observability journey, are the first stage. You know what question you’re asking and what you’re looking at. Here, you want to be able to look into the past and ask, “What happened?”
Some examples:
The website had a spike in errors. Where, how, why?
The mobile app is experiencing more crashes in the latest version. Is it only buggy in the latest version?
Our auth service is completely down, but only in one geographical area. What does that mean?
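As a rough illustration of what asking "where?" about a past error spike can look like, here is a minimal Python sketch. It assumes you already have structured events to query; the event fields (`service`, `region`, `version`, `status`), the sample data and the `error_breakdown` helper are all hypothetical and not taken from any particular tool.

```python
from collections import Counter

# Hypothetical, hand-written events standing in for real telemetry.
EVENTS = [
    {"service": "auth", "region": "eu-west", "version": "2.4.0", "status": 500},
    {"service": "auth", "region": "eu-west", "version": "2.4.0", "status": 503},
    {"service": "auth", "region": "us-east", "version": "2.4.0", "status": 200},
    {"service": "web", "region": "eu-west", "version": "2.3.1", "status": 200},
    {"service": "auth", "region": "eu-west", "version": "2.3.1", "status": 500},
    {"service": "web", "region": "us-east", "version": "2.4.0", "status": 200},
]


def error_breakdown(events, dimension):
    """Count server errors (status >= 500) per value of one dimension."""
    return Counter(e[dimension] for e in events if e["status"] >= 500)


if __name__ == "__main__":
    # Ask the same question of the past along several dimensions.
    for dimension in ("service", "region", "version"):
        print(dimension, dict(error_breakdown(EVENTS, dimension)))
```

Slicing the same errors by service, region and version is one simple way to turn "the website had a spike in errors" into the narrower questions of where and how.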
Even though this is the first stage, this is actually one of the hardest ones. Most companies and most engineers never progress past this stage, and almost every tool and vendor you’ll encounter is primarily interested in this stage. That’s because this stage is all about the ability to ask meaningful questions about the past, and it turns out that “meaningful” is a tricky concept to nail down.
If that weren’t enough, figuring out how to get your tooling and systems to let you ask those questions is even harder. It’s no wonder most people get stuck here and have a difficult time explaining why this is just the beginning of the journey.
After all, if so many companies can be wildly successful without achieving this, you might wonder whether it’s even necessary to start this observability journey in the first place.
Known knowns are about investigation.
Truth be told, if you can’t sufficiently...