When we talk about metrics, there is often an assumption that everyone in the company needs the same data to make decisions, and this is dangerously incorrect. Different levels of the organization need different kinds of data to make effective decisions, yet all too often we use the wrong data at the wrong point.

Let’s look at the organization through the lens of the flight levels model from the Flight Levels Academy.

Flight Levels

At Flight Level 1, we’re purely operational. This is where we find the teams that are actually doing the work. The kinds of metrics we look at here are day-to-day operational ones. How many tickets have been opened and/or closed? How much work is in progress? How long is it taking to get work done? It’s very much focused on the flow of work through the system, regardless of the value of that work or its relevance to the business.
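To make this concrete, here’s a minimal sketch of how two of these flow metrics might be computed. The Ticket structure and the dates are hypothetical, not tied to any particular tool:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Ticket:
    opened: date
    closed: Optional[date] = None  # None means the work is still in progress

# Hypothetical tickets pulled from a team's board (made-up dates)
tickets = [
    Ticket(date(2024, 3, 1), date(2024, 3, 5)),
    Ticket(date(2024, 3, 2), date(2024, 3, 10)),
    Ticket(date(2024, 3, 8)),  # still open
]

# Work in progress: tickets opened but not yet closed
wip = sum(1 for t in tickets if t.closed is None)

# Cycle time: how long each completed item took, in days
cycle_times = [(t.closed - t.opened).days for t in tickets if t.closed]
avg_cycle_time = sum(cycle_times) / len(cycle_times)

print(f"WIP: {wip}, average cycle time: {avg_cycle_time:.1f} days")
```

Nothing in those numbers says anything about whether the work was worth doing; they only describe how work flows through the team.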

We may be looking at some business metrics like sales or signups, but largely we’re focused on operational metrics.

At Flight Level 2, we’re coordinating across all of those teams. The metrics we might look at here are aggregated data points. We might still be looking at operational metrics but we’re typically looking at a higher level. We might be tracking operational level metrics for features and epics but not for the individual work items that are passing through the team.

We’re starting to be more focused on business level metrics here. Are we spending our money effectively? Is the work that’s happening aligned with our strategic goals?

At Flight Level 3, we’re purely strategic. It should be extremely rare that we look at any operational metrics at all. Here, we’re entirely focused on business level metrics. This is the realm of OKRs, KPIs, and strategic objectives.

The higher you are in the organization, the more you should be looking at business level metrics and the less you should be looking at operational metrics.

In order to make better decisions, leadership will ask for data, and this is very reasonable. The complication is that we rarely measure the things they should be looking at, so they base decisions on the wrong data.

Example: it’s common for leadership to ask for data from a ticketing system (e.g. Jira) to be “rolled up” to their level. This sounds reasonable on the surface, and yet there is no data in there that is relevant to the decisions they need to make. It’s all operational level data and tells us nothing about business value or the concerns that customers might have. That operational level data is extremely relevant for people working at Flight Level 1 (Operational) but irrelevant, and often misleading, for people operating at Level 3 (Strategy).

Example: DORA metrics are a hot topic today, for good reason. They provide valuable information about the dev/ops pipeline and how good we are at getting new features deployed. Because they’re currently in the spotlight, leadership often asks to see these as well, missing the point that this is operational data and irrelevant to them.
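As a rough illustration of why this is operational data, two of the DORA metrics (deployment frequency and lead time for changes) reduce to simple arithmetic over deployment records. The records below are made up for the sketch:

```python
from datetime import datetime

# Hypothetical deployment records: (commit time, deploy time)
deploys = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 15, 0)),
    (datetime(2024, 3, 3, 10, 0), datetime(2024, 3, 4, 11, 0)),
    (datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 5, 9, 30)),
]

days_observed = 7  # length of the observation window

# Deployment frequency: deploys per day over the window
deploy_frequency = len(deploys) / days_observed

# Lead time for changes: average hours from commit to deploy
lead_times = [(deployed - committed).total_seconds() / 3600
              for committed, deployed in deploys]
avg_lead_time = sum(lead_times) / len(lead_times)

print(f"Deploys/day: {deploy_frequency:.2f}, avg lead time: {avg_lead_time:.1f} h")
```

These numbers tell a delivery team how healthy its pipeline is; they say nothing about whether the features being deployed are moving any strategic objective.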

Why do we so often fall back on operational data? Quite simply, it’s the easiest data to collect, and so we use that instead of taking the time to collect anything actually relevant. Getting good business metrics is hard and we are often unable or unwilling to make that kind of investment.

This is a flavour of the psychological phenomenon of attribute substitution, where we unconsciously replace a complex problem with a simpler one and then solve the simpler one instead.

We tend to use the metrics that are easy to measure, not the ones that give us good value. Then our decisions are flawed because we base them on misleading or irrelevant data.

We’ve talked about how operational level data is irrelevant and/or misleading for senior leaders. Are there reasons why we actively don’t want them looking at that? Yes, there are.

The moment leadership starts looking at operational data, that data ceases to be a measurement and becomes a target. This is known as Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

Now everyone will start to optimize for that data to make themselves look better. They may not even be aware they’re doing it, as it often happens at an unconscious level. The result is that the data very quickly becomes even less valuable and any decisions made on it become even worse.

Measures that have become a target will quickly be gamed and will become meaningless.

Example: Code coverage metrics are extremely useful for giving developers feedback on where improvement is needed. The moment code coverage becomes a target, people will write tests that provide no value but that satisfy the coverage numbers. We see this repeatedly.
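A minimal, hypothetical illustration of how a coverage target gets gamed: both tests below give the function 100% line coverage, but only the second one tells us anything about correctness.

```python
# A hypothetical function under test
def apply_discount(price: float, percent: float) -> float:
    return price * (1 - percent / 100)

# Written to satisfy a coverage target: it executes every line,
# so coverage reports 100%, but it asserts nothing meaningful.
def test_apply_discount_runs():
    apply_discount(100.0, 10.0)
    assert True

# Written to provide value: it actually checks the behaviour.
def test_apply_discount_reduces_price():
    assert apply_discount(100.0, 10.0) == 90.0
```

To a coverage report, the two tests look identical.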

Example: When management looks at throughput or velocity metrics and compares those across teams, those numbers will quickly be gamed. It’s not uncommon to close one ticket at the end of a sprint and open a new ticket for the same piece of work in the next. The metrics look great but they’re now meaningless.

Conversely, when people at Flight Level 1 (Operational) look towards high level business objectives, they often don’t see the connection between what they’re doing and how those metrics are looking. The data is largely irrelevant for them.

The metrics you use will depend on where you are in the Flight Levels model. Data that is appropriate for one level is likely unhelpful, and often actively detrimental, at a different level. Context, and the relevance of the data to it, is critical.