Metrics Aren’t Permanent

The past couple of posts highlighted two common but nuanced issues that cause metrics to be misjudged or misinterpreted: (1) a representational gap between what is measured and the “ground truth” reality one actually wishes to measure; and (2) Goodhart’s Law: the incentives that, intentionally or unintentionally, cause metrics to be “gamed” and to become less useful over time.

This post talks about another issue that, while more obvious, frequently gets overlooked in the daily hustle: (3) metrics must follow, and adapt to, strategic and environmental shifts over time. As a product’s strategy changes, or as the users, ecosystem, or environment of the product changes, the metrics that guide the product must also change.

There are a lot of benefits to having a stable, immutable metric definition. Time series and historical analyses are much easier if metrics don’t frequently churn. Perhaps more importantly, institutional knowledge accumulates and becomes better encapsulated in a shared understanding of the metric and the product. This generally gives organizations more confidence that they have the correct strategic initiatives and are doing the right things for the product and its users.

However, too rigid an emphasis on metric stability can cause major problems, especially if the product or environment is rapidly evolving — which is almost always true in tech. One area in which tech generally recognizes this issue quite well is Trust and Safety. As spam and scam techniques evolve, teams are quick to improve and adapt the classification and identification of attack vectors. My experience is that metrics tend to pivot quickly in this domain. (The primary metric issues tended to be statistical complexities around proper measurement on low-prevalence problems.)
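To see why low prevalence makes measurement tricky, here is a minimal sketch (invented numbers, not any team’s actual figures) of the base-rate effect: even a classifier that looks accurate in isolation flags mostly non-spam when spam itself is rare.

```python
# Hypothetical illustration of the base-rate problem in low-prevalence
# measurement. All rates below are invented for the example.

def precision_at_prevalence(prevalence, tpr, fpr):
    """Fraction of flagged items that are truly positive, via Bayes' rule."""
    true_positives = prevalence * tpr          # P(spam) * P(flag | spam)
    false_positives = (1 - prevalence) * fpr   # P(not spam) * P(flag | not spam)
    return true_positives / (true_positives + false_positives)

# A spam classifier with a 99% true-positive rate and only a 1%
# false-positive rate, applied to traffic where 0.1% of items are spam:
p = precision_at_prevalence(prevalence=0.001, tpr=0.99, fpr=0.01)
print(f"{p:.1%}")  # prints "9.0%": most flagged items are not spam
```

With those numbers, a naive "items flagged as spam" metric would overstate spam prevalence by an order of magnitude, which is why measurement in this regime leans on careful sampling and human rating rather than raw classifier counts.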

I think it might surprise outsiders that the organic Search product’s north star changes fairly frequently, to align with major yearslong strategic pivots. This is a slower cycle than for Trust and Safety, but equally vital to long-term product success. When I joined Google, the primary metric was search queries, roughly defined as requests to a search backend system. This worked quite well in the “ten blue links” desktop computer days. But then a bunch of features were launched that were unambiguously good yet could reduce search queries, for example, autocomplete suggestions in the search box dropdown. More recent advances such as multimodal search or AI mode cause further divergence between a search query and an atomic unit of user value. The north star metric naturally gets updated to incorporate such changes.
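The divergence described above can be sketched in a toy form. The event names and counting rules here are entirely invented for illustration — this is not Google’s actual metric definition — but they show how the same user session scores differently as the north-star definition migrates from backend requests to units of user value:

```python
# Hypothetical sketch of an evolving north-star definition.
# Event names and counting rules are invented for illustration only.

# Original definition: count requests to the search backend.
QUERY_EVENTS = {"search_query"}

# Revised definition: count atomic units of user value, including value
# delivered without a backend query (e.g., an accepted autocomplete).
VALUE_EVENTS = {"search_query", "autocomplete_accepted", "ai_answer_viewed"}

def north_star(session_events, counted_events):
    """Count the events the current metric definition treats as value."""
    return sum(1 for event in session_events if event in counted_events)

session = ["autocomplete_accepted", "search_query", "ai_answer_viewed"]
print(north_star(session, QUERY_EVENTS))  # 1: only the backend query counts
print(north_star(session, VALUE_EVENTS)) # 3: all three deliver user value
```

Under the old definition, the autocomplete launch would look like a regression; under the revised one, the same session registers the value it actually delivered.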

We don’t necessarily realize it, but we tend to think of metric definitions as Platonic ideals, immutable and eternal. I think this has something to do with the fact that, when we define metrics, it’s typically in the language of mathematics. The chaos of reality — of the data and the measurement itself — is separated out. We trust that the mathematical formulas and the properties of statistical convergence will always hold; any errors are due to the messiness of measurement in the real world. But at least in the business world, the real world is what’s “really real.”
