The Metric is Not the Product

The bulk of my time as a data scientist was spent on what we called Evaluation, which entailed designing experimentation platforms, helping teams run and analyze experiments, and defining success metrics, all in service of answering the question: “how do we know that what we’re launching is actually good (for users)?” This is a seemingly simple question, but for a product as complex as Google Search, I assure you it’s nigh intractable.

As a data science tech lead and manager, the task I spent literally tens of thousands of hours on essentially boiled down to navigating a duality:

  1. Convincing stakeholders that no metric is perfect.
  2. Convincing stakeholders that some metrics are useful, even if not perfect.

Or as the famous statistician George Box put it, “All models are wrong, but some are useful.”

The rest of this post offers some high-level generalizations on the idea that “the metric is not the product.” In subsequent posts, I’ll share my observations from times when teams over-relied on metrics, or when executives demanded too much from them.

One fairly concise explanation of this topic is in Shane Parrish’s The Great Mental Models, under the chapter “The Map is Not the Territory.” Excerpt: https://fs.blog/map-and-territory/

“Reality is messy and complicated, so our tendency to simplify it is understandable. However, if the aim becomes simplification rather than understanding, we start to make bad decisions. When we mistake the map for the territory, we start to think we have all the answers. We create static rules or policies that deal with the map but forget that we exist in a constantly changing world. When we close off or ignore feedback loops, we don’t see that the terrain has changed and we dramatically reduce our ability to adapt to a changing environment.”

As a data scientist working in a very data-driven organization, it’s easy to forget that data (e.g., logged interactions, user survey responses, etc.) and metrics are merely abstractions of ground truth reality. My colleagues called mismatches in this abstraction mapping “representational uncertainty: the gap between the desired meaning of some measure and its actual meaning” in the excellent blog post “Uncertainties: Statistical, Representational, Interventional”: https://www.unofficialgoogledatascience.com/2021/12/uncertainties-statistical.html

The rather circular definition of “statistic” from Wikipedia is: “A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose.” With apologies for abusing these terms in a non-technical manner, a statistic (or metric) is a calculation on data that yields a reduced representation of it, one that captures some relevant information and throws away the rest.
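To make that reduction concrete, here is a minimal sketch (with made-up numbers, purely for illustration): two very different samples can produce the identical statistic, because the calculation discards everything except the one quantity it computes.

```python
from statistics import mean, stdev

# Two hypothetical samples of, say, per-session interaction counts.
steady = [5, 5, 5, 5, 5]
volatile = [1, 9, 0, 10, 5]

# The mean is a statistic: one number computed from each sample.
print(mean(steady), mean(volatile))  # both are 5 — the metric can't tell them apart

# The reduction threw away the spread, which differs dramatically.
print(stdev(steady), stdev(volatile))
```

The same is true of any product metric: the number that survives the calculation is only one slice of the underlying behavior.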

In this sense, it should be abundantly clear that a metric, even a well-defined one, can’t possibly encapsulate all aspects of a product. Defining a “north star” product metric is easier for some products than for others. For example, long-term revenue is fairly uncontroversial as the Ads org’s north star, though nuances and precise assumptions abound even there.

For organic Search, we were lucky to be able to focus strictly on user value and not monetization, which was and is rare. Since before I joined, the holy grail was to find “one metric to rule them all” that captured all aspects of user value. We were never successful in this pursuit, and most of us became convinced that a “constellation” of metrics is more practical and operationally viable than a single “north star.”

In my roles as product manager and data scientist, I had to constantly remind myself and others to use metrics carefully, neither under-relying nor over-relying on them. I believe they are an incredibly useful tool for measuring progress or success, and for getting alignment across large and complex organizations. But they are not some organizational panacea, nor will they perfectly represent any aspect of ground truth reality.

Hello Again, World

(FYI, I’m intending to cross-post upcoming entries to Substack, in case it’s easier to follow there: https://seantime.substack.com/)

I recently took a break from working on Google Search, after 15 years as a data scientist and most recently as an acting product manager. It was certainly a once-in-a-generation opportunity to work with so many amazing colleagues on a product that was, and is, so impactful on society as a whole.

Now that I have the privilege of some extra time, I feel drawn to put some of my thoughts in writing, because nuanced, subtly careless mistakes around data, metrics, and scale were so prevalent among my exceptional colleagues, including software engineers and data scientists, many of whom were much smarter than me. These nuances go beyond the occasional lapses in what we called “statistical thinking” or “probabilistic thinking,” though ultimately many of the issues derive from the implications of those lapses.

There are many intelligent people who have thought about these topics more deeply than I have. I can only draw from my personal experiences and hope to slightly contribute to the conversation.

My goals for the next few months are to reflect on some of the highest-level insights I think I’ve learned in my career. I hope some of these reflections may be relevant or useful for some of you.

I’ll be upfront that my intention is to explore beyond the sphere of work and career. With any luck, I’ll have the fortitude in the future to more directly probe some of the broader and far more important problems that misinterpretations of data, metrics, and scale create for society.

This will undoubtedly be a wandering journey with many digressions, and many posts will be rough. Content is unlikely to be published with consistent frequency, so please subscribe on Substack to get email updates.

Non-AI Pact

I now have a bit of free time on my hands, and have the itch to do some personal writing again, after roughly ten years of work-induced writer’s block. I acknowledge a bit of irony in starting to write again just as generative AI technologies are commoditizing both content creation and authorial voice. It’s funny that you rarely get to dictate the timing of your opportunities.

However, after 15 years of working on some of the most scaled products in the world, it feels nice to get back to something a little more bespoke.

There’s clearly a place for generative AI in today’s content ecosystem, but there’s also a place for humans to do one of the most human things: to create and share things they can call their own. Even in a world of mass manufacturing, there’s a need for handcrafted goods.

To me, LLM technology today is both an extremely useful and a nearly unavoidable tool. But like any tool, it should be used responsibly, and the user must be clear about what is being delegated to it.

This is my Non-AI Pact: nothing written by me on this channel will be created or edited by generative AI.