Goodhart’s Law in Practice: Why Metrics Must Evolve

Goodhart’s Law is often summarized as: when a measure becomes a target, it ceases to be a good measure.

This observation appears across economics, education, and public policy. The underlying mechanism is reactivity. Once a metric is used to guide behavior inside an adaptive system—especially one with incentives or penalties—the system evolves in response to the measurement itself. Over time, the relationship between the metric and the underlying goal weakens.

The practical implication is not that KPIs are flawed. It is that they are inherently temporal.

In AI-driven organizations, this dynamic becomes harder to manage. Metrics increasingly influence not just reporting, but system design, incentives, and execution velocity. To understand how Goodhart’s Law shows up in practice, it helps to examine how metrics operate inside systems, how people adapt to them over time, and why AI accelerates the gap between what we measure and what we actually want.

Metrics as living components of a system

Some metrics are intentionally static. Guardrails, baselines, and safety thresholds exist to constrain behavior, not to optimize it. Their value comes from stability.

Others—particularly those tied to performance or efficiency—function as control inputs. They shape behavior, not just observe it.

An example is a tech support organization measured primarily on case closure rates. Early on, the metric drives focus and throughput. Over time, behavior adapts: tickets are reclassified, resolution is optimized for speed rather than outcome, and complex issues are prematurely closed or deflected. The metric continues to improve, while customer experience and brand degrade, possibly forcing an expensive course correction.

The KPI improves, but its ability to represent value declines. To just call this a bad KPI is missing the point. This is not misuse. It is the expected outcome of applying pressure inside an adaptive system.

Systems fail fast. People fail quietly.

System-oriented KPIs tend to surface misalignment quickly. Latency targets, availability SLAs, or cost constraints shape architectural decisions, and their second-order effects often appear early in production. Feedback is direct.

People-oriented KPIs operate on a longer arc.

Sprint velocity is a familiar case. When used as a performance signal, teams adapt: work is decomposed differently, estimates shift, complex tasks are deferred. Delivery appears faster. But the costs—production instability, rework, erosion of trust—arrive later and are rarely traced back to the original incentive. This particular use case existed before coding agents, but has been made obvious given acceleration of sorts by AI.

When failures surface, analysis defaults to focusing on execution details. Analysis that revisits the metrics that shaped behavior is relatively rare.

Measurement Coupled with Active Judgment

Most organizations understand that both KPI-dominant decision-making and narrative-driven dismissal of KPIs are problematic. A balanced perspective that allows for structured judgment opportunities is recommended. When and how KPI judgment is done is an organization-specific art form.

One of many possibilities is to habituate judgment during failures, or during periodic reviews. Revisiting a KPI is not a loss of rigor. It is how rigor is sustained over time. And the transformative nature of AI has made a deeper understanding of KPI lifecycle urgent.

Why AI accelerates the problem

These dynamics predate AI. With AI, the split of work between humans and AI, responsibility in AI-assisted work, and human adaptation to AI systems are dominant forces that continually reshape the definition of ‘performance improvement’.

AI initiatives often rely on productivity proxies—throughput, cycle time, artifacts produced—because they are measurable and locally useful. What organizations ultimately want is cost reduction that compounds into growth, or a new growth lever. The gap between what is measured and what is desired can widen with AI.

Incentives to adopt AI can temporarily improve surface metrics while masking deeper misalignment. Systems may be more efficient, or appear to be more efficient, in the short run. Real, sustainable value creation remains unchanged. This is missed opportunity in the application of AI.

Thinking through the path of contribution of operational KPIs to strategic KPIs, being agile as AI reshapes work is key.

The Takeaway

KPIs have always been living artifacts of the systems they govern.

Goodhart’s Law reminds us that once metrics shape behavior, they must evolve alongside it. In AI-enabled organizations, the cost of forgetting this rises sharply. The responsibility is not just to design KPIs, but to continuously re-anchor them—across people, systems, and outcomes—before optimization outruns intent.