Tokenmaxxing: When the Performance Metric Becomes the Governance Risk
Alan McCay
March 30, 2026
Employees at Meta, Shopify, and OpenAI are now being measured by how many AI tokens they consume [1]. Internal leaderboards rank staff by volume. Managers reward the heaviest users and chastise those who fall behind. One OpenAI engineer burned through 210 billion tokens, roughly 33 Wikipedias’ worth of text. An Ericsson software engineer in Stockholm told the New York Times that his company spends more on his Claude Code usage than it pays him in salary [2].
They call it tokenmaxxing.
Jensen Huang pitched the concept at his GTC keynote: engineers should receive roughly half their base salary again in tokens [3]. VCs are calling inference costs the “fourth component” of engineering compensation [4]. The message from Silicon Valley is unambiguous. Consume more. Produce more. Move faster.
I’ve spent many years building governance, risk, and compliance programmes. What I see in tokenmaxxing is a governance failure being celebrated as a productivity strategy. The risks it introduces (technical, psychological, ethical) are the kind that stay invisible until something breaks badly enough that a leaderboard can’t explain it away.
Conditioning Humans Like Models
Here is the observation I haven’t seen made elsewhere.
Large language models are tuned through reinforcement learning. The model produces output. If that output scores well against the reward function, it gets a reward signal. Over millions of iterations, the model learns to optimise for whatever triggers the reward, regardless of whether the output is accurate, ethical, or safe. The model doesn’t understand the goal. It understands the reward.
Tokenmaxxing applies the same pattern to the humans operating these systems. The employee produces output. If consumption is high, the reward follows: a positive review, peer recognition, job security. Over weeks and months, the employee learns what the model learned. Quality, governance, ethical weight: none of these are the reward signal. Volume is.
There is a name for this. Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure” [5]. Token consumption was presumably tracked, originally, to gauge AI adoption. The moment it became a performance target, it stopped measuring anything useful and started driving behaviour instead. This has been well understood in economics and public policy since 1975. The fact that Silicon Valley is rediscovering it with AI tokens does not make it new.
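To make the dynamic concrete, here is a toy sketch in Python (every number and name invented for illustration): a simple reward-driven learner choosing between a careful, low-volume way of working and a high-volume, low-quality one. The only reward it ever sees is the proxy metric, token count, and so it converges on volume regardless of quality.

```python
import random

# Toy illustration (invented numbers): two ways of working, each with a
# hidden "true quality" and a visible token count. The only reward the
# learner ever sees is the proxy metric -- tokens consumed.
STRATEGIES = {
    "careful_review":  {"tokens": 1_000,  "true_quality": 0.9},
    "bulk_generation": {"tokens": 50_000, "true_quality": 0.3},
}

def proxy_reward(strategy: str) -> float:
    """Reward signal = token volume, with a little noise. Quality is invisible."""
    return STRATEGIES[strategy]["tokens"] * random.uniform(0.9, 1.1)

def run(iterations: int = 1_000, epsilon: float = 0.1) -> str:
    # Simple epsilon-greedy learner: mostly exploit the highest-reward
    # strategy seen so far, occasionally explore.
    estimates = {name: 0.0 for name in STRATEGIES}
    counts = {name: 0 for name in STRATEGIES}
    for _ in range(iterations):
        if random.random() < epsilon:
            choice = random.choice(list(STRATEGIES))
        else:
            choice = max(estimates, key=estimates.get)
        reward = proxy_reward(choice)
        counts[choice] += 1
        # Incremental mean update of the estimated reward for this strategy.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return max(estimates, key=estimates.get)

if __name__ == "__main__":
    winner = run()
    print(f"Learned policy: {winner}")
    print(f"True quality of that policy: {STRATEGIES[winner]['true_quality']}")
    # The learner reliably settles on bulk_generation: the measure became
    # the target, and quality never entered the loop.
```

Nothing in that loop ever looks at quality, so quality cannot influence what gets learned. That is the whole of Goodhart’s trap, whether the learner is a model or an employee.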
There is also a deeper layer. In neuro-linguistic programming, there is a presupposition: the map is not the territory [6]. A token consumption leaderboard is a map: a compressed representation of something far more complex. It captures volume. It says nothing about judgement, accuracy, or risk. When an organisation treats the map as the territory, when the leaderboard becomes performance itself, people stop looking at the terrain and steer by the metric alone. Their perception of what matters shifts to match the map they are given.
Goodhart’s Law tells us the metric will fail. The NLP presupposition tells us what happens to the person inside the failing metric. The human in the loop becomes the human in the metric.
I keep coming back to the same thought: this is not a metaphor. It is a design pattern, and we are running it on people.
Change Control Dies First
Anyone who has worked in information security or software delivery will recognise the first casualty immediately: change control.
Peer review, approval workflows, separation of duties, the distinction between test and production. These gates exist because unchecked output introduces risk that compounds. Fast output without review is not productivity. It is debt accumulation.
Under tokenmaxxing, every one of these gates becomes a penalty. An hour waiting for a colleague’s sign-off is an hour of idle capacity. A change request in a queue is a metric not being hit. The colleague whose approval you need is busy hitting their own target. The rational response is to skip the gate, and the person who does so climbs the leaderboard while the person who follows the process falls behind.
Separation of duties doesn’t collapse because the policy was poorly written. It collapses because the incentive structure made following it costly. In ISO 27001, the NIST AI Risk Management Framework, the EU AI Act, and every independent AI audit programme I’ve worked with, this separation is a baseline requirement. Tokenmaxxing doesn’t remove it from the policy library. It removes the willingness to comply with it.
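For readers who want the gates made concrete, here is a minimal sketch of what one release gate can look like in code (illustrative only: the field names and checks are my assumptions, not any particular tool’s API). The point is how little it takes, and how visible the temptation to skip it becomes when every blocked release costs leaderboard position.

```python
from dataclasses import dataclass

@dataclass
class ChangeRequest:
    change_id: str
    author: str
    approver: str | None      # None means nobody has signed off yet
    tested_in_staging: bool

class ChangeControlError(Exception):
    """Raised when a change fails a release gate."""

def enforce_release_gates(change: ChangeRequest) -> None:
    """Illustrative release gates: peer approval, separation of duties,
    and a test-environment pass before anything reaches production."""
    if change.approver is None:
        raise ChangeControlError(f"{change.change_id}: no peer approval recorded")
    if change.approver == change.author:
        # Separation of duties: the person who wrote the change
        # cannot be the person who approves it.
        raise ChangeControlError(f"{change.change_id}: author approved their own change")
    if not change.tested_in_staging:
        raise ChangeControlError(f"{change.change_id}: not verified outside production")

# Usage: the gate either passes silently or stops the release.
enforce_release_gates(ChangeRequest("CR-104", author="amara", approver="jonas", tested_in_staging=True))
```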
Automation Bias Under Pressure
Every AI governance framework requires human oversight. The EU AI Act’s Article 14, ForHumanity’s independent audit criteria, and the NIST AI RMF GOVERN function all mandate the same thing: not human presence, but the capacity and willingness to evaluate outputs, exercise judgement, and stop the line when needed.
Tokenmaxxing amplifies automation bias: the well-documented tendency to accept automated outputs without scrutiny, a tendency that intensifies under time pressure. Every serious framework includes requirements to counteract it. Tokenmaxxing supplies exactly the conditions in which it thrives.
The override, the ability to halt or redirect AI-driven output, is probably the single most important control in any AI governance framework. It exists for the moments when automated output would cause harm if left unchecked. Under consumption pressure, the employee who exercises it drops to the bottom of the ranking. “I stopped because something didn’t look right” gets met with “why weren’t you producing?”
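A minimal sketch of what the override can look like in practice (names and fields are hypothetical, not drawn from any framework’s text): AI output ships only once a named reviewer has recorded a decision, and a halt is a legitimate, logged outcome rather than something to be quietly worked around.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ReviewDecision:
    reviewer: str
    verdict: str          # "approve" or "halt"
    rationale: str
    decided_at: datetime

class LineStopped(Exception):
    """Raised when a human reviewer halts AI-driven output."""

def release_ai_output(artifact_id: str, decision: ReviewDecision | None) -> None:
    # Oversight means a recorded decision, not mere human presence.
    if decision is None:
        raise LineStopped(f"{artifact_id}: no human review recorded, holding release")
    if decision.verdict == "halt":
        # The override: stopping the line is a legitimate, logged outcome,
        # not a productivity failure.
        raise LineStopped(f"{artifact_id}: halted by {decision.reviewer}: {decision.rationale}")
    print(f"{artifact_id}: released, approved by {decision.reviewer} at {decision.decided_at.isoformat()}")

# Usage: a release with a recorded approval goes out; a halt stops the pipeline.
release_ai_output(
    "ai-output-2291",
    ReviewDecision("priya", "approve", "matches spec, spot-checked figures",
                   datetime.now(timezone.utc)),
)
```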
A control that people are afraid to use has already failed.
The Accountability Gap
In AI ethics there is a concept called the moral responsibility gap: harm occurs and no one accepts accountability. The developer built what was specified. The deployer followed instructions. The operator followed the process. The model has no agency.
Tokenmaxxing builds this gap into the organisational structure at every level. The employee was hitting their metrics. The manager was driving adoption. The executive was investing in productivity. The board was positioning for competitive advantage. Everyone has a rational defence. Nobody owns the outcome.
I have seen this pattern in cybersecurity incident investigations. When the incentive structure and the governance structure point in opposite directions, the incentive structure wins. Every time.
Ethical frameworks for AI require that organisations name who is accountable for AI-driven outcomes. A person, with authority to intervene and an obligation to answer. The concept of a standing ethics committee exists for exactly this: a body that can challenge organisational practices creating ethical risk, even when those practices come from senior leadership.
Does your ethics committee have standing to challenge a token consumption leaderboard? If it does, has it? If it hasn’t, why not?
What the Auditor Will Find
Every governance framework rests on evidence: controls, oversight mechanisms, accountability structures, and the records that prove they were operating during the period under review.
Tokenmaxxing erodes the evidence layer. Documentation slows output, so it gets deprioritised. Logging becomes an afterthought. Audit trails thin. When an auditor asks for evidence that human oversight was functioning during a particular quarter, the organisation finds it has a leaderboard and a pile of AI-generated outputs, but no governance record showing that anyone reviewed them.
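What would satisfy the auditor is unglamorous: a durable record of who reviewed what, when, and what they decided. A minimal sketch of such a record (the schema is illustrative, not prescribed by any standard), written as append-only JSON lines:

```python
import json
import hashlib
from datetime import datetime, timezone
from pathlib import Path

REVIEW_LOG = Path("ai_review_log.jsonl")   # append-only evidence trail

def record_review(artifact: str, reviewer: str, decision: str, notes: str) -> dict:
    """Append one review event: enough to show, later, that oversight happened."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "artifact_sha256": hashlib.sha256(artifact.encode()).hexdigest(),
        "reviewer": reviewer,
        "decision": decision,          # e.g. "approved", "reworked", "halted"
        "notes": notes,
    }
    with REVIEW_LOG.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    return event

# Usage: every AI-generated artefact that ships leaves a review record behind.
record_review(
    artifact="draft contract clause generated by the assistant",
    reviewer="l.okafor",
    decision="reworked",
    notes="Liability cap contradicted the master agreement; rewritten by hand.",
)
```

Records like these cost a few seconds per artefact. Under consumption pressure, even that cost is the first thing to go.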
Two of the five recognised threats to auditor independence are self-review and intimidation. Tokenmaxxing triggers both. The person who generated AI output has a personal interest in not flagging problems with it. The person who raises governance concerns risks being labelled a low performer. That is structural intimidation, even when nobody explicitly threatens anyone.
The gap between having governance artefacts and having governance is where tokenmaxxing does its damage. The policies exist. The committees exist. The review processes are documented. But if the incentive structure has made following them a career risk, what remains is theatre.
The Questions That Need Asking
I am not arguing that organisations should stop using AI or that token consumption is inherently wasteful. The productivity gains from AI tools are real.
If you are a CISO: can you show evidence that your change control and peer review processes are still functioning when employees are incentivised to bypass them?
If you sit on a board: has anyone assessed whether your AI adoption metric is creating the structural conditions for governance failure?
If you are a CxO who approved a token budget: did the business case account for the erosion of oversight controls that consumption pressure produces?
If you are an employee on a consumption leaderboard: when an AI output you shipped causes harm downstream, what evidence do you have that you exercised professional judgement?
These are audit questions. The organisations that cannot answer them are accumulating governance debt, and that debt compounds whether anyone is tracking it or not.
Consumption is not governance. Throughput is not oversight. A leaderboard is not an accountability structure. The organisations that will come through the AI era in good shape are the ones that can demonstrate, with evidence, that their controls held while they scaled.
Governance does not catch up later. Either it is built into the way people work, or it is absent.
Measure what matters. Govern what you measure.
References
[1] Roose, K. (2026, March 20). Tokenmaxxing: The New AI Arms Race Inside Companies. The New York Times. https://www.nytimes.com/2026/03/20/technology/tokenmaxxing-ai-agents.html
[2] Pearl, M. (2026, March 22). Tech Employees Are Reportedly Being Evaluated by How Fast They Burn Through LLM Tokens. Gizmodo. https://gizmodo.com/tech-employees-are-reportedly-being-evaluated-by-how-fast-they-burn-through-llm-tokens-2000736627
[3] Loizos, C. (2026, March 21). Are AI tokens the new signing bonus or just a cost of doing business? TechCrunch. https://techcrunch.com/2026/03/21/are-ai-tokens-the-new-signing-bonus-or-just-a-cost-of-doing-business/
[4] Tunguz, T. (2026, February). Will I Be Paid in Tokens? tomtunguz.com. https://tomtunguz.com/inference-as-compensation/
[5] Goodhart’s Law. Wikipedia. https://en.wikipedia.org/wiki/Goodhart%27s_law
[6] Hoag, J.D. NLP Presupposition: The Map is Not the Territory. NLP Life Skills. http://nlpls.com/articles/mapTerritory.php
Assessed Intelligence delivers vCISO and vCRAIO leadership, ARISE Framework™ implementation, and continuous assurance through the OPERATE retainer. If your organization is deploying agentic AI and needs governance that operates at the speed of your systems, speak with an advisor.


