“Tokenmaxxing” (token maximization) as a strange new trend
Source: Pragmatic Engineer
Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from last week’s The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.
Inside Meta, an engineer created a “token leaderboard” that ranks employees by token usage. Last week, The Information reported:
“Employees at Meta Platforms who want to show off their AI superuser chops are competing on an internal leaderboard for status as a “Session Immortal” - or, even better, “Token Legend.”
The rankings, set up by a Meta employee on its intranet using company data, measure how many tokens - the units of data processed by AI models - employees are burning through. Dubbed “Claudeonomics” after the flagship product of AI startup Anthropic, the leaderboard aggregates AI usage from more than 85,000 Meta employees, listing the top 250 power users.
The practice is emblematic of Silicon Valley’s newest form of conspicuous consumption, known as “tokenmaxxing,” which has turned token usage into a benchmark for productivity and a competitive measure of who is most AI native. Workers are maximizing their prompts, coding sessions and the number of agents working in parallel to climb internal rankings at Meta and other companies and demonstrate their value as AI automates functions such as coding.”
I spoke with a few engineers at Meta about what’s happening, and this is what they said:
- Massive waste. Plenty of devs are running an OpenClaw-like internal agent that burns massive amounts of tokens for little to no outcome.
- Outages caused by AI overuse. A dev mentioned that some SEVs were caused by what looked like careless AI code generation; as if the dev behind the SEV was more concerned with churning out massive amounts of code with AI than with product quality.
- Gamified leaderboard. Those at the top of the leaderboard produce throwaway, wasteful work. This is painfully clear to anyone who checks Trajectories (AI prompts), which can be viewed internally.
As per The Information, Meta employees used a total of 60.2 trillion AI tokens (!!) in 30 days. If this were charged at Anthropic’s API prices, it would cost around $900M. Of course, Meta is likely purchasing tokens at a discount, but the bill could still come in at $100M+ - in large part from senseless “tokenmaxxing.”
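As a quick sanity check on those figures, here is the arithmetic, assuming a single blended price per token (Anthropic’s real pricing varies by model and by input vs. output tokens, so the discount rate below is purely illustrative):

```python
# Sanity check on the reported numbers: 60.2 trillion tokens in 30 days,
# ~$900M at API list prices. The 87% discount is a hypothetical figure,
# chosen only to show how a $100M+ bill is still plausible.
total_tokens = 60.2e12       # 60.2 trillion tokens
list_price_bill = 900e6      # ~$900M at list prices

# Implied blended list price per million tokens
per_million = list_price_bill / (total_tokens / 1e6)
print(f"${per_million:.2f} per 1M tokens")   # ~$14.95

# Even at a steep ~87% hypothetical discount, the monthly bill tops $100M
discounted_bill = list_price_bill * (1 - 0.87)
print(f"~${discounted_bill / 1e6:.0f}M per month")   # ~$117M
```

The implied ~$15 per million tokens sits in the range of frontier-model output pricing, which makes the reported $900M list-price figure internally consistent.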
After backlash on social media, Meta abolished the internal leaderboard last week. One day after The Information revealed the incredible tokenmaxxing numbers, I confirmed that Meta had taken the leaderboard down; perhaps the company realized that the incentive created enormous, unnecessary waste. If so, it’s a bit surprising that it took media coverage for the social media giant to reach that conclusion.
One long-tenured engineer at Meta told me they suspect the leaderboard had a different goal: increasing AI usage itself. They said:
“Putting a leaderboard in place was always going to incentivize much more AI usage. And more AI usage means producing a lot more real-world traces. These traces can then be used to train Meta’s next-generation coding model better.
I believe this was the goal, even if no one said it out loud.
It’s an expensive way to generate data for training, but if any company has the means to do so, it’s Meta.”
Microsoft: full-force tokenmaxxing
Similarly, Microsoft has had an internal token leaderboard like Meta’s since January, and it started out well, as I reported at the time: an internal dashboard displays the individuals who use the most tokens, in order to promote token usage and experimentation with LLMs. At the Windows maker, this leaderboard has produced interesting results:
- Very senior engineers - distinguished-level folks - are in the top 5 across the whole company, despite the fact that this group generally wrote little code in the past.
- VP-level folks make the top 10 and top 20, despite often being in meetings for most of the day and rarely writing code.
However, what starts as a metric for performance reviews or promotions can quickly become a target for devs. I talked with a software engineer at the Windows maker who admitted they’re full-on “tokenmaxxing” - not to get on the leaderboard, but because they don’t want to be seen as using too few tokens:
“We have internal dashboards and metrics tracking AI usage, token usage, and the percentage of code written by AI vs hand-written code.
I am conscious of not wanting to be seen as “uses too little AI,” and I’m not ashamed to say I need to do tokenmaxxing for this. Things I do to inflate my token usage metrics:
- Ask AI questions about code that is already covered in the documentation. The AI pulls up the documentation, processes it, and gives me results 10x slower, while burning lots of tokens. I could use “readthedocs” [an internal product], but then my token numbers would be lower.
- Ask the AI to prototype a feature that I have no intention of working on. Prompt it a few more times, then throw the whole thing away.
- Default to always using the agent, even when I know I could do the work by hand much faster. Then watch it fail.”
This engineer is relatively new at the company and concerned about job security, so they play this game, burning far more tokens than necessary, to avoid being tagged as insufficiently “AI native.”
Salesforce: burning tokens to hit “minimum” & “ideal” targets
Elsewhere, Salesforce has created “tokenmaxxing” incentives as well. Talking with an engineer there, I learned that the company built tools and policies that effectively incentivize excessive spending on tokens:
- “Minimum” incentives with a tracking tool. There’s a Mac widget that shows your own spend, updated every 15 minutes. It also displays the minimum expected spend. Last week, the target was $100 on Claude Code and $70 on Cursor.
- Showing everyone’s spend. A web-based tool shows the token spend of any colleague. It’s used to check where teammates’ usage is at.
- “Maximum” spend limits that can be exceeded. Until a week ago, there was also a maximum monthly limit of $250 for Claude Code and $170 for Cursor. However, this could be exceeded with the simple press of a button once the limit was reached. I’ve learned that last week, some engineering organisations at Salesforce had their “maximum” limit removed in order to “remove any friction from the development process.”
The message Salesforce sends to staff is clear: “use a minimum of $170/month in tokens or be flagged.” Who wants to be flagged for using too few tokens? The outcome is somewhat wasteful token spend:
- Burning tokens for nothing. Devs ask Claude or Cursor to “build me X,” where X is a project or product that has nothing to do with their work, and not something they’d ever ship. It’s just a way to burn tokens.
- Calibrating token spend to be above average. Plenty of devs browse peers’ token spend to figure out the slightly-above-average point, then use however many tokens are needed to hit that mark.
Shopify: an example of how to avoid tokenmaxxing
The first token leaderboard that I’m aware of was built by Shopify in 2025. And it worked well! Last June, the Head of Engineering at Shopify, Farhan Thawar, told me on The Pragmatic Engineer Podcast:
“We have a leaderboard where we actively celebrate the people who use the most tokens because we want to make sure they are [celebrated] if they’re doing great work with AI.
[And for the top people on the leaderboard,] I want to see why they spent say $1,000 a month in credits for Cursor. Maybe that’s because they’re building something great and they have an agent workforce underneath them!”
I asked Farhan for details on how it’s gone since. Here’s what he told me:
“We have since renamed the token leaderboard to usage dashboard, for obvious reasons: we don’t want to encourage “competing” to make it to the top of this board. We have token spend on our internal wiki profile as well as on the usage dashboard.
We also have circuit breakers to catch “runaway agents.” So if personal spend spikes within a day, we can cut off access immediately, and you can renew if the usage spike was deliberate, or if it was a runaway agent. The circuit breaker worked well for us: we’ve not only caught runaway agents, but found bugs in our infra this way!”
Shopify’s approach seems to have worked for a few reasons:
- The usage dashboard served as a “push” for devs to use AI tools early on. Last year, devs were mostly experimenting with AI tools, which were not as performant as today’s. The usage dashboard encouraged developers to try new tools, and highlighted power users.
- Circuit breakers helped. Cutting off spend when usage spiked helped catch “runaway agents.”
- High usage gets looked at. Farhan checks in with top-spending individuals to understand their use cases. Any tokenmaxxing would likely be spotted at this stage, which would be a bit embarrassing for the user!
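The circuit-breaker mechanism Farhan describes can be sketched in a few lines. This is a hypothetical illustration, not Shopify’s actual implementation; the class name, threshold, and renewal behavior are all assumptions:

```python
from datetime import date

class SpendCircuitBreaker:
    """Illustrative circuit breaker: cuts off a user's AI access when their
    daily spend spikes past a limit, to catch runaway agents. The default
    threshold is made up for the example."""

    def __init__(self, daily_limit_usd: float = 200.0):
        self.daily_limit = daily_limit_usd
        self.spend: dict[tuple[str, date], float] = {}  # (user, day) -> $
        self.tripped: set[str] = set()                  # users cut off

    def record(self, user: str, cost_usd: float, day: date) -> bool:
        """Record spend; return False if access is (now) cut off."""
        if user in self.tripped:
            return False
        key = (user, day)
        self.spend[key] = self.spend.get(key, 0.0) + cost_usd
        if self.spend[key] > self.daily_limit:
            self.tripped.add(user)  # trip: cut access immediately
            return False
        return True

    def renew(self, user: str) -> None:
        """Restore access after the user confirms the spike was deliberate;
        here we also reset their counters so work can continue."""
        self.tripped.discard(user)
        self.spend = {k: v for k, v in self.spend.items() if k[0] != user}
```

The key design point is that tripping is immediate and automatic, while renewal is a deliberate human action, which is exactly the asymmetry that makes runaway agents (and, per Farhan, infra bugs) surface quickly.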
One more interesting learning Farhan shared with me: it’s more revealing to ask not “who spent the most in overall token cost?” but instead “whose tokens cost the most?” Devs whose tokens turn out to be expensive have tended to do in-depth work that was interesting to learn about!
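The two questions are just different aggregations of the same usage data. A minimal sketch, with entirely made-up numbers, of how the rankings diverge:

```python
# Hypothetical usage records: (developer, tokens used, dollars spent)
usage = [
    ("dev_a", 900_000_000, 2_700.0),  # heavy usage of a cheap model
    ("dev_b",  40_000_000, 1_200.0),  # lighter usage of an expensive model
    ("dev_c", 300_000_000, 1_500.0),
]

# Ranking by total spend surfaces the biggest token burners...
by_total = sorted(usage, key=lambda r: r[2], reverse=True)

# ...while ranking by cost per million tokens surfaces expensive-model,
# in-depth work - the cut Farhan found more interesting.
by_unit_cost = sorted(usage, key=lambda r: r[2] / (r[1] / 1e6), reverse=True)

print([d for d, _, _ in by_total])      # ['dev_a', 'dev_c', 'dev_b']
print([d for d, _, _ in by_unit_cost])  # ['dev_b', 'dev_c', 'dev_a']
```

Here dev_a tops total spend at $3 per million tokens, while dev_b tops unit cost at $30 per million: the same data, two very different leaderboards.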
Tokenmaxxing: great for AI vendors, bad for everyone else
I see very few rational reasons why incentivizing tokenmaxxing makes sense for any company. It results in much higher AI spend in return for little to no value. Heck, in some cases it actually incentivises slower work, as shown by devs using AI to answer questions when documentation is readily available, and encourages “busywork,” where devs prompt for projects they don’t even want to ship. Tokenmaxxing seems to push devs to focus on work that makes no difference to the business.
It feels to me that a good part of the industry is using token count numbers similarly to how the lines-of-code-produced metric was used years ago. There was a time when the number of lines written daily or monthly was an important metric in programmer productivity, until it became clear that it’s a terrible thing to focus on. A lines-of-code metric can easily be gamed by writing boilerplate or throwaway code. Also, the best developers are not necessarily those who write the most code; they’re the ones who solve hard problems for the business quickly and reliably, with or without code!
Similarly, the number of tokens a dev generates can easily be gamed, and if this metric is tracked then devs will indeed game it. But doing so generates a massive accompanying AI bill!
---
Read the full issue of last week’s The Pulse, or check out this week’s The Pulse. This week’s issue covers:
- New trend: token spend breaks budgets - what next? In the past 2-3 months, spending on AI agents has exploded at many tech companies, and the ramifications of this are starting to dawn on engineering leaders. We’ve sourced details from 15 companies, including the different ways they are coping with this realization.
- New trend: more AI vendors can’t keep up with demand. Related to massively increased spending, GitHub Copilot and Anthropic are starting to limit less-profitable individual users, so they can serve business users whose spend has easily 10x’d in the last few months. The exception is OpenAI and Codex.
- Morale at Meta hits all-time low? Business is booming but devs at Meta are furious and worried due to looming layoffs, and an invasive tracking program rolled out to all US employees.