For once we are not modeling a hypothetical. In a disclosed transaction, IREN, a bitcoin miner turned landlord, agreed to build a 200 MW (critical IT) data center, buy $5.8 billion of NVIDIA chips, and lease the whole thing to Microsoft for $9.7 billion over five years. One contract, two businesses: IREN earns a ~40% debt-financed equity IRR as the landlord; Microsoft takes the chips and runs what the analyst calls an intelligence factory (capital in, tokens out). This issue walks the factory floor with a cost sheet, because the token rate (the price of a million units of machine thought) only means something against the cost of manufacturing those million units.
The framework: depreciation is the COGS of cognition. Electricity, the thing every headline worries about, is 8.1% of the factory's annual cost. The chip is everything else. A Blackwell server burns $10k of power and $40k of depreciation a year. So the token rate is not really a power price or a labor price — it is an amortization schedule with a markup. Keep that and every number below follows.
Now the revenue side, and a crucial honesty the workbook deserves credit for: it prices tokens twice. Once under theoretically perfect conditions — every FLOP utilized, which yields comic numbers ($41B of revenue from one site, 95% margins) and is labeled "not realistic" by the analyst himself — and once in the real world, where measured throughput (SemiAnalysis: ~10,000 actual tokens/sec/GPU on Llama-70B against ~70,700 theoretical) implies the fleet converts only 14.1% of its paper FLOPs into shipped tokens. The honest case still prints money: serving GPT-4o-class queries at $2.50/M in, $10/M out, the 135,743-GPU Microsoft site produces $5.8 billion of annual revenue at a 58% net margin.
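The 14.1% figure follows directly from the two throughput numbers quoted above. A minimal Python sketch of that arithmetic, with variable names that are illustrative rather than taken from the workbook:

```python
# Throughput figures quoted in the text (SemiAnalysis, Llama-70B); both approximate.
measured_tokens_per_sec = 10_000      # measured tokens/sec/GPU
theoretical_tokens_per_sec = 70_700   # theoretical tokens/sec/GPU

# Share of paper throughput that actually ships as tokens.
efficiency = measured_tokens_per_sec / theoretical_tokens_per_sec
print(f"real-world efficiency: {efficiency:.1%}")  # real-world efficiency: 14.1%
```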
Here is the part that should reorganize how you think about NVIDIA's roadmap. Hold the building constant (same 260 MW, same $100/MWh power, same Microsoft) and swap generations. The chips get fewer (135,743 Blackwells → 84,839 Rubins → 53,025 Feynmans), the power draw per box rises, and the economics go vertical.
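The GPU counts quoted for each generation already encode how fast per-chip power is assumed to grow. A rough Python sketch, assuming the full 260 MW is spread evenly across the GPUs (an all-in figure that folds in cooling and overhead, and a simplification not stated in the workbook):

```python
# GPU counts per generation, as quoted in the text.
gpu_counts = {"Blackwell": 135_743, "Rubin": 84_839, "Feynman": 53_025}
SITE_MW = 260  # building power held constant, per the text

# Assumption: site power divided evenly across GPUs, overhead included.
kw_per_gpu = {name: SITE_MW * 1_000 / n for name, n in gpu_counts.items()}

# Generation-over-generation growth in the per-GPU power budget.
names = list(gpu_counts)
for prev, nxt in zip(names, names[1:]):
    step = kw_per_gpu[nxt] / kw_per_gpu[prev]
    print(f"{prev} -> {nxt}: {step:.2f}x power per GPU")
```

Under that assumption each generation draws roughly 1.6 times the power of the one before, which is why the same building holds fewer chips.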
And the landlord's side of the same contract: IREN puts up $1.36B of equity (after $2.5B of 7% chip debt and a $1.94B Microsoft prepayment), collects $1.94B a year, and even assuming the chips are worth only 19 cents on the dollar after five years, clears a ~39.5% pre-tax equity IRR — rising to ~45% when the same trade is run on Rubin. The implied rent is $1.63 per GPU-hour. Both sides of one piece of paper, and both sides win at current token prices. That is what a genuinely scarce input looks like; it is also what a top looks like, which is the entire art of this cycle.
| | IREN (the landlord) | Microsoft (the factory) |
|---|---|---|
| Puts in | $5.8B of chips; $1.36B net equity after debt + prepayment | $9.7B of lease payments over 5 years |
| Gets out | $1.94B/yr of rent + 19% chip residual | 135,743 B100s' worth of token production |
| Unit price | $1.63 per GPU-hour | $2.50–3.00/M input, $10–15/M output tokens |
| Annual economics | Margin on lease $1.43B (yrs 1–2) | $5.8B revenue on $2.1B all-in cost |
| The return | ~39.5% pre-tax equity IRR (~45% on Rubin) | ~58% net margin; $3,386 of revenue per MWh of $100 power |
| The risk held | Chip residual value; refinancing | Token-price deflation; utilization below 75% |
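The landlord's headline figures can be cross-checked against each other with simple arithmetic. A minimal Python sketch, assuming net equity is chip cost less debt and prepayment, and that rent accrues on every GPU for every hour of a 365-day year (both are readings of the quoted numbers, not the workbook's stated method):

```python
# Figures quoted in the text; dollar amounts in billions.
lease_total = 9.7        # total lease payments over the term
lease_years = 5
chip_cost = 5.8          # NVIDIA chips purchased
chip_debt = 2.5          # 7% chip debt
prepayment = 1.94        # Microsoft prepayment
gpu_count = 135_743
HOURS_PER_YEAR = 8_760   # assumption: billed around the clock, 365 days

annual_rent = lease_total / lease_years
net_equity = chip_cost - chip_debt - prepayment
rent_per_gpu_hour = annual_rent * 1e9 / (gpu_count * HOURS_PER_YEAR)

print(f"annual rent: ${annual_rent:.2f}B")                      # $1.94B
print(f"net equity: ${net_equity:.2f}B")                        # $1.36B
print(f"implied rent: ${rent_per_gpu_hour:.2f} per GPU-hour")   # $1.63
```

All three results match the figures in the text, which suggests the $1.63 rate is simply annual rent spread over every GPU-hour in the year.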
The 58–90% margins rest on assumptions that all lean friendly. 75% utilization is the big one — factories that run at 40% halve the answer. The pricing inputs ($10–15/M output) are list prices in a market where frontier rates have already fallen ~97% in three years; the model holds price fixed while costs fall, the most dangerous extrapolation in commodity history. The 14.1% efficiency figure is one benchmark on one open-source model. And mixture-of-experts architectures (the workbook's own Mistral column: 41B active parameters against GPT-4o's 1,700B) can produce 40x more tokens per FLOP — meaning the "same query" may soon be served from one-fortieth the silicon, by whoever chooses to start the price war.
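Two of these sensitivities are easy to put numbers on. A hedged Python sketch, assuming revenue scales linearly with utilization and that tokens per FLOP scale inversely with active parameter count (both simplifications introduced here for illustration):

```python
# Base case quoted in the text.
base_utilization = 0.75
base_revenue_bn = 5.8          # annual revenue, $B

# Stress case: a factory running at 40%.
stressed_utilization = 0.40

# Assumption: revenue is proportional to utilization.
stressed_revenue_bn = base_revenue_bn * stressed_utilization / base_utilization
print(f"revenue at 40% utilization: ${stressed_revenue_bn:.2f}B")

# Parameter counts from the workbook's columns, in billions.
dense_params_bn = 1_700        # GPT-4o figure used in the workbook
active_params_bn = 41          # Mistral active parameters

# Assumption: compute per token is proportional to active parameters.
tokens_per_flop_gain = dense_params_bn / active_params_bn
print(f"tokens-per-FLOP gain: {tokens_per_flop_gain:.1f}x")
```

The first result is roughly half the base-case revenue, and the second is the source of the "40x" shorthand.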
If tokens reprice to marginal cost the way kerosene and steamboat fares did (No. 002), the factory margin compresses toward a utility return — and the $32M-per-MW capital stack is suddenly being amortized against utility revenues. The factory always makes money last; the question is whether it makes back the factory.
Margins survive deflation only if volume outruns it. Watch frontier-class output pricing on Azure/OpenAI/Anthropic rate cards against the 22.7x annual usage growth in the OpenRouter data.
The IREN lease implies $1.63/GPU-hr for Blackwells. If spot rental rates slip materially below contracted rates, every neocloud lease in the deal sheet is above market.
14.1% real-world inference efficiency is the margin's denominator. Better batching and speculative decoding raise it — bullish for factories, brutal for chip demand forecasts.
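The first of these watch items, volume against deflation, reduces to one multiplication. A minimal Python sketch, assuming the ~97% three-year price decline happened at a constant annual rate and that the OpenRouter usage growth applies to the same class of tokens (neither is stated in the source):

```python
# Figures quoted in the text.
price_drop_total = 0.97    # frontier token prices down ~97%...
years = 3                  # ...over three years
usage_growth = 22.7        # annual usage growth multiple (OpenRouter data)

# Assumption: deflation compounds at a constant annual rate.
annual_price_factor = (1 - price_drop_total) ** (1 / years)

# Revenue = price x volume, so the revenue multiple is the product of the two.
revenue_factor = usage_growth * annual_price_factor

# Volume growth needed just to hold revenue flat at this rate of deflation.
breakeven_growth = 1 / annual_price_factor

print(f"annual price factor: {annual_price_factor:.3f}")
print(f"revenue multiple per year: {revenue_factor:.2f}x")
print(f"break-even usage growth: {breakeven_growth:.2f}x")
```

On these inputs volume is comfortably outrunning price; the signal to watch is usage growth falling toward the break-even multiple.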
A 90% margin on synthetic thought is the closest legal thing to seigniorage — and like all seigniorage it is really a tax on trust, paid by people who would rather buy cognition than admit they need it. Nobody who calls the API wants tokens. They want what the token stands in for: the report finished, the question answered, the feeling of having thought without the hours of thinking. The factory sells relief from the oldest scarcity, and prices it like electricity because that is the only cost anyone can see.
The Wedgwood lesson from the archive applies: when the cost of a luxury collapses, the margin migrates from making the thing to knowing what to do with it. The factories will keep their spread for a while — scarcity and contracts see to that. The durable fortune goes to whoever stands between the token and the customer and convinces them the thinking was theirs all along.