members-only post Mar 07, 2026 10 min read

The AI data center bottlenecks map

What started as personal notes to better understand the dynamics at play in the data center supply chain turned into a full-blown data center bottlenecks map.

Introduction

Over the past weeks, I've done a lot of research to identify the key players at crucial positions in the AI data center supply chain. Why? Because it helps to pick the long-term winners that are best positioned to benefit from what is arguably the largest infrastructure investment cycle in technology history.

There's an old investing adage: during a gold rush, sell shovels. In the AI gold rush, the shovels are GPUs. But the shovels themselves now need shovels. It sounds a bit weird, but you get my point: we need to look at the companies that make the shovels possible, because insatiable demand for GPUs trickles down through the entire supply chain.

Some mind-boggling numbers

OpenAI has committed $300 billion to data center buildouts
Oracle is borrowing $100 billion over four years for Stargate
Hyperscaler CapEx is expected to top $610B in 2026

All this demand is leading to major bottlenecks in the supply chain. These constraints are what ultimately determine how fast this industry can scale. Understanding where they form, which businesses sit at critical positions, and how they evolve is, in my view, the investment framework for this cycle.

I wish I had done this research two years ago, given the number of outsized winners we've seen already, but I believe there's still a long runway ahead and numerous winners in the making. So far, the bottleneck story has already shifted multiple times:

2023 to 2024: GPU supply, NVIDIA couldn't make enough
2024 to 2025: Advanced packaging (CoWoS) and HBM memory
2025 to 2026: Power grid infrastructure and interconnects

As for interconnects (optics), many names have shot up 100% or more in the past few months. Most notable:

AXT (AXTI) +176%
Applied Optoelectronics (AAOI) +158%
Lumentum (LITE) +102%

Each time a bottleneck partially resolves, the constraint just moves one layer deeper into the stack. And that's why this cycle keeps pulling more and more of the semiconductor supply chain into relevance.

The numbers behind this supercycle

Before getting into the details of the AI supply chain, let's zoom out and look at the data on semiconductor growth and AI infrastructure spending to judge how durable this buildout is.

Gartner: the semiconductor market is approaching $1 trillion. Global semiconductor revenue hit $793 billion in 2025 (+21% year over year), with AI processors alone contributing over $200 billion.

Deloitte projects the industry to reach $975 billion in 2026, with generative AI chips approaching roughly $500 billion, about half of global chip sales. Data center semiconductors specifically grew about 73%, from $64.8 billion in 2023 to $112 billion in 2024, making data centers the second largest semiconductor market behind smartphones.

Data center spending keeps getting revised upward: Gartner's latest (February 2026) forecast puts data center systems spending at $653.4 billion for 2026, up 31.7% from $496.2 billion in 2025. Server spending alone is expected to grow 36.9% in 2026. Global IT spending overall is forecast at $6.15 trillion. Every single revision has been upward.
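
As a quick sanity check, the growth rate can be recomputed from the two spending figures quoted above. This is a minimal Python sketch: the function name `yoy_growth` is my own, and only the two dollar amounts come from the Gartner forecast.

```python
def yoy_growth(previous: float, current: float) -> float:
    """Return year-over-year growth as a percentage."""
    return (current / previous - 1.0) * 100.0


# Gartner data center systems spending, in billions of USD (figures quoted above).
spend_2025 = 496.2
spend_2026 = 653.4

growth = yoy_growth(spend_2025, spend_2026)
print(f"Data center systems spending growth: {growth:.1f}%")
```

The result lines up with the 31.7% in the forecast, so the two dollar figures and the percentage are internally consistent.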

Hyperscalers: Gartner estimates hyperscalers accounted for over 70% of spending on AI, reaching $202 billion, double the amount spent on traditional server hardware. It is estimated that by 2028, hyperscalers will operate $1 trillion worth of AI-optimized servers.

We're in a multi-year infrastructure supercycle, and spending is accelerating.

One important caveat

This party continues as long as the music plays, in other words, as long as hyperscaler spending stays elevated. When the music stops and CapEx starts to taper, all downstream players will be directly impacted. This is a cyclical industry by nature, with booms and busts, so it's important to be aware of this dynamic.

AI value chain: an airport analogy

I like to use analogies to make things easier to understand, so I think of an AI data center as a busy airport. GPUs are the planes. Data and memory are the passengers and luggage. Packaging, connectivity, and power infrastructure are the runways, gates, and taxiways. You can buy more planes. But if the runways are too short, the taxiways are congested, and the gates cannot handle the flow, the overall system still underperforms.

In NVIDIA's Q4 2025 earnings call, CEO Jensen Huang highlighted the importance of operational efficiency and economic return, specifically tokens per dollar and tokens per watt. Compute equals revenue, so the more efficient your compute is, the more revenue you can generate.

This is exactly why "inside the rack"¹ companies are a strategic asset. Modern AI data centers face data, network, and memory bottlenecks, and the connectivity, optics, memory, and physical infrastructure that sit between the GPUs determine the real throughput of the overall system.

¹ A ‘rack’ is simply a tall metal cabinet that holds the data center gear. You can picture it as a bookshelf for computers, with the data center being a library filled with hundreds if not thousands of these bookshelves.

Takeaway

Underfeeding an AI cluster (too little memory, a congested network, etc.) can turn expensive GPUs into idle assets. In a world where infrastructure is ordered in billion-dollar chunks, utilization becomes extremely important.
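
To make the utilization point concrete, here is a rough Python sketch of how much hardware effectively sits idle at different utilization levels. Everything here is hypothetical: the $10 billion cluster cost, the utilization levels, and the helper name `idle_capital` are illustrative assumptions, not figures from any company.

```python
def idle_capital(cluster_cost_usd: float, utilization: float) -> float:
    """Capital effectively sitting idle at a given utilization (0.0 to 1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return cluster_cost_usd * (1.0 - utilization)


# Hypothetical example: a $10 billion cluster (an invented round number).
cluster_cost = 10e9

for utilization in (0.9, 0.7, 0.5):
    idle = idle_capital(cluster_cost, utilization)
    print(f"{utilization:.0%} utilization -> ${idle / 1e9:.1f}B of hardware idle")
```

It's a deliberately simple model that ignores depreciation and power costs, but it shows why even a few points of utilization are worth a lot at this scale.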

That's why companies solving these bottlenecks are doing so well. Some examples:

High bandwidth memory (HBM) constraint: SK Hynix was perfectly positioned to benefit and both margins and revenue skyrocketed
Chip-on-Wafer-on-Substrate (CoWoS) constraint: TSMC advanced packaging revenue increased significantly
Optical transceivers scarcity: companies like Lumentum (LITE) and Applied Optoelectronics (AAOI) saw their revenue and outlook improve drastically

The AI Data Center Industry Map

The AI data center supply chain is a layered, interdependent system where each layer depends on the one below it. I've plotted all the major supply chain segments and visualized the constraint severity per segment.

How to read this map:

Top down: Demand originates from hyperscalers and flows downward. Every dollar Microsoft spends ultimately creates demand at every layer below
Bottom up: Constraints propagate upward. If ASML can't deliver enough EUV machines, then TSMC can't expand, then not enough CoWoS packaging, then not enough packaged AI chips, then deployments slow
The key takeaway: I expect businesses positioned at the major bottlenecks to do very well in the years to come, as long as these constraints exist
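
The bottom-up logic can be sketched in a few lines of Python: each layer can deliver at most what the layers beneath it deliver, so the tightest layer caps everything above it. The layer order follows the chain described above, but the capacity numbers are invented purely for illustration, and `propagate` is a hypothetical helper name.

```python
from itertools import accumulate

# Hypothetical capacities in arbitrary units ("AI chips per year each layer
# could support"). The order follows the chain in the text; numbers are invented.
chain = [
    ("EUV machines (ASML)", 120),
    ("Wafer capacity (TSMC)", 100),
    ("CoWoS packaging", 70),
    ("Packaged AI chips", 90),
    ("Data center deployments", 110),
]


def propagate(layers):
    """Cap each layer's output at the minimum capacity seen so far in the chain."""
    names = [name for name, _ in layers]
    capacities = [capacity for _, capacity in layers]
    delivered = list(accumulate(capacities, min))  # running minimum
    return list(zip(names, delivered))


result = propagate(chain)
for name, delivered in result:
    print(f"{name}: {delivered}")

# The layer with the smallest standalone capacity is the bottleneck.
bottleneck = min(chain, key=lambda layer: layer[1])[0]
print(f"Tightest layer: {bottleneck}")
```

In this made-up example, deployments could absorb 110 units but only 70 arrive, because packaging is the tightest layer. Adding capacity anywhere else changes nothing until that layer expands.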

Bottleneck 1: Memory (HBM and DRAM)

Every modern AI accelerator needs memory. And lots of it.

NVIDIA's Blackwell GPUs, AMD's Instinct series, Google's TPUs: they all rely on High Bandwidth Memory (HBM). HBM is a specialized type of DRAM (Dynamic Random Access Memory) where chips are stacked vertically on top of each other and connected to the processor to deliver massive data throughput.

The problem, however, is that HBM is sold out through 2026 across all major suppliers. SK Hynix has stated that its HBM capacity has been selling out since 2023, and it expects supply to remain constrained versus demand until 2027 as production ramps. DRAM contract pricing surged nearly 172% year over year as of Q3 2025 because HBM production gobbles up fab and packaging capacity that would otherwise serve conventional DRAM.

It's an "AI eats everything" dynamic: the entire memory supply chain is being reprioritized toward AI, and everyone else (PCs, smartphones, gaming) gets whatever's left over.

Who controls this bottleneck

SK Hynix (000660)
Samsung (005930)
Micron (MU)

The HBM market is projected to reach $100 billion by 2028, up from roughly $35 billion in 2025, a ~40% compound annual growth rate. Some forecasts suggest the 2028 HBM market will surpass the entire DRAM market of 2024.
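
The growth rate implied by those two market-size figures can be checked with the standard compound annual growth rate formula. A minimal Python sketch, where the function name `cagr` is my own and the three-year window (2025 to 2028) is taken from the projection above:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a percentage."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return ((end_value / start_value) ** (1.0 / years) - 1.0) * 100.0


# HBM market size in billions of USD (figures quoted above).
hbm_2025 = 35.0
hbm_2028 = 100.0

rate = cagr(hbm_2025, hbm_2028, years=3)
print(f"Implied HBM CAGR 2025-2028: {rate:.1f}%")
```

The implied rate comes out slightly above 40%, in line with the rounded figure in the projection.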

Bottleneck 2: Networks and optics

Marvell: "The primary bottleneck in AI data center infrastructure has shifted from compute to connectivity."

Modern AI training runs on clusters of thousands of GPUs that must communicate constantly. Traditional copper cables are starting to hit their physical limits. They can't push enough data over the required distances, and they generate too much heat. In its Q1 2026 earnings, TSMC highlighted that copper remains the preferred interconnect for scale-up networking thanks to its lower latency, lower power, and lower cost, while optical is acknowledged as the standard for scale-out networking.

The industry is slowly starting to switch to optical interconnects, using light instead of electricity to move data between chips and racks. Copper remains the go-to solution for data transfer inside the rack, with optics handling data transfer outside the rack. In short: longer distances = optical, shorter distances = copper.

McKinsey expects 800G transceiver production to fall 40–60% short of demand through 2027, and next-generation 1.6T transceivers to be 30–40% short through 2029.
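
To translate those shortfall percentages into units, here is a small Python sketch. Only the percentage ranges come from the McKinsey estimate above. The demand of 10 million units is a hypothetical placeholder, and `supply_range` is an illustrative helper name.

```python
def supply_range(demand_units: float, shortfall_low: float, shortfall_high: float):
    """Return (min_supply, max_supply) for a shortfall range given as fractions."""
    if not 0.0 <= shortfall_low <= shortfall_high <= 1.0:
        raise ValueError("expected 0 <= shortfall_low <= shortfall_high <= 1")
    min_supply = demand_units * (1.0 - shortfall_high)
    max_supply = demand_units * (1.0 - shortfall_low)
    return min_supply, max_supply


# Hypothetical demand; the shortfall ranges are the ones quoted above.
demand = 10_000_000

for label, low, high in (("800G", 0.40, 0.60), ("1.6T", 0.30, 0.40)):
    min_supply, max_supply = supply_range(demand, low, high)
    print(
        f"{label}: {min_supply / 1e6:.1f}M to {max_supply / 1e6:.1f}M units "
        f"supplied out of {demand / 1e6:.0f}M demanded"
    )
```

Put differently, a 40–60% shortfall means that for every ten 800G transceivers the market wants, only four to six get built.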