How patterns get drawn on silicon
Every transistor is built layer by layer, with each layer's shape projected onto the wafer through a stencil called a mask. The light that does the projecting determines how fine those shapes can be — and whether you need one exposure or four.
The three current tools
DUV (193 nm) · Deep ultraviolet · the workhorse
SINCE 2002
An argon-fluoride excimer laser pulses 193 nm light through a mask. Workhorse of the 28 nm → 7 nm era and still the only litho available to anyone the U.S. has sanctioned. To print features smaller than the wavelength, fabs use immersion (a film of water between the final lens and the wafer raises NA from 0.93 to 1.35) and multi-patterning (printing the same layer 2× or 4×). SMIC's whole 7/5 nm push runs on this.
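The immersion and multi-patterning arithmetic can be sketched with the Rayleigh criterion, half-pitch ≈ k1 · λ / NA. This is a minimal illustration, not a process model: the practical k1 of 0.28 and the target pitches are assumptions, and real fabs choose between LELE, SADP and SAQP on overlay and cost, not on this formula alone.

```python
ARF_WAVELENGTH_NM = 193.0
K1 = 0.28  # assumed practical process factor; the single-exposure floor is 0.25


def min_pitch_nm(wavelength_nm: float, na: float, k1: float = K1) -> float:
    """Smallest single-exposure pitch from the Rayleigh criterion (pitch = 2 x half-pitch)."""
    return 2 * k1 * wavelength_nm / na


def patterning_factor(target_pitch_nm: float, single_pitch_nm: float) -> int:
    """Smallest pitch-split factor (1, 2 or 4) whose split pitch reaches the target."""
    for factor in (1, 2, 4):
        if single_pitch_nm / factor <= target_pitch_nm:
            return factor
    raise ValueError("target pitch needs more than 4x patterning")


dry = min_pitch_nm(ARF_WAVELENGTH_NM, na=0.93)  # air between lens and wafer
wet = min_pitch_nm(ARF_WAVELENGTH_NM, na=1.35)  # water between lens and wafer
print(f"dry ArF: ~{dry:.0f} nm pitch, immersion ArF: ~{wet:.0f} nm pitch")
for target in (90, 45, 28):  # hypothetical target pitches
    print(f"{target} nm pitch -> {patterning_factor(target, wet)}x patterning")
```

Raising NA from 0.93 to 1.35 moves the single-exposure pitch from roughly 116 nm to roughly 80 nm; anything tighter has to be split across passes.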
EUV (13.5 nm) · Extreme ultraviolet · standard NA 0.33
SINCE 2019
Tin droplets are vaporized by a CO₂ laser into plasma, which radiates at 13.5 nm. That light bounces off mirrored optics (no glass works at this wavelength) onto the mask and down to the wafer. ASML is the only maker. ~$200M per machine. Cuts the 5 → 3 nm flow from dozens of DUV passes to a handful of EUV exposures.
High-NA EUV (13.5 nm) · Same wavelength · NA pushed to 0.55
SINCE 2024
Same 13.5 nm light, but the optics gather it at much steeper angles, using anamorphic mirrors that funnel the cone tighter onto the wafer. Shrinks the smallest single-exposure feature by ~1.7× (the NA ratio, 0.55/0.33). ~$380M per machine, ~150 tonnes, fills a small warehouse. Intel got the first one in late 2023; TSMC took delivery in 2024 but says it isn't cost-justified for them until ~2030. EXE:5200B is the production-grade tool, going into Intel's 14A fab now.
Why 13.5 nm light prints 8 nm features
A pencil line can never be finer than the grain of the paper it's drawn on. Even if the lithography optics could deliver a 6 nm aerial image, the chemistry that records the image (long polymer molecules in the photoresist) has its own minimum size. This, not the optics, is what currently caps how small features can go.
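The optics half of the answer is the same Rayleigh relation with EUV numbers. A sketch, assuming k1 ≈ 0.33 (an illustrative value, not an ASML specification):

```python
EUV_WAVELENGTH_NM = 13.5
K1 = 0.33  # assumed process factor; real values depend on resist, mask and illumination


def half_pitch_nm(na: float, wavelength_nm: float = EUV_WAVELENGTH_NM, k1: float = K1) -> float:
    """Rayleigh criterion: smallest printable half-pitch = k1 * wavelength / NA."""
    return k1 * wavelength_nm / na


standard = half_pitch_nm(0.33)
high_na = half_pitch_nm(0.55)
print(f"NA 0.33: {standard:.1f} nm, NA 0.55: {high_na:.1f} nm, shrink {standard / high_na:.2f}x")
```

The printable feature shrinks by exactly the NA ratio, 0.55 / 0.33 ≈ 1.7×, which is how 13.5 nm light ends up at roughly 8 nm half-pitch; the resist then decides whether that image survives.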
How chips connect to other chips
If you can't make a single die bigger or denser, split it into pieces and stitch them together with very fast, very short wires. The art of how those pieces touch is now where most of the per-generation gain comes from.
The integration ladder
Monolithic · One die, one job
1D
One slab of silicon, made on a single process. Limited to ~858 mm² (the reticle limit: the largest area a litho tool can expose in one shot).
2D MCM · Multi-chip module
2D
Two or more dies sit side by side on a regular organic substrate, connected by relatively coarse traces. Cheap. Slow links. Used for decades; Pentium Pro had it in 1995.
CoWoS-S · Silicon interposer
2.5D
Logic dies and HBM memory stacks sit on a passive sheet of silicon (the interposer), which carries thousands of tiny wires between them. NVIDIA H100 and AMD MI300X are built this way.
CoWoS-L · EMIB · Local silicon bridges
2.5D
Same idea, smaller silicon. Tiny silicon "bridges" sit only where two dies need to talk, embedded in the substrate (Intel EMIB) or between the dies and substrate (TSMC CoWoS-L). Cheaper than a full interposer; almost the same bandwidth.
Foveros · 3D stacked logic
3D
Compute dies stacked on top of a base die that handles I/O and power. Vertical wires (TSVs) connect them. Intel ships this on Meteor Lake, Lunar Lake, and now Panther Lake.
SoIC · X-Cube · 3D hybrid bonded
3D+
No bumps. No microscopic balls of solder. Two dies are polished flat and direct-bonded copper-to-copper. Interconnect density jumps ~10×. AMD's 3D V-Cache uses TSMC SoIC; Samsung X-Cube targets the same tier.
Why this is now the bottleneck
A B200 GPU has more transistors than fit on a single reticle-sized die. So NVIDIA built it as two big dies stitched together with a CoWoS-L bridge that delivers 10 TB/s between them. The package, not the transistor, is what makes a 2026 AI chip what it is. Demand for CoWoS exceeded TSMC's capacity by 2x through most of 2025; that gap is what pulled Intel's EMIB business off life support.
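Why a B200 needs two dies comes down to area arithmetic. The sketch below uses the standard 26 mm × 33 mm scanner field (858 mm²) and Blackwell's published 208 billion transistors; the 130 million transistors per mm² density is a hypothetical round number, not an NVIDIA or TSMC figure.

```python
import math

RETICLE_FIELD_MM = (26, 33)  # full scanner exposure field, width x height
RETICLE_LIMIT_MM2 = RETICLE_FIELD_MM[0] * RETICLE_FIELD_MM[1]  # 858 mm^2


def reticle_dies_needed(transistors: float, density_mtr_per_mm2: float) -> int:
    """Minimum number of reticle-limited dies needed to hold a transistor budget."""
    area_mm2 = transistors / (density_mtr_per_mm2 * 1e6)
    return math.ceil(area_mm2 / RETICLE_LIMIT_MM2)


# 208e9 transistors is Blackwell's published count; 130 MTr/mm^2 is a hypothetical density.
print(reticle_dies_needed(208e9, 130))
```

About 1,600 mm² of logic needs two reticle-sized dies, and the two halves then need a link fast enough to behave like one chip.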
Cells that remember
Logic transistors only need to be on or off long enough to compute. Memory cells need to hold a state — sometimes for nanoseconds, sometimes for decades. That requirement bends the physics in different directions for each memory type, which is why we ended up with four very different chips for four very different jobs.
DRAM · Dynamic random access · 1 transistor + 1 capacitor
VOLATILE · FAST
The simplest memory cell anyone has ever built. The capacitor either holds charge (a 1) or doesn't (a 0). The transistor is a gate that lets you read or write that charge. Called dynamic because the capacitor leaks: charge bleeds away in milliseconds, so the chip refreshes every cell every ~64 ms whether you use it or not.
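The refresh overhead can be estimated from two timing numbers. A sketch, assuming the DDR3/DDR4 convention of 8,192 refresh commands per 64 ms window and a hypothetical 350 ns during which each refresh command blocks the rank:

```python
RETENTION_WINDOW_S = 64e-3  # every cell must be refreshed at least once per window
REFRESH_COMMANDS = 8192     # refresh commands issued per window (DDR3/DDR4 convention)
T_RFC_S = 350e-9            # assumed time one refresh command keeps the rank busy

t_refi = RETENTION_WINDOW_S / REFRESH_COMMANDS  # average gap between refresh commands
busy_fraction = T_RFC_S / t_refi                # share of time spent refreshing

print(f"refresh command every {t_refi * 1e6:.2f} us")
print(f"rank busy refreshing {busy_fraction:.1%} of the time")
```

That works out to roughly one command every 7.8 µs, with the memory busy for a few percent of the time even when no program touches it.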
Here's why DRAM scaling is harder than logic scaling: the capacitor needs to hold enough electrons to be reliably distinguishable from noise — around 30,000 per cell. As you shrink the cell, you have to keep that capacitance in a smaller footprint, which means making the capacitor taller and narrower. Modern DRAM capacitors are skyscrapers ~1.5 µm tall on a sub-transistor footprint. They're why DRAM fabs look different from logic fabs.
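The 30,000-electron requirement translates into a capacitance via C = Q / V, and the skyscraper shape follows from how much sidewall area that takes. A back-of-envelope sketch; the 0.5 V stored signal, 40 nm diameter, 5 nm dielectric and relative permittivity of 40 are all hypothetical values chosen for illustration:

```python
import math

ELECTRON_CHARGE_C = 1.602176634e-19  # exact SI value
VACUUM_PERMITTIVITY_F_PER_M = 8.8541878128e-12


def required_capacitance_fF(electrons: int, stored_volts: float) -> float:
    """Capacitance needed to hold `electrons` at `stored_volts` (C = Q / V), in femtofarads."""
    return electrons * ELECTRON_CHARGE_C / stored_volts * 1e15


def cylinder_capacitance_fF(diameter_nm: float, height_nm: float,
                            dielectric_nm: float, rel_permittivity: float) -> float:
    """Sidewall-only estimate of a tall cylindrical capacitor, treated as a parallel plate."""
    area_m2 = math.pi * diameter_nm * 1e-9 * height_nm * 1e-9
    return VACUUM_PERMITTIVITY_F_PER_M * rel_permittivity * area_m2 / (dielectric_nm * 1e-9) * 1e15


# 30,000 electrons is the figure from the text; 0.5 V of stored signal is an assumption.
needed = required_capacitance_fF(30_000, 0.5)
# Hypothetical geometry: 40 nm wide, 5 nm of high-k dielectric with relative permittivity 40.
tall = cylinder_capacitance_fF(40, 1_500, 5, 40)   # 1.5 um tall
short = cylinder_capacitance_fF(40, 150, 5, 40)    # 150 nm tall
print(f"needed ~{needed:.1f} fF; 1.5 um cylinder ~{tall:.1f} fF; 150 nm cylinder ~{short:.1f} fF")
```

With these made-up but plausible dimensions, only the tall cylinder clears the roughly 10 fF target; cut the height tenfold and the capacitance drops tenfold with it.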
DRAM stopped using clean nm numbers years ago. The naming goes 1x → 1y → 1z → 1α → 1β → 1γ → 1δ — the "10nm-class" generations, half-pitches now ~14nm. Both Samsung and SK hynix use EUV at the leading DRAM node; Micron held out longer with DUV before adopting EUV in 2024.
Famously cyclical — boom-bust every 2–3 years. The consolidation to three players happened precisely because the cycle kept bankrupting smaller ones. China's CXMT is the new entrant, ~3–4 years behind on technology but ramping fast in volume.
NAND Flash · Non-volatile · floating gate · stacked vertically
NON-VOLATILE · DENSE
NAND solved non-volatility with a clever trick: instead of a leaky capacitor, it uses a floating gate, a sliver of conductor completely surrounded by insulator, sitting inside the transistor. Push electrons onto it by tunneling them through the thin oxide and they're trapped there for years. The presence or absence of those electrons changes the transistor's threshold voltage, which is what you read. (Most 3D NAND now traps the charge in an insulating charge-trap layer instead of a conductive floating gate, but the principle is the same.)
The big shift came in 2013 when Samsung introduced 3D V-NAND. They stopped trying to shrink cells horizontally and started stacking them vertically. Each layer is built at coarse geometry (~40 nm — easy to manufacture), and density comes from how many floors you stack. Progression: 24 layers in 2013 → 64 in 2017 → 128 in 2020 → 200+ now → 400+ on roadmaps.
Modern NAND also stores multiple bits per cell by reading how much charge each cell holds and quantizing it into 4, 8, 16, or even 32 levels:
| TYPE | BITS / CELL | SPEED | LIFE | USE |
|---|---|---|---|---|
| SLC | 1 | fastest | longest | enterprise |
| MLC | 2 | fast | good | old SSDs |
| TLC | 3 | moderate | moderate | most current SSDs |
| QLC | 4 | slower | shorter | dense storage |
| PLC | 5 | slowest | shortest | rare, archival |
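Why each extra bit costs speed and endurance: every bit doubles the number of threshold-voltage states packed into the same usable window. A sketch assuming a hypothetical 6 V window with evenly spaced states (real devices are not evenly spaced):

```python
def threshold_states(bits_per_cell: int, window_volts: float) -> tuple[int, float]:
    """Number of threshold-voltage states and the average gap between neighbours,
    assuming the states are spread evenly across the usable window."""
    levels = 2 ** bits_per_cell
    return levels, window_volts / (levels - 1)


WINDOW_V = 6.0  # hypothetical usable threshold-voltage window

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4), ("PLC", 5)]:
    levels, gap_v = threshold_states(bits, WINDOW_V)
    print(f"{name}: {levels:2d} states, ~{gap_v * 1000:.0f} mV apart")
```

Going from TLC to QLC roughly halves the gap between neighbouring states, so programming must be more precise and reads need more sensing steps.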
Approximate NAND market share: Samsung ~33% · SK hynix + Solidigm ~22% · Kioxia ~19% · Micron ~14% · Western Digital ~10%. China's YMTC uses a clever Xtacking architecture: periphery logic and memory array are built on separate wafers, then bonded together. Sanctioned but still shipping.
HBM · High Bandwidth Memory · DRAM in a fancy stack
THE AI ERA · 2013 →
Look at a B200 GPU and ask what's expensive. Roughly half the package by area, and a similar share by cost, isn't the GPU. It's eight towers of HBM next to it. A standard DDR5 stick gives you ~50 GB/s. A single HBM3E stack gives you ~1.2 TB/s. 24× the bandwidth, 1/100 the footprint, 1/3 the power per byte moved. How?
The trick is brute-force parallelism. A DDR5 module talks to the world through 64 data wires at 6.4 Gb/s each. An HBM3E stack talks through 1,024 wires at a comparable per-pin rate (roughly 9–10 Gb/s; earlier HBM generations actually ran their pins slower than DDR). Total bandwidth is pins × speed, and HBM wins on pins by a huge margin. But you can only fit 1,024 wires between two chips if they're millimeters apart. That's the whole reason CoWoS exists: to give you a silicon surface flat enough and wired finely enough to land 1,024 connections per stack.
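The pins × speed rule as a one-line function. A sketch assuming a DDR5-6400 module (64 data bits at 6.4 Gb/s) and an HBM3E stack running ~9.6 Gb/s per pin; shipping parts vary:

```python
def bandwidth_gb_s(data_pins: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s: pins x per-pin rate, divided by 8 bits per byte."""
    return data_pins * gbit_per_pin / 8


ddr5 = bandwidth_gb_s(64, 6.4)     # one DDR5-6400 module, 64 data bits
hbm3e = bandwidth_gb_s(1024, 9.6)  # one HBM3E stack, assuming ~9.6 Gb/s per pin
print(f"DDR5-6400: {ddr5:.1f} GB/s, HBM3E: {hbm3e:.1f} GB/s, ratio {hbm3e / ddr5:.0f}x")
```

The ratio lands at the 24× quoted above, and most of it comes from having 16× as many pins; faster pins contribute the remaining 1.5×.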
The stack itself is 8 to 12 DRAM dies bonded vertically with TSVs (through-silicon vias — copper wires drilled through the silicon), sitting on a base die that handles I/O. SK hynix is now making custom base dies tailored to specific customers — that's how deep into vertical integration HBM has pushed.
| GEN | YEAR | BANDWIDTH / STACK | USED IN |
|---|---|---|---|
| HBM | 2013 | 128 GB/s | AMD Fiji |
| HBM2 | 2016 | 256 GB/s | NVIDIA P100 |
| HBM2E | 2019 | 460 GB/s | NVIDIA A100 |
| HBM3 | 2022 | 819 GB/s | NVIDIA H100 |
| HBM3E | 2024 | 1.2 TB/s | B200 · MI350 |
| HBM4 | 2026 | ~2 TB/s | Rubin · custom base |
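Each bandwidth in the table is interface width × per-pin rate / 8. The per-pin rates below are typical figures for each generation, used here as assumptions to reproduce the table:

```python
# (generation, interface width in bits, assumed per-pin rate in Gb/s)
HBM_GENERATIONS = [
    ("HBM", 1024, 1.0),
    ("HBM2", 1024, 2.0),
    ("HBM2E", 1024, 3.6),
    ("HBM3", 1024, 6.4),
    ("HBM3E", 1024, 9.6),
    ("HBM4", 2048, 8.0),
]


def stack_bandwidth_gb_s(width_bits: int, gbit_per_pin: float) -> float:
    """Per-stack bandwidth in GB/s."""
    return width_bits * gbit_per_pin / 8


for name, width, rate in HBM_GENERATIONS:
    print(f"{name:6s} {width:5d} bits x {rate:4.1f} Gb/s = {stack_bandwidth_gb_s(width, rate):7.1f} GB/s")
```

HBM4's jump comes from doubling the interface to 2,048 bits rather than from faster pins.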
SK hynix was first to qualify HBM3E with NVIDIA. Micron is the second source NVIDIA cultivated to avoid SK hynix monopoly pricing. Samsung has had qualification problems through 2024–25 and is fighting to catch up.
The math that makes HBM the real story of the AI era: a modern LLM inference workload is dominated by memory bandwidth, not compute. Reading model weights from memory once per token sets the speed limit. A B200 with 192 GB of HBM3E at 8 TB/s aggregate can serve a 100GB model at maybe 80 tokens/sec per stream. Without HBM, it'd be 5. The GPU is fast because the memory next to it is fast — not the other way around.
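The speed limit in that paragraph is a single division: bandwidth over bytes read per token. A minimal sketch assuming every weight is read exactly once per token and ignoring KV-cache traffic; the 0.5 TB/s conventional-DRAM figure is hypothetical:

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound for single-stream decoding when all weights are read once per token."""
    return bandwidth_bytes_per_s / model_bytes


MODEL_BYTES = 100e9  # the 100 GB model from the text
with_hbm = max_tokens_per_second(MODEL_BYTES, 8e12)       # 8 TB/s of HBM3E
without_hbm = max_tokens_per_second(MODEL_BYTES, 0.5e12)  # hypothetical ~0.5 TB/s of conventional DRAM
print(f"with HBM: ~{with_hbm:.0f} tokens/s, without: ~{without_hbm:.0f} tokens/s")
```

Batching several streams lets one weight read serve many tokens, which is why aggregate throughput can exceed this per-stream bound.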
Emerging memory · The perpetual five-years-away
RESEARCH · NICHE
For thirty years there's been a pitch: a single memory technology that combines DRAM speed with NAND non-volatility. Several have shipped. None have replaced anything.
| TECH | HOW | STATUS |
|---|---|---|
| MRAM | magnetic orientation | shipping as MCU embedded · still 100× DRAM cost |
| ReRAM | filament in oxide | promising for AI accel · cell does multiply-add |
| PCM | crystalline ↔ amorphous | was Intel Optane · killed 2022 · alive in research |
| FeRAM | ferroelectric flip | niche · industrial & aerospace |
The honest summary: the memory hierarchy has been remarkably stable for 25 years (registers → SRAM cache → DRAM → SSD) and probably will be for another decade. Emerging memories chip away at the edges, but the established types keep getting cheaper faster than the new ones can.
Why this matters more than it sounds
Every conversation about AI chip performance is really a conversation about memory bandwidth in disguise. NVIDIA's lead is partly compute, but mostly the system-level engineering that pulls 8 TB/s into a single package without melting it. China's AI ambitions hinge as much on getting domestic HBM working — CXMT is reportedly close on HBM2 in 2026 — as on getting SMIC to a smaller node. The packaging story and the memory story are really the same story.
Moving bits at every scale — from nanometers to kilometers
The wild thing about modern computing is that moving data has become more expensive than computing on it — both in time and in energy. That single fact reshapes everything. AI workloads in particular are bandwidth-bound at every level, so system design today is mostly about minimizing how far each byte travels.
The five tiers · Each step up: ~10× more distance, ~10× more energy per bit
In the 1990s, moving a bit one millimeter on-chip cost about as much energy as a logic operation. Today, a 64-bit fused-multiply-add costs ~25 pJ. Moving that result across a die: ~10 pJ. Off-die through HBM: ~200 pJ. Across a rack: ~1,000 pJ. Across a data center: ~10,000 pJ. Compute got cheaper much faster than wires did, and that's the deepest reason the industry pivoted to packaging and chiplets.
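To see what those per-operation numbers mean at model scale, here is a rough sketch. It assumes the figures above are per 64-bit word and applies the HBM figure to reading a hypothetical 100 GB of weights once per token:

```python
# Energy per 64-bit word at each tier, in picojoules (the figures quoted above)
ENERGY_PJ = {
    "64-bit FMA (compute)": 25,
    "across the die": 10,
    "off-die via HBM": 200,
    "across a rack": 1_000,
    "across a data center": 10_000,
}

WORD_BYTES = 8
MODEL_BYTES = 100e9  # hypothetical 100 GB of weights, read once per generated token

words_per_token = MODEL_BYTES / WORD_BYTES
hbm_joules_per_token = words_per_token * ENERGY_PJ["off-die via HBM"] * 1e-12
hbm_watts_at_80_tps = hbm_joules_per_token * 80

print(f"HBM reads: {hbm_joules_per_token:.1f} J per token, ~{hbm_watts_at_80_tps:.0f} W at 80 tokens/s")
for tier, pj in ENERGY_PJ.items():
    print(f"{tier:22s} {pj / ENERGY_PJ['64-bit FMA (compute)']:6.1f}x the FMA")
```

At 80 tokens per second that is on the order of 200 W spent only on moving weights out of HBM, before counting a single multiply.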
Tier 1 · On-die · the unsung hero
Inside a single die, transistors are connected by 15–20 stacked layers of copper wires called the back end of line (BEOL). Bottom layers are tiny (~30nm pitch, matching transistor scale) and run short distances. Each layer up is wider, taller, and runs farther. The very top layers carry power and clocks across the whole chip.
Two things matter about BEOL. First, it towers over the transistors: they live in the bottom ~1µm, while the wires occupy 5–10µm above. Second, it hasn't scaled like the transistors did. Wire pitch shrank ~2× over a decade while transistor density grew 10×. Increasingly the bottleneck on a chip is finding room for the wires between the transistors. This is part of why backside power matters: moving power rails to the bottom of the silicon frees up the top wiring for signals.
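A quick way to see why wires lag: resistance per unit length is ρ / (W × H), so halving both width and height quadruples it. A sketch using bulk copper resistivity and hypothetical cross-sections; real nanoscale wires are worse because of surface scattering and liner layers, which this ignores:

```python
COPPER_RESISTIVITY_OHM_M = 1.7e-8  # bulk copper; very narrow wires are measurably worse


def resistance_per_um(width_nm: float, height_nm: float) -> float:
    """Resistance of one micrometre of wire: R = rho * L / (W * H)."""
    area_m2 = width_nm * 1e-9 * height_nm * 1e-9
    return COPPER_RESISTIVITY_OHM_M * 1e-6 / area_m2


wide = resistance_per_um(30, 60)    # hypothetical older-node cross-section
narrow = resistance_per_um(15, 30)  # same aspect ratio, both dimensions halved
print(f"{wide:.1f} ohm/um -> {narrow:.1f} ohm/um ({narrow / wide:.0f}x)")
```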
Tier 2 · Die-to-die · the chiplet revolution
When you split a chip into chiplets, you need them to talk almost as fast as on-die wires. UCIe (Universal Chiplet Interconnect Express) is the new industry standard — what USB or PCIe were for their tiers, an open spec so a chiplet from one vendor can plug into a package from another. Backed by Intel, AMD, Arm, TSMC, Samsung, Google, Meta, basically everyone except NVIDIA. Today: 16–32 GT/s per lane; roadmap 64+ GT/s.
UCIe is a protocol — it specifies how dies talk. The physical interconnect is whatever the package gives you. As physical density rises, the line between "two chips" and "one chip" blurs:
| PHYSICAL | BUMP PITCH | BANDWIDTH/MM EDGE | USED FOR |
|---|---|---|---|
| Organic substrate | ~100 µm | ~10 GB/s | cheap MCM |
| EMIB · CoWoS-L bridge | ~25 µm | ~80 GB/s | NVIDIA B200 |
| CoWoS-S interposer | ~10 µm | ~200 GB/s | H100 · MI300 |
| Hybrid bond · SoIC · Foveros Direct | ~3–9 µm | ~1 TB/s | AMD V-Cache |
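Connection density goes as the inverse square of bump pitch, which is why each row of the table is a big step. A sketch assuming pads on a square grid (pitches taken from the table; 3 µm stands in for the hybrid-bond range):

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Pads on a square grid at the given pitch: one pad per pitch x pitch cell."""
    return (1000 / pitch_um) ** 2


for name, pitch in [("organic substrate", 100), ("bridge", 25), ("interposer", 10), ("hybrid bond", 3)]:
    print(f"{name:18s} {pitch:4d} um -> {connections_per_mm2(pitch):9,.0f} per mm^2")
```

Going from a 10 µm interposer pitch to 3 µm hybrid bonding is an ~11× jump, consistent with the ~10× figure for SoIC.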
Tier 3 · Chip-to-chip · PCIe vs NVLink
Outside the package but on the same board, the dominant standard is PCIe. Every CPU, GPU, NVMe, and NIC speaks it. Each generation doubles bandwidth: PCIe 3 (8 GT/s, 2010) → 4 → 5 (32 GT/s, current servers) → 6 (64 GT/s, shipping 2025) → 7 (128 GT/s, 2027+).
For GPU-to-GPU traffic specifically, PCIe is too slow, so NVIDIA built NVLink as a parallel proprietary fabric just for this.
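The gap is easy to quantify. A sketch that turns per-lane transfer rates into raw x16 slot bandwidth (ignoring encoding and protocol overhead) and compares PCIe 5 with the 1.8 TB/s per-GPU NVLink figure quoted for B200, which counts both directions:

```python
# Per-lane transfer rates in GT/s by PCIe generation
PCIE_GT_S = {3: 8, 4: 16, 5: 32, 6: 64, 7: 128}
NVLINK5_TB_S = 1.8  # per-GPU NVLink figure for B200, both directions combined


def pcie_gb_s(gen: int, lanes: int = 16) -> float:
    """Raw bandwidth per direction in GB/s, ignoring encoding and protocol overhead."""
    return PCIE_GT_S[gen] * lanes / 8


for gen in PCIE_GT_S:
    one_way = pcie_gb_s(gen)
    print(f"PCIe {gen} x16: {one_way:5.0f} GB/s each way, {2 * one_way:5.0f} GB/s total")

ratio = NVLINK5_TB_S * 1000 / (2 * pcie_gb_s(5))
print(f"NVLink on B200 vs PCIe 5 x16: ~{ratio:.0f}x")
```

Even PCIe 7 x16, at 256 GB/s each way, stays well short of one Blackwell GPU's NVLink.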
NVLink is what lets 8 GPUs in a server act as if they were one giant GPU with 8× the memory. Without it, splitting a trillion-parameter model across multiple GPUs would be impractically slow. Above NVLink sits NVSwitch, NVIDIA's switch chip that lets every GPU in a rack talk to every other GPU at full NVLink speed simultaneously.
The competing standards: AMD Infinity Fabric (one generation behind), UALink (open standard launched 2024 by AMD/Intel/Broadcom/Cisco/Google/Meta/Microsoft — the "anyone but NVIDIA" alliance, first products 2026), and scale-up Ethernet with Broadcom Tomahawk. Whether UALink can build a credible alternative ecosystem before NVLink lock-in becomes total is the strategic question.
Tier 4 + 5 · Rack and data-center scale
Once you've packed 8 or 72 GPUs into a rack with NVLink, you connect racks to each other with traditional networking. Two protocols dominate: InfiniBand (lower latency, NVIDIA-owned via Mellanox) and Ethernet (universal, cheaper, hyperscaler-preferred to avoid lock-in). Current speed grade: 400 Gb/s, with 800 Gb/s rolling out 2025–26 and 1.6 Tb/s on roadmaps for 2027+.
The Ultra Ethernet Consortium (AMD, Broadcom, Cisco, HPE, Intel, Meta, Microsoft, Oracle) is taking Ethernet and making it as good as InfiniBand for AI. UEC 1.0 spec landed 2025; products 2026. This is the layer where Broadcom quietly makes a fortune: their Tomahawk and Jericho switch chips run most of the world's AI Ethernet fabric. Tomahawk 6 (2025) hits 102.4 Tb/s per chip.
Tier 5 · The optical transition
Beyond a few meters, copper runs out of steam. The signal degrades, the cable thickens, the SerDes burns too much power. So you switch to light, with optical transceivers at each end. Today these are pluggable modules; the next generation, co-packaged optics (CPO), embeds them inside the chip package itself.
The frontier beyond CPO is on-package optics for chip-to-chip links: using light to connect GPUs to each other or to memory directly, replacing the copper that NVLink runs on. Three startups to watch: Ayar Labs (silicon photonics chiplets, NVIDIA-backed), Lightmatter (passive photonic interposer "Passage"), Celestial AI (Photonic Fabric). None has shipped at hyperscale yet. If one does, it could be a bigger architectural shift than the move to chiplets.
NVIDIA GB200 NVL72 · Five tiers in one rack
Take the current state-of-the-art AI rack and trace every interconnect tier at once: HBM bandwidth feeding each GPU, NVLink-C2C tying each Grace CPU to its two Blackwell GPUs, the NVLink and NVSwitch fabric across all 72 GPUs, and InfiniBand fanning out to other racks.
| TIER | WHAT | BANDWIDTH |
|---|---|---|
| Within B200 package | HBM3E ↔ GPU die | 8 TB/s per package |
| Within Grace-Blackwell | NVLink C2C | 900 GB/s per pair |
| Across 72 GPUs | NVLink + NVSwitch | 130 TB/s aggregate |
| Between racks | InfiniBand · 800G | 100 GB/s per port |
| Across data center | Optical · switched | varies · highest latency |
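Two of the table rows can be checked with plain arithmetic. A sketch assuming 1.8 TB/s of NVLink per GPU (the figure from the spec-sheet example later in this guide) and an 800 Gb/s InfiniBand port:

```python
GPUS = 72
NVLINK_PER_GPU_TB_S = 1.8     # assumed per-GPU NVLink bandwidth, both directions
INFINIBAND_PORT_GBIT_S = 800  # one 800G port

aggregate_tb_s = GPUS * NVLINK_PER_GPU_TB_S
port_gb_s = INFINIBAND_PORT_GBIT_S / 8  # gigabits -> gigabytes
ports_to_match = aggregate_tb_s * 1000 / port_gb_s

print(f"NVLink domain: {aggregate_tb_s:.1f} TB/s aggregate")
print(f"one 800G port: {port_gb_s:.0f} GB/s")
print(f"scale-up fabric = ~{ports_to_match:,.0f} such ports")
```

Since the NVLink figure counts both directions, a like-for-like comparison halves that ratio, which still leaves the in-rack fabric carrying several hundred ports' worth of traffic.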
Every layer here is the bleeding edge of its tier, and every layer is expensive. By some estimates ~30–40% of the cost of a modern training cluster is networking, not compute. The economics of training a frontier model are actually much more about the network than about the GPUs themselves.
The takeaway
The shape of modern computing is determined less by transistors than by interconnects. AI workloads are bandwidth-bound at every level — within the chip, across the package, across the rack, across the data center. The packaging story, the memory story, and the interconnect story are really one story: how to move data short distances at insane speeds, because moving it long distances is what costs.
Strategic chokepoints to watch: NVLink lock-in vs UALink/UEC ecosystem; the CPO and silicon photonics transition (TSMC, Intel, and the optics startups all circling); and Broadcom — quietly the second-most-important AI silicon company after NVIDIA, mostly via switches and custom ASICs.
The wall hit in 2005 — and how chips are still responding
Most architectural choices in the modern chip industry are responses to a single physical wall. From 1965 to 2005, shrinking transistors gave you smaller, faster, and lower power per operation. Then voltage scaling stopped. Every design decision since — chiplets, heterogeneous cores, backside power, liquid cooling — is a response to that one event.
Dennard scaling and its death · The free lunch that ended in 2005
Robert Dennard at IBM showed in 1974 that if you shrink every dimension of a transistor by a factor of k, you can drop the operating voltage by k too, and power per area stays constant even though you've packed k² more transistors in. From the Intel 4004 (740 kHz, 1971) to the Pentium 4 (3.8 GHz, 2005), clock speeds rose more than 5,000×. Free lunch.
Then, around 90nm, voltage scaling stopped. The reason is leakage. A lower supply voltage needs a lower threshold voltage, and below a certain threshold the current that sneaks through a transistor that is supposed to be off grows exponentially; at the same time, the gate insulator had become so thin that electrons tunneled straight through it. "Off" current became a sizable fraction of total power. Supply voltage stalled near 1V and has crept down only slowly since. Every node packs more transistors at roughly the same voltage, so power density goes up every generation. A modern leading-edge die runs at ~1W per mm² (100 W/cm²) in active areas, roughly ten times the heat flux of a kitchen stove burner.
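Dennard's bookkeeping fits in a few lines. This sketch assumes the idealized rules (capacitance falls by k, frequency rises by k, activity unchanged) and compares voltage scaling with voltage held fixed; k = 1.4 approximates one classic ~0.7× linear shrink:

```python
def scaled_power(k: float, scale_voltage: bool) -> dict[str, float]:
    """Relative power after shrinking every dimension by k (idealized Dennard rules).

    Capacitance drops by k and frequency rises by k; voltage drops by k only if
    scale_voltage is True, otherwise it stays fixed (the post-2005 situation).
    """
    capacitance = 1 / k
    voltage = 1 / k if scale_voltage else 1.0
    frequency = k
    power_per_transistor = capacitance * voltage**2 * frequency  # P ~ C * V^2 * f
    transistors_per_area = k**2
    return {
        "power_per_transistor": power_per_transistor,
        "power_density": power_per_transistor * transistors_per_area,
    }


k = 1.4  # assumed linear shrink factor per node
dennard = scaled_power(k, scale_voltage=True)
post_dennard = scaled_power(k, scale_voltage=False)
print("Dennard era:", dennard)
print("after 2005: ", post_dennard)
```

With voltage scaling, power density stays at 1.0; freeze voltage and it grows by k² (about 2× per node), which is the wall the rest of this section responds to.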
Where the power actually goes · Dynamic switching vs static leakage
PHYSICS
Power on a chip splits into two pieces. Dynamic power is energy spent flipping bits: each switching event costs about ½CV² joules, so a chip's dynamic power is roughly αCV²f watts, where α is the fraction of the circuit that switches each clock cycle. Half of all chip design optimization is about shrinking one of those knobs. Static power is leakage: current that trickles through a transistor even when it's "off." Twenty years ago this was a rounding error; on a 3nm chip it's roughly a third of total power.
Dynamic power scales linearly with frequency, but you need more voltage to clock faster, and dynamic power scales with V². Push frequency 30% higher and you might pay 60–100% more power. This is why modern chips don't really run faster generation-over-generation; they spread work across more cores at lower clocks instead. It's also why Apple, AMD, and now Intel ship heterogeneous CPUs: efficiency cores at low voltage for background work, performance cores that wake up only when needed.
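The frequency/voltage trade-off in numbers. A sketch of P = α·C·V²·f, where α is taken to be the fraction of the capacitance charged each cycle (textbook conventions differ by a factor of two); the gate and chip values and the 10–20% voltage increases needed for +30% clock are all assumptions:

```python
def switching_energy_j(capacitance_f: float, volts: float) -> float:
    """Energy dissipated in one charge or discharge transition: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * volts**2


def dynamic_power_w(alpha: float, capacitance_f: float, volts: float, freq_hz: float) -> float:
    """Dynamic power P = alpha * C * V^2 * f, alpha = fraction of C charged per cycle."""
    return alpha * capacitance_f * volts**2 * freq_hz


def power_ratio(freq_gain: float, volt_gain: float) -> float:
    """Growth in dynamic power when frequency and voltage rise by the given factors."""
    return volt_gain**2 * freq_gain


# Hypothetical gate: 1 fF switched at 0.7 V
print(f"one transition: {switching_energy_j(1e-15, 0.7):.3e} J")
# Hypothetical chip: 1 uF of total switchable capacitance, 10% activity, 0.7 V, 2 GHz
print(f"chip dynamic power: {dynamic_power_w(0.1, 1e-6, 0.7, 2e9):.0f} W")

for volt_gain in (1.10, 1.15, 1.20):  # assumed voltage bumps needed for +30% clock
    print(f"+30% f, +{(volt_gain - 1) * 100:.0f}% V -> {power_ratio(1.30, volt_gain):.2f}x power")
```

The range lands at roughly +57% to +87% power, in line with the 60–100% quoted above, for only 30% more clock.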
Delivering 1,700 amps to a postage stamp · The voltage step-down chain
A modern AI chip operates at ~0.7V, lower than a flashlight battery, but draws ~1,200 watts. Since power equals voltage times current, that's ~1,700 amps of current flowing into a piece of silicon roughly the size of a postage stamp. A household circuit breaker trips at 15 amps. A car starter pulls maybe 250 amps for a moment. A B200 GPU pulls 1,700 amps continuously.
Getting that current into the chip is its own engineering problem. The current arrives at the package via thick copper traces, gets distributed across the substrate, then climbs up through the chip's wiring stack to reach transistors at the bottom. Every step has resistance. Every step drops voltage. By the time current reaches a transistor, the voltage might have sagged from 0.75V to 0.72V — a 4% droop that meaningfully affects switching speed.
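The numbers in these two paragraphs chain together: I = P / V gives the current, and Ohm's law (V = I·R) then bounds the total resistance of the delivery path. A sketch using the 1,200 W, 0.7 V and 30 mV figures from the text:

```python
def supply_current_a(power_w: float, volts: float) -> float:
    """Current drawn from the supply: I = P / V."""
    return power_w / volts


def max_path_resistance_ohm(current_a: float, allowed_droop_v: float) -> float:
    """Largest total resistance that keeps the droop within budget (Ohm's law: V = I * R)."""
    return allowed_droop_v / current_a


current = supply_current_a(1200, 0.7)
r_max = max_path_resistance_ohm(current, 0.75 - 0.72)  # the 30 mV sag from the text
loss_w = current**2 * r_max  # heat dissipated in the delivery path itself (I^2 R)

print(f"current: {current:,.0f} A")
print(f"allowed resistance: {r_max * 1e6:.1f} micro-ohm")
print(f"I^2 R loss at that resistance: {loss_w:.0f} W")
```

Keeping the whole path under ~17.5 µΩ is the problem; even then, tens of watts turn into heat in the wiring itself.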
This is why backside power delivery matters more than its understated marketing suggests. Today, power and signals fight for the same wiring layers from the top side. With BSPD (Intel PowerVia, TSMC at A16), you flip the wafer and put power rails on the bottom of the silicon, separate from signal routing. The signal layers on top get more room. Power gets to transistors more directly. Performance bumps of ~5–8% are typical.
And then you have to remove all that heat · The cooling escalation · 250W → 23,000W per chip
A B200 GPU dissipates ~1,200W. A Grace-Blackwell superchip (one Grace CPU plus two B200 GPUs) dissipates ~2,700W. An NVL72 rack with 72 GPUs dissipates ~120 kW. That's on the order of 100 household microwave ovens running continuously, in one rack-shaped box. Most existing data centers were designed for ~10–15 kW per rack. AI broke that.
The transition matters because cooling capability now dictates compute capability. You can buy more GPUs than you can cool. Operators with liquid-cooled facilities can run dense racks; operators stuck on air can't. Microsoft, Meta, Google, Amazon are all racing to retrofit. There's also a water angle — a hyperscaler-scale data center can consume millions of gallons per day for evaporative cooling, which has become a serious community/regulatory issue in places like Arizona and Spain.
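Why air stops working: the coolant flow needed to carry heat Q with a temperature rise ΔT is Q / (c_p · ΔT). A sketch for a 120 kW rack, assuming a 10 K coolant rise and textbook specific heats for water and air:

```python
def mass_flow_kg_s(heat_w: float, specific_heat_j_per_kg_k: float, delta_t_k: float) -> float:
    """Coolant mass flow needed to carry away heat: m_dot = Q / (c_p * dT)."""
    return heat_w / (specific_heat_j_per_kg_k * delta_t_k)


RACK_HEAT_W = 120_000  # NVL72-class rack from the text
DELTA_T_K = 10         # assumed coolant temperature rise

water = mass_flow_kg_s(RACK_HEAT_W, 4186, DELTA_T_K)  # specific heat of water
air = mass_flow_kg_s(RACK_HEAT_W, 1005, DELTA_T_K)    # specific heat of air

print(f"water: {water:.2f} kg/s (~{water * 60:.0f} litres per minute)")
print(f"air:   {air:.1f} kg/s (~{air / 1.2:.1f} cubic metres per second at ~1.2 kg/m^3)")
```

Air needs about four times the mass flow of water and, being roughly 800× less dense, thousands of times the volume: on the order of 10 m³ of air every second through a single rack.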
The 3D thermal trap · Why stacking compute is so hard
UNSOLVED
Heat doesn't spread evenly. A GPU running a kernel might have 90°C in active SM tiles and 60°C in idle areas. When you stack two compute dies vertically, the bottom one becomes a thermal prison: heat from both has to escape through the same path.
This is why hybrid bonding for compute dies (vs cache) is so hard. AMD's 3D V-Cache works because the cache layer doesn't draw much power — bottom die stays cool. CFET stacks transistors at the device level — even worse. Cooling solutions for 3D logic don't exist yet at scale. The whole 3D logic roadmap depends on solving this, with microfluidic cooling (etching channels into the silicon itself) as the leading candidate.
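A one-dimensional thermal-resistance model shows why a low-power cache layer is tolerable and a second compute layer is not. Every number here (coolant temperature, resistances, die powers) is hypothetical; the point is the structure: the die farther from the cooler must push its heat through the other die, on top of the shared path out.

```python
def stacked_temperatures(p_near_w: float, p_far_w: float,
                         t_coolant_c: float = 40.0,
                         r_near_k_per_w: float = 0.05,
                         r_far_k_per_w: float = 0.10) -> tuple[float, float]:
    """Steady-state temperatures of two stacked dies in a 1-D series thermal model.

    The near die sits against the cooler, and both dies' heat leaves through its
    path (r_near). The far die's heat must first cross the bond and the near die
    (r_far). All resistances and the coolant temperature are hypothetical.
    """
    t_near = t_coolant_c + (p_near_w + p_far_w) * r_near_k_per_w
    t_far = t_near + p_far_w * r_far_k_per_w
    return t_near, t_far


cache_stack = stacked_temperatures(p_near_w=300, p_far_w=30)   # compute + low-power cache
logic_stack = stacked_temperatures(p_near_w=300, p_far_w=300)  # compute + compute
for label, (t_near, t_far) in [("cache on logic", cache_stack), ("logic on logic", logic_stack)]:
    print(f"{label}: near die {t_near:.1f} C, far die {t_far:.1f} C")
```

Same resistances, same 300 W base die: swapping a 30 W cache layer for a 300 W compute layer pushes the far die from about 60 °C to about 100 °C.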
Re-reading the field guide through the thermal lens
Every architectural choice is a thermal choice in disguise · What you've already learned, re-read
The end of Dennard scaling means every architectural choice from now on is a thermal choice in disguise. Look back at what you've learned and you'll see it everywhere:
| WHAT | STATED REASON | REAL THERMAL REASON |
|---|---|---|
| Heterogeneous cores | efficiency | can't run all cores at peak voltage |
| Chiplets | yield · cost | spreads heat over larger area |
| Backside power | routing density | reduces power loss in wires |
| Optical interconnects | bandwidth | ~10× lower energy per bit moved |
| HBM next to GPU | bandwidth | shorter wires = less switching energy |
| CFET difficulty | manufacturing | thermal trap for stacked logic |
| NVLink lock-in | performance | only NVIDIA's system handles the heat |
| Liquid-cooled racks | density | air can't keep up |
The whole roadmap reads like a long, increasingly desperate response to a wall hit twenty years ago. You can think of the modern chip industry as having three major resource constraints: manufacturing (lithography, packaging), memory bandwidth (HBM, interconnect), and thermal envelope. The first two get most of the press; the third is the binding one for many operators today. You can have all the H100s in the world — if you can't cool them, they sit in boxes.
The chips that compute thought — and how to read their spec sheets
The phrase "AI chip" gets thrown around as if it names one thing. It doesn't. There are at least four distinct categories, and within each, the architectural choices map directly onto everything you've already learned: lithography determines core count, memory bandwidth determines inference speed, interconnect determines maximum cluster size, thermal envelope determines what you can actually run.
The four kinds of AI chip
Training
~80% NVIDIA
Used to train large models from scratch. Need huge memory, huge interconnect bandwidth, weeks of uptime. Compute-bound; needs higher precision (FP16/BF16). Cluster scale is the differentiator.
Inference
~65% NVIDIA
Runs trained models. Memory-bound; latency matters; tolerates very low precision (FP8/FP4/INT4). Workloads more diverse, CUDA lock-in weaker. Where startup wedges exist.
Edge AI
fragmented
On-device: phones, cars, cameras, robots. Single-digit watts, low latency, often single-batch. Specialized accelerator blocks inside larger SoCs.
Specialized
niche
Drug discovery, weather, physics simulation. Often training accelerators repurposed; sometimes domain-specific (quantum, photonic, neuromorphic).
The precision economy
Each halving of bits roughly doubles everything · FP32 → FP16 → FP8 → FP4 → INT4
One of the cleverest knobs in modern AI hardware is what kind of number you compute with. A 32-bit float is precise but slow; a 4-bit number is imprecise but eight times denser and faster. The realization driving the last decade: neural networks don't need much precision. They're noisy by nature, and small numerical errors get averaged out across billions of operations.
A B200's headline number is 20 PFLOPS at FP4. The same chip in FP16 is ~5 PFLOPS: four times less, exactly what two halvings of the bit width predict. Both numbers are real; they just describe different workloads. When you read a benchmark, the format matters as much as the number. The B200 vs H100 jump is partly silicon improvement, partly just dropping from FP8 to FP4. Models are trained at higher precision and then quantized down for serving; GPTQ, AWQ, and SmoothQuant are the tools that make this work.
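What quantization does, at its simplest. This is plain symmetric round-to-nearest INT4 with one scale per tensor, not GPTQ, AWQ or SmoothQuant (those pick scales and rounding more cleverly); the weights are made-up values:

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric round-to-nearest INT4 quantization with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7  # map the largest magnitude to +/-7
    codes = [max(-8, min(7, round(w / scale))) for w in weights]  # clamp to the INT4 range
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    return [c * scale for c in codes]


weights = [0.12, -0.53, 0.07, 0.91, -0.33, 0.0, 0.45, -0.88]  # made-up FP32 weights
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("INT4 codes:", codes)
print(f"scale {scale:.4f}, worst error {max_error:.4f} (bound: scale / 2 = {scale / 2:.4f})")
print(f"storage: {32 * len(weights)} bits as FP32 vs {4 * len(weights)} bits as INT4, plus one scale")
```

Every weight lands within half a quantization step of its original value, and storage drops eightfold versus FP32 apart from the one shared scale.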
CUDA · the actual moat
NVIDIA's hardware lead is real but maybe 6–12 months. The software lead is ~5 years and growing. CUDA is a stack, every layer optimized over 18 years and assumed by every paper, model, tutorial, and open-source library. To displace it, AMD needs not just competitive hardware but competitive software at every layer of the stack, plus migration tooling, plus enough early customers for ecosystem effects.
The one place ROCm is winning is at the absolute top end. AMD's MI300X has more memory than H100 (192GB vs 80GB), so for the very largest models — 405B-parameter Llama, GPT-class — it can be the right hardware regardless of software friction. This is the wedge AMD is exploiting. For everyone else, CUDA still wins by default.
Training vs inference · more different than they look
Training
~30% of AI silicon spend
Process whole batches. Compute-bound. Long-running (weeks). Communication-heavy: gradients averaged across thousands of GPUs. Needs FP16/BF16 minimum, often FP32 for accumulation. Huge clusters, max NVLink, max HBM.
Inference
~70% of AI silicon spend (and growing)
One or few queries at a time. Memory-bound: weights stream in for every token. Low latency: tokens out in milliseconds. Mostly local: a single chip or few. Tolerates FP8/FP4/INT4. Where startup wedges exist.
This divergence is why specialized inference chips can beat general training GPUs. Groq hits 500 tokens/sec because their architecture is only good at inference — they can't train. NVIDIA's H100 is good at both but optimal for neither. As inference workloads grow (and they're growing much faster than training right now), the inference-specialist niche grows with them.
Taalas · When the model becomes the chip · 17,000 tokens/sec on Llama 3.1 8B, because the weights are literally the silicon
Every chip we've discussed shares an assumption: model weights live in memory, and you spend most of your power moving them to compute units. Taalas threw the assumption away. Their HC1 chip, taped out at TSMC 6nm and unveiled February 2026, hardcodes the entire weights of Llama 3.1 8B directly into the silicon — etched into ROM at the mask layer. The model is the chip.
The architecture is what they call Mask ROM Recall Fabric — every weight is etched into ROM cells distributed across the chip, paired with SRAM cells that handle the per-prompt state (the KV cache, fine-tuned LoRA adapters). Computation happens inside the memory, not in a separate compute block. This is "compute-in-memory" taken to its logical extreme. There is no off-chip memory bus for weights, because there's nothing to fetch — the weights are already where the multiplications happen.
| METRIC | B200 GPU | TAALAS HC1 | RATIO |
|---|---|---|---|
| Tokens/sec/user | ~50 | ~17,000 | 340× |
| Hardware cost | baseline | ~1/20 | 20× |
| Power consumption | baseline | ~1/10 | 10× |
| Models supported | any | one (Llama 3.1 8B) | — |
| Time to retape for new model | n/a | ~60 days | — |
The catch: the chip can only run the specific model it was etched for. To support a new model — or a meaningfully fine-tuned version of the same one — Taalas has to tape out a new chip. They claim only two mask layers change per model, which keeps the redesign cost down, but it's still 60 days from "weights ready" to "boxes shipping." This is a fundamental conflict with the rapid iteration cycle of frontier AI models, and the reason this approach was considered impractical until very recently.
It also has architectural limits. Long context bottlenecks the SRAM-resident KV cache — the input sequence still has to be processed through softmax-attention layers whose intermediate state grows with sequence length. Taalas works brilliantly for short-context, single-user inference (chatbots, voice assistants, edge AI) and less well for code generation or document analysis where context lengths run into hundreds of thousands of tokens.
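The long-context problem is linear arithmetic. KV-cache size is 2 (keys and values) × layers × KV heads × head dimension × bytes per value × tokens. A sketch using Llama 3.1 8B's published shape (32 layers, 8 KV heads with grouped-query attention, head dimension 128) and assuming FP16 cache entries:

```python
def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """KV cache size: one key and one value vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Defaults follow Llama 3.1 8B's shape; 2 bytes per value assumes an FP16 cache.
per_token = kv_cache_bytes(1)
print(f"{per_token / 1024:.0f} KiB per token")
for context in (1_000, 8_000, 128_000):
    print(f"{context:>7,} tokens -> {kv_cache_bytes(context) / 2**30:6.2f} GiB")
```

At 128 KiB per token, a 128K-token context needs about 16 GiB of cache, far beyond the tens to hundreds of megabytes of SRAM that typically fit on a single die.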
The bet they're making: not every model needs to change. Llama 3.1 8B-class workloads are a stable target. Customer support, voice assistants, simple agents, on-device language tasks — these ship trained models that get used for years. If you're running one of those at scale, paying NVIDIA's margin for general-purpose compute is the wrong tradeoff. A chip that can only run your model but does it 340× faster at 1/20 the cost is the right one.
The deeper meaning is what Taalas points at as a category. If model-specific silicon becomes economical, the chip industry restructures. A new layer emerges — call it an "AI foundry" — that takes weights as input and ships hardware as output. Hyperscalers stop building city-sized data centers because ordinary 12–15 kW racks are enough. Edge devices run frontier models locally with no cloud dependency. NVIDIA's $3T market cap rests on the assumption that flexibility is what people will pay for. Taalas is the first sharp argument that, for the workloads that dominate inference dollars, flexibility is the wrong product.
Whether this becomes a category-defining shift or a fascinating niche depends on three things: how fast LLM weights stabilize as a "release version" (like CPU instruction sets did), whether the 60-day tape-out cycle compresses, and whether the workloads where Taalas excels (short-context, high-throughput, single-purpose) really are most of inference revenue. The next 18 months will tell.
The competitive landscape
NVIDIA
~80% training · ~65% inferenceThe default. CUDA moat. H100/B200/Rubin roadmap. NVLink ecosystem. Owns InfiniBand via Mellanox. Trillion-dollar valuation rests on this lead persisting another generation.
AMD
distant #2
MI300X / MI350X. More HBM than H100 (192GB on MI300X vs. 80GB), a real wedge for the largest-model deployments. ROCm is catching up but still trails CUDA. Top customers: Microsoft, Meta.
Google TPU
internal scale
Most mature non-NVIDIA AI chip. Used to train Gemini. v7 Ironwood shipping. Designed by Google + Broadcom. Mostly internal, slowly opening to GCP customers.
Hyperscaler ASICs
growing fast
AWS Trainium 3 · Meta MTIA v3 · Microsoft Maia 2 · all designed in partnership with Broadcom or Marvell. Self-supplying ~20% of internal compute now, projected ~40% by 2027.
Cerebras
wafer-scale
Whole wafer = one chip. WSE-3 has 900,000 cores, 23 kW. Surprising strength in inference (weights stay entirely on-chip). Strong in training niches; harder to scale economically.
Groq
deterministic
LPU = Language Processing Unit. Pure SRAM, deterministic latency, no caches. ~500 tok/sec on Llama models when GPUs do 50. Inference-only.
Tenstorrent
Jim Keller
RISC-V + programmable matrix units. Open-source software stack. Long bet on commoditizing AI silicon. Targeting both training and inference; much smaller than the leaders.
Taalas
Toronto · NEW
HC1 chip — model literally etched into silicon. 17,000 tok/sec on Llama 3.1 8B. Raised $169M. The most architecturally radical bet currently in production silicon.
Etched · SambaNova · others
specialists
Etched: transformer-only ASIC. SambaNova: reconfigurable dataflow. Various other niche plays — neuromorphic, photonic, in-memory analog. Mostly speculative; one or two might win big.
Reading an AI chip spec sheet · the framework
You now have the framework to read any modern AI chip announcement. When a spec sheet says "5 PFLOPS FP8, 192 GB HBM3E, 8 TB/s memory bandwidth, 1.8 TB/s NVLink, 1,000W", here's what each number actually means:
| SPEC | WHAT IT TELLS YOU |
|---|---|
| 5 PFLOPS FP8 | ~2.5 PFLOPS FP16 · always check the precision |
| 192 GB HBM3E | holds a ~384B-parameter model in FP4, ~192B in FP8, or ~96B in FP16 · memory caps model size |
| 8 TB/s memory bandwidth | can stream a 192GB model ~42×/sec · token rate is bounded by this |
| 1.8 TB/s NVLink | can sync gradients with other GPUs · determines max useful cluster size |
| 1,000 W | needs liquid cooling · existing air-cooled DCs can't take it without retrofit |
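The capacity and bandwidth rows come down to simple division. Here is a minimal sketch that assumes weights dominate memory use. It ignores the KV cache, activations, and runtime overhead, so real ceilings are lower.

```python
# Back-of-the-envelope reading of the example spec sheet.
# Ignores KV cache, activations, and runtime overhead, so real limits are lower.
HBM_BYTES = 192e9  # 192 GB of HBM3E
HBM_BW = 8e12      # 8 TB/s memory bandwidth

BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0, "FP16": 2.0}


def max_params(precision: str) -> float:
    """Largest parameter count whose weights fit in HBM at this precision."""
    return HBM_BYTES / BYTES_PER_PARAM[precision]


def full_sweeps_per_sec(model_bytes: float) -> float:
    """How many times per second the whole model can be streamed from HBM.

    At batch size 1, each generated token reads every weight once, so this
    is also a rough upper bound on single-stream tokens per second.
    """
    return HBM_BW / model_bytes


if __name__ == "__main__":
    for p in BYTES_PER_PARAM:
        print(f"{p}: up to ~{max_params(p) / 1e9:.0f}B parameters")
    print(f"192 GB model: ~{full_sweeps_per_sec(HBM_BYTES):.0f} sweeps/sec")
```

With these inputs the model-size ceilings come out to 384B parameters in FP4, 192B in FP8, and 96B in FP16. The full-model sweep rate comes out to about 42 per second.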
The interplay between these numbers IS the chip. Great FLOPS but weak memory bandwidth = bad at inference. Great memory bandwidth but weak interconnect = bad at large-model training. Reading the balance tells you what the chip is for.
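A roofline check is one way to make "the balance" concrete. The sketch below reuses the example spec sheet's 5 PFLOPS and 8 TB/s. The ~2 FLOPs-per-byte intensity for batch-1 decoding of an FP8 model is a simplifying assumption: one multiply-add per weight byte read.

```python
# Minimal roofline check: is a workload compute-bound or memory-bound?
PEAK_FLOPS = 5e15  # 5 PFLOPS (FP8), from the example spec sheet
PEAK_BW = 8e12     # 8 TB/s HBM bandwidth


def ridge_point(peak_flops: float, peak_bw: float) -> float:
    """FLOPs per byte needed before compute, not memory, becomes the limit."""
    return peak_flops / peak_bw


def attainable_flops(intensity: float) -> float:
    """Roofline model: min(compute roof, bandwidth x arithmetic intensity)."""
    return min(PEAK_FLOPS, PEAK_BW * intensity)


if __name__ == "__main__":
    # Assumption: batch-1 decoding of an FP8 model does ~2 FLOPs (one
    # multiply-add) per weight byte read from HBM.
    decode_intensity = 2.0
    print(f"ridge point: {ridge_point(PEAK_FLOPS, PEAK_BW):.0f} FLOPs/byte")
    util = attainable_flops(decode_intensity) / PEAK_FLOPS
    print(f"batch-1 decode can use ~{util:.1%} of peak FLOPS")
```

The ridge point works out to 625 FLOPs per byte. Batch-1 decoding sits near 2, so it can use only about 0.3% of peak compute, a textbook memory-bound workload. Batching raises intensity because each weight read is reused across every request in the batch, which is why inference servers batch aggressively.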
Bringing it all together
Every modern AI cluster is the layered output of everything in this guide
What you're actually buying when you buy AI compute
A training run for a frontier LLM uses every layer covered in this field guide:
| LAYER | WHAT'S USED |
|---|---|
| Lithography | TSMC N4P or N3 today, N2 next year |
| Architecture | FinFET today; GAAFET in 2 years |
| Packaging | CoWoS-L bonding compute dies and HBM |
| Memory | 8–12 stacks of HBM3E per chip · ~$5K each |
| Interconnect | NVLink 5 within nodes · 800G InfiniBand between racks |
| Power | ~120–140 kW per rack today (GB200 NVL72), heading toward 1 MW · 48V vertical · integrated VRMs incoming |
| Cooling | Liquid cold plate or full immersion |
| Software | PyTorch → CUDA → SMs → Tensor Cores → HBM |
Every layer is contested, every layer is expensive, every layer matters. When you read that "Microsoft committed $X billion to data center buildout" or "Anthropic ordered Y trillion tokens of compute," what's actually being bought is access to this stack — depreciated over 4–5 years, sold by the hour.
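The "sold by the hour" part is amortization arithmetic. Here is a minimal sketch with hypothetical inputs: a $40,000 accelerator at 70% utilization. It deliberately ignores power, cooling, networking, and financing, all of which push real prices higher.

```python
# Hypothetical break-even rental price for one accelerator.
HOURS_PER_YEAR = 24 * 365


def breakeven_per_hour(capex_usd: float, years: float, utilization: float) -> float:
    """Spread purchase cost over the hours actually sold during its service life.

    Ignores power, cooling, networking, staff, and financing costs.
    """
    sold_hours = years * HOURS_PER_YEAR * utilization
    return capex_usd / sold_hours


if __name__ == "__main__":
    for years in (4, 5):  # the 4-5 year depreciation window from the text
        price = breakeven_per_hour(capex_usd=40_000, years=years, utilization=0.7)
        print(f"{years}-year life: ~${price:.2f}/hour just to recover capex")
```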
The strategic chokepoints to watch: NVIDIA's CUDA moat erosion (when does ROCm or PyTorch-on-TPU become genuinely competitive?); custom ASIC scale (Google TPU and Trainium getting cheap enough to self-supply 30–50% of internal compute by 2027); inference-only specialists like Groq, Cerebras, and Taalas breaking out from niche to mainstream; and the precision war (FP4 today, FP2 maybe? Analog in-memory? Lower precision is the deepest cost lever).
If you've made it through every section of this guide, you can now read any chip news in 2026 with full context. The transistors, the litho, the package, the memory, the interconnect, the power, the AI stack — they're not separate stories. They're one story about moving electrons through silicon faster than physics seems to allow.
Where chips are made — and who controls the making
Every chip that runs the modern world is the output of a supply chain so concentrated that a single coastline could halt global manufacturing for years. The physics, the architectures, the packaging, the memory, the interconnect, the power, the AI stack — every layer is contested. But none of those technical details determine the shape of the industry. The shape is determined by where things are made, and who controls the making.
Taiwan · the silicon shield
Where leading-edge chips actually come from
~90% of the world's most advanced silicon · one island · mostly one company
The phrase "leading-edge" matters. China makes huge volumes of automotive chips at 28nm; the US has Intel and GlobalFoundries running older nodes; Korea has Samsung. But for the actual frontier (the B200 GPUs that train models like Llama, the processors that power iPhones) there is essentially one place. And within that place, essentially one company.
This is the result of three decades of compounding advantages. TSMC pioneered the pure-play foundry model — they only make chips for other companies, never compete with their customers. That gave them economies of scale beyond what any IDM could match. Their Hsinchu fab cluster has thirty years of accumulated process knowledge: cleanroom layouts, thin-film recipes, trained workforce, supplier relationships. None of this is replicable in the timeframe of a five-year geopolitical crisis.
The phrase "Silicon Shield" describes the strategic reality: Taiwan's chip dominance is, paradoxically, what protects it. A Chinese invasion or blockade would catastrophically disrupt global semiconductor supply, pushing the world economy into recession and triggering a response from the US, Japan, and Europe so severe that even Beijing's hawks pause. The shield works only as long as Taiwan remains the unique source. Once it isn't, the shield is gone. Almost everything in chip geopolitics flows from this calculation.
The chokepoint web · no country can do it alone
Every leading-edge chip touches at least 6 countries
And no country has all the pieces
No country has a complete supply chain. The US has design tools and a few fabs but no leading-edge chemicals or lithography. China has the chemicals capacity and lots of mature-node fabs but no leading-edge lithography or design tools. Europe has ASML but no leading-edge fabs and weak design. Japan has the materials and tools but exited leading-edge logic fabs years ago. Taiwan has the fabs and packaging, but everything else flows in.
This web is a feature, not a bug; at least, it was. Over the postwar decades the chip industry spread out naturally, as comparative advantage and free trade drove specialization. The result is a system that works perfectly when everyone cooperates and breaks catastrophically when they don't. Every export control, every tariff, every supply chain "decoupling" is a force trying to convert what was a feature into a fault line.
The exploding cost of a fab
Why only 3 companies can play at the leading edge
$200M in 1990 → $40B in 2026 · 200× growth in 36 years
EUV lithography systems alone are ~$200M each, and a leading-edge fab needs 10–20 of them. The cleanroom standards for 2nm are vastly stricter than for 28nm — class-1 air, vibration isolation requiring foundations decoupled from the surrounding earth. The mask shop alone might cost $1B. Tens of billions before you produce a single wafer.
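Those headline numbers imply a steady compounding rate, and the EUV line item alone is sizable. Here is a quick check that uses only the figures above. The 36-year span runs from 1990 to 2026.

```python
# Implied growth rate behind "$200M in 1990 -> $40B in 2026", plus EUV spend.


def cagr(start: float, end: float, years: float) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    years = 2026 - 1990
    rate = cagr(200e6, 40e9, years)
    print(f"{40e9 / 200e6:.0f}x over {years} years = {rate:.1%} per year")

    euv_price = 200e6  # ~$200M per EUV system, from the text
    for n_tools in (10, 20):
        print(f"{n_tools} EUV tools: ~${n_tools * euv_price / 1e9:.0f}B")
```

A 200× rise over 36 years works out to roughly 16% compound growth per year, sustained for more than three decades. The 10–20 EUV tools alone come to $2–4B.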
The economic consequence is brutal: only TSMC, Samsung, and Intel can afford to play at the leading edge. Everyone else either licenses, partners with, or gives up. Even SMIC, with massive Chinese state backing, isn't realistically catching up — they're trying to do it without EUV and getting 20–40% yields. There is no fourth competitor on the horizon.
This is also why the TSMC Arizona fab matters and why it's been so hard. Construction costs are 4–5× Taiwan's. Workforce shortages, regulatory friction, supply chain mismatches: all the reasons the chip industry concentrated in Taiwan in the first place are reasons it can't easily be unwound. The first Arizona fab reached N4 volume production in late 2024 after a publicly announced delay. The second, originally targeted for 2026, has slipped to 2027–2028.
The chip war timeline · 2018 → 2026
Eight years of escalating restrictions
And then a sudden 2026 reversal
The most consequential event was October 7, 2022: the Biden administration's BIS rules. Leading-edge chips and chipmaking equipment were blocked at the border. ASML had already been unable to ship EUV machines to China since 2019, and follow-on Dutch and US rules extended restrictions to some DUV systems. NVIDIA had to design the A800 and H800, then the H20, China-specific cut-down GPUs built to stay under the export thresholds. China responded with Big Fund III, accelerated SMIC investments, and a focused push on Huawei's Ascend chips.
Then came the 2025–26 reversal. The second Trump administration shifted the playbook. Tariffs replaced subsidies as the preferred mechanism, and a 25% tariff now applies to advanced AI chips like the H200. The US government took a ~10% stake in Intel, a striking departure from American policy norms. NVIDIA and AMD now pay 15% of their China chip revenue to the US Treasury. Most consequential was the BIS rule change that moved the H200 from "presumption of denial" to "case-by-case review" for export to China. That is a partial reopening of a market the previous administration had tried to close.
Internal critics, including former first-term Trump officials, argue this hands China a multi-year head start. Defenders argue tariffs and revenue capture do the same job as bans without forfeiting US share. Either way, the strategic posture has shifted from denial to transactional.
The CHIPS Acts arms race
Every major economy is pouring money in
~$500B+ committed publicly · much more in state-directed flows
The early evidence is mixed. The US CHIPS Act funded TSMC's Arizona expansion, Intel's Ohio fab, Samsung Texas, and various Micron projects. Construction is happening, but slower and more expensive than promised. EU progress is even slower: Intel's planned Magdeburg fab in Germany was paused in 2024 and cancelled in 2025. The EU's goal of a 20% global production share by 2030 looks unreachable. China's mature-node capacity is growing fast (worrying Western producers about 28nm oversupply), but it's not closing the leading-edge gap in any near-term timeframe.
The honest read: every major economy is hedging. Each country wants to make sure it isn't left holding nothing if the global supply chain fractures. The total spending is duplicative — the world doesn't actually need five TSMCs — but no one wants to be the country without a backup.
The Taiwan scenarios · the elephant
Four futures for Taiwan
And what each means for global compute
| SCENARIO | WHAT HAPPENS | EST. ECONOMIC COST | LIKELIHOOD |
|---|---|---|---|
| Status quo | China keeps building military but doesn't act. Production gradually de-risks via Arizona, Japan, Germany. | baseline | most likely |
| Blockade | China interferes with shipping. TSMC fabs operate but exports constrained. Coalition response. | $1–3T globally | non-trivial |
| Invasion | Even without direct strikes, fabs inoperable 6–24mo due to chemical / tool supply cutoff. | $5–10T+ | low but planned for |
| Diplomatic resolution | Some new accommodation. Economic ties; Taiwan retains autonomy. TSMC stays. | positive | officially preferred |
The real-world planning behind this is dense. TSMC has reportedly preset rapid-shutdown procedures for its fabs in case of sudden conflict — a fab made temporarily inoperable is less valuable as a captured asset. The US has weighed (and per some reports, planned for) options ranging from sanctions to evacuation of key personnel to direct intervention. Japan has its own military planning given proximity. The diplomatic dance over the Taiwan question is the most consequential geopolitical issue of the 2020s, and chips are the reason.
What 2026 actually looks like
The strategic picture · early 2026
How the layers stack up after eight years of contest
| ACTOR | POSITION · 2026 |
|---|---|
| Taiwan / TSMC | Indispensable · 90%+ leading-edge AI · 2nm shipping · A16 backside power in H2 2026 |
| USA | Partial nationalization · Intel stake · NVIDIA/AMD revenue share · tariff regime |
| China | Locked at ~5nm via DUV · adequate for many domestic uses · building everything else around it |
| S. Korea | Sustaining Samsung Foundry · dominating HBM via SK Hynix |
| Japan | Investing heavily in Rapidus · hosting TSMC Kumamoto · materials still dominant |
| Netherlands | Mostly an ASML story — one company keeps the country a critical actor |
| EU more broadly | Ambitions hampered · Intel Magdeburg cancelled · 20% goal unlikely |
| India | Emerging as a packaging hub · Tata–PSMC fab project · several domestic fabs in early stages |
The decade-long bet of every major economy is that they won't be left holding nothing if Taiwan is somehow lost. Every CHIPS Act, every export control, every fab subsidy, every diplomatic visit to Taipei is part of that bet. Whether it pays off is a question with no good answer. The technology — the lithography, the packaging, the materials, the trained workforce — accumulates slowly. Geopolitical events can move fast. The mismatch is the entire problem.
If you've followed this far, you can read the chip news of 2026 with full context. The next time you see a headline about a tariff, an export control, a new fab, or a diplomatic move on Taiwan, you'll be able to place it in the technical reality that makes it matter — and the strategic reality that determines what it means.
Where most chips actually live — and why it matters
For all the attention on B200 GPUs and 2nm fabs, those represent maybe 30% of semiconductor revenue and well under 10% of unit volume. The rest — the chip in your car's engine controller, the sensor in your phone's camera, the power IC in your laptop's USB-C, the accelerometer in your earbuds — runs on processes that are 5–30 years old. This is the trailing edge, and it's where most chips actually live.
The iceberg
Leading-edge AI is the tip · trailing edge is the underwater 80%
By units shipped, by volume, by industries served
A modern internal combustion car contains 1,500–3,000 chips. An EV has 3,000–5,000. A new iPhone has roughly 30. A hospital MRI has thousands. A weather satellite has thousands. The smart electric meter on your house has dozens. None of these are leading-edge.
This is also why the post-COVID chip shortage was such a shock — it wasn't about AI chips, it was about $1–5 microcontrollers used in car door modules. By 2022, factories worldwide were idle waiting for $0.50 chips. The trailing edge is invisible until it stops working.
Power semiconductors · SiC and GaN
Why every EV and datacenter is switching from silicon
Wide-bandgap semiconductors handle voltage and frequency silicon can't
The Tesla Model 3 was the watershed: in 2017, Tesla switched its main inverter to a SiC module from STMicroelectronics. That single product validated SiC for mass-market automotive and triggered a billion-dollar investment wave from Wolfspeed, Infineon, ON Semi, ROHM, and others. Today, leading-edge EVs almost universally use SiC.
GaN is following a similar arc on a different timescale. GaN excels at high-frequency switching, which means smaller transformers and capacitors. The 100W laptop charger that's a third the weight of last decade's? GaN. Datacenter 48V → 0.7V conversion stages? Increasingly GaN. At hyperscaler scale the efficiency gains are hundreds of millions of dollars per year.
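Why a wide bandgap helps with voltage follows from a standard first-order relation for unipolar power devices, often called Baliga's limit. The minimum specific on-resistance of the drift region scales as 4·BV²/(ε·μ·E_c³). A wider bandgap raises the critical field E_c roughly tenfold, so the cubed term dominates. The material constants below are approximate textbook values and should be read as assumptions. Real devices sit well above this ideal limit.

```python
# First-order ("Baliga") limit for a unipolar power device's drift region:
#   R_on,sp = 4 * BV**2 / (eps * mu * Ec**3)      [ohm * cm^2]
# Constants are approximate textbook values, used here as assumptions.
EPS0 = 8.854e-14  # vacuum permittivity, F/cm

# name: (relative permittivity, electron mobility cm^2/(V*s), critical field V/cm)
MATERIALS = {
    "Si": (11.7, 1350.0, 3.0e5),
    "4H-SiC": (9.7, 900.0, 3.0e6),
    "GaN": (9.0, 1250.0, 3.3e6),
}


def ideal_r_on_sp(material: str, breakdown_v: float) -> float:
    """Lowest specific on-resistance (ohm*cm^2) that can block `breakdown_v`."""
    eps_r, mobility, e_crit = MATERIALS[material]
    return 4 * breakdown_v**2 / (eps_r * EPS0 * mobility * e_crit**3)


if __name__ == "__main__":
    si = ideal_r_on_sp("Si", 1200)
    for name in MATERIALS:
        r = ideal_r_on_sp(name, 1200)
        print(f"{name:>6} at 1200 V: {r * 1e3:9.3f} mOhm*cm^2, Si/this = {si / r:6.0f}")
```

SiC and GaN have lower mobility or permittivity than silicon in some respects, but the E_c³ term still leaves their ideal on-resistance at 1200 V hundreds of times below silicon's. That is why they can block inverter-class voltages with a thin, low-loss drift layer.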
The geography is shifting. China is building enormous SiC and GaN capacity — both substrates and devices. Wolfspeed (the SiC substrate leader) is in financial trouble after over-investing during the EV bubble. Chinese SiC suppliers now produce ~30% of world wafer capacity. If you wanted to identify a part of the chip industry about to be dominated by Chinese manufacturers, power semiconductors is a good candidate.
CMOS image sensors · photons to bits
The other 3D stacking story
Sony shipped HBM-style die stacking in camera sensors years before HBM became an AI staple
Sony makes ~50% of all smartphone image sensors. Samsung is second at ~20%. OmniVision third. Sony's lead is real — they pioneered the stacked CMOS sensor, which is structurally what HBM is for memory: pixels on top, logic on a separate die underneath, TSVs between. Newer sensors stack three layers — pixels / DRAM / logic — for global-shutter capture and on-chip object recognition before data even leaves the sensor.
There is no Moore's Law for image sensors. Just continuous incremental refinement of pixel size, quantum efficiency, color filter dyes, microlenses, on-chip processing — accumulating over decades. A 2026 iPhone sensor isn't 100× better than a 2010 sensor. It's maybe 5–10× better, and that's incredibly hard-won.
MEMS · chips that feel the world
Inside an accelerometer
How a phone knows which way is down
Most MEMS fabs are old — 200mm wafers, processes from the 1990s, but with extremely refined recipes. The barrier to entry isn't the equipment — it's the recipe. You can't just buy MEMS tools and start making accelerometers.
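The physics inside an accelerometer is simple even if the recipe is not. A MEMS accelerometer senses how far a tiny spring-suspended proof mass deflects (x = m·a/k). At rest the only acceleration it sees is gravity, so the direction of the measured vector tells the phone which way is down. Here is a minimal sketch that assumes a conventional phone axis layout (z out of the screen) and readings in m/s². The functions and numbers are illustrative.

```python
import math

# At rest, a 3-axis accelerometer measures only gravity, so the direction of
# the measured vector tells the device which way is "down". The axis
# convention (z out of the screen, readings in m/s^2) is an assumption.


def tilt_degrees(ax: float, ay: float, az: float) -> tuple[float, float]:
    """Return (pitch, roll) in degrees from a static accelerometer reading."""
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll


def proof_mass_displacement(mass_kg: float, spring_n_per_m: float, accel: float) -> float:
    """Hooke's law for the sensing element: k*x = m*a, so x = m*a/k (meters)."""
    return mass_kg * accel / spring_n_per_m


if __name__ == "__main__":
    print(tilt_degrees(0.0, 0.0, 9.81))  # lying flat on a table
    print(tilt_degrees(0.0, 9.81, 0.0))  # rolled onto its side
```

The tilt formula only holds while the device is still. Once it moves, motion and gravity mix in the same reading, which is one reason phones fuse accelerometer and gyroscope data.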
Bosch
automotive · GERMANY
The absolute leader in automotive MEMS. Most airbag accelerometers, tire pressure sensors, and inertial units in cars worldwide.
STMicroelectronics
phones · FR / IT
Massive smartphone MEMS portfolio. Apple's accelerometer/gyro supplier across iPhone generations.
TDK-InvenSense
phones · JAPAN
Major gyroscope and motion sensor supplier. Bought by TDK in 2017. Alternative supplier into many phones and wearables.
Knowles
microphones · USA
MEMS microphones in nearly every smartphone, AirPod, and laptop. Tiny chips that hear.
Cars · the chip-volume story
From 50 chips to 5,000 in three decades
And almost none of them are leading-edge
Most automotive chips come from specialists: Infineon (largest), NXP, STMicro, Renesas, ON Semi, TI. They run their own fabs at mature nodes with extreme reliability requirements — a car chip has to work for 15 years through temperature swings, vibration, and corrosion. Qualification cycles are years long. Margins are stable but unspectacular. The business is consistent and durable in a way leading-edge logic isn't.
The China overcapacity wave
The mature-node tsunami
What happens when subsidized capacity floods the market
2024–2027
By some estimates, China will add more 28nm-and-above capacity in 2024–2027 than the rest of the world combined. The leading-edge sanctions don't apply — this isn't 5nm. It's exactly the nodes that build automotive chips, power management, image sensors, communications. The CHIPS Acts everywhere are about leading edge. The actual capacity wave is at trailing edge.
When China's mature-node capacity comes online, prices of those chips could fall sharply. Western automotive chip makers — Infineon in Germany, NXP in the Netherlands, ST in France/Italy, TI in the US — have to figure out how to compete with subsidized Chinese capacity that doesn't need to make a normal return. This is a slow-motion crisis that gets less press than AI chip wars but probably matters more for the long-term health of US and European chip industries.
The full node ladder · who plays at each level
Each node is its own industry
The leading edge has 1 dominant player. The 180nm world has thirty. Both have moats — different in kind.
One of the most counterintuitive things about chips: mature doesn't mean commoditized. A 180nm BCD power process at TI or Infineon is just as defensible as TSMC's 2nm — the moat is just a different shape. At leading edge, the moat is capex and process know-how that takes 20 years to accumulate. At mature, the moat is customer qualifications that take 5 years to win, decades of accumulated design libraries, and product lifecycles measured in decades. Nobody is trying to disrupt the 180nm BCD business because the customers don't want to be disrupted.
Read this top-to-bottom and a structure emerges. The leading edge is a monopoly: TSMC has 90%+ at 3nm, and the moat is thirty years of process accumulation plus capex no one else can match. 5nm and 7nm are an oligopoly: TSMC + Samsung + (at 7nm) SMIC and Intel. 14nm through 22nm becomes contested: four to six foundries, healthy margins, real competition. 40/65nm is diverse: eight or more foundries each holding their share. 90/130/180nm flips the model: much of it is IDM territory, where companies like Infineon, TI, and NXP run their own fabs for their own products rather than selling foundry capacity to outsiders.
The 180nm moat is invisible from the outside. An IDM like TI has tens of thousands of part numbers, each qualified into customer products that have 20-year lifecycles. The fab process recipes encode 30+ years of accumulated tweaks. The customer engineers at Bosch or John Deere have spent careers learning that specific TI part's behavior at temperature extremes. Even if a competitor offered the same chip at half the price, the qualification cost on the customer side would dwarf the savings.
Specialty processes · entirely orthogonal to the node ladder
The node ladder above is for CMOS logic — the same kind of transistors used in CPUs, GPUs, and MCUs, just at different feature sizes. But large parts of the chip industry don't use CMOS logic at all, or use it as a layer alongside something else. These are specialty processes, and they have their own player ecosystems entirely separate from the foundry world.
Power · SiC
~5 players globally
Wide-bandgap silicon carbide for high-voltage / high-temperature switching. EV inverters · solar · datacenter. Different substrate, different fab, different physics.
Power · GaN
emerging field
Gallium nitride for high-frequency switching. Phone chargers · data center 48V · 5G base stations. Eating silicon's market in fast charging.
CMOS Image Sensors
SONY ~50%
Photodiode arrays + dedicated CIS process. Different from logic — needs deep photodiodes, color filters, microlenses. Stacked 2–3 dies via TSVs.
MEMS
recipes > equipment
Mechanical structures on silicon. DRIE etching, sacrificial layers, hermetic seals. Each company has refined their own process; you can't just buy MEMS tools.
DRAM
3-player oligopoly
Uses its own DRAM-specific nodes (1z, 1α, 1β...). HBM is the high-margin variant. Capital-intensive, cyclical, brutal pricing dynamics.
NAND Flash
5-player race
3D NAND with 200+ stacked layers. Different physics from logic — vertical channels, charge trapping. The other big memory market.
RF · Compound semis
specialist
GaAs, GaN, SiGe for high-frequency RF. Phone front-ends, base stations, radar, satellite. Not silicon at all in many cases.
Photonics
emerging
Silicon photonics — light-on-chip for datacenter optics. The future of inter-chip communication. Built today on foundry processes from TSMC, GlobalFoundries, and Tower, plus custom processes.
Each of these is its own world, with its own dominant suppliers, its own process physics, its own customer base, and its own competitive dynamics. Sony has been the world's #1 image sensor maker for 15+ years. Bosch has been the world's #1 automotive MEMS maker for 25+ years. The Samsung-SK Hynix-Micron triopoly in DRAM has been stable for over a decade. None of these positions are being seriously challenged.
And this is what makes the chip industry so much wider than people realize. There isn't a chip industry. There's the leading-edge logic industry. The mature foundry industry. The IDM analog industry. The DRAM industry. The NAND industry. The image sensor industry. The MEMS industry. The power semi industry. The compound semi industry. The photonics industry. Each operates by different rules. Each has its own moats. Each has its own geopolitical dynamics.
When someone says "the chip industry," ask which one. The answer changes everything that follows.
Why the trailing edge matters
The silent 70%
What you miss if you only watch AI silicon
| WHY IT MATTERS | WHAT IT MEANS |
|---|---|
| Where most chips actually are | Anything with a battery or a plug has trailing-edge silicon inside |
| Where most chip-industry jobs are | Not at TSMC fabs in Taiwan — at Infineon in Germany, ST in France/Italy, TI in Texas, Renesas in Japan |
| Where physics is most diverse | Power semis, image sensors, MEMS — entirely different processes optimized for different physical phenomena |
| Where the geopolitical risks are most concrete | Chinese mature-node capacity is real, growing, and will reshape markets in ways subsidies haven't addressed |
| Where the post-COVID shock actually hit | $0.50 microcontrollers idled $30,000 cars · the 2021–22 shortage was nearly all trailing-edge |
If you only watch the AI silicon story, you're watching maybe 30% of the industry while the other 70% restructures around you. The leading edge is where the marquee money is. The trailing edge is where the actual electrons live.
A useful exercise: pick up the nearest electronic device. Mentally inventory its chips. The CPU/SoC, if any, might be leading-edge. Almost everything else — the WiFi radio, the touchscreen controller, the audio codec, the battery management IC, the various sensors, the display driver, the power regulators — comes from this section's industry. The supply chain you've now seen extends all the way down to that level.
The frontier — honest about timelines
The chip industry has spent sixty years scaling silicon CMOS. That run is reaching physical limits: gate leakage through dielectrics only about a nanometer thick, copper resistance climbing as wires narrow toward 10 nm, thermal density past 1 W/mm², and the breakdown of Dennard scaling we covered earlier. The question isn't whether successors exist; many do, in labs. The question is which ones make it out of labs, and when. This section tries to be honest about timelines: what's shipping in 2026, what's mid-decade, what's late-2020s, and what may never happen.
A horizon, not a roadmap
What's actually shipping when
Sorted by realism, not by hype
Six categories matter: photonic interconnect (already shipping), photonic compute (early commercial 2026), neuromorphic (edge AI now, mainstream by 2027), in-memory analog compute (mid-decade), quantum (advantage by ~2026, fault tolerance ~2029–30, real impact later), and 2D materials and exotic semiconductors (still labs). Each gets a section. None will replace silicon for general-purpose compute soon; they will displace it for specific workloads where they have clear physical advantages.
Photonics · interconnect first, compute later
The split everyone misses
Light has already won at moving data · whether it wins at computing data is open
Photonics is two stories, not one. Photons moving data — chip-to-chip, rack-to-rack, building-to-building — is decisively winning. Long fiber runs in datacenters have been optical for decades. What's new is optics moving inside the package itself: co-packaged optics (CPO). NVIDIA's Spectrum-X and Quantum-X switches, shipping late 2025 into 2026, integrate optical engines directly with the switch ASIC, removing the pluggable transceiver entirely. Power per bit drops ~3–4×. Photons computing data is the harder, separate story. Both use the same materials and similar fabrication, but they're not the same problem.
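The "power per bit" framing converts directly into watts: optical link power is energy per bit times aggregate bit rate. The pJ/bit values and the 102.4 Tb/s switch capacity below are hypothetical. They were chosen only to illustrate a per-bit improvement in the 3–4× range quoted above.

```python
# Optical interconnect power = energy per bit x aggregate bit rate.
# All numbers are hypothetical, chosen to illustrate a ~3.5x per-bit improvement.


def optics_power_watts(pj_per_bit: float, tbps: float) -> float:
    """Convert picojoules per bit at a throughput in terabits/s into watts."""
    joules_per_bit = pj_per_bit * 1e-12
    bits_per_second = tbps * 1e12
    return joules_per_bit * bits_per_second


if __name__ == "__main__":
    switch_tbps = 102.4                                  # hypothetical switch capacity
    pluggable_w = optics_power_watts(15.0, switch_tbps)  # assumed pluggable optics
    cpo_w = optics_power_watts(4.3, switch_tbps)         # assumed co-packaged optics
    print(f"pluggable ~{pluggable_w:.0f} W, CPO ~{cpo_w:.0f} W, "
          f"ratio ~{pluggable_w / cpo_w:.1f}x")
```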
Four material platforms compete. Silicon photonics (SiPh) leverages CMOS infrastructure and dominates transceivers. Indium phosphide (InP) is best for lasers and high-power optical sources. Silicon nitride (SiN) offers ultra-low loss, good for passive routing. Thin-film lithium niobate (TFLN) is emerging for high-speed modulators (Q.ANT's compute play). TSMC's silicon photonics offering went into volume in 2024–25; the photonics fab is now a real business.
The CPO transition matters strategically. Pluggable transceivers are a $12B+/year market. Companies like Coherent, Lumentum, InnoLight dominate transceivers. CPO threatens that ecosystem because the optics live inside the switch package now — the transceiver vendors get displaced unless they pivot. Meanwhile, switch silicon designers (Broadcom Tomahawk, NVIDIA Spectrum) absorb the optics business. This is the largest unsung supply-chain restructuring of the AI era.
Photonic compute · the harder bet
Light is great at multiplying. Less great at everything else.
EARLY COMMERCIAL
The case for computing with light: a beam of photons passing through a Mach-Zehnder interferometer can perform a multiplication essentially for free, at the speed of light, with negligible energy. Stack many of these and you have a matrix multiplier — exactly the operation AI workloads need most. Lightmatter, Q.ANT, Celestial AI, Lightelligence, and others have working photonic matrix multipliers shipping (early 2026) or in development.
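The core trick is that an interferometer applies a tunable linear transform to optical amplitudes. Below is the common textbook idealization of one Mach-Zehnder interferometer: a 50:50 coupler, a phase shift θ on one arm, then a second 50:50 coupler. It assumes lossless, perfectly balanced couplers, and it is not a model of any specific vendor's device.

```python
import cmath
import math

# Idealized Mach-Zehnder interferometer (MZI): a 50:50 coupler, a phase shift
# `theta` on the upper arm, then a second 50:50 coupler. Lossless, perfectly
# balanced couplers are assumed; real devices add loss, drift, and crosstalk.

Matrix = list[list[complex]]


def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mzi(theta: float) -> Matrix:
    """Transfer matrix of the MZI: coupler x phase shifter x coupler."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]
    phase = [[cmath.exp(1j * theta), 0], [0, 1]]
    return matmul2(coupler, matmul2(phase, coupler))


def propagate(m: Matrix, field: list[complex]) -> list[complex]:
    """Apply the device to two input optical amplitudes (a 2x2 matrix-vector product)."""
    return [m[0][0] * field[0] + m[0][1] * field[1],
            m[1][0] * field[0] + m[1][1] * field[1]]


if __name__ == "__main__":
    # Light enters the top port only; theta steers power between the outputs.
    for theta in (0.0, math.pi / 2, math.pi):
        top, bottom = (abs(x) ** 2 for x in propagate(mzi(theta), [1, 0]))
        print(f"theta = {theta:.2f} rad -> top {top:.2f}, bottom {bottom:.2f}")
```

With this convention the top output carries sin²(θ/2) of the power and the bottom carries cos²(θ/2), so a single phase setting tunes the split continuously. The "multiplication" is the propagation itself. Meshes of such MZIs can be arranged to implement larger unitary matrices.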
The catches:
1. Precision. Photonic compute is fundamentally analog. You can't easily get more than 4-8 bits of precision out of an interference pattern. That's fine for inference (FP4/INT8 territory), bad for training (FP16+ usually needed). Most photonic compute startups are inference-only as a result.
2. Nonlinearity. Neural networks need nonlinear activations (ReLU, GeLU). Light is naturally linear. Most photonic accelerators do the linear ops in light, then convert to electrical for the nonlinearity, then back. Each conversion costs energy and latency. Pure-photonic compute would need on-chip optical nonlinearity, which remains a research problem.
3. Memory. Light is fundamentally a flow, not a store. There is no "photonic RAM." Weights still have to be loaded from DRAM via electrical interfaces, then converted to phase-shift settings, then held there. The von Neumann bottleneck doesn't disappear — it shifts.
Despite all this, photonic compute is real and shipping. Q.ANT's NPU launched H1 2026 in collaboration with the Jülich Supercomputing Centre, using thin-film lithium niobate. Lightmatter has working systems. Celestial AI is building photonic fabric for AI clusters. The bet isn't that photonics replaces GPUs — it's that for some specific workloads (transformer inference, optical neural networks for sensor processing) it offers 10-100× efficiency gains while being just barely good enough.
Neuromorphic · brain-inspired computing
Loihi 3 and NorthPole · the brain-shaped bet
1,000× efficiency on the right workloads · janky software · real shipments
SHIPPING JAN 2026
The brain runs on roughly 20 watts. A B200 GPU running similar tasks (image recognition, real-time inference) runs on 1,200 W. That factor of 60× isn't because brains have better hardware; it's because they compute fundamentally differently. Spiking neural networks only fire when there's information to communicate (sparse, event-driven), and memory and compute are colocated (no von Neumann bottleneck).
Neuromorphic chips try to capture both properties on silicon. Intel Loihi 3, commercially released in January 2026, has digital implementations of spiking neurons with on-chip learning. IBM NorthPole, in production for 2026, takes a different approach: co-locate memory directly with compute units and eliminate the off-chip DRAM bus, achieving up to 25× the energy efficiency of comparable 12nm GPUs on image recognition, per IBM's benchmarks. BrainChip Akida is the commercial edge leader, shipping in real products since 2023.
The catch is software. Loihi and NorthPole don't run PyTorch out of the box. Spiking neural networks require training and deployment in spike encodings — temporal patterns of binary events rather than dense floating-point activations. Tools have improved (Intel's Lava framework, IBM's NorthPole SDK) but it's a different mental model. Most ML engineers have never written a spiking model.
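For readers who have never written a spiking model, the basic unit is small. Below is a leaky integrate-and-fire neuron, the simplest widely used spiking model. The leak and threshold values are illustrative assumptions, not settings for Loihi, NorthPole, or any SDK.

```python
# Leaky integrate-and-fire (LIF) neuron: the simplest common spiking model.
# Parameters are illustrative assumptions, not Loihi or NorthPole settings.


def lif_spikes(inputs: list[float], leak: float = 0.9, threshold: float = 1.0) -> list[int]:
    """Return a 0/1 spike train for a sequence of input currents.

    Each step the membrane potential decays by `leak`, integrates the input,
    and emits a spike (then resets to 0) when it crosses `threshold`.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = leak * v + current
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    quiet = [0.0] * 10   # no input: no spikes, so no downstream work
    steady = [0.3] * 10  # sustained input: periodic spikes
    print(lif_spikes(quiet))
    print(lif_spikes(steady))
```

With zero input the neuron never fires, so downstream neurons do no work at all. A steady 0.3 input makes it fire every fourth step. That sparsity is where the event-driven energy savings come from.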
Where neuromorphic wins clearly: edge AI with hard latency or power constraints. AR glasses where battery life is everything. Robotics that need sub-millisecond reaction. Always-on sensor processing where the device sleeps until something interesting happens. Mercedes and BMW are reportedly integrating neuromorphic vision into autonomous braking. It's not replacing the datacenter GPU. It's filling a category the datacenter GPU was never well suited for.
In-memory analog compute · the deepest bet
Compute where the data lives
The von Neumann bottleneck addressed at the device level
MID-DECADE
Every architecture we've covered moves data: from HBM to compute, from registers to ALUs, from cache to cores. Each move costs energy. The end-state of "moving data is the bottleneck" is the realization that you shouldn't move it at all. Compute should happen inside the memory.
Three technologies pursue this:
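ReRAM (Resistive RAM). Memory cells whose resistance can be set to many values, not just 0 or 1. Apply a voltage, read the current — by Ohm's law, current = voltage × conductance. If conductance encodes a weight, each cell performs a multiply. The currents from every cell on a shared column wire also add together (Kirchhoff's current law). So a single ReRAM crossbar performs a whole matrix-vector multiply analog-style, in one physical step. No clock cycles, no data movement. Companies: Crossbar, Weebit Nano; Mythic AI applies the same crossbar principle using flash cells rather than ReRAM.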
PCM (Phase-Change Memory). Cells made of chalcogenide alloy whose crystal/amorphous state encodes data. IBM has built PCM-based analog accelerators that achieve 10-100× efficiency on transformer workloads. Same principle as ReRAM: use the device physics itself as the multiplier.
Ferroelectric (FeFET). Transistors whose threshold voltage shifts based on a ferroelectric layer's polarization. Non-volatile, fast-switching, and well suited to CMOS integration. The "what if NAND flash and SRAM had a child" technology. Several startups are working on it, and major players (TSMC, imec) have R&D programs.
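The crossbar principle behind ReRAM and PCM can be sketched in a few lines. This is an idealized model under stated assumptions:

- every cell is a perfect resistor;
- the wires have no resistance;
- conductances are in arbitrary units.

Each cell contributes a current G × V (Ohm's law). All currents landing on one column wire add together (Kirchhoff's current law). The column currents are therefore exactly a matrix-vector product.

```python
def crossbar_mvm(conductances: list[list[float]], voltages: list[float]) -> list[float]:
    """Ideal analog crossbar: one input voltage per row, one output current per column.

    conductances[i][j] is the conductance of the cell joining row i to column j.
    Ohm's law gives each cell's current as G * V; Kirchhoff's current law sums
    the currents of all cells that share a column wire.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("need one input voltage per row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            currents[j] += conductances[i][j] * voltages[i]  # I = G * V, summed per column
    return currents

if __name__ == "__main__":
    g = [[1.0, 2.0],
         [3.0, 4.0]]
    print(crossbar_mvm(g, [1.0, 1.0]))  # column currents
```

Take `conductances = [[1, 2], [3, 4]]` with both rows driven at 1 V. The columns read 4 and 6 units of current, the same result as multiplying the transposed matrix by the input vector. Physical conductances can't be negative, so real arrays usually encode a signed weight as the difference between two cells. This sketch skips that detail.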
The performance numbers are wild. Mythic's M1076 analog matrix processor claims 25 TOPS at ~3W, roughly 8 TOPS per watt, which Mythic pitches as an order of magnitude better than digital equivalents. IBM's analog AI work has shown ~14× energy-efficiency improvements over GPUs on speech-recognition inference.
The catches are equally serious:
Precision. Analog computation accumulates noise. ReRAM cells drift over time. Most in-memory analog accelerators top out at 4-8 bits effective precision. Inference only.
Programming. Each weight has to be carefully written into a memory cell with precise resistance. Slow, sometimes destructive (PCM cells wear out after ~10⁹ writes). Models are loaded once and run for a long time.
Process integration. ReRAM, PCM, and FeFET all require materials and processing steps that don't exist in standard CMOS fabs. Mass production requires retooling, which limits scale until volume justifies it.
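To see why 4–8 bits is the ceiling, the sketch below does three things:

1. Quantizes weights to a given bit width.
2. Optionally adds Gaussian noise to each stored weight, as a crude stand-in for drift and read noise.
3. Measures how far the resulting dot product lands from the exact one.

The uniform quantizer, the noise model and all sizes are illustrative assumptions, not measurements of any real device.

```python
import random

def quantize(w: float, bits: int, w_max: float = 1.0) -> float:
    """Clip w to [-w_max, w_max] and snap it to the nearest of 2**bits evenly spaced levels."""
    step = 2 * w_max / (2 ** bits - 1)
    w = max(-w_max, min(w_max, w))
    return round((w + w_max) / step) * step - w_max

def analog_dot(weights, inputs, bits, noise_std, rng):
    """Dot product using weights stored at `bits` precision plus Gaussian noise per cell."""
    total = 0.0
    for w, x in zip(weights, inputs):
        stored = quantize(w, bits) + rng.gauss(0.0, noise_std)
        total += stored * x
    return total

def relative_error(bits, noise_std=0.0, n=256, trials=50, seed=0):
    """Total |analog - exact| divided by total |exact| over random weight/input vectors."""
    rng = random.Random(seed)
    err_sum = 0.0
    ref_sum = 0.0
    for _ in range(trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        inputs = [rng.uniform(0.0, 1.0) for _ in range(n)]
        exact = sum(w * x for w, x in zip(weights, inputs))
        approx = analog_dot(weights, inputs, bits, noise_std, rng)
        err_sum += abs(approx - exact)
        ref_sum += abs(exact)
    return err_sum / ref_sum

if __name__ == "__main__":
    for bits in (2, 4, 6, 8):
        print(f"{bits}-bit weights: clean {relative_error(bits):.3%}, "
              f"5% noise {relative_error(bits, noise_std=0.05):.3%}")
```

Each extra bit roughly halves the quantization error. Once noise of a few percent is added, though, extra bits stop helping: the noise floor, not the bit width, sets the effective precision.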
Mid-decade is the realistic timeframe. If Taalas's mask-ROM approach is the most extreme version of this idea, in-memory analog is the more flexible (but slower-to-program) cousin. The two together represent a serious challenge to the GPU-plus-HBM model — the kind of challenge that's invisible until it's not.
Quantum computing · the long bet
Where the field actually is in 2026
Verified advantage targeted by year-end · fault tolerance still ~2029-30 · most "use cases" still hype
Quantum computing has been "five years away" for decades. The reason it's no longer a pure punchline is that the milestones have actual dates now. IBM's roadmap is the most concrete: Loon (2025) demonstrated qLDPC building blocks. Kookaburra (2026) is the first QEC-enabled module — quantum memory and logic combined. Cockatoo (2027) connects modules. Starling (2029) — the target of the whole roadmap — runs 100 million quantum gates on 200 logical qubits. That's the regime where quantum computers can do things classical machines cannot.
The honest assessment of quantum in 2026: we are right at the inflection between "interesting research" and "experimentally useful." Verified quantum advantage — beating classical computers on a problem people care about — is plausible by year-end 2026. Useful quantum computing for real applications (chemistry, materials, optimization) requires logical qubits with low enough error rates to run long algorithms. That's the Starling 2029 milestone. Even Starling at 200 logical qubits won't break RSA encryption (which needs ~4,000+ logical qubits at much lower error rates).
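The gap between "interesting" and "useful" is easy to see with arithmetic. Take a deliberately crude model: errors are independent, and any single error spoils the whole run. Then the chance of finishing N operations at per-operation error rate p is (1 − p)^N. The sketch below uses the Starling figure of 100 million gates from IBM's roadmap. The 10⁻³ figure it compares against is an assumed round number for physical-qubit error rates, not a quoted spec.

```python
import math

def success_probability(error_per_op: float, n_ops: int) -> float:
    """P(all n_ops succeed) = (1 - p)**n under the crude independent-error model, via log1p for accuracy."""
    return math.exp(n_ops * math.log1p(-error_per_op))

def max_error_rate(n_ops: int, target_success: float) -> float:
    """Largest per-op error rate p with (1 - p)**n_ops >= target_success."""
    return -math.expm1(math.log(target_success) / n_ops)

if __name__ == "__main__":
    n_gates = 100_000_000  # Starling target: 1e8 gates
    for p in (1e-3, 1e-6, 1e-8, 1e-9):  # 1e-3 is an assumed "physical qubit" ballpark
        print(f"p = {p:.0e}: success ~ {success_probability(p, n_gates):.3g}")
    print(f"p needed for 90% success: {max_error_rate(n_gates, 0.9):.2e}")
```

At 10⁻³ per gate, a 100-million-gate run essentially never finishes. A 90% chance of success needs a logical error rate around 10⁻⁹ per gate, about six orders of magnitude better. Closing that gap is the job of error-corrected logical qubits.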
Four competing platforms:

- Superconducting qubits (IBM, Google): most mature, but require near-absolute-zero cryogenics.
- Trapped ions (Quantinuum, IonQ): slower, but with better fidelity and easier connectivity.
- Photonic (PsiQuantum, ORCA): runs far warmer than superconducting machines and scales differently, but gates are harder.
- Topological (Microsoft): Microsoft's bet that exotic Majorana-based qubits will be inherently protected from errors. Still mostly research.
What quantum computing won't do, even in 2030: replace your laptop. Solve general AI. Crack encryption tomorrow. It will probably be useful for specific problems with quantum structure — molecular simulation, certain optimization classes, quantum chemistry — where the speedup can be exponential and the problems are otherwise intractable.
Beyond CMOS · 2D materials and exotic semiconductors
What replaces silicon when silicon really runs out
The "after CMOS" research, honestly assessed
RESEARCH
Silicon CMOS still has 5-10 years of meaningful scaling left through CFET, backside power, and packaging. But the long-term question is what happens past atomic limits. The leading candidates are 2D materials — atomically thin sheets that can act as semiconductors with much better electrostatics than 3D silicon at extreme scales.
MoS₂ · WS₂
leading 2D candidate
Transition metal dichalcogenides. Atomically thin (3 atoms thick), real bandgap (~1.8 eV), high on/off ratios. Could enable transistors below silicon's electrostatic limits. Imec, Stanford, MIT working on integration.
Graphene
disappointed
2010s "wonder material" with extraordinary mobility — but no bandgap, so it can't be turned off. Still useful for high-frequency RF, sensors, and contacts. Not the logic-transistor material it was once promised to be.
Carbon nanotubes
long research
Tiny rolled-up graphene cylinders with great transport properties. MIT/Stanford have built CNT-based microprocessors. Manufacturing at scale remains the wall — placing billions of nanotubes precisely is unsolved.
Spintronics
incremental
Use electron spin instead of charge to encode information. MRAM (a commercial product) is the visible piece. Logic spintronics — full computation via spin currents — remains research with no clear commercial path.
Mott insulators
exotic
Materials whose conductivity changes dramatically under applied electric fields or at certain temperatures, which could enable low-power switches. Vanadium dioxide is the classic example. Decade-plus of research, no commercial logic devices yet.
Halide perovskites
emerging
Strong in solar (15-25% efficiency in research cells) and LEDs. As transistors, defect tolerance is interesting but stability is a problem. Currently more an optoelectronic story than a logic story.
The honest read: none of these will replace silicon for general-purpose logic in the 2020s. Possibly not in the 2030s either. But the same was true of FinFET in 2000 and EUV in 2010. The transition from "lab" to "production" takes 15-25 years even when the material is well-understood. What matters is which 2D material's manufacturing processes mature first, and what specific niches they win before they win general logic.
Backside power, CFET, advanced packaging, and the AI-specific architectural moves we covered in earlier sections will carry mainstream silicon through 2030 and possibly to 2035. After that, the field branches: photonics for some workloads, neuromorphic for others, in-memory analog for a third, quantum for a fourth, and exotic materials for whatever's left at the bottom of the transistor stack. "The chip industry" stops being one industry.
What this all means
The post-silicon era is many industries, not one
And the time to learn them is now, while they're still labs
The clean storyline of chip history — Moore's Law, smaller is better, one curve to track — is breaking apart. The honest map of 2030 has multiple specialized stacks running in parallel:
| WORKLOAD | BEST 2030 SUBSTRATE | WHY |
|---|---|---|
| Frontier model training | Advanced silicon · CFET · HBM4/5 | Need flexibility · precision · interconnect · capital is a feature |
| Inference at scale | Mask-ROM (Taalas) · in-memory analog · photonic compute | Stable workloads can use specialized silicon · 10–100× efficiency wins |
| Datacenter networking | Co-packaged optics · TSMC SiPh | Optics beats copper on energy/bit past ~10 cm · already shipping |
| Edge AI / robotics | Neuromorphic · Loihi 3 · NorthPole | Sparse · event-driven · sub-ms latency · 1,000× efficiency |
| Materials · molecular | Quantum · 200+ logical qubits | Quantum problems · classical can't reach |
| Long-tail logic | Mature CMOS · 28nm and above | Cheap · qualified · proven · enormous installed base |
The unifying thread is specialization. Sixty years of "one architecture, smaller every year" gave us general-purpose computing as a category. The next twenty are about workload-specific substrates — chips for inference, chips for graph problems, chips for chemistry, chips for sensor processing. The chip industry isn't dying. It's diversifying so violently that "the chip industry" becomes the wrong unit of analysis.
This is why every section of this guide had its own moats, its own players, its own physics, its own roadmap. The future isn't a single ladder anymore. It's a portfolio.
If you've read this far — every section, every diagram, every sub-architecture — you now have the framework to read any chip news, whichever of these futures it points toward. The transistor, the litho, the package, the memory, the power, the AI stack, the geopolitics, the trailing edge, the mechanics, the deep future — they're not separate. They're one industry's slow self-fragmentation, and the better part of the 2020s and 2030s will be spent watching which forks win.
The map of the industry circa 2026
Six countries matter. One company makes the lithography (Netherlands). Three foundries lead manufacturing (Taiwan, Korea, U.S.). One country is locked out and improvising hard (China). Japan is trying to re-enter through Rapidus. Below: a temporal view of who reached what, when.
The race, by year of first volume production
Where each player actually stands
The three branches in one paragraph
Western frontier: TSMC ships 2 nm GAA in volume now (Apple reportedly holds about half of the early capacity) and runs the only at-scale advanced packaging line. Intel matches at the transistor level (18A with RibbonFET + PowerVia) and leads on High-NA EUV, betting that lithography and packaging — not capacity — are the next moat. Samsung shipped GAA first in 2022 but trails on yield. Rapidus is Japan's late re-entry, doing 2 nm trials now, full production targeted 2027.
The China branch: SMIC is doing 7 nm-class on DUV multi-patterning at low yield, and is in pilot for a "5 nm-class" node — same wavelength tools, more passes, more defects. No EUV access. Domestic toolmakers (SiCarrier, AMEC, Naura) are building replacements for Western equipment. Huawei sits on top as the system designer pulling everything together. Roughly two generations behind; not closing.
The lithography monopoly: ASML is the only company that makes EUV scanners. The U.S. (the EUV light source, built by ASML's San Diego subsidiary Cymer), Japan (chemicals, masks), and Germany (Zeiss optics, Trumpf drive lasers) supply the parts. This bottleneck is the geopolitical lever.
The four kinds of companies that build a chip
Building one chip used to mean one company doing everything from sketch to silicon. That model is dead at the leading edge. Today the industry splits into specialists, each owning one slice of the value chain — and that split, more than any technology choice, explains who is winning, who is stuck, and why TSMC matters more than its products would suggest.
The unbundling started in the late 1980s when Morris Chang founded TSMC as the first pure-play foundry — a fab that made nobody's chips but everyone else's. Suddenly a chip designer didn't need a billion-dollar factory to compete. NVIDIA, Apple, AMD, Qualcomm, Broadcom — none of them own a fab. What they own is their design teams and their software ecosystems. Intel and Samsung are the last great IDMs at the leading edge, and Intel is mid-pivot to also being a foundry.
Company timelines · 1960 → today
Each row is a company's business history — founding, IPO, key acquisitions, strategic pivots, and setbacks. Diamond colours match the legend below.
Where they are now · 2026
IDM: design + fab + test, all in one company
Fabless: design only · ship designs to a foundry
Foundry: manufacture for hire · no products of their own
EDA · IP · Equipment: the tools and licences everyone needs
How chips actually get made
Most descriptions of the chip industry stop at the chip. They tell you what TSMC's 2nm node does, but not how the wafer that becomes it spent four months traveling between machines, or why the polymer that defined its smallest features came from a single factory in Yamaguchi prefecture, or what the cleanroom looks like at four in the morning when the fab is running. The mechanics of how chips get made is its own discipline — operational, physical, and in many ways the real moat.
The wafer journey · sand to chip in four months
Eight stages · multiple companies · multiple countries
A bare 300mm wafer costs ~$100. About $90 of that is the polishing — Sumco and Shin-Etsu polish to atomic flatness, RMS surface roughness <0.1nm. The wafer surface is one of the flattest things humans manufacture.
The fab step (#6) is where the bulk of the time and cost concentrates: 1,000–1,500 individual process steps over 3–4 months. Lithography, etch, deposition, ion implant, planarization, inspection, repeat. Each step is a chance for a defect that ruins the chip.
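A rough way to see why a thousand-plus steps is unforgiving: suppose each step independently lets a die through unharmed with probability y. The whole line's yield is then y raised to the number of steps. The per-step yields below are illustrative assumptions, not fab data.

```python
def line_yield(per_step_yield: float, n_steps: int) -> float:
    """Fraction of dies surviving every step, assuming independent per-step yields."""
    return per_step_yield ** n_steps

def required_step_yield(target_yield: float, n_steps: int) -> float:
    """Per-step yield needed to hit target_yield across n_steps independent steps."""
    return target_yield ** (1.0 / n_steps)

if __name__ == "__main__":
    for y in (0.999, 0.9999, 0.99999):
        for n in (1000, 1500):
            print(f"per-step {y}: {n} steps -> {line_yield(y, n):.1%} line yield")
    print(f"for 90% over 1,500 steps, each step needs {required_step_yield(0.9, 1500):.5f}")
```

Even 99.9% per step leaves only about a third of dies alive after 1,000 steps. Each individual step has to be far better than "three nines."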
Inside a fab
Three vertical levels · class-1 air · 100 MW continuously
CROSS-SECTION
The cleanroom is organized in bays — each bay holds equipment for a specific process type. Wafers move between bays inside FOUPs (Front Opening Unified Pods), sealed plastic carriers that hold 25 wafers each. FOUPs travel along ceiling-mounted rails — Overhead Hoist Transport — that thread through the cleanroom like a model train system.
People wear bunny suits because humans shed five million skin particles per minute. Modern fabs are increasingly automated; TSMC's most advanced fabs have entire wings that operate without staff inside during normal operation. Lithography areas are lit yellow because shorter-wavelength light (blue and UV) would expose the photoresist.
The mask shop · the bottleneck nobody talks about
$30–50M per chip design in masks alone · why "respinning silicon" is dreaded
PHOTOMASK
The mask itself is made by an electron beam writer — a multi-million-dollar machine that draws the pattern one pixel at a time, taking 8–24 hours per mask. EUV masks are an order of magnitude more delicate than DUV masks because of their multilayer molybdenum/silicon reflectors. A speck of dust on the mask would print as a defect on every chip on every wafer. That is why pellicles exist: thin protective membranes mounted just above the mask surface. Making pellicles that survive EUV exposure remains extremely difficult.
Only a handful of mask shops can do leading-edge work: TSMC's internal mask shop, Samsung's, Intel's, plus merchant suppliers Photronics (US) and Toppan and DNP (Japan). Mask shop queue time can affect tape-out schedules. For all the talk of EUV scanners, the mask shop is its own bottleneck.
Specialty chemicals · why Japan?
~90% of advanced photoresist comes from four Japanese companies
Photoresist for a leading-edge fab has to be cleaner than almost any liquid on Earth. Particulate counts under 5 nanometers are tracked. Metal contamination is measured in parts per trillion. The factories that make this stuff are themselves cleanrooms.
JSR engineers work alongside TSMC engineers on every new node. The qualification cycle for a new resist on a new node is 18–24 months. A new entrant doesn't just need to make the chemicals; they need to convince TSMC to spend two years qualifying them. Japan's photoresist position is the chip industry's most underappreciated chokepoint — and it's why Japanese export controls on photoresist (briefly imposed on Korea in 2019) caused a global chip industry panic.
The yield curve · why a new node takes 3–4 years to ramp
The economic indicator that determines fab profitability
Identifying yield killers requires inspection — every wafer is inspected at multiple steps. Statistical analysis correlates defects with specific tools, recipes, even individual machines. Engineers track yield down to specific FOUPs and identify root causes. The deepest moat in fab manufacturing is institutional knowledge of yield improvement. TSMC has been doing this for thirty years across thirteen process generations. The accumulated playbooks — which tools tend to drift, which materials degrade in storage, which inspection metrics correlate with which yield killers — exist nowhere else.
This is also why Intel's manufacturing struggles have been so hard to reverse. They had the playbooks once. Stumbling at 10nm in the late 2010s broke their cadence; restarting requires both fresh capex and rebuilding institutional muscle that atrophied. Samsung is in a similar place with leading-edge yields — they have the equipment but not the consistent yield outcomes TSMC produces.
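Yield curves are often reasoned about with the simple Poisson die-yield model, Y = e^(−A·D₀). Here A is die area and D₀ is the density of killer defects. It is a textbook approximation, not TSMC's or anyone's internal model. The defect densities and die sizes below are illustrative assumptions. The 300 mm wafer matches the text, and the dies-per-wafer figure ignores edge loss.

```python
import math

WAFER_DIAMETER_CM = 30.0  # 300 mm wafer

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under a Poisson defect model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

def good_dies_per_wafer(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Approximate good dies per wafer: gross dies (wafer area / die area, no edge loss) x yield."""
    wafer_area = math.pi * (WAFER_DIAMETER_CM / 2) ** 2
    gross = wafer_area / die_area_cm2
    return gross * poisson_yield(die_area_cm2, defects_per_cm2)

if __name__ == "__main__":
    # D0 falling over a node's ramp (illustrative values)
    for d0 in (0.5, 0.2, 0.1, 0.05):
        small = poisson_yield(1.0, d0)  # assumed 1 cm^2 phone-class die
        large = poisson_yield(8.0, d0)  # assumed 8 cm^2 near-reticle AI die
        print(f"D0={d0:.2f}/cm^2: 1 cm^2 die {small:.0%}, 8 cm^2 die {large:.0%}, "
              f"{good_dies_per_wafer(8.0, d0):.0f} good large dies/wafer")
```

Yield falls exponentially with area. A defect density that is tolerable for a small mobile die can be brutal for a large AI die, so every step down in D₀ during the ramp is worth the most for the biggest chips.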
The equipment ecosystem · the big six
Six companies sell most of the tools in every leading-edge fab
CONCENTRATION
ASML
NETHERLANDS · €40B+ rev
Lithography. EUV scanners (~$200M each, ~50/year), High-NA EUV ($380M each), DUV. Effectively zero competition. The single most important equipment company in the industry.
Applied Materials
USA · $30B rev
Largest US equipment company. Etch, deposition, ion implant, CMP, inspection — basically everything except lithography. Dozens of tool types in every fab.
Lam Research
USA · $17B rev
Etch and deposition specialist. Particularly dominant in 3D NAND (the high aspect-ratio etches 3D NAND requires are Lam territory). Major in plasma etch.
Tokyo Electron (TEL)
JAPAN · ¥2T rev
Etch, deposition, wet processing, coater/developers. Strong in track tools that pair with ASML scanners. Japanese tool ecosystem anchor.
KLA
USA · $10B rev
Inspection and metrology. The "eyes of the fab." Patterned wafer inspection, defect review, overlay metrology, mask inspection. Effectively a monopoly in some inspection categories.
ASM International
NETHERLANDS · $3B rev
Atomic layer deposition specialist. Critical for advanced gate stacks and high-aspect-ratio fills. Smaller but holds a deep technical lead in ALD.
Below these six, hundreds of smaller specialists: Hitachi High-Tech, Lasertec (Japan, EUV mask inspection), Advantest and Teradyne (testing), Ebara (CMP and pumps), SCREEN (wet), Axcelis (implant), Onto Innovation (inspection), AIXTRON (Germany, MOCVD). Each leading-edge fab uses tools from 50+ different equipment companies.
The trade-control implications are clear: blocking ASML EUV doesn't only block the ~$200M scanner. It indirectly blocks all the upstream Japanese and US tools co-designed around EUV processes. The supply chain isn't a chain — it's a mesh. Cutting any leading-edge fab off from ASML cuts it off from the entire ecosystem of tools that have been jointly engineered around EUV's process windows.
Why this is the deepest moat
You cannot buy your way past forty years of compounding
The thing hardest to convey about chip manufacturing is the compounding nature of accumulated knowledge:
| WHAT | HOW LONG IT TOOK |
|---|---|
| ASML's EUV machine | 25 years of R&D · thousands of engineers · tens of billions of dollars |
| JSR's photoresist | 30+ years of polymer chemistry · decades of co-development with chipmakers |
| TSMC's yield playbooks | 30+ years across 13 process generations · accumulated tacit knowledge |
| Sumco's wafer surface finish | 50+ years of refinement · sub-0.1nm RMS roughness |
| TEL's coater/developer fluidics | 40+ years of process engineering · embedded in every fab globally |
None of these are replicable in five years no matter how much money you throw at them. This is why the CHIPS Acts everywhere are both important and insufficient. The capex they fund is necessary, but it cannot buy accumulated tacit knowledge, decades of supplier relationships, or a workforce trained over generations. Those are what actually determine whether a fab works.
It also bears noting that almost all of this lives outside the United States. ASML and ASM International are Dutch. Tokyo Electron, JSR, TOK, Shin-Etsu, and Sumco are Japanese. imec, Infineon, NXP, Merck KGaA, BASF, Siltronic, AIXTRON, Soitec, and Carl Zeiss SMT are European. The European semiconductor industry is bigger and deeper than most Americans realize, especially in materials, equipment, and analog/power. The EU Chips Act money is flowing into all of them, and most have substantial engineering teams hiring throughout Germany, the Netherlands, Belgium, France, and Italy — including a growing number of remote and hybrid roles.
When you read about chip-industry investments — fabs, factories, capex, subsidies — remember that the chip you'll eventually hold is the output of a system that's been compounding for sixty years. The leading-edge moat isn't the building. It's everything that took decades to learn how to put inside it.
The deeper bottlenecks
EDA — a software duopoly+1 you cannot route around
To design any modern chip you need a tool from one of three companies: Synopsys, Cadence, or Siemens EDA (formerly Mentor). They are the GDB and gcc of silicon — every fabless company runs on them. Most of the industry pays both Synopsys and Cadence because designs end up using a mix.
In 2022, U.S. export controls cut China off from EDA software for designing gate-all-around (GAA) transistors. This is a quieter constraint than the EUV ban but possibly more binding: you can build a fab without ASML by spending tens of billions on alternatives, but you cannot meaningfully design a 3 nm-class chip without one of these three companies' software.
Equipment — five companies own most of the fab
A modern fab has hundreds of tool types from many vendors, but five names dominate the bill: ASML (litho · NL), Applied Materials (deposition + etch · US), Lam Research (etch + 3D NAND · US), KLA (inspection · US), Tokyo Electron (coat / develop · Japan).
Most equipment is dual-use across nodes, but the highest-end variants are sanctioned for China. Mid-tier tools still flow freely, which is how SMIC keeps ramping despite the headlines.
The CUDA moat — why NVIDIA is hard to displace
NVIDIA's edge is not silicon — TSMC fabs everyone's silicon. The edge is CUDA, the software platform launched in 2006 that turned GPUs into general compute. Nearly twenty years of libraries, tools, drivers, and developer mindshare. Every hyperscaler is building its own AI accelerator (TPU, Trainium, MTIA, Maia) precisely to escape this lock-in, and even with billions invested they're still gaining ground only slowly.
Memory and HBM — the other half of every AI chip
Every AI accelerator is roughly half logic and half memory. The memory half is HBM (High Bandwidth Memory): DRAM dies stacked on top of each other, placed beside the logic die, and wired to it through a silicon interposer such as TSMC's CoWoS. Three suppliers exist worldwide: SK hynix (dominant), Samsung, Micron. Through 2024–25 SK hynix held the lion's share of NVIDIA's HBM3E business; in early 2026 supply is still tighter than logic supply.
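The "high bandwidth" comes mostly from width: each stack talks to the logic die over a very wide bus. Peak per-stack bandwidth is bus width times per-pin data rate, divided by 8 bits per byte. The figures below are assumed round numbers for illustration:

- a 1,024-bit interface at about 9.6 Gb/s per pin for HBM3E-class parts;
- 2,048 bits at 8 Gb/s for HBM4-class parts;
- eight stacks per accelerator.

Real parts vary, so check JEDEC specs and vendor datasheets.

```python
def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s: pins x per-pin Gb/s / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

def accelerator_bandwidth_tbs(n_stacks: int, bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Aggregate peak memory bandwidth in TB/s across n_stacks identical stacks."""
    return n_stacks * stack_bandwidth_gbs(bus_width_bits, pin_rate_gbps) / 1000

if __name__ == "__main__":
    print(f"HBM3E-class stack: {stack_bandwidth_gbs(1024, 9.6):.0f} GB/s")
    print(f"HBM4-class stack:  {stack_bandwidth_gbs(2048, 8.0):.0f} GB/s")
    print(f"8 HBM3E-class stacks: {accelerator_bandwidth_tbs(8, 1024, 9.6):.1f} TB/s")
```

Doubling the bus width raises bandwidth even at a lower per-pin rate. The cost is a thousand-plus extra data wires per stack, which only a dense interposer can route. That is one reason HBM supply and advanced-packaging capacity move together.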