A Field Guide

Robotics: thirteen layers of metal, signal, and silicon

From the actuator to the humanoid moment — every layer of the modern robot, animated. A bottom-up walk through the industry that the press keeps trying to cover as one industry.
§1 · Foundations

Rigid actuators: the decision that decides everything else

Pick the actuator and you've already picked most of what's possible above it — the robot's weight, its safety profile, its control bandwidth, its battery life, its form factor, even its business model. Software is fungible. Sensors can be swapped. The actuator is the constraint that propagates upward through every layer of the stack and out into the market the robot can serve. This section walks the rigid tree: BLDC motors, the reduction problem, the harmonic-drive moat, the QDD revolution, and the place hydraulics still earns its keep.

1.1   The BLDC motor — the modern atomic unit

Brushless commutation, two geometries

same physics, opposite ends of the design space

A brushless DC motor is, mechanically, a ring of permanent magnets rotating inside (or outside) a ring of electromagnetic coils. A controller switches current through the coils in three-phase sequence, the rotating magnetic field drags the magnets around, and torque comes out the shaft. There are no brushes, no commutator, no sliding electrical contacts — the commutation is electronic, which is the entire reason BLDCs displaced everything else for serious robotics work after the 1980s.
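The three-phase switching described above can be sketched as a lookup table. This is a minimal illustration of six-step (trapezoidal) commutation, one common scheme among several; the alignment of sectors to rotor angle and the function names are assumptions made for the example, and real controllers read Hall sensors or an encoder and often use field-oriented control instead.

```python
# Six-step commutation: at any moment one phase is driven high (+1),
# one is driven low (-1), and one is left floating (0).
SIX_STEP = [
    (+1, -1, 0),   # A high, B low
    (+1, 0, -1),   # A high, C low
    (0, +1, -1),   # B high, C low
    (-1, +1, 0),   # B high, A low
    (-1, 0, +1),   # C high, A low
    (0, -1, +1),   # C high, B low
]

def electrical_angle(mechanical_angle_deg, pole_pairs):
    """Electrical angle advances pole_pairs times per mechanical revolution."""
    return (mechanical_angle_deg * pole_pairs) % 360

def phase_drive(electrical_angle_deg):
    """Return the (A, B, C) drive state for a rotor electrical angle in degrees.

    Assumes sector 0 starts at 0 degrees electrical (illustrative alignment).
    """
    sector = int((electrical_angle_deg % 360) // 60)
    return SIX_STEP[sector]

if __name__ == "__main__":
    # A 14-pole rotor has 7 pole pairs, so the table is traversed
    # seven times per mechanical revolution.
    for mech in range(0, 60, 10):
        print(mech, phase_drive(electrical_angle(mech, pole_pairs=7)))
```

Each row sums to zero because current entering one phase must leave through another; the floating phase is the one a sensorless controller would watch for back-EMF.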

Two design choices dominate everything downstream. First, the position of the rotor.

Inner-rotor (left) puts magnets on a small central shaft inside stationary coils — fast, low inertia, low torque. Outer-rotor (right) puts magnets in a rotating "can" around stationary internal coils — slower, but the longer lever arm yields dramatically more torque per kilogram. This single geometric choice splits the modern robotics motor market in half.
[Figure: two BLDC geometries. Left, inner-rotor: high speed, low torque; industrial arms, drones, surgical (Maxon, Faulhaber); ~10,000 RPM, 4-pole rotor, 12 stator slots. Right, outer-rotor: low speed, high torque; legged robots, humanoids (MIT Cheetah lineage); ~150 RPM, 14-pole rotor, 12 stator slots.]

Outer-rotor BLDCs (sometimes called "pancake motors" or "torque motors") are the geometry that quietly enabled the entire legged-robot revolution. The magnets sit at a larger radius, so the same magnetic force produces more torque. The same physics also produces a mechanical penalty — high rotational inertia — but for a robot's hip joint, which moves a few hundred RPM at most, that's a feature, not a bug. Inner-rotor motors went into precision applications and outer-rotor motors went into power applications, and the gearbox decision below is downstream of which one you started with.

Suppliers split by tier: Maxon (Switzerland) and Faulhaber (Germany) for surgical and precision; Kollmorgen (US) for industrial; T-Motor and MAD (China) for low-cost legged and drone — though increasingly the major humanoid players design their own motors in-house, because the off-the-shelf options leave 30%+ torque-per-kilogram on the table.

1.2   The reduction problem — why the gearbox is half the actuator

The strain wave gear

an unreasonable mechanism that makes precision robotics possible

A BLDC motor's natural operating point is fast and weak: tens of thousands of RPM, single-digit Newton-meters. A robot joint needs the opposite — a few RPM, hundreds of Newton-meters. Closing this gap is the gearbox, and three families dominate. Planetary gears (sun + planets + ring) are cheap, efficient, and tolerate misalignment, but have backlash — a few arc-minutes of "slop" you can feel. Cycloidal drives (an eccentric input wobbling a lobed disc) have very high torque density and shock resistance. But the real story is the strain-wave gear, sometimes called a harmonic drive.

The mechanism is genuinely strange. An elliptical "wave generator" deforms a thin-walled flexible cup — the flex spline — pressing its external teeth into the internal teeth of a rigid outer ring, the circular spline, at exactly two diametrically opposite points. The flex spline has two fewer teeth than the circular spline. Each full rotation of the wave generator advances the flex spline by exactly two teeth.

Watch the elliptical wave generator (orange) rotate at input speed. The flex spline (cyan) deforms with it, engaging the rigid circular spline (steel) at the two narrow ends of the ellipse. With two fewer teeth on the flex spline, each input rotation advances the flex spline by just two teeth — yielding 100:1 reduction in a single stage, with zero backlash, in a package thinner than a hockey puck.
[Figure: strain-wave gear. Wave generator input at 3000 RPM; rigid circular spline (202 teeth, fixed); flex spline (200 teeth, deforms, output) at 30 RPM. Two-tooth difference, zero backlash, 100:1 in one stage.]
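The tooth-count arithmetic is worth working once. With the circular spline fixed and the flex spline as output, the reduction is the flex-spline tooth count divided by the tooth difference, so a two-tooth difference gives a ratio of half the flex-spline count. The sketch below assumes illustrative tooth counts (200 and 202) chosen to produce 100:1; they are not taken from any specific catalog part.

```python
def strain_wave_ratio(flex_teeth, circular_teeth):
    """Reduction with the circular spline fixed and the flex spline as output.

    The output turns opposite to the input; only the magnitude is returned.
    """
    return flex_teeth / (circular_teeth - flex_teeth)

def output_rpm(input_rpm, ratio):
    """Output speed of an ideal reducer."""
    return input_rpm / ratio

ratio = strain_wave_ratio(flex_teeth=200, circular_teeth=202)
print(ratio, output_rpm(3000, ratio))  # 100.0 30.0
```

By the same formula a 100-tooth flex spline inside a 102-tooth circular spline would give 50:1, which is why high ratios need fine teeth and tight manufacturing tolerance.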

You get reductions of 30:1 to 320:1 in a single stage, with essentially zero backlash, in a package thinner than a hockey puck. This is the actuator that makes precision robotics possible. Harmonic Drive Systems (Tokyo, founded 1970) and its German arm Harmonic Drive SE together effectively defined the category and still hold the highest-precision tier — about 18% of the global strain-wave market by revenue, with the rest split across Nabtesco, Sumitomo, and a wave of Chinese fast-followers led by Leader Drive.

The moat isn't the design — the design is freely published — it's manufacturing tolerance accumulated over thirty years. Strain-wave gears are to industrial robots what EUV lithography is to chips: one company plus a fast-following second tier, with ten years of catch-up time for anyone outside.

1.3   Quasi-direct-drive — the Cheetah's gift

Inverting the gear ratio

big motor, tiny gearbox, sudden compliance

For thirty years, the industrial-robot orthodoxy was: small fast motor + big gearbox (50:1 to 200:1) = high torque, high precision, but stiff. If the robot bumps into something, the gearbox cannot be backdriven, so the impact lands on the gear teeth and, if it is hard enough, snaps one. The robot can't feel what it touched, because the gearbox masks the joint torque behind reflected inertia and friction.

The MIT Cheetah project, starting around 2013 under Sangbae Kim, inverted the equation. Take a much larger outer-rotor BLDC motor — built for high torque at low speeds — and pair it with a very small reduction, typically 6:1 to 10:1 single-stage planetary. The motor itself does most of the work; the gear is just a final multiplier.

Same external impulse applied to both joints. On the left, the geared servo blocks the impact — the small motor is masked behind 100:1 of gear inertia and friction; the energy has nowhere to go but the gear teeth. On the right, the QDD's big motor is only 8:1 away from the output, so the impact backdrives the rotor. The motor itself becomes the shock absorber, and the controller can read the joint torque straight off the motor current.
[Figure: impact response. Left, traditional: small motor (~50 W) + 100:1 gearbox; stiff, cannot feel impact; impulse at the output joint is blocked. Right, QDD: big motor (~500 W) + 8:1 gear; compliant, the motor is the torque sensor; impulse backdrives the rotor.]
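The reason the gear ratio matters so much for impact is that rotor inertia, seen from the output, scales with the square of the ratio. A short sketch makes the comparison concrete; the two rotor inertia values are hypothetical, picked only so that the QDD rotor is a hundred times heavier in inertia terms than the servo rotor.

```python
def reflected_inertia(rotor_inertia, gear_ratio):
    """Rotor inertia as felt at the gearbox output (kg*m^2): J * N^2."""
    return rotor_inertia * gear_ratio ** 2

# Hypothetical rotor inertias, for illustration only.
SMALL_ROTOR = 1e-6   # kg*m^2, small inner-rotor servo motor (assumed)
LARGE_ROTOR = 1e-4   # kg*m^2, large outer-rotor QDD motor (assumed)

geared = reflected_inertia(SMALL_ROTOR, 100)  # 100:1 multiplies inertia by 10,000
qdd = reflected_inertia(LARGE_ROTOR, 8)       # 8:1 multiplies inertia by 64

print(f"100:1 servo: {geared:.4f} kg*m^2, 8:1 QDD: {qdd:.4f} kg*m^2")
```

Under these assumed numbers the QDD joint presents less inertia to an impact even though its rotor is a hundred times larger, which is the sense in which the low ratio lets the impulse backdrive the motor.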

The robot can sense impact forces through the motor itself, no joint torque sensor required. It can fall, absorb impact through the actuator's own compliance, and stand back up. Mini Cheetah ships ~6–7 Nm nominal torque per joint at ~1 kg actuator mass; Wensing's custom Cheetah motors push 38 Nm peak. Boston Dynamics' Spot, Unitree Go2 and H1, Tesla Optimus, Figure 02, 1X Neo, Apptronik Apollo: nearly every legged robot shipping in 2026 is built on QDD or close variants, at least in its leg joints.
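Sensing force "through the motor itself" comes down to multiplying current by the torque constant and the gear ratio, then comparing against what the controller expected. The following is a rough sketch under stated assumptions: the torque constant, the threshold, and both function names are hypothetical, and the estimate ignores gearbox friction and rotor acceleration, which a real implementation would have to model.

```python
def joint_torque_estimate(current_a, torque_constant, gear_ratio, efficiency=1.0):
    """Estimate output torque (N*m) from torque-producing motor current (A).

    torque_constant is in N*m/A at the motor shaft. Friction and inertia
    are ignored, which is only reasonable for a low-ratio gearbox.
    """
    return current_a * torque_constant * gear_ratio * efficiency

def contact_detected(current_a, expected_torque, torque_constant, gear_ratio,
                     threshold=1.0):
    """Flag contact when measured torque departs from the expected torque."""
    measured = joint_torque_estimate(current_a, torque_constant, gear_ratio)
    return abs(measured - expected_torque) > threshold

# Assumed values: Kt = 0.1 N*m/A, 6:1 planetary stage.
print(joint_torque_estimate(10.0, 0.1, 6))
print(contact_detected(12.0, expected_torque=6.0, torque_constant=0.1, gear_ratio=6))
```

The same calculation through a 100:1 gearbox would be swamped by friction and reflected inertia, which is why high-ratio joints need a dedicated torque sensor.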

The catch: QDD trades torque ceiling for backdrivability. A QDD hip can run; a QDD finger cannot pinch hard enough to open a stiff jar. Which is why the same humanoid uses harmonic-drive-based actuators in its wrists and fingers, where the joint is small, slow, precise, and doesn't need to feel impact. The same robot is two completely different actuator families above and below the elbow.

1.4   Cycloidal & planetary — the other gearboxes

The wobble-disc workhorse

Nabtesco's quiet thirty-year monopoly on robot joints

The harmonic drive isn't the only zero-backlash gearbox. Its main rival is the cycloidal drive — an eccentric input shaft wobbles a lobed disc against a ring of pins, and the slight count mismatch between disc lobes and pins produces a high reduction. Cycloidal drives can take more shock than harmonic drives (no thin-walled flex spline to fatigue) and produce more torque per kilogram, at the cost of slightly more vibration and slightly less precision.

An eccentric input shaft (orange) drives a lobed disc (cyan) into a ring of pins (steel). The disc has one fewer lobe than the ring has pins, so for each input rotation, the disc advances by one pin position: the same trick as a strain wave gear but expressed mechanically rather than elastically. Nabtesco of Japan owns roughly 60% of this market for industrial robot joints.
[Figure: cycloidal drive. Fixed ring of 12 pins; cycloidal disc with 11 lobes; eccentric input drives the wobble.]
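The cycloidal ratio follows the same counting argument as the strain-wave gear. A minimal sketch, assuming the pin ring is fixed and the disc drives the output, using the 11-lobe, 12-pin geometry shown in the figure:

```python
def cycloidal_ratio(lobes, pins):
    """Reduction with the pin ring fixed and the lobed disc as output."""
    return lobes / (pins - lobes)

def output_degrees_per_input_rev(lobes, pins):
    """How far the disc turns (against the input) per input revolution."""
    return 360.0 / cycloidal_ratio(lobes, pins)

print(cycloidal_ratio(11, 12))  # 11.0
print(round(output_degrees_per_input_rev(11, 12), 1))
```

A single disc with a one-lobe difference therefore gives a ratio equal to its lobe count; catalog reducers with much higher ratios typically add a first planetary stage ahead of the cycloidal stage.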

Nabtesco (Japan, formed in 2003 from the merger of Teijin Seiki and Nippon Air Brake) holds roughly 60% of the global cycloidal market for industrial robot joints. Industrial arms — Fanuc, ABB, KUKA, Yaskawa — buy Nabtesco RV-series reducers for shoulder and elbow joints, and harmonic drives from Harmonic Drive Systems for wrists. The supply chain is essentially a Japanese duopoly serving the entire industrial robotics industry, and has been for thirty years.

For everything outside precision robotics — drones, AGVs, mobile robot wheels, low-cost cobots — planetary gears rule. They have backlash, but they're cheap, efficient, and tolerate misalignment. The default reducer for 95% of motorized things in the world is still planetary; harmonic and cycloidal are the precision-robotics premium tier.

1.5   Hydraulics & pneumatics — when fluid still wins

The retreat of hydraulic humanoids

force density that electric motors can't touch, at unacceptable system cost

Before BLDC + QDD, the only way to get serious force out of a small joint was hydraulics — pressurized oil pushed through valves into pistons. The original Boston Dynamics Atlas, BigDog, the LS3 — all hydraulic. The strengths are real: hydraulics deliver force densities (force per kilogram of actuator) that electric motors still can't touch, especially under shock loads. A small hydraulic cylinder can output forces a motor of equivalent mass simply cannot.

The weaknesses ended their humanoid run. Pumps, reservoirs, valves, hoses — infrastructure that often weighs more than the actuators themselves. Noise (the BigDog "lawnmower" howl). Leaks (not great in a kitchen, terrible in a hospital). Efficiency (most of the pump's energy ends up as waste heat in the hydraulic fluid). Boston Dynamics' Atlas went all-electric in April 2024, the symbolic end of the hydraulic humanoid era.

A typical hydraulic actuator system: a pump draws oil from a reservoir and pressurizes it, a servo valve gates flow to one side of a piston, the piston pushes its rod, and the displaced oil returns via the other valve port. Force density at the piston: high. System mass and complexity: also high. The pump and reservoir don't fit on the robot; they fit on the cart the robot is tethered to.
[Figure: hydraulic circuit. Oil reservoir, pump at 200 bar, servo valve, cylinder with piston and rod delivering ~25 kN at 200 bar to the load. Actuator on the robot, pump infrastructure on the cart; total system 50–200 kg.]
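The force figure in the diagram is just pressure times piston area, and running the arithmetic shows how small the cylinder can be. This sketch assumes a simple single-acting piston with no rod-side back pressure and no seal friction; the function names are illustrative.

```python
import math

BAR_TO_PA = 1e5  # 1 bar = 100 kPa

def piston_force(pressure_bar, bore_m):
    """Ideal piston force (N) for a given supply pressure and bore diameter."""
    area = math.pi * (bore_m / 2) ** 2
    return pressure_bar * BAR_TO_PA * area

def bore_for_force(force_n, pressure_bar):
    """Bore diameter (m) needed to produce force_n at pressure_bar."""
    area = force_n / (pressure_bar * BAR_TO_PA)
    return 2 * math.sqrt(area / math.pi)

bore = bore_for_force(25_000, 200)
print(f"bore for 25 kN at 200 bar: {bore * 1000:.1f} mm")
```

Under these idealized assumptions a piston of roughly 40 mm diameter is enough for 25 kN, which is the force-density argument in one number; the cost shows up in everything upstream of the cylinder.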

Hydraulics now retreat to where they always belonged. Heavy industrial manipulators (Caterpillar excavators are robots; we just call them by the wrong name). Marine and subsea, where electric motors can't easily seal against pressure. Aerospace flight surfaces, where the redundancy and force density justify the complexity. Suppliers: Bosch Rexroth (Germany), Parker Hannifin and Eaton (US), Moog (US), Caterpillar's in-house hydraulics group.

Pneumatics — compressed air instead of oil — are softer, cleaner, faster, weaker. They power factory pick-and-place grippers, pneumatic cylinders for assembly automation, and the entire branch of soft robotics that we'll cover in §2. Festo (Germany) and SMC (Japan) own this market. Pneumatic is also the bridge to the soft tree — every McKibben muscle, every fluidic elastomer actuator, and every soft pneumatic gripper traces its lineage back to the same compressed-air infrastructure that feeds factory automation worldwide.

Synthesis

The three rigid actuator industries that the press keeps treating as one

"The robot motor industry" is a category error. There are at least three industries here, with different physics, different players, different moats, different roadmaps, and different geographies — and a single humanoid robot in 2026 is, in actuator terms, all three of them operating inside one chassis.

| | Industrial-arm actuators | Legged / humanoid actuators | Heavy hydraulic |
|---|---|---|---|
| Substrate | Inner-rotor BLDC + harmonic drive or cycloidal | Outer-rotor BLDC + low-ratio planetary (QDD) | Hydraulic pump + servo valve + cylinder |
| Where it lives | Factory floor: Fanuc, ABB, KUKA, Yaskawa arms | Hips, knees, shoulders of every legged robot | Excavators, ships, aircraft, military |
| Dominant suppliers | Harmonic Drive (JP/DE), Nabtesco (JP), Sumitomo (JP), Leader Drive (CN) | Mostly designed in-house by humanoid OEMs; T-Motor, MAD, Keya for off-the-shelf | Bosch Rexroth (DE), Parker Hannifin (US), Eaton (US), Moog (US) |
| The moat | 30 years of manufacturing tolerance | Custom motor design + thermal management | Qualification, redundancy, certification cycles |
| Geography | Japan, Germany, China rising | US, China (fragmented, fast-moving) | Germany, US, Japan (mature, slow-moving) |
| Market maturity | Mature · ~$5B · 8.5% CAGR | Fast-growing · pre-mass-production | Mature · enormous · low growth |

The most underrated fact in robotics is that the actuator decision is mostly already made by the time the AI team gets to write code. A humanoid built on QDD will move beautifully and have trouble pinching a key. A humanoid built on harmonic-drive arms will pinch precisely and never run. Roboticists do not pick the AI first; they pick the actuator first, and the AI must work within it. Every news cycle that frames humanoid progress as a software story is missing the layer below software where most of the engineering is actually happening.

§2 · Foundations

Soft actuators: the parallel tree the press keeps calling "the future"

The rigid tree solved precision: motors and gearboxes that move to a target position with arc-minute accuracy and hold it. The soft tree solves something different: compliance, the property that lets a robot work next to a human, grip a strawberry without crushing it, push a catheter through a coronary artery, or be worn the way you wear a sweater. McKibben muscles are from 1957. Soft pneumatic grippers have been in Festo's catalog for 25 years. Twisted-coiled fishing-line fibers are from 2014. Calling soft robotics "the future" undersells it. It is a parallel evolutionary track that has been quietly maturing alongside the rigid track for seven decades, and the public conversation just hasn't caught up.

2.1   The compliance problem — why a different actuator is necessary, not optional

The spectrum from rigid to soft

where each actuator family lives, and why the choice is structural

A rigid actuator can be made compliant by adding sensors and feedback control — measure the joint torque, soften the response, get behavior that feels compliant. This is what series-elastic actuators (a spring between motor and output) and impedance-controlled QDD do. It works, but it is fundamentally a software simulation of softness running on top of stiff hardware, and it fails at the limit: shut off the controller and the actuator is stiff again. A soft actuator's compliance is intrinsic. Cut the power, push it with your hand — it gives. The compliance is a property of the material, not a property of the control loop.
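What "soften the response" means in control terms can be shown with a virtual spring-damper, one common form of impedance control. This is a minimal sketch, not any particular robot's controller; the gain values and function name are hypothetical.

```python
def impedance_torque(theta, omega, theta_target, stiffness, damping):
    """Commanded joint torque (N*m) from a virtual spring and damper.

    theta, theta_target in rad; omega in rad/s;
    stiffness in N*m/rad; damping in N*m*s/rad.
    """
    return stiffness * (theta_target - theta) - damping * omega

# The same 0.1 rad push against a "stiff" and a "soft" virtual spring.
stiff = impedance_torque(0.1, 0.0, 0.0, stiffness=500.0, damping=5.0)
soft = impedance_torque(0.1, 0.0, 0.0, stiffness=50.0, damping=5.0)
print(stiff, soft)
```

The apparent softness is a number in software: lowering the stiffness gain makes the joint yield more under the same push. It exists only while the loop is running, which is the distinction the paragraph above draws against a material whose compliance is built in.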

The actuator landscape arranged on a stiffness axis. The rigid tree from §1 lives on the left, with series-elastic actuators as its softest expression. The soft tree spans the middle to far right, with pneumatic and electrohydraulic muscles in the center and stimuli-responsive fibers at the soft extreme. Color encodes branch: orange for electromagnetic, cyan for fluidic, purple for electrostatic, gold for thermal/electrochemical.
[Figure: stiffness axis from rigid (left) to soft (right). Rigid tree (§1): industrial servo with harmonic drive, QDD (Cheetah), SEA (inline spring). Soft tree (§2): McKibben (1957), FEA (soft pouches), HASEL (2018), electrofluidic fiber (2026), twisted fiber (2014). Legend: electromagnetic, fluidic, electrostatic, thermal/elastic.]

Three things make compliance non-negotiable for an entire class of robot. Human contact: a robot working a meter from a person needs to fail soft. A QDD humanoid is dramatically softer than an industrial arm but still hits like a truck if a control loop misbehaves. Unknown grasping: picking up a tomato, a glass figurine, or a wriggling fish requires the gripper to conform to the object before it knows the object's shape. Inside-the-body: a catheter, an endoscope, a surgical tool that must navigate vessels and viscera cannot be made of stiff metal joints — the geometry alone forbids it.

Soft actuators didn't come from someone trying to "improve" the rigid tree. They came from people trying to do things the rigid tree fundamentally couldn't. The two trees are not competitors. They are answers to different questions.

2.2   McKibben muscles & FEAs — the pneumatic origin tree

The braid that converts radial to axial

a 1957 prosthetics invention that quietly became Festo's bestseller

A McKibben muscle is a rubber bladder inside a braided mesh sleeve, with the braid threads running off-axis at a fixed pitch angle. Inflate the bladder; it tries to expand radially; the braid resists radial expansion and converts it geometrically to axial contraction, exactly like skeletal muscle. The mechanism was invented in 1957 by Joseph McKibben, an Atomic Energy Commission physicist whose daughter had polio — he built the muscle as a powered orthosis for her hand. The design sat in research obscurity for thirty years until soft robotics revived it in the 1990s.

Watch the braid pitch angle change as the bladder pressurizes. Below ~54.7° (the "magic angle"), inflation produces axial contraction. Above it, axial extension. The McKibben sits below: pressurize, the diameter swells slightly, the braid threads rotate toward perpendicular, and the muscle shortens by 25–40%. Force-to-weight ratio matches or exceeds biological muscle, and the device is self-limiting: as the braid approaches the magic angle, the contraction force falls toward zero on its own, so the muscle cannot drive itself past its end of stroke.
[Figure: McKibben muscle. Deflated, pressure = 0: length L₀, diameter d₀, pitch angle ≈ 30°. Inflated, pressure = 4 bar: length 0.7 L₀, diameter 1.3 d₀, pitch angle → 50° (toward the magic angle). Contraction strain 30%; force scales with pressure × cross-section.]
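The magic angle drops out of the standard idealized McKibben model, in which the braid threads are inextensible and the bladder wall contributes nothing. The sketch below uses that textbook model; the 20 mm diameter is a hypothetical value, and real muscles fall short of the ideal because of bladder elasticity and friction.

```python
import math

# Angle (from the muscle axis) where 3*cos^2(theta) = 1, about 54.7 degrees.
MAGIC_ANGLE_DEG = math.degrees(math.atan(math.sqrt(2)))

def mckibben_force(pressure_pa, d90_m, braid_angle_deg):
    """Ideal static contraction force (N).

    pressure_pa is gauge pressure; d90_m is the diameter the braid would
    reach if its threads were perpendicular to the axis (90 degrees).
    Positive means the muscle pulls; negative means it pushes.
    """
    theta = math.radians(braid_angle_deg)
    return (math.pi * d90_m ** 2 * pressure_pa / 4) * (3 * math.cos(theta) ** 2 - 1)

def contraction_strain(rest_angle_deg, angle_deg):
    """Fractional shortening as the braid rotates from rest_angle to angle."""
    rest = math.cos(math.radians(rest_angle_deg))
    return 1 - math.cos(math.radians(angle_deg)) / rest

if __name__ == "__main__":
    for angle in (30, 40, 50, MAGIC_ANGLE_DEG, 60):
        force = mckibben_force(4e5, 0.02, angle)  # 4 bar, assumed 20 mm d90
        print(f"{angle:5.1f} deg  force {force:8.1f} N  "
              f"strain {contraction_strain(30, angle):.3f}")
```

In this ideal geometry, rotating the braid from 30° to 50° shortens the muscle by roughly a quarter, and the force crosses zero exactly at the magic angle, which is the built-in end stop.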

The McKibben's properties are remarkable on paper: contraction strains of 25–40%, force-to-weight ratios that match or exceed skeletal muscle, completely silent operation, intrinsically compliant by construction. Festo sells the Fluidic Muscle as a catalog product (model DMSP, since around 2002). The Shadow Robot Hand uses arrays of miniature McKibbens for finger actuation. Soft exosuits for stroke rehabilitation — Harvard's Wyss Institute, Roam Robotics — are built on McKibben arrays with low pressure and long stroke.

The next branch off the same tree is the fluidic elastomer actuator (FEA): a soft elastomer body with internal chambers that, when inflated, bend or twist in pre-programmed ways. The Harvard octopus arm (Whitesides and Wood, 2011) is the canonical demo. Soft Robotics Inc. (founded 2013, spun out of Whitesides' Harvard lab) commercialized FEAs as food-handling grippers — three or four soft fingers around a bell pepper, no force feedback, no machine vision required, just inflate and grip. The company was acquired by Berkshire Grey in 2022.

The catch is the catch all fluidic actuation has had for 70 years: you need an external pump. A compressor, a tank, hoses, valves — infrastructure that often weighs more than the actuators and tethers the robot to a cart. Untethered soft robots have been built, but they carry a CO₂ cartridge or a small compressor that dominates the mass budget. The McKibben branch has been waiting for a pump small enough to live inside the muscle for 25 years. We come back to that in §2.5.

2.3   HASEL — electrostatics squeezing dielectric fluid

The kilovolt branch

Christoph Keplinger's 2018 invention, Artimus Robotics' 2026 commercial path

HASEL (Hydraulically Amplified Self-healing ELectrostatic) was invented in 2018 at Christoph Keplinger's lab at the University of Colorado Boulder, building on three decades of dielectric elastomer work that started with Ron Pelrine at SRI International in the 1990s. The mechanism feels almost like a magic trick: take a flexible plastic pouch, fill it with a dielectric liquid, put two flexible electrodes on opposite sides of one region of the pouch, and apply 5–10 kilovolts across them. The electrodes attract each other electrostatically and "zip" together from one end inward, displacing the dielectric fluid into the rest of the pouch. Geometry choices turn that fluid displacement into linear contraction, expansion, or rotation.

Voltage off (left): the pouch is uniformly filled. Voltage on (right): the electrodes zip together at one end, fluid is displaced into the unconstrained portion, the pouch contracts axially. The "self-healing" name comes from the dielectric breakdown behavior: when a breakdown event burns a path through the liquid dielectric, fresh liquid flows back in and restores the insulation, so the device keeps working. Artimus Robotics is the Boulder spinout commercializing this.
[Figure: HASEL pouch. Voltage off (V = 0): flexible electrodes apart, fluid distributed evenly, pouch at rest length L₀. Voltage on (V = 8 kV): electrodes zip, fluid is displaced rightward, pouch contracts to 0.8 L₀.]

HASEL is genuinely strange: electrically driven (no pump), silent, self-healing, and biologically muscle-like in its compliance. The trade-off is the kilovolt drive. Running a HASEL needs a small high-voltage DC-DC boost converter and a switching network, which adds weight and cost — though the actuator itself can be a sub-gram strip of plastic.

The commercialization vehicle is Artimus Robotics, a Boulder spinout founded in 2018 by Eric Acome and others from Keplinger's lab. The company is small — about seven employees, roughly $4.5M in mostly-grant funding from NSF, DOE, and the UK's ARIA agency, with eight filed patents. In February 2026 they announced their next-generation HASEL with more than twice the mechanical output of the previous version, fully encapsulated for safer integration into robotic systems, and are seeking partners across humanoid robotics and industrial automation. Keplinger himself moved to a Max Planck directorship in Stuttgart in 2021, splitting the lab between Boulder and Germany.

HASEL has a real path: humanoid finger and forearm actuators, where the silent, compliant, lightweight properties are worth the high-voltage complexity, and where the hand's volume is too small for traditional motor + harmonic drive at the per-finger torque levels needed. Whether it ships in the next humanoid generation or the one after is the open question.

2.4   Twisted-coiled polymer fibers — the Baughman surprise

Fishing line as artificial muscle

Ray Baughman's 2014 Science paper that took the field by surprise

In 2014, Ray Baughman's group at the University of Texas at Dallas published a finding the soft-robotics field genuinely did not see coming: ordinary nylon fishing line, when tightly twisted into a coil and heated, contracts by 30% or more along its length. No motor, no pump, no kilovolt supply — just heat. The mechanism is a quirk of polymer physics called anisotropic thermal expansion: the polymer's molecular chains are aligned along the fiber's length, so heating the polymer causes its chains to relax and the fiber expands radially while contracting axially. Once the fiber has been twisted into a tight coil, that small radial expansion is geometrically forced to manifest as large axial contraction of the coil — the same way that pulling on the diagonals of a Chinese finger trap shortens its overall length.

Three-stage construction. Stage 1: a straight fiber of nylon, polyethylene, CNT yarn, or shape-memory polymer. Stage 2: twist under tension until the fiber starts to want to coil. Stage 3: continued twisting forces the coil. Then heat applied — the fiber's diameter expands ~5%, the coil's length contracts ~30%. Inverse Chinese finger trap, in muscle form.
[Figure: twisted-coiled fiber construction. 1, straight precursor fiber (nylon, CNT, PE, LCE). 2, twisted under tension. 3, coiled and heated: contracts 30%.]

The same trick works with several material families. Nylon fishing line (heated externally, simplest demo). Carbon nanotube yarns (electrically heatable, very low thermal mass, fastest cycle). Shape-memory polymers (different mechanism but same coiling-amplification trick). Conducting polymers driven electrochemically by a redox reaction. Liquid-crystal elastomers driven by light or heat. The Baughman lab has spent a decade taxonomizing the variants. Performance is impressive on paper: tensile strokes of 30%+, peak stress generation roughly 100× skeletal muscle, mechanical robustness measured in millions of cycles.

The catch is the catch that has dogged thermal soft actuators forever: thermodynamics is slow. Heating a fiber and waiting for it to cool sets a fundamental cycle-rate ceiling. CNT yarn muscles do better because their thermal mass is tiny, but the efficiency is brutal — most thermal artificial muscles dissipate 95%+ of input energy as waste heat. They're great when you want slow, silent, distributed actuation: morphing textiles, smart valves, prosthetic-finger curl, programmable facial expressions in animatronics. They're largely useless for legged locomotion or fast manipulation, where the joint cycles dozens of times per second.

Smart-textile applications are where this branch has actually shipped. Lintec of America (the US arm of Japan's Lintec, working closely with Baughman's lab) has commercialized CNT yarn artificial muscles for select industrial uses. The technology is also showing up in soft prosthetic and orthotic hands, and in research-grade morphing fabrics. The honest reading: this is the part of soft robotics that will probably end up inside clothing rather than inside robots.

2.5   Electrofluidic fiber muscles — the March 2026 closing of the McKibben loop

The pump moves inside the muscle

MIT + Politecnico di Bari, Science Robotics, six weeks before this guide was written

On March 25, 2026, soft robotics had its first genuinely uncomfortable moment in a long time. A team led by Ozgun Kilic Afsar at the MIT Media Lab and Vito Cacucciolo at Politecnico di Bari published in Science Robotics what reads, on first encounter, as a category violation: artificial muscle fibers that combine McKibben actuators with miniaturized electrohydrodynamic pumps in a sealed fluid loop, requiring no external reservoir, no compressor, no external pump of any kind.

The mechanism is genuinely new. An electrohydrodynamic (EHD) pump is a solid-state device that pumps liquid by injecting electric charge into a dielectric fluid and accelerating the resulting ions with a longitudinal electric field. There are no moving parts — no impeller, no diaphragm — just charge injection and field-driven flow. The MIT/Bari group built EHD pumps thin enough (~2 mm) and light enough (a few grams) to be part of the muscle fiber itself: a short pump segment in series with a McKibben segment in a closed loop, with the displaced fluid returning through a parallel channel. The muscle is electrically driven, untethered, and silent.

An antagonistic pair: two electrofluidic fibers sharing a closed dielectric fluid circuit. Apply voltage to fiber A's helical electrode pump section; ions accelerate through the dielectric, fluid flows from fiber B into fiber A's McKibben segment, A contracts, B extends. Flip polarity; the flow reverses. Nothing exits the system. Power density 50 W/kg — the same order as skeletal muscle.
[Figure: antagonistic electrofluidic fiber pair. Fiber A and fiber B, each with an EHD pump section (helical electrodes) and a McKibben segment; applied voltage 6 kV; A contracts. Closed dielectric fluid loop, no compressor, no external reservoir. 50 W/kg, 20% strain, 0.3 s response, 900 kPa per meter of pump, 4 kg lift over a 30 mm stroke (200× weight).]

The numbers, importantly, are real. Power density of 50 watts per kilogram, comparable to skeletal muscle. Contraction strain of 20%. Response time of 0.3 seconds. Fiber pumps generate up to 900 kPa per meter of pump length. Demos include an antagonistic bundle that lifts 4 kg with a 30 mm stroke, about 200× its own weight. The paper's authorship list includes the Tangible Media group at MIT under Hiroshi Ishii (better known for haptic interfaces) and Cacucciolo's RoboPhysics Laboratory in Bari. The work was co-funded by a European Research Council grant.
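A few lines of arithmetic show what the quoted figures imply at the scale of a single bundle. This is a back-of-envelope sketch using only the numbers stated above; the linear pressure-versus-length scaling and the 0.5 m pump length are assumptions for illustration, since the paper's figure is quoted as an upper bound.

```python
G = 9.81  # gravitational acceleration, m/s^2

def implied_actuator_mass(load_kg, load_to_weight_ratio):
    """Actuator mass (kg) implied by lifting load_kg at a given ratio."""
    return load_kg / load_to_weight_ratio

def work_per_stroke(load_kg, stroke_m):
    """Mechanical work (J) done lifting load_kg through stroke_m."""
    return load_kg * G * stroke_m

def pump_pressure_kpa(pump_length_m, kpa_per_meter=900):
    """Upper-bound pressure (kPa), assuming it scales linearly with length."""
    return kpa_per_meter * pump_length_m

print(implied_actuator_mass(4, 200))   # kg
print(work_per_stroke(4, 0.030))       # J
print(pump_pressure_kpa(0.5))          # kPa for an assumed 0.5 m of pump
```

The bundle in the demo works out to about 20 grams doing a little over one joule per lift. That is the gap the next paragraph describes: a humanoid forearm needs orders of magnitude more work per stroke than a single bundle delivers.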

This is what closes the McKibben loop. The 70-year tether problem — pump infrastructure heavier than the actuator — was solved, plausibly, last month. Whether it scales to humanoid-relevant force levels at acceptable efficiency is unproven. The Afsar/Cacucciolo paper shows fibers, pairs, and small bundles. A humanoid forearm needs hundreds of fibers in coordinated actuation, with thermal management, redundancy, and high-voltage drive electronics that aren't yet productized. But the fundamental impossibility — fluidic actuation without external infrastructure — has been removed from the impossibility list. Whether you read this as "soft robotics finally has a viable humanoid actuator" or "an interesting research result that needs a decade of engineering" depends mostly on temperament.

2.6   The other branches — DEAs, IPMCs, SMA, magnetic, hydrogels

The taxonomic completion

five more soft mechanisms that earn niche commercial use

The four branches above cover the bulk of where the field's energy is going, but the soft tree is wider. For taxonomic completeness, five more mechanisms deserve a paragraph each.

Dielectric elastomer actuators (DEAs) are HASEL's parent technology. A thin elastomer film is sandwiched between two compliant electrodes; voltage compresses the film through its thickness and squeezes it out laterally, producing in-plane area expansion. Ron Pelrine at SRI International pioneered this in the late 1990s. DEAs achieve up to 380% area strain in the lab but are limited by dielectric breakdown and fatigue. They're the "pure" electrostatic soft actuator; HASEL is the engineered descendant that traded peak strain for mechanical robustness by adding the dielectric fluid.

IPMCs (ionic polymer-metal composites) bend in response to a few volts of applied potential. The mechanism is electrochemical: a Nafion-like polymer plated with metal electrodes shifts ions when voltage is applied, swelling one side and contracting the other. Slow, weak, and biocompatible. The honest application space is biomedical (ingestible robots, microcatheter steering), where being soft and operating at low voltage matters more than force or speed.

Shape-memory alloys (SMAs), primarily Nitinol, a nickel-titanium alloy, change crystal phase at a transition temperature, contracting by 4–8%. Heat the wire (electrically or otherwise) above ~70 °C, it shortens. SMAs have been commercial for decades in medical stents, orthodontic wires, and release and deployment mechanisms on satellites; in robotics they're used where you need a small, simple, self-contained linear actuator and don't care about cycle speed.

Magnetic soft actuators embed magnetic micro- or nanoparticles in an elastomer and use external magnetic fields to deform the body in programmed ways. The Wood Lab at Harvard and Kim Lab at MIT have built capillary-scale magnetic robots that navigate cerebral blood vessels under fluoroscopy guidance. Niche, important, very early commercial.

Hydrogels swell or shrink in response to water, pH, or temperature. They are the slowest of the soft actuators (minutes to hours), and they require a wet environment. Their applications are almost entirely biomedical: drug-release scaffolds, soft contact lenses, implantable sensors that change conductance under physiological cues.

None of these will be the actuator of the next humanoid. All of them have real, live commercial applications somewhere — usually inside the body, on the skin, or at the smallest scales the rigid tree can't reach.

Synthesis

The four soft actuator industries: different physics, different players, different geographies

The mistake to avoid: treating "soft robotics" as one industry that will gradually mature. It is at least four parallel industries, with different mechanisms, different academic ancestors, different commercial vehicles, and very different time horizons. Two of the four have shipped commercial product for over a decade. One is six weeks old as of this writing.

| | Pneumatic soft | HASEL / electrohydraulic | Twisted-coiled fiber | Electrofluidic fiber |
| --- | --- | --- | --- | --- |
| Origin | 1957 (McKibben) | 2018 (Keplinger lab, Boulder) | 2014 (Baughman lab, UT Dallas) | March 2026 (MIT + Bari) |
| Drive mechanism | External compressor → bladder + braid | kV electrostatic zipping of dielectric fluid | Heat (electrical, photonic, electrochemical) | EHD ion injection → closed-loop fluid pressure |
| Tether? | Yes: compressor and hoses | No: only HV electronics | No: heat applied locally | No: sealed fluid loop, electrically driven |
| Status | Mature commercial | Early commercial | Niche commercial | Research · weeks-old paper |
| Players | Festo (DE), SMC (JP), Soft Robotics Inc. (US, now Berkshire Grey), Shadow Robot (UK), Roam (US) | Artimus Robotics (US, ~7 ppl), Keplinger lab dual US/DE | Lintec of America, Baughman lab (UT Dallas), Otherlab spinouts | MIT Media Lab (US), Politecnico di Bari (IT), Shea lab (CH) on EHD pumps |
| Where it ships | Factory grippers, soft exosuits, prosthetic hands | Haptics, evaluation kits, demo humanoid fingers | Smart textiles, prosthetic finger curl, animatronics | Lab demos only |
| Geography | DE/JP industrial, US Boston/Bay Area | US Colorado, DE Stuttgart | US Texas, JP/US joint | US East Coast, IT South, CH |

A humanoid robot shipping in 2030 will, plausibly, contain three rigid actuator industries (§1's harmonic-drive arms, QDD legs, planetary-geared manipulator wrists) and two soft actuator industries (HASEL or electrofluidic fingertips, pneumatic exosuit assists) operating side by side in one chassis. The "robot industry" framing in mainstream coverage is a category error five layers deep. The robot is a federation of substrates, and the most productive question to ask of any new robot announcement is: which actuator industry is this — actually?

§3 · Geometry of motion

Kinematics & mechanism: joints into chains, the math that decides what a robot can physically do

After the actuator, the next layer up is the kinematic chain — the geometric structure that turns local rotation into reaching a point in space. This layer doesn't sell. There's no Apple, no NVIDIA, no Stripe of kinematics. The mathematics was largely settled by the 1980s, the textbooks haven't changed much, and most working roboticists treat it as plumbing. That treatment is wrong. The kinematic topology decision — six-axis serial vs Delta vs Stewart, six DOF vs seven, parallel vs serial — determines a robot's workspace, its singularities, its speed, its precision, its dexterity, and its price ceiling. Like the actuator decision, it propagates upward through every layer above. This section walks the geometry from a single joint to a six-DOF arm to the parallel manipulators that invert the whole topology.

3.1   Joint primitives — every joint in every robot is one of two things

Revolute and prismatic

the alphabet from which every kinematic chain is spelled

A revolute joint rotates one body relative to another about a fixed axis. A door hinge. A human elbow (approximately). A robot shoulder. A prismatic joint slides one body along a linear axis. A drawer slide. A telescoping antenna. A 3D printer's gantry axis. Each contributes one degree of freedom — one independent way the joint can move. There are exotic joint types in mechanical engineering textbooks (universal, spherical, helical, cylindrical), but in working robotics, almost everything decomposes into chains of revolute and prismatic. Two letters in the kinematic alphabet, and every robot in the world is spelled from them.

A revolute joint rotates one rigid body about a fixed axis (left); a prismatic joint slides one rigid body along a fixed axis (right). Each contributes exactly one degree of freedom. Industrial robot arms typically use only revolute joints — six of them in serial chain. Cartesian gantries use only prismatic — three perpendicular axes. SCARAs mix: three revolute and one prismatic.
[Figure: two panels. REVOLUTE (R, 1 DOF, parameter: angle θ) and PRISMATIC (P, 1 DOF, parameter: distance d).]

The shorthand goes: an arm with all revolute joints is "RRRRRR" (six revolutes in serial — the standard industrial six-axis arm). A SCARA is "RRPR" (rotate, rotate, slide, rotate). A Delta robot is "3-RRR" parallel (three identical RRR chains converging on a moving platform). The notation is dry, but it's how kinematicians read off a robot's topology at a glance, the way an electrical engineer reads "R-LC" off a circuit.

The kinematic vocabulary also has spherical (S, 3 DOF, like a ball in a socket), universal (U, 2 DOF, two perpendicular revolutes meeting at a point), and cylindrical (C, 2 DOF, a revolute and a prismatic that share one axis and move independently). These appear in parallel manipulators (the Stewart platform's struts are S-P-S or U-P-S chains) but rarely in serial arms, because each of them is kinematically equivalent to a small chain of R and P joints, and machinists prefer building chains from joints they can manufacture cleanly.
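
The letter notation lends itself to a quick tally. The sketch below is only an illustration: the function name `chain_dof` is hypothetical, and it simply sums the per-joint DOF values quoted above along one serial chain. It does not compute the mobility of a closed parallel mechanism, which needs a loop-closure count that this section does not cover.

```python
# Per-joint degrees of freedom, as quoted in the text above.
JOINT_DOF = {"R": 1, "P": 1, "U": 2, "C": 2, "S": 3}

def chain_dof(chain: str) -> int:
    """Sum the joint DOF along one serial chain, e.g. 'RRPR' or 'S-P-S'."""
    letters = [c for c in chain.upper() if c.isalpha()]
    unknown = [c for c in letters if c not in JOINT_DOF]
    if unknown:
        raise ValueError(f"unknown joint letters: {unknown}")
    return sum(JOINT_DOF[c] for c in letters)

if __name__ == "__main__":
    for name in ("RRRRRR", "RRPR", "PPP", "S-P-S"):
        print(name, chain_dof(name))
```

A single S-P-S strut sums to seven joint DOF; one of those is the strut spinning about its own axis, which does not move the platform.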

3.2   The kinematic chain — why six joints, and where they go

Three for position, three for orientation

the math behind why six axes became sacred in industrial robotics

To position a rigid body anywhere in 3D space with any orientation, you need exactly six independent parameters: three for position (X, Y, Z) and three for orientation (roll, pitch, yaw — or any other three-angle parameterization). This is a theorem of rigid body mechanics, not a robotics convention. Six is the magic number because three-dimensional space plus orientation is six-dimensional. A kinematic chain with fewer than six joints cannot reach every pose in its workspace; one with more has redundancy — multiple joint configurations map to the same end-effector pose, which is sometimes a feature and sometimes a complication.

A six-DOF serial arm reaching toward a target. Watch the joints add capability one at a time. The first three (waist, shoulder, elbow — the arm's "regional" joints) place the wrist in the right region of space. The last three (the wrist roll/pitch/yaw — the "orientation" joints) point the end-effector at the target with the correct orientation. This division — three regional, three local — is universal across industrial six-axis arms.
[Figure: six-DOF serial arm, joints J1 (waist) through J6, reaching a target pose. Three regional joints J1, J2, J3 place the wrist; three wrist joints J4, J5, J6 orient the tool. 3 + 3 = 6 DOF, sufficient for an arbitrary pose.]

The "three regional + three wrist" decomposition is also why the strain-wave gear from §1 and the QDD from §1 can coexist in the same humanoid. Regional joints (waist, shoulder, elbow) move slowly and need high torque — perfect for QDD. Wrist joints (small, fast, precise) need zero backlash and high stiffness — perfect for harmonic drives. The actuator industry split that §1 closed with is the kinematic split this section opens with.

Six DOF became a near-universal industrial convention by the late 1980s. Fanuc, ABB, KUKA, Yaskawa, Kawasaki, Universal Robots — every general-purpose industrial arm in the world has six revolute joints in serial chain. The exception that proves the rule is the seven-DOF redundant arm — Franka Emika's Panda, KUKA's LBR iiwa, Kinova's Gen3 — built specifically for human collaboration, where the extra joint lets the elbow swing around an obstacle without the wrist losing position.

3.3   Forward & inverse kinematics — the easy direction and the hard direction

From angles to space, and back

one direction is matrix multiplication, the other took fifty years to solve generally

Forward kinematics answers: given the joint angles, where is the end-effector? Compose the transformation matrix of each joint, multiply them in chain order, read off the resulting position and orientation. For a six-DOF arm this is a handful of 4×4 matrix multiplies, which takes microseconds on any modern processor. The bookkeeping was settled by Denavit and Hartenberg's 1955 convention. The hard part is the bookkeeping, not the math.
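
As a minimal sketch of "compose and multiply", here is forward kinematics for a planar two-link arm using 3×3 homogeneous transforms. The planar simplification, the function names, and the unit link lengths in the usage note are assumptions for illustration; a real six-axis arm would use 4×4 transforms built from its own Denavit-Hartenberg parameters.

```python
import math

def joint_transform(theta, length):
    """Planar homogeneous transform: rotate by theta, then advance `length` along the new x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, length * c],
            [s, c, length * s],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def forward_kinematics(thetas, lengths):
    """Chain the per-joint transforms in order and read off (x, y, heading)."""
    pose = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for theta, length in zip(thetas, lengths):
        pose = matmul(pose, joint_transform(theta, length))
    x, y = pose[0][2], pose[1][2]
    heading = math.atan2(pose[1][0], pose[0][0])
    return x, y, heading
```

Calling `forward_kinematics((math.pi / 2, -math.pi / 2), (1.0, 1.0))` puts the tip at roughly (1, 1) with zero heading: the first link points straight up, the second points along +x.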

Inverse kinematics answers the reverse: given a desired end-effector pose, what joint angles get you there? This is genuinely hard. The answer is non-unique — most reachable poses have multiple joint configurations that satisfy them ("elbow up" vs "elbow down" vs "shoulder flipped"). The answer might not exist (target outside the workspace). The mapping has singularities (configurations where derivatives blow up). And the analytic solution depends on the specific arm geometry — most six-DOF arms with a spherical wrist (J4, J5, J6 axes intersecting at one point) have a closed-form solution; most others don't and require iterative numerical methods.

A two-link planar arm reaching the same point with two different joint configurations — "elbow up" and "elbow down". Forward kinematics maps each configuration to the same end-effector position. Inverse kinematics, asked the reverse question, must choose: which configuration? The choice is the kinematicist's problem, and good control software picks based on continuity (don't suddenly flip elbows) and obstacle avoidance.
[Figure: two panels. FORWARD KINEMATICS (angles → pose): θ₁ = 50°, θ₂ = −90° pass through the matrix product ∏ Tᵢ to give (x, y, θ) at point P; one input → one output. INVERSE KINEMATICS (pose → angles): (x, y, θ) solves non-uniquely to {50°, −90°} (elbow up) or {−25°, +90°} (elbow down); one input → multiple solutions (or none).]
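
The two-solution behaviour described above can be sketched in closed form for the planar two-link case using the law of cosines. The function names and link lengths are illustrative assumptions, and which branch counts as "elbow up" depends on the sign convention of the arm being modelled.

```python
import math

def two_link_fk(t1, t2, l1, l2):
    """Tip position of a planar two-link arm."""
    return (l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
            l1 * math.sin(t1) + l2 * math.sin(t1 + t2))

def two_link_ik(x, y, l1, l2):
    """Return both (theta1, theta2) solutions, or [] if the target is out of reach."""
    d2 = x * x + y * y
    cos_t2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_t2) > 1.0:
        return []  # target lies outside the annular workspace
    solutions = []
    for sign in (1.0, -1.0):  # the two elbow branches
        t2 = sign * math.acos(cos_t2)
        t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                           l1 + l2 * math.cos(t2))
        solutions.append((t1, t2))
    return solutions
```

Feeding either returned pair back through `two_link_fk` reproduces the target, which is the property the figure illustrates. Choosing between the branches (continuity, obstacle clearance) is left to the caller.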

Inverse kinematics for industrial six-axis arms with a spherical wrist was solved analytically by Pieper in 1968 — the wrist's three intersecting axes let the problem decouple into "where do you put the wrist" (regional inverse, 3-DOF) and "how do you orient it" (wrist inverse, also 3-DOF). This is one of the deepest reasons six-DOF arms with spherical wrists became universal: they admit closed-form IK solutions, and closed-form IK solves at full robot control rates (1 kHz) without any iterative method.

For arms without spherical wrists — most humanoids, redundant arms, mobile manipulators — the IK problem is solved iteratively, often with damped least-squares or with optimization (CasADi, Drake, KDL). The Jacobian is the matrix at the heart of all of these methods, which is the next subsection.
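
A minimal sketch of the iterative approach, assuming a planar two-link arm with hypothetical unit link lengths. Each step applies the damped least-squares update Δq = Jᵀ(JJᵀ + λ²I)⁻¹e, with the error clamped so the local linearization stays reasonable. The damping value, step clamp, and starting guess are illustrative choices, not values taken from any of the libraries named above.

```python
import math

L1, L2 = 1.0, 1.0  # hypothetical link lengths in metres

def fk(q):
    t1, t2 = q
    return (L1 * math.cos(t1) + L2 * math.cos(t1 + t2),
            L1 * math.sin(t1) + L2 * math.sin(t1 + t2))

def jacobian(q):
    t1, t2 = q
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]

def dls_step(q, target, damping=0.1, max_step=0.1):
    """One damped least-squares update toward the target position."""
    x, y = fk(q)
    ex, ey = target[0] - x, target[1] - y
    err = math.hypot(ex, ey)
    if err > max_step:  # clamp the commanded tip motion per iteration
        ex, ey = ex * max_step / err, ey * max_step / err
    J = jacobian(q)
    # A = J J^T + damping^2 I (2x2, symmetric, positive definite)
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b
    wx = (d * ex - b * ey) / det  # w = A^-1 e
    wy = (a * ey - b * ex) / det
    # dq = J^T w
    return (q[0] + J[0][0] * wx + J[1][0] * wy,
            q[1] + J[0][1] * wx + J[1][1] * wy)

def solve_ik(target, q0=(0.3, 0.3), iterations=200, tol=1e-6):
    q = q0
    for _ in range(iterations):
        x, y = fk(q)
        if math.hypot(target[0] - x, target[1] - y) < tol:
            break
        q = dls_step(q, target)
    return q
```

The damping term keeps the 2×2 system invertible even when the arm is stretched straight, at the cost of slightly slower convergence away from singular configurations.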

3.4   The Jacobian — the most important matrix in robotics

Velocities, ellipses, manipulability

the matrix that maps how fast the joints move to how fast the end-effector moves

The Jacobian is the matrix that relates joint velocities to end-effector velocities at a given configuration. If you spin joint 1 at 1 rad/s, how fast does the end-effector move in X? Y? Z? Roll? Pitch? Yaw? The answer is the first column of the Jacobian. The full matrix tells you, for every joint, how its velocity contributes to the end-effector's velocity. The Jacobian is a local linearization of the forward kinematics — it changes as the arm moves.

Why does it matter? Three reasons. First, it's how you do velocity control — given a desired end-effector velocity, invert the Jacobian to get the joint velocities that produce it. Second, it's how you do force control — by the principle of virtual work, the joint torques required to produce an end-effector force are the Jacobian transpose times that force. Third, it tells you when the arm is in a singular configuration — when the Jacobian loses rank, certain end-effector motions become impossible no matter how you spin the joints.
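
The first two uses can be sketched for a planar two-link arm, where the Jacobian is a 2×2 matrix. The link lengths and function names are assumptions for illustration; the relations themselves are the ones stated above: tip velocity is J times joint velocity, and joint torque is Jᵀ times tip force.

```python
import math

def jacobian(t1, t2, l1=1.0, l2=1.0):
    """Position Jacobian of a planar two-link arm (rows: x, y; columns: joint 1, joint 2)."""
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def ee_velocity(J, qdot):
    """v = J * qdot"""
    return (J[0][0] * qdot[0] + J[0][1] * qdot[1],
            J[1][0] * qdot[0] + J[1][1] * qdot[1])

def joint_torques(J, force):
    """tau = J^T * F (principle of virtual work)"""
    return (J[0][0] * force[0] + J[1][0] * force[1],
            J[0][1] * force[0] + J[1][1] * force[1])

def joint_velocities(J, v, eps=1e-9):
    """qdot = J^-1 * v; refuses to invert at a singular configuration."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < eps:
        raise ValueError("Jacobian is singular at this configuration")
    return ((J[1][1] * v[0] - J[0][1] * v[1]) / det,
            (-J[1][0] * v[0] + J[0][0] * v[1]) / det)
```

With the arm at θ₁ = 0, θ₂ = 90° the tip sits at (1, 1); spinning joint 1 at 1 rad/s gives the first column of J as the tip velocity, and a unit force in +y at the tip needs 1 Nm at joint 1 and none at joint 2.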

A two-link planar arm, with the manipulability ellipse drawn at the end-effector. The ellipse shows how a unit-magnitude joint velocity vector (any combination of θ̇₁ and θ̇₂) maps to end-effector velocity space. Long axis = direction the arm moves easily; short axis = direction it moves with mechanical disadvantage. Watch the ellipse rotate and squash as the arm sweeps through configurations. When the arm is fully extended, the ellipse collapses to a line — that's an elbow singularity.
[Figure: manipulability ellipse, the image of the unit joint-velocity sphere under the Jacobian J: v_ee = J(q) · q̇ (end-effector velocity = Jacobian × joint velocity). Long axis: direction the arm moves easily. Short axis: direction of mechanical disadvantage. When fully extended, the ellipse collapses to a line (singularity).]

The size of the manipulability ellipse is the manipulability index — a scalar measure of how dexterously the arm can move at the current configuration. Large ellipse = arm moves easily in many directions. Small ellipse = arm is near a singularity, mechanically disadvantaged. Modern motion planners use manipulability as a cost function: when there are multiple inverse kinematics solutions, prefer the one whose manipulability ellipse is largest in the direction of the next intended motion.
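
One common way to put a number on "size" is the scalar √det(JJᵀ), which for a square Jacobian equals |det J| and is proportional to the ellipse's area. The sketch below computes that scalar and the ellipse's semi-axes (the singular values of J) for a planar two-link arm. Function names and unit link lengths are illustrative assumptions.

```python
import math

def jacobian(t1, t2, l1=1.0, l2=1.0):
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def manipulability(t1, t2, l1=1.0, l2=1.0):
    """|det J| for the square 2x2 case; zero at a singularity."""
    J = jacobian(t1, t2, l1, l2)
    return abs(J[0][0] * J[1][1] - J[0][1] * J[1][0])

def ellipse_axes(t1, t2, l1=1.0, l2=1.0):
    """Semi-axes of the manipulability ellipse: square roots of the eigenvalues of J J^T."""
    J = jacobian(t1, t2, l1, l2)
    a = J[0][0] ** 2 + J[0][1] ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2
    half_trace = (a + d) / 2.0
    root = math.sqrt(max(half_trace ** 2 - (a * d - b * b), 0.0))
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))
    return major, minor
```

For this arm the index reduces to l₁·l₂·|sin θ₂|, so it peaks with the elbow at 90° and drops to zero when the arm is straight, matching the collapse of the ellipse to a line.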

If you remember one matrix from robotics, remember the Jacobian. Velocity control, force control, redundancy resolution, singularity avoidance, force/position hybrid control — every modern manipulation algorithm runs through it. Tomas Lozano-Pérez at MIT, Oussama Khatib at Stanford, Roy Featherstone at ANU — the field's foundational figures all built their careers around what the Jacobian and its variants tell you about a robot.

3.5   Singularities & redundancy — what breaks, and why a 7th joint helps

The wrist singularity, and how Franka escapes it

why Franka Emika's 7-DOF Panda costs three times what a 6-DOF cobot costs

A singularity is a configuration where the Jacobian loses rank — meaning the arm has six joints but the end-effector can only move in five (or fewer) directions. There are three classes. Wrist singularity: when joints 4 and 6 align (their axes become collinear), the wrist temporarily has only 2 effective rotational DOF instead of 3. Elbow singularity: when the arm is fully extended, the end-effector can no longer move radially outward. Shoulder singularity: when the wrist passes directly over the base axis, J1 becomes ineffective.

Watch a six-DOF arm sweep through a wrist singularity. As joints 4 and 6 align, the manipulability ellipsoid (drawn here as just an ellipse for clarity) collapses along one axis — the arm can still rotate the tool, but only in a degenerate plane. To pass through the singularity, the controller must either accept very high joint velocities (which exceed actuator limits) or use damped least-squares to gracefully reduce end-effector tracking accuracy near the singularity.
[Figure: at the SINGULARITY, J(q) loses rank and manipulability → 0. A 6-DOF arm passes through the singularity; a 7-DOF arm (+J7) swings its elbow around it.]

A seven-DOF arm has redundancy with respect to the standard six-DOF task. It can hold the end-effector at a fixed pose while the elbow swings around an obstacle (or away from a singularity). This is exactly what humans do when reaching past a coffee cup to grab a sandwich — the wrist stays put, the elbow moves. Six-DOF industrial arms can't do this; seven-DOF redundant arms can. Franka Emika's Panda (Munich, 7-DOF, ~$30K) is the canonical example. KUKA's LBR iiwa (Augsburg, 7-DOF, ~$80K) is the industrial-grade version. Kinova's Gen3 (Montreal) is the lighter cobot equivalent.

Redundancy isn't free. Seven joints means more actuators, more cabling, more cost, more failure modes, and a harder inverse-kinematics problem (the IK now has a one-dimensional null space — a continuous set of valid configurations rather than a finite set). The mathematical machinery to resolve the null space gracefully — pseudoinverse Jacobians, weighted least squares, gradient projection methods — is mature, but it's still computationally heavier than a closed-form 6-DOF IK. Seven-DOF arms are a tax you pay for safety and dexterity in human-shared spaces. The tax is roughly 2–3× the price of an equivalent 6-DOF arm.
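
The null-space idea can be sketched on the smallest redundant example: a planar three-joint arm doing a two-dimensional positioning task, which leaves one spare degree of freedom. The arm, link lengths, and function names are assumptions for illustration; the formula is the standard pseudoinverse resolution q̇ = J⁺v + (I − J⁺J)z, where z is any secondary joint motion (for example, "swing the elbow").

```python
import math

def jacobian3(q, lengths=(1.0, 1.0, 1.0)):
    """2x3 position Jacobian of a planar arm with three revolute joints."""
    a1 = q[0]
    a2 = a1 + q[1]
    a3 = a2 + q[2]
    l1, l2, l3 = lengths
    # Each column sums the contributions of every link outboard of that joint.
    col3 = (-l3 * math.sin(a3), l3 * math.cos(a3))
    col2 = (col3[0] - l2 * math.sin(a2), col3[1] + l2 * math.cos(a2))
    col1 = (col2[0] - l1 * math.sin(a1), col2[1] + l1 * math.cos(a1))
    return [[col1[0], col2[0], col3[0]],
            [col1[1], col2[1], col3[1]]]

def pinv(J):
    """Right pseudoinverse J+ = J^T (J J^T)^-1 for a full-rank 2x3 matrix."""
    a = sum(J[0][k] * J[0][k] for k in range(3))
    b = sum(J[0][k] * J[1][k] for k in range(3))
    d = sum(J[1][k] * J[1][k] for k in range(3))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("J J^T is singular at this configuration")
    inv = [[d / det, -b / det], [-b / det, a / det]]
    return [[J[0][i] * inv[0][j] + J[1][i] * inv[1][j] for j in range(2)]
            for i in range(3)]

def resolve_redundancy(J, v, z):
    """qdot = J+ v + (I - J+ J) z: track v, spend the spare DOF on z."""
    Jp = pinv(J)
    primary = [Jp[i][0] * v[0] + Jp[i][1] * v[1] for i in range(3)]
    N = [[(1.0 if i == j else 0.0) - (Jp[i][0] * J[0][j] + Jp[i][1] * J[1][j])
          for j in range(3)] for i in range(3)]
    secondary = [sum(N[i][j] * z[j] for j in range(3)) for i in range(3)]
    return [p + s for p, s in zip(primary, secondary)]
```

With v = (0, 0) and a nonzero z, the joints move while the tip stays put, which is the "wrist stays, elbow swings" behaviour described above.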

3.6   Serial vs parallel — the topological inversion

The Stewart platform and the Delta robot

what happens when you flip the chain inside-out

Every kinematic chain we've discussed so far has been serial: a single chain of links, base at one end, end-effector at the other. Each joint contributes its DOF, errors stack along the chain, and the structure cantilevers — meaning the base joint carries the full weight and torque of everything above it. The largest industrial six-axis arms can handle 1000 kg payloads, but they weigh 5 tons and the base joint is enormous.

The parallel manipulator inverts this. The end-effector is a moving platform, supported by multiple independent chains running in parallel from the base. Each chain shares the load. Errors don't stack — they're constrained by the geometric closure of the platform. Stiffness is far higher per kilogram. The trade-off: workspace shrinks dramatically (the legs interfere with each other), and the math gets harder (forward kinematics becomes the hard direction now; inverse is easy).

Two ways to move a moving platform to a target. Left: a six-DOF serial arm — one chain, cantilevered, large workspace, low stiffness. Right: a Delta robot — three parallel kinematic chains converging on a triangular platform. The Delta moves dramatically faster (300 picks per minute is routine) and stiffer, but its workspace is a small cone-shaped volume, and it can only translate the platform — orientation is fixed.
[Figure: two panels. SERIAL chain, 6-DOF arm: large workspace (~1.5 m radius), cantilevered, errors stack, base carries everything. PARALLEL 3-RRR Delta: small workspace (~30 cm cone), 300 picks/min, high stiffness, fixed orientation.]

The Stewart platform (built by V. E. Gough in the 1950s as a tire-testing rig, and proposed for flight simulation by D. Stewart in 1965) is the canonical six-DOF parallel manipulator: six prismatic struts connecting a fixed base to a moving platform via spherical and universal joints. Flight simulators use Stewart platforms to throw a 5-ton cockpit through 6-DOF motion at high frequency, which no serial arm could do. Machine tool spindles use them for stiffness. Telescope mounts use them for fine pointing.
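
Inverse kinematics being the easy direction for a parallel machine can be shown in a few lines: given the platform pose, each strut length is just the distance between its base anchor and its transformed platform anchor. The sketch below assumes a roll-pitch-yaw (Z-Y-X) orientation convention and a made-up hexagonal anchor layout; neither comes from any particular product.

```python
import math

def rotation_rpy(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]

def leg_lengths(base_points, platform_points, position, rpy):
    """Strut length i = | position + R * platform_point_i - base_point_i |."""
    R = rotation_rpy(*rpy)
    lengths = []
    for a, b in zip(base_points, platform_points):
        world = [position[i] + sum(R[i][k] * b[k] for k in range(3))
                 for i in range(3)]
        lengths.append(math.dist(world, a))
    return lengths

def hexagon(radius):
    """Hypothetical anchor layout: six points evenly spaced on a circle (illustration only)."""
    return [(radius * math.cos(k * math.pi / 3),
             radius * math.sin(k * math.pi / 3), 0.0) for k in range(6)]
```

With identical base and platform hexagons and the platform 0.5 m straight above the base, every strut is 0.5 m long. Going the other way (strut lengths to pose) has no such one-line formula, which is the inversion the text describes.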

The Delta robot (Reymond Clavel, EPFL Lausanne, 1985) is the three-DOF translational parallel manipulator: three identical chains converging on a moving platform that can only translate (no rotation). ABB's FlexPicker (IRB 360) commercialized it in 1999 and has dominated high-speed pick-and-place ever since: 300 picks per minute is routine, 600+ achievable. The 2010s consumer-electronics boom was built on Delta robots and SCARA arms running 24/7. Codian Robotics (Netherlands), Adept Quattro, and a wave of Chinese builders now compete in this space.

SCARA (Selective Compliance Assembly Robot Arm) is the four-DOF serial arm whose joint axes are all vertical and parallel: three revolute joints that move the tool in a horizontal plane plus a vertical prismatic. Hiroshi Makino at Yamanashi University invented it in 1981. It became the workhorse of electronics assembly because it's fast, cheap, and the horizontal plane is exactly where most pick-and-place happens. Epson, Yamaha, Mitsubishi, and Omron dominate SCARA today. Topology is destiny: pick the right kinematic structure for the workload, and a $30K SCARA outperforms a $300K six-axis arm at the task it was made for.

Synthesis

Topology is destiny: matching kinematic structure to workload

The pre-AI architecture decision in any robot project: pick the kinematic topology before anything else. Get it wrong and no amount of clever software recovers the mismatch. Get it right and the rest of the engineering is in service of the geometry. The seven canonical topologies map to seven distinct industrial niches.

| Topology | DOF | Workspace | Speed | Where it ships | Dominant suppliers |
| --- | --- | --- | --- | --- | --- |
| 6-DOF serial arm | 6 | Large hemisphere | Moderate | General industrial: weld, paint, assemble | Fanuc (JP), ABB (CH/SE), KUKA (DE), Yaskawa (JP) |
| 7-DOF redundant arm | 7 | Same as 6-DOF + null-space dexterity | Moderate | Cobots, surgery, lab automation | Franka Emika (DE), KUKA LBR iiwa (DE), Kinova (CA) |
| SCARA | 4 | Cylindrical, planar fast | High in plane | Electronics assembly, packaging, dispensing | Epson (JP), Yamaha (JP), Mitsubishi (JP), Omron (JP) |
| Delta (3-RRR parallel) | 3 | Small inverted cone | Very high (300+ pick/min) | Food, pharma, electronics pick-and-place | ABB FlexPicker, Codian (NL), Adept (US, now Omron) |
| Stewart platform (6-UPS) | 6 | Small workspace, full DOF | High, very stiff | Flight simulators, machine tools, telescope mounts | Bosch Rexroth, Moog (US), specialty builders |
| Cartesian / gantry | 3 (P-P-P) | Rectangular volume | Moderate | 3D printers, CNC, large-volume assembly | Custom, IGUS, Adept |
| Humanoid biped | ~25–30 | Locomotive, unbounded by chain | Slow today | Demos, factories, eventually homes | Boston Dynamics, Tesla, Figure, 1X, Apptronik, Unitree |

A humanoid robot in 2030 will be a federation of these topologies inside one chassis: bipedal multi-chain serial for the legs, two redundant 7-DOF serial arms for manipulation, parallel-actuated waist (the Stewart-platform inversion has shown up in the latest Boston Dynamics Atlas), and SCARA-like wrists optimized for in-plane fine motion. The kinematic design space is wider than any one company's mental model. Reading any new humanoid announcement, the most useful question to ask before "what AI runs it" is "what kinematic chain does it implement, and why."

§4 · The input layer

Sensors: five industries the robot needs at once

The actuator decides what's possible. The sensor decides what's perceivable. Between them sits the controller (§5), and around them sits the loop that makes a robot a robot rather than a machine. This section walks the input layer — the transducers, physical-to-electrical converters, that the rest of the stack reads from. We deliberately stop at the transducer. What's done with the sensor data — SLAM, scene understanding, vision-language-action models, the algorithmic stack that turns pixels and point clouds into beliefs about the world — is §8 Perception. Sensors are hardware here. Algorithms come later.

4.1   Encoders — the most fundamental robot sensor

Two principles, one job

knowing where the joint is, with arc-second precision

Before a robot can do anything intelligent, it has to know where its joints are. The encoder answers that question, and almost every joint in every robot in the world has one. Two flavors. Incremental encoders emit a pulse train as the shaft rotates — count pulses to track motion, but power-cycling loses absolute position and you have to "home" the joint on startup. Absolute encoders report the actual angle directly, even after a power cycle, by encoding position into a multi-track binary or single-track Vernier pattern read all at once.

Optical encoders (left) shine an LED through a glass disc with etched radial slits onto a photodiode — count crossings to track rotation, generating a pulse train. Magnetic encoders (right) read the field angle from a small magnet on the shaft directly — output is the angle itself, continuous and absolute. Heidenhain dominates the precision optical tier; AMS and RLS lead the chip-scale magnetic tier that ships inside every modern QDD actuator.
[Figure: two panels. OPTICAL (Heidenhain / Renishaw): LED, slotted disc, photodiode output; incremental pulses, 4096 per revolution. MAGNETIC (AMS / RLS): shaft magnet (N/S) over a Hall IC (AS5048, 14-bit angle θ); absolute angle, continuous output.]

Three sensing principles dominate. Optical encoders (Heidenhain in Germany, Renishaw in the UK — the precision tier) shine an LED through a glass disc with etched lines, count line crossings with a photodiode array, and reach 28-bit resolution at the top end. The moat is the glass — Heidenhain has spent five decades perfecting line-pitch tolerance on chrome-on-glass scales. The technology survives on factory floors but doesn't survive shock. Magnetic encoders (AMS in Austria, RLS in Slovenia, the AS5048 chip family) put a small magnet on the shaft and read its field with a Hall-effect or AMR sensor on the PCB, hitting 14–16 bits at a tenth the cost. They survive shock and vibration and are now standard inside every QDD actuator on every legged robot. Capacitive encoders are the rising third option for cost-sensitive integrations.
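
The bit counts above translate directly into angular resolution: an n-bit absolute encoder divides one revolution into 2ⁿ steps. The helper names below are hypothetical; the arithmetic is the only thing being shown.

```python
def encoder_resolution(bits):
    """Return (counts per revolution, degrees per count, arc-seconds per count)."""
    counts = 2 ** bits
    degrees = 360.0 / counts
    return counts, degrees, degrees * 3600.0

def counts_to_degrees(count, bits):
    """Convert a raw absolute-encoder reading to an angle in [0, 360)."""
    counts = 2 ** bits
    return (count % counts) * 360.0 / counts

if __name__ == "__main__":
    for bits in (14, 16, 28):
        counts, deg, arcsec = encoder_resolution(bits)
        print(f"{bits}-bit: {counts} counts, {deg:.6g} deg, {arcsec:.4g} arcsec per count")
```

A 14-bit magnetic encoder resolves about 79 arc-seconds per count; a 28-bit optical encoder resolves roughly 0.005 arc-seconds. Resolution is not accuracy, which also depends on the scale, mounting, and interpolation errors.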

The encoder you cannot see is the one inside the motor itself. Modern QDD actuators integrate a magnetic encoder onto the same PCB as the motor controller — it reads rotor angle for commutation, joint angle for position feedback, and sometimes joint torque indirectly via current sensing. One chip, three jobs, all because the sensor sits in the right place.

4.2   IMUs — which way is up

Coriolis force in a 2 mm² die

the same physics that deflects ocean currents, miniaturized

An Inertial Measurement Unit is three sensors in one chip: a 3-axis accelerometer, a 3-axis gyroscope, and (often) a 3-axis magnetometer. It tells the robot which way gravity points, how fast it's accelerating, how fast it's rotating, and roughly which compass direction it's facing. The MEMS revolution made this cheap. A modern accelerometer is a microscopic proof mass suspended on silicon springs, with comb fingers that change capacitance as the mass deflects under acceleration. A gyroscope drives a proof mass into oscillation along one axis, then measures the Coriolis force that appears on the perpendicular axis when the chip rotates.

A MEMS gyro's proof mass is driven into oscillation along one axis (cyan, vertical). When the chip rotates about the third axis, Coriolis force pushes the oscillating mass sideways (orange, horizontal). Comb-finger capacitors detect the perpendicular displacement; the signal magnitude reads angular rate. Same physics that deflects winds and ocean currents, in a 2 mm² die.
[Figure: MEMS gyro, top-down view. Drive axis, Coriolis sense axis, and rotation rate Ω are mutually perpendicular (drive ⊥ sense ⊥ rotation); three orthogonal axes inside one chip.]

Three price/performance tiers, with adjacent tiers separated by roughly 50–1000× in price and 10–100× in gyro drift. Consumer-grade (Bosch BMI270, TDK InvenSense ICM-42688, ST LSM6DSO): $1–$5, gyro bias drift ~10°/hour. Phones, drones, every humanoid robot. Tactical-grade (Honeywell HG1700, KVH 1750): $1,000–$10,000, drift ~1°/hour. Missile guidance, surveying, marine. Navigation-grade (Northrop Grumman LN-100G, Honeywell HG9900, Thales ring laser gyros): $50,000–$500,000, drift <0.01°/hour. Submarines, ICBMs, ships that need to hold position without GPS for weeks.

Even the best IMUs drift. Integrate noisy acceleration once for velocity, twice for position, and the error accumulates without bound. So every robot fuses IMU data with something else that doesn't drift — encoders, cameras, GPS, LIDAR — through Kalman-style filtering. The most underrated fact about IMUs is that nobody trusts them alone, and the entire field of state estimation exists because of this.
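
A small sketch of why fusion is unavoidable, using a single tilt angle. Integrating a gyro with a constant bias drifts linearly without bound, while blending in a drift-free but noisy reference (here, a tilt angle derived from the accelerometer) keeps the error bounded. The complementary filter below is a deliberately simpler stand-in for the Kalman-style filters mentioned above. The bias, sample rate, and blend factor are made-up illustration values.

```python
def integrate_gyro(rates, dt, angle0=0.0):
    """Dead-reckon an angle by summing rate * dt; any bias accumulates forever."""
    angle = angle0
    for rate in rates:
        angle += rate * dt
    return angle

def complementary_filter(rates, reference_angles, dt, alpha=0.98, angle0=0.0):
    """Blend the integrated gyro (trusted short-term) with a drift-free reference (trusted long-term)."""
    angle = angle0
    for rate, reference in zip(rates, reference_angles):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * reference
    return angle

if __name__ == "__main__":
    bias = 0.01   # rad/s, hypothetical constant gyro bias
    dt = 0.01     # s, i.e. a 100 Hz sample rate
    n = 10_000    # 100 seconds of data; the true angle is 0 throughout
    rates = [bias] * n
    reference = [0.0] * n
    print("gyro only:", integrate_gyro(rates, dt))
    print("fused:    ", complementary_filter(rates, reference, dt))
```

In this toy case the gyro-only estimate is off by a full radian after 100 seconds, while the fused estimate settles at a small constant offset of about alpha · bias · dt / (1 − alpha).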

4.3   Force / torque sensors — feeling what the joint feels

The Wheatstone bridge

turning microns of deflection into millivolts of signal

A robot needs to know not only where its joints are but what they're pushing against. The fundamental sensing element across both architectures below is the strain gauge Wheatstone bridge: four resistors arranged in a diamond, two stretching and two compressing under applied force. The differential voltage across the bridge converts microns of mechanical deflection into millivolts of electrical signal, with parts-per-million sensitivity.

Force applied to the flexure bends it slightly; gauges R1 and R3 (in tension) increase resistance, while R2 and R4 (in compression) decrease. The Wheatstone bridge cancels common-mode drift and amplifies the differential, producing a clean millivolt output proportional to applied force. This is the primitive that lives inside every joint torque sensor, every 6-axis F/T sensor, every load cell.
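[Figure: two panels. FLEXURE with strain gauges bonded to the surface: R1↑ and R3↑ in tension on top, R2↓ and R4↓ in compression on the bottom, ~10 µm deflection at full scale under force F. WHEATSTONE BRIDGE: R1, R2, R3, R4 between V+ and V−, with millivolt output V_AB across nodes A and B proportional to the applied force.]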
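
The bridge arithmetic can be sketched for the full-bridge case in the caption, where R1 and R3 increase and R2 and R4 decrease by the same amount. The wiring assumed here (R1 and R2 forming one half-bridge with midpoint A, R4 and R3 the other with midpoint B) is one common arrangement, not something the figure pins down, and the excitation voltage, nominal resistance, and gauge factor in the usage note are typical illustration values rather than data from any specific sensor.

```python
def bridge_output(v_excitation, r_nominal, gauge_factor, strain):
    """Differential output of a full Wheatstone bridge with four active gauges.

    Assumed wiring: V+ -> R1 -> node A -> R2 -> V-   and   V+ -> R4 -> node B -> R3 -> V-.
    """
    delta = r_nominal * gauge_factor * strain  # resistance change per gauge
    r1 = r3 = r_nominal + delta  # gauges in tension
    r2 = r4 = r_nominal - delta  # gauges in compression
    v_a = v_excitation * r2 / (r1 + r2)
    v_b = v_excitation * r3 / (r3 + r4)
    return v_b - v_a  # equals v_excitation * gauge_factor * strain

if __name__ == "__main__":
    # 5 V excitation, 350-ohm gauges, gauge factor 2, 500 microstrain
    print(bridge_output(5.0, 350.0, 2.0, 500e-6))
```

For these numbers the output is 5 mV on a 5 V supply, which is why the signal needs shielding and amplification close to the gauge. A temperature change that shifts all four resistors equally cancels in the difference.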

Two architectures dominate. Joint torque sensors sit in series with the actuator output, measuring strain on a flexible disc. Resolution: typically 0.01–0.1 Nm in joints handling 100+ Nm. Every Franka Panda, KUKA LBR iiwa, and most modern collaborative robots have these in every joint. 6-axis F/T sensors sit at the wrist between arm and end-effector, measuring three forces and three torques simultaneously through a Maltese-cross flexure with 6 or 8 bridges, calibrated by a 6×6 matrix. ATI Industrial Automation (North Carolina, 1989) has owned this market for thirty years; their Mini40 and Gamma sensors are the de-facto reference for academic robotics. Bota Systems (Swiss spinout, ~2020) is the newer entrant targeting humanoid integration. Robotous (Korea) and OnRobot (Denmark) round out the credible alternatives.
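
What "calibrated by a 6×6 matrix" means in practice can be sketched as a matrix-vector product: six bias-corrected bridge voltages go in, and a wrench (Fx, Fy, Fz, Tx, Ty, Tz) comes out, with off-diagonal terms correcting for mechanical cross-talk. The function name and every number in the example matrix are invented for illustration; a real sensor ships with its own factory-measured matrix.

```python
def wrench_from_bridges(calibration, bridge_volts, bias):
    """Map six bridge voltages to [Fx, Fy, Fz, Tx, Ty, Tz] via a 6x6 calibration matrix."""
    if len(calibration) != 6 or any(len(row) != 6 for row in calibration):
        raise ValueError("calibration must be 6x6")
    signals = [v - b for v, b in zip(bridge_volts, bias)]  # remove the zero-load offset
    return [sum(c * s for c, s in zip(row, signals)) for row in calibration]

if __name__ == "__main__":
    # Hypothetical calibration: 100 N (or Nm) per volt on the diagonal,
    # plus one small cross-talk term from bridge 2 into Fx.
    calibration = [[100.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
    calibration[0][1] = 5.0
    volts = [0.010, 0.020, 0.0, 0.0, 0.0, 0.0]
    bias = [0.0] * 6
    print(wrench_from_bridges(calibration, volts, bias))
```

Recording the bias with the sensor unloaded, and again whenever temperature shifts, matters as much as the matrix itself.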

The catch with F/T sensing is bandwidth-vs-noise. Strain gauges drift with temperature, the bridge's millivolt signals are easily corrupted by motor EMI, and the calibration matrix needs annual recharacterization. A good 6-axis F/T sensor costs $5,000–$15,000. A humanoid that wants them on both wrists and both ankles is committing $20,000–$60,000 to F/T sensing alone — which is one reason some humanoid programs replace them with current-sensing-based torque estimation through the QDD actuators (cheaper, less accurate).

4.4   Tactile sensors — the unsolved bottleneck

Five competing technology branches

the active research frontier and the honest bottleneck for dexterous manipulation

Tactile sensing is where the academic energy and the venture capital are flowing in 2026, because no robot manipulates dexterously without it. Five competing branches, no clear winner.

Five fingertip cross-sections, same scale. Each branch trades resolution against speed against robustness against cost. The vision-based gel sensors (GelSight, Digit 360) lead on spatial resolution; magnetic and barometric lead on robustness; capacitive and piezoresistive lead on response speed. The right choice depends entirely on what the robot is supposed to grasp.
[Figure: five fingertip cross-sections.]
VISION (GelSight): camera; 0.1 mm spatial; ~30 Hz; 1 mN; Meta · GelSight Digit 360, 2024.
MAGNETIC (ReSkin): magnetometer; ~3 mm spatial; ~100 Hz; robust; Meta FAIR open-source.
CAPACITIVE: array grid with ASIC readout chip; ~1 mm spatial; ~1 kHz; fast; Pressure Profile Systems, 15 yr.
PIEZORESISTIVE: resistor grid, ASIC scans rows + cols; ~0.5 mm spatial; ~500 Hz; fatigue; Tekscan, XSensor force-sensing pads.
BAROMETRIC: pressure cap over trapped air, reads ΔP; single-cell, no spatial; ~1 kHz; cheap; FingerVision, SynTouch (orig.).

The vision-based branch is the most active. The GelSight family — originated in Edward Adelson's MIT lab, commercialized by GelSight Inc. (CEO Youssef Benmokhtar) — uses a transparent silicone gel coated with reflective paint, illuminated from inside by colored LEDs, with a camera below capturing micron-level deformation under contact. The Meta–GelSight Digit 360, announced October 2024, packs 18+ sensing modalities into a fingertip-shaped puck and detects forces as small as 1 millinewton. High resolution; slow response (camera-framerate-limited); bulky.

The other branches occupy different points on the trade-off surface. Magnetic sensors (Meta FAIR's open-source ReSkin) embed magnetic micro-particles in elastomer and read field changes — cheap, robust, deliberately open. Capacitive arrays (Pressure Profile Systems) have shipped into PR2, Barrett hands, and academic platforms for fifteen years. Piezoresistive arrays (Tekscan, XSensor) trade fatigue life for sub-mm spatial resolution. Barometric sensors are the cheapest robust option for industrial grippers and prototype humanoid hands.

Status check on humanoid integration: Figure 03 features palm cameras and fingertip sensors detecting 3-gram forces; Tesla Optimus Gen 3 has tactile sensing in all fingers; almost every humanoid program has rolled its own fingertip sensor stack. No clear winner has emerged, and probably won't — different grasping tasks pick different sensor branches, and a future humanoid hand is likely to combine two or three of these modalities into a multi-layer "skin" rather than choosing one.

4.5   Range sensors — depth perception hardware

LIDAR, depth cameras, and the vision-only debate

the contested architectural decision in 2026 humanoid robotics ↓ ANIMATED

A robot also needs to know where the rest of the world is. Five technologies divide this layer. Spinning LIDAR sweeps a laser beam mechanically through 360° and measures time-of-flight per return — Velodyne–Ouster (US), Hesai (China), Robosense (China). MEMS / solid-state LIDAR steers the beam with a tiny mirror or optical phased array — Livox Mid-360, the canonical humanoid-friendly LIDAR at ~$700, ships on Figure 01. FMCW LIDAR measures range and velocity per pixel using frequency modulation — Aeva, SiLC, the next-generation alternative. Stereo / structured-light depth cameras — Intel RealSense D-series, Microsoft Kinect lineage, Orbbec — are cheap and indoor-only at <2 m range. Time-of-flight depth cameras use Sony IMX-series sensors in the Kinect successors and many humanoid heads.

A spinning LIDAR sweeps a laser beam through 360°, with each return generating a single 3D point. Over one rotation (~100 ms typical), tens of thousands of points accumulate into a sparse map of room geometry. The point cloud is what's actually delivered to the perception stack — the LIDAR doesn't see "walls" or "doors," it just emits points where the laser hit something.
[Figure: a Livox Mid-360 scanning at 10 Hz and 200,000 pts/s, beside the resulting point cloud with one obstacle. At those rates each rotation contributes ~20,000 points; room geometry is inferred from sparse returns.]
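How one return becomes one point can be sketched in a few lines. This is a minimal illustration, assuming an ideal sensor that reports round-trip time plus the beam's azimuth and elevation; the function names are hypothetical and do not correspond to any vendor SDK.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0

def range_from_time_of_flight(round_trip_s):
    """The pulse travels out and back, so the range is half the path length."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

def polar_to_xyz(range_m, azimuth_rad, elevation_rad):
    """Convert one return (range plus beam angles) to a point in the sensor frame."""
    horizontal = range_m * math.cos(elevation_rad)
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = range_m * math.sin(elevation_rad)
    return (x, y, z)

def points_per_rotation(points_per_second, rotation_hz):
    """How many returns accumulate during one full sweep."""
    return points_per_second / rotation_hz

# A return arriving 20 ns after emission, straight ahead and level (about 3 m away)
r = range_from_time_of_flight(20e-9)
point = polar_to_xyz(r, 0.0, 0.0)

# Using the 200,000 pts/s and 10 Hz figures quoted for the Mid-360
per_rotation = points_per_rotation(200_000, 10)
```

The perception stack receives nothing richer than a list of such `(x, y, z)` tuples; walls and doors are inferred downstream.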

The interesting current-state question is whether humanoids need LIDAR at all. Tesla's vision-based perception system operates without LIDAR: Optimus carries 8 cameras feeding an FSD-derived neural stack, and together they generate over 576 megapixels of data per second. Figure 01 uses a multi-sensor approach combining the Livox Mid-360 LIDAR with Intel RealSense D435i depth cameras for omnidirectional environmental perception. Agility's Digit has a spherical sensor head with LIDAR, depth cameras, and IMUs combined.

The vision-only-vs-fusion bet is one of the actively contested architectural decisions in humanoid robotics in 2026. Tesla's argument is that humans navigate with vision alone, that LIDAR adds cost and reliability burden, and that camera-fed neural nets benefit from the same data scale and infrastructure that drives Tesla's automotive FSD program. The counter-argument is that LIDAR works in low light, returns direct metric range on textureless or low-contrast surfaces that confuse vision, and dramatically simplifies SLAM. We return to this in §8 Perception, where the algorithmic stack that consumes these sensors lives.

4.6   The per-robot sensor budget — what's actually mounted, where

The federation, drawn

every sensor type, plotted at its actual mounting location on a 2026-class humanoid DIAGRAM

The five-industry framing in the closing case study reads as taxonomy. The same content drawn on a humanoid silhouette reads as architecture. Below is the sensor manifest of a notional 2026 humanoid. Counts and locations roughly match the publicly disclosed configurations of Figure 03, Tesla Optimus Gen 3, and Apptronik Apollo, with a vision-only variant overlaid on the same chassis to show the LIDAR architectural choice.

A 2026-class humanoid carries roughly 30 encoders (one at every joint), 2–6 IMUs distributed across head and torso, four 6-axis F/T sensors at wrists and ankles, 30+ tactile pixels across fingertips and palms, 6–8 cameras, and sometimes a LIDAR on top of the head. The visual density is the argument: a single robot is sourcing from five sensor industries simultaneously, and the procurement choices made here propagate upward into perception (§8) and out into the price tag.
[Figure: sensor manifest on a 2026-class humanoid silhouette (28-DOF, 5'8", vision + tactile + F/T). Encoders: ~30, every joint, $1k. IMUs: 2–6, head, torso, limbs, $30–200. F/T sensors: 4, wrists + ankles, 6-axis, $20k–60k. Tactile: ~30, fingertips + palms, $0.5k–50k. Cameras: 6–8, head, palms, torso, $200–2k. LIDAR (optional): 0–1, head, architectural choice, $0 or $700+. Total sensor budget: vision-only cost tier ~$3k; multi-sensor premium ~$80k+. Contested in 2026: Tesla Optimus and Apptronik run 8 cameras with no LIDAR; Figure, Agility, and 1X run cameras + LIDAR + depth (see §8 Perception).]

Two facts the diagram makes legible at a glance. Encoders are everywhere: every joint has at least one, and a 30-DOF humanoid has 30 of them, plus another 10–20 if the fingers are independently encoded. They're cheap and invisible, which is why the casual observer thinks of "robot sensing" as cameras and LIDAR while the actual sensor count is dominated by encoders ten-to-one. F/T sensors are the budget swing. A 4× ATI 6-axis F/T integration is a $20–60k procurement decision; replacing them with current-sensing torque estimation through the QDD actuators (the Tesla approach) drops that cost to nearly zero, at the price of accuracy and bandwidth. The cheap robot and the expensive robot can have nearly identical sensor diagrams at this resolution; the differences are in the part numbers, not the topology.

Synthesis

The five sensor industries: each with its own physics, players, and moat

Same framing that worked in §1 and §2: "the sensor industry" is a category error. There are at least five, with different physics, different players, different moats, different geographies. A humanoid robot in 2026 carries all five, simultaneously procured into one chassis.

| | Encoders | IMUs | Force/Torque | Tactile | Range |
|---|---|---|---|---|---|
| Substrate | Glass + photodiode (optical) or magnet + Hall IC | MEMS proof mass with capacitive comb fingers | Strain gauge Wheatstone bridge on flexure | Gel + camera, magnetic, capacitive, piezoresistive, barometric | Time-of-flight or FMCW laser, structured light, stereo |
| Where it lives | Every joint, integrated into actuator | Chassis, head, sometimes each limb | Each joint and/or wrists/ankles | Fingertips, palms, forearms (some) | Head; sometimes torso or none |
| Dominant suppliers | Heidenhain (DE), Renishaw (UK), AMS (AT), RLS (SI) | Bosch (DE), TDK InvenSense (US), STMicro (FR/IT); KVH and Northrop for tactical | ATI Industrial Automation (US), Bota (CH), Robotous (KR), OnRobot (DK) | GelSight + Meta (US), Pressure Profile Systems (US), Tekscan (US); fragmented | Velodyne–Ouster (US), Hesai (CN), Robosense (CN), Livox/DJI (CN), Intel/Sony (depth) |
| The moat | Manufacturing tolerance on glass scales (optical) or fab know-how (magnetic IC) | Foundry-scale MEMS process; tactical/nav-grade is calibration + qualification | Multi-axis calibration matrices; 30-yr customer trust | Open research; no clear winner; IP is the leverage | Optical engineering + lasers; for FMCW, photonic IC integration |
| Geography | DE/UK/AT premium; CN cost tier rising | DE/US/JP/FR consumer; US-heavy at the top tiers | US-dominant (ATI); CH/KR/DK challenging | US-academic origin, fragmented; Meta is the integrator | US/CN duopoly; CN aggressive on cost |
| Status | Mature · stable · slow-moving | Mature consumer · slow movement at top | Mature · stable supplier mix | Active research → early commercial | Fast-moving · contested architecture |
| Per-robot cost | $15–$50 × ~30 joints = ~$1,000 | $3–$30 × 1–6 IMUs = $30–$200 | $5,000–$15,000 × 4 wrists/ankles = $20,000–$60,000 | Highly variable: $50–$5,000 per fingertip × ~10 | $700 (Livox) to $10,000 (premium spinning) or $0 (vision-only) |

The per-robot sensor budget tells a story the actuator section already prefigured. A $20,000 humanoid (Tesla's target) cannot afford ATI 6-axis F/T sensors at every wrist and ankle, nor a $10,000 spinning LIDAR. A $250,000 Agility Digit can. The sensor procurement decisions made at this layer propagate upward into what perception is possible (§8), what manipulation is reliable (§9), and what the robot can be sold for. Like actuators, sensors are a federation of substrates, not one industry — and reading any humanoid spec sheet by sensor budget tells you more about the program's design philosophy than any number of demo videos.
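The budget swing can be made concrete with a small tally. The line items below are illustrative picks from inside the ranges quoted in this section, not a disclosed bill of materials for any robot, and the two build names are hypothetical.

```python
def sensor_budget(line_items):
    """Return the total sensor spend and each class's share of it."""
    total = sum(line_items.values())
    shares = {name: cost / total for name, cost in line_items.items()}
    return total, shares

# Illustrative USD figures chosen within the quoted ranges (assumptions)
vision_only = {
    "encoders": 1_000, "imus": 30, "force_torque": 0,   # torque from motor current
    "tactile": 500, "cameras": 1_500, "lidar": 0,
}
multi_sensor = {
    "encoders": 1_000, "imus": 200, "force_torque": 60_000,
    "tactile": 10_000, "cameras": 2_000, "lidar": 10_000,
}

cheap_total, _ = sensor_budget(vision_only)
premium_total, premium_shares = sensor_budget(multi_sensor)
biggest_item = max(premium_shares, key=premium_shares.get)
```

With these picks the two builds land near the ~$3k and ~$80k+ tiers, and the F/T line alone accounts for most of the gap.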

§5 · The loop

Control: five timescales running at once

The actuator decides what motion is possible. The sensor decides what's measurable. Between them sits the controller — the algorithm that closes the loop, reading sensors at one frequency and commanding actuators at another, doing this 100–10,000 times per second for the rest of the robot's operating life. Without this loop, a robot is dead; the actuators sit there, the sensors emit data, nothing intelligent happens. As with §1–§4, the framing that matters: there isn't one "control system." There are at least five layers, each running at a different timescale, each solving a different problem, each requiring its own hardware substrate to meet its deadlines.

5.1   PID — the 100-year-old workhorse

The most-deployed control algorithm in human history

three terms, eight characters of math, runs everything from kettles to humanoids ↓ ANIMATED

The Proportional–Integral–Derivative controller is the most-deployed control algorithm in human history, running in cruise control, reactor neutron flux regulation, the temperature loop in a coffee machine, and the current loop of every humanoid robot's actuators. The math is elementary. Take a setpoint (where you want the system to be), subtract the measured value (where it actually is), and you have an error. Build three terms from that error, each scaled by its own gain: P (proportional to the current error), I (proportional to the accumulated past error), D (proportional to the error's rate of change). Sum the three. That's your control output.

The signal flow of a PID loop. Setpoint enters, measured output is subtracted to produce error, error fans out to three parallel terms (P scales it, I integrates it, D differentiates it), the three terms sum into a control signal that drives the plant, and the plant's output feeds back to close the loop. Each term addresses a failure mode: P alone leaves a steady-state offset, I fixes the offset but introduces oscillation, D damps the oscillation by anticipating the slope.
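[Figure: PID block diagram. The setpoint enters a summing junction that outputs the error e(t); e(t) feeds three parallel branches, Kp (proportional), Ki ∫ (integral), and Kd d/dt (derivative); the branches sum (Σ) into the control signal u(t), which drives the plant (motor / joint); the plant output y(t) returns through the feedback path. An inset shows the step response.]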
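The loop above fits in a dozen lines. What follows is a minimal textbook sketch, not any motor driver's firmware: the first-order plant, the gains, and the timestep are all assumptions chosen so the steady-state offset of P-only control is easy to see.

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*sum(e*dt) + Kd*(change in e)/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        # No derivative on the first call: there is no previous error yet
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid, setpoint=1.0, steps=5000, tau=0.5):
    """Drive a hypothetical first-order plant dy/dt = (u - y) / tau with Euler steps."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y)
        y += (u - y) / tau * pid.dt
    return y


# P alone settles short of the setpoint; adding I removes the offset
p_only = simulate(PID(kp=2.0, ki=0.0, kd=0.0, dt=0.001))
full_pid = simulate(PID(kp=2.0, ki=4.0, kd=0.01, dt=0.001))
```

For this plant, P-only control settles at Kp/(1 + Kp) of the setpoint, about 0.67 here, while the version with integral action reaches 1.0.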

Tuning the three gains is the entire art — Ziegler-Nichols heuristics from 1942, Cohen-Coon, the modern auto-tuners that ship inside every motor driver. The history runs deeper than most engineers realize: Nicholas Minorsky published the first PID design in 1922, applied to USS New Mexico's automatic ship steering. The same algorithm 104 years later runs in the current loop of every humanoid robot's actuators.

The honest reading: PID is not a sophisticated controller, and most modern robotics textbooks introduce it almost apologetically. But it's everywhere because it works on systems whose dynamics aren't well-modeled, requires no model at all to deploy, and degrades gracefully when conditions change. Every more sophisticated control technique below is, at the lowest level, still calling PID loops as primitives.

5.2   Cascaded control & field-oriented control — three loops, three timescales

The nesting that mirrors the physics

position outside, velocity middle, current inside — each ten times faster than the one above ↓ ANIMATED

A robot joint doesn't run a single PID loop. It runs three, nested. The innermost loop is current control — given a torque commanded by the layer above, what current must flow through the motor windings? This runs at 10–20 kilohertz on a dedicated microcontroller inside the motor driver, using a transformation called Field-Oriented Control that mathematically rotates the 3-phase AC signal of a BLDC motor into a 2-axis DC representation, where it can be PID-controlled trivially. (Park and Clarke transforms, 1929 and 1943 respectively, are why this works.) The middle loop is velocity control at ~1 kHz. The outer loop is position control at 100–1000 Hz.

Three nested loops, each running at roughly 10× the rate of the one outside it. The position loop (outermost, slowest) commands a velocity setpoint to the velocity loop. The velocity loop commands a current setpoint to the current loop. The current loop commands voltages to the motor. Each layer treats the layer below as a primitive that responds essentially instantly. The cascade is not optional — different bandwidths, different physics, different latency tolerances.
[Figure: three nested loops. Position loop: 100–1000 Hz, embedded Linux, ~1 ms cycle. Velocity loop: 1 kHz, motor driver, ~1 ms cycle. Current loop: 20 kHz, FOC + PWM, 50 µs cycle; Park / Clarke transforms take 3-phase AC to 2-axis DC for PID. Each layer runs ~10× the rate of the layer outside it; the cascade exists because the physics decompose into nested timescales. Above position sits impedance, above impedance sits MPC, above MPC sits motion planning.]
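The "3-phase AC to 2-axis DC" step in the current loop is two small transforms. The sketch below uses the amplitude-invariant form of the Clarke transform, which is one common convention among several; the test signal `balanced_currents` is a hypothetical helper, not part of any driver API.

```python
import math

def clarke(ia, ib, ic):
    """Clarke transform: three phase currents to stationary (alpha, beta) axes."""
    alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    beta = (2.0 / 3.0) * (math.sqrt(3.0) / 2.0) * (ib - ic)
    return alpha, beta

def park(alpha, beta, theta):
    """Park transform: rotate (alpha, beta) into the rotor frame (d, q)."""
    d = alpha * math.cos(theta) + beta * math.sin(theta)
    q = -alpha * math.sin(theta) + beta * math.cos(theta)
    return d, q

def balanced_currents(theta, amplitude):
    """Hypothetical test signal: balanced phase currents aligned with the q axis."""
    shift = 2.0 * math.pi / 3.0
    return (-amplitude * math.sin(theta),
            -amplitude * math.sin(theta - shift),
            -amplitude * math.sin(theta + shift))

# Sweep the rotor through one electrical revolution: the phase currents are
# sinusoids, but (d, q) stays constant, which is what makes PID on them easy
dq_samples = []
for step in range(12):
    theta = step * math.pi / 6.0
    alpha, beta = clarke(*balanced_currents(theta, 10.0))
    dq_samples.append(park(alpha, beta, theta))
```

Every entry of `dq_samples` comes out as d ≈ 0 and q ≈ 10 regardless of rotor angle, so the controller regulates two near-constant numbers instead of three sinusoids.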

The cascade is not optional. Each layer has different bandwidth, different physics, different latency tolerance. Trying to control torque directly from position commands at the outer loop's slow frequency would saturate or oscillate. The cascade exists because the underlying physics decompose into nested timescales, and the controller architecture mirrors the physics. This nesting is recursive — above the position loop sits impedance, above impedance sits MPC, above MPC sits motion planning, above motion planning sits task planning. Each runs slower than the layer below, and each treats the layer below as a primitive it can call.

5.3   Impedance control — making rigid hardware feel compliant

Hogan, 1985: command a relationship, not a target

the conceptual leap that makes cobots possible on stiff motors PHYSICS

Neville Hogan published "Impedance Control: An Approach to Manipulation" in 1985 with one core insight: force control alone fails when contact dynamics are uncertain. If a robot is told to push with a fixed force and the wall it's pushing on suddenly moves (or isn't there), the robot accelerates uncontrollably. If a robot is told to move to a position and there's an obstacle in the way, the position controller tries to drive through it, breaking either the obstacle or the robot.

Impedance control rephrases the problem. Instead of commanding "go here" or "push with this force," command a virtual mechanical relationship between robot and environment. Make the joint behave like a virtual spring of stiffness K and damper of coefficient B around a setpoint position. Push the robot, it gives like a spring. Hit a wall, the spring compresses against it without breaking. Let go, it returns to setpoint with damped oscillation.

A robot end-effector connected to its commanded setpoint by a virtual spring (stiffness K) and damper (coefficient B). The spring and damper exist only in software — there is no physical spring on the robot. An external force perturbs the end-effector; the controller computes the spring/damper restoring force and commands the actuator to exert it. The robot's behavior matches that of a real spring-mass-damper system, even though the underlying hardware is a stiff motor + harmonic drive.
[Figure: end-effector at θ tied to the setpoint θ_des by a virtual spring (K) and a virtual damper (B), with an external force F_ext applied. Control law: τ = K(θ_des − θ) + B(θ̇_des − θ̇), i.e. commanded torque = virtual spring force + virtual damper force. Requires: fast joint torque sensing or estimation; a backdrivable actuator (low reflected inertia); a ~1 kHz inner loop running underneath.]

The math is straightforward in principle: τ = K(θ_des − θ) + B(θ̇_des − θ̇). Compute desired joint torque from position and velocity error, scaled by virtual stiffness and damping. The torque goes to the cascaded current loop below. What's tricky is that this requires fast and accurate joint torque sensing or estimation — without it, the virtual spring can't be enforced. Which is why this control mode lives natively on robots with joint torque sensors (Franka Panda, KUKA iiwa) or on QDD actuators where current sensing approximates joint torque (every modern legged robot).
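A short simulation shows the control law behaving like a spring. This is a sketch under idealized assumptions: a single joint with a hypothetical inertia, perfect torque tracking by the inner loop, and made-up values for K, B, and the external push.

```python
def impedance_torque(theta_des, theta, omega_des, omega, K, B):
    """tau = K(theta_des - theta) + B(omega_des - omega)."""
    return K * (theta_des - theta) + B * (omega_des - omega)

def simulate_push(K=50.0, B=2.0, inertia=0.1, dt=0.001,
                  push_torque=5.0, push_steps=1000, total_steps=4000):
    """Push on the joint for push_steps, then let go; setpoint stays at 0 rad."""
    theta, omega = 0.0, 0.0
    deflection_at_release = None
    for step in range(total_steps):
        tau_ext = push_torque if step < push_steps else 0.0
        tau_cmd = impedance_torque(0.0, theta, 0.0, omega, K, B)
        accel = (tau_cmd + tau_ext) / inertia
        omega += accel * dt          # semi-implicit Euler integration
        theta += omega * dt
        if step == push_steps - 1:
            deflection_at_release = theta
    return deflection_at_release, theta

held, released = simulate_push()
```

Under the push the joint settles at a deflection of push_torque / K (0.1 rad with these values), and after release it rings back to the setpoint. Changing K in software changes how stiff the joint feels, with no mechanical change.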

Impedance control is what gives modern cobots their "soft" feel even on rigid hardware, and what allows a quadruped to recover from being kicked without falling. The legged robotics control stack from MIT Cheetah onward is fundamentally impedance-controlled, and the entire idea of "the actuator should be backdrivable" (§1.3) traces back to making impedance control hardware-feasible.

5.4   MPC & whole-body control — solving optimization 1000 times per second

The receding horizon

plan ahead, execute one step, throw the rest away, replan PHYSICS

The next level up is where modern robotics genuinely differs from the 1980s. Model Predictive Control is conceptually a hammer: at each timestep, given a model of the robot's dynamics and a description of what you want it to do, solve a constrained optimization for the optimal sequence of actions over the next N timesteps, execute only the first action, throw away the rest, then re-solve next timestep. The horizon "rolls forward" with each control cycle.

A robot's center of mass tracks a path. At each control instant, the controller plans N future steps (faded preview), executes only the first one (committed, solid), then re-plans next instant with a fresh prediction. The unused tail of each plan is discarded; the robot never commits to a trajectory longer than one step. This rolling-horizon structure is what makes MPC robust to disturbances the model didn't anticipate.
[Figure: CoM path toward a goal with markers at now, t + 50 ms, and t + 100 ms (planning horizon); N = 10 predicted steps, of which only the first executes. QP solver (OSQP / hpipm): min Σ ‖x_k − x_ref‖² + λ‖u_k‖² s.t. dynamics, joint limits, friction cones; state x_k in, plan u_0..u_N out. Re-solve every 10 ms, execute only u_0, discard u_1..u_N, advance one step, repeat.]

The math is a quadratic program (QP): minimize a quadratic cost (tracking error plus control effort) subject to linear constraints (joint limits, linearized friction cones, contact-force bounds, collision-avoidance half-spaces). Modern solvers (OSQP, qpOASES, hpipm, ProxQP) solve each instance in well under a millisecond, which adds up to millions of solves per robot per day. Real-time convex MPC on quadrupeds, re-solved every few tens of milliseconds above a ~1 kHz torque loop, was the 2018-era breakthrough that made Spot, ANYmal, and the Cheetah lineage robust to terrain disturbances they'd never seen.
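The receding-horizon loop can be sketched without a QP solver if the inequality constraints are dropped, because the unconstrained problem reduces to regularized least squares. Everything below is an assumption made for illustration: a 1-D double-integrator stand-in for the center of mass, per-state weights on the tracking error, and NumPy's linear solve in place of OSQP or hpipm. Real controllers keep the constraints and use a proper solver.

```python
import numpy as np

def prediction_matrices(A, B, N):
    """Stack the dynamics so that [x_1..x_N] = Sx @ x0 + Su @ [u_0..u_{N-1}]."""
    n, m = B.shape
    Sx = np.zeros((N * n, n))
    Su = np.zeros((N * n, N * m))
    for k in range(N):
        Sx[k * n:(k + 1) * n, :] = np.linalg.matrix_power(A, k + 1)
        for j in range(k + 1):
            block = np.linalg.matrix_power(A, k - j) @ B
            Su[k * n:(k + 1) * n, j * m:(j + 1) * m] = block
    return Sx, Su

def mpc_first_action(x0, x_ref, Sx, Su, weights, lam, m=1):
    """Plan N steps of weighted tracking error plus lam * effort; keep only u_0."""
    N = Sx.shape[0] // x0.size
    w = np.tile(np.sqrt(weights), N)
    target = (np.tile(x_ref, N) - Sx @ x0) * w
    Sw = Su * w[:, None]
    H = Sw.T @ Sw + lam * np.eye(Su.shape[1])
    plan = np.linalg.solve(H, Sw.T @ target)
    return plan[:m]                     # the rest of the plan is discarded

dt, N = 0.01, 10                        # 10 ms steps, 100 ms horizon
A = np.array([[1.0, dt], [0.0, 1.0]])   # state: [position, velocity]
B = np.array([[0.5 * dt ** 2], [dt]])   # input: acceleration
Sx, Su = prediction_matrices(A, B, N)
weights = np.array([400.0, 1.0])        # assumed emphasis on position error

x = np.array([0.0, 0.0])
x_ref = np.array([1.0, 0.0])
first_u = None
for _ in range(500):                    # re-solve every step, execute one action
    u = mpc_first_action(x, x_ref, Sx, Su, weights, lam=1e-4)
    if first_u is None:
        first_u = float(u[0])
    x = A @ x + B @ u
```

The loop never commits to more than one step of any plan, which is the property the figure illustrates. Adding joint limits or friction cones turns the `np.linalg.solve` line into a constrained QP call.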

For humanoids, the architecture has converged on a two-layer pattern, documented in multiple 2024–2025 patents: MPC operates over a receding planning horizon (~100 ms) while whole-body control (WBC) resolves per-joint torques at every control timestep (~1 ms), with state estimation feeding the current robot state to both layers. The MPC layer plans where the robot's center of mass and feet should be; the whole-body controller below it solves an instantaneous optimization at every torque cycle, distributing the desired motion across all joints subject to dynamics, contact, and joint-limit constraints.

The contested layer is where reinforcement learning enters. Recent sim-to-real RL work has demonstrated humanoids robustly executing walking, jumping, parkour, dancing, and fall recovery, usually trained on large-scale human motion datasets or teleoperation. The 2024–2026 trend is toward using reinforcement learning to address the limitations of classical WBC, either by learning policies that output targets consumed by downstream WBC layers, or by replacing portions of the optimization solver itself. The honest current picture: classical MPC + WBC works and ships; RL-based controllers pick up novel skills faster but generalize poorly outside their training distribution; hybrid architectures (an RL policy producing targets for analytical WBC) are quietly winning in 2026, but no consensus has formed. We come back to this in §8 Perception, where the same architectural debate plays out one layer up.

5.5   The real-time stack — why ordinary Linux doesn't run robots

Hard real-time is a property of the whole system

PREEMPT_RT, Xenomai, ROS 2, and the federation of operating systems SOFTWARE

All of the above demands deterministic execution. A control loop running at 1 kHz must produce its output within 1 ms of receiving its input, every single cycle, forever. A 5 ms hiccup once an hour means the robot tips over once an hour. This is hard real-time, and it's a property of the whole software stack, not just the algorithm.
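Whether a loop actually meets its deadlines is something you measure from timestamps, not something you assume. The helper below is a hypothetical post-processing sketch, with an assumed tolerance of 0.2 ms; it is not a real-time facility itself.

```python
def deadline_report(timestamps_s, period_s=0.001, tolerance_s=0.0002):
    """Summarize cycle-to-cycle timing for a loop that should tick every period_s."""
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    jitter = [abs(interval - period_s) for interval in intervals]
    misses = sum(1 for interval in intervals if interval > period_s + tolerance_s)
    return {"worst_jitter_s": max(jitter), "missed_deadlines": misses}

# Hypothetical trace: a steady 1 kHz loop with a single 5 ms hiccup
stamps = [0.000, 0.001, 0.002, 0.007, 0.008]
report = deadline_report(stamps)
```

On this trace the report shows one missed deadline and a worst-case jitter of about 4 ms. A hard real-time stack is one where that count stays at zero over the robot's whole operating life.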

Standard Linux is not hard real-time. The kernel can preempt any user process to run housekeeping (filesystem flushes, network stack, USB hot-plug events, the page cache, you name it), and these preemptions can take tens of milliseconds. PREEMPT_RT is a long-running set of patches that has, after a 20-year merge process, become mainline as of Linux 6.12 in late 2024 — and it makes most kernel paths preemptible, getting jitter down to single-digit microseconds on commodity hardware. Xenomai is the older alternative, a co-kernel approach that runs a real-time scheduler alongside Linux. QNX and VxWorks are commercial RTOS options that ship in safety-critical applications.

The robot middleware standard is ROS 2, which replaced ROS 1's central master (the single process every node registered with) with a distributed publish/subscribe layer based on DDS (Data Distribution Service, an Object Management Group standard). DDS gives ROS 2 configurable quality-of-service for messaging across nodes: different parts of the robot's software can run on different machines, coordinate through DDS topics, and, on a suitably configured real-time system, meet timing deadlines that ROS 1 couldn't promise. Almost every research humanoid in 2026 runs ROS 2; commercial humanoid programs increasingly ship custom middleware on top of similar primitives.

The hardware substrate splits the load. The motor driver microcontroller runs FOC at 10–20 kHz on bare metal or a tiny RTOS — STM32, ESP32, or a custom ASIC. The embedded Linux board (NVIDIA Jetson, Intel NUC, or custom ARM SoC) runs PREEMPT_RT and handles position/impedance/MPC at 100–1000 Hz. The application processor (often a separate x86 or larger ARM) runs perception, planning, and AI policies at 10–60 Hz on standard Linux. Three boards, three operating systems, three timescales, all talking to each other through DDS or custom protocols. The hardware is a federation matching the control hierarchy's federation. We unpack the compute substrate itself in §6.

Synthesis

The timing pyramid: five orders of magnitude, five hardware substrates

Just as sensors are a federation of industries (§4.6), control is a federation of timescales. Five layers, each running roughly an order of magnitude slower than the one below, each solving a different problem, each running on a different hardware substrate whose latency matches the workload. A humanoid robot operates at all five simultaneously, and the architecture is layered specifically because no single algorithm could span them.

The control stack drawn as a timing pyramid. The fastest layer (motor commutation, ~1 µs) runs at the hardware boundary on a bare-metal microcontroller. The slowest (perception and policy, hundreds of ms) runs on a GPU-attached application processor. Each layer pulses at the correct relative rate; the visual rate ratio is the argument that no single processor or algorithm could span this range.
[Figure: timing pyramid, slowest layer at the top. Perception · policy (VLA models, scene understanding, task planning): 100 ms – 1 s, application Linux + GPU. MPC · motion planning (trajectory optimization, receding horizon): 10–100 ms, embedded Linux. Impedance · whole-body control (torque computation, QP solve, per-joint distribution): ~1 ms, embedded Linux + PREEMPT_RT. Current control (PID on Iq/Id, field-oriented control transforms): ~50 µs, motor driver MCU. Motor commutation (PWM generation, 3-phase switching, gate drivers): ~1 µs, bare-metal MCU / ASIC. Six orders of magnitude, five hardware substrates, one robot.]

The robot's control stack is not a hierarchy of algorithms — it's a hierarchy of physics-imposed timescales whose architecture maps onto hardware substrates whose latencies match. Try to run MPC at 20 kHz and the QP solver can't keep up. Try to run FOC at 100 Hz and the motor whines and stalls. Try to run perception at 1 kHz and the GPU melts. Each layer fits where it does because the underlying physics and the available silicon agree — and the entire stack collapses if any layer misses its deadline. The most underrated fact about robot control is that the speed of the slowest layer determines the speed of the slowest behavior, but the safety of the entire robot depends on the determinism of the fastest layer.

§6 · The binding constraint

Power & thermal: where physics stops being negotiable

A 175 cm humanoid carries roughly 1–2 kWh of battery. A human's "battery" (fat tissue and glycogen) stores roughly 100,000 kcal, or about 100 kWh, of metabolic energy, and body fat holds on the order of 10 kWh/kg against roughly 0.2 kWh/kg for a lithium-ion pack. We are about 50 times more energy-dense than our robots. This single fact governs more about humanoid design than any algorithm. Of all the layers in this guide, power and thermal are the layer where physics is least negotiable: you can't software your way around the energy density of lithium-ion or the thermal conductivity of aluminum. This section walks the four sub-problems (battery chemistry, the watt budget, thermal management, and the charging story), which is where the architectural choices made above all collide with operational reality.

6.1   Battery chemistry — what 250 Wh/kg actually buys you

The slow march of pack-level energy density

three decades of incremental gains, one promised step-change DATA

Modern humanoid robots run on lithium-ion, almost universally. Within Li-ion, several chemistries optimize different parts of the trade-off space. NMC (Nickel-Manganese-Cobalt) sits at 250–300 Wh/kg cell-level, ~200 Wh/kg pack-level after enclosure, BMS, and cooling — the high-energy-density default for Tesla Optimus, Figure, most Western humanoids. NCA (Nickel-Cobalt-Aluminum) edges slightly higher at the cost of cycle life — Tesla's 4680 cells. LFP (Lithium Iron Phosphate) drops to ~160 Wh/kg pack-level but trades that for safety (no thermal runaway), longer cycle life (3000+ cycles vs. 800–1500 for NMC), and lower cost. Common in Chinese humanoids and industrial AGVs. Solid-state has been promised for years and is still pre-commercial. Toyota and CATL have announced production lines for late 2027–2028. Theoretical 400+ Wh/kg, intrinsically safe, fast-charging.

Pack-level energy density over three decades. NMC has gained ~30% since 2010 and has plateaued. LFP has caught up but won't exceed ~180 Wh/kg without new chemistry. Solid-state is the projected step-change for 2028+, theoretical 350–400 Wh/kg. The vertical axis is the variable that closes most aggressively against humanoid mass budgets — every doubling of pack capacity comes with a doubling of pack mass at constant chemistry.
[Figure: pack-level energy density (Wh/kg, 0–400) versus year (1990–2040), with 2026 marked. Curves: NMC (~200 Wh/kg), NCA (Tesla 4680), LFP (~160 Wh/kg), and solid-state (projected 2028+, 350–400 Wh/kg). Annotations: Sony 1991, first commercial Li-ion; ~30% gain since 2010, plateauing; step-change if it ships.]

The math that matters: a humanoid that wants 4 hours of work at an average draw of 400 W needs 1.6 kWh of usable energy. At 200 Wh/kg pack-level, that's 8 kg of battery — about 12% of the robot's mass. Push for 8 hours, and the battery doubles to 16 kg, eating into payload and forcing structural reinforcement. The battery weight is the variable that closes most aggressively against everything else in the design.
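The sizing arithmetic above is worth having as a function, because every input is negotiable. The sketch uses the paragraph's own figures; the 67 kg robot mass is an assumption, chosen only because it is consistent with the quoted 12% share.

```python
def battery_for_runtime(avg_power_w, runtime_h, pack_wh_per_kg):
    """Usable energy needed for a shift, and the pack mass that implies."""
    energy_wh = avg_power_w * runtime_h
    mass_kg = energy_wh / pack_wh_per_kg
    return energy_wh, mass_kg

ROBOT_MASS_KG = 67.0   # assumed total mass, not a figure from any spec sheet

energy_4h, mass_4h = battery_for_runtime(400, 4, 200)   # 1600 Wh, 8 kg
energy_8h, mass_8h = battery_for_runtime(400, 8, 200)   # 3200 Wh, 16 kg
share_4h = mass_4h / ROBOT_MASS_KG                      # about 0.12
```

Swapping 200 Wh/kg for the ~160 Wh/kg LFP figure raises the 4-hour pack to 10 kg, which is the chemistry trade-off expressed in kilograms.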

The hidden variable is C-rate — how fast the battery can deliver energy relative to its capacity. A 1 kWh battery at 1C can deliver 1 kW continuously. Humanoid actuators draw 2–5C in bursts (a hip joint accelerating its own leg can pull 600 W instantaneously from a 200 W average draw). Battery sizing is rarely capacity-limited; it's peak power-limited, and the specs you'll see (0.5–2 kWh pack capacity) are usually picked to deliver the peak watt requirements first, with runtime coming out as the consequence.
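C-rate is just power divided by capacity, which makes the peak-power constraint easy to check. This sketch works in watts and watt-hours and ignores voltage sag; the 1200 W peak and the 2C limit are assumed example values.

```python
def c_rate(power_w, capacity_wh):
    """Discharge rate relative to capacity: 1C empties the pack in one hour."""
    return power_w / capacity_wh

def min_capacity_for_peak(peak_power_w, max_c_rate):
    """Smallest pack that can deliver the peak without exceeding its C-rate limit."""
    return peak_power_w / max_c_rate

one_c = c_rate(1000, 1000)                      # 1 kWh pack delivering 1 kW
burst = c_rate(1200, 500)                       # 0.5 kWh pack at a 1200 W peak
needed_wh = min_capacity_for_peak(1200, 2.0)    # pack size if cells tolerate 2C
```

A 0.5 kWh pack asked for 1200 W is running at 2.4C. If the cells are only rated for 2C, the pack has to grow to 600 Wh regardless of how much runtime is wanted, which is the sense in which sizing is peak-power-limited.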

6.2   The watt budget — where the energy goes

400 watts, broken down

locomotion 50%, compute 25%, manipulation 12%, the rest DATA

Where does the 400 W average actually go in a walking humanoid? Roughly: locomotion actuators 200 W, compute 100 W, manipulation actuators 50 W, sensors and comms 30 W, standby and BMS overhead 20 W.

A walking 2026-class humanoid's watt budget. Locomotion dominates because the robot is fighting gravity continuously through hips, knees, and ankles — even a "standing" robot draws 30 W per leg for gravity compensation. Compute is the rising fraction as VLA models move onboard. The "irreducible" wedge — sensors, comms, BMS — doesn't go away when the robot is idle, which is why standby power isn't zero.
[Figure: watt budget breakdown, total 400 W. Locomotion actuators (hips, knees, ankles; walking against gravity): 200 W, 50%. Compute (Jetson, custom inference SoC, GPU for VLA): 100 W, 25%. Manipulation actuators (shoulders, elbows, wrists, fingers): 50 W, 12.5%. Sensors & comms (cameras, IMUs, LIDAR, Wi-Fi/5G): 30 W, 7.5%. Standby + BMS overhead (never zero; doesn't go away when idle): 20 W, 5%. Peak draw during dynamic motion: 800–1200 W. Idle draw standing still: ~80 W.]

Three observations fall out of this.

Standing still is cheap, walking is moderate, dynamic motion is expensive. A humanoid demoing parkour pulls 800–1200 W; the same robot waiting for a command pulls 80 W. The ratio between "doing nothing" and "doing the impressive thing" is 10–15×, which is one reason demo videos are short: battery, not just thermal.

Compute is the rising fraction. As robots move from teleoperation to onboard VLA models, compute power draw climbs from 5% of the budget to 25–30%. The cost of the AI revolution in robotics is not just chip prices; it's watts, which means it's runtime.

Inefficiency is everywhere. Motor efficiency is around 80%, gearbox efficiency 60–75% (harmonic drives are particularly bad: 60% peak, falling steeply at light loads), DC-DC converter efficiency 90%, and BMS overhead 2–3%. Of the energy leaving the battery, only about half, and often less, arrives at the joint as useful mechanical work. The rest becomes heat, which is the next subsection.
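The last observation is a product of stage efficiencies, so it is quick to check. The sketch multiplies the figures quoted above, treating BMS overhead of 2–3% as an efficiency of 0.98–0.97; real efficiencies vary with load, so these are bracketing cases rather than measurements.

```python
from math import prod

def chain_efficiency(stages):
    """Overall efficiency of stages in series is the product of each stage."""
    return prod(stages.values())

best_case = {"bms": 0.98, "dcdc": 0.90, "motor": 0.80, "gearbox": 0.75}
worst_case = {"bms": 0.97, "dcdc": 0.90, "motor": 0.80, "gearbox": 0.60}

best = chain_efficiency(best_case)      # about 0.53
worst = chain_efficiency(worst_case)    # about 0.42

# Heat produced at a 400 W average draw in the worst case
heat_w = 400 * (1 - worst)
```

Between 42% and 53% of the battery's energy reaches the joint under these assumptions, and the gearbox is the single largest lever: moving it from 60% to 75% is worth roughly ten points of overall efficiency.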

6.3   Thermal management — where the heat goes

The heat path from joint to ambient

200 W of waste heat distributed across 28 actuators · the under-discussed engineering problem ↓ ANIMATED

Energy that isn't motion is heat. A humanoid drawing 400 W average is dissipating 200–250 W as heat continuously. That heat has to go somewhere, and the heat path through a robot is one of the most under-discussed engineering problems in the field. The dominant heat sources are the actuators, specifically the motor windings, where I²R losses concentrate in copper coils packed into a few cubic centimeters. A QDD motor running at 200 W mechanical output is dissipating 30–60 W into its housing. Run all 28 actuators that hard at once, as in a burst of dynamic motion, and the total reaches roughly 1–2 kW of heat distributed across the robot's joints, with each joint having maybe 200 cm² of surface area to shed it.

The heat path inside a QDD actuator. Copper windings (red, the I²R loss source) heat the stator iron, which heats the motor housing, which conducts into the structural aluminum, which radiates and convects to ambient air. Each interface is a thermal resistance; the cumulative resistance from winding to ambient is what limits sustained torque. Liquid cooling (right inset) shortcuts most of the path by carrying heat directly to a remote radiator.
[Figure: two heat paths. Passive (convection + conduction): windings ~120°C max → stator iron ~85°C → motor housing ~60°C → chassis Al ~40°C → ambient air ~25°C; ~30% of peak torque sustainable indefinitely. Liquid cooling (Atlas-style): coolant in at ~30°C, out at ~50°C, pump and radiator; ~80%+ of peak torque sustainable.]
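The passive path behaves like resistors in series: each interface adds a temperature rise equal to heat flow times its thermal resistance. In the sketch below the resistances are back-computed from the figure's stage temperatures at an assumed 45 W of loss (the midpoint of the 30–60 W range), so they are illustrative values, not measured ones.

```python
AMBIENT_C = 25.0
ASSUMED_LOSS_W = 45.0   # assumption used only to back out the resistances

# (stage, thermal resistance in degC per watt), ordered from ambient inward
STAGES = [
    ("chassis", (40.0 - 25.0) / ASSUMED_LOSS_W),
    ("housing", (60.0 - 40.0) / ASSUMED_LOSS_W),
    ("stator", (85.0 - 60.0) / ASSUMED_LOSS_W),
    ("winding", (120.0 - 85.0) / ASSUMED_LOSS_W),
]

def stage_temperatures(loss_w, ambient_c=AMBIENT_C, stages=STAGES):
    """Steady-state temperature at each stage for a given continuous loss."""
    temps = {}
    temperature = ambient_c
    for name, resistance in stages:
        temperature += loss_w * resistance
        temps[name] = temperature
    return temps

def max_sustained_loss(limit_c, ambient_c=AMBIENT_C, stages=STAGES):
    """Largest continuous loss that keeps the winding at or below limit_c."""
    total_resistance = sum(resistance for _, resistance in stages)
    return (limit_c - ambient_c) / total_resistance

at_45w = stage_temperatures(45.0)
at_60w = stage_temperatures(60.0)
budget_w = max_sustained_loss(120.0)
```

At 60 W of loss the modelled winding sits near 152°C, well past the 120°C ceiling, so the actuator has to back off. Liquid cooling raises sustained torque by lowering the total resistance, not by raising the temperature limit.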

Three thermal management approaches appear in current humanoids. Passive convection is the default: heat flows through the motor housing, into the structural aluminum, and radiates from the chassis. It works for low duty cycles and limits sustained workload to ~30% of peak. Forced air adds a small fan that pushes air across motor housings or chassis vents, costing 2–5 W of fan power and some noise while raising sustained workload to ~50% of peak. Liquid cooling runs a pumped coolant loop from cold plates on the actuators or compute to a radiator. It adds 30–50 W of pump power but enables 80%+ sustained workload. Boston Dynamics Atlas (electric) uses liquid cooling on its actuators; most others don't.

The thermal limit shows up as the continuous-vs-peak torque divergence. A humanoid actuator's spec sheet might say "400 Nm peak / 80 Nm continuous." That 5× ratio is set by thermal: peak torque can be sustained for seconds before the motor windings overheat; continuous torque is what's sustainable indefinitely without exceeding the insulation's thermal class. The fact that humanoid demos are usually short — 3 to 10 minute clips — is partly thermal, not just battery.
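The link between the heat path and the continuous-vs-peak ratio can be sketched numerically. The example below is a minimal sketch, not a model of any real actuator: the four thermal resistances are hypothetical values chosen so that a 45 W loss roughly reproduces the temperature ladder in the figure, and the torque scaling assumes copper loss goes as current squared while torque is proportional to current (no magnetic saturation, constant winding resistance).

```python
import math

# Hypothetical per-interface thermal resistances in K/W. They are illustrative,
# picked so that ~45 W of loss gives windings near 120 C with 25 C ambient.
THERMAL_PATH_K_PER_W = {
    "winding -> stator iron": 0.78,
    "stator iron -> housing": 0.56,
    "housing -> chassis": 0.44,
    "chassis -> ambient": 0.33,
}


def winding_temperature(loss_w, ambient_c=25.0, path=THERMAL_PATH_K_PER_W):
    """Steady-state winding temperature for a series chain of thermal resistances."""
    return ambient_c + loss_w * sum(path.values())


def max_continuous_loss(limit_c, ambient_c=25.0, path=THERMAL_PATH_K_PER_W):
    """Largest heat flow the path can carry without the windings exceeding limit_c."""
    return (limit_c - ambient_c) / sum(path.values())


def sustainable_torque_fraction(continuous_loss_w, peak_loss_w):
    """Fraction of peak torque that is thermally sustainable.

    Assumes loss ~ I^2 * R and torque ~ I, so loss scales with torque squared.
    """
    return math.sqrt(continuous_loss_w / peak_loss_w)


if __name__ == "__main__":
    print(f"windings at 45 W loss: {winding_temperature(45.0):.1f} C")
    print(f"max continuous loss at 120 C limit: {max_continuous_loss(120.0):.1f} W")
    # A 5x peak/continuous torque ratio implies ~25x the copper loss at peak.
    print(f"torque fraction at 1/25 of peak loss: {sustainable_torque_fraction(1.0, 25.0):.2f}")
```

Under these assumptions a 5× torque ratio corresponds to roughly 25× the copper loss at peak, which is why peak torque is a matter of seconds rather than minutes.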

The compute side is its own thermal problem. An NVIDIA Jetson AGX Orin needs ~40 cm² of heatsink + active fan to sustain its 60 W. The Tesla Optimus integrates a custom inference chip with a vapor-chamber cooler. The chip and the joint motors compete for the same thermal budget — both want to dump heat into the same chassis air, and both throttle if the chassis temperature climbs above ~50 °C. Thermal coupling between compute and actuation is the under-noticed design constraint that pushes all 2026 humanoid designs toward distributed compute (some on the SoC near the camera, some near the motor drivers, none in a single thermal hotspot).

6.4   Charging — the operational story

Three architectures, three operational profiles

wall plug, hot swap, inductive — the difference between 6 hours and 22 DATA

The runtime number on the spec sheet is half the picture. The other half is what happens when runtime ends. Three architectures observed in 2026.

Wall-plug charging is the simplest — robot returns to a charging dock, plugs in (or rolls onto a contactless pad), charges for 1–2 hours, resumes. Apollo, Digit, Optimus all default to this. The constraint: 4 hours of work + 1 hour of charging means a single robot covers about 80% of an 8-hour shift; two robots rotating shifts cover continuous operation.

Hot-swappable battery packs have the robot carry a removable pack, with a human or another robot swapping it in 1–3 minutes. Atlas uses autonomous belly-mounted battery hot-swap, which takes about 3 minutes per swap; Apollo requires a human to physically change its battery packs. The constraint: pack interchangeability requires standardization, and the humanoid industry has none — every program ships its own pack form factor.

Inductive / wireless charging has the robot stand on a charging plate; induction transfers power without contacts. Figure 03 uses inductive foot-coil wireless charging at 2 kW with 10 Gbps mmWave data offload, allowing it to autonomously dock, charge, and resume work. The most operationally elegant solution; the most expensive to manufacture. Efficiency 85–92%, vs. 95%+ for cabled.

Specific runtime numbers from the 2026 fleet: Apollo targets 3PL, retail and manufacturing with a 160-pound frame, swappable 4-hour battery and 55-pound payload. Figure 01's 864 Wh H1 battery pack delivers approximately 5 hours of operational runtime — one of the longest battery lives in the full-size humanoid category. Tesla Optimus Gen 2 carries a ~2.3 kWh pack targeting "1 day on light tasks." Boston Dynamics electric Atlas: ~1 hour at full duty, several hours light duty.

The gap between runtime and wall-clock availability is the operational payoff. A 4-hour-runtime robot with 1-hour charging gets ~80% wall-clock duty. A 5-hour-runtime robot with hot-swap gets effectively 100%. The architectural decision about charging — wall plug vs. hot swap vs. inductive — is the difference between a robot that does 6 hours of useful work per day and one that does 22.
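The duty figures above follow from a simple steady-state ratio, sketched here. The function names are illustrative, the 3-minute swap time is taken from the hot-swap paragraph, and the model ignores partial cycles, travel time to the dock, and battery aging.

```python
def wall_clock_duty(runtime_h: float, downtime_h: float) -> float:
    """Fraction of wall-clock time spent working in a repeating work/recharge cycle."""
    return runtime_h / (runtime_h + downtime_h)


def useful_hours(runtime_h: float, downtime_h: float, window_h: float) -> float:
    """Average working hours inside a window of window_h hours."""
    return window_h * wall_clock_duty(runtime_h, downtime_h)


if __name__ == "__main__":
    # (runtime in hours, downtime per cycle in hours, window in hours)
    scenarios = {
        "wall plug, 4 h run + 1 h charge, 8 h shift": (4.0, 1.0, 8.0),
        "hot swap, 5 h run + 3 min swap, 24 h day": (5.0, 3.0 / 60.0, 24.0),
    }
    for name, (run_h, down_h, window_h) in scenarios.items():
        duty = wall_clock_duty(run_h, down_h)
        hours = useful_hours(run_h, down_h, window_h)
        print(f"{name}: duty {duty:.0%}, about {hours:.1f} useful hours")
```

The same wall-plug robot run around the clock would average more hours than it does in a single shift, so the comparison depends as much on the staffing window as on the charging architecture.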

Synthesis

The energy gap

50,000× — the physics that governs everything else

Humans are absurdly more energy-dense than humanoids. A human carries roughly 100,000 kWh of metabolic energy and a peak sustained output of ~250 W (an athlete). A humanoid carries 1–2 kWh of usable battery and a peak sustained output of ~400–600 W. The robot has more peak power; the human has 50,000× more endurance. This isn't a quirk of current battery chemistry — it's a fundamental energy-density gap between Li-ion (~250 Wh/kg) and the chemistry inside us (glucose oxidation in mitochondria, ~4,000 Wh/kg of fat, 16× denser).

The energy gap drawn at scale. The human's energy-storage tank dwarfs the humanoid's pack by four orders of magnitude. Peak power output is roughly comparable — a humanoid can sprint as hard as we can, briefly. Sustained power and runtime are where the gap shows. Solid-state batteries close the chemistry gap to ~10×. Nothing on the horizon closes it past 5×.
[Figure: energy storage drawn to scale. Human: ~100,000 kWh stored, 250–400 W peak, ~80–100 W sustained walking, refuel in ~minutes. 2026 Li-ion humanoid: ~1–2 kWh stored (the orange sliver at left is true to scale), 400–600 W peak, ~300–400 W sustained walking, refuel in 1–3 hours.]
| | Human | 2026 humanoid | Ratio |
|---|---|---|---|
| Energy storage | ~100,000 kWh (fat + glycogen) | 1–2 kWh (Li-ion pack) | 50,000–100,000× |
| Peak power output | ~250–400 W (athlete sprint) | ~400–600 W (peak duty) | ~1× (rough parity) |
| Walking sustained | ~80–100 W | ~300–400 W | 0.25× (humanoid worse) |
| Refuel time | ~minutes (food) | 1–3 hours (charge) | 30× slower |
| Useful work / "charge" | ~16 hours waking life | ~4 hours runtime | 4× shorter |

Two observations make the table land. The humanoid is roughly comparable to a human on peak power and worse on everything else; robots can sprint as hard as we can, briefly, but the endurance gap is what they can't close with current chemistry. And the energy-density gap is mostly fundamental physics — lithium-ion stores ~250 Wh/kg; glucose oxidation in mitochondria stores ~4,000 Wh/kg of fat. We are running our robots on a chemistry that stores 16× less energy per kilogram than the chemistry inside us. Solid-state batteries close this to ~10×. Nothing on the horizon closes it past 5×. Every architectural decision above this layer is in some sense a workaround for the energy-density gap. QDD actuators (§1.3) trade torque ceiling for efficiency. Vision-only perception (§4.5) trades robustness for watts saved. Hybrid RL+WBC controllers (§5.4) trade explainability for compute efficiency. The robotics industry has spent ten years finding ingenious ways to do more with 1 kWh, and the next ten years of progress are likely to track battery chemistry as much as algorithm progress.

§7 · The brain

Compute & software stack

three computers, four operating systems, one robot

§5 ended with a federation of timescales. §6 ended with a federation of watts. §7 picks both up: the silicon that runs each timescale, and the software stack that ties them together. The framing that matters here, again, is federation. A 2026 humanoid is not "a Jetson with some peripherals." It's a small distributed system — typically three or four discrete compute substrates, three or four operating systems, four or five middleware protocols, all running simultaneously on a single robot. The most underrated fact about humanoid software is that "the AI" is one process out of dozens, and the rest of the stack — drivers, RT control, state estimation, networking, telemetry — is what determines whether the robot ships.

7.1   Onboard SoCs — the silicon for physical AI

The Jetson Thor moment

2070 TFLOPS at the edge · the chip humanoids waited four years for DATA

The compute substrate decision is the most visible architectural choice on a humanoid. For most of 2020–2024, onboard inference meant NVIDIA Jetson AGX Orin — 275 TOPS at INT8, 60 W, 32 GB memory. Adequate for classical perception and small policy networks; emphatically inadequate for running 7B-parameter VLA models at conversation speed. The 2024–2026 transition has been about closing that gap.

The single most important hardware event of 2026 for humanoid programs was the August 2025 release and 2026 general availability of NVIDIA Jetson AGX Thor. Jetson Thor delivers up to 2070 FP4 TFLOPS of AI compute and 128 GB of memory with power configurable between 40 W and 130 W, providing 7.5× higher AI compute than NVIDIA AGX Orin and 3.5× better energy efficiency. It pairs a Blackwell-architecture GPU with a 14-core Arm Neoverse-V3AE CPU and ships with the NVIDIA Holoscan sensor framework, Isaac robotics platform, and GR00T foundation model integration.
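The 7.5× and 3.5× figures can be reproduced from the headline numbers quoted in this section. The sketch below does only that arithmetic; the `Soc` class and function names are illustrative, and the two compute figures use different number formats (INT8 TOPS for Orin, FP4 TFLOPS for Thor), so the ratio is a headline comparison rather than a like-for-like benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Soc:
    name: str
    ai_compute: float   # headline figure: INT8 TOPS (Orin) or FP4 TFLOPS (Thor)
    memory_gb: int
    max_power_w: float


# Values as quoted in the text above.
ORIN = Soc("Jetson AGX Orin", ai_compute=275.0, memory_gb=32, max_power_w=60.0)
THOR = Soc("Jetson AGX Thor", ai_compute=2070.0, memory_gb=128, max_power_w=130.0)


def compute_ratio(new: Soc, old: Soc) -> float:
    """Ratio of headline AI compute."""
    return new.ai_compute / old.ai_compute


def efficiency_ratio(new: Soc, old: Soc) -> float:
    """Ratio of headline compute per watt at each chip's maximum power."""
    return (new.ai_compute / new.max_power_w) / (old.ai_compute / old.max_power_w)


if __name__ == "__main__":
    print(f"compute:    {compute_ratio(THOR, ORIN):.2f}x")
    print(f"efficiency: {efficiency_ratio(THOR, ORIN):.2f}x")
    print(f"memory:     {THOR.memory_gb / ORIN.memory_gb:.0f}x")
    print(f"power:      {THOR.max_power_w / ORIN.max_power_w:.2f}x")
```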

The Jetson generational jump puts data-center-class compute inside a humanoid's chassis. From Orin (275 TOPS, 32 GB, 60 W) in 2022, to Thor (2070 TFLOPS, 128 GB, 130 W) in 2026 — a 7.5× compute leap, 4× memory, in 2× the watt budget. The chips that matter for humanoid programs in 2026 are the Thor at the high end, the Tesla custom inference SoC (specs unpublished), and the various x86/ARM application processors handling the slower planning loops.
[Figure: three SoC cards.
Jetson AGX Orin (2022, the previous default): AI compute 275 TOPS · memory 32 GB · power 60 W · CPU 12-core Arm A78 · capable of classical CV and <1B-parameter policies.
Jetson AGX Thor (2026, the new default, Blackwell GPU): AI compute 2070 TFLOPS (7.5× Orin) · memory 128 GB · power 40–130 W · CPU 14-core Neoverse-V3AE · capable of 7B+ VLA, multi-modal, LLM.
Tesla Optimus (FSD-derived custom SoC, Tesla AI5): AI compute undisclosed · strategy: vertical, same as FSD · scale leverage: millions of car SoCs · cost target ~$200/unit at scale · capable of FSD policy + Grok integration.
Adopters of Thor: Boston Dynamics Atlas, Figure, Agility Digit Gen 6, Amazon, Meta. Evaluating: OpenAI, John Deere.]

The split that matters in 2026: buy NVIDIA (the path Boston Dynamics, Figure, Agility, Amazon, Meta, and most research labs took, riding the Isaac/GR00T software stack); or roll your own (the path Tesla and a handful of Chinese humanoid programs took, leveraging existing automotive AI silicon and avoiding the per-unit cost premium). The NVIDIA-buy path is faster to ship; the in-house path scales cheaper at volume. Most programs building on Jetson in 2026 will eventually move to custom silicon, but only if they ship enough volume to amortize a custom SoC, which is a meaningful "if."

Below the AI accelerator sits the real-time control compute, which is a separate silicon problem entirely. The Thor's 14-core Neoverse-V3AE handles application-grade workloads, but hard-real-time control loops at 1 kHz and above usually run on dedicated microcontrollers (STM32H7 family, ESP32-S3, occasionally a Cortex-R52) bolted to motor drivers and inertial sensors. The motor commutation at 20 kHz runs on yet smaller chips embedded in the FOC ESCs themselves. The "robot's compute" is at least three different silicon tiers — application AI, real-time control, and motor-level commutation — and they're physically distributed across the chassis to match the thermal and latency constraints from §5 and §6.

7.2   The three-computer model — train, simulate, deploy

The cloud-edge pipeline

NVIDIA's framing the rest of the industry now uses ↓ ANIMATED

NVIDIA's three-computer model has become the de facto framing for humanoid software architecture, used (with variants) by every major program.

Computer 1: Training. A GPU-rich data center where foundation models (VLA, world models, motion priors) are trained on vast datasets — teleoperation logs, internet video, synthetic data, motion-capture libraries. NVIDIA DGX with H100/B100 chips, or analogous infrastructure (Tesla's Cortex 2.0 supercomputer at Giga Texas, OpenAI partner data centers for Figure).

Computer 2: Simulation. A workstation or render farm that runs physically-accurate simulation at scale — for training reinforcement learning policies, generating synthetic data, validating before deployment. NVIDIA Isaac Sim and Isaac Lab (built on Omniverse), MuJoCo (DeepMind's physics simulator, now MJX-accelerated for GPU parallelism), Drake (Toyota Research Institute's optimization-focused simulator).

Computer 3: Onboard runtime. The Thor or custom SoC running inference at the edge. The trained model from Computer 1, validated against Computer 2's simulator, deployed to Computer 3's chip.

The three-computer pipeline. Foundation models train in a cloud data center, are validated and refined in simulation on workstations, then deploy to onboard inference silicon on the robot. Data flows back from deployed robots to the simulator and training cluster, closing the loop. The pipeline is what enables a humanoid program to ship a foundation-model-driven robot at all — every loop is a different OS, framework, and time scale.
[Figure: three-stage pipeline, left to right, with a return arrow.
1 · Training (cloud data center; DGX · H100 / B100; Tesla Cortex · OpenAI): trains VLA models, motion priors, world models. Timescale: weeks. Output: model.
2 · Simulation (render farm / workstation; RTX · OVX · Omniverse): Isaac Sim / Lab, MuJoCo (DeepMind), Drake (TRI), PyBullet, Gazebo. Timescale: hours–days. Output: policy.
3 · Onboard (edge inference; Jetson Thor for most, Tesla AI5 or custom SoC): runs VLA inference, perception, planning, control dispatch. Timescale: milliseconds.
Return arrow: deployment data → retraining (the loop closes).]

The pipeline is asymmetric. Training is slow and expensive — weeks of compute on hundreds of GPUs to produce a foundation-model checkpoint, costing $100K–$10M depending on scale. Simulation is medium-speed — hours to days for a policy training run with 4096 parallel environments. Onboard inference is fast and cheap — milliseconds per forward pass on a Thor that costs $3,499 retail. The asymmetry is what enables the whole approach: pay enormous training costs once, amortize across every robot, run cheaply at the edge.
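The amortization argument is simple division, sketched below with the figures from this paragraph. The fleet sizes are hypothetical, and the function ignores simulation cost, retraining, and everything on the robot other than the compute module.

```python
def compute_cost_per_robot(training_cost_usd: float,
                           fleet_size: int,
                           onboard_compute_usd: float = 3499.0) -> float:
    """One-time training cost spread across the fleet, plus per-unit onboard silicon."""
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")
    return training_cost_usd / fleet_size + onboard_compute_usd


if __name__ == "__main__":
    training_cost = 10_000_000.0  # upper end of the $100K-$10M range quoted above
    for fleet in (100, 1_000, 100_000, 1_000_000):  # hypothetical fleet sizes
        per_robot = compute_cost_per_robot(training_cost, fleet)
        print(f"fleet of {fleet:>9,}: ${per_robot:,.0f} of compute cost per robot")
```

At small fleet sizes the training run dominates; at large ones the per-unit silicon does, which is the same break-even logic behind the buy-versus-build choice in §7.1.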

The closing-loop arrow is the underrated detail. Robots in deployment generate data — sensor logs, success/failure cases, edge-case scenarios — that feeds back into the training and simulation loops. Tesla's Cortex 2.0 supercomputer at Giga Texas reportedly delivers 250 MW of compute in its first phase, scaling to 500 MW by mid-2026, specifically to train Optimus on the data its deployed fleet generates. The robot fleet is, at scale, both a deployment target and a training-data factory — and that flywheel is the strategic argument behind every "ship 1 million units" claim from a humanoid program.

7.3   Middleware — ROS 2, DDS, and how processes talk to each other

Publish, subscribe, deadline

the federation of processes the robot is, glued together by message passing SOFTWARE

A 2026 humanoid runs ~30–80 distinct processes simultaneously: per-joint motor drivers, a state estimator, an MPC solver, a perception pipeline, the VLA policy, a teleoperation backend, multiple loggers, watchdog timers, network bridges, voice synthesis, sometimes a small LLM for natural-language interaction. Almost no two of them are written by the same team or in the same language. They communicate through middleware, and the middleware decision is more architecturally important than most people new to the field realize.

The dominant choice in 2026 is ROS 2, the rewrite of the original Robot Operating System that fixed the centralized-broker problem of ROS 1. ROS 2 is built on DDS (Data Distribution Service), a publish-subscribe standard maintained by the Object Management Group (OMG) and used in domains like telecom and aerospace. Each process publishes data to "topics" and subscribes to topics it cares about; the DDS layer handles serialization, network routing, quality-of-service guarantees, and cross-machine discovery. DDS implementations include Fast DDS (eProsima, the default in most ROS 2 distributions), Cyclone DDS (Eclipse Foundation, a widely used alternative), and commercial options like RTI Connext.

The reason DDS won is its quality-of-service (QoS) configuration. Each topic can be configured for reliability vs. best-effort, deadline-monitored vs. unmonitored, durable (last value persisted for late-joining subscribers) vs. volatile, with bounded vs. unbounded latency. A camera frame topic might be best-effort and volatile (drop frames if the receiver is slow); a joint-torque-command topic might be reliable, deadline-monitored at 1 ms, and trigger a watchdog if a deadline is missed. The QoS settings are where the real-time guarantees from §5 actually live — they're not a property of the algorithm, they're a property of the message bus.
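The two example topics above can be written down as data. The sketch below is plain Python that mirrors the QoS knobs, not the ROS 2 API: `QosProfile`, `Reliability`, `Durability`, and `missed_deadlines` are hypothetical names defined here, and the volatile durability on the torque topic is an assumption. In a real ROS 2 node the corresponding settings live in `rclpy.qos.QoSProfile`, and the middleware raises the deadline event itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Reliability(Enum):
    BEST_EFFORT = "best_effort"   # drop samples rather than retry
    RELIABLE = "reliable"         # retransmit until acknowledged


class Durability(Enum):
    VOLATILE = "volatile"                 # late joiners get nothing old
    TRANSIENT_LOCAL = "transient_local"   # late joiners get the last value(s)


@dataclass(frozen=True)
class QosProfile:
    reliability: Reliability
    durability: Durability
    deadline_us: Optional[int] = None     # None means the topic is not deadline-monitored


# The two topics described in the paragraph above.
CAMERA_FRAMES = QosProfile(Reliability.BEST_EFFORT, Durability.VOLATILE)
JOINT_TORQUE_CMD = QosProfile(Reliability.RELIABLE, Durability.VOLATILE, deadline_us=1000)


def missed_deadlines(arrival_times_us: List[int], qos: QosProfile) -> List[int]:
    """Indices of messages that arrived more than deadline_us after the previous one."""
    if qos.deadline_us is None:
        return []
    return [
        i for i in range(1, len(arrival_times_us))
        if arrival_times_us[i] - arrival_times_us[i - 1] > qos.deadline_us
    ]


if __name__ == "__main__":
    arrivals = [0, 1000, 2000, 4500, 5500]   # microseconds; one 2.5 ms gap
    late = missed_deadlines(arrivals, JOINT_TORQUE_CMD)
    if late:
        print(f"watchdog: torque command deadline missed at message index {late}")
```

Timestamps are integers in microseconds so that a gap of exactly 1 ms is never misjudged because of floating-point rounding.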

The competing options matter because not everyone uses ROS 2. Boston Dynamics ships its proprietary middleware on Atlas (legacy from Spot), with ROS 2 as an SDK bridge. Tesla Optimus reportedly uses a custom in-house framework derived from FSD's distributed runtime, on the theory that ROS 2 is too generic for a vertically integrated stack. Apptronik Apollo and Figure use ROS 2 for development with custom production stacks. The Chinese humanoid wave (Unitree, AGIBOT, XPENG IRON) uses ROS 2 plus custom extensions. 1X NEO is one of the few that ships ROS 2 nearly stock.

The practical effect: a humanoid-robotics engineer in 2026 needs ROS 2 fluency to be employable across the industry, but most production humanoids run something at least partially custom underneath. The skills transfer; the codebases don't.

7.4   Simulation stack — Isaac, MuJoCo, Drake

Three philosophies of physics simulation

and the sim-to-real gap that won't go away DATA

You cannot train a humanoid policy in the physical world. The compute, time, and breakage cost are prohibitive — millions of falls, billions of timesteps, all needed before a robot can stand up reliably. Modern humanoid programs train almost entirely in simulation, then transfer to physical hardware. The gap between simulator and reality — the sim-to-real gap — is the single biggest unsolved problem in humanoid software, and three simulator philosophies have emerged in response.

Three simulators dominate humanoid robotics in 2026, each optimized for a different workload. Isaac Sim (NVIDIA) maximizes parallelism — 4096 robot environments running in parallel on a single GPU, ideal for RL training. MuJoCo (DeepMind) maximizes physics fidelity per unit compute, recently GPU-accelerated as MJX, used heavily by academic labs and motion-prior pipelines. Drake (Toyota Research Institute) maximizes optimization-friendliness, with smooth differentiable contact dynamics for trajectory optimization and MPC research.
[Figure: three simulator cards.
Isaac Sim / Lab (NVIDIA, Omniverse-based): strength massive parallelism, 4096 envs / GPU · physics PhysX 5 + custom · visuals RTX path-traced · used for RL, synthetic data, GR00T · core integration with Jetson.
MuJoCo (DeepMind, formerly Roboti LLC): strength contact-rich physics; MJX is GPU-parallel · physics native MuJoCo solver · visuals simple OpenGL · used for academic work, motion priors · open-source, Apache 2.0.
Drake (Toyota Research Institute): strength smooth differentiability, trajectory optimization · physics custom, MIT lineage · visuals Meshcat / minimal · used for MPC research, TRI Atlas · BSD-licensed, academic-heavy.]

The honest picture: most production humanoid programs use two simulators in their pipeline, not one. Isaac Sim or MuJoCo for high-volume RL training where parallelism dominates; Drake or Pinocchio for trajectory optimization and MPC research where differentiability matters. The choice is rarely "pick the best simulator" and usually "pick the right pair."

The sim-to-real gap remains the unsolved problem. Simulators model rigid-body dynamics excellently, contact dynamics tolerably, and friction, deformable materials, soft tissue, fluids, and tactile feedback poorly. Policies trained in simulation routinely fail on real hardware in ways that surprise their developers — a gripper trained in MuJoCo to pick up cubes will reliably crush them on real hardware because the simulated finger pads are infinitely stiff. Mitigation strategies — domain randomization (perturb sim parameters), real-data fine-tuning (mix real teleoperation data into the training set), online adaptation (fine-tune the policy on the real robot's first hours of operation) — all help; none close the gap entirely. The sim-to-real gap is the rate-limiting step on most modern humanoid capability advances. Closing it cuts months off development cycles.
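Domain randomization, the first mitigation listed above, amounts to drawing a fresh set of physical parameters for each simulated environment so the policy never overfits to one set. The sketch below shows only the sampling step; the parameter names and ranges are hypothetical and not taken from any particular simulator or program.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimParams:
    friction: float
    payload_kg: float
    motor_strength_scale: float
    finger_pad_stiffness_n_per_m: float


# Hypothetical randomization ranges (low, high). Real ranges come from
# measuring the hardware and widening around the measured values.
RANGES = {
    "friction": (0.4, 1.2),
    "payload_kg": (0.0, 5.0),
    "motor_strength_scale": (0.8, 1.2),
    "finger_pad_stiffness_n_per_m": (1e3, 1e5),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one parameter set uniformly from RANGES."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def sample_batch(n_envs: int, seed: int = 0) -> List[SimParams]:
    """One parameter set per parallel environment, reproducible from the seed."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n_envs)]


if __name__ == "__main__":
    batch = sample_batch(4096, seed=1)
    stiffness = [p.finger_pad_stiffness_n_per_m for p in batch]
    print(f"{len(batch)} environments, pad stiffness from "
          f"{min(stiffness):.0f} to {max(stiffness):.0f} N/m")
```

Randomizing finger-pad stiffness is the kind of perturbation that would expose the crushed-cube failure in simulation instead of on hardware. A parameter that spans orders of magnitude, like stiffness here, is often sampled log-uniformly rather than uniformly.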

7.5   Foundation models — VLA, world models, motion priors

The model layer the rest of the stack carries

VLA-driven robotics is real; "general-purpose" robotics is still aspirational SOFTWARE

The visible AI layer of a 2026 humanoid is dominated by vision-language-action (VLA) models — neural networks that take camera images plus a natural-language instruction ("pick up the red mug and put it on the shelf") and output joint trajectories or low-level motor commands. The model class as a serious humanoid technology only really arrived in 2023–2024, and the 2026 picture is now genuinely competitive.

The major open and proprietary VLA families: RT-2 (Google DeepMind, 2023, the first credible VLA at scale, never publicly released); OpenVLA (Stanford / Toyota Research Institute, 2024, 7B parameters, fully open-weights, the academic baseline); π0 and π0.5 (Physical Intelligence, 2024–2025, the most-cited open VLA in 2026 papers, trained on a cross-embodiment dataset); NVIDIA Isaac GR00T N1 (released March 2025, paired with Jetson Thor as the reference humanoid stack); Helix (Figure's proprietary VLA, learn-by-watching); Tesla's Optimus stack (FSD-derived, integrates Grok for natural language, full architecture undisclosed); Large Behavior Models from Boston Dynamics + Toyota Research Institute (announced 2026, runs on Atlas).

Below the VLA, modern humanoid stacks include other learned components. Motion priors — generative models trained on human motion-capture libraries that constrain the policy to produce human-like motion. World models — neural networks that predict future sensor observations given current state and proposed actions, used for planning and uncertainty estimation. Imitation policies — typically diffusion-based, trained directly on teleoperation demonstrations. Reward models — for RLHF-style fine-tuning of behavior preferences. The full ML pipeline on a modern humanoid contains 4–8 distinct learned components, not a single end-to-end model.

The honest current-state read: VLAs work for narrow tasks and fail unpredictably on novel ones. Pick-and-place from a clean tote? Reliable. Operating a microwave the model has never seen? Generally not. Folding laundry? Demoed, not commercial. The gap between "demo on a stage" and "ships in a customer warehouse" remains 12–24 months for any new capability, and the rate-limiting step is usually data collection — teleoperation logs at scale, edge-case recovery data, robust fine-tuning sets.

Synthesis

The software federation

three computers, four operating systems, four middleware protocols

The federation pattern that organized §1, §2, §4, and §5 lands again in §7. A humanoid robot's software is not a single application — it's a layered federation, each layer with its own physics, its own players, its own moats. The mistake that most coverage of "robot AI" makes is treating the VLA as the whole story; the VLA is one model in one process out of dozens.

| | Application AI | Application logic | Real-time control | Motor commutation |
|---|---|---|---|---|
| What runs | VLA · perception · LLM · planning | State machine · supervisor · UI · telemetry | MPC · WBC · impedance · safety | FOC · PWM · current sense |
| Hardware | Jetson Thor · Tesla AI5 · custom SoC | x86 application processor or larger ARM | Embedded Linux + PREEMPT_RT board | Bare-metal MCU (STM32, ESP32, custom) |
| Operating system | Ubuntu Linux + JetPack | Standard Linux (Ubuntu, Debian) | Linux PREEMPT_RT · Xenomai · QNX | Bare metal · FreeRTOS · Zephyr |
| Frameworks | PyTorch · CUDA · TensorRT · GR00T | Python · Rust · C++ · application logic | C++ · Rust · OSQP / hpipm / Drake | C · CMSIS · vendor SDKs |
| Middleware | ROS 2 + DDS topics | ROS 2 + DDS topics | Shared memory · custom IPC · ROS 2 partial | EtherCAT · CAN-FD · vendor-specific |
| Cycle time | 10–100 ms | 10 ms – 1 s | ~1 ms (hard real-time) | ~50 µs (hard real-time) |
| Engineers | ML researchers · perception | App developers · platform | Controls engineers · embedded | FW engineers · power electronics |

Every column is its own discipline, with its own tooling, hiring market, and failure modes. The single biggest under-noticed challenge in humanoid software is integration across the columns — the VLA team's policy needs to talk to the controls team's MPC, which needs to talk to the embedded team's motor drivers, all through middleware that has to honor real-time guarantees set by the slowest-acceptable-deadline anywhere in the chain. A humanoid program's velocity is set as much by how cleanly it organizes the column boundaries as by how good its individual components are. Mainstream coverage of "robot AI" reduces this stack to its leftmost column — the VLA — and misses that the leftmost column is the smallest team on the project.

§8 · The world model

Perception

from pixels to beliefs about the world

§4 introduced the sensors. §7 introduced the silicon. §8 covers what happens between them — the algorithms that turn raw pixels and point clouds into something the controller can act on. "Perception" in 2026 means something fundamentally different than it did in 2020. The classical pipeline (feature detectors, SLAM, semantic segmentation) is still alive and shipping, but the VLA-driven foundation-model approach has rewritten the upper half of the stack in a year and a half. This section walks both — the classical pipeline that still runs underneath, and the foundation-model layer that's eating its top.

8.1   The classical CV pipeline — still alive, still shipping

Three layers nobody talks about anymore

but every humanoid still runs them, because they're cheap and correct ↓ ANIMATED

Before VLAs, there was a classical pipeline that solved most of the perception problems robots actually face. It still ships in every modern humanoid, usually as the layer underneath the foundation model rather than as the visible AI. Three layers compose it.

Low-level vision — feature detection, optical flow, edge detection. SIFT and SURF feature descriptors from the 2000s; ORB (the rotation-invariant binary descriptor that powers most modern SLAM) from 2011; deep-learned features (SuperPoint, R2D2) from 2018 onward.

Mid-level vision — semantic segmentation, object detection, depth estimation. YOLO (now YOLO26 in 2026) for real-time detection; Mask R-CNN and SAM for segmentation; MiDaS and Depth Anything for monocular depth.

Geometric reasoning — bundle adjustment, ICP (Iterative Closest Point) for point-cloud alignment, pose estimation from feature correspondences.

A camera frame is processed through three classical layers before any "AI" gets to see it. Low-level features (corners, edges, blobs) are extracted from raw pixels. Mid-level vision attaches semantic labels and depths to the pixels. Geometric reasoning fuses across frames and time to recover camera pose, scene structure, and dynamics. This pipeline still runs underneath every modern humanoid — it's cheap, deterministic, and correct in regimes where neural networks aren't.
[Figure: pipeline stages, left to right. Raw frame (1920 × 1080 · 60 fps) → Low-level (ORB · SIFT · SuperPoint; ~1000 keypoints / frame) → Mid-level (YOLO · SAM · Depth Anything; labels + depths per pixel; example labels: box, mug, wall) → Geometric (SLAM · ICP · bundle adjustment; pose · map · trajectory). Latency budget labels along the bottom: ~5 ms, ~10 ms, ~30 ms, ~20 ms.]

The classical pipeline solves problems that VLAs are bad at — precise geometric estimation, deterministic timing, sub-millimeter pose accuracy, robust failure modes. A VLA can recognize "the red mug" with 95% accuracy and then place its end-effector 3 cm away from it. The classical depth-and-pose pipeline running underneath snaps that 3 cm error to sub-millimeter alignment in the final approach. Most production humanoid stacks use the foundation model for "what" — task understanding, scene parsing, semantic grasp selection — and the classical pipeline for "where" — final-approach geometry, contact-rich precision, sensor fusion across frames.
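ICP, named in the geometric layer above, is a good example of the "where" half of that split. The sketch below is a toy point-to-point ICP in 2-D with brute-force matching, written in plain Python for illustration; production pipelines work in 3-D, use spatial indexes for the matching step, and reject outliers. The L-shaped cloud and the 3 cm offset are made up for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def apply_transform(points: List[Point], theta: float, tx: float, ty: float) -> List[Point]:
    """Rotate each point by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def nearest(p: Point, cloud: List[Point]) -> Point:
    """Brute-force nearest neighbour; fine for a toy, too slow for real clouds."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_transform(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares rotation and translation mapping paired points src[i] -> dst[i]."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centred source point
        bx, by = qx - dx, qy - dy          # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)          # closed-form optimum in 2-D
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp(src: List[Point], dst: List[Point], iterations: int = 20) -> List[Point]:
    """Alternate nearest-neighbour matching and rigid alignment; return aligned src."""
    current = list(src)
    for _ in range(iterations):
        matches = [nearest(p, dst) for p in current]
        theta, tx, ty = best_rigid_transform(current, matches)
        current = apply_transform(current, theta, tx, ty)
    return current


if __name__ == "__main__":
    # Made-up L-shaped target cloud with 10 cm spacing.
    target = [(0.1 * i, 0.0) for i in range(6)] + [(0.0, 0.1 * j) for j in range(1, 4)]
    # Source: the same shape, off by a small rotation and a ~3 cm shift.
    source = apply_transform(target, 0.05, 0.03, -0.02)
    aligned = icp(source, target)
    worst = max(math.dist(a, b) for a, b in zip(aligned, target))
    print(f"worst residual after ICP: {worst:.2e} m")
```

ICP only converges when the initial error is small relative to the point spacing, which is why it suits the final approach after a coarser estimate has done the rough positioning.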

8.2   SLAM — knowing where you are while building the map

The chicken-and-egg problem solved fifty different ways

simultaneous localization and mapping · the algorithm a robot needs to do anything outside a known environment ↓ ANIMATED

SLAM — Simultaneous Localization and Mapping — is the algorithm a robot uses to figure out where it is while simultaneously building a map of its surroundings. It's chicken-and-egg: knowing your pose requires a map; building a map requires knowing your pose. SLAM resolves this by running both estimates jointly, with each new sensor frame refining both. The field is forty years old and has produced enough variants to fill a textbook — visual SLAM (cameras only), LIDAR SLAM (laser scans), visual-inertial SLAM (cameras + IMU, the modern default), tightly-coupled vs. loosely-coupled, filter-based vs. graph-based, sparse vs. dense.

A robot moves through a scene. At each timestep, it observes feature points (corners and edges of objects) and estimates its pose relative to where they were last seen. As the robot moves, the feature graph grows — old features stay anchored as landmarks, new ones are added to the map, and the robot's trajectory is recovered as the path that's most consistent with all observations jointly. Loop closure (recognizing a place you've been before) corrects accumulated drift.
[Figure legend: robot · landmark · past pose · trajectory · visibility ray · loop closure]

The 2026 mainstream is visual-inertial SLAM — fusing camera frames with IMU measurements at high frequency. Two open-source implementations dominate research: ORB-SLAM3 (Universidad de Zaragoza, the reference for sparse visual-inertial SLAM) and VINS-Fusion (HKUST, tightly-coupled with multiple sensor support). Production humanoids increasingly use proprietary derivatives — Boston Dynamics' Spot SLAM is custom and battle-tested; Tesla Optimus uses an FSD-derived stack repurposed from automotive. Dense neural SLAM (NICE-SLAM, Gaussian Splatting SLAM) is the active research frontier — replacing the sparse feature map with a continuous neural or Gaussian representation that can render novel views.

The honest read on SLAM in 2026: it's solved well enough for known indoor environments, marginal for outdoor and unstructured environments, and an active research problem for genuinely dynamic scenes. A humanoid that maps a warehouse on its first walkthrough and uses that map for the next year is shipping commercial product. A humanoid that walks into a stranger's house and immediately operates is not.
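The drift correction that loop closure provides can be shown in one dimension. The sketch below is a deliberately small toy, not how ORB-SLAM3 or VINS-Fusion work internally: a robot drives out along a corridor and back, its odometry has a made-up constant bias, and recognizing the start position lets it spread the accumulated error evenly over the steps. That even split is the least-squares answer when every odometry step is trusted equally; real systems solve a weighted 6-DoF pose graph.

```python
from typing import List


def integrate(steps: List[float], start: float = 0.0) -> List[float]:
    """Dead reckoning: accumulate odometry steps into a list of poses."""
    poses = [start]
    for step in steps:
        poses.append(poses[-1] + step)
    return poses


def close_loop(steps: List[float]) -> List[float]:
    """Correct odometry so the trajectory ends where it started.

    With equally trusted steps, minimizing the squared change to each step
    subject to 'sum of steps == 0' subtracts the same amount from every step.
    """
    drift = sum(steps)                  # end-point error relative to the start
    correction = drift / len(steps)
    return [step - correction for step in steps]


if __name__ == "__main__":
    # Made-up odometry: five 1 m steps out and five back, each reading 2 cm long.
    measured = [1.02] * 5 + [-0.98] * 5
    raw = integrate(measured)
    fixed = integrate(close_loop(measured))
    print(f"end pose before loop closure: {raw[-1]:+.3f} m")
    print(f"end pose after loop closure:  {fixed[-1]:+.3f} m")
    print(f"far end of corridor, before vs after: {raw[5]:.2f} m vs {fixed[5]:.2f} m")
```

The correction changes every pose along the path, not just the last one, which is why a single loop closure can straighten an entire map.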

8.3   VLA architectures — single-model vs dual-system

The architectural split that emerged in 2024

fast-thinking inside slow-thinking · Kahneman's System 1 / System 2 inside a robot ↓ ANIMATED

The Vision-Language-Action models introduced in §7.5 split into two architectural philosophies, and the split is more than aesthetic — it determines what the robot can actually do at what speed.

Single-model VLAs run one large transformer that reads camera frames + language instructions and emits actions in a single forward pass. The lineage: RT-2 (Google DeepMind, 2023, the original at scale, never released), OpenVLA (Stanford / TRI, 2024, 7B parameters, fully open, the academic baseline), π0 and π0.5 (Physical Intelligence, the most-cited open VLA in 2026 papers, with a flow-matching action head that handles multi-stage tasks like garment folding).

Dual-system VLAs split the architecture into a slow vision-language reasoning module ("System 2") and a fast action-generation module ("System 1") that run at different rates and exchange tokens. The lineage: Helix (Figure's proprietary model), NVIDIA GR00T N1 (released March 2025, paired with Jetson Thor as the reference humanoid stack), Gemini Robotics (Google DeepMind, built on Gemini 2.0).

Single-model VLA (left) takes camera and language input, runs them through a unified transformer, and emits actions every forward pass. Simple, low-latency, but constrained by the speed of the largest model in the loop. Dual-system VLA (right) decouples slow reasoning ("what is in this scene, what is the goal") from fast action ("emit motor commands"). System 2 runs at ~10 Hz on a VLM backbone; System 1 runs at ~50–100 Hz on a smaller diffusion transformer. The action loop never waits for the reasoning loop.
[Figure: two panels.
Single-model (RT-2, OpenVLA, π0): image + prompt ("pick up the mug") → unified transformer (vision + language + action tokens, single pass; 7B–55B params) → action (joint commands). Rate ~5–10 Hz. Simple, low latency floor, but bottlenecked by the largest model.
Dual-system (Helix, GR00T N1): image + prompt → System 2 (VLM backbone, 10 Hz, slow reasoning) → tokens → System 1 (diffusion / flow, 50–100 Hz, fast action) → action. Rate: S1 ≫ S2; the action loop runs ~10× faster than the reasoning loop.
Footer: Kahneman's "Thinking, Fast and Slow" (System 1 / System 2) mapped onto neural architecture. 2026 trend: dual-system winning for humanoids; single-model still standard for tabletop manipulation.]

The dual-system architecture is winning in humanoid robotics specifically because of the §5 timing-pyramid problem: a robot's action loop needs to run at 50–100 Hz to feel responsive, but a 7B-parameter VLM can only generate tokens at ~10 Hz on edge silicon. Single-model VLAs paid that latency cost — they simply ran slowly. Dual-system architectures decouple the rates: System 2 thinks at 10 Hz; System 1 emits actions at 100 Hz; the action loop never waits. GR00T N1's System 2 reasoning module is a pre-trained VLM that runs at 10 Hz on an NVIDIA L40 GPU. It processes the robot's visual perception and language instruction to interpret the environment and understand the task goal. Subsequently, a Diffusion Transformer, trained with action flow-matching, serves as the System 1 action module.
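The rate decoupling can be sketched as a tick-based simulation. This is a minimal illustration under stated assumptions, not any vendor's implementation: `slow_reasoner`, `fast_policy`, and `Latent` are hypothetical stand-ins for System 2, System 1, and the tokens passed between them, and the 10 Hz / 100 Hz rates are the figures quoted in the text.

```python
from dataclasses import dataclass

S2_HZ = 10    # slow reasoning rate quoted in the text
S1_HZ = 100   # fast action rate quoted in the text


@dataclass
class Latent:
    """Hypothetical stand-in for the tokens System 2 hands to System 1."""
    goal: str
    stamp: int  # tick at which the latent was produced


def slow_reasoner(tick: int) -> Latent:
    # Placeholder for a VLM forward pass (assumed, not a real API).
    return Latent(goal=f"plan@{tick}", stamp=tick)


def fast_policy(latent: Latent, tick: int) -> tuple[str, int]:
    # Placeholder for the action head: acts on the most recent latent
    # and reports how stale that latent is, in ticks.
    return latent.goal, tick - latent.stamp


def run(duration_s: float = 1.0) -> list[tuple[str, int]]:
    ticks_per_s2 = S1_HZ // S2_HZ            # System 1 ticks per System 2 update
    latent = slow_reasoner(0)
    actions = []
    for tick in range(int(duration_s * S1_HZ)):
        if tick % ticks_per_s2 == 0:
            latent = slow_reasoner(tick)     # refresh the plan at 10 Hz
        actions.append(fast_policy(latent, tick))  # act at 100 Hz regardless
    return actions


actions = run(1.0)
```

In one simulated second the fast loop emits 100 actions against 10 plans, and no action ever uses a plan more than 9 ticks old. A real system would run the two modules asynchronously on separate hardware queues; the single loop here only illustrates the scheduling relationship.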

Practical performance numbers from 2026: OpenVLA outperformed RT-2-7B on 73% of BridgeV2 evaluation tasks after fine-tuning. With a strong pre-trained backbone like OpenVLA or GR00T, fine-tuning on as few as 50–100 demonstrations can produce usable performance on simple tabletop manipulation tasks. Complex, dexterous, or precision industrial tasks may require 500–2,000 demonstrations for acceptable success rates. The practical floor for a commercially viable task-specific deployment is typically 100–300 hours of teleoperation data. Those numbers are the honest gauge of where VLAs are: not "general-purpose" yet, but "fine-tunable from a foundation in days, not months."

8.4   The vision-only-vs-fusion debate — how 2026 humanoids actually see

The architectural decision §4 introduced

now with the algorithmic stack to actually evaluate the trade-off SOFTWARE

§4.5 introduced the contested architectural choice — Tesla's vision-only philosophy (8 cameras, no LIDAR, no depth sensor) vs. the multi-sensor fusion approach (cameras + LIDAR + depth, used by Figure, Agility, 1X, most others). With §8's algorithmic context now in hand, the trade-off is concretely evaluable.

The vision-only argument rests on three claims. First, humans navigate complex environments with vision alone, which is existence proof that it's possible. Second, neural networks trained on enough vision data can extract depth and pose accurately enough for manipulation tasks (Depth Anything v2, Marigold, and other 2024–2025 monocular depth models hit centimeter-accuracy on indoor scenes). Third, eliminating LIDAR removes weight, watts, and a $700–10,000 BOM line item. Tesla's vertical-integration argument: their FSD program already trained the world's largest vision-only depth pipeline; reusing it for Optimus is essentially free.

The multi-sensor argument rests on three counter-claims. First, LIDAR works in low light, dust, smoke, and direct sunlight where vision degrades. Second, LIDAR returns are deterministic — they tell you a point is at distance X with millimeter accuracy, vs. a neural depth estimator that might be off by 10%. Third, LIDAR dramatically simplifies SLAM in unstructured environments. Figure's Mid-360 LIDAR plus depth cameras gives the robot a dense 3D map for ~$1,000 of BOM.

The 2026 honest read: vision-only works for warehouse and factory environments where the robot operates in good lighting on known surfaces. It fails outdoors, in dim spaces, and in environments where the cameras can be physically obstructed. The fact that Tesla has shipped Optimus units doing factory work and not Optimus units doing kitchen work is partly a reflection of this — the easier perception problem comes first. Whether vision-only generalizes to homes is the bet that determines whether Tesla's BOM advantage holds at scale, or whether the multi-sensor programs eat the home market once homes become a target.

8.5   Sensor fusion — Kalman to graphs to learned

How disagreeing sensors become one belief

the algorithmic glue that makes a federation of sensors usable PHYSICS

A humanoid carries five different sensor industries (§4) reporting on the same physical world at different rates with different noise characteristics. Camera says one thing; LIDAR says another slightly different thing; IMU says a third, with high-frequency drift; encoders say a fourth. Sensor fusion is the algorithmic problem of combining all of them into a single coherent belief about robot state and world state. Three approaches dominate.

Kalman filtering is the classical workhorse, dating from Rudolf Kalman's 1960 paper. The key idea: maintain a probability distribution over the system state, predict its evolution forward in time using a dynamics model, then update the distribution when new sensor measurements arrive, weighted by their respective uncertainties. The Extended Kalman Filter (EKF) handles nonlinear systems; the Unscented Kalman Filter (UKF) handles them better; the Multi-State Constraint Kalman Filter (MSCKF) is the standard for visual-inertial odometry. Kalman filters dominate state estimation at the inner loop because they're cheap, well-understood, and provably optimal for the linear-Gaussian case.
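The predict/update cycle can be sketched for the linear-Gaussian case in a few lines of NumPy. This is a toy sketch, not a production estimator: the constant-velocity model, the noise values, and the names `kf_predict` and `kf_update` are all assumptions chosen for illustration.

```python
import numpy as np


def kf_predict(x, P, F, Q):
    """Propagate mean and covariance through the linear dynamics model."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P


def kf_update(x, P, z, H, R):
    """Fuse one measurement z, weighted by its covariance R."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


# Toy constant-velocity model: state = [position, velocity].
dt = 0.01
F = np.array([[1.0, dt], [0.0, 1.0]])
Q = 1e-5 * np.eye(2)                   # assumed process noise
H = np.array([[1.0, 0.0]])             # sensor observes position only
R = np.array([[0.05 ** 2]])            # assumed sensor noise (5 cm std)

rng = np.random.default_rng(0)
x, P = np.zeros(2), np.eye(2)          # start with a wrong velocity guess
true_pos, true_vel = 0.0, 1.0
for _ in range(500):
    true_pos += true_vel * dt
    z = np.array([true_pos + rng.normal(0.0, 0.05)])
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, z, H, R)
```

After the update step the position variance `P[0, 0]` is always smaller than the raw sensor variance `R[0, 0]`, which is the "weighted by their respective uncertainties" property in action. An EKF replaces `F` and `H` with Jacobians of nonlinear models evaluated at the current estimate.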

Factor graphs are the modern alternative for back-end SLAM and longer-horizon estimation. Instead of a single state distribution, a factor graph holds a graph of variables (poses, landmarks) connected by factors (sensor measurements, motion constraints). Solve the graph by jointly optimizing all variables to be most consistent with all factors — bundle adjustment is a special case. The GTSAM library (Georgia Tech) is the reference implementation; iSAM2 handles incremental updates efficiently. Factor graphs are slower than Kalman filters but handle loop closure, multi-modal beliefs, and long-horizon dependencies naturally.
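A toy version of "jointly optimizing all variables to be most consistent with all factors" fits in plain NumPy. The sketch below is a 1-D pose graph with made-up measurements, solved as a weighted linear least-squares problem; it deliberately does not use GTSAM, and real back ends handle nonlinear factors with iterative relinearization.

```python
import numpy as np

# Each factor is (i, j, measured x_j - x_i, standard deviation).
odometry = [(0, 1, 1.0, 0.1), (1, 2, 1.0, 0.1), (2, 3, 1.0, 0.1)]
loop_closure = [(0, 3, 2.7, 0.05)]     # a more trusted direct measurement
factors = odometry + loop_closure
n_poses = 4

rows, rhs = [], []

# Prior factor anchoring x0 at 0 (removes the gauge freedom).
prior = np.zeros(n_poses)
prior[0] = 1.0 / 0.01
rows.append(prior)
rhs.append(0.0)

# Relative-pose factors, each row whitened by its standard deviation.
for i, j, meas, sigma in factors:
    row = np.zeros(n_poses)
    row[i], row[j] = -1.0 / sigma, 1.0 / sigma
    rows.append(row)
    rhs.append(meas / sigma)

A, b = np.array(rows), np.array(rhs)
poses, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Odometry alone would place the last pose at 3.0 and the loop closure alone at 2.7. The joint solution lands at about 2.72 and spreads the correction evenly over the three odometry steps, which is the loop-closure behavior a filter that has already marginalized old poses cannot reproduce.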

Learned fusion is the newest entrant. Train a neural network to take raw sensor inputs and emit a state estimate, bypassing explicit probabilistic modeling. Works in regimes where the noise distributions are non-Gaussian or hard to characterize — particularly tactile and visuomotor fusion in dexterous manipulation. Most production humanoids in 2026 still use classical Kalman or factor-graph approaches for state estimation and localization, with learned fusion increasingly creeping in at the perception/grasping interface. The fusion choice is rarely one-size-fits-all — a typical humanoid runs Kalman for IMU+encoder+joint-torque fusion, factor graphs for SLAM, and learned components for grasp pose estimation. Three fusion algorithms in one robot.

Synthesis

The perception stack

three layers, two paradigms, one robot

The same federation pattern that organized §1, §4, §5, and §7 lands here too. A 2026 humanoid's perception is not "the VLA" — it's a layered stack where the foundation model sits at the top, a classical SLAM/CV pipeline sits underneath, sensor fusion glues them together, and each layer has its own time scale, hardware substrate, and engineering discipline. The most honest framing is that the foundation models did not replace the classical stack — they sit on top of it, and the stack is more competent at every layer than it was three years ago.

Layer | Foundation model layer | Classical CV pipeline | Sensor fusion | Direct sensor processing
What it does | Scene understanding, task interpretation, semantic grasp selection | Feature extraction, segmentation, depth, SLAM | State estimation, fusing multi-sensor beliefs | Encoder reading, IMU integration, F/T calibration
Algorithms | VLA (RT-2, OpenVLA, π0, GR00T, Helix), VLM, LLM | ORB, SIFT, YOLO, SAM, Depth Anything, ORB-SLAM3 | EKF, UKF, MSCKF, GTSAM, factor graphs, learned fusion | Direct sensor reads + Kalman pre-filtering
Hardware | Jetson Thor / Tesla AI5 | Same Jetson, parallel pipeline | Embedded Linux + PREEMPT_RT | Motor driver MCU
Cycle time | 10–100 ms (System 2 + System 1) | 5–30 ms per layer | ~1 ms | ~50 µs
Maturity | Active research → early production | Mature · 30 yr · still improving | Mature · 60 yr | Mature · standard
When it fails | Novel scenes, edge cases, language ambiguity | Poor lighting, repetitive textures, motion blur | Sensor disagreement during fast transients | Hardware faults, EMI, calibration drift

The mistake in mainstream coverage of "humanoid AI" is to focus exclusively on the leftmost column — VLAs are charismatic, they're what people demo, and they're the visible AI on the robot. But VLAs sit on top of a classical pipeline that handles geometric precision, a fusion layer that handles sensor disagreement, and a direct-sensor layer that handles the fastest loops. When a 2026 humanoid fails at a manipulation task, the failure is rarely "the VLA was confused" — it's almost always "the depth estimate was off by 5 cm" or "the gripper torque sensor drifted" or "the SLAM lost loop closure when the lighting changed." The foundation model is the part that gets the press; the stack underneath is the part that determines whether the robot ships.

§9 · The unsolved problem

Manipulation

why grasping is harder than walking

Locomotion is a solved problem. Modern quadrupeds run, jump, recover from kicks, and traverse terrain humans struggle with. Modern bipeds walk, dance, and balance. But ask any of them to fold a t-shirt, open a kitchen drawer with a wet hand on the handle, or pick a tomato from a vine without crushing it, and the modern robot reverts to clumsiness that would embarrass a four-year-old. Manipulation is the unsolved problem. Elon Musk has called the Optimus hand the "majority of the engineering difficulty" — tougher than designing the Cybertruck and accounting for about 60% of the overall Optimus challenge. This section walks why that's true: the geometry of grasping, the sensorized fingertip arms race, the dexterous-hand designs of 2026, and the algorithmic split between analytical, learned, and hybrid approaches.

9.1   Why manipulation is hard — the asymmetry with locomotion

Three asymmetries that don't favor robots

contact dynamics, object diversity, sub-millimeter tolerances PHYSICS

Locomotion has three properties manipulation lacks. First, the contact set is small and predictable — a biped contacts the ground with two feet at a time, the contact geometry is known (foot soles), the friction parameters are stable (concrete, tile, carpet — a small dictionary). A grasping robot contacts arbitrary objects with arbitrary surfaces, in arbitrary configurations, with friction parameters that change continuously across the contact patch. Second, locomotion is forgiving in the small. A biped that's 5 cm off in foot placement just takes a slightly different next step. A manipulation robot that's 5 cm off in finger position fails to grasp at all. Third, locomotion's failure mode is "fall down" — recoverable, low-information. Manipulation's failure mode is "drop the wineglass" or "crush the tomato" — sometimes catastrophic, always information-rich about what went wrong but too late to fix.

The deeper asymmetry is that locomotion is whole-body and open-loop tolerant; manipulation is end-effector and closed-loop demanding. A walking gait can be tuned in simulation and deployed; the dynamics are deterministic enough. A grasp policy must read the actual object's actual surface in real time and adapt the actual finger trajectory accordingly. Every manipulation success depends on tactile and visual feedback that locomotion doesn't need.

The numbers that quantify this. A typical humanoid leg has 6 degrees of freedom per side; legged locomotion has been generating viral demos since 2018. A typical humanoid hand has 15–22 DoF per side; manipulation videos as compelling as quadrupeds doing parkour have not yet been produced. Locomotion is a 12-DoF problem with bounded contact uncertainty; manipulation is a 30–44-DoF problem with open contact uncertainty. The dimensionality and uncertainty scaling alone explain most of the field's relative progress.

9.2   The grasping problem — form closure, force closure, and the search space

Two centuries of grasp theory

Reuleaux's classification still names the categories every robot uses

Grasping is one of the oldest mechanical engineering problems. Franz Reuleaux (1829–1905) classified mechanical contacts into closure types in the 1870s, and the categories he named are still how modern robotic grasp planners think. Form closure means the object is geometrically constrained — a marble in a cupped hand can't move regardless of friction. Force closure means the object is held by friction — a coffee cup pinched between thumb and forefinger stays put as long as the friction is high enough. Most real grasps are a mix: a power grasp on a hammer is mostly form closure (the fingers wrap around it); a precision grasp on a pencil is mostly force closure (held by fingertip friction).

Four canonical grasp types arranged by closure dominance. Power grasp (hammer): fingers wrap around object, dominated by form closure, high payload. Precision pinch (pencil): held between fingertip pads, dominated by force closure, high precision. Lateral pinch (key): thumb opposes side of index, hybrid closure. Tip prehension (small bead): only fingertips contact, pure force closure, lowest payload but highest dexterity. Modern dexterous hands target all four; most fail at #4 because tactile feedback isn't precise enough.
[Figure: four grasp types by closure dominance. Power (hammer, wrap): ~80% form closure, high payload, low precision. Precision pinch (pencil, pad-to-pad): ~60% force closure, low payload, high precision. Lateral pinch (key, thumb-to-side): ~50/50 hybrid, medium payload, medium precision. Tip prehension (bead, tip-to-tip): ~95% force closure, very low payload; where most robots fail.]

The challenge that scales with the dexterity of the grasp is the search space. A 22-DoF hand approaching an arbitrary object has, theoretically, an infinite number of possible grasps. Real grasp planners narrow this — the analytical approach uses contact-mechanics theorems to enumerate force-closure grasps for known geometries; the data-driven approach uses learned models trained on millions of human or simulated grasps to predict good grasp poses. Both produce candidate grasps; both must be evaluated for collisions, kinematic reachability, and stability.
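For the simplest analytical case, a planar two-finger grasp with Coulomb friction, the force-closure test reduces to a geometric check: the line joining the two contacts must lie inside both friction cones. The sketch below assumes point contacts, inward-pointing unit normals, and a single friction coefficient; the function name and the example geometry are illustrative only.

```python
import numpy as np


def antipodal_force_closure(p1, n1, p2, n2, mu):
    """Planar two-contact test: the segment joining the contacts must lie
    inside both friction cones. n1, n2 are inward-pointing unit normals."""
    p1, n1, p2, n2 = (np.asarray(v, dtype=float) for v in (p1, n1, p2, n2))
    half_angle = np.arctan(mu)                 # friction cone half-angle
    d = p2 - p1
    d = d / np.linalg.norm(d)
    # Angle between each inward normal and the line toward the other contact.
    a1 = np.arccos(np.clip(n1 @ d, -1.0, 1.0))
    a2 = np.arccos(np.clip(n2 @ (-d), -1.0, 1.0))
    return bool(a1 < half_angle and a2 < half_angle)


# Box of width 2 pinched on opposite vertical faces.
squarely_opposed = antipodal_force_closure((-1, 0), (1, 0), (1, 0.0), (-1, 0), mu=0.5)
offset_low_mu = antipodal_force_closure((-1, 0), (1, 0), (1, 1.5), (-1, 0), mu=0.5)
offset_high_mu = antipodal_force_closure((-1, 0), (1, 0), (1, 1.5), (-1, 0), mu=1.0)
```

Squarely opposed contacts pass at any positive friction. Offsetting one finger by 1.5 units tilts the contact line about 37° off the normals, which fails at μ = 0.5 (cone half-angle about 27°) and passes at μ = 1.0 (45°). A multi-finger 3-D planner generalizes this to a test on the convex hull of contact wrenches, which is where the search space grows.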

The Dex-Net family (Berkeley AUTOLab, Ken Goldberg, 2017–2021) defined the data-driven baseline for parallel-jaw grippers, training neural networks on millions of synthetic depth-image-to-grasp-quality pairs. Modern dexterous-hand grasp planners — GraspGen, UniGrasp, AnyGrasp — extend this to 5-finger configurations using diffusion models or VLA-based grasp prediction. Performance benchmarks: success rates on novel objects in cluttered bins now exceed 85% for parallel-jaw, ~60–70% for 5-finger dexterous in 2026 — the dexterous-hand gap is the visible reflection of the open contact uncertainty problem.

9.3   Fingertip sensing — the GelSight era

What §4.4 sets up

tactile sensing as the gating bottleneck on dexterous manipulation SOFTWARE

§4.4 introduced the five tactile sensor branches. §9 is where their consequences land. Without high-resolution tactile feedback, dexterous manipulation is fundamentally guess-and-check. A robot that grasps an unknown object visually — even with perfect depth — has no idea whether the grip is firm enough until it tries to lift, by which point it's too late to adjust without dropping. Humans solve this with a tactile feedback loop running at roughly 50 Hz between fingertip mechanoreceptors and motor cortex; modern humanoids approximate it with various combinations of vision-based gel sensors, magnetic skin, capacitive arrays, and barometric caps.

The Meta–GelSight Digit 360, announced October 2024, became the de facto research-grade fingertip in 2026. 18+ sensing modalities packed into a fingertip-shaped puck, detecting forces as small as 1 millinewton. Resolution: ~0.1 mm spatial, ~30 Hz update rate (camera-framerate-limited). The follow-on commercial integrations — Figure 03's palm cameras + fingertip force sensors detecting 3-gram forces, Tesla's tactile-sensing fingertips on every Gen 3 finger — are products of the GelSight design language even when they don't use GelSight directly.

The honest read on tactile in 2026: vision-based gel wins on resolution, loses on speed; magnetic and capacitive win on speed, lose on resolution; barometric wins on robustness, loses on spatial detail. Production humanoids increasingly use 2–3 modalities per fingertip, layered: a thin capacitive grid on the outer skin for fast-touch detection, a deeper barometric or magnetic cap for force estimation, sometimes a vision-based gel pad on the contact surface for high-resolution shape inference. The Tesla Gen 3 hand's "tactile sensing in all fingers" is a marketing claim that hides considerable architectural diversity in what's actually deployed.

The tactile-VLA integration story is what makes 2026 different from 2023. The first generation of VLAs was vision-only — they read camera images and emitted actions, with no tactile signal in the loop. Modern VLAs (π0.5, GR00T N1, Helix) increasingly fuse tactile inputs through additional input tokens or dedicated branches. Touch-conditioned policies are the active research frontier — policies that adapt grasp force in real time based on slip detection from fingertip sensors. This is the layer where the next 12 months of manipulation progress will land.
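The simplest form of a touch-conditioned rule is a Coulomb-friction check: slip starts when the tangential load exceeds μ times the normal force, so a controller can raise grip force whenever the measured ratio approaches that limit. The sketch below is a hypothetical single-contact rule, not a learned policy and not any vendor's controller; `margin`, `max_n`, and the function name are assumptions.

```python
def grip_force_step(normal_n, tangential_n, mu, margin=0.8, max_n=20.0):
    """One control tick of a hypothetical slip-avoidance rule.

    If the tangential load exceeds `margin` of the Coulomb limit
    mu * normal, raise the commanded normal force to the smallest value
    that restores the margin; otherwise hold the current force.
    """
    limit = mu * normal_n
    if tangential_n > margin * limit:
        needed = tangential_n / (mu * margin)
        return min(max_n, needed)
    return normal_n


held = grip_force_step(normal_n=2.0, tangential_n=0.5, mu=0.5)      # safe
tightened = grip_force_step(normal_n=2.0, tangential_n=1.0, mu=0.5)  # near slip
```

With μ = 0.5 and a 2 N grip, a 0.5 N tangential load is well inside the limit and the force is held; a 1.0 N load sits exactly at the slip boundary, so the rule tightens to 2.5 N. Learned touch-conditioned policies replace the fixed μ and margin with quantities inferred from the tactile signal itself.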

9.4   Dexterous hand designs — actuators, tendons, and the 22-DoF question

The mechanism wars

direct-drive in finger · tendon-driven from forearm · hybrid · the trade-offs

A human hand has 27 degrees of freedom. The most ambitious robotic hand in 2026 — Tesla Optimus Gen 3 — has 22, with another 3 in the wrist/forearm. The reference research hand — Shadow Dexterous Hand — has 24. The actuation architecture decision dominates the rest of the hand's properties.

Two architectural approaches to a high-DoF hand. Direct-drive in finger (left): each joint has its own motor + small reduction inside the finger. Compact wiring, high bandwidth, but the finger must carry its own actuators and is therefore heavier and bulkier. Tendon-driven from forearm (right): motors live in the forearm, pulling cables routed through the wrist into each finger joint. The Tesla Gen 3 architecture — fingers are light and slim because actuators are remoted; complexity moves to the cable routing and wrist friction management.
[Figure: side-by-side hand schematics. Direct-drive in finger (e.g. DLR/HIT II): motors at every joint, forearm empty; finger weight heavy (carries its own motors); bandwidth high (short electrical path); complexity in finger packaging; backdrivability excellent (low gear ratio); used in research-grade hands, expensive, mature. Tendon-driven from forearm (e.g. Tesla Optimus Gen 3): 22 motors in the forearm with tendons routed through the wrist; finger weight light, slim, human-like; bandwidth limited by cable stretch and friction; complexity in cable routing and wrist; backdrivability poor at distal joints; used in commercial humanoids, cheaper, mass-manufacturable.]

Tesla's choice of tendon-driven architecture was confirmed by patent filings made public in early 2026. Each finger has 4 degrees of freedom and the wrist adds 2 more, for a total of 22 DoF. Three flexible cables per finger route through the wrist and through guided channels in the phalanges, giving each joint independent motion. The wrist design shifts the cables from a lateral arrangement to a vertical stack, which reduces friction, stretch, torque, and crosstalk. The wrist redesign is the engineering centerpiece: tendon hands fail when wrist friction varies with wrist angle, and Tesla's vertical-stack arrangement is specifically designed to minimize this coupling.

The competing approaches haven't disappeared. Allegro Hand (Wonik Robotics, Korean) ships ~$20K research-grade hands with 16 DoF and motors in the fingers. Shadow Dexterous Hand (UK, the historical research reference) ships 24 DoF tendon hands at $100K+ for academic labs and military programs. DLR/HIT Hand II (German Aerospace Center + Harbin Institute of Technology) is the academic gold standard for direct-drive-in-finger architecture. Inspire Hand (Chinese, ~$5K) and SCHUNK SVH are the cost-tier commercial options. The honest read: Tesla's 22-DoF hand, tendon-driven from the forearm, is, mechanically, a 1990s-vintage research design that finally got the manufacturing engineering it needed for mass production. Innovation in 2026 is not new mechanisms but new manufacturing.

9.5   The algorithmic stack — analytical, learned, and the gap between them

Three approaches that don't yet talk to each other

each works in a regime · none works everywhere · most production stacks blend two SOFTWARE

Manipulation algorithms split into three philosophies that have largely evolved independently. Analytical methods use first-principles physics — force closure, friction cones, contact-mechanics theorems, trajectory optimization on rigid-body dynamics. The 1980s–2010s mainstream. They work brilliantly when the object's geometry, mass, friction, and the robot's contact dynamics are all known precisely; they fail when any of these are uncertain. Learned policies — diffusion policies, behavior cloning from teleoperation, sim-to-real RL — work in regimes where the dynamics are hard to model but lots of demonstration or simulation data exists. Foundation-model approaches — VLA-driven grasp prediction — sit on top, providing semantic grasp selection ("which object should I grab and how") that the lower layers execute.

The split that organizes 2026 production: VLA on top → diffusion policy in the middle → analytical refinement underneath. The VLA picks "grab the red mug from the table"; the diffusion policy generates a candidate finger trajectory; the analytical layer refines the final approach using accurate depth estimates and contact-mechanics constraints. This three-layer hybrid is what most modern humanoid manipulation stacks actually run, even when their marketing says "end-to-end VLA."

The honest research frontier is contact-rich manipulation — tasks where the robot must apply controlled force across multiple contact points and update its grip mid-motion. Folding clothes, kneading dough, threading a needle, opening a pickle jar. None of these are robustly solvable in 2026, even with the full hybrid stack. The bottleneck is not algorithmic in any single layer; it's the integration across layers. The VLA emits a high-level intent at 10 Hz; the diffusion policy emits trajectories at 50 Hz; the analytical refinement runs at 1 kHz. When the object's contact state changes mid-motion — a glass slipping, a fabric folding unexpectedly — getting that information from the tactile sensor (1 kHz) up to the policy (50 Hz) and possibly back to the VLA for replanning (10 Hz) is the timing problem the field is actively working.
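The timing problem can be put in numbers with a back-of-envelope calculation. The sketch below assumes each loop samples its input once per cycle and ignores compute and transport time, so it gives only the scheduling component of the delay; the rates are the ones quoted in the text, and the function name is illustrative.

```python
# Loop rates quoted in the text (Hz).
RATES_HZ = {"tactile": 1000, "policy": 50, "vla": 10}


def worst_case_wait_ms(path):
    """Worst-case delay, in ms, for an event to be picked up by each loop
    on `path`, assuming each loop samples its input once per cycle and
    ignoring compute time (both are simplifying assumptions)."""
    return sum(1000.0 / RATES_HZ[name] for name in path)


to_policy = worst_case_wait_ms(["tactile", "policy"])        # 1 + 20 ms
to_vla = worst_case_wait_ms(["tactile", "policy", "vla"])    # 1 + 20 + 100 ms
```

Under these assumptions a slip event can wait up to about 21 ms before the policy sees it and about 121 ms before the VLA can replan, before any inference time is added. A glass that starts slipping has typically moved by then, which is why the fast corrective path has to live in the lower layers.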

The data story matters here too. Large shared datasets are what enable foundation-model-style training to transfer across robots: Open X-Embodiment (Google, 2023, 1M+ trajectories from 22 robot embodiments), DROID (Stanford, 2024, in-the-wild manipulation across many scenes), and the LeRobot data ecosystem (Hugging Face, the open-source standard). The bet: a policy trained on a million demonstrations across 20 robot types will outperform a policy trained on 10,000 demonstrations on one specific robot. Empirically, this bet has paid off for high-level scene understanding and goal selection; less so for the contact-mechanics-precision regime where the embodiment-specific physics dominate.

Synthesis

The dexterity gap

22 DoF in hardware · 7 DoF in usable behavior

A 2026-class humanoid hand has 22 mechanical degrees of freedom. The same hand, in production-deployed manipulation, uses about 7 of them effectively. The 15-DoF gap between hardware capacity and usable behavior is the field's open frontier. The closing argument of §9 is that the hardware has run ahead of the algorithms, the algorithms have run ahead of the data, and the data has run ahead of the integration. None of these gaps closes by working harder on any single layer.

Dimension | Hardware capacity | Algorithmic frontier | Production deployment | Honest 2026 reality
Degrees of freedom | 22 (Tesla Gen 3) · 24 (Shadow) | Up to ~16 actively controlled in research demos | ~7 used effectively in shipped tasks | About two-thirds of the hand sits passive in most grasps
Grasp success | Mechanically capable of all 4 grasp types | Power + precision pinch: solid; lateral pinch: ok; tip prehension: poor | Power grasp on known objects: >95%; novel dexterous: <30% | "General-purpose dexterity" is aspirational
Tactile feedback | Millinewton-level sensing available (Digit 360) | Touch-conditioned policies emerging (π0.5, GR00T) | Most production grasps run open-loop after initial visual approach | Tactile data isn't yet pervasively in the loop
Contact-rich tasks | All four grasp types mechanically possible | Folding, threading, twisting: research-only | Pick-and-place + simple insertion in production | The unsolved part of manipulation is contact-rich
Object generalization | Hand can adapt to any geometry mechanically | VLAs generalize to novel objects with 50–60% reliability | Production stacks restrict to known SKU sets | "It works on anything" is not yet honest

The 22-DoF hand is, in 2026, the most under-utilized expensive component on a humanoid. The bottleneck isn't the hand — it's the rest of the stack catching up to what the hand can already do. Manipulation progress in the next 24 months will come from closing the integration gap: tactile data flowing into VLAs that emit grasps that diffusion policies refine that analytical controllers execute, with the failure modes from each layer flowing back upward fast enough to recover before the object falls. The field that's been hardware-limited for thirty years is now algorithm-limited and data-limited — which is, on balance, a much better problem to have.

§10 · The solved problem

Locomotion

why walking is easier than picking up a tomato

§9 opened with the claim that locomotion is essentially solved. This section explains why. The answer is not "robotics got smarter about walking"; it's "walking is a structurally easier problem than manipulation, and the field had a thirty-year head start." This section walks (the pun is unavoidable) the locomotion taxonomy: wheels, tracks, legs, flight; the mathematical foundation of biped balance (ZMP, capture point, divergent component of motion); the MPC-on-quadrupeds revolution that ended the classical-control era; the dynamic-vs-static-balance split; and the honest 2026 status, where humanoids are running half-marathons but still tripping on doorways. Locomotion in 2026 is at the top of the S-curve, where manipulation will be in 2032: the algorithmic stack is mature, the demos are convincing, and the remaining problems are integration and edge cases rather than open research.

10.1   The locomotion menu — wheels, tracks, legs, flight

Four substrates, four physics regimes

each one optimal somewhere · the choice is downstream of where the robot has to go

The locomotion choice is one of the most decisive architectural commitments a robot makes. It determines what surfaces it can cross, how fast it can move, how much it weighs, what battery life it gets, and what humans will accept it doing in their environment. Four families dominate.

Four locomotion paradigms compared on the dimensions that actually matter for robot design. Wheels: most efficient on flat surfaces, useless above a 30° slope or on stairs. Tracks: handle rough terrain but weigh significantly more for the same payload. Legs: handle anything, at the cost of complex control and high power draw. Flight: optimal for sparse 3D coverage where ground travel is impossible. Most 2026 humanoids choose legs not because they're best, but because human-shaped environments require human-shaped traversal.
[Figure: four locomotion paradigms. Wheels: most efficient, flat only; efficiency 5/5, terrain 1/5, stairs none; AGVs and warehouses, ~80% of mobile robots in industrial use. Tracks: rough terrain, heavy; efficiency 2/5, terrain 4/5, stairs limited; military, construction, EOD, agriculture, surveillance. Legs: all terrain, expensive; efficiency 2/5, terrain 5/5, stairs native; humanoids and quadrupeds in human-shaped spaces. Flight: 3D coverage, short endurance; efficiency 1/5, terrain 5/5 (3D), runtime ~30 min; aerial inspection (DJI, Skydio, military).]

The honest read on the menu in 2026: wheels are best when the floor is flat, legs are best when humans share the space, tracks are best when terrain is brutal, flight is best when ground access is impossible. Most warehouse robots are wheeled; most outdoor inspection robots are tracked or legged; most humanoid robots are bipedal because they're trying to operate in spaces designed for humans. The bipedal-vs-quadruped split is itself meaningful: quadrupeds (Boston Dynamics Spot, Unitree Go2/B2, ANYbotics ANYmal) are dramatically more stable and efficient than bipeds, which is why nearly every commercial legged inspection robot is four-legged. Bipeds win only when the deployment environment specifically requires human form factor: kitchens, ladders, doorways, vehicle cabs designed for humans.

10.2   ZMP and the capture point — the math of staying upright

How a biped knows it's about to fall

Vukobratović 1972 · Pratt 2006 · the math that classical balance is built on

Walking is a controlled fall. A biped's center of mass is constantly accelerating out of equilibrium, and each footstep arrests the imbalance just before it becomes catastrophic. The mathematical formalization of this dates to Miomir Vukobratović (Serbian mathematician, 1972), who introduced the Zero Moment Point: the point on the ground about which the horizontal components of the net moment from gravity and inertial forces are zero. If the ZMP stays inside the support polygon (the convex hull of the robot's foot contacts), the robot is statically balanced. If it reaches the edge of the polygon, the foot starts to rotate about that edge and the robot begins to tip toward that side.

ZMP became the workhorse of bipedal walking control from the 1970s through the 2010s. Honda's ASIMO, Sony's QRIO, Aldebaran's NAO and Pepper — all built on ZMP-based gait planning. The approach: plan a center-of-mass trajectory in advance such that the ZMP stays inside the support polygon throughout. Use inverse kinematics to convert that into joint trajectories. Track the joint trajectories with high-gain PID. The result: stable, but mechanical, "walking-around-eggshells" gaits that became visually iconic of pre-2020 humanoids.
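The check at the heart of ZMP-based planning can be sketched in two functions. The sketch assumes the linear inverted pendulum model (constant CoM height, no angular momentum), under which the ZMP is the CoM ground projection shifted against the CoM acceleration; the function names, the foot geometry, and the 0.9 m CoM height are illustrative assumptions.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def zmp_lipm(com_xy, com_acc_xy, com_height):
    """ZMP under the linear inverted pendulum model (constant CoM height,
    no angular momentum): p = c - (h / g) * c_ddot."""
    com = np.asarray(com_xy, dtype=float)
    acc = np.asarray(com_acc_xy, dtype=float)
    return com - (com_height / G) * acc


def inside_convex_polygon(point, vertices):
    """True if `point` is strictly inside a convex polygon whose vertices
    are listed counter-clockwise."""
    pts = np.asarray(vertices, dtype=float)
    p = np.asarray(point, dtype=float)
    for a, b in zip(pts, np.roll(pts, -1, axis=0)):
        edge, to_p = b - a, p - a
        if edge[0] * to_p[1] - edge[1] * to_p[0] <= 0.0:   # on or right of edge
            return False
    return True


# Assumed double-support polygon, 20 cm long by 30 cm wide, listed CCW.
polygon = [(-0.1, -0.15), (0.1, -0.15), (0.1, 0.15), (-0.1, 0.15)]

zmp_still = zmp_lipm([0.0, 0.0], [0.0, 0.0], com_height=0.9)
zmp_accel = zmp_lipm([0.0, 0.0], [2.0, 0.0], com_height=0.9)
still_ok = inside_convex_polygon(zmp_still, polygon)
accel_ok = inside_convex_polygon(zmp_accel, polygon)
```

Standing still, the ZMP coincides with the CoM projection and sits inside the polygon. Accelerating the CoM forward at 2 m/s² moves the model's ZMP about 18 cm backward, outside this polygon, which means that motion is not achievable with the feet flat. A ZMP planner rejects or reshapes any CoM trajectory that fails this test.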

A biped seen from above. The support polygon (cyan) is the convex hull of the foot contact patches. The center of mass projection (orange) is where the robot's mass center maps to the ground plane. As long as the ZMP (which closely tracks the CoM projection during slow walking) stays inside the polygon, the robot is statically balanced. Walking is the motion of stepping forward to extend the polygon faster than the CoM falls outside it. The capture point (purple) is where you'd need to step to catch the fall in one stride.
[Figure: top-down view of static balance. Stable panel: the CoM projection lies inside the support polygon, so the robot is balanced. Falling panel: the CoM projection and its velocity vector are shown with the capture point, the place to step to catch the fall. Capture point = CoM projection + CoM velocity / √(g/h); the foot must reach it before the body does.]

The conceptual breakthrough that displaced ZMP-only thinking was Jerry Pratt's capture point (IHMC, 2006): the point on the ground where, if you stepped there immediately, your forward momentum would be exactly arrested by the new support. Capture point analysis turned biped walking from "plan a CoM trajectory that doesn't fall over" into "constantly compute where the foot needs to land to prevent the impending fall." The Divergent Component of Motion (DCM) generalizes this to a 3D quantity that grows exponentially during free-fall and is exactly cancelled by a foot placement at the right location. The capture point reframed walking from a static-balance problem (don't leave the polygon) into a dynamic-balance problem (catch yourself before you hit the ground), which is what walking actually is.
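For the linear inverted pendulum, the capture point has a closed form: the CoM ground projection plus the CoM velocity divided by the pendulum's natural frequency ω = √(g/h). A minimal 1-D sketch, with the function name and the example numbers as assumptions:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def capture_point(com_pos, com_vel, com_height):
    """Instantaneous capture point of the linear inverted pendulum:
    x_cp = x + v / omega, with omega = sqrt(g / h)."""
    omega = math.sqrt(G / com_height)
    return com_pos + com_vel / omega


# CoM over the origin, 0.9 m high, moving forward at 0.5 m/s (assumed values).
step_target = capture_point(com_pos=0.0, com_vel=0.5, com_height=0.9)
```

With these numbers the foot has to land about 15 cm ahead of the CoM projection to bring the body to rest over it. Doubling the velocity doubles the offset, and a taller CoM (smaller ω) pushes the capture point further out, which is why tall robots need longer recovery steps.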

Modern humanoid controllers — the MPC + WBC architecture from §5.4 — use capture point and DCM as state variables in the optimization. The MPC plans where the next 4–8 footsteps will go to keep the DCM bounded; the WBC executes joint torques to track that footstep plan in real time. The combination is why 2026 bipeds walk smoothly and recover from disturbances — they're constantly recomputing the capture point, and stepping to it.

10.3   The MPC quadruped revolution — how 2018 changed everything

What MIT Cheetah and ANYmal proved

convex MPC at 1 kHz · disturbance recovery without explicit terrain modeling SOFTWARE

The 2018-era breakthrough that ended classical locomotion and ushered in modern quadrupedal robotics was convex MPC at kilohertz rates, demonstrated independently by MIT (Sangbae Kim's Cheetah) and ETH Zurich (Marco Hutter's ANYmal). The key insight was simplification of the dynamics. The single-rigid-body model — treating the quadruped as a floating mass with rigid leg contacts that produce ground reaction forces — turns the legged locomotion problem into a convex quadratic program small enough to solve at 1000 Hz on a laptop CPU. Plan ground reaction forces over the next 100 ms, project to joint torques, dispatch to the QDD actuators, repeat.

The dramatic property of MPC-controlled quadrupeds is robustness to unmodeled disturbances. A classical ZMP-style controller fails predictably when something it didn't plan for happens — a kick, a bump, an unexpected slope. An MPC controller re-solves the optimization at every timestep with the new state, so it absorbs the disturbance into the next plan. The viral 2019 videos of Spot recovering from kicks, ANYmal navigating rubble, and the MIT Cheetah doing backflips were all products of this single algorithmic shift. Quadrupeds went from "interesting research curiosity" to "shipping commercial product" in the four years from 2018 to 2022, almost entirely because of MPC.
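The "re-solve at every timestep" idea can be shown on a deliberately tiny model. The sketch below is not the convex QP used on real quadrupeds: it assumes a scalar velocity-error model, no friction or force limits, and a finite-horizon quadratic cost solved by backward Riccati recursion. The mass, control period, weights, and kick size are all hypothetical.

```python
def first_step_gain(a, b, q, r, horizon):
    """Backward Riccati recursion for the scalar system x[k+1] = a*x[k] + b*u[k].

    Returns the optimal feedback gain for the first step of the horizon.
    """
    p = q  # terminal cost weight
    gain = 0.0
    for _ in range(horizon):
        gain = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return gain


def simulate(use_mpc, steps=200, kick_step=50, kick=0.8):
    dt, mass = 0.01, 12.0   # hypothetical control period (s) and body mass (kg)
    a, b = 1.0, dt / mass   # velocity error integrates the applied force
    v_err = 0.0             # lateral velocity error, m/s
    history = []
    for k in range(steps):
        if k == kick_step:
            v_err += kick   # unmodeled shove
        if use_mpc:
            # Re-plan from the measured state, then apply only the first action.
            gain = first_step_gain(a, b, q=1.0, r=1e-4, horizon=10)
            force = -gain * v_err
        else:
            force = 0.0     # keep executing the pre-kick plan, no correction
        v_err = a * v_err + b * force
        history.append(v_err)
    return history


with_mpc = simulate(use_mpc=True)
without = simulate(use_mpc=False)
print(f"velocity error at end, replanning: {with_mpc[-1]:.4f} m/s")
print(f"velocity error at end, fixed plan: {without[-1]:.4f} m/s")
```

Because this toy model is time-invariant, the recomputed gain is the same every tick. What matters is that each plan starts from the measured state, so the kick is simply part of the next problem rather than an exception to handle.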

The same machinery extended to bipeds, with one major caveat. A quadruped has 12 active actuators (3 per leg × 4 legs) and is statically stable when standing — it can balance with no controller running. A biped has 12 active actuators (6 per leg × 2 legs) and is statically unstable — it falls if the controller fails for more than a few hundred milliseconds. MPC works for bipeds, but the tolerance for solver failure or numerical ill-conditioning is much smaller. The two-layer MPC + WBC architecture (§5.4) emerged specifically because the biped case needed the WBC layer's instantaneous reactivity in addition to MPC's planning horizon.

10.4   Reinforcement learning locomotion — sim-to-real and the Unitree wave

The 2022 transition that made parkour normal

RL policies trained in simulation now ship in commercial humanoids SOFTWARE

The second locomotion shift — from MPC to reinforcement learning policies — happened roughly 2022–2024. The key enabler was massively-parallel simulation: NVIDIA Isaac Gym (and later Isaac Lab) running 4096 quadruped or biped simulations in parallel on a single GPU, accumulating millions of timesteps of training data per hour. Train a neural-network policy on this stream — observation in, joint actions out — using PPO or similar RL algorithms, then deploy the resulting network on the real robot. The Unitree H1 holds the record for the fastest bipedal humanoid, reaching speeds of 13 km/h (about 8 mph), and that record was set with an RL-trained controller, not a classical MPC one.

RL controllers have advantages MPC can't easily match. They're robust to broader noise distributions (because they're trained on randomized environments). They handle high-dimensional sensorimotor coordination (the network learns the right action without an explicit dynamics model). They produce visibly more natural-looking gaits (because the reward function can include human-likeness terms). One example: the RuN framework reports stable, natural gaits and smooth walk-run transitions across a broad velocity range (0–2.5 m/s), outperforming prior state-of-the-art methods in both training efficiency and final performance. That is a March 2025 result on the Unitree G1 platform, and the kind of performance that's now routine in 2026.

The disadvantages: RL controllers fail unpredictably when out-of-distribution. A policy trained on flat ground walks confidently across flat ground; the same policy on slick ice or a tilted surface it never saw in simulation may produce confidently wrong actions that result in spectacular falls. Mitigation strategies — broader domain randomization in training, residual policies (combining RL with classical control), online adaptation — work; none fully solves the OOD problem. The 2026 mainstream is hybrid: classical MPC handles the well-modeled core (flat-ground walking, balance), RL handles the regimes where modeling fails (rough terrain, high-speed running, fall recovery).
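Domain randomization, the first mitigation listed above, amounts to drawing fresh physics parameters for every simulated episode so the policy never overfits to one world. A minimal sketch follows; the parameter names and ranges are illustrative assumptions, not values from any particular training setup or simulator API.

```python
import random


def sample_episode_params(rng):
    """Draw one set of randomized physics parameters (hypothetical ranges)."""
    return {
        "ground_friction": rng.uniform(0.2, 1.2),       # icy to grippy
        "payload_kg": rng.uniform(0.0, 5.0),            # extra mass on the torso
        "motor_strength_scale": rng.uniform(0.8, 1.2),  # weak or strong actuators
        "action_latency_ms": rng.uniform(0.0, 20.0),    # control-loop delay
    }


rng = random.Random(0)  # fixed seed so the draw is reproducible
# One parameter set per parallel environment, matching the 4096-environment figure.
batch = [sample_episode_params(rng) for _ in range(4096)]

frictions = [p["ground_friction"] for p in batch]
print(f"friction range seen in training: {min(frictions):.2f} to {max(frictions):.2f}")
```

Anything outside the sampled ranges is still out of distribution, which is why widening them helps without fully solving the problem.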

The cultural marker: in April 2026, Chinese humanoid robots competed in the Beijing E-Town Half-Marathon, with Honor's "Lightning" robot winning in approximately 50 minutes. A biped that completes a 21-km run autonomously is a different kind of locomotion benchmark than the lab-bound walking demos of three years ago. The Beijing half-marathon is to bipedal robotics what the DARPA Grand Challenge was to autonomous vehicles: not the most useful thing the technology will ever do, but a public demonstration that the foundational capability is real.

10.5   The honest 2026 status — solved, with footnotes

What works · what doesn't · what nobody talks about

locomotion is solved · the failures that remain are interesting DATA

Locomotion is, on balance, solved. A 2026 biped walks, runs, climbs stairs, recovers from kicks, and operates in unstructured outdoor terrain. The remaining failures are interesting because they're now edge cases rather than core problems.

What works in 2026: flat-ground walking at 1–3 m/s on every commercial humanoid; running up to 3.6 m/s on the fastest (Unitree H1); stairs in both directions on Atlas, Digit, Optimus, Apollo; uneven terrain navigation on quadrupeds; kick recovery on quadrupeds and most bipeds; sustained walking for hours (Beijing half-marathon as proof); operating in rain, dust, low light (Spot has been doing this since 2019); jumping over obstacles up to ~30 cm; turning in place; backing up; sidestepping.

What's still hard: soft, deformable surfaces (sand, deep snow); very low-friction surfaces (ice, polished marble); ladders (most humanoids can't climb a ladder reliably); tight doorways (footstep planning fails when both feet won't fit through simultaneously); carrying heavy loads while walking (the CoM shift breaks gait planning); running while manipulating objects (the dual-task interference is severe); recovering from falls more complex than "land and stand back up." The Atlas videos that remain genuinely impressive in 2026 (gymnastic-style flips, parkour vaults) are still scripted teleoperation rather than autonomous capability.

What nobody talks about: the bipedal-robot graveyard. ASIMO (retired 2022 after 22 years), HRP-4 (retired 2023), QRIO (retired 2006), Pepper (production discontinued 2021). Most "household humanoid" attempts of the 2010s shipped, achieved minimal commercial traction, and were quietly wound down. The current humanoid wave (Optimus, Figure, Apollo, Digit, Unitree H1, 1X NEO) is the second or third wave at a problem the field has been working on for thirty years. The new ingredient is foundation models on top of the locomotion stack, not the locomotion stack itself.

The field's transition is from "can the robot walk" to "is the robot useful." Locomotion is no longer the bottleneck on humanoid deployment — manipulation is (§9), and the integration with task-level reasoning (§7, §8) is. When a 2026 humanoid program fails to deploy, the failure is rarely "the robot can't walk to where it's supposed to go." It's "the robot got there and couldn't pick up the thing."

Synthesis

Locomotion vs manipulationtwo problems, opposite trajectories

The framing that organizes §9 and §10 together: locomotion and manipulation are inverses of each other. Locomotion is the mature problem with shipped solutions; manipulation is the active research frontier. The asymmetry is structural, not coincidental, and the table below makes the contrast explicit.

Attribute | Locomotion (2026) | Manipulation (2026)
Degrees of freedom | 12 per biped (6 per leg × 2) | 22+ per dexterous hand
Contact set | Small, predictable (foot soles on ground) | Open, varies per object and grasp
Friction parameters | Stable across known surfaces | Vary continuously across contact patch
Failure mode | Fall: recoverable, low-cost | Drop or crush: high-cost, sometimes catastrophic
Success rate (2026) | >99% on flat ground · >90% on uneven terrain | >95% power grasp on known objects · <30% novel dexterous
Algorithmic stack | MPC + WBC + RL (mature) | VLA + diffusion + analytical (early)
Sensor reliance | IMU + encoders + occasional vision | Tactile + vision + force/torque (multi-modal)
Maturity | Solved · shipping in commercial humanoids | Active research · 7 of 22 DoF used effectively
Field trajectory | Top of S-curve · incremental progress | Steep middle of S-curve · large gains possible

Why is locomotion solved and manipulation not? Three structural reasons. Dimensionality: 12 vs 44+ DoF. Contact uncertainty: bounded vs open. Failure cost: low vs high. None of these will reverse — manipulation will always be harder than locomotion, and the gap will close not by manipulation getting easier but by the field building the algorithmic and data infrastructure to handle the harder problem. A 2026 humanoid is two robots in one chassis: a competent walker carrying an incompetent grasper. The walker is shipped product. The grasper is research. The next 24 months of humanoid progress are predominantly about closing that asymmetry.

§11 · The market

Form factorseight robot industries, one umbrella term

"Robot" is a category error. The word covers a roughly $90 billion industrial arm business, a $15 billion mobile-robot logistics market, a fast-growing cobot segment, $10 billion of consumer vacuums and mowers working in households worldwide, surgical robotics dominated by a single company, drone fleets at every airport perimeter, and the headline-grabbing humanoid wave that's still mostly pre-revenue. Each is its own industry: different physics, different customers, different regulators, different competitive moats. A 2026 humanoid program is not "entering the robotics market." It's entering one specific robotics market, with seven adjacent ones running parallel that mostly don't share suppliers, customers, or technology stacks. This section walks the eight, ranked roughly by current revenue.

11.1   Industrial arms — the $90B incumbent

Forty years of installed base

FANUC, ABB, Yaskawa, KUKA, Kawasaki — the Big Five who built the modern factory DATA

The industrial robot arm is the substrate of modern manufacturing. The global industrial robot market was valued at $81.78B in 2025, projected to reach $89.81B in 2026 and $208.68B by 2035 at a CAGR of 9.82%. Five companies dominate: FANUC (Japan, the largest by revenue, 400,000+ active installations worldwide), ABB (Swiss-Swedish, the European leader), Yaskawa (Japan, particularly strong in welding and arc applications), KUKA (German, since 2016 owned by China's Midea Group), and Kawasaki Heavy Industries (Japan, automotive-focused). Together these five hold 70%+ of installed base globally.
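Market projections like the one above are easy to sanity-check, because a CAGR is just the constant annual rate r in end = start × (1 + r)^years. The helper names below are illustrative; the dollar figures are the ones quoted in the paragraph.

```python
def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, cagr, years):
    """Value after compounding start_value at cagr for the given number of years."""
    return start_value * (1.0 + cagr) ** years


# Industrial robots: $89.81B in 2026 to $208.68B in 2035, nine years apart.
industrial = implied_cagr(89.81, 208.68, years=9)
print(f"implied CAGR: {industrial:.2%}")
print(f"2035 value at 9.82%: ${project(89.81, 0.0982, 9):.1f}B")
```

The implied rate comes out close to the quoted 9.82%, so the three numbers in the forecast are at least mutually consistent. The same check applies to the other market figures in this section.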

The application split is automotive-dominant. Automotive leads the industrial robot market with over 38% share, driven by automation in welding, painting, and assembly. Electrical and electronics applications follow closely at 26%, dominated by semiconductor handling, PCB assembly, and inspection tasks. A modern automotive plant has 1,500–4,000 industrial robots; a smartphone factory 5,000–10,000. The robots are mostly 6-DoF articulated arms (the canonical form), with SCARA arms (4-DoF, faster, used for pick-and-place electronics) and delta robots (parallel-kinematic, used for high-speed packaging) filling out the long tail.

Geography matters. Asia-Pacific accounts for over 66% of installations — China alone installed more industrial robots in 2024 than the rest of the world combined. The geography of installed base is not the geography of vendors: Japanese manufacturers ship most of their volume to Chinese factories. The political consequence is real: when KUKA was acquired by Midea, Germany changed its foreign-investment rules; when FANUC's reliance on Chinese factories became geopolitically uncomfortable, the company began diversifying production to Mexico.

The industrial-arm market is mature, high-margin, and structurally different from humanoid robotics in almost every way. An industrial arm sits bolted to the floor, doing one task forever, in a cell with safety cages around it. A humanoid walks around, does many tasks, shares space with humans. The technical overlap (motors, encoders, controllers) is real but smaller than the press coverage suggests. The Big Five are not the natural sellers of humanoids; they're the incumbents the humanoid wave is trying not to disrupt — they sell into the same factories but for different jobs.

11.2   AMRs & AGVs — the wheeled $5B that became $30B

What e-commerce did to mobile robots

Amazon's Kiva acquisition was the canary; warehouses are now automated by default DATA

The fastest-growing pre-humanoid robot segment is wheeled mobile robots — split into two classes that the industry distinguishes carefully. AGVs (Automated Guided Vehicles) follow fixed paths marked by tape, magnetic strips, or laser reflectors; they're the older technology, dating to the 1950s, and they still ship at volume because they're cheap and reliable. AMRs (Autonomous Mobile Robots) navigate dynamically using SLAM, LIDAR, and onboard computer vision — the technology that emerged in 2014–2018 and ate the AGV market's growth.

The market sizes vary by source — the autonomous mobile robots market is estimated at $4.74B in 2025, expected to reach $5.49B in 2026 and $14.04B by 2033 at 14.4% CAGR. Including AGVs and broader categories, the mobile robots market is estimated at $8.64B in 2025 and is expected to reach $30.24B by 2030 at 28.48% CAGR. The AMR-dominant category is growing roughly twice as fast as the broader industrial robot market, with manufacturing and warehousing as the dominant verticals.

The companies you should know: Geek+ (Chinese, the global leader in goods-to-person warehouse fleets), Locus Robotics (US, the e-commerce specialist; DHL Supply Chain surpassed 500 million collaborative picks after tripling its Locus fleet), Symbotic (US, Walmart's automation partner), Mobile Industrial Robots / MiR (Danish, owned by Teradyne), OTTO Motors (Canadian, factory-floor focused), KUKA AMR division (now significant), Omron / Adept (Japanese-American). The Chinese players (Geek+, Hai Robotics, Quicktron) are particularly aggressive on price and have driven significant deflation in the segment.

The flagship deployment of 2026: Walmart committed $22 billion to five automated grocery campuses averaging 700,000 ft², positioning two-thirds of its stores to rely on robotic fulfillment by early 2026. The Walmart commitment is the canonical proof that AMR-driven warehousing has crossed the chasm — when one of the world's three largest companies is rebuilding its physical infrastructure around mobile robots, the technology is no longer experimental. AMRs are the form factor that paid for everyone else's research budgets — the cash flow from warehouse automation funds the humanoid programs that the same companies are running in adjacent labs.

11.3   Collaborative robots — the cobot half-decade

What Universal Robots created

$3B → $11B in seven years · the first robot category that didn't need a safety cage DATA

The collaborative robot — "cobot" — is a 6-axis arm specifically engineered to work alongside humans without a safety cage. The category was effectively created by Universal Robots (Danish, founded 2005, acquired by Teradyne in 2015 for $285M) with the UR5 and UR10 arms in 2009. The trick: limit speed and force enough that an unexpected human collision is uninjurious, even before considering force-sensing safety stops. The market response was unexpectedly large — small and medium manufacturers who could never afford a $200K caged industrial robot would happily buy a $35K cobot they could roll up to a workbench.

Market trajectory: the collaborative robots market is projected to expand from $2.8B in 2026 to $10.9B by 2033, registering a CAGR of 21.4%. Slower growth than AMRs but from a smaller base, and structurally different — cobots sell to SMEs (small and medium enterprises) where industrial arms can't reach, expanding the addressable market rather than displacing existing automation.

The competitive landscape is more crowded than in industrial arms. Universal Robots / Teradyne (Danish, the original, ~50% global share), FANUC (the CR-series), ABB (the YuMi line, dual-arm cobots specifically for electronics), KUKA (the LBR iiwa, German engineering at premium price), Yaskawa (the HC series), Doosan Robotics (Korean, fastest-rising challenger), Techman Robot (Taiwanese, Quanta Computer subsidiary), Aubo Robotics and Jaka Robotics (Chinese cost-tier). The cobot space has been the entry point for non-Japanese-and-German players to break into industrial robotics.

The technology fundamentals connect to earlier sections. Cobots are impedance-controlled (§5.3), which is what makes them safe. They use joint torque sensors at every joint (§4.3), which is what gives the impedance controller its inputs. They run ROS 2 increasingly often (§7.3), which is what makes them programmable by non-roboticists. A modern cobot has the actuation, sensing, and software stack of a humanoid arm, just bolted to a stand and not packaged with legs. The mechanical and software adjacency is high; the market adjacency is also high (humanoid programs increasingly position their upper-body capabilities as "cobot-equivalent on legs").
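The safety argument can be made concrete with a single-joint sketch: a spring-damper impedance law, a hard torque cap, and a contact check based on the joint torque sensor. This is a simplified illustration, not any vendor's controller; the gains, limit, and threshold are hypothetical numbers.

```python
def impedance_torque(q_des, q, qd_des, qd, stiffness, damping, torque_limit):
    """Spring-damper joint torque toward the target, clamped to a safe maximum."""
    tau = stiffness * (q_des - q) + damping * (qd_des - qd)
    return max(-torque_limit, min(torque_limit, tau))


def contact_detected(measured_torque, commanded_torque, threshold):
    """Flag a collision when the sensed torque departs from what was commanded."""
    return abs(measured_torque - commanded_torque) > threshold


# Hypothetical joint: 100 N·m/rad stiffness, 5 N·m·s/rad damping, 20 N·m cap.
small_error = impedance_torque(1.0, 0.9, 0.0, 0.0, 100.0, 5.0, 20.0)
large_error = impedance_torque(1.0, 0.0, 0.0, 0.0, 100.0, 5.0, 20.0)
print(f"torque for 0.1 rad error: {small_error:.1f} N·m")
print(f"torque for 1.0 rad error: {large_error:.1f} N·m (clamped)")

# The sensor reads 15 N·m while 5 N·m was commanded: something is pushing back.
if contact_detected(15.0, 5.0, threshold=8.0):
    print("unexpected contact: trigger protective stop")
```

The clamp bounds how hard the arm can push even if the position error is large, and the torque-sensor check is the input that lets the controller stop on contact.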

11.4   Humanoids — the wave

From zero to a hundred programs

~13,000 units shipped in 2025 · all the volume from China · all the press from US DATA

This is the market the guide is fundamentally about. Every section so far has covered humanoid components or capabilities; here is the market data those sections fit into. Global humanoid shipments reached 13,317 units in 2025 and are accelerating fast, with Chinese manufacturers claiming 87% of volume. The 87% Chinese share is the single most important statistic about the humanoid industry in 2026.

The 2026 program landscape, ranked by 2026 production target rather than press coverage: Unitree (Chinese, 5,500+ units shipped 2025, targeting 10–20K in 2026, the volume leader by a wide margin); UBTECH (Chinese, ~1,000 units in 2025, vertically integrated, publicly listed); AGIBOT / Zhiyuan (Chinese, has hit a 10K-unit milestone); XPENG IRON, Fourier GR-1, Booster Robotics, Robot Era (rest of China, mostly research/early commercial); Tesla Optimus (US, internal use only in 2026, target 50K units by year-end, conversion of Fremont Model S/X production lines committed); Figure (US, $39B valuation, BMW deployment proving out, 12K-unit BotQ factory target); Boston Dynamics electric Atlas (US, Hyundai partnership, 30K/year target by 2028, currently fully committed to Hyundai and Google DeepMind); Agility Digit (US, RoboFab Salem 10K/year capacity, 100K+ totes moved at GXO warehouses, the only program generating commercial revenue); Apptronik Apollo (US, Mercedes and Jabil partnerships); 1X NEO (Norwegian-American, consumer preorders open at $20K or $499/mo, US deliveries 2026); Sanctuary AI Phoenix (Canadian, Microsoft + NVIDIA backed, hydraulic hand specialty); Neura 4NE-1 (German, EU-positioned).

Pricing splits clearly. Sub-$20K tier: Unitree G1 ($16K), Unitree R1 ($5.9K), Unitree H1 ($90K — that one's an outlier), Tesla Optimus target ($20–30K). Mid tier: 1X NEO ($20K), Apollo TBD, Figure 02 estimated ($30–50K). Premium tier: Boston Dynamics electric Atlas (price unpublished, institutional only), Agility Digit ($250K+), Sanctuary Phoenix (institutional). The pricing range — $5.9K to $250K+ for what's nominally the same product category — tells you the market hasn't settled on what a humanoid is for, who buys it, or what price-point the volume sits at.

Application clarity is improving. Every humanoid robot company claims to be building a general-purpose robot. Every actual deployment is hyper-specialized: totes for Digit, parts kits for Apollo, battery cells for Optimus, car parts for Figure 02. The honest read: the humanoid form is currently being deployed as a flexible single-purpose robot, with the "general purpose" claim deferred to the foundation-model layer. That's exactly what §7 and §8 predicted — the hardware ships first, the software flexibility comes later.

11.5   Surgical robotics — the duopoly that became a monopoly

Intuitive Surgical's twenty-year run

$8B in revenue · 6,000+ da Vinci installations · the highest-margin robotics business in the world DATA

The most profitable robotics market in the world is one most coverage misses. Intuitive Surgical (US, founded 1995) ships the da Vinci surgical system — a 4-arm teleoperated platform for minimally invasive surgery. ~6,000+ systems installed globally, ~2 million procedures per year, $7.7B revenue (2024), gross margins north of 65%. Each system sells for $1.5–2.5M; consumables (instruments, drapes) generate recurring revenue at 50%+ margin. The business model is more like Apple than like industrial robotics.

Competitors exist but have struggled. Medtronic Hugo (US, launched 2019, slow uptake), CMR Surgical Versius (UK, modular four-arm system, growing in Europe), Asensus Surgical Senhance (US-Italy, formerly TransEnterix), Stryker Mako (orthopedics-focused, the strongest niche challenger), Vicarious Surgical (US, single-port miniaturized concept). The Chinese surgical market is growing rapidly with domestic vendors (Microport / Toumai, Edge Medical) but stays largely separate from the Western market due to regulatory differences.

The technical stack is unusual. Surgical robots are teleoperated by surgeons — they're not autonomous. The control problem is motion scaling and tremor cancellation: translate the surgeon's hand motion (centimeters) into instrument motion (millimeters) while filtering 8–12 Hz physiological tremor. The mechanical engineering is precision-medical-grade — every part sterilizable, every motor backlash-free, every cable redundant. The regulatory burden is the moat: FDA Class II medical device approval for a new surgical platform takes 5–10 years and $100M+ in clinical trials. Surgical robotics is the form factor with the highest unit economics, the smallest engineering overlap with humanoids, and the largest regulatory moat. It's a separate industry that happens to share a category name.
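Motion scaling and tremor filtering can be sketched as a gain followed by a low-pass filter. The version below assumes a 5:1 scale, a first-order filter with a 2 Hz cutoff, and a 1 kHz sample rate. All three are illustrative choices, and a real surgical system would use a more careful filter design than this.

```python
import math


def scale_and_filter(hand_positions_mm, dt, scale=0.2, cutoff_hz=2.0):
    """Scale hand motion down, then low-pass it to suppress tremor-band content."""
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)  # filter time constant, seconds
    alpha = dt / (tau + dt)                  # first-order smoothing factor
    filtered = scale * hand_positions_mm[0]  # start at the first scaled sample
    out = []
    for x in hand_positions_mm:
        filtered += alpha * (scale * x - filtered)
        out.append(filtered)
    return out


dt = 0.001  # 1 kHz sampling (assumed)
t = [i * dt for i in range(2000)]
reach = [10.0 * ti for ti in t]                              # deliberate 10 mm/s motion
tremor = [math.sin(2.0 * math.pi * 10.0 * ti) for ti in t]   # 1 mm tremor at 10 Hz
hand = [r + n for r, n in zip(reach, tremor)]

tool = scale_and_filter(hand, dt)
tremor_only = scale_and_filter(tremor, dt)
residual = max(abs(v) for v in tremor_only[1000:])
print(f"tool tip after 2 s: {tool[-1]:.2f} mm")
print(f"residual tremor at the tool: {residual:.3f} mm (1 mm at the hand)")
```

The design trade-off is visible even in this sketch: a lower cutoff removes more tremor but makes the instrument lag the surgeon's deliberate motion.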

11.6   Drones & the long tail — flight, exoskeletons, agricultural, undersea

The form factors that don't fit the main story

each its own market · each with its own economics DATA

Five more form factors round out the robotics landscape. Drones (Unmanned Aerial Vehicles) are arguably the largest robot deployment by unit count: DJI alone has shipped tens of millions of consumer and commercial units. The military and inspection markets are the high-value segments: Skydio, Anduril (autonomous systems including aerial), Shield AI, Parrot. The technology overlap with humanoids is moderate (cameras, IMUs, autonomy stacks share heritage; the actuation problem is completely different).

Exoskeletons are wearable robotics — powered orthoses that augment human strength or assist disabled users. Industrial exos (Sarcos, German Bionic, Ottobock) target warehouse workers carrying heavy loads; medical exos (ReWalk, Ekso Bionics, CYBERDYNE) target rehabilitation and mobility for users with spinal cord injuries. The market is small but growing, and the technology stack — actuators, sensors, control — is increasingly shared with humanoid programs. Some humanoid teams (notably Sanctuary AI, Apptronik) have exoskeleton heritage in their founders.

Agricultural robots — autonomous tractors (John Deere fully autonomous models), strawberry pickers (Harvest Croo), thinning robots, precision sprayers. The market is fragmented and crop-specific but growing as agricultural labor shortages worsen. Outdoor mobile robotics with significant compute demands; close enough to AMRs technically that some AMR vendors are expanding into agriculture as a flanking move.

Undersea robotics — ROVs (Remotely Operated Vehicles) and AUVs (Autonomous Underwater Vehicles). The market is dominated by oil-and-gas inspection (Oceaneering, Saab Seaeye, Kongsberg), with growing applications in marine science and offshore wind. The actuation, sensing, and communication problems are unique enough that this segment is largely siloed from the rest of the robotics industry.

Service robots — restaurant delivery (Bear Robotics, Pudu), retail and hospitality (Diligent's Moxi, Pepper's lineage), elder care (a much-promised market that has yet to deliver). The category overlaps with humanoids and with AMRs without being either; commercial traction has been mixed; most ambitious programs (Pepper, Anki Cozmo, Jibo) have been wound down. The service-robot graveyard is the warning the humanoid wave is trying not to repeat.

11.7   Consumer robotics — the segment that's bigger than humanoids and cobots combined

Vacuums, lawn mowers, pool cleaners, window washers

$7B in vacuums alone · 50M+ Roombas shipped · the most-deployed mobile robots on Earth DATA

The form factor most overlooked in coverage of "robotics" is the one most people actually own. Robotic vacuum cleaners, the Roomba lineage and its descendants, are the single largest consumer mobile-robot category. The robotic vacuum cleaner market is estimated at $7.05B in 2026, up from $6.21B in 2025, with projections of $13.29B by 2031 at 13.52% CAGR. iRobot shipped over 7.5 million Roomba units in 2023 alone, and cumulative Roomba shipments now exceed 50 million units worldwide. Over 10% of households now own a robotic vacuum, with adoption expected to double in the next five years.

The vendor landscape has shifted hard in the last five years. iRobot (US, founded 1990 by MIT alumni) was the unchallenged leader for two decades; the planned $1.7B Amazon acquisition collapsed in early 2024 after EU antitrust opposition. Roborock (Chinese, IPO'd 2020) surpassed iRobot, claiming roughly 16% global robot-vacuum market share in Q4 2024 — a clean inversion of the market just five years earlier. Ecovacs (Chinese, the broadest portfolio), Xiaomi (Chinese, vertically integrated through Mi ecosystem), Dreame, Eufy / Anker, SharkNinja, Dyson (premium tier), and Samsung / LG (smart-home integrated) round out the credible vendors. Chinese vendors collectively hold the majority of the global market — the same pattern §11.4 described for humanoids, just reached five years earlier in this category.

The category isn't just vacuums anymore. The Chinese vendors in particular are aggressively expanding the sub-categories of consumer mobile robotics. Robotic lawn mowers are the second-largest sub-segment — Husqvarna Automower has dominated this since the 1990s; the 2024–2026 wave (Eufy mowers, EcoFlow Blade, Mammotion, Worx Landroid) brought computer-vision-based boundary detection that eliminates the buried perimeter wire requirement. Robotic pool cleaners are the third — Dolphin (Maytronics), Polaris, Beatbot — a quietly large segment dominated by a small number of specialists. Window-washing robots (Ecovacs Winbot, Hobot) and gutter-cleaning robots are smaller but real. Robotic bartenders, litter-box robots (Litter-Robot is the canonical name and a billion-dollar product), and robotic pet feeders round out the long tail.

The technology stack has converged on a recognizable pattern. Modern consumer mobile robots run SLAM (§8.2), increasingly with LIDAR (§4.5) — Dreame's X50 Ultra has dToF time-of-flight LIDAR. Edge inference for obstacle avoidance and "this is a sock, not dirt" semantic recognition runs on cheap embedded SoCs (Allwinner, Rockchip, the Chinese ARM SoC ecosystem). Cleaning is performed by mechanical brushes, mop pads, suction fans — the actuation problem is small enough to be a footnote. The single most under-noticed connection is that the algorithmic stack of a $500 consumer vacuum and a $20,000 humanoid is recognizably the same — SLAM, vision-based perception, path planning, obstacle avoidance — running on dramatically different silicon. The consumer side benefits from production volumes that make sensor and chip prices crash; that price deflation flows uphill into the commercial robotics segments.

The CES 2026 reveal that connects this segment to the humanoid wave: Roborock announced the Saros Rover, the world's first robotic vacuum with AI-powered wheel-leg architecture that can both navigate stairs and slopes with human-like agility while cleaning them. A legged robot vacuum is, mechanically, a small quadruped with a vacuum bolted underneath — applying the locomotion stack from §10 to the consumer mobile-robot form factor. The early glimpse of legged consumer robots is the kind of segment-crossing event that can rapidly shift expectations about what consumer robotics looks like in 2030.

Synthesis

Eight robot industries"the robotics market" doesn't exist

The framing that holds §11 together: the umbrella term "robotics" describes eight distinct industries that share components, lexicon, and academic departments — and very little else. The customers don't overlap. The competitors don't overlap. The regulators don't overlap. The companies that dominate one segment are usually irrelevant in others. The table below shows the eight at a 2026 snapshot.

Form factor | 2026 market | Growth | Top vendors | Primary customer | Maturity
Industrial arms | ~$90B | ~10% CAGR | FANUC, ABB, Yaskawa, KUKA, Kawasaki | Automotive, electronics factories | Mature · 40 yr
Mobile robots (AMR/AGV) | ~$8–30B | ~25% CAGR | Geek+, Locus, Symbotic, MiR, KUKA AMR | Warehouses, fulfillment centers | Crossing the chasm
Cobots | ~$3B | ~21% CAGR | Universal Robots, FANUC, ABB, Doosan, Techman | SME manufacturers | Established · ~15 yr
Consumer robotics | ~$10B (vacuums + mowers + pool) | ~14% CAGR | Roborock, iRobot, Ecovacs, Xiaomi, Husqvarna, Dolphin | Households (50M+ Roombas in homes) | Mature consumer · expanding categories
Humanoids | ~$1B (mostly forecast) | 100%+ CAGR | Unitree, Tesla, Figure, BD, Agility, Apptronik | Currently: factory pilots; eventually: services, homes | Pre-product-market-fit
Surgical | ~$10B | ~15% CAGR | Intuitive Surgical, Medtronic, CMR Surgical, Stryker | Hospitals (teleoperated) | Mature near-monopoly
Drones | ~$30B+ | ~15% CAGR | DJI, Skydio, Anduril, Shield AI, Parrot | Consumers, military, inspection | Mature commercial · contested military
Exoskeletons / agricultural / undersea / service | ~$5B combined | Variable | Highly fragmented | Specialized verticals | Mixed

Three observations matter. First, humanoids are the smallest current market and the loudest in the press — that asymmetry is what an early-stage market looks like, but it's also what a hype cycle looks like, and the next 24 months will determine which. Second, consumer robotics is already in homes at scale — there are 50 million+ Roombas in households worldwide, which is roughly 4,000× the 2025 humanoid shipment volume; the consumer category has answered the "do people want robots in their homes" question in the affirmative for a narrow vacuum-and-mowing definition, and the open question is how far that definition extends. Third, the cash flowing into humanoid R&D is largely funded by the other seven segments. Tesla's automotive cash flow funds Optimus; Boston Dynamics' Spot revenue funds Atlas; KUKA's industrial arm business funds its humanoid work; Teradyne's UR cobots fund Apollo via Apptronik investments; Roborock's vacuum cash flow now funds its experimental legged-robotics program. A bet on humanoids is implicitly a bet that the cash flowing in from the adjacent industries continues long enough for the foundation-model layer to deliver the general-purpose capability that makes humanoids commercially viable in places industrial arms, AMRs, and consumer vacuums can't reach. The eight industries are independent at the customer layer and intertwined at the capital layer, and that's the shape of the robotics business in 2026.

§12 · The map

Players & geopolitics

where the robots come from · who controls what

Robotics is one of the most globalized of the technology sectors: supply chains span China, Japan, Germany, South Korea, the US, Taiwan, Switzerland, and Denmark. But "globalized" is not "borderless." Five geographic hubs each have a distinctive technological signature, and the trade-policy alignment between them is hardening rapidly in 2026. The 2026 robotics map is the map of two emerging blocs: a US-aligned Western alliance with Japan, South Korea, and Western Europe, and a Chinese ecosystem that controls component supply chains, raw materials, and increasingly the volume tier of every consumer-facing robot category. This section walks the players and the political alignment, segment by segment.

12.1   The Big Four — FANUC, ABB, Yaskawa, KUKA

The forty-year incumbents

two Japanese, one Swiss-Swedish, one Chinese-owned-German · structural anchors of the field DATA

The robotics industry's center of gravity for forty years has been four companies. FANUC (Yamanashi, Japan, founded 1972 as a spinoff of Fujitsu) is the largest by revenue — yellow-painted articulated arms in every automotive plant in the world, $7B+ revenue, gross margins exceeding 40%. ABB (headquartered in Zürich, Swedish-Swiss merger 1988) is the European champion, more diversified across electrification and motors than the Japanese pure-plays. Yaskawa (Kitakyushu, Japan, founded 1915) is the welding and arc-process specialist; Motoman is its robotic arm brand. KUKA (Augsburg, Germany, founded 1898) is the orange-arm German engineering brand — and since 2016, owned by China's Midea Group, an acquisition that prompted Germany to tighten its foreign-investment rules. Kawasaki Heavy Industries rounds out what's sometimes called the "Big Five" — automotive-focused, primarily Japan-domestic but with Tier-1 supplier relationships across all major OEMs.

The Big Four / Five together hold ~70% of installed industrial-arm base globally. Their collective response to the humanoid wave has been measured: ABB has shown humanoid concept work; KUKA, through Midea, has invested in Chinese humanoid programs; Yaskawa has a research humanoid (Motoman SDA) lineage but no commercial product. None of the Big Four has fielded a humanoid as their flagship product. The strategic position is "let the humanoid wave figure out whether it's real, and acquire if it is." Given that the Big Four collectively have ~$30B in annual revenue and the entire 2026 humanoid market is ~$1B, the wait-and-acquire posture is rational.

The structural risk to the Big Four is not humanoids — it's the cobot encroachment from below (§11.3) and the AMR encroachment from a different direction (§11.2). Universal Robots / Teradyne has already rewritten the SME-manufacturing tier the Big Four were never strong in. Geek+ and Locus have rewritten the warehouse tier the Big Four don't compete in. The Big Four still own the high-end automotive-and-electronics installation base, but their growth comes from defending that base against price compression rather than entering new categories.

12.2   The Chinese surge — state-backed scale at every tier

From cost-tier challenger to global leader

300K industrial robots installed in 2024 alone, nearly 10× the US DATA

The single most important geopolitical fact about robotics in 2026 is that China installed 300,000 industrial robots in 2024 alone, nearly 10 times more than the United States during the same period. China holds 61% of robotics unveilings since 2022 and owns 70% of component supply chains. The Chinese position is not a single dominant company — it's a layered ecosystem of dozens of vendors at every tier, backed by sustained state investment, supported by domestic supply chains for upstream components.

The industrial-arm tier has Chinese national champions whose Western-press visibility is much lower than their domestic share. Estun Automation (Nanjing, founded 1993, public) is the largest domestic industrial robot vendor; through its acquisition of Cloos and other German welding specialists, it's now a credible global Tier-2 player. Inovance Technology (Shenzhen) makes industrial automation drives, motors, and increasingly arms. Siasun Robot & Automation (Shenyang) is the Chinese Academy of Sciences spinoff. Efort Intelligent Equipment (Wuhu, founded 2013) has grown rapidly through aggressive pricing.

The humanoid tier is the most visible, and the volume leader. Unitree (Hangzhou, founded 2016) shipped 5,500+ humanoids in 2025 and targets 10–20K for 2026, with the G1 priced at $16K and the H1 at $90K; it is arguably the most-shipped humanoid program in the world. UBTECH Robotics (Shenzhen, public-listed, vertically integrated) is the diversified consumer-and-industrial player. AGIBOT (Zhiyuan Robotics) hit a 10,000-unit milestone, claims "world-leading global shipments and market share," and ships X2 and G2 models in commercial use. XPENG IRON is the EV-maker spinoff. Fourier Intelligence (Shanghai) targets healthcare and rehabilitation. LimX Dynamics, Booster Robotics, Astribot, Robot Era, and Leju (Kuavo 4th Gen Pro, demoed at AW 2026 Seoul running NVIDIA Isaac Sim + Jetson) round out a credible second tier.

The consumer-robotics tier — §11.7 — is also Chinese-dominated. Roborock, Ecovacs, Xiaomi, Dreame, Eufy / Anker collectively hold the majority of the global robot-vacuum market. DJI (Shenzhen) is the world's largest drone maker by units shipped. Geek+, Hai Robotics, Quicktron are the AMR cost-tier leaders. The pattern repeats: every robotics segment that's reached volume manufacturing has a Chinese company at or near the top of the global share rankings.

The state-backing dimension is explicit. Beijing has designated specific cities as "humanoid robot capitals" with billions of yuan in subsidies; the Chinese government has stated its intent to address demographic decline through robotics; speculative forecasts suggest China could field approximately 300 million humanoid robots to compensate for its demographic decline. Whether that number is achievable is contested. What's not contested is that no Western government is making investments at remotely comparable scale.

12.3   The US humanoid wave — frontier AI plus venture capital

The press-leader, the hardware-leader, the model-leader

Boston Dynamics, Tesla, Figure, Apptronik, Agility, 1X — six programs, six theses DATA

The US position in 2026 robotics is paradoxical: dominant in software and frontier AI, weak in component manufacturing, dependent on Asian and European suppliers for almost every motor, encoder, and sensor that goes into an American-made humanoid. The competitive strength is the foundation-model layer (§7, §8) and the venture capital ecosystem that can fund 5+ years of pre-revenue R&D for hardware-intensive programs.

The six US programs to know, ranked roughly by current capability:

Boston Dynamics (Waltham, MA, founded 1992 as MIT spinoff; passed through Google → SoftBank → Hyundai ownership; current CEO transition from Robert Playter to Amanda McMaster, February 2026). Atlas is the frontier of dynamic locomotion; electric Atlas is in production ramp, with 2026 fleets fully committed to Hyundai and Google DeepMind.

Tesla Optimus (Fremont/Austin). The largest manufacturing ambition (1M units/year target), the most vertically integrated stack, and currently the most credible volume-economics path.

Figure (Sunnyvale, $39B valuation, OpenAI partnership now wound down, BMW Spartanburg deployment proving out). The press leader and the most aggressive on home-robot positioning.

Apptronik (Austin, $350M raised, Mercedes pilot, Google DeepMind safety collaboration, Jabil contract manufacturing). The industrial-pragmatic positioning.

Agility Robotics (Salem, OR, Hyundai stake-holder, RoboFab 10K/year capacity). The only US humanoid program with paying commercial revenue (100K+ totes moved at GXO warehouses).

1X Technologies (Norwegian-American, OpenAI-backed, NEO Beta consumer pre-orders open at $20K or $499/mo). The consumer-home wager.

The supporting cast: Sanctuary AI (Vancouver — Canadian, Microsoft and NVIDIA-backed, hydraulic-hand specialty), Persona AI, Mentee Robotics, Reflex Robotics, Foundation Robotics Labs. The longer-tail US humanoid landscape includes ~40 funded programs in stealth or early-disclosed phases as of early 2026, most of which won't ship. The funding overhang is significant — total disclosed humanoid funding through 2025 exceeds $10B against ~$0 in revenue from autonomous (non-teleoperated) humanoid deployments.

The structural US advantages are real: NVIDIA's silicon (§7.1, Jetson Thor with the major US humanoid programs as adopters), the Anthropic / OpenAI / Google DeepMind / Meta foundation-model layer (§7.5), and venture capital with patience for 5–10 year hardware bets. The structural US weaknesses are also real: roughly 90 percent of key components still sourced from China, no domestic battery cell production at competitive cost, motor and encoder supply chains routed through Japan and Germany. Every American humanoid is partially Chinese-supplied at the component level, regardless of where it's assembled.

12.4   Japan, Korea, and Europe — specialized strengths, narrowing roles

The other three hubs

Japan dominates components · Korea bets on humanoids via Hyundai · Europe is the regulator DATA

Japan remains structurally central to the field at the component layer. Honda's ASIMO program (1986–2022) defined what humanoid robots looked like for two decades; the program's cancellation in 2022 was a watershed moment. Toyota's research humanoid lineage (T-HR, T-HR3) continues through Toyota Research Institute partnerships with Boston Dynamics on Large Behavior Models. Sony retired QRIO in 2006 but remains a sensor and silicon player. The Japanese strength is precision components — Heidenhain (German, but heavily Japan-routed in the supply chain), Harmonic Drive (Japanese-German JV), Nidec and Nabtesco (precision speed reducers). Modern Japanese players include Telexistence (teleoperation-focused humanoids for retail), Tokyo Robotics, RT Corporation, Kawada Robotics, and Ory Lab (the most well-known social-robot lineage, OriHime). Japan's role is now component-supply, R&D, and the "social robotics" niche where decades of work have produced expressive robots no one else builds. The 2026 trend: a US-Japan robotics and AI alliance is being floated explicitly, with the US dominating frontier AI and Japan dominating embodied robotics.

South Korea has a focused position via two channels. The first: Hyundai's acquisition of Boston Dynamics (2020, completed 2021) made Korea the corporate parent of one of the most advanced humanoid programs in the world, and Hyundai has committed to deploying "tens of thousands" of Atlas units across its plants. Hyundai plans to invest $6 billion in a robotics, data, and energy hub near Seoul. The second: Hyundai Robotics Lab's MobED (Best of Innovation award at CES 2026) and Rainbow Robotics (the HUBO lineage), supported by domestic players including WIRobotics, Holiday Robotics, LG Electronics, Robros, Keenon. Korea's positioning is "robotics as the next semiconductor industry" — the same industrial-policy playbook that built Samsung and Hynix, applied to embodied AI.

Europe has lost ground at the volume tier and gained ground at the specialty tier. The flagship industrial player is KUKA (now Chinese-owned, painfully). The cobot leader is Universal Robots / Teradyne (Danish, US-owned). The premium humanoid presence is Neura Robotics (German, the 4NE-1 humanoid). The medical-and-research presence includes Franka Emika (Munich, the Panda research arm; bankruptcy in 2023, restructured), Festo (pneumatic and soft robotics), PAL Robotics (Spanish, service robotics), Macco Robotics, and Oversonic Robotics (Italian). The UK has Shadow Robot Company (the dexterous-hand specialist from §9), CMR Surgical (the surgical-robot challenger to Intuitive), Prosper Robotics, and TheHumanoid. France has Pollen Robotics (open-source Reachy) and the broader academic ecosystem.

The European competitive position is most distinctive in regulation. The EU's AI Act is the world's most comprehensive AI regulatory framework; emerging robotics-specific safety standards (ISO and ASTM extensions) increasingly originate from European working groups. The EU's antitrust posture blocked the Amazon-iRobot acquisition; CE marking remains the gold-standard hardware-safety certification. Europe's strategic bet is not to win volume manufacturing but to set the rules under which everyone else's robots can sell into European markets. A 2026 humanoid sold to a German factory has to comply with European regulations even if it was made in Shenzhen and runs an American foundation model.

12.5   The surgical lock — Intuitive's twenty-year monopoly

The geographic exception

one company · one country · 75% global share for two decades DATA

The geopolitical structure of surgical robotics is unique among the form factors. Intuitive Surgical (Sunnyvale, CA) has held 70–80% global share of soft-tissue surgical robotics for two decades. The lock is regulatory (FDA Class II surgical-robot approval averages 5–10 years and $100M+ per platform), economic (the $1.5–2.5M sticker price plus consumables creates a near-perfect razor-and-blades model), and clinical (surgeons trained on da Vinci through residency are reluctant to switch).

The challengers exist but have been mostly contained. They include Medtronic Hugo (US, the strongest credible challenger, slow uptake), Stryker Mako (orthopedics-focused, the only segment Intuitive doesn't dominate), CMR Surgical Versius (UK, the European challenger; strong in NHS deployments, weaker outside Europe), Asensus Surgical Senhance, and Vicarious Surgical. The Chinese surgical market (Microport / Toumai, Edge Medical, Tinavi) has grown but stays mostly domestic because regulatory approvals don't carry over (NMPA approval is faster but doesn't transfer westward).

The strategic question for the field: will the humanoid wave eventually disrupt surgical robotics, or will it be the other way around? The honest 2026 answer is neither. Surgical robotics is a regulatory island: the moats around it are clinical and FDA-driven, not technical, and they don't erode just because foundation models get better. Intuitive's competitive position is more like that of a pharmaceutical company with a 20-year patent than that of a hardware company subject to commoditization. The most likely future is that humanoid programs and surgical robotics remain mostly separate industries even as both mature, because the customer (hospital systems vs. factory plants) and the regulatory regime (FDA medical device vs. OSHA workplace) are structurally incompatible.

12.6   Supply chains & trade policy — the hardening blocs

The components China makes that no one else does at price

rare earths · batteries · sensors · the components moat DATA

The component supply chain is where the geopolitical reality of 2026 robotics is hardest to soften. China refines the vast majority of the world's rare earths, controls much of the cobalt supply chain from the Democratic Republic of Congo, and manufactures 80% of global lithium-ion batteries. Rare-earth permanent magnets — the neodymium-iron-boron magnets in every BLDC motor in every robot — are nearly entirely Chinese-refined. The motors themselves can be wound in Japan or Germany; the magnets inside them are not.

The sensor and SoC supply chain is more mixed. Encoders (§4.1) come from Heidenhain (Germany), Renishaw (UK), AMS (Austria), RLS (Slovenia) — the European specialty. IMUs (§4.2) come from Bosch (Germany), TDK InvenSense (US-Japan), STMicro (French-Italian), with Honeywell and Northrop Grumman at the navigation-grade tier. F/T sensors (§4.3) are US-dominant via ATI. Tactile sensors (§4.4) are research-stage with Meta-affiliated and Chinese players competing. Range sensors (§4.5) split between US (Velodyne, Ouster, Aeva) and China (Hesai, Robosense, DJI Livox). The Jetson Thor SoC (§7.1) is NVIDIA, fabbed at TSMC in Taiwan — the single most strategic component in the entire stack.

The trade-policy environment hardened sharply in 2025–2026. Congress encouraged the Department of Defense to designate Unitree, a major Chinese manufacturer, as a "Chinese military company" in December 2025 while banning Chinese drone components from entering the United States during the same month. The Senate is considering a federal procurement ban on Chinese unmanned ground vehicle systems including humanoid robots, with carve-outs for counterterrorism applications. The FCC has been urged to add Chinese internet-connected robots to the Covered List. A gradual divide between US-aligned and China-aligned robotics ecosystems is emerging, which will raise short-term costs but improve long-term resilience.

The structural consequence: the robotics industry is bifurcating into two parallel ecosystems. A US-aligned humanoid program in 2026 is increasingly under pressure to source no Chinese components, a near-impossible standard given current supply chain reality, achievable only with multi-year nearshoring investments and significant cost premiums. The dual-sourcing requirement is becoming mandatory for federal defense contracts and increasingly prevalent in commercial deployments, and it is a structural source of cost premium that Chinese-supplied competitors don't pay. Whether this premium is bearable depends on whether the foundation-model and AI-software advantage of US programs is large enough to offset the hardware cost gap.

Synthesis

Five hubs, two blocs

the geographic structure of 2026 robotics

The federation framing one final time: robotics in 2026 is not one global industry. It's five regional ecosystems with distinctive technical signatures, increasingly partitioned into two trade-policy blocs (US-aligned and China-aligned), with the volume manufacturing concentrated in China and the foundation-model layer concentrated in the US. The table below summarizes the position of each hub.

| | China | United States | Japan | South Korea | Europe |
|---|---|---|---|---|---|
| Strength | Volume manufacturing · component supply · cost | Foundation models · venture capital · frontier AI | Precision components · social robotics · R&D | Industrial policy · Boston Dynamics via Hyundai | Regulation · medical · open-source · premium |
| Industrial arms | Estun, Inovance, Siasun, Efort | (weak; supplied by Asia + Europe) | FANUC, Yaskawa, Kawasaki | Hyundai Robotics, Doosan | ABB, KUKA (Chinese-owned) |
| Humanoids | Unitree, AGIBOT, UBTECH, XPENG, +20 more | Tesla, Boston Dynamics, Figure, Apptronik, Agility, 1X | Toyota TRI, Telexistence, Kawada | Hyundai (BD owner), Rainbow, MobED | Neura, PAL, Pollen, Sanctuary (CA) |
| Consumer | Roborock, Ecovacs, Xiaomi, DJI, Anker | iRobot, SharkNinja | Sony, Sharp, Panasonic | LG, Samsung | Husqvarna, Dyson |
| Surgical | Microport, Edge, Tinavi (domestic) | Intuitive Surgical (dominant), Medtronic, Stryker | (component supplier) | (emerging) | CMR Surgical (UK), Asensus (US-IT) |
| 2026 trade policy | State-backed, export-aggressive | Procurement bans on China rolling out | US-aligned, looking to mediate | Tightly US-aligned | Regulatory leadership · CE / EU AI Act |
| Demographic urgency | Population peaked 2022 | Slow decline · immigration cushion | Steepest decline globally | Steep decline | Structural decline · varies by country |

Three structural observations land the section. First, the Chinese position at the volume tier is durable for the rest of the decade — no Western program is closing the cost gap on rare-earth magnets, lithium-ion cells, or commodity sensor components in the next 24 months, and "China + 1" sourcing strategies take 5+ years to mature. Second, the demographic urgency is real and one-directional — China, Japan, South Korea, Germany, and Italy are all running out of working-age population on roughly the same schedule, and humanoid robotics is the only technology being explicitly proposed to fill the gap. The investments aren't speculative; they're insurance against demographic collapse. Third, the bifurcation into two ecosystems is a feature, not a bug — both blocs are accelerating their humanoid programs precisely because the other bloc is, and the field benefits in capability terms even as the trade policy hardens. The story of robotics in 2026 is less "one company wins" and more "two parallel industries develop in tandem, with the developing world increasingly forced to pick a bloc to source from." The map is not yet drawn for that resolution; the next 24 months draw most of it.

§13 · The honest picture

The humanoid moment

what 2026 actually delivers · what 2030 might

This guide began bottom-up (actuators in §1, sensors in §4, control in §5) because that order tracks the field's load-bearing physics. The closing section reverses that perspective and asks the only question a reader actually wanted answered: are humanoid robots real, and if so, when do they matter? The honest 2026 answer has a numerator and a denominator. The numerator: there are 2,000+ humanoid robots doing paid work in factories worldwide, growing fast, with credible pilots at every major automotive OEM. The denominator: there are 5 billion humans in the global workforce. The ratio is 0.00004%. Whether that ratio reaches 0.04% by 2030, a thousand-fold scaling to roughly 2 million working humanoids, is the bet the entire field is making, and the answer determines whether the next decade looks more like the smartphone S-curve (2007–2014) or the autonomous-vehicle plateau (2016–2024). This section walks the honest evidence on both sides.
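The numerator-over-denominator arithmetic is easy to check by hand, but a few lines make the orders of magnitude explicit. This is a minimal sketch using only the two rounded figures quoted above (2,000 working humanoids, 5 billion workers); the constant and function names are illustrative, not from any source.

```python
# Rounded figures quoted in the text; both are rough estimates.
HUMANOIDS_AT_WORK = 2_000           # humanoids doing paid factory work, 2026
GLOBAL_WORKFORCE = 5_000_000_000    # humans in the global workforce


def share_percent(robots: int, workers: int) -> float:
    """Robots expressed as a percentage of the human workforce."""
    return 100.0 * robots / workers


ratio_2026 = share_percent(HUMANOIDS_AT_WORK, GLOBAL_WORKFORCE)

# A thousand-fold scaling of the working fleet.
robots_after_1000x = HUMANOIDS_AT_WORK * 1_000
ratio_after_1000x = share_percent(robots_after_1000x, GLOBAL_WORKFORCE)

print(f"2026 share: {ratio_2026:.5f}% of the workforce")
print(f"after 1000x: {robots_after_1000x:,} robots, {ratio_after_1000x:.2f}%")
```

A thousand-fold scaling lands at about two million working humanoids, which is the same order of magnitude as the cumulative output the bull-case production curve in §13.3 would imply.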

13.1   What's actually shipping in 2026 — the verifiable deployments

Five real deployments

not demos · not pilots · paid work being done by autonomous humanoids DATA

The honest test for "is this technology real" is: which deployments produce measurable revenue under contract? In 2026, five do.

Agility Robotics Digit at GXO Logistics. The flagship verified deployment. The industry-first multi-year RaaS agreement with GXO at a Spanx fulfillment center has moved 100,000+ totes; ~100 units sold at $250K+ enterprise RaaS pricing. Digit is purpose-built for warehouse work — 5'9", 143 lbs, 35 lb payload, 8-hour battery, LiDAR + depth-camera navigation, the Agility Arc cloud platform coordinating fleet operations autonomously. The robot is not general-purpose; it moves totes between conveyor and shelf, and does it autonomously enough that GXO pays for it. RoboFab, the world's first humanoid robot factory, is scaling production from hundreds to 10,000+ per year. The 2026 honest claim: this is the first humanoid program with a sustainable commercial revenue model.

Figure 02 at BMW Spartanburg. The industrial pilot that proved the form factor. Figure 02 proved itself at BMW over 11 months: 90,000+ parts loaded, 30,000 BMW X3 vehicles produced, 1,000 placements per day within 5 mm tolerance. The task is sheet-metal-part insertion into specific fixtures — high-precision, well-defined, structurally similar to existing industrial-arm work but in a layout retrofitted from human workstations. The honest read: Figure 02's BMW deployment is the most rigorous published validation that a bipedal humanoid can do paid factory work alongside humans, but the 5 mm tolerance is loose by industrial standards and the forearm was identified as the top hardware failure point. Figure 03 ships during 2026 with home-deployment ambitions; whether that scales is the open bet on the company's $39B valuation.

Boston Dynamics Atlas at Hyundai. Electric Atlas began production-line deployments in 2025 across Hyundai's manufacturing footprint. The 2026 commitment: tens of thousands of units across Hyundai plants, with 30,000/year production target by 2028 via the joint Hyundai-BD partnership. Atlas leads on dynamic capability — 50 kg payload, the locomotion frontier (§10), the most sophisticated whole-body control in shipping product. The honest read: Atlas is currently committed entirely to Hyundai and Google DeepMind partnerships, not available for commercial purchase, and the production ramp is the gating factor.

Tesla Optimus in Tesla factories. Tesla has 1,000+ Optimus units running inside its own factories. The deployment is internal — Tesla doesn't sell Optimus to anyone yet — and the tasks are battery-cell handling and parts-tray movement. The vertical-integration play is to amortize Tesla's automotive supply chain into humanoid hardware costs at a scale no other Western program can match. The 2025 production count was reportedly in the hundreds rather than the announced thousands; whether 2026 hits the 50K-unit target is the public question. Tesla targets consumer Optimus sales by late 2027.

Unitree at retail and entertainment. The volume-leader story. Unitree's R1 starts at around $4,900 and is going global via AliExpress, making it the most accessible full humanoid robot ever commercially available; the G1 is also available at $13,500–$16,000. 5,500+ units shipped in 2025, 10–20K target for 2026. The deployments are research labs, public demos, retail brand activations, light commercial use. The honest read: Unitree is the volume leader and the price-point disruptor, but the deployment profile is closer to "research platform with commercial side-business" than to Digit's contracted warehouse work. If you measure by units shipped, Unitree wins. If you measure by hours of paid autonomous work performed, Digit wins.

Beyond these five, every other 2026 deployment is honestly described as "pilot": Apptronik Apollo at Mercedes, Jabil, and GXO; UBTECH Walker S2 in Chinese auto plants (BYD, Foxconn); 1X NEO with early-access US households; Sanctuary Phoenix at retail and warehouse partners. The pilots are real and produce useful data, but they don't yet pay for themselves at scale.

13.2   What's demo-only — the gap between video and deployment

The viral video filter

what to watch for · what to discount · the "factory proof" standard PHYSICS

The most-watched humanoid videos of 2025–2026 are mostly demos, not deployments. The signal-vs-noise problem is severe enough that the field has developed working heuristics for telling the two apart. Watch for what's behind the camera. A demo with multiple cuts and a cinematic soundtrack is almost always teleoperated or scripted. A demo that runs in one continuous shot for 5+ minutes, with the robot navigating to objects of the operator's choice rather than pre-staged ones, is plausibly autonomous. Atlas's parkour and dance videos are still mostly the former; Digit's tote-handling at GXO is the latter.

The harder filter is "factory proof" vs "demo proof." A robot that does an impressive task in a controlled lab is not the same as a robot that does a task in a real working facility. Factory proof means a robot has to contribute inside environments where downtime, line interruptions, supervision overhead, integration cost, safety policy, and worker coordination matter more than novelty. Most 2026 humanoid programs are still working toward factory proof rather than past it.

Demos hide four specific failure modes.

Battery life. Most demos are 5–10 minutes long. A real shift is 4–8 hours. Most humanoids run 1–4 hours on a charge before needing a hot-swap or wall-charge cycle (§6.4).

Recovery from edge cases. A robot that's never seen a particular failure (a dropped tote, a misaligned conveyor, a colleague stepping into its path) often handles it badly. The Figure 02 BMW deployment specifically called out "needs human intervention for handling dropped items and truly messy environments."

Repeated tasks at consistent quality. A demo shows one successful run; production wants 99.9%+ success across thousands.

Operating cost. Energy, maintenance, software updates, and most importantly the human supervision overhead that almost every current deployment requires. The headline cost ($16K–$250K depending on tier) is almost never the right number for a deployed robot; total cost of ownership over a 3–5 year life is 2–4× the headline cost.
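Three of these failure modes reduce to back-of-envelope arithmetic. The sketch below is illustrative only: the function names are made up for this example, the inputs are the ranges quoted above, and the model ignores everything the ranges already average over (charging time, task mix, financing).

```python
import math


def swaps_per_shift(shift_hours: float, runtime_hours: float) -> int:
    """Battery swaps or recharges needed in one shift, starting fully charged."""
    return max(0, math.ceil(shift_hours / runtime_hours) - 1)


def expected_failures(tasks: int, success_rate: float) -> float:
    """Expected number of failed task attempts at a given per-task success rate."""
    return tasks * (1.0 - success_rate)


def tco_range(headline_cost: float, low_mult: float = 2.0, high_mult: float = 4.0):
    """Total cost of ownership band, using the 2-4x multiplier quoted above."""
    return headline_cost * low_mult, headline_cost * high_mult


# An 8-hour shift on a 2-hour battery.
print("swaps per shift:", swaps_per_shift(8, 2))

# 1,000 placements per day (the Figure 02 rate at BMW) at two success rates.
for rate in (0.99, 0.999):
    print(f"failures/day at {rate:.1%}:", round(expected_failures(1_000, rate), 1))

# Headline prices at the two ends of the quoted range.
for price in (16_000, 250_000):
    low, high = tco_range(price)
    print(f"headline ${price:,} -> TCO ${low:,.0f} to ${high:,.0f}")
```

Even at 99.9% success, a robot doing a thousand placements a day is expected to fail about once per day, which is why supervision overhead shows up in almost every current deployment.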

What the field calls "the second-half-of-2026 reality check": the difference between programs that hit their published unit-count targets and programs that don't. If Unitree hits its 10,000–20,000 unit target for 2026, the cost curve continues. If they don't, the gap between demo and deployment remains where it's always been. Tesla's 50K-unit Optimus target, Figure's 12K-unit BotQ target, Agility's 10K-unit RoboFab target — these are the public measurables that will tell readers more about whether the humanoid moment is real than any demo will.

13.3   The five-year scenario — 2026 → 2030

Three plausible trajectories · one diagram

linear · S-curve · stall — which one resolves depends on the next 24 months ↓ ANIMATED

Three plausible trajectories for humanoid deployment exist, and the field's mainstream view as of mid-2026 is contested between them.

The S-curve scenario is the bull case. Production scales from ~13K units in 2025 to 100K+ in 2027 to 1M+ in 2030, driven by the foundation-model layer (§7, §8) closing the manipulation gap (§9), the locomotion stack (§10) being already mature, and the cost curve continuing to compress through Chinese volume manufacturing. IDC forecasts global humanoid shipments exceeding 510,000 units by 2030, representing a CAGR of nearly 95%. Goldman Sachs projects $38B market by 2035; Bank of America Research, Morgan Stanley, McKinsey, and Bain all sit in similar ranges. The S-curve scenario is what the consensus institutional forecasts now assume.
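To see how different the scenarios are as growth rates, the sketch below computes the compound annual growth rate each 2030 endpoint implies from the ~13K units of 2025. The endpoints are the ones this section quotes (the ~30K stall endpoint comes from the scenario figure); the function name and the choice of 2025 as the common base year are assumptions made for illustration, and published forecasts may use a different base.

```python
def implied_cagr(start_units: float, end_units: float, years: int) -> float:
    """Compound annual growth rate that takes start_units to end_units in `years` years."""
    return (end_units / start_units) ** (1.0 / years) - 1.0


BASE_2025 = 13_000   # ~13K units in 2025, as quoted in the text
YEARS = 5            # 2025 -> 2030

# 2030 endpoints per scenario, as quoted in this section.
ENDPOINTS_2030 = {
    "s-curve": 1_000_000,
    "linear (high)": 300_000,
    "linear (low)": 100_000,
    "stall": 30_000,
}

growth = {
    name: implied_cagr(BASE_2025, units, YEARS)
    for name, units in ENDPOINTS_2030.items()
}

for name, rate in growth.items():
    print(f"{name:>14}: {rate:.0%} per year")
```

On these inputs the bull case needs output to more than double every year for five years running, while the stall case needs less than 20% annual growth.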

The linear scenario is the base case. Production scales steadily but not exponentially, hitting 100K–300K units by 2030, with deployments concentrated in warehousing and light manufacturing rather than expanding to homes or services. Manipulation remains the bottleneck (§9), the foundation-model layer doesn't fully close the gap, and the field looks more like industrial robotics' 40-year history than smartphones' 10-year compression. The linear scenario is what most operationally-experienced practitioners assume — including the consultant cited in the §13.0 source material who's deployed 10,000 robots over an 18-year career.

The stall scenario is the bear case. The cost compression doesn't continue past current levels, manipulation remains stubbornly hard, the regulatory environment hardens, and at least one high-profile humanoid startup faces major setbacks, files for bankruptcy, or shuts down, an outcome that established operator coverage already treats as likely. The stall scenario echoes the autonomous-vehicle plateau of 2018–2024, where unbounded optimism in 2017 ran into the long tail of edge cases and the field consolidated rather than broke through. It's the scenario that the "fool me once" investors are pricing in.

Three plausible humanoid-deployment trajectories from 2025 to 2030. The S-curve (orange) is the institutional bull case, with production scaling from 13K to 1M+ units. The linear scenario (cyan) is the base case operationally-experienced practitioners assume, scaling steadily to 100–300K. The stall scenario (red) echoes the autonomous-vehicle plateau, with growth flattening as edge cases and economics dominate. The shaded region in 2026–2027 is where the resolution between scenarios actually happens — the unit counts hit by Unitree, Tesla, Figure, and Agility against their public targets.
[Figure: units per year (log axis, 10K to 10M) against year, 2025–2030. All three curves start at ~13K in 2025: S-curve reaching ~1M+ by 2030 (IDC, Goldman); linear reaching 100–300K by 2030 (operator base case); stall reaching ~30K by 2030 (AV-plateau analog). The 2026–2027 resolution zone marks where the scenarios diverge and the unit-count targets resolve.]

The diagram makes the central observation visible: the three scenarios diverge in 2026–2027, not at 2030. By the time the 2030 outcome can be observed, it has long since been determined; the question is which trajectory the field is actually on, and that gets answered now. The unit-count targets that humanoid programs have publicly committed to over the next 18 months (Tesla 50K, Figure 12K, Agility 10K, Unitree 10–20K, plus BD's 30K by 2028) are the data points that will tell the reader which scenario is right. Hit them, and the S-curve is plausible. Miss by 50%+ and the linear scenario dominates. Miss by 80%+ and the stall scenario is real.
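The hit-or-miss rule above can be written down as a small scoring function. This is a sketch under stated assumptions: the function name and labels are invented for illustration, Unitree's target is taken at the low end of its 10–20K range, the band between a hit and a 50% miss is left unlabeled because the text does not assign it, and the "actual" unit counts are hypothetical placeholders, not reported results.

```python
def scenario_for(target_units: int, actual_units: int) -> str:
    """Map one program's delivered units against its public target to a scenario label."""
    if target_units <= 0:
        raise ValueError("target_units must be positive")
    if actual_units >= target_units:
        return "s-curve plausible"
    if actual_units * 5 <= target_units:    # delivered 20% or less: missed by 80%+
        return "stall"
    if actual_units * 2 <= target_units:    # delivered 50% or less: missed by 50%+
        return "linear"
    return "between s-curve and linear"     # assumption: the text leaves this band open


# Public targets quoted in the text (Unitree at the low end of 10-20K).
PUBLIC_TARGETS = {"Tesla": 50_000, "Figure": 12_000, "Agility": 10_000, "Unitree": 10_000}

# HYPOTHETICAL delivered units, made up purely to exercise each branch.
hypothetical_actuals = {"Tesla": 8_000, "Figure": 6_000, "Agility": 10_000, "Unitree": 7_500}

for program, target in PUBLIC_TARGETS.items():
    actual = hypothetical_actuals[program]
    print(f"{program:>8}: {actual:>6,} of {target:>6,} -> {scenario_for(target, actual)}")
```

The comparisons use integer arithmetic so that a program landing exactly on a 50% or 80% miss is classified deterministically rather than depending on floating-point rounding.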

13.4   The open questions — what isn't yet decided

Five questions whose answers determine the field

none of these has a clear 2026 resolution · all of them resolve in the next decade SOFTWARE

Five questions remain open in 2026. Their resolution determines what humanoid robotics looks like by 2035 — and they're the questions the field is still arguing about, not the ones it has settled.

Does manipulation generalize? §9 framed manipulation as the unsolved problem and the dexterity gap as the keystone constraint. The bull case is that VLA-driven foundation models (§7.5, §8.3) close the gap by 2028; the bear case is that manipulation in unstructured environments resists the foundation-model approach the way self-driving in unstructured environments resisted it for a decade. The answer determines whether humanoids stay industrial or expand into homes and services.

Does the bipedal form factor win, or do wheeled-and-legged hybrids? §10's locomotion menu showed wheels are most efficient on flat surfaces, legs handle anything, and the 2026 competitive position is that bipeds are chosen because human-shaped environments require human-shaped traversal. But Roborock's CES 2026 Saros Rover (§11.7) — a wheeled-legged consumer vacuum — and the broader semi-humanoid trend (wheeled-base manipulators with humanoid torsos like HMND 01 Alpha) suggest the form factor isn't yet settled. Bipeds are the pure form; hybrids might be the practical winner.

Does the home market materialize? §11.7's consumer robotics analysis showed that consumer robots are already in 50M+ households, but only for narrow tasks (vacuuming, mowing). 1X NEO at $20K, Figure 03's home-deployment ambitions, and Tesla's late-2027 consumer Optimus target are the bets that the household market for general-purpose humanoids is real. The pattern from prior consumer-robot waves (Pepper, Anki Cozmo, and Jibo were all wound down) is a warning. The home market would expand humanoid TAM by 100×, but it's the highest-uncertainty deployment context.

Does Chinese supply-chain dominance hold? §12 documented that 90%+ of components in any 2026 humanoid trace through Chinese-controlled supply chains, and that the bifurcating trade policy environment (federal procurement bans, Unitree designation, FCC Covered List) is forcing nearshoring at significant cost premium. The bull case for Western programs is that the cost premium is bearable because foundation-model advantages compensate. The bear case is that Chinese vertical integration plus state subsidy makes Western programs structurally unprofitable in the consumer tier. Whoever wins this argument captures the volume side of the market.

Does the labor-substitution narrative survive contact with reality? §6's energy-gap framing showed humanoids are 50,000× less energy-dense than humans. The compensating story is that robots work 24/7 without breaks, sick days, or wages; the per-hour economic comparison is what makes the math work. But the early deployments at GXO, BMW, and Mercedes are augmenting human workers, not replacing them. Deployment is expected to accelerate through 2030 as the technology matures, costs fall, and more use cases become viable, but humanoids today operate in structured environments with significant human oversight. The labor-substitution narrative is what justifies the $10B+ in venture funding poured into the field. If 2030 humanoids are still augmentation rather than substitution, the financial returns won't match the projections, and the consolidation phase will be brutal.

Synthesis · the closing argument

The honest five-year picture

what changes, what doesn't, what to watch

Twelve sections of this guide built up the technical and market picture. This closing summary collapses it into the practical observations that matter for someone trying to understand what 2030 actually looks like. Not the bull case, not the bear case — the calibrated middle that survives engagement with the field's physics, economics, and history simultaneously.

Hardware
    What changes by 2030: Cost compression continues · sub-$15K humanoids at industrial reliability · solid-state batteries (maybe) · QDD actuators commodified
    What doesn't change by 2030: Energy density gap with humans (~50,000×) · the dexterous-hand mechanism (still tendon-driven from forearm)
    The signal to watch: Unitree price points · Tesla per-unit cost · battery Wh/kg curves

Software
    What changes by 2030: VLA generalization improves substantially · System-1/System-2 architectures dominate · sim-to-real gap narrows
    What doesn't change by 2030: Long-tail edge cases · novel-object manipulation reliability ceiling · the integration challenge across the federation of stacks
    The signal to watch: Open-source VLA performance benchmarks · "factory proof" claims with numbers

Deployment
    What changes by 2030: Warehouses → light manufacturing → some service · 100K–1M+ units cumulative
    What doesn't change by 2030: Homes for general-purpose tasks (still aspirational) · surgical (Intuitive's lock holds) · most service categories
    The signal to watch: Hours of paid autonomous work · revenue per unit · churn / return rates

Geopolitics
    What changes by 2030: US/China bifurcation hardens · EU regulatory framework matures · Korean and Japanese specialization narrows
    What doesn't change by 2030: Chinese component supply-chain dominance · rare-earth and lithium sourcing reality
    The signal to watch: Federal procurement actions · component-tariff regimes · Beijing humanoid subsidy levels

Economics
    What changes by 2030: Several humanoid startups consolidate or fail · $10B+ disclosed funding looks small in retrospect · Robot-as-a-Service models normalize
    What doesn't change by 2030: The labor-substitution math at home and in services (still doesn't pencil out) · the 3–5× peak-vs-continuous torque ratio (thermal physics)
    The signal to watch: Unit economics at fleet scale · the first publicly-reported humanoid program shutdown · IPO valuations vs. revenue

The field itself
    What changes by 2030: "Robotics" as a category increasingly fragments into the eight industries §11 named · humanoid hype peaks then re-baselines · foundation models eat more of the stack
    What doesn't change by 2030: The federation pattern documented across §1–§12: actuators are an industry, sensors are five industries, control is five timescales, software is four columns · the underlying complexity doesn't go away
    The signal to watch: The publication cadence of "humanoids are real" vs "humanoids are overhyped" pieces · ratio reverses around the resolution zone in 2026–2027

Three closing observations survive everything in the guide above.

First, the technology is structurally real this time. The combination of foundation models with mature actuators, mature sensors, mature locomotion stacks, and rapidly maturing manipulation algorithms is genuinely different from the 2010s humanoid waves, which had none of those layers. The Unitree R1 at $4,900 is not a press release; it is actually shipping at an AliExpress price. The 99.8% price collapse from ASIMO ($2.5M, 2000) to R1 ($4,900, 2026) is the cost-curve datum that makes the rest of the field's economics workable.

Second, the path-to-revenue is narrower than the press coverage suggests. Most current value flows through warehouse and factory deployments, where the form-factor advantages over wheeled robots are real but modest, and where the returns don't justify the $10B+ in disclosed humanoid funding without significant expansion into adjacent markets. The bull case requires the foundation-model layer to deliver on home and service expansion.

Third, the consolidation phase is coming and will be informative. The field has too many programs chasing too few clear use cases at premature unit economics; some will fail, some will get acquired, and the post-consolidation landscape will look different from the 2026 landscape in ways that are predictable in shape and unpredictable in detail.

The honest answer to "are humanoid robots real" in 2026 is: yes, more so than they've ever been, less so than the press makes it sound, and the next 24 months are the period in which the field's medium-term trajectory becomes legible. The reader of this guide is now equipped to read the signals, not the press releases.

End of guide

Coda

the federation, all the way down

Thirteen sections back, this guide began with the observation that "the unit of software is becoming a single HTML file with a model inside it" — the Naklitechie thesis that organizes a different conversation. The robotics counterpart is similar in shape: a humanoid robot is a federation of substrates, layered into a single chassis. Five actuator industries (§1, §2). Five sensor industries (§4). Five control timescales (§5). Five compute substrates (§7). Two perception paradigms running concurrently (§8). The dexterous hand sourcing from a separate ecosystem of micro-actuator suppliers (§9). The biped's locomotion stack borrowing thirty years of quadruped research (§10). The market splitting into eight distinct industries (§11). The geopolitical map bifurcating into two trade blocs (§12). And the foundation-model layer (§7.5) trying to glue all of it into something general-purpose enough to be useful.

The mistake mainstream coverage makes is to treat the humanoid as a unified product. It's not. It's a stack — a deeply layered, federated stack — held together by foundation-model glue that's good enough to ship and not yet good enough to deliver the general-purpose promise. Whether the glue gets good enough fast enough is the bet the entire field is making, and it's a bet that will be resolved in the next 24 to 36 months by data the public can already see: unit shipments, paid deployment hours, the price points charged at AliExpress, the components that pass federal procurement bans, and the GXO totes that move per hour autonomously.

For the reader who got this far: the right way to read 2026 humanoid coverage is to ignore the demos and watch the federation. Each layer's progress matters, each layer's bottlenecks compound across the others, and the foundation-model layer that gets the press is the layer most dependent on every other layer working. The robotics industry's 60-year history is the federation maturing one substrate at a time. The 2026 humanoid moment is the first time enough substrates have matured that the federation might cohere into something the press is currently calling "general-purpose robots." The honest 2026 answer to whether that cohesion arrives at scale is: maybe, and we'll know within the next 24 to 36 months.