Artificial Intelligence · March 5, 2026

NVIDIA’s Vera Rubin Platform: Six Chips, One AI Supercomputer

At CES 2026 on January 5, NVIDIA CEO Jensen Huang walked onto the stage at the Fontainebleau Las Vegas and did what he does best – redefined the baseline for AI infrastructure. The product he unveiled, the Vera Rubin AI platform, is not just a new GPU. It is a complete, six-chip architecture engineered from the data center outward, designed to slash the cost of generating AI tokens to roughly one-tenth that of the previous Blackwell platform while treating the entire rack – not an individual chip – as the fundamental unit of compute.

Named after Vera Florence Cooper Rubin, the trailblazing American astronomer whose work on galaxy rotation curves transformed our understanding of dark matter, the platform arrives at a moment when AI computing demand for both training and inference is, in Huang’s words, “going through the roof.” The Vera Rubin platform is now in production: initial samples have already shipped to customers, and volume shipments remain on track for the second half of 2026.

What makes this launch different from a typical generational GPU bump is the concept NVIDIA calls “extreme codesign” – the simultaneous engineering of GPUs, CPUs, networking switches, DPUs, SuperNICs, and software as a single unified system. The result is a rack-scale AI supercomputer that promises to train mixture-of-experts models with 4x fewer GPUs and deliver inference at 10x lower cost per million tokens compared to Blackwell.

The Six Chips That Make Up One Supercomputer

The Vera Rubin platform is built from six new chips, each engineered for a specific role but designed from the start to function as an integrated whole. This is a deliberate departure from the traditional approach of optimizing individual components in isolation.

Together, these components eliminate the bottlenecks that emerge when compute, networking, and infrastructure are treated as loosely coupled layers. The extreme codesign philosophy ensures that communication, coordination, security, and efficiency are first-class architectural concerns rather than afterthoughts.

Rubin GPU: 50 Petaflops of Inference Per Chip

The Rubin GPU is the computational heart of the platform, built on a TSMC 3nm process and equipped with 288 GB of HBM4 memory delivering 22 TB/s of bandwidth per GPU. The headline performance figures are staggering: up to 50 petaflops of NVFP4 inference and 35 petaflops of training compute, per GPU.

NVFP4 is NVIDIA’s low-precision numerical format designed to maximize inference throughput while maintaining acceptable model quality for production workloads. The third-generation Transformer Engine includes hardware-accelerated adaptive compression that boosts NVFP4 performance while preserving accuracy. Critically, the engine is fully backward-compatible with Blackwell, meaning previously optimized code transitions to Rubin without rework.
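
To build intuition for how a block-scaled 4-bit format works, here is a toy Python sketch of the general idea. It assumes the E2M1 value grid commonly described for FP4 and a single shared scale per 16-element block; the actual NVFP4 scale format, block size, and hardware path are NVIDIA implementation details, so treat this as an illustration rather than a bit-exact model.

```python
import numpy as np

# The 8 non-negative magnitudes representable in an E2M1 (FP4) element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_fp4(block: np.ndarray):
    """Quantize one block to signed FP4 values plus one shared scale.

    A toy model of block-scaled 4-bit quantization, not NVIDIA's
    bit-exact NVFP4 scheme (which also uses hardware-managed scales).
    """
    amax = np.abs(block).max()
    scale = amax / FP4_GRID[-1] if amax > 0 else 1.0  # map peak to grid top
    scaled = block / scale
    # Snap each element to the nearest representable magnitude, keep sign.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

rng = np.random.default_rng(0)
weights = rng.normal(size=16).astype(np.float32)      # one 16-element block
codes, scale = quantize_block_fp4(weights)
print(f"mean abs error: {np.abs(weights - codes * scale).mean():.4f}")
```

The shared per-block scale is what lets a 4-bit grid track the local dynamic range of weights and activations, which is where adaptive compression in the Transformer Engine earns its keep.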

For scale-up connectivity, each Rubin GPU gets 3.6 TB/s of NVLink 6 bandwidth for GPU-to-GPU traffic. At the rack level, a full NVL72 configuration delivers 260 TB/s of aggregate NVLink bandwidth – a figure NVIDIA notes exceeds the bandwidth of the entire global internet.
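
The rack-level number follows directly from the per-GPU figure, as a quick back-of-the-envelope check confirms:

```python
# Back-of-the-envelope check on the stated NVLink figures.
nvlink6_per_gpu_tbps = 3.6   # NVLink 6 bandwidth per Rubin GPU, TB/s
gpus_per_nvl72 = 72          # GPUs in one NVL72 rack

aggregate_tbps = nvlink6_per_gpu_tbps * gpus_per_nvl72
print(f"{aggregate_tbps:.1f} TB/s")  # 259.2 TB/s, which NVIDIA rounds to 260
```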

Vera CPU: The Traffic Controller GPUs Need

The Vera CPU is not a general-purpose server processor competing with AMD or Intel for traditional workloads. It is purpose-built to keep the Rubin GPUs fed with data and to handle agentic reasoning tasks in the control plane. It features 88 custom NVIDIA Olympus cores – Arm-compatible – supporting 176 threads through a technique called Spatial Multithreading, which physically partitions resources rather than time-slicing them.

Vera CPU Specification     Value
Core Count                 88 NVIDIA Olympus cores (Arm-compatible)
Thread Count               176 threads via Spatial Multithreading
Memory                     1.5 TB LPDDR5X
Memory Bandwidth           1.2 TB/s
NVLink-C2C to GPUs         1.8 TB/s
I/O                        PCIe Gen6, CXL 3.1
Security                   Full confidential computing support

The 1.8 TB/s NVLink chip-to-chip connection to GPUs is the key specification here. It creates a memory-coherent link that effectively erases the traditional CPU-GPU boundary, allowing the Vera Rubin Superchip – which combines two Rubin GPUs with one Vera CPU – to operate as a single coherent unit. This doubles prior-generation CPU performance and delivers 2.4x the memory bandwidth of the Grace CPU it succeeds.
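
Both headline ratios are easy to sanity-check against the table above. A minimal sketch, assuming roughly 500 GB/s of LPDDR5X bandwidth for the Grace CPU (the published figure for Grace is in that neighborhood):

```python
# Sanity-check the Vera CPU ratios from the spec table.
olympus_cores, threads_per_core = 88, 2     # Spatial Multithreading: 2 per core
vera_mem_bw_tbps = 1.2                      # LPDDR5X bandwidth, TB/s
grace_mem_bw_tbps = 0.5                     # assumption: ~500 GB/s on Grace

print(olympus_cores * threads_per_core)         # 176 threads
print(vera_mem_bw_tbps / grace_mem_bw_tbps)     # 2.4x Grace's bandwidth
```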

Rack-Scale Architecture: The NVL72 Configuration

The flagship product of the Vera Rubin platform is the NVL72 – a liquid-cooled, rack-scale AI supercomputer that packs 72 Rubin GPUs and 36 Vera CPUs into a single rack. This is not a loose collection of servers sharing a network switch. It is a unified system where the entire rack operates as one accelerator within a larger AI factory.

Configuration            GPUs   CPUs   NVFP4 Inference   Key Feature
Single Rubin GPU         1      —      50 PFLOPS         Base compute unit
Vera Rubin Superchip     2      1      100 PFLOPS        Memory-coherent NVLink-C2C
Vera Rubin NVL72         72     36     3,600 PFLOPS      Full rack-scale, 260 TB/s NVLink
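
Modeled as data, the chip-to-rack scaling in the table above reduces to a single multiplication. A minimal sketch (the RubinConfig class and its field names are illustrative, not an NVIDIA API):

```python
from dataclasses import dataclass

@dataclass
class RubinConfig:
    name: str
    gpus: int
    cpus: int

    PFLOPS_PER_GPU = 50  # NVFP4 inference per Rubin GPU

    @property
    def nvfp4_pflops(self) -> int:
        return self.gpus * self.PFLOPS_PER_GPU

configs = [
    RubinConfig("Single Rubin GPU", gpus=1, cpus=0),
    RubinConfig("Vera Rubin Superchip", gpus=2, cpus=1),
    RubinConfig("Vera Rubin NVL72", gpus=72, cpus=36),
]
for c in configs:
    print(f"{c.name:22s} {c.nvfp4_pflops:>6,} PFLOPS")
```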

The NVL72 rack uses a modular, cable-free tray design that enables 18x faster assembly and serviceability compared to Blackwell systems. A single compute board integrates 17,000 components, placed by robots with micron precision. The second-generation RAS (Reliability, Availability, and Serviceability) Engine provides proactive maintenance and real-time health checks without downtime, while software-defined NVLink routing keeps the rack operating even when individual components fail.
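
NVIDIA has not published how its software-defined NVLink routing works, but the failover principle is simple to illustrate. A purely conceptual toy, with made-up link and endpoint names:

```python
# Toy model of software-defined link failover: traffic between two GPUs
# prefers a direct link but falls back to a healthy alternative path.
# Purely conceptual -- not NVIDIA's NVLink routing implementation.
links = {
    ("gpu0", "gpu1"): "healthy",
    ("gpu0", "switch"): "healthy",
    ("switch", "gpu1"): "healthy",
}

def route(src: str, dst: str) -> list[str]:
    if links.get((src, dst)) == "healthy":
        return [src, dst]                       # direct path
    # Fall back to a path through the switch fabric.
    if links[(src, "switch")] == "healthy" and links[("switch", dst)] == "healthy":
        return [src, "switch", dst]
    raise RuntimeError("no healthy path")

print(route("gpu0", "gpu1"))        # ['gpu0', 'gpu1']
links[("gpu0", "gpu1")] = "failed"  # RAS engine flags a degraded link
print(route("gpu0", "gpu1"))        # ['gpu0', 'switch', 'gpu1']
```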

A single building block within the rack combines one BlueField-4 DPU, eight ConnectX-9 SuperNICs, two Vera CPUs, and four Rubin GPUs – all working in concert across compute, networking, and security.

Performance That Matters: Cost Per Token and Training Efficiency

Raw petaflops are impressive on slides, but what matters in production is cost per token delivered and wall-clock training time. NVIDIA’s platform-level claims for Vera Rubin are specific and aggressive.

For inference, the Vera Rubin NVL72 delivers tokens at one-tenth the cost per million tokens compared to the Blackwell GB200 NVL72, based on the Kimi-K2-Thinking model using 32K/8K input/output sequence lengths. For training, Rubin trains a 10-trillion-parameter MoE model on 100 trillion tokens in a fixed one-month timeframe using one-fourth the number of GPUs required by Blackwell.
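
In concrete terms, the two ratios compose as follows. A quick sketch in which the baseline cost and fleet size are placeholders, not published figures:

```python
# What NVIDIA's two headline ratios imply, given an assumed baseline.
blackwell_cost_per_mtok = 1.00                       # assumption: normalized baseline
rubin_cost_per_mtok = blackwell_cost_per_mtok / 10   # 10x cheaper inference

blackwell_training_gpus = 100_000                    # assumption: illustrative fleet
rubin_training_gpus = blackwell_training_gpus // 4   # 4x fewer GPUs, same one-month run

print(f"inference: {rubin_cost_per_mtok:.2f} vs {blackwell_cost_per_mtok:.2f} per 1M tokens")
print(f"training : {rubin_training_gpus:,} GPUs vs {blackwell_training_gpus:,}")
```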

These are not abstract improvements. Huang framed them in business terms: “The faster you train AI models, the faster you can get the next frontier out to the world. This is your time to market. This is technology leadership.” He also noted that accelerated computing is modernizing roughly $10 trillion in computing infrastructure built over the prior decade.

Third-Generation Confidential Computing and Security

Vera Rubin NVL72 introduces what NVIDIA describes as the AI industry’s first rack-scale confidential computing capability. The platform creates a unified trusted execution environment spanning all 36 Vera CPUs, 72 Rubin GPUs, and the NVLink fabric connecting them. Data security is maintained across CPU, GPU, and NVLink domains simultaneously, with attestation services providing cryptographic proof of compliance.
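
At its core, attestation means the rack produces a signed measurement of what it is running, and a tenant verifies that measurement before releasing model weights or data. A minimal, generic sketch of that handshake (this is the standard TEE pattern, not NVIDIA's attestation service API; real attestation uses asymmetric signatures and a certificate chain rather than a shared key):

```python
import hashlib, hmac, os

# Generic TEE attestation pattern: the device signs a measurement of its
# firmware/configuration; the verifier checks it against an expected value.
DEVICE_KEY = os.urandom(32)   # stands in for a hardware root of trust

def attest(firmware: bytes) -> tuple[bytes, bytes]:
    measurement = hashlib.sha256(firmware).digest()
    signature = hmac.new(DEVICE_KEY, measurement, hashlib.sha256).digest()
    return measurement, signature

def verify(measurement: bytes, signature: bytes, expected: bytes) -> bool:
    expected_sig = hmac.new(DEVICE_KEY, measurement, hashlib.sha256).digest()
    return (hmac.compare_digest(expected_sig, signature)
            and hmac.compare_digest(measurement, expected))

fw = b"rack firmware v1"
expected = hashlib.sha256(fw).digest()   # value the tenant expects to see
m, s = attest(fw)
print(verify(m, s, expected))            # True: safe to release the weights
```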

This is a meaningful development for enterprises and cloud providers running multi-tenant inference on proprietary models. It protects training data, model weights, and inference workloads at a scale that was previously impossible – addressing a critical barrier to enterprise AI adoption.

Open Models and the Alpamayo Autonomous Driving Platform

Alongside the hardware, NVIDIA announced an expanding portfolio of open models trained on its own supercomputers. The portfolio spans six domains: Clara for healthcare, Earth-2 for climate science, Nemotron for reasoning and multimodal AI, Cosmos for robotics and simulation, GR00T for embodied intelligence, and a new addition – Alpamayo for autonomous driving.

Alpamayo is an open reasoning model family designed for Level 4 autonomy. It includes vision-language-action models, simulation blueprints, and datasets. The platform can generate realistic videos from single images, synthesize multi-camera driving scenarios, model edge-case environments from scenario prompts, perform physical reasoning and trajectory prediction, and drive interactive closed-loop simulation.

The flagship model, Alpamayo R1, is described as the first open reasoning VLA model for autonomous driving. Huang demonstrated a vehicle navigating busy San Francisco traffic, noting that the system “not only takes sensor input and activates steering wheel, brakes and acceleration, it also reasons about what action it is about to take.” The first passenger car featuring Alpamayo on NVIDIA’s DRIVE platform will be the all-new Mercedes-Benz CLA, with AI-defined driving coming to the U.S. this year.

Ecosystem Adoption and Availability Timeline

The list of companies expected to adopt Rubin reads like a who’s who of AI infrastructure: OpenAI, Anthropic, Meta, xAI, Microsoft, Google, AWS, Oracle, CoreWeave, Dell, HPE, Lenovo, and Supermicro, among many others. Microsoft’s next-generation Fairwater AI superfactories will deploy hundreds of thousands of Vera Rubin NVL72 rack-scale systems. CoreWeave plans Rubin deployments starting mid-2026 through its Mission Control software.

NVIDIA CFO Colette Kress confirmed the timeline during recent financial results: “We shipped our first Vera Rubin samples to customers earlier this week, and we remain on track to commence production shipments in the second half of the year.” The Vera Rubin NVL72 is expected to be available through cloud partners including Microsoft Azure and AWS in H2 2026. An HGX Rubin NVL8 configuration with 8 GPUs is also available for partners who need a server-level building block rather than a full rack.

Industry endorsements were emphatic. Sam Altman of OpenAI noted that “intelligence scales with compute” and called Rubin essential for continued progress. Elon Musk of xAI called it “a rocket engine for AI.” Satya Nadella of Microsoft emphasized building “the world’s most powerful AI superfactories” with Vera Rubin GPUs.

What This Means for the Future of AI Infrastructure

The Vera Rubin platform represents a philosophical shift in how AI compute is designed and delivered. Rather than selling faster chips that plug into existing server architectures, NVIDIA is selling rack-scale systems where every component – from silicon to cooling to software – is co-designed for sustained intelligence production. The data center itself becomes the product.

This approach addresses real operational pain points that have emerged as AI factories scale: downtime from component failures, security gaps in multi-tenant environments, bandwidth bottlenecks between GPUs, and the sheer cost of generating trillions of tokens for reasoning and agentic workloads. The modular tray design, second-generation RAS Engine, rack-scale confidential computing, and AI-native storage with BlueField-4 processors are all solutions to problems that only become visible at production scale.

For organizations planning AI capacity for late 2026 and beyond, the Vera Rubin platform sets a new baseline. Independent benchmarks and real-world validation remain pending, but the architectural direction is clear: the era of the isolated GPU server is giving way to the era of the AI factory, and NVIDIA intends to build the factory floor.
