China Just Built a Trillion-Parameter AI and Gave It Away for Free. The Root Goes Back to a Victorian Counting Machine.
DeepSeek V4 packs 1 trillion parameters but only activates 32 billion per query — running frontier AI at 1/50th the cost of GPT-5. The root of all neural networks traces back to a drawing room in 1837.
Key Takeaways
- DeepSeek V4 has 1 trillion total parameters but only activates 32 billion per query — a Mixture-of-Experts trick that slashes compute costs by up to 50x
- Built on Huawei and Cambricon chips to avoid US export controls — a deliberate move away from NVIDIA dependency
- The concept of a machine learning from data dates to Frank Rosenblatt's Perceptron in 1958 — a device the Navy funded to 'read and identify shapes'
- Ada Lovelace wrote the first algorithm in 1843 for a machine that was never built — it took 114 years for hardware to catch up to her vision
Root Connection
From Charles Babbage's Analytical Engine (1837) to Ada Lovelace's first algorithm, to Rosenblatt's Perceptron (1958), to Hinton's backpropagation (1986), to transformers (2017), to trillion-parameter models in 2026 — the idea that machines can think has been building for 189 years.
[Chart: AI Model Parameters Over Time. Source: various announcements, GPT-2 through DeepSeek V4 era]
Timeline
1837: Charles Babbage designs the Analytical Engine — the first general-purpose computing machine. Ada Lovelace writes what is considered the first computer program for it.
1943: McCulloch and Pitts publish 'A Logical Calculus of the Ideas Immanent in Nervous Activity' — the first mathematical model of a neural network.
1958: Frank Rosenblatt builds the Perceptron at Cornell — the first machine that could learn from data. The New York Times headline: 'New Navy Device Learns By Doing.'
1986: Geoffrey Hinton publishes the backpropagation algorithm, enabling multi-layer neural networks to train efficiently. The AI winter begins to thaw.
2017: Google Brain publishes 'Attention Is All You Need' — introducing the Transformer architecture. Every modern AI model descends from this paper.
2023: DeepSeek founded in Hangzhou, China. Begins building open-source models designed to run efficiently on limited hardware.
2026: DeepSeek V4 announced: 1 trillion parameters, Mixture-of-Experts architecture, 32B active per query, open-source. Cost per million tokens: $0.10-0.30 — up to 50x cheaper than GPT-5.2.
In early March 2026, a Chinese AI lab called DeepSeek announced something that sent tremors through Silicon Valley: a trillion-parameter AI model that runs at a fraction of the cost of anything from OpenAI, Google, or Anthropic. And they're releasing it open-source. Free. For everyone.
DeepSeek V4 packs one trillion total parameters. But here's the trick that makes it revolutionary: it only activates 32 billion of them for any given query. The architecture, called Mixture-of-Experts (MoE), routes each input to specialized sub-networks — like a hospital where you only see the specialist you need, not every doctor on staff. The result: frontier-level intelligence at $0.10 to $0.30 per million input tokens. GPT-5.2 charges roughly 50 times more for comparable performance.
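The routing idea can be sketched in a few lines. This is a toy illustration of top-k gating, not DeepSeek's actual router: the expert count, dimensions, and gating function here are made up for clarity, and real MoE models use learned routers over many more, much larger experts.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Toy Mixture-of-Experts step: score every expert, but only
    run the top-k of them. The rest of the network stays idle,
    which is where the compute savings come from."""
    scores = gate_w @ x                       # one gate score per expert
    top_k = np.argsort(scores)[-k:]           # indices of the k best experts
    # softmax over the selected scores only
    weights = np.exp(scores[top_k] - scores[top_k].max())
    weights /= weights.sum()
    # only the chosen experts are evaluated
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
dim, n_experts = 4, 8
# each "expert" is just a random linear map in this sketch
experts = [lambda x, W=rng.standard_normal((dim, dim)): W @ x
           for _ in range(n_experts)]
gate_w = rng.standard_normal((n_experts, dim))
y = moe_forward(rng.standard_normal(dim), experts, gate_w, k=2)
print(y.shape)  # (4,)
```

With k=2 of 8 experts active, only a quarter of the expert parameters do any work per input — the same ratio logic, at vastly larger scale, behind 32B active out of 1T total.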
The model handles text, images, audio, and video natively. It targets a million-token context window. And it was built on Chinese-made chips from Huawei and Cambricon — a deliberate move to sidestep US export controls on NVIDIA hardware.
But the deepest story here isn't about China versus America. It's about a 189-year-old idea finally reaching its logical conclusion.
The question is no longer who can build the biggest model. It's who can make frontier intelligence accessible to everyone. DeepSeek's answer: give it away.
— Analysis, RootByte Editorial
THE ROOT
In 1837, a cantankerous English mathematician named Charles Babbage designed the Analytical Engine — a mechanical, steam-powered computing machine with a mill (processor), a store (memory), and the ability to branch and loop. It was, in every meaningful sense, the first general-purpose computer. It was never built. Babbage couldn't secure funding. The British government had already sunk 17,000 pounds (roughly $3 million today) into his earlier Difference Engine and wasn't keen on another round.
But a young mathematician named Ada Lovelace saw what Babbage couldn't sell. In 1843, she wrote extensive notes on the Analytical Engine, including what is widely considered the first computer program — an algorithm for computing Bernoulli numbers. More importantly, she articulated something radical: "The Engine might compose elaborate and scientific pieces of music of any degree of complexity or extent." She saw that a machine processing symbols wasn't limited to math. It could create.
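Bernoulli numbers are still a few lines of code today. The sketch below uses the standard modern recurrence rather than Lovelace's actual Note G procedure, which was organized quite differently for the Engine's mill and store:

```python
from fractions import Fraction
from math import comb

def bernoulli(m):
    """Bernoulli number B_m via the recurrence
    sum_{j=0..n} C(n+1, j) * B_j = 0 (with B_1 = -1/2)."""
    B = [Fraction(0)] * (m + 1)
    B[0] = Fraction(1)
    for n in range(1, m + 1):
        B[n] = -Fraction(1, n + 1) * sum(
            comb(n + 1, j) * B[j] for j in range(n))
    return B[m]

print(bernoulli(2), bernoulli(4), bernoulli(6))  # 1/6 -1/30 1/42
```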
It took 114 years for hardware to catch up to her vision.
In 1943, Warren McCulloch and Walter Pitts published the first mathematical model of a neural network — showing that simple binary units connected together could, in theory, compute any logical function. They modeled the brain as a circuit.
The Analytical Engine weaves algebraical patterns just as the Jacquard loom weaves flowers and leaves.
— Ada Lovelace, Notes on the Analytical Engine, 1843
In 1958, Frank Rosenblatt built the Perceptron at Cornell, funded by the US Navy. It was the first machine that could learn from data — a single-layer neural network that adjusted its own weights based on errors. The New York Times ran the headline: "New Navy Device Learns By Doing." The article suggested the machine could eventually "walk, talk, see, write, reproduce itself, and be conscious of its existence."
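Rosenblatt's learning rule is simple enough to fit in a dozen lines: nudge each weight in proportion to the prediction error. This is a minimal software sketch of the idea, not a model of the original Mark I Perceptron hardware:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Single-layer perceptron: for each example, predict, then
    adjust weights by lr * (target - prediction) * input."""
    w = np.zeros(X.shape[1] + 1)                   # weights plus bias
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = int(w[0] + w[1:] @ xi > 0)
            w[0]  += lr * (target - pred)          # bias update
            w[1:] += lr * (target - pred) * xi     # weight update
    return w

# Logical AND is linearly separable, so the perceptron learns it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
preds = [int(w[0] + w[1:] @ xi > 0) for xi in X]
print(preds)  # [0, 0, 0, 1]
```

Swap the AND targets for XOR's [0, 1, 1, 0] and no amount of training helps — which is exactly the limitation Minsky and Papert would seize on.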
Then came the backlash. In 1969, Marvin Minsky and Seymour Papert published "Perceptrons," a book that proved single-layer networks couldn't solve certain basic problems (like XOR). Funding dried up. The first AI Winter descended.
THE THAW
It took until 1986 for the thaw to begin. Geoffrey Hinton, along with David Rumelhart and Ronald Williams, published the backpropagation algorithm — a method for training multi-layer neural networks by propagating errors backward through the layers. Suddenly, deep networks could learn. The problem Minsky had identified was solved: you just needed more layers.
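The core of backpropagation is the chain rule applied layer by layer, from the loss back to every weight. Here is a minimal sketch on a tiny two-layer network, with the analytic gradient checked against a numerical estimate (the network shape and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network: x -> tanh(W1 x) -> W2 h -> squared-error loss
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))
x = rng.standard_normal(2)
target = 1.0

def forward(W1, W2):
    h = np.tanh(W1 @ x)
    y = (W2 @ h)[0]
    return h, y, 0.5 * (y - target) ** 2

# Backward pass: propagate the error through each layer in turn
h, y, loss = forward(W1, W2)
dy = y - target                                  # dLoss/dy
dW2 = dy * h[None, :]                            # output-layer gradient
dh = dy * W2[0]                                  # error flowing into h
dW1 = (dh * (1 - h ** 2))[:, None] * x[None, :]  # through tanh's derivative

# Sanity check: backprop matches a finite-difference estimate
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
_, _, loss_p = forward(W1p, W2)
numeric = (loss_p - loss) / eps
print(abs(numeric - dW1[0, 0]) < 1e-4)  # True
```

Stacking more layers just means applying the same backward step more times — which is why depth stopped being a barrier once this was understood.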
But more layers meant more computation, and 1986 hardware wasn't up to the task. It took another three decades for GPUs, originally designed for video games, to provide the parallel processing power that deep learning demanded.
In 2012, Hinton's student Alex Krizhevsky won the ImageNet competition with AlexNet — a deep convolutional neural network running on two NVIDIA GTX 580 GPUs. The top-5 error rate dropped from 26% to roughly 15%. The modern AI era had begun.
Then came the paper that changed everything. In June 2017, eight researchers at Google Brain published "Attention Is All You Need," introducing the Transformer architecture. Instead of processing sequences step by step (like previous models), Transformers could attend to all parts of an input simultaneously. This made them dramatically faster to train and better at capturing long-range patterns. Every major AI model since — GPT, Claude, Gemini, Llama, DeepSeek — is built on Transformers.
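That "attend to everything at once" mechanism is scaled dot-product attention, the paper's central equation. A minimal single-head sketch, without the masking, multi-head projections, or batching a real Transformer adds:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every position compares itself
    against all positions (Q @ K.T), softmaxes the scores, and takes
    a weighted mix of the values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # mix of values

rng = np.random.default_rng(1)
seq_len, d = 5, 8
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (5, 8)
```

Because the score matrix covers all position pairs in one matrix multiply, the whole sequence is processed in parallel — the property that made Transformers so much faster to train than step-by-step recurrent models.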
FROM MILLIONS TO TRILLIONS
The parameter race accelerated at a pace that would have made Babbage weep. GPT-2 (2019) had 1.5 billion parameters. GPT-3 (2020) had 175 billion. GPT-4 (2023) was rumored to have over a trillion. Google's PaLM had 540 billion. Each generation brought capabilities that seemed impossible just months earlier.
But bigger isn't always better — or at least, it doesn't have to be more expensive. DeepSeek's breakthrough with V4 is proving that you can have a trillion parameters without needing a trillion dollars. The Mixture-of-Experts approach means the model is vast in knowledge but lean in execution. It knows a lot of things, but it only thinks about what's relevant to your question.
And by building on Chinese-made chips, DeepSeek is demonstrating that the US export control strategy — restricting access to NVIDIA's most advanced GPUs — may be accelerating Chinese innovation rather than slowing it. The constraint became a catalyst.
WHY IT MATTERS
When Ada Lovelace wrote her algorithm in 1843, she imagined a machine that could manipulate symbols to create meaning. She was describing, without knowing it, what a large language model does: process tokens, find patterns, generate something new.
It took 183 years to go from her handwritten notes to a trillion-parameter model that costs a fraction of a cent per query. The Analytical Engine was never built. The Perceptron was dismissed as a dead end. Backpropagation sat dormant for decades. Transformers were published as a technical paper that most people ignored.
Every breakthrough in AI was preceded by a period where the people building it were told it wouldn't work. Then it did.
DeepSeek V4 is the latest chapter in that story — not because it's the biggest or the best, but because it's the most accessible. A trillion parameters, available to anyone with an internet connection, for free. Ada would have approved.