Skip to main content
SuggestedTech
Back to all news

Open-weight models

DeepSeek just released a massive free AI model — here's what it means for you

A model with over a trillion parameters, a reading window the size of ten novels, and a price tag anyone can afford. Plain-English explainer.

The SuggestedTech TeamVerified April 2026

DeepSeek released V4 on 24 April 2026: two free, downloadable AI models with record-breaking context windows.

If you spotted headlines about DeepSeek dropping 'a trillion-parameter model' and weren't quite sure what to make of it — here's the plain-English version. The short answer: this is a genuinely big deal for anyone who builds with AI, uses coding assistants, or wants a capable AI model without paying top-dollar subscription fees. Here's why.

What DeepSeek actually released

On 24 April 2026, a Chinese AI lab called DeepSeek released two new models:

  • V4-Pro — the big one: 1.6 trillion total parameters (though only about 49 billion are active at once, thanks to a 'mixture of experts' design that routes each query to the relevant part of the model). Think of it as a very large library where you only open the relevant shelf.
  • V4-Flash — the smaller, faster sibling: 284 billion parameters, 13 billion active. Great for tasks where speed and cost matter more than maximum intelligence.

Both are free to download under an MIT licence — meaning you can run them on your own hardware, modify them, and use them commercially, no permission needed.

The context window for both is one million tokens — roughly 750,000 words, or about ten average-length novels at once. For context (pun intended): most AI tools until recently topped out at around 100,000 tokens. Being able to feed an entire codebase, a year's worth of customer emails, or multiple lengthy reports into one prompt is a qualitatively different capability.

Why the price matters as much as the power

If you're using V4-Pro through DeepSeek's API (rather than running it yourself), the standard price is $1.74 per million input tokens and $3.48 per million output tokens — and at launch DeepSeek ran a 75% promotion that put V4-Pro at just $0.435 input / $0.87 output. For V4-Flash, it's $0.14 input / $0.28 output. Either way that's far cheaper than comparable U.S. AI services for the Pro version — and an order of magnitude cheaper for Flash. For a developer building an AI-powered product, that kind of price difference can mean the difference between a profitable business and one that isn't. For a curious individual, it means experimenting becomes essentially free.

An internal survey of 85 experienced developers cited by MIT Technology Review found 'more than 90% included V4-Pro among their top model choices for coding tasks' — a striking adoption signal from a community that tends to be sceptical of new releases.

Source: MIT Technology Review · 24 April 2026

How smart is it, really?

Independent testing from Artificial Analysis placed V4-Pro at #2 on their open-weight reasoning index, behind Moonshot's Kimi K2.6 (another open-weight model). DeepSeek's own report also claims a strong score on the LiveCodeBench coding test — 93.5 versus Claude Opus 4.6's 88.8 — though that particular figure is self-reported by DeepSeek rather than independently verified, so treat it as a vendor claim. The Council on Foreign Relations put the overall U.S. AI lead at roughly seven months — so it's not the single smartest AI in the world, but it's the smartest you can download and run yourself, and it's close enough to the front to matter for most real tasks.

Artificial Analysis reported V4-Pro ranked #2 among open-weight reasoning models on the Intelligence Index, behind only Kimi K2.6, and led open models on agentic coding tasks.

Source: Artificial Analysis · 27 April 2026

One thing worth knowing: V4 is text-only at launch — no image understanding, no voice. If you need multimodal capabilities, you'll need a different model for now. Also, V4 is the first DeepSeek model designed to run on Chinese domestic chips (Huawei Ascend) rather than the Nvidia GPUs most AI models are built for. That's a technical detail that mainly matters if you're running it at scale — but it's worth knowing DeepSeek is engineering its own independent hardware path.

What to actually do with it

If you're a developer, the most practical starting point is the DeepSeek API — swap out your current model endpoint and compare. Many find V4-Flash is sufficient for 80% of use cases at a tenth of the cost. If you want to self-host and have a GPU server, the weights are on Hugging Face right now. If you're just curious, DeepSeek's chat interface at chat.deepseek.com now offers both models: 'Instant Mode' (Flash) and 'Expert Mode' (Pro). The new capabilities to test: paste a very long document and ask detailed questions about it, or throw a large codebase at it. The million-token context is where V4 genuinely earns its reputation.

Frequently asked questions

Is DeepSeek V4 free to use?
The weights are free to download and self-host under an MIT licence. API access is pay-per-token: V4-Pro lists at $1.74/M input (launched on a 75% promo at $0.435/M), V4-Flash at $0.14/M — much cheaper than most comparable services. The chat interface at chat.deepseek.com offers both for free with rate limits.
What is a 'mixture of experts' model?
It's an architecture where only a fraction of the model's total parameters activate for any given input — the relevant 'experts' are routed to, while the rest sit idle. V4-Pro has 1.6 trillion total parameters but only 49 billion active at once. This makes it far cheaper and faster to run than a dense 1.6T-parameter model would be.
How does DeepSeek V4 compare to ChatGPT or Claude?
Independent benchmarks put V4-Pro within a few percentage points of the best U.S. models on most tasks — and DeepSeek's own report claims it leads LiveCodeBench at 93.5 vs Claude Opus 4.6's 88.8, though that figure is self-reported and not independently confirmed. The CFR put the overall U.S. AI lead at roughly seven months. On coding and agentic tasks it performs very well. The main advantages are price (far cheaper via API) and the ability to self-host.
What is the one-million-token context window good for?
Feeding in very large documents, codebases, or datasets in a single prompt. At 1M tokens you can process roughly 750,000 words — an entire software repository, years of correspondence, or multiple full-length reports — and ask questions across the whole thing at once.

Sources

← All news