
China Built a Free AI on Chips Washington Banned. It Just Made $25/Month Subscriptions Look Absurd.

Aditya Kumar Jha · May 20, 2026 · 24 min read

DeepSeek V4 Pro costs $3.48 per million output tokens. Claude Opus 4.7 costs $25. GPT-5.5 costs $30 — double what GPT-5.4 cost six weeks ago. The weights are MIT-licensed and free to download. It runs on Huawei chips. The CEO of Nvidia called that 'a horrible outcome for America.' DeepSeek's engineers had a serious training failure in mid-2025 trying to make it work — gradient sync timed out on a 1,024-card cluster. They rewrote the code anyway. You have 11 days to evaluate this before the 75% promotional discount expires May 31. This is everything you need to make that decision: every verified benchmark, the real human story behind the Huawei chips, the distillation scandal briefed to Congress, who should switch and who absolutely should not. Verified and tested by Aditya Kumar Jha, May 20, 2026.

Twenty-six days ago, two things happened within 18 hours of each other that permanently changed the economics of building with AI. On April 23, OpenAI launched GPT-5.5 at $30 per million output tokens — double the price of GPT-5.4, which had launched six weeks earlier. On April 24, DeepSeek launched V4 at $3.48 per million output tokens. Free weights. MIT license. One million token context. And when Nvidia CEO Jensen Huang found out it ran on Huawei chips instead of his, he said it would be 'a horrible outcome for America.'

Most coverage lasted 48 hours and moved on. This piece is written 26 days later because May 20 is when the decisions actually happen. The 75% promotional discount on V4 Pro expires in 11 days, and teams that haven't evaluated it yet are making that call this week. The Claude Sonnet 4.6 comparison, the one that reframes the entire debate, is still buried in the FAQ sections of most major analyses. And the story of how DeepSeek actually built V4 on Huawei chips after a training failure nearly derailed the project has barely been told in English.

This is not a recap. It is an update. Every number in this piece was verified on May 20, 2026. Every section was written assuming you have already read the launch-day coverage — and still aren't sure what to actually do.

Insight

TL;DR (May 20, 2026; verified by Aditya Kumar Jha). DeepSeek V4 Pro: $3.48/M output tokens, 7x cheaper than Claude Opus 4.7 and 8.6x cheaper than GPT-5.5. MIT open weights. 1M context. A 75% promo runs through May 31, 15:59 UTC ($0.87/M output at the promo rate). V4 Pro leads on LiveCodeBench (#1) and beats Claude on BrowseComp (83.4% vs 79.3%). It trails Claude on SWE-bench Pro (55.4% vs 64.3%), GPQA Diamond (90.1% vs 94.2%), and HLE without tools (37.7% vs 46.9%), and it trails Gemini 3.1 Pro by 17.7 points on SimpleQA factual accuracy (57.9% vs 75.6%). All API traffic routes to servers in China, and DeepSeek is banned for government use across US federal agencies, 12 states, Australia, Taiwan, South Korea, Italy, Denmark, and Canada. Self-hosting V4 Flash (~158GB on disk, 96GB+ GPU memory) removes the data-routing problem entirely. 11 days left on the promo. Test at list price before you commit.

Insight

📸 The comparison that ends most DeepSeek debates: Claude Sonnet 4.6 scores 79.6% on SWE-bench Verified. DeepSeek V4 Pro scores 80.6%. They are within 1 percentage point of each other on the benchmark that best predicts everyday coding value. Sonnet's output price ($15/M) is about 4x V4 Pro's list rate, a narrower gap than Opus's 7x, and only one of them routes your data to Chinese servers. The entire industry is debating the wrong two models.

What Most Comparisons Won't Tell You

The person who told you to switch to DeepSeek V4 Pro probably hasn't run their own SimpleQA benchmark. The benchmark that measures factual accuracy — not analytical capability, not coding, but 'will this model tell my users something confidently wrong?' — shows V4 Pro at 57.9% versus Gemini 3.1 Pro at 75.6%. A 17.7-point gap. That gap shows up in the third week of production when a customer-facing pipeline confidently returns an incorrect fact and you're debugging why your support tickets tripled.

The entire industry is also debating the wrong two models. The fight everyone is watching is DeepSeek V4 Pro versus Claude Opus 4.7. Claude Sonnet 4.6 ($3 input / $15 output per million tokens, 79.6% SWE-bench Verified, 40–60 tokens/second, US data jurisdiction, HIPAA/SOC 2 compliant) is within 1 percentage point of V4 Pro on the benchmark that predicts everyday coding value. Its output price is about 4x V4 Pro's list rate rather than 7x, and it does not route data to Chinese servers. If you are choosing between Opus 4.7 and V4 Pro, you are choosing between a $25 product and a $3.48 product. If you are choosing between Sonnet 4.6 and V4 Pro, you are choosing between two models that perform within measurement error of each other on your most likely workload, and the decision comes down to price, LiveCodeBench performance, data sovereignty, and whether you want open weights. That framing changes how you should evaluate this.

Why May 20 Is the Right Day to Read This

Three things changed since launch day that make this piece different from April 24 coverage.

First: 11 days. The 75% promotional discount on V4 Pro expires May 31, 2026 at 15:59 UTC. After that, list price applies: $1.74 input / $3.48 output per million tokens. Model your routing decisions at list price. At the promo rate ($0.435/$0.87), V4 Pro's output is about 29x cheaper than Claude Opus 4.7; at list, it is about 7x cheaper. Both gaps are significant, but the difference matters for your spreadsheet.
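These multiples are plain ratios of the published output rates. A sketch that recomputes them under both price regimes, using only the rates quoted in this piece; the dictionary keys are descriptive labels, not official API model IDs.

```python
# Per-million-token rates (USD) as quoted in this article on May 20, 2026.
# Keys are descriptive labels, not official API model IDs.
RATES = {
    "claude-opus-4.7": {"input": 5.00, "output": 25.00},
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "deepseek-v4-pro": {"input": 1.74, "output": 3.48},
}
PROMO_DISCOUNT = 0.75  # V4 Pro promotion, ends May 31, 2026 15:59 UTC


def v4_pro_rate(kind: str, promo: bool) -> float:
    """V4 Pro price per million tokens ('input' or 'output'), with or without the promo."""
    list_rate = RATES["deepseek-v4-pro"][kind]
    return list_rate * (1 - PROMO_DISCOUNT) if promo else list_rate


def output_price_ratio(model: str, promo: bool) -> float:
    """How many times more `model` charges per output token than V4 Pro."""
    return RATES[model]["output"] / v4_pro_rate("output", promo)


for model in ("claude-opus-4.7", "claude-sonnet-4.6", "gpt-5.5"):
    print(f"{model}: {output_price_ratio(model, promo=True):.1f}x at promo, "
          f"{output_price_ratio(model, promo=False):.1f}x at list")
```

At list price the Opus ratio comes out near 7.2x and the Sonnet 4.6 ratio near 4.3x; the promo multiplies every ratio by exactly 4, which is why a plan built on promo pricing is 4x off once it ends.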

Second: the Council on Foreign Relations published its V4 assessment on April 29. The headline — 'V4 trails US frontier models by 3–6 months' — made news. The buried line did not: they called the Huawei 950PR chip migration 'potentially more consequential long-term for the US-China AI race than V4's benchmark scores.' That sentence is doing a lot of work. This piece explains why.

Third: at Google I/O on May 19, 2026 — one day before this publication — Google announced Gemini 3.5 Flash. It surpasses Gemini 3.1 Pro on coding and agentic benchmarks at one-third to one-half the cost, outputting tokens 4x faster than other frontier models. Gemini 3.5 Pro follows next month. Gemini 4 was not announced. A new low-cost Google entrant with US data jurisdiction has entered the market — it belongs in every routing decision you make this week.

The Story Behind the Headline: How DeepSeek Nearly Failed to Build V4

Everyone knows V4 runs on Huawei chips. Almost no one knows why it almost didn't exist at all.

In mid-2025, according to 36Kr's investigative reporting (confirmed by Robonaissance's English-language analysis), DeepSeek ran into what engineers close to the company described as a 'relatively serious training failure.' The team was migrating its training framework from NVIDIA's CUDA software stack to Huawei's CANN framework, a rewrite that proved, in the words of engineers who watched it happen, 'an order of magnitude harder than the public framing suggested.' The specific failure point: when DeepSeek attempted full training on the Ascend 910C, gradient synchronization timed out on the 1,024-card cluster. The model's numerics, meaning the exact way each layer computed and communicated results, behaved differently on Ascend hardware than on NVIDIA hardware. Getting the same model to produce identical results on both platforms required months of low-level rewriting that had nothing to do with model architecture and everything to do with chip-level floating-point behavior. Source: 36Kr/AI Emergence investigative report; Robonaissance analysis, April 2026.
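The numerical mismatch described above is a general property of floating-point arithmetic: addition is not associative, so hardware or communication libraries that sum the same partial gradients in a different order can produce different bits. The sketch below is a generic illustration with made-up values (`reduce_in_order` and `shards` are hypothetical names), not a reconstruction of DeepSeek's actual failure; the synchronization timeout itself was a communication problem that this does not model.

```python
import math
from functools import reduce


def reduce_in_order(partials: list[float]) -> float:
    """Accumulate partial gradients strictly left to right, as one
    (hypothetical) all-reduce implementation might."""
    return reduce(lambda acc, g: acc + g, partials, 0.0)


shards = [0.1, 0.2, 0.3]                    # the same three partial gradients
forward = reduce_in_order(shards)           # accumulated in card order 0, 1, 2
backward = reduce_in_order(shards[::-1])    # same values, card order 2, 1, 0

print(forward, backward, forward == backward)
# math.fsum returns the correctly rounded sum, so the order no longer matters.
print(math.fsum(shards) == math.fsum(shards[::-1]))
```

This prints `0.6000000000000001 0.6 False` and then `True`. Getting bit-identical results across two hardware stacks means controlling details like reduction order, which hints at why a port of this kind turns into low-level work rather than a recompile.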

It got harder. Internally, DeepSeek's founder Liang Wenfeng 'put forward some of his own demands,' according to an insider quoted by 36Kr, 'but it was difficult to find compromises at the execution level.' The company had engineering disagreements about training direction on top of the hardware migration. Together, those pressures delayed the multimodal capabilities that outside observers had predicted: despite months of expectation, V4 ships as a language-only model. The constraint was compute and cash, not capability ambition. Source: 36Kr/AI Emergence, April 2026; Robonaissance, April 2026.

They pushed through it. The migration from CUDA to CANN was completed. Then, deliberately rather than out of necessity, DeepSeek gave Huawei exclusive early access to V4 for hardware optimization and did not show the model to NVIDIA at all. This reverses the standard industry practice of giving NVIDIA early access to new models to verify compatibility. 'The months of delay to complete the CUDA-to-CANN migration were, in effect, the price of a strategic independence statement,' as one analysis framed it. DeepSeek's V3 was trained on NVIDIA H800 GPUs. V4 was not. That transition was chosen, survived a training failure, and shipped. Source: Remio AI analysis, April 2026; Reuters, April 4, 2026; The NextWeb, April 2026.

Insight

📸 Nvidia CEO Jensen Huang said on the Dwarkesh Podcast that DeepSeek optimizing its models for Huawei chips instead of American hardware would be 'a horrible outcome for America.' DeepSeek then did exactly that, after a serious training failure, internal disagreements, and months of rewriting code, and shipped it as a free, MIT-licensed model whose API output costs $3.48 per million tokens. Source: The NextWeb, April 2026; 36Kr, April 2026.

The scale story is what the geopolitical analysts are getting wrong. 750,000 Huawei Ascend 950PR units planned for 2026 sounds significant. ChinaTalk's analysis puts it in context: 'For reference, that is just one week of quality-adjusted American chip production.' DeepSeek broke through the NVIDIA dependency. It did not close the compute gap. The US Blackwell and upcoming Rubin chips support FP4 numerical precision — effectively doubling usable compute from the prior generation. Huawei's Ascend 950PR also supports FP4, which is why it matters. But DeepSeek still trails by 3–6 months and the training volume advantage remains American. The chip export controls slowed DeepSeek. They did not stop it. Source: ChinaTalk DeepSeek V4 analysis, April 2026; CFR, April 29, 2026.

One more detail that most English-language coverage has not emphasized: on V4's launch day, eight Chinese domestic chip vendors simultaneously confirmed native V4 support through BAAI's FlagOS national AI software stack: Cambricon, Hygon, Moore Threads, Suiyuan, and four others. This is not a DeepSeek story. This is a Chinese chip ecosystem story. DeepSeek V4 is the first frontier-adjacent open-weight model that the entire Chinese domestic chip industry can run natively, without NVIDIA, on the same day it ships. That is what Jensen Huang was worried about. Source: Robonaissance, April 2026.

The Four Models: What Each One Is Actually Built For

  • Claude Opus 4.7 (April 16, 2026 — Anthropic). Current Anthropic flagship. 87.6% SWE-bench Verified, 94.2% GPQA Diamond, 64.3% SWE-bench Pro — the highest SWE-bench Pro score of any publicly available model as of May 20. New xhigh effort level, 3.75MP image resolution, Task Budgets, Adaptive Thinking, Agent Teams. Price: $5 input / $25 output per million tokens. Critical: new tokenizer produces up to 35% more tokens for identical text vs Opus 4.6 — your effective bill can rise 12–35% even though the rate card didn't move. Benchmark actual workloads before migrating. Source: Anthropic model card April 2026; Anthropic pricing docs May 2026.
  • Claude Sonnet 4.6 (Feb 17, 2026 — Anthropic). The model most comparisons skip. $3 input / $15 output. 79.6% SWE-bench Verified — within 1 point of V4 Pro at 80.6%. 40–60 tokens/sec. US data jurisdiction. Full compliance. No tokenizer inflation risk. For most engineering workloads, Sonnet 4.6 versus V4 Pro is a data sovereignty decision with comparable performance on each side, not a quality decision. Source: Anthropic, Artificial Analysis, May 2026.
  • DeepSeek V4 Pro (April 24, 2026 — DeepSeek). 1.6T total parameters, 49B active via MoE. MIT license. $1.74/$3.48 list price. 75% promo through May 31: $0.435/$0.87. 1M context. 80.6% SWE-bench Verified, 90.1% GPQA Diamond, #1 LiveCodeBench, 3,206 Codeforces, 37.7% HLE, 83.4% BrowseComp. Trained on Huawei Ascend 950PR. API routes to Chinese servers. Not on AWS Bedrock, Azure OpenAI, or Google Cloud as of May 20. Source: DeepSeek technical report April 2026.
  • DeepSeek V4 Flash (April 24, 2026 — DeepSeek). 284B total, 13B active. $0.14/$0.28 per million tokens — 89x cheaper output than Claude Opus 4.7, 107x cheaper than GPT-5.5. MIT license, 1M context. 79% SWE-bench Verified. ~140 tokens/sec. Self-host: ~158GB on disk (Q4_K_M quant), minimum 96GB GPU memory — single H200 141GB runs it cleanly, dual H100 80GB is the minimum for 256K context. Source: TechCrunch April 2026; Codersera hardware guide April 2026.
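The Opus 4.7 tokenizer warning is easy to quantify: for an identical workload, the bill scales with the rate change multiplied by the change in tokens billed. A minimal sketch, assuming you can pull per-request token counts for the same prompts under both models from your own logs; the sample counts and function names are hypothetical.

```python
def token_ratio(old_counts: list[int], new_counts: list[int]) -> float:
    """Total tokens under the new tokenizer divided by total under the old one,
    for the same prompts and outputs (counts come from your own request logs)."""
    if len(old_counts) != len(new_counts) or not old_counts:
        raise ValueError("need paired, non-empty token counts")
    return sum(new_counts) / sum(old_counts)


def effective_cost_multiplier(rate_ratio: float, tokens_ratio: float) -> float:
    """Change in the bill for an identical workload: 1.0 means unchanged."""
    return rate_ratio * tokens_ratio


# Hypothetical sample: three requests measured on Opus 4.6 and again on Opus 4.7.
old = [1_000, 2_000, 4_000]
new = [1_150, 2_300, 4_700]
ratio = token_ratio(old, new)
# Opus 4.6 -> 4.7 keeps the $5/$25 rate card, so rate_ratio is 1.0.
print(f"tokens x{ratio:.3f}, bill x{effective_cost_multiplier(1.0, ratio):.3f}")
```

With the rate card unchanged, the token ratio is the bill multiplier, so the 12–35% range quoted for Opus 4.7 corresponds to token ratios between 1.12 and 1.35 on your own traffic.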

Every Benchmark That Matters — And What It Actually Predicts

| Benchmark | What it predicts in production | Claude Opus 4.7 | Claude Sonnet 4.6 | DeepSeek V4 Pro | DeepSeek V4 Flash | Winner and what it means |
|---|---|---|---|---|---|---|
| SWE-bench Verified | Fixing real GitHub issues in actual codebases; best everyday coding predictor | 87.6% | 79.6% | 80.6% | 79.0% | Claude Opus 4.7, but Sonnet 4.6 and V4 Pro are within 1pt of each other |
| SWE-bench Pro | Multi-language production engineering; harder variant, no test leakage | 64.3% | N/A | 55.4% | N/A | Claude Opus 4.7 (+8.9pts); this gap matters for complex production repos |
| GPQA Diamond | Graduate-level physics, chemistry, biology; domain knowledge ceiling test | 94.2% | 74.1% | 90.1% | N/A | Claude Opus 4.7 (+4.1pts); meaningful at this difficulty level |
| HLE (no tools) | Expert cross-domain reasoning at knowledge limits (can the model genuinely think?) | 46.9% | ~34.5% | 37.7% | N/A | Claude Opus 4.7 (+9.2pts); gap inverts when tools are enabled (V4 Pro 48.2%) |
| LiveCodeBench | Competitive coding; continuously refreshed to limit memorization | Strong, not #1 | N/A | #1 all models | 91.6% (#1 in class) | DeepSeek V4 Pro; clear winner for algorithmic and novel coding tasks |
| BrowseComp | Agentic web research; finding difficult information online across multiple steps | 79.3% | N/A | 83.4% | N/A | DeepSeek V4 Pro; beats Claude, trails GPT-5.5 (84.4%) and Gemini 3.1 Pro (85.9%) |
| Terminal-Bench 2.0 | Autonomous agentic execution; real multi-step tasks with 3-hour real-world timeouts | 69.4% | N/A | 67.9% | N/A | Effectively tied; GPT-5.5 leads at 82.7%, a notable gap for autonomous agents |
| SimpleQA-Verified | Factual accuracy; the gap your support tickets will feel before your dashboard does | Strong | Strong | 57.9% | N/A | Claude and Gemini win; V4 Pro trails Gemini 3.1 Pro by 17.7pts, the production risk no one mentions |
| Output price per M tokens (list, May 20, 2026) | What you pay per million output tokens | $25.00 | $15.00 | $3.48 | $0.28 | DeepSeek; 4–89x cheaper depending on which Claude and DeepSeek models you pair |
Insight

📸 The number your manager needs to see: SimpleQA-Verified, where V4 Pro scores 57.9% — a 17.7-point gap behind Gemini 3.1 Pro. Every other benchmark measures analytical intelligence: can the model solve this? SimpleQA measures reliability: will the model give your user the correct fact, or a confident hallucination? The cheapest model that tells your users the wrong thing is not actually cheaper. Source: BuildFastWithAI, April 24, 2026.

The Price Math: Real Workloads, Real Numbers

All prices as of May 20, 2026 from official provider pricing pages. Three workloads — from light to enterprise-scale.

Workload 1 — Coding assistant (1 developer): 50,000 input tokens per request, 10,000 output, 20 requests/day, 22 working days. Monthly: 22M input tokens, 4.4M output tokens. Claude Opus 4.7: $110 + $110 = $220/month. DeepSeek V4 Pro at list: $38.28 + $15.31 = $53.59/month. Savings: $166/month, $1,996/year. Five developer seats: ~$10,000/year saved at list price alone.
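The Workload 1 arithmetic can be checked in a few lines. A sketch using the list rates quoted in this piece; `Rate` and `monthly_cost` are illustrative names, and the calculation ignores caching and the promo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


OPUS_4_7 = Rate(5.00, 25.00)
V4_PRO_LIST = Rate(1.74, 3.48)


def monthly_cost(rate: Rate, in_per_req: int, out_per_req: int,
                 reqs_per_day: int, days: int) -> float:
    """Token cost for one month of a fixed request pattern (no caching, no promo)."""
    requests = reqs_per_day * days
    in_millions = in_per_req * requests / 1_000_000
    out_millions = out_per_req * requests / 1_000_000
    return in_millions * rate.input_per_m + out_millions * rate.output_per_m


# Workload 1: 50k input / 10k output tokens, 20 requests/day, 22 working days.
opus = monthly_cost(OPUS_4_7, 50_000, 10_000, 20, 22)
v4 = monthly_cost(V4_PRO_LIST, 50_000, 10_000, 20, 22)
print(f"Opus ${opus:,.2f}/mo, V4 Pro ${v4:,.2f}/mo, saved ${(opus - v4) * 12:,.2f}/yr")
```

It reproduces the $220 and $53.59 monthly figures; swap in your own request profile before trusting the annual number.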

Workload 2 — Batch document processing: 200,000 input tokens and 5,000 output tokens per document, 200 documents/day over 22 working days (4,400 documents/month, or 880M input and 22M output tokens). Claude Opus 4.7 at $5/$25: $4,400 + $550 = $4,950/month. DeepSeek V4 Pro at list: $1,531.20 + $76.56 = $1,607.76/month. Savings: about $3,342/month. Caveat: at 57.9% SimpleQA accuracy, if 1% of outputs require a $50 correction each across 4,400 documents/month, you add back $2,200/month in downstream cost, roughly two-thirds of the token savings. The token savings are real; model the error cost against them before committing.
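The error-cost caveat is worth turning into a break-even number. A minimal sketch that recomputes Workload 2 from the per-token rates quoted in this piece (22 working days, 4,400 documents/month) and then asks how large an extra correction rate wipes out the savings; the function names are illustrative, and the extra error rate and $50 correction cost are assumptions you would replace with measurements.

```python
# Workload 2, recomputed from the per-token rates quoted in this piece.
DOCS_PER_MONTH = 200 * 22                 # 200 docs/day, 22 working days
IN_M = DOCS_PER_MONTH * 200_000 / 1e6     # million input tokens per month
OUT_M = DOCS_PER_MONTH * 5_000 / 1e6      # million output tokens per month

opus_cost = IN_M * 5.00 + OUT_M * 25.00   # Claude Opus 4.7: $5 in / $25 out
v4_cost = IN_M * 1.74 + OUT_M * 3.48      # DeepSeek V4 Pro list: $1.74 / $3.48
token_savings = opus_cost - v4_cost


def net_monthly_savings(savings: float, outputs: int,
                        extra_error_rate: float, cost_per_error: float) -> float:
    """Token savings minus the cost of correcting the extra bad outputs.
    extra_error_rate is an assumption you must measure on your own data."""
    return savings - outputs * extra_error_rate * cost_per_error


def break_even_error_rate(savings: float, outputs: int, cost_per_error: float) -> float:
    """Extra error rate at which switching models stops saving money."""
    return savings / (outputs * cost_per_error)


print(f"Opus ${opus_cost:,.2f}, V4 Pro ${v4_cost:,.2f}, saved ${token_savings:,.2f}")
print(f"net at 1% / $50: ${net_monthly_savings(token_savings, DOCS_PER_MONTH, 0.01, 50):,.2f}")
print(f"break-even extra error rate: "
      f"{break_even_error_rate(token_savings, DOCS_PER_MONTH, 50):.2%}")
```

Under these assumptions the monthly savings disappear once V4 Pro needs about 1.5 percentage points more $50 corrections than Opus, which is a far thinner margin than the raw price ratio suggests.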

Workload 3 — The Uber calculation (the one that went viral): Cline CEO Saoud Rizwan publicly calculated that if Uber had routed its 2026 AI workload through DeepSeek V4 Pro instead of Claude, the budget would last seven years instead of four months. Real company, real consumption data, real pricing differential. It spread widely in developer circles because enterprise procurement teams could immediately run the same math on their own numbers. It's not a demo; it's arithmetic. Source: Decrypt, April 24, 2026.

Insight

⏰ DEADLINE — 11 DAYS: The 75% promotional discount on V4 Pro expires May 31, 2026 at 15:59 UTC. Promotional output: $0.87/M (29x cheaper than Claude Opus 4.7). Post-promotion output: $3.48/M (7.2x cheaper). Model your economics at list price before you build on the promo rate. Bloomberg confirmed April 27, 2026 that DeepSeek permanently reduced input cache pricing to 1/10th of prior rates across its entire model family — that reduction survives beyond May 31 and benefits workloads with repeated system prompts or large document contexts. Source: Bloomberg, April 27, 2026; DeepSeek official pricing docs, May 2026.
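The cache price cut matters in proportion to how much of your input is a cache hit. A minimal sketch of the blended input rate; the cache-hit price below is a placeholder, not DeepSeek's published rate, so substitute the figure from the current pricing page.

```python
def blended_input_rate(miss_rate: float, hit_rate: float, hit_fraction: float) -> float:
    """Average USD per million input tokens when `hit_fraction` of input tokens
    are served from the context cache at `hit_rate` and the rest at `miss_rate`."""
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    return hit_fraction * hit_rate + (1.0 - hit_fraction) * miss_rate


V4_PRO_MISS = 1.74        # V4 Pro list input rate quoted in this piece
HYPOTHETICAL_HIT = 0.174  # placeholder only; check DeepSeek's pricing page
for frac in (0.0, 0.5, 0.8):
    rate = blended_input_rate(V4_PRO_MISS, HYPOTHETICAL_HIT, frac)
    print(f"{frac:.0%} cached -> ${rate:.4f}/M input")
```

With the placeholder price, 80% cache hits bring the effective input rate from $1.74 to about $0.49 per million tokens, which is why workloads that resend the same system prompt or documents gain the most.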

The Week Nobody Covered Correctly: AI Prices Moved in Both Directions at Once

April 23, 2026. OpenAI launches GPT-5.5 at $5 input, $30 output per million tokens. That is double GPT-5.4's rate of $2.50/$15, the largest single-model price increase OpenAI has ever made for a flagship. Greg Brockman called it 'a new class of intelligence built specifically for real work.' Developers calculated that GPT-5.5's improved token efficiency produces a roughly 20% effective cost increase on agentic workloads instead of a clean 2x. On standard completions with no efficiency gains, though, the rate doubled. Full stop.
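The "roughly 20% instead of 2x" figure follows from multiplying the rate change by the change in tokens billed for the same task. Back-solving shows the token reduction it implies; this is arithmetic on the numbers above, not a separately measured figure. Here $p$ is the per-token price and $T$ the tokens billed for the same work.

```latex
M = \frac{p_{\text{new}}}{p_{\text{old}}} \cdot \frac{T_{\text{new}}}{T_{\text{old}}}
\qquad\Longrightarrow\qquad
1.2 = \frac{30}{15} \cdot \frac{T_{\text{new}}}{T_{\text{old}}}
\qquad\Longrightarrow\qquad
\frac{T_{\text{new}}}{T_{\text{old}}} = 0.6
```

A 1.2x effective cost therefore implies GPT-5.5 using about 40% fewer tokens than GPT-5.4 on those agentic tasks; where token counts stay the same, $M$ is simply 2.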

April 24, 2026. DeepSeek launches V4 Pro at $3.48 output. Within 48 hours, the cost reference points for anyone evaluating AI models went from 'GPT-5.5 at $30' to 'V4 Pro at $3.48', an 8.6x swing in nominal price. The OpenAI price doubling is the context every DeepSeek comparison skips. Without it, $3.48 looks cheap against a stable market. With it, the market broke apart at both ends at once, and little of the coverage explained what that double movement means for developers caught in the middle.

The FairMind analysis (CEO Alexio Cassani, May 5, 2026) quantified the impact across one seven-day window: OpenAI, Anthropic, and GitHub all altered their economic terms through three different mechanisms, producing gaps of up to 92% between published list prices and actual billed costs on identical requests. Treat any AI infrastructure cost model built before May 2026 as wrong: not slightly wrong, but up to 92% off in some configurations. Rebuild it from the current rate cards before making routing decisions. Source: FairMind analysis, May 5, 2026.

Where V4 Pro Wins — The Specific Tasks Worth Routing There

LiveCodeBench is the most important benchmark result in this comparison because it is the hardest to game. Problems come exclusively from recent competitive programming contests and are refreshed continuously, so a model can be scored on problems published after its training cutoff, which it cannot have memorized. DeepSeek V4 Pro holds the #1 spot across all models, ahead of Claude Opus 4.7 and GPT-5.5. For genuinely novel algorithmic problems, V4 Pro at $3.48/M output outscores models at $25/M. That is not a footnote; it is the routing rule.

Long-context document work: at 1 million tokens, V4 Pro inference uses roughly one-tenth the compute of V3.2, which makes the 1M context window economically viable in production. On CorpusQA (document analysis at 1M tokens), V4 Pro leads open-source models. For bulk processing of legal filings, financial documents, large codebases, or research papers, the combination of context window and price is unmatched at list pricing.

Interleaved thinking is the architectural feature that makes V4 Pro genuinely better for multi-step agents — not just cheaper. Prior models flushed reasoning context between tool calls; every step started from scratch. V4 Pro preserves the reasoning chain across tool calls. In practice: more coherent outputs when an agent needs to synthesize information from multiple sequential tool results. For web research agents and multi-file code review bots, this produces measurable improvement independent of the price advantage.
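The difference between flushing and preserving reasoning is easiest to see in what the model is shown before its next tool call. A minimal sketch with a hypothetical message format: `Turn`, `AgentContext`, and `demo` are illustrative names, not DeepSeek's API, and real APIs expose reasoning content in their own way.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str            # "assistant" (a tool call) or "tool" (its result)
    content: str
    reasoning: str = ""  # the model's thinking attached to an assistant turn


@dataclass
class AgentContext:
    keep_reasoning: bool  # True models interleaved thinking; False flushes it
    turns: list[Turn] = field(default_factory=list)

    def add(self, turn: Turn) -> None:
        self.turns.append(turn)

    def next_prompt(self) -> list[str]:
        """Messages the model sees before choosing its next tool call."""
        lines = []
        for t in self.turns:
            if t.role == "assistant" and t.reasoning and self.keep_reasoning:
                lines.append(f"[reasoning] {t.reasoning}")
            lines.append(f"[{t.role}] {t.content}")
        return lines


def demo(keep_reasoning: bool) -> list[str]:
    ctx = AgentContext(keep_reasoning=keep_reasoning)
    ctx.add(Turn("assistant", "search('vendor A pricing')",
                 reasoning="Compare A and B; fetch A first, then B."))
    ctx.add(Turn("tool", "Vendor A: $12/seat"))
    return ctx.next_prompt()


for line in demo(keep_reasoning=True):
    print(line)
```

With `keep_reasoning=False` the model sees the tool result but not the plan that requested it ("fetch A first, then B"), so each step has to reconstruct why the call was made; with interleaved thinking the plan travels forward with the results.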

Where V4 Pro Loses — The Tasks That Justify Paying the Premium

SWE-bench Pro reveals the production coding gap the headline SWE-bench Verified number hides. Verified: 80.6% V4 Pro vs 87.6% Claude Opus 4.7 — a 7-point gap. Pro (harder, multi-language, no test leakage): 55.4% vs 64.3% — an 8.9-point gap. For production codebases where debugging a wrong AI output costs more than the token savings, Claude Opus 4.7's lead is real, consistent, and grows as task complexity increases.

Factual accuracy: V4 Pro's 57.9% on SimpleQA-Verified versus Claude's strong performance is not an abstract benchmark gap. It is a concrete production failure mode — the model producing a confident wrong answer in your customer-facing output. Claude Opus 4.7's 36% hallucination rate (The-Decoder, April 24, 2026) versus GPT-5.5's 86% puts Claude at the safest end of the available spectrum. If your application surfaces AI output directly to users without human review, this is the number that determines your error budget.

Expert reasoning without tools: V4 Pro's 37.7% HLE versus Claude Opus 4.7's 46.9% is a 9.2-point gap on the benchmark designed to test genuine cross-domain expert reasoning. This matters for scientific synthesis, novel research, graduate-level academic analysis — work at the edge of what any model can currently do. The gap inverts when tools are enabled (V4 Pro 48.2% vs Claude ~47%), which means for tool-equipped reasoning pipelines, V4 Pro is actually competitive.

The Huawei Strategy: What Jensen Huang Was Actually Worried About

The US AI export control policy rests on a single structural bet: advanced AI requires NVIDIA compute, and controlling NVIDIA compute controls the pace of Chinese AI development. The bet was not obviously wrong through 2024. The Ascend 910C — Huawei's chip before the 950PR — delivered roughly 60% of the inference performance of NVIDIA's H100. That gap matters. Then DeepSeek spent months rewriting gradient synchronization code, survived a training failure on a 1,024-card Huawei cluster, and shipped V4 Pro on Ascend 950PR. The bet did not hold.

The more important number from ChinaTalk's analysis: 750,000 Ascend 950PR units planned for 2026 'is just one week of quality-adjusted American chip production.' DeepSeek did not close the compute gap. It found a way to build a frontier-adjacent model despite the gap — and proved the gap can be worked around. Whether Huawei ships at full volume in 2026 determines whether the gap continues to close or stabilizes. That is now the hinge question for US AI export control policy. Source: ChinaTalk, April 2026; Asia Times, April 2026.

Insight

📸 DeepSeek's V3 model was trained on NVIDIA H800 GPUs. V4 was not. DeepSeek gave Huawei, not NVIDIA, exclusive early access to V4 for hardware optimization and did not share the model with NVIDIA at all. That reverses standard industry practice. The months of delay caused by the CUDA-to-CANN migration were, as one analysis put it, 'the price of a strategic independence statement.' V4 shipped. And eight Chinese domestic chip vendors confirmed native V4 support on launch day. Source: Remio AI; Robonaissance; 36Kr, April 2026.

The Distillation Scandal: What Congress Was Briefed On

April 23, 2026 — the day before V4 launched — the White House OSTP released a statement: foreign entities, principally based in China, are engaged in 'deliberate, industrial-scale campaigns to distill US frontier AI models.' Present tense. Already happening.

Anthropic had already documented the specifics on February 23, 2026: DeepSeek, Moonshot, and MiniMax generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts in violation of Anthropic's terms and regional access restrictions. The explicit goal: extract Claude's capabilities to improve their own models. OpenAI submitted a parallel memorandum to the House Select Committee on China stating DeepSeek had used distillation to 'free-ride on capabilities developed by OpenAI,' and that new obfuscated bypass methods had been identified and not fully stopped. Source: Anthropic February 23, 2026; Asia Times citing OpenAI memo, February 2026.

DeepSeek's own April 2026 technical paper describes using 'On-Policy Distillation (OPD)' from 10 teacher models. The company has not formally responded to Anthropic's or OpenAI's allegations. Distillation is not clearly illegal under US law — it is an open IP question. It is clearly a terms of service violation for both Anthropic and OpenAI. No US legal action has been filed as of May 20, 2026. Source: TechCrunch, April 24, 2026.

The Data Sovereignty Risk: Who Is Actually Banned and Who Is Not

The DeepSeek API routes all inference to servers in mainland China. Every query falls under Chinese data law — mandatory government access on request, no court order required, no user notification. This is not a privacy concern in the abstract. It is the specific legal mechanism that drove three years of TikTok regulatory debate applied directly to your API calls.

Confirmed bans: US federal agencies including NASA, Pentagon, Navy, House of Representatives, and DISA. State legislation in Texas, New York, Virginia, Tennessee. Country-level bans on government devices: Australia, Taiwan, South Korea, Italy, Denmark, France, Belgium, Canada. Pending legislation: the 'No DeepSeek on Government Devices Act' (Gottheimer and LaHood) and the 'Protection Against Foreign Adversarial Artificial Intelligence Act' (Rosen and Cassidy) targeting all federal employees and contractors respectively. Source: Computer Weekly; Industrial Cyber; Conference Board, 2025–2026.

No US law currently prohibits private individuals or companies from using the DeepSeek API. If your organization operates under HIPAA, SOC 2, PCI-DSS, FedRAMP, or any federal contracting requirement, routing in-scope production data to the DeepSeek API is a compliance violation, not a gray area. For everyone else: you are routing queries to Chinese servers, under Chinese law, to a company operating in a legal environment with mandatory government access. That is a decision to make explicitly, not by default.

Pro Tip

The clean solution: self-host. V4 Flash weights are ~158GB on disk (Q4_K_M quantization) and require 96GB+ GPU memory: a single H200 141GB runs it cleanly, and dual H100 80GB is needed for 256K context. V4 Pro requires a multi-GPU cluster (8x H200 for full 1M context). Self-hosted, no data leaves your infrastructure, no Chinese law applies, and inference can cost under $0.10 per million tokens at cloud GPU rates with high batch utilization. DeepSeek V4 is not on AWS Bedrock, Azure OpenAI, or Google Cloud as of May 20. Source: Codersera hardware guide April 2026; RunPod deployment guide April 2026.
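The hardware figures can be sanity-checked from the parameter counts. A back-of-envelope sketch: weight storage is parameters times bits per weight. The 4.8 bits/weight figure for Q4_K_M is an approximate average and an assumption here, the 141GB H200 capacity is the marketed figure, and KV cache for long contexts comes on top of everything shown.

```python
GIB = 2**30
H200_BYTES = 141e9  # marketed HBM capacity of one H200


def weight_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB. Ignores KV cache, activations,
    and runtime overhead, all of which need additional memory."""
    return total_params * bits_per_weight / 8 / GIB


MODELS = {"V4 Flash": 284e9, "V4 Pro": 1.6e12}  # total parameters from the article
for name, params in MODELS.items():
    for bits in (16, 8, 4.8, 4):  # 4.8 is an assumed Q4_K_M average
        print(f"{name} @ {bits} bits/weight: ~{weight_gib(params, bits):,.0f} GiB")

print(f"8x H200 total: ~{8 * H200_BYTES / GIB:,.0f} GiB")
```

At ~4.8 bits, V4 Flash comes to about 159 GiB, in line with the ~158GB on-disk figure if that number is in GiB. That is more than one H200's 141GB, so the single-H200 and 96GB figures presumably assume part of the weights (for example, some MoE experts) sit in system RAM; the cited guide does not say, so verify before sizing hardware. V4 Pro fits within 8x H200 (~1,050 GiB) only at around 4 bits (~745 GiB), not at 8 bits (~1,490 GiB).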

Who This Changes Everything For — And Who It Changes Nothing For

Solo developer or indie hacker building non-sensitive products: V4 Pro is the obvious choice for high-volume tasks — especially coding and long-context document processing. Self-host V4 Flash for anything requiring data control. The price difference at your scale is the difference between a viable project and an expensive hobby. The compliance constraints don't apply to you. The factual accuracy gap matters if you're building something customer-facing — test it against your specific queries before you deploy.

Early-stage startup (non-regulated): V4 Pro saves $150,000–$2M per year at typical Series A AI inference volumes depending on workload. That is real runway. The routing decision is: V4 Pro for bulk coding automation, long-context processing, and agentic research. Claude Sonnet 4.6 for customer-facing factual outputs. Self-hosted V4 Flash for internal automation at scale. The compliance audit before Series B needs to account for any DeepSeek API exposure in regulated verticals you plan to enter.

Enterprise (regulated industry: healthcare, finance, legal, government-adjacent): The DeepSeek API is not available to you for production use. This is a compliance non-starter, not a pricing decision. Self-hosted V4 Flash is an option if your security team signs off after reviewing the weights. The interesting question for your AI strategy team is not V4's API pricing; it is whether the open weights, self-hosted, give you LiveCodeBench-quality coding performance inside your own data perimeter.

Federal contractor or government agency: DeepSeek is already banned across many federal agencies, and pending legislation would extend the ban to contractors. The 'No DeepSeek on Government Devices Act' would formalize what your agency has likely already implemented as policy. This piece is not written for you on the V4 evaluation question, but the Gemini 3.5 Flash announcement from Google I/O on May 19 is: it offers improved coding and agentic performance at lower cost than Gemini 3.1 Pro, with US data jurisdiction, on a platform with existing FedRAMP-ready infrastructure.

The Routing Decision Matrix — Updated May 20, 2026

| Task type | Recommended model | Why | Data sovereign? |
|---|---|---|---|
| Competitive / algorithmic coding | DeepSeek V4 Pro (self-hosted, or API if non-sensitive) | #1 LiveCodeBench; clear winner for novel algorithmic problems at 1/7th Opus cost | ✅ self-hosted / ⚠️ API |
| Production codebase refactoring | Claude Opus 4.7 | SWE-bench Pro 64.3% vs 55.4%; the 8.9pt gap matters for complex multi-file work | ✅ Yes |
| Everyday coding (most teams) | Claude Sonnet 4.6 or V4 Pro | Within 1pt on SWE-bench Verified; the decision is data sovereignty and open-weights preference, not quality | ✅ Sonnet / ⚠️ V4 Pro API |
| Long-doc analysis (bulk) | DeepSeek V4 Pro | 1M context at ~10x cheaper inference than V3.2; leads open-weight models on CorpusQA; non-sensitive data only | ⚠️ Non-sensitive only |
| Agentic web research | DeepSeek V4 Pro | 83.4% BrowseComp beats Claude (79.3%) at 1/7th the cost; interleaved thinking advantage | ⚠️ Non-sensitive only |
| Expert scientific reasoning | Claude Opus 4.7 | GPQA Diamond 94.2% vs 90.1%; HLE 46.9% vs 37.7%; non-negotiable for precision research | ✅ Yes |
| Customer-facing factual outputs | Claude Sonnet 4.6 | The SimpleQA gap makes V4 Pro a liability for direct user-facing factual responses at scale | ✅ Yes |
| High-volume internal automation | DeepSeek V4 Flash (self-hosted) | 89x cheaper output than Opus 4.7; 79% SWE-bench Verified; MIT; runs inside your own infra | ✅ Self-hosted |
| Fast general-purpose (new option) | Gemini 3.5 Flash (Google I/O, May 19) | Beats Gemini 3.1 Pro on coding and agentic benchmarks; 4x output speed; one-third to one-half the cost of 3.1 Pro; US jurisdiction | ✅ Yes |
| Regulated / healthcare / legal | Claude or GPT (HIPAA/SOC 2 compliant providers only) | DeepSeek API = Chinese servers = compliance violation. Non-negotiable. | ❌ DeepSeek API prohibited |
| Federal / government contractor | Claude or GPT only | DeepSeek is banned by federal agencies; pending legislation would extend the ban to all contractors | ❌ Banned at agencies; pending for contractors |
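One way to make the matrix enforceable in a gateway is a task lookup plus the three data-sensitivity categories (regulated, internal, customer-facing). A sketch under simplifying assumptions: the task keys, the `Sensitivity` enum, and `route` are hypothetical; everyday coding defaults to V4 Pro here even though the matrix treats Sonnet 4.6 as equivalent; and customer-facing output from any DeepSeek model is redirected to Sonnet 4.6 because of the SimpleQA gap.

```python
from enum import Enum


class Sensitivity(Enum):
    REGULATED = "regulated"              # HIPAA, FedRAMP, federal contracting, etc.
    INTERNAL = "internal"                # internal, non-sensitive data
    CUSTOMER_FACING = "customer_facing"  # output shown directly to users


# First-choice model per task type, taken from the matrix.
# Task keys are hypothetical labels for this sketch.
ROUTES = {
    "algorithmic_coding": "DeepSeek V4 Pro",
    "production_refactoring": "Claude Opus 4.7",
    "everyday_coding": "DeepSeek V4 Pro",  # Claude Sonnet 4.6 is an equal-quality alternative
    "long_doc_bulk": "DeepSeek V4 Pro",
    "web_research": "DeepSeek V4 Pro",
    "expert_reasoning": "Claude Opus 4.7",
    "factual_customer_output": "Claude Sonnet 4.6",
    "internal_automation": "DeepSeek V4 Flash (self-hosted)",
    "fast_general": "Gemini 3.5 Flash",
}


def route(task: str, sensitivity: Sensitivity, can_self_host: bool = False) -> str:
    """Pick a model for one workload. A sketch of the matrix, not a policy engine."""
    model = ROUTES[task]
    is_deepseek = model.startswith("DeepSeek")
    via_api = is_deepseek and "self-hosted" not in model
    if sensitivity is Sensitivity.REGULATED and via_api:
        # The DeepSeek API is off the table; self-hosted weights keep data in-house.
        return f"{model} (self-hosted)" if can_self_host else "Claude or GPT (compliant provider)"
    if sensitivity is Sensitivity.CUSTOMER_FACING and is_deepseek:
        # SimpleQA gap: prefer Sonnet 4.6 for user-facing factual output.
        return "Claude Sonnet 4.6"
    return model


print(route("web_research", Sensitivity.INTERNAL))
print(route("web_research", Sensitivity.REGULATED))
```

Keeping the table as data and the compliance rules as a few explicit branches makes it easy to audit which rule fired for any request.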

What This Actually Means: V4 Didn't Crash the Market — It Did Something More Permanent

When DeepSeek R1 launched in January 2025, NVIDIA lost $589 billion in market cap in one day. When DeepSeek V4 launched on April 24, 2026, the market response was, as CNBC reported, 'muted.' Morningstar's Ivan Su explained it plainly: 'Traders have already priced in the reality that Chinese AI is competitive and cheaper to use.' The shock is gone. What replaced it is integration.

DeepSeek models have been downloaded more than 75 million times from HuggingFace since R1. V4 Pro is already callable inside Claude Code, Cursor, and Windsurf via OpenRouter, without a separate DeepSeek account. The model is less than four weeks old, and it is already embedded in the development workflow of a meaningful fraction of professional developers: not as a direct API call to DeepSeek, but as a routed alternative inside tools they already use. That is how incumbents get displaced. Not with a crash. With a quiet integration. Source: Britannica, 2026; OpenRouter, May 2026.

One more number. TechCrunch reported on May 6, 2026 that DeepSeek could hit a $45 billion valuation in its first outside investment round. The company offering MIT-licensed weights for free, and running an API at $3.48 per million output tokens that is currently discounted 75%, could be worth $45 billion. The business model is not the API pricing. The business model is the data, and the strategic position. Understanding that is the context for every pricing decision they make. Source: TechCrunch, May 6, 2026.

Insight

📸 DeepSeek V4 didn't crash markets like R1 did. It did something more permanent: it got quietly integrated into Claude Code, Cursor, and Windsurf as a routed alternative. Three weeks after launch, professional developers are running it without a DeepSeek account. That is not a disruption story — that is an adoption story. And those end differently.

5 Things to Do Before May 31 — The Evaluation Checklist

  • Run 30 of your actual production queries through V4 Pro and Claude Sonnet 4.6 side by side at platform.deepseek.com and claude.ai. Not benchmarks: your queries. For coding tasks, measure manual review time per output, not just pass rate. For factual tasks, independently verify 10 claims from each model. The SimpleQA gap shows up in exactly the kinds of tasks it predicts. Give yourself 2 hours and a decision framework before the promo ends.
  • Model your costs at V4 Pro list price ($1.74/$3.48), not promo price ($0.435/$0.87). The promo expires May 31 at 15:59 UTC. Any routing decision built on the promo rate is 4x off on June 1. Run the same calculation with Sonnet 4.6 ($3/$15) as the alternative — the comparison may surprise you.
  • Classify every workload by data sensitivity before touching DeepSeek's API. Three categories: (1) regulated/government-adjacent data — DeepSeek API prohibited; (2) internal non-sensitive data — evaluate on merit; (3) customer-facing — factor in SimpleQA gap before routing. This classification takes 30 minutes and prevents a compliance incident.
  • If you want open weights inside your own infrastructure, pull V4 Flash from Hugging Face this week. The weights are MIT-licensed, ~158GB on disk, and require 96GB+ of GPU memory. A single H200 (141GB) runs it; because that is less than the on-disk weight size, expect to rely on a quantized build or partial offloading. Self-hosted, no data leaves your environment, no Chinese law applies, and inference costs under $0.10/M tokens at cloud GPU rates. This is the DeepSeek use case that has no compliance objection for any organization.
  • If you are building with Claude Code or Cursor, check whether V4 Pro is already an available routing option in your settings. Via OpenRouter, it is callable inside these tools today without a separate DeepSeek account. You may already have access — and may already be sending queries there without a formal evaluation decision having been made.
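The second checklist item, modeling costs at list price rather than promo price, is simple arithmetic worth writing down once. A minimal Python sketch: the per-million-token prices are the ones quoted in this article, the monthly token volumes are a hypothetical workload, and cache-hit pricing is deliberately ignored, so the result is an uncached upper bound.

```python
"""Monthly API cost at list vs. promo pricing (sketch).

Prices are USD per million tokens as quoted in this article (May 2026).
Cache-hit discounts are ignored, so these are uncached upper bounds.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


PRICES = {
    "v4-pro-promo": Price(0.435, 0.87),  # 75% off, ends May 31
    "v4-pro-list": Price(1.74, 3.48),
    "v4-flash-list": Price(0.14, 0.28),
    "sonnet-4.6": Price(3.00, 15.00),
}


def monthly_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """USD for one month of traffic, ignoring caching."""
    return (input_tokens / 1e6) * price.input_per_m + (output_tokens / 1e6) * price.output_per_m


if __name__ == "__main__":
    # Hypothetical workload: 400M input + 80M output tokens per month.
    in_tokens, out_tokens = 400_000_000, 80_000_000
    for name, price in PRICES.items():
        print(f"{name:14s} ${monthly_cost(price, in_tokens, out_tokens):>10,.2f}")
```

Because the promo discounts input and output by the same 75%, the list-price bill is always exactly 4x the promo bill regardless of your input/output mix. The Sonnet 4.6 comparison is the one that shifts with the mix, since Sonnet's output premium over V4 Pro is much larger than its input premium.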

Frequently Asked Questions

01. Is DeepSeek V4 actually free to use?

Three levels apply. (1) MIT open weights — free to download from HuggingFace. V4 Flash is ~158GB on disk; V4 Pro requires multi-GPU cluster infrastructure. No licensing fee, but you pay for GPU time. (2) chat.deepseek.com is free for personal use with fair-use throttling. (3) The API is paid: $0.14/$0.28/M for Flash, $1.74/$3.48/M for Pro at list price. 75% promo through May 31 reduces Pro to $0.435/$0.87. Free weights mean you own the model — inference compute still costs money. Source: DeepSeek official pricing docs, May 2026.

02. Should American businesses be worried about data security with DeepSeek?

Yes — with specificity rather than panic. The API routes all traffic to Chinese servers under Chinese data law (mandatory government access on request, no court order, no notification). For regulated industries — HIPAA, SOC 2, PCI-DSS, FedRAMP, federal contracting — the API is a compliance violation for production data, not a risk to manage. For unregulated private use with non-sensitive data, it is a decision to make explicitly. Self-hosting V4 Flash eliminates the sovereignty issue entirely: MIT open weights, runs inside your own infrastructure, no data leaves your environment. Source: federal agency ban documentation, 2025–2026.

03. Did DeepSeek steal from Claude?

Anthropic confirmed on February 23, 2026 that DeepSeek and two other Chinese labs generated over 16 million exchanges with Claude through 24,000 fraudulent accounts specifically to extract Claude's capabilities. OpenAI made parallel allegations to Congress. DeepSeek's own April 2026 technical paper describes On-Policy Distillation from 10 teacher models. Whether this is illegal under US law is an open IP question. Whether it violated Anthropic's terms of service is clear — those terms prohibit using Claude's outputs to train competing systems. No US legal action filed as of May 20, 2026. Source: Anthropic, February 23, 2026; Asia Times, April 2026.

04. Did DeepSeek V4 actually run on Huawei chips, or is that disputed?

Confirmed by Reuters on April 4, 2026. The migration from NVIDIA CUDA to Huawei CANN required months of engineering work, including recovering from a serious training failure in mid-2025 when gradient synchronization timed out on a 1,024-card Ascend cluster. DeepSeek deliberately gave Huawei exclusive early hardware access and did not share V4 with NVIDIA — reversing standard industry practice. Eight domestic Chinese chip vendors confirmed native V4 support on launch day through BAAI's FlagOS. Whether inference also runs entirely on Huawei or still partially on NVIDIA hardware remains disputed; training on Ascend is confirmed. Source: Reuters April 4, 2026; 36Kr; Robonaissance, April 2026.

05. Is Claude Sonnet 4.6 a better comparison than Opus 4.7 for most developers?

Yes, for most everyday engineering tasks. Sonnet 4.6 at $3/$15 per million tokens scores 79.6% on SWE-bench Verified vs V4 Pro's 80.6%: within 1 percentage point, with US data jurisdiction and full compliance. Sonnet's output price is about 4x V4 Pro's list price, but far closer to it than Opus 4.7's $25. The industry debate comparing V4 Pro to Claude Opus 4.7 is the wrong frame for most teams. V4 Pro beats Sonnet 4.6 specifically on LiveCodeBench-class competitive coding, BrowseComp agentic web research, and long-context document processing at scale. For everything else, the models are within measurement error, and Sonnet 4.6 doesn't route your data to China. Source: Anthropic; Artificial Analysis, May 2026.

06. What happens to the price on June 1 after the promotional discount ends?

V4 Pro returns to list: $1.74 input / $3.48 output per million tokens. That is still 7.2x cheaper than Claude Opus 4.7 ($25 output) and 8.6x cheaper than GPT-5.5 ($30 output). The promo rate ($0.87 output) makes it about 29x cheaper than Opus 4.7. On April 26, 2026, DeepSeek permanently cut input cache pricing to one-tenth of its prior rate; that change survives beyond May 31 and benefits workloads with large repeated contexts. Model at list price before committing. Source: Bloomberg, April 27, 2026; DeepSeek official pricing docs, May 2026.
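The multiples in this answer follow directly from the per-million output prices quoted in this article; the arithmetic, written out:

```latex
\begin{aligned}
\text{V4 Pro promo output} &= 3.48 \times (1 - 0.75) = 0.87 \\
\frac{25}{3.48} &\approx 7.2 \quad \text{(Opus 4.7 vs V4 Pro list)} \\
\frac{30}{3.48} &\approx 8.6 \quad \text{(GPT-5.5 vs V4 Pro list)} \\
\frac{25}{0.87} &\approx 28.7 \approx 29 \quad \text{(Opus 4.7 vs V4 Pro promo)}
\end{aligned}
```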

Pro Tip

Verified testing note — Aditya Kumar Jha, May 20, 2026. I ran 30 production-equivalent queries this week across V4 Pro, Claude Sonnet 4.6, and Claude Opus 4.7: algorithmic coding problems, multi-file refactoring, research synthesis, and factual retrieval tasks. V4 Pro's LiveCodeBench advantage was visible and real — on 4 algorithmic problems I had not encountered before, V4 Pro found correct solutions that Sonnet 4.6 either missed or approximated. The SimpleQA gap was also real: on 3 of 10 factual retrieval tasks, V4 Pro returned confident but incorrect claims that both Claude models answered correctly. My routing rule after testing: V4 Pro for competitive coding, agentic research, and long-context document work. Claude Sonnet 4.6 for everything customer-facing or factually sensitive. Claude Opus 4.7 for complex production code refactoring and expert reasoning tasks with legal or financial consequences. The benchmarks predict what they say they predict. Run the test on your workload before the promo ends.
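The routing rule above can be written down as a small function so it is applied consistently rather than case by case. This is a sketch, not the author's tooling: the task labels and the three sensitivity classes (taken from the checklist's data classification) are hypothetical names, and the returned strings are model labels, not real API model IDs.

```python
"""Workload router encoding the testing note's routing rule (sketch).

Task labels and sensitivity classes are HYPOTHETICAL names for this
example; return values are model labels, not provider API model IDs.
"""

from enum import Enum, auto


class Sensitivity(Enum):
    REGULATED = auto()        # regulated or government-adjacent data
    INTERNAL = auto()         # internal, non-sensitive data
    CUSTOMER_FACING = auto()  # outputs shown directly to users


OPUS_TASKS = {"complex_refactor", "expert_reasoning"}
V4_PRO_TASKS = {"competitive_coding", "agentic_research", "long_context_docs"}


def route(task: str, sensitivity: Sensitivity) -> str:
    """Pick a model family for one task under the stated rule."""
    # Complex refactoring and high-stakes expert reasoning go to Opus
    # regardless of data class (Opus is never a DeepSeek endpoint).
    if task in OPUS_TASKS:
        return "claude-opus-4.7"
    # Regulated data must never reach the DeepSeek API, and customer-facing
    # output avoids V4 Pro's factual-accuracy (SimpleQA) gap.
    if sensitivity in (Sensitivity.REGULATED, Sensitivity.CUSTOMER_FACING):
        return "claude-sonnet-4.6"
    # Internal, non-sensitive work where V4 Pro showed a clear edge.
    if task in V4_PRO_TASKS:
        return "deepseek-v4-pro"
    # Everything else: the models are close, so default to Sonnet.
    return "claude-sonnet-4.6"


if __name__ == "__main__":
    print(route("competitive_coding", Sensitivity.INTERNAL))
    print(route("competitive_coding", Sensitivity.REGULATED))
```

Putting the compliance check before the task check means a misclassified task can never send regulated data to DeepSeek; the worst case is an unnecessary Sonnet call.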

Written by Aditya Kumar Jha

Published author of six books and founder of LumiChats. Writes about AI tools, model comparisons, and how AI is reshaping work and education.
