DeepSeek-V3

125 points by mgt19937 6 months ago

msoad 6 months ago

You can run a model that can beat 4o which was released less than 6 months ago _locally_! I know this requires a ton of hardware but OpenAI will not be the leader in 2025 I can assume. Always bet on open source (or rather somewhat more open development strategies)

The math and coding performance is what we really care about. I am paying for o1 Pro and also Sonnet, in my experience beside Sonnet being faster, it is also better at many tasks. In a few instances I got answers from o1 Pro but it's not justifying the price so I am cancelling and going back to $20/mo.

I am currently paying for Cursor, Claude, ChatGPT and v0! The productivity I am gaining from those tools are totally worth it (except for o1 Pro). But I am really hoping at some point those tools converge so I can pay less. For instance I am looking forward to VSCode Copilot improvements so I can go back to VSCode and once Claude has no limits I rather pay for one AI system.

chvid 6 months ago

OpenAi toppled as LLM leader by an open source / open weight company?
OpenAi has much more capital and compute than any of its competitors (especially deepseek); if that was to happen it would demonstrate that capital and compute doesn't matter as much as it is assumed ... (and it just might be the thing that pops the current ai bubble).
- cloverich 6 months ago
  
  Until the models can host themselves theyll always need a company to make the experience good enough for typical users; OpenAI can always host open source models instead of their own and their user base will mostly stick around, especially if they can leverage their existing base into a network effect. I wouldnt be surprised it they are investing heavily into this vs pure model hosting running.
  Im thinking their real challenge will be surviving Apple (once they go all in) or Google (if they can figure out how to make a good product). Or something along those lines.
- xnx 6 months ago
  
  > OpenAi has much more capital and compute than any of its competitors
  Isn't openai still losing money? I don't think they own any data centers.
polotics 6 months ago

well yes, locally, if you assume that someone's got about 300'000 dollars of hardware at hand... right? as you are not paying for Gemini, may I ask why, did you try it and find it inferior?
- apexalpha 6 months ago
  
  I bought two (relatively) old datacenter GPUs with 48gb VRAM total for €200 that gets me 7 token/s for a 70b model.
  - 383toast 6 months ago
    
    which GPUs?
    
    zargon 6 months ago
    
    Not the GP, but I bought a few P40s over the summer for $150 each. Last I checked they're more expensive now, but it's still cheap vram and fast enough at inference for me.
    
    apexalpha 6 months ago
    
    Nvidia M40 and P40.
- KTibow 6 months ago
  
  You actually can't pay for the latest models, they're only available as free with limits
- msoad 6 months ago
  
  Gemini for coding does not work for me. It gets so many things wrong
  - xnx 6 months ago
    
    You should try again. Gemini rates highest on coding at lmarena.
  - sumedh 6 months ago
    
    Which Gemini AI model did you use?

rahimnathwani 6 months ago

Pricing per million tokens:

  Model               Input     Output    
  ─────────────────────────────────────
  Claude 3.5 Sonnet   $3.00     $15.00
  GPT-4o              $2.50     $10.00
  Gemini 1.5 Pro      $1.25      $5.00
  Deepseek V3         $0.27      $1.10
  GPT-4o-mini         $0.15      $0.60

zardinality 6 months ago

In the introduction of the paper it says: "Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks." They have indeed a very strong infra team.

ComputerGuru 6 months ago

Do we have two completely different definitions of “infrastructure”?

futureshock 6 months ago

Someone pointed out on Reddit that DeekSeek v3 is 53x cheaper to inference than Claude Sonnet which it trades blows with in the benchmarks. As we saw with o3, compute cost to hit a certain benchmark score will become an important number now that we are in the era that you can throw an arbitrary amount of test time compute to hit an arbitrary benchmark number.

https://old.reddit.com/r/LocalLLaMA/comments/1hmm8v9/psa_dee...

handzhiev 6 months ago

How is this not on the front page. It's a remarkable release.

miletus 6 months ago

Noticed the same thing. DeepSeek-V3 is remarkable (beats 4o/ claude), but it's not on the front page.
It seems they don't want china to win haha
sumedh 6 months ago

Probably because it's a censored model

wenyuanyu 6 months ago

Truly remarkable! Their approach to distributed inference is on an entirely new level. For the prefill stage, they utilized a deployment unit comprising 32 H800 GPUs, while the decoding stage scaled up to 320!! H800 GPUs per unit. Incorporates a multitude of sophisticated parallelization and communication overlap techniques, setting a standard that’s rarely seen in other setups.

[0] https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...

WiSaGaN 6 months ago

It still fails my private physics testing question half the time, where claude 3.5 sonnet and openai o1 (both web version) most of the time passes. So I'd say close to SOTA but not quite. However given deekseek already has the r1 lite preview, and they can achieve comparable performance for much less compute (assuming the API cost of close models roughly represent the inference cost), then it's not unreasonable to believe deepseek may be close to release very good test compute scaling model that is similar to o3 high effort.

sergiotapia 6 months ago

I'm using their API - the model is referenced by `deepseek-chat` and works really well. Seeing some more intelligent responses to my users inputs. Better adherence to the "spirit" of what I was trying to accomplish with the prompt. This is so exciting!

Take note of their suggested temperatures! https://api-docs.deepseek.com/quick_start/parameter_settings

williamstein 6 months ago

What is the DeepSeek team? Who is making this?

elfbargpt 6 months ago

From @kevinsxu on twitter:
Some interesting facts about DeepSeek:
- never received/sought outside funding (thus far)
- self-funded out of a hedge fund (called High-Flyer)
- entire AI team is reportedly recruited from within China, no one who's worked at a foreign company
- founder is classmates with the founder of DJI, both studied at Zhejiang University

gck1 6 months ago

Since most of my usage of LLMs is through cline or APIs now, specifically for coding assistance, and I’m not comfortable trusting my codebase or potentially leaked secrets, to a company operating under CCP supervision, I’ll stick to waiting until this forces Claude to lower their pricing on 3.5 Sonnet instead.

ripped_britches 6 months ago

It’s open weight my friend

rubslopes 6 months ago

Already available at OpenRouter: https://openrouter.ai/deepseek/deepseek-chat

Cost / million tokens: Input $0.14 Output $0.28

d4rkp4ttern 6 months ago

But you can directly use it via the deepseek platform via an OpenAI-compatible API. Does OpenRouter offer any advantages?
https://platform.deepseek.com/usage
- atroche 6 months ago
  
  One key, one base_url config, one billing account. If you like to mess around with many different models it's very convenient.
  - rubslopes 6 months ago
    
    Yes, all of that. In addition, you can easily give the same prompt to several models at the same time to see how they respond.
    
    d4rkp4ttern 6 months ago
    
    Thanks, I was more curious whether they offer a better TOPS. Also there’s this concerning skepticism about OpenRouter:
    https://www.reddit.com/r/LocalLLaMA/s/uGxhqi1YYh
xur17 6 months ago

For comparison, Claude 3.5 Sonnet (my favorite model for coding tasks) is: Input $3 Output $15.

janice1999 6 months ago

> a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.

What kind of hardware do you need to run this?

bavell 6 months ago

8x H200s recommended:
https://github.com/sgl-project/sglang/tree/main/benchmark/de...
boroboro4 6 months ago

They discuss it in the paper and recommend 32 GPUs (H800 in their case) for prefill stage and 320 GPUs for decoding.
=)

deyiao 6 months ago

The benchmark results seem unrealistically good, but I'm not sure from which angles I should challenge them.

ai-christianson 6 months ago

I think they're real. The model is performing better than claude-3-5-sonnet-20241022 on the claude leaderboard:
https://aider.chat/docs/leaderboards/

bobosha 6 months ago

The results look quite promising.i will give this a try...

orena 6 months ago

So no bitter lesson?