This is by far the best model I have seen until now.
#8 opened 4 months ago by ZeroWw

How many tokens per second when using DeepSeek-V2 (236B) for inference on 8×A100? (1 comment)
#7 opened 6 months ago by harvin-cn

Can DeepSeek-V2 run on two nodes (each with 4 A100)? (1 comment)
#5 opened 6 months ago by jy395

Calculation of _mscale during YaRN RoPE scaling (1 comment)
#4 opened 6 months ago by sszymczyk

KeyError: 'sdpa' (1 comment)
#3 opened 6 months ago by fengzi258

Smaller Models (1 comment)
#2 opened 6 months ago by puffy310

KV Cache for compress_kv or key-value states (6 comments)
#1 opened 6 months ago by House-99