KV Cache

KV Cache size

| Model | KV size per token (FP16) | Context length (tokens) | Max KV size |
|---|---|---|---|
| meta-llama/Llama-3.2-1B-Instruct | 32 KiB | 128000 | 4 GiB |
| meta-llama/Llama-3.1-8B-Instruct | 128 KiB | 128000 | 16 GiB |
| meta-llama/Llama-3.3-70B-Instruct | 320 KiB | 128000 | 40 GiB |
| Qwen/Qwen3-32B | 256 KiB | 32768 (131072 with YaRN) | 8 GiB (32 GiB with YaRN) |
| deepseek-ai/DeepSeek-R1 | 68.6 KiB | 128000 | 8.38 GiB |
The per-token figures above follow directly from each model's attention configuration (a worked sketch follows this list):

  • meta-llama/Llama-3.2-1B-Instruct
    • Layers: 16
    • KV Heads: 8
    • Head Dimension: 64
    • Total Elements: 16384 (Layers * KV Heads * Head Dimension * 2)
  • meta-llama/Llama-3.1-8B-Instruct
    • Layers: 32
    • KV Heads: 8
    • Head Dimension: 128
    • Total Elements: 65536 (Layers * KV Heads * Head Dimension * 2)
  • meta-llama/Llama-3.3-70B-Instruct
    • Layers: 80
    • KV Heads: 8
    • Head Dimension: 128
    • Total Elements: 163840 (Layers * KV Heads * Head Dimension * 2)
  • Qwen/Qwen3-32B
    • Layers: 64
    • KV Heads: 8
    • Head Dimension: 128
    • Total Elements: 131072 (Layers * KV Heads * Head Dimension * 2)
  • deepseek-ai/DeepSeek-R1
    • Layers: 61
    • KV LoRA Rank: 512
    • QK RoPE Head Dimension: 64
    • Total Elements: 35136 (Layers * (KV LoRA Rank + QK RoPE Head Dimension))
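
The numbers in the table can be reproduced from these parameters. Below is an illustrative Python sketch (not taken from any serving library; the function names are assumptions made for this example, while the model parameters come straight from the list above). For standard MHA/GQA attention, each token stores one K and one V vector per layer per KV head, hence the factor of 2; for DeepSeek-R1's MLA, each token stores a compressed KV latent of size KV LoRA Rank plus a decoupled RoPE key of size QK RoPE Head Dimension per layer.

```python
# Illustrative sketch: reproduce the table above from the per-model parameters.
# The helper names here are assumptions for this example, not any library's API.

BYTES_PER_ELEMENT = 2  # FP16

def kv_bytes_per_token_gqa(layers: int, kv_heads: int, head_dim: int) -> int:
    """MHA/GQA: one K and one V vector per layer per KV head (hence the * 2)."""
    elements = layers * kv_heads * head_dim * 2
    return elements * BYTES_PER_ELEMENT

def kv_bytes_per_token_mla(layers: int, kv_lora_rank: int, qk_rope_head_dim: int) -> int:
    """MLA (DeepSeek-R1): compressed KV latent + decoupled RoPE key per layer."""
    elements = layers * (kv_lora_rank + qk_rope_head_dim)
    return elements * BYTES_PER_ELEMENT

def max_kv_bytes(bytes_per_token: int, context_length: int) -> int:
    """Worst-case KV cache for one sequence filled to the full context length."""
    return bytes_per_token * context_length

# meta-llama/Llama-3.1-8B-Instruct: 32 layers, 8 KV heads, head dim 128
per_token = kv_bytes_per_token_gqa(32, 8, 128)
print(per_token // 1024, "KiB/token")                             # 128 KiB
print(round(max_kv_bytes(per_token, 128_000) / 2**30, 1), "GiB")  # ~15.6 GiB (rounded to 16 GiB above)

# deepseek-ai/DeepSeek-R1: 61 layers, KV LoRA rank 512, QK RoPE head dim 64
per_token = kv_bytes_per_token_mla(61, 512, 64)
print(round(per_token / 1024, 1), "KiB/token")                    # 68.6 KiB
print(round(max_kv_bytes(per_token, 128_000) / 2**30, 2), "GiB")  # 8.38 GiB
```

Note that these maximums are per sequence: when serving batched requests, total KV cache memory scales with the number of concurrent sequences as well.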