Metrics for LLM inference

Token

  • Input Sequence Length (ISL)
  • Output Sequence Length (OSL)

Latency

  • Time To First Token (TTFT)
    • First Token Latency (FTL)
    • e.g. P90 TTFT < 1 s
  • Time Per Output Token (TPOT)
    • Inter-Token Latency (ITL)
    • Token-to-Token Latency (TTL)
    • Time Between Tokens (TBT)
    • e.g. P90 TPOT < 200 ms
  • End-to-End Latency (E2EL)
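
The latency metrics above can be sketched from per-token arrival timestamps. This is a minimal illustration, not any particular benchmark tool's definition: the function names are hypothetical, TPOT is assumed to be the mean gap between output tokens after the first, and the P90 uses a simple nearest-rank rule. Some tools compute ITL per gap rather than as an average, so results may differ slightly.

```python
def latency_metrics(request_start, token_times):
    """Compute TTFT, TPOT, and E2EL for one request.

    request_start: time the request was sent (seconds).
    token_times:   arrival time of each output token (seconds), in order.
    """
    if not token_times:
        raise ValueError("need at least one output token")
    ttft = token_times[0] - request_start    # Time To First Token
    e2el = token_times[-1] - request_start   # End-to-End Latency
    n = len(token_times)
    # Assumption: TPOT is the average gap over the tokens after the first.
    tpot = (e2el - ttft) / (n - 1) if n > 1 else 0.0
    return {"ttft": ttft, "tpot": tpot, "e2el": e2el}


def p90(values):
    """Nearest-rank 90th percentile (one of several common conventions)."""
    ordered = sorted(values)
    rank = (90 * len(ordered) + 99) // 100   # ceil(0.9 * n) in integer math
    return ordered[rank - 1]


# Example: first token after 0.5 s, then one token every 0.1 s.
m = latency_metrics(0.0, [0.5, 0.6, 0.7, 0.8])
meets_ttft_target = m["ttft"] < 1.0    # target in the style of "P90 TTFT < 1 s"
meets_tpot_target = m["tpot"] < 0.200  # target in the style of "P90 TPOT < 200 ms"
```

Under this definition, E2EL = TTFT + TPOT × (OSL − 1) for a single request, which is a convenient sanity check on measured values.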

Throughput

  • Requests Per Second (RPS)
    • Queries Per Second (QPS)
  • Tokens Per Second (TPS)
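
As a rough sketch, both throughput metrics can be derived from the requests completed in a measurement window. The function name and the `(isl, osl)` tuple layout are assumptions made for illustration. Reports differ on whether TPS counts output tokens only or input plus output tokens, so both are returned here.

```python
def throughput(completed, window_seconds):
    """Compute RPS and TPS over one measurement window.

    completed:      list of (isl, osl) pairs, one per request finished
                    inside the window (input and output token counts).
    window_seconds: length of the window in seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    output_tokens = sum(osl for _, osl in completed)
    total_tokens = sum(isl + osl for isl, osl in completed)
    return {
        "rps": len(completed) / window_seconds,
        "output_tps": output_tokens / window_seconds,
        "total_tps": total_tokens / window_seconds,
    }


# Example: two requests finished in a 10 s window.
t = throughput([(100, 50), (200, 150)], window_seconds=10)
# t == {"rps": 0.2, "output_tps": 20.0, "total_tps": 50.0}
```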

KV cache

  • GPU KV cache usage
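
One way to reason about KV cache usage is as cached bytes divided by the memory reserved for the cache. The sketch below assumes a standard transformer decoder that stores one key and one value vector per layer, per KV head, per token. The function names and the example model dimensions are hypothetical. Serving engines typically report usage in terms of allocated cache blocks, so their numbers may not match this estimate exactly.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Bytes of KV cache one token occupies (the factor 2 covers K and V)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_usage(cached_tokens, bytes_per_token, capacity_bytes):
    """Fraction of the reserved KV cache currently occupied (0.0 to 1.0)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return cached_tokens * bytes_per_token / capacity_bytes


# Hypothetical model: 32 layers, 8 KV heads, head dimension 128, fp16 (2 bytes).
per_token = kv_bytes_per_token(32, 8, 128, 2)   # 131072 bytes = 128 KiB
# 4096 cached tokens against 8 GiB reserved for the cache.
usage = kv_cache_usage(4096, per_token, 8 * 1024**3)   # 0.0625
```

Because the cached token count grows with ISL + OSL for every in-flight request, this fraction is what limits how many requests can be batched concurrently.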