Metrics for LLM inference
Token
- Input Sequence Length (ISL)
- Output Sequence Length (OSL)
Latency
- Time To First Token (TTFT), also called First Token Latency (FTL)
- e.g. P90 TTFT < 1 s
- Time Per Output Token (TPOT), also reported as Inter-Token Latency (ITL), Token-to-Token Latency (TTL), or Time Between Tokens (TBT)
- e.g. P90 TPOT < 200 ms
- End-to-End Latency (E2EL)
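
A minimal sketch of how the latency metrics above could be derived from client-side timestamps of one streamed request. The RequestTrace record and all function names are hypothetical, not taken from any particular benchmark tool. It assumes TPOT is averaged over the tokens after the first, ITL is the list of individual gaps between tokens, and P90 uses the nearest-rank method; other tools may define these slightly differently.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical record of one streamed request (all times in seconds)."""
    sent_at: float            # when the client sent the request
    token_times: List[float]  # arrival time of each output token, in order


def ttft(trace: RequestTrace) -> float:
    """Time To First Token: request sent -> first output token received."""
    return trace.token_times[0] - trace.sent_at


def e2el(trace: RequestTrace) -> float:
    """End-to-End Latency: request sent -> last output token received."""
    return trace.token_times[-1] - trace.sent_at


def itl(trace: RequestTrace) -> List[float]:
    """Inter-token gaps: time between each pair of consecutive tokens."""
    t = trace.token_times
    return [later - earlier for earlier, later in zip(t, t[1:])]


def tpot(trace: RequestTrace) -> float:
    """Time Per Output Token, assumed here to exclude the first token."""
    n = len(trace.token_times)
    if n < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (e2el(trace) - ttft(trace)) / (n - 1)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile, e.g. p=90 for P90."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Example: one request whose four output tokens arrive at these times.
trace = RequestTrace(sent_at=0.0, token_times=[0.5, 0.6, 0.7, 0.9])
print("TTFT:", ttft(trace))
print("TPOT:", tpot(trace))
print("E2EL:", e2el(trace))
```

A target such as P90 TTFT < 1 s would then be checked by collecting ttft(trace) over many requests and comparing percentile(values, 90) against 1.0.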
Throughput
- Requests Per Second (RPS), also called Queries Per Second (QPS)
- Tokens Per Second (TPS)
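
A rough sketch of the throughput metrics above, measured over a fixed wall-clock window. The function name and arguments are hypothetical. It assumes TPS counts output tokens only; some setups count input plus output tokens instead.

```python
from typing import List, Tuple


def throughput(output_token_counts: List[int], window_s: float) -> Tuple[float, float]:
    """Return (RPS, TPS) for the requests completed inside one window.

    output_token_counts: OSL of each completed request (assumed output-only).
    window_s: length of the measurement window in seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    rps = len(output_token_counts) / window_s
    tps = sum(output_token_counts) / window_s
    return rps, tps


# Example: three requests finish within a 60-second window.
rps, tps = throughput([100, 200, 300], window_s=60.0)
print("RPS:", rps)
print("TPS:", tps)
```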
KV cache
- GPU KV cache usage
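
One way to read GPU KV cache usage is as the fraction of cache blocks currently occupied. The sketch below assumes a block-based (paged) KV cache with a fixed number of token slots per block; the function names, the block size of 16, and the block counts are illustrative assumptions, not values from any specific serving engine.

```python
import math
from typing import List


def blocks_needed(num_tokens: int, block_size: int = 16) -> int:
    """Blocks one sequence occupies; a partially filled block still counts."""
    return math.ceil(num_tokens / block_size)


def kv_cache_usage(seq_lens: List[int], total_blocks: int, block_size: int = 16) -> float:
    """Fraction of KV cache blocks in use, between 0.0 and 1.0.

    seq_lens: current length (input + generated tokens) of each running sequence.
    total_blocks: number of KV cache blocks allocated on the GPU (assumed known).
    """
    used = sum(blocks_needed(n, block_size) for n in seq_lens)
    return used / total_blocks


# Example: two running sequences of 100 and 33 tokens, 40 blocks in total.
print("KV cache usage:", kv_cache_usage([100, 33], total_blocks=40))
```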