Metrics for LLM inference
Token
- Input Sequence Length (ISL)
- Output Sequence Length (OSL)
Latency
- Time To First Token (TTFT)
  - Also called First Token Latency (FTL)
  - e.g. P90 TTFT < 1 s
- Normalized Time Per Output Token (NTPOT)
- Time Per Output Token (TPOT)
  - Defined as (E2EL - TTFT) / (OSL - 1).
  - e.g. P90 TPOT < 200 ms
- Inter-Token Latency (ITL)
  - The latency between consecutive output tokens.
  - Also called Token-to-Token Latency (TTL) or Time Between Tokens (TBT)
- End-to-End Latency (E2EL)
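
The latency metrics above can be derived from per-token timestamps. The following is a minimal sketch, assuming each request is logged as a send time plus the arrival time of every output token; `RequestTrace`, its fields, and the nearest-rank `percentile` helper are hypothetical names chosen for illustration, not part of any particular benchmarking tool.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request log; all times are in seconds."""
    sent_at: float            # when the request was sent
    token_times: List[float]  # arrival time of each output token, in order


def ttft(trace: RequestTrace) -> float:
    """Time To First Token: delay from sending the request to the first output token."""
    return trace.token_times[0] - trace.sent_at


def e2el(trace: RequestTrace) -> float:
    """End-to-End Latency: delay from sending the request to the last output token."""
    return trace.token_times[-1] - trace.sent_at


def tpot(trace: RequestTrace) -> float:
    """Time Per Output Token: (E2EL - TTFT) / (OSL - 1)."""
    osl = len(trace.token_times)
    if osl < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (e2el(trace) - ttft(trace)) / (osl - 1)


def itl(trace: RequestTrace) -> List[float]:
    """Inter-Token Latency: gaps between consecutive output tokens."""
    times = trace.token_times
    return [later - earlier for earlier, later in zip(times, times[1:])]


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile (an assumed convention; tools may interpolate instead)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

Under these definitions, TPOT for a single request equals the mean of its ITL values. A target such as P90 TTFT < 1 s would be checked with `percentile([ttft(t) for t in traces], 90) < 1.0`.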
Throughput
- Requests Per Second (RPS)
- Tokens Per Second (TPS)
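
Both throughput metrics are counts divided by a measurement window. A minimal sketch, assuming TPS counts output tokens only (some tools also count input tokens) and that only requests completed within the window are included; the `throughput` function and its parameters are hypothetical.

```python
from typing import List, Tuple


def throughput(osl_per_request: List[int], window_seconds: float) -> Tuple[float, float]:
    """Return (RPS, TPS) for requests completed within the measurement window.

    osl_per_request: output token count (OSL) of each completed request.
    window_seconds:  length of the measurement window in seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    rps = len(osl_per_request) / window_seconds  # Requests Per Second
    tps = sum(osl_per_request) / window_seconds  # Tokens Per Second (output tokens only)
    return rps, tps
```

For example, `throughput([100, 200, 300], 10.0)` covers three requests and 600 output tokens over a 10 s window.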
KV cache