
AMD Kubernetes Device Plugin

amd.com/gpu resource

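  • The AMD GPU driver must be installed on the node.
  • The amd-device-plugin must be installed for the amd.com/gpu resource to appear in the node's capacity.
  • A Pod that requests the amd.com/gpu resource is scheduled onto a node with an available GPU and runs as a GPU-enabled container.
    • Inside the container, GPU information can be checked with the amd-smi command.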
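
As a rough illustration of the last two points, the sketch below writes a minimal Pod manifest that requests one amd.com/gpu and runs amd-smi. The Pod name amd-smi-check, the file name amd-smi-pod.yaml, and the image rocm/dev-ubuntu-22.04 are assumptions chosen for this example; any image that ships amd-smi should serve the same purpose.

```shell
# Write a minimal Pod manifest that requests one AMD GPU.
# Assumptions: the pod name, file name, and image are placeholders for illustration.
cat > amd-smi-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: amd-smi-check
spec:
  restartPolicy: Never
  containers:
    - name: amd-smi
      image: rocm/dev-ubuntu-22.04   # assumed image; it must contain amd-smi
      command: ["amd-smi", "list"]
      resources:
        limits:
          amd.com/gpu: 1             # extended resource exposed by the device plugin
EOF
```

The manifest can then be applied with kubectl apply -f amd-smi-pod.yaml, and the amd-smi output read with kubectl logs amd-smi-check once the Pod has completed.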

Host preparation

GPU driver installation

# Register the AMD repository signing key.
sudo mkdir -p --mode=0755 /etc/apt/keyrings
curl -sL https://repo.radeon.com/rocm/rocm.gpg.key \
| gpg --dearmor \
| sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
# Add the amdgpu driver repository for the chosen ROCm release (Ubuntu 22.04 "jammy").
export ROCM_VERSION=6.4
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu jammy main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
# Install the kernel driver, then reboot so the new module is loaded.
sudo apt update
sudo apt install amdgpu-dkms
sudo reboot
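
After the reboot, it can help to confirm that the driver has exposed its device nodes before installing the plugin. The sketch below assumes the usual ROCm device paths (/dev/kfd and /dev/dri/renderD*); the function name check_amdgpu_devices and its optional root-prefix argument are made up for this example.

```shell
# Report whether the device nodes used by ROCm are present.
# Hypothetical helper: the optional first argument is a root prefix
# (default: empty, meaning the real /dev).
check_amdgpu_devices() {
  root="${1:-}"
  status=0
  if [ -e "${root}/dev/kfd" ]; then
    echo "found ${root}/dev/kfd"
  else
    echo "missing ${root}/dev/kfd"
    status=1
  fi
  # At least one render node (renderD128, renderD129, ...) should exist.
  found_render=0
  for node in "${root}"/dev/dri/renderD*; do
    if [ -e "${node}" ]; then
      echo "found ${node}"
      found_render=1
    fi
  done
  if [ "${found_render}" -eq 0 ]; then
    echo "missing ${root}/dev/dri/renderD*"
    status=1
  fi
  return "${status}"
}
```

Calling check_amdgpu_devices with no argument inspects the host itself; a non-zero exit status suggests the driver is not loaded yet.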

IOMMU

Installation

helm repo add amd-gpu https://rocm.github.io/k8s-device-plugin/
helm repo update amd-gpu \
&& helm search repo amd-gpu/amd-gpu -l | head -n 10
helm pull amd-gpu/amd-gpu --version 0.14.0
helm show values amd-gpu/amd-gpu --version 0.14.0 > amd-gpu-0.14.0.yaml
amd-gpu-values.yaml
labeller:
  enabled: true
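# Render the manifests to a file for review before installing.
helm template amd-gpu amd-gpu/amd-gpu \
--version 0.14.0 \
-n kube-system \
-f amd-gpu-values.yaml \
> amd-gpu.yaml
# Install (or upgrade) the release from the chart archive fetched by helm pull.
helm upgrade amd-gpu amd-gpu-0.14.0.tgz \
--install \
--history-max 5 \
-n kube-system \
-f amd-gpu-values.yaml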
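
Once the release is installed, the plugin's effect can be checked on the node objects. This is a minimal sketch, assuming kubectl is already configured for the cluster; the function name show_gpu_capacity is hypothetical.

```shell
# Print each node name with the amd.com/gpu capacity it advertises.
# The dot in the resource name is escaped so it is read as a single key.
show_gpu_capacity() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPU:.status.capacity.amd\.com/gpu'
}
```

Nodes where the plugin is not running, or where no GPU was detected, should show <none> in the GPU column.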

ROCm container

FROM ubuntu:22.04

ARG ROCM_VERSION=6.4

# /bin/sh in ubuntu images is dash, whose echo does not support -e; printf is portable.
RUN printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600\n' \
| tee /etc/apt/preferences.d/rocm-pin-600

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates \
curl \
libnuma-dev \
gnupg \
&& rm -rf /var/lib/apt/lists/*

# rocm = rocm-dev ∪ rocm-libs
RUN mkdir -p --mode=0755 /etc/apt/keyrings \
&& curl -sL https://repo.radeon.com/rocm/rocm.gpg.key \
| gpg --dearmor \
| tee /etc/apt/keyrings/rocm.gpg > /dev/null \
&& echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu jammy main" \
| tee /etc/apt/sources.list.d/amdgpu.list \
&& echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/${ROCM_VERSION} jammy main" \
| tee /etc/apt/sources.list.d/rocm.list \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
rocm-dev \
&& rm -rf /var/lib/apt/lists/*

RUN groupadd -g 109 render