This article describes how to deploy a TensorFlow Serving inference service and specify its queue and GPU resources.
Prerequisites
- You have installed the CCE GPU Manager and CCE AI Job Scheduler components; without them, the cloud-native AI features are unavailable.
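To sanity-check the setup before deploying, you can inspect a GPU node's extended resources. The node name and values below are illustrative; the exact resource names depend on your card type (see the table at the end of this article):

# Replace 192.168.48.8 with the name of one of your GPU nodes.
kubectl describe node 192.168.48.8 | grep cgpu
# The output should list the shared GPU resources registered by the
# components, e.g. (values are illustrative for a single V100-32GB):
#   baidu.com/v100_32g_cgpu:          1
#   baidu.com/v100_32g_cgpu_core:     100
#   baidu.com/v100_32g_cgpu_memory:   32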
Example Procedure
This example uses TensorFlow Serving to demonstrate how to deploy an inference service with a Deployment.
- Deploy the TensorFlow Serving inference service
    - Specify the default queue: scheduling.volcano.sh/queue-name: default
    - Request 50% of the compute power of one GPU card and 10 GB of GPU memory
    - Set the scheduler to volcano (required)
The reference YAML is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-demo
  template:
    metadata:
      annotations:
        scheduling.volcano.sh/queue-name: default
      labels:
        app: gpu-demo
    spec:
      containers:
        - image: registry.baidubce.com/cce-public/tensorflow-serving:demo-gpu
          imagePullPolicy: Always
          name: gpu-demo
          env:
            - name: MODEL_NAME
              value: half_plus_two
          ports:
            - containerPort: 8501
          resources:
            limits:
              cpu: "2"
              memory: 2Gi
              baidu.com/v100_32g_cgpu: "1"
              baidu.com/v100_32g_cgpu_core: "50"
              baidu.com/v100_32g_cgpu_memory: "10"
            requests:
              cpu: "2"
              memory: 2Gi
              baidu.com/v100_32g_cgpu: "1"
              baidu.com/v100_32g_cgpu_core: "50"
              baidu.com/v100_32g_cgpu_memory: "10"
          # If GPU core isolation is enabled, set the following preStop hook for graceful shutdown.
          # `tf_serving_entrypoint.sh` needs to be replaced with the name of your GPU process.
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "kill -10 `ps -ef | grep tf_serving_entrypoint.sh | grep -v grep | awk '{print $2}'` && sleep 1"]
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: volcano
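Save the manifest to a file (the name gpu-demo.yaml below is arbitrary) and apply it:

kubectl apply -f gpu-demo.yaml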
- Run the following commands to check the workload status
kubectl get deployments
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
gpu-demo   1/1     1            1           30s

kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
gpu-demo-65767d67cc-xhdgg   1/1     Running   0          63s   172.23.1.86   192.168.48.8   <none>           <none>
- Verify that the TensorFlow inference service is available
# Replace <172.23.1.86> with the actual pod IP
curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://172.23.1.86:8501/v1/models/half_plus_two:predict
# The output is similar to the following:
{
"predictions": [2.5, 3.0, 4.5]
}
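The pod IP changes whenever the pod is rescheduled. For a stable in-cluster endpoint, you can put a Service in front of the Deployment. This is a minimal sketch, not part of the original example; the Service name is arbitrary:

apiVersion: v1
kind: Service
metadata:
  name: gpu-demo
  namespace: default
spec:
  selector:
    app: gpu-demo        # matches the Deployment's pod label
  ports:
    - port: 8501
      targetPort: 8501

Within the cluster, the model is then reachable at http://gpu-demo.default.svc:8501/v1/models/half_plus_two:predict.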
Queue Usage
You can specify the queue through an annotation:
annotations:
scheduling.volcano.sh/queue-name: <queue name>
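The queue itself is a Volcano custom resource. Assuming the standard Volcano Queue CRD is available in your cluster, you can list existing queues and define a new one as sketched below; the queue name and weight are illustrative:

kubectl get queues

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: inference-queue   # illustrative name; reference it in the annotation above
spec:
  weight: 1               # relative share of cluster resources among queues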
Resource Request Notes
Example: exclusive single card:
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1 # 1 card
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1 # limits must be identical to requests
    cpu: "4"
    memory: 6Gi
Example: exclusive multiple cards:
resources:
  requests:
    baidu.com/v100_32g_cgpu: 2 # 2 cards
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 2 # limits must be identical to requests
    cpu: "4"
    memory: 6Gi
Example: shared single card (GPU memory isolation only, no compute isolation):
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1 # 1 card
    baidu.com/v100_32g_cgpu_memory: 10 # 10 GB of GPU memory
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1 # limits must be identical to requests
    baidu.com/v100_32g_cgpu_memory: 10
    cpu: "4"
    memory: 6Gi
Example: shared single card (both GPU memory isolation and compute isolation):
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1 # 1 card
    baidu.com/v100_32g_cgpu_core: 50 # 50%, i.e. 0.5 card of compute power
    baidu.com/v100_32g_cgpu_memory: 10 # 10 GB of GPU memory
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1 # limits must be identical to requests
    baidu.com/v100_32g_cgpu_core: 50
    baidu.com/v100_32g_cgpu_memory: 10
    cpu: "4"
    memory: 6Gi
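To confirm that the limits are actually in effect, you can inspect the GPU view from inside the container. This sketch assumes the isolation layer presents the capped values through nvidia-smi; the pod name is illustrative:

# Use the pod name from `kubectl get pod`.
kubectl exec -it gpu-demo-65767d67cc-xhdgg -- nvidia-smi
# With GPU memory isolation in effect, the total memory reported should be
# close to the 10 GB requested via baidu.com/v100_32g_cgpu_memory rather
# than the card's full 32 GB.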
Mapping Between GPU Card Types and Resource Names
The following GPU types currently support sharing and isolation of GPU memory and compute power:
| GPU card model | Resource name |
| --- | --- |
| Tesla V100-SXM2-16GB | baidu.com/v100_16g_cgpu |
| Tesla V100-SXM2-32GB | baidu.com/v100_32g_cgpu |
| Tesla T4 | baidu.com/t4_16g_cgpu |
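To target a different card type, swap the resource name accordingly. The sketch below requests a shared T4; it assumes the _core/_memory suffixes follow the same pattern as the v100_32g resources shown above, which you should verify against the resources registered on your nodes:

resources:
  requests:
    baidu.com/t4_16g_cgpu: 1          # 1 card
    baidu.com/t4_16g_cgpu_memory: 8   # 8 GB of the T4's 16 GB (assumed naming)
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/t4_16g_cgpu: 1          # limits must be identical to requests
    baidu.com/t4_16g_cgpu_memory: 8
    cpu: "4"
    memory: 6Gi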