NVIDIA Triton Inference Server: A Few Sample Models

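NVIDIA Triton Inference Server is a high-performance tool for serving AI models. For beginners, and for engineers who want to validate a deployment environment quickly, getting the official sample models running is a useful first step. This article walks through fetching the sample model repository, then deploying and querying two models: the basic identity model (for environment validation) and densenet_onnx (for real deep learning inference).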

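Contents

1. Prerequisites and environment setup
- 1.1 Pull the Triton Server image
- 1.2 Get the sample model repository
2. Start Triton Server and load the sample models
3. Verify the identity model (simple_tf)
4. Verify the DenseNet image classification model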

1. Prerequisites and environment setup

Make sure Docker and the NVIDIA Container Toolkit are installed on the host so that it can run GPU-accelerated containers.
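A quick way to catch a missing prerequisite is to check that the relevant command-line tools are on PATH. The sketch below is only a rough first check: the tool names are assumptions about a typical setup, and finding `nvidia-smi` shows that a driver is installed, not that the NVIDIA Container Toolkit is configured for Docker.

```python
import shutil

# Assumed tool names for a typical GPU host; adjust for your environment.
REQUIRED_TOOLS = ("docker", "nvidia-smi")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH")
```

An empty result from `missing_tools()` means both tools were found. Whether containers can actually reach the GPU is only confirmed once the Triton container starts.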

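1.1 Pull the Triton Server image

This guide uses the 24.05 release of the official image from NGC; substitute a newer tag if one is available:

docker pull nvcr.io/nvidia/tritonserver:24.05-py3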

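1.2 Get the sample model repository

The official Triton GitHub repository contains test models and their configurations. Clone it, then run the fetch script to download the model weight files.

# Clone the Triton Server repository
git clone https://github.com/triton-inference-server/server.git
cd server

# Download the model weights (this may take a few minutes).
# The paths below follow the L0-level repository layout used in this article.
# In upstream checkouts the quick-start script is docs/examples/fetch_models.sh,
# which fills docs/examples/model_repository; adjust both paths if your
# checkout differs.
./qa/common/fetch_models.sh

# Path to the model repository, used later for the Docker volume mount
MODEL_REPO=$(pwd)/qa/L0_e2e/model_repository

# Check that the key models were downloaded, e.g. simple_tf and densenet_onnx
ls "${MODEL_REPO}/simple_tf"
ls "${MODEL_REPO}/densenet_onnx"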
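Beyond listing the directories, it can help to check that each model follows the layout Triton expects: a model directory that holds a `config.pbtxt` and at least one numerically named version directory containing the model file. The helper below is a hypothetical sketch of such a check, not part of Triton. It reports a missing `config.pbtxt` as a problem even though Triton can derive the configuration automatically for some backends.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of layout problems found in one model directory."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return [f"{model_dir.name}: directory not found"]

    problems = []
    if not (model_dir / "config.pbtxt").is_file():
        problems.append(f"{model_dir.name}: no config.pbtxt")

    # Version directories are named with integers, e.g. 1/
    versions = [p for p in model_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    if not versions:
        problems.append(f"{model_dir.name}: no numeric version directory")
    elif not any(any(v.iterdir()) for v in versions):
        problems.append(f"{model_dir.name}: version directories are empty")
    return problems
```

For example, `check_model_dir(Path(model_repo) / "densenet_onnx")` returns an empty list when the layout looks complete, where `model_repo` is the same path that was stored in `MODEL_REPO` above.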

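2. Start Triton Server and load the sample models

Use Docker to mount the local model repository at /models inside the container and start Triton. Port 8000 serves HTTP, port 8001 serves gRPC, and port 8002 exposes metrics.

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
    -v "${MODEL_REPO}:/models" \
    nvcr.io/nvidia/tritonserver:24.05-py3 tritonserver --model-repository=/models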

After a successful start, the log should contain lines similar to the following, indicating that the models were loaded:

I0610 08:30:15.123456 TRITON | ... successfully loaded 'simple_tf'
I0610 08:30:15.678901 TRITON | ... successfully loaded 'densenet_onnx'
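Instead of watching the log, readiness can also be polled over HTTP: the v2 inference protocol that Triton implements exposes `GET /v2/health/ready`, which answers with status 200 once the server is ready. The following is a minimal sketch using only the standard library. The function names, retry count, and delay are illustrative choices, and the probe is passed in as a callable so the retry loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request


def http_ready(base_url, timeout=2.0):
    """Return True if GET <base_url>/v2/health/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def wait_until_ready(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `probe()` until it returns True or `attempts` calls have been made."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the container from the command above running, `wait_until_ready(lambda: http_ready("http://localhost:8000"))` returns `True` once the server reports ready, or `False` after the attempts are used up.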

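3. Verify the identity model (simple_tf)

simple_tf is a simple TensorFlow model that accepts a 4-element integer array and returns it unchanged. It is commonly used to check quickly that the Triton service is working.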
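Before writing a client, it is worth confirming the tensor names, datatypes, and shapes that the server actually reports for a model. Triton returns these as model metadata, for example through `get_model_metadata(model_name)` on the `tritonclient.http` client, which yields a dict with `inputs` and `outputs` lists. The sketch below only formats such a dict. The `sample` value is hand-written for illustration and is not captured from a real server.

```python
def describe_tensors(metadata):
    """Format the inputs and outputs of a v2 model-metadata dict as text lines."""
    lines = []
    for kind in ("inputs", "outputs"):
        for tensor in metadata.get(kind, []):
            shape = "x".join(str(dim) for dim in tensor["shape"])
            lines.append(f"{kind[:-1]} {tensor['name']}: {tensor['datatype']} [{shape}]")
    return lines


# Hand-written sample in the metadata format; the values are illustrative only.
sample = {
    "name": "simple_tf",
    "inputs": [{"name": "INPUT0", "datatype": "INT32", "shape": [-1, 4]}],
    "outputs": [{"name": "OUTPUT0", "datatype": "INT32", "shape": [-1, 4]}],
}

for line in describe_tensors(sample):
    print(line)
```

Against a live server, passing the result of `client.get_model_metadata("simple_tf")` to `describe_tensors` shows the names to use in `InferInput` and `InferRequestedOutput`.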

First, install the Python client library:

pip install "tritonclient[http]"

Then run the following Python code to perform an inference:

import numpy as np
import tritonclient.http as httpclient

TRITON_SERVER_URL = "localhost:8000"
MODEL_NAME = "simple_tf"


def run_identity_inference():
    try:
        client = httpclient.InferenceServerClient(url=TRITON_SERVER_URL)
        # Creating the client does not by itself prove the server is reachable,
        # so probe it explicitly to surface connection problems here.
        if not client.is_server_live():
            print("Triton server is not live")
            return
    except Exception as e:
        print(f"Failed to connect to Triton: {e}")
        return

    # 1. Prepare the input data (4 int32 elements)
    input_data = np.array([[10, 20, 30, 40]], dtype=np.int32)

    # 2. Configure the input.
    # The input name INPUT0 is defined in model_repository/simple_tf/config.pbtxt
    triton_input = httpclient.InferInput("INPUT0", list(input_data.shape), "INT32")
    triton_input.set_data_from_numpy(input_data)

    # 3. Configure the output (the output name is OUTPUT0)
    triton_output = httpclient.InferRequestedOutput("OUTPUT0")

    print(f"Sending data: {input_data}")
    # 4. Send the inference request
    response = client.infer(
        model_name=MODEL_NAME,
        inputs=[triton_input],
        outputs=[triton_output],
    )

    # 5. Read the result and check that it matches the input
    output_data = response.as_numpy("OUTPUT0")
    print("\n--- Inference result ---")
    print(f"Received: {output_data}")
    assert np.array_equal(input_data, output_data)
    print("Verification succeeded: input and output match!")


if __name__ == "__main__":
    run_identity_inference()
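The same request can also be expressed directly in the JSON form of the v2 inference protocol, which is useful when testing with `curl` or from a language without a Triton client. The sketch below builds such a body for the identity request. `build_infer_body` is a hypothetical helper, and the tensor names are the ones assumed throughout this section. Note that `tritonclient` may encode tensor data in a binary form rather than as the JSON shown here.

```python
import json


def build_infer_body(input_name, values, output_name, datatype="INT32"):
    """Build a v2 inference request body for one batch of 1-D values."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(values)],
                "datatype": datatype,
                "data": list(values),
            }
        ],
        "outputs": [{"name": output_name}],
    }


body = build_infer_body("INPUT0", [10, 20, 30, 40], "OUTPUT0")
print(json.dumps(body, indent=2))
```

Posting this body to `http://localhost:8000/v2/models/simple_tf/infer` with a JSON content type should return a response whose `outputs` list carries the result tensor.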

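4. Verify the DenseNet image classification model

densenet_onnx is a real image classification model that shows how Triton serves deep learning models in ONNX format. The model expects a 3x224x224 floating-point tensor as input.

A random tensor can stand in for an image to verify that the model executes inference correctly.

import numpy as np
import tritonclient.http as httpclient

TRITON_SERVER_URL = "localhost:8000"
MODEL_NAME = "densenet_onnx"

# Tensor names as used in this article. They depend on the model's
# config.pbtxt: the densenet_onnx shipped under docs/examples upstream, for
# instance, names its tensors data_0 and fc6_1. Check the config for the names
# and for whether a leading batch dimension is expected.
INPUT_NAME = "input.1"
OUTPUT_NAME = "146"


def run_densenet_inference():
    try:
        client = httpclient.InferenceServerClient(url=TRITON_SERVER_URL)
        # Probe the server so that connection problems surface here
        if not client.is_server_live():
            print("Triton server is not live")
            return
    except Exception as e:
        print(f"Failed to connect to Triton: {e}")
        return

    # DenseNet input: (1, 3, 224, 224), dtype=FP32
    input_shape = (1, 3, 224, 224)

    # Random input data, standing in for a preprocessed image
    input_data = np.random.rand(*input_shape).astype(np.float32)

    # Configure the input
    triton_input = httpclient.InferInput(INPUT_NAME, list(input_data.shape), "FP32")
    triton_input.set_data_from_numpy(input_data)

    # Configure the output
    triton_output = httpclient.InferRequestedOutput(OUTPUT_NAME)

    print(f"Sending random image data {input_shape}...")
    # Send the inference request
    response = client.infer(
        model_name=MODEL_NAME,
        inputs=[triton_input],
        outputs=[triton_output],
    )

    # Read the result
    output_data = response.as_numpy(OUTPUT_NAME)

    print("\n--- DenseNet inference result ---")
    print(f"Output shape: {output_data.shape}")
    print(f"Index of the highest-scoring class: {np.argmax(output_data)}")
    print("DenseNet inference verified successfully!")


if __name__ == "__main__":
    run_densenet_inference()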
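With a random input the predicted class is meaningless, but the same post-processing applies once a real image is sent. The sketch below ranks the scores and converts them to probabilities with a softmax. It assumes the model output is a flat vector of unnormalized scores, which should be checked for the model in use, and `top_k` is a hypothetical helper written with the standard library only.

```python
import math


def top_k(scores, k=5):
    """Return the k (index, probability) pairs with the highest softmax probability."""
    scores = list(scores)
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]
```

For the script above, `top_k(output_data.ravel().tolist(), k=3)` gives the three highest-scoring class indices with their probabilities. Mapping indices to class names requires the label file that belongs to the model.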

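By running these samples, you have verified the Triton Server environment (GPU/CPU and network connectivity) and learned how to use tritonclient to interact with models from different frameworks (TensorFlow and ONNX).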
