Troubleshooting notes: installing the GPU build of TensorFlow Serving on CentOS

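I had always deployed TensorFlow Serving from the Docker image, which is quick and simple. Recently, though, I needed to run a batch of models as fast as possible on a physical GPU machine, and I was worried that the Docker approach would not make full use of the GPUs (running several containers might well be enough). So I decided to build TensorFlow Serving directly on the machine. The process itself is fairly simple, but most tutorials online target one specific operating system version, so when something goes wrong it takes real effort to find a fix. I am recording what I ran into here in the hope that it helps someone else.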

My environment: CentOS 7.2, with four Tesla M40 cards.
Without further ado, start by cloning the repository:

git clone --recurse-submodules https://github.com/tensorflow/serving

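The catch is that a plain git clone like this checks out the latest master branch from GitHub. In my experience master is often broken, and there is little material online about it, so I recommend checking out a widely used stable version such as 1.3 instead: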

git clone -b r1.3 --recurse-submodules https://github.com/tensorflow/serving

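Next, go into the serving/tensorflow directory and run ./configure to set up the build. It walks you through a series of interactive questions; answer them according to your own setup, but make sure to answer yes to CUDA support and to give the correct CUDA and cuDNN directories.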
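Before answering the configure prompts, it can help to confirm that the directories you are about to type in really contain the toolkit. The sketch below is a hypothetical helper (check_cuda_layout is not part of TensorFlow). It assumes nvcc lives under bin/ in the CUDA directory and cudnn.h under include/ in the cuDNN directory, which is the usual layout but may differ on your machine.

```shell
# Hypothetical helper, not part of TensorFlow or TensorFlow Serving.
# Usage: check_cuda_layout <cuda_dir> <cudnn_dir>
# Prints what is missing and returns non-zero if anything is.
check_cuda_layout() {
    cuda_dir="$1"
    cudnn_dir="$2"
    rc=0
    # Assumption: the CUDA compiler sits at <cuda_dir>/bin/nvcc
    if [ ! -x "$cuda_dir/bin/nvcc" ]; then
        echo "missing nvcc under $cuda_dir/bin"
        rc=1
    fi
    # Assumption: the cuDNN header sits at <cudnn_dir>/include/cudnn.h
    if [ ! -f "$cudnn_dir/include/cudnn.h" ]; then
        echo "missing cudnn.h under $cudnn_dir/include"
        rc=1
    fi
    return "$rc"
}
```

If cuDNN was unpacked into the CUDA directory, a call might look like check_cuda_layout /usr/local/cuda /usr/local/cuda.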
Then install Bazel, which has a pitfall of its own: with Bazel 0.6.0 or later you are very likely to hit this error:

The set constructor for depsets is deprecated and will be removed. Please use the depset constructor instead. You can temporarily enable the deprecated set constructor by passing the flag --incompatible_disallow_set_constructor=false.

If you see this error, reinstall Bazel 0.5.0 or 0.5.4. Versions that are too old are not supported either.
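The version constraint can be checked before starting a long build. This is a minimal sketch, assuming the usable range is Bazel 0.5.x (the lower bound is my assumption, since the text only says that versions that are too old fail). bazel_version_ok is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only for Bazel 0.5.x.
# Assumption: the lower bound; "too old" is not pinned down above.
bazel_version_ok() {
    major=$(echo "$1" | cut -d. -f1)
    minor=$(echo "$1" | cut -d. -f2)
    [ "$major" = "0" ] && [ "$minor" = "5" ]
}

# Only runs when bazel is installed. "Build label" is the line of
# `bazel version` output that carries the release number.
if command -v bazel >/dev/null 2>&1; then
    installed=$(bazel version 2>/dev/null | awk '/Build label/ {print $3}')
    if bazel_version_ok "$installed"; then
        echo "bazel $installed looks usable for the r1.3 branch"
    else
        echo "bazel $installed is likely to fail; consider 0.5.4"
    fi
fi
```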
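Once Bazel works, go back to the serving directory and get ready to build. Note that if you build with the command from the official docs, the resulting model server will not use the GPU even though you enabled CUDA during configure. You have to request CUDA explicitly at build time: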

bazel build -c opt --config=cuda tensorflow_serving/...
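One way to check that --config=cuda took effect is to look at the shared libraries the resulting binary links against. This is only a sketch under an assumption: that a GPU build links dynamically against CUDA libraries such as libcudart, which as far as I know is the usual case but is not guaranteed for every version. links_cuda is a hypothetical helper that reads ldd output on stdin.

```shell
# Hypothetical helper: reads `ldd` output on stdin and succeeds if a
# CUDA library (libcudart, libcudnn or libcublas) is among the
# dependencies.
links_cuda() {
    grep -q -E 'libcudart|libcudnn|libcublas'
}

# Path of the model server binary produced by the build.
server=bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
if [ -x "$server" ]; then
    if ldd "$server" | links_cuda; then
        echo "model server is linked against CUDA"
    else
        echo "no CUDA libraries found; was --config=cuda passed?"
    fi
fi
```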

This command is quite likely to fail.

First, if you get this error:

no such target '@org_tensorflow//third_party/gpus/crosstool:crosstool'

Fix:
– Edit tools/bazel.rc and change @org_tensorflow//third_party/gpus/crosstool to @local_config_cuda//crosstool:toolchain
– Run bazel clean --expunge && export TF_NEED_CUDA=1
– Run bazel query 'kind(rule, @local_config_cuda//...)'
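The edit to tools/bazel.rc in the first step can also be scripted. The following is a minimal sketch, not the only way to do it; fix_crosstool is a hypothetical helper that performs exactly the substitution described above and keeps a backup of the original file.

```shell
# Hypothetical helper: rewrites the crosstool label in a bazel.rc file,
# keeping a .bak copy of the original next to it.
fix_crosstool() {
    rc_file="$1"
    cp "$rc_file" "$rc_file.bak" || return 1
    sed 's|@org_tensorflow//third_party/gpus/crosstool|@local_config_cuda//crosstool:toolchain|' \
        "$rc_file.bak" > "$rc_file"
}
```

From the serving directory it would be called as fix_crosstool tools/bazel.rc, followed by the bazel clean and bazel query steps.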

Moving on, if you get this error:

bazel GPU build error with fatal error: external/nccl_archive/src/nccl.h: No such file or directory

Fix:

git clone https://github.com/NVIDIA/nccl.git
cd nccl/
make CUDA_HOME=/usr/local/cuda

sudo make install
sudo mkdir -p /usr/local/include/external/nccl_archive/src
sudo ln -s /usr/local/include/nccl.h /usr/local/include/external/nccl_archive/src/nccl.h
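If the symlink step has to be repeated (for example after reinstalling NCCL), a re-runnable variant may be handy. This is a sketch only: link_nccl_header is a hypothetical helper, and it assumes make install placed nccl.h under the include directory of the given prefix, as the commands above do for /usr/local.

```shell
# Hypothetical helper: links <prefix>/include/nccl.h into the path the
# build looks for, and refuses to create a dangling link.
link_nccl_header() {
    prefix="$1"
    header="$prefix/include/nccl.h"
    target_dir="$prefix/include/external/nccl_archive/src"
    if [ ! -f "$header" ]; then
        echo "nccl.h not found at $header; did make install succeed?"
        return 1
    fi
    mkdir -p "$target_dir" || return 1
    # -f replaces an existing link, so the function can be re-run safely
    ln -sf "$header" "$target_dir/nccl.h"
}
```

Writing under /usr/local needs root, so it would be run from a root shell as link_nccl_header /usr/local.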

After that, the build should go through without further trouble.
