Communicating with NVIDIA Triton Inference Server over gRPC
Hello, this is yeTi.
Today I'm going to try communicating with NVIDIA Triton Inference Server over gRPC.
Overview
gRPC is an RPC (Remote Procedure Call) framework developed by Google.
NVIDIA Triton Inference Server provides a Client SDK that makes gRPC communication straightforward.
Obtaining the Client SDK
Build Using CMake - Fail
The Triton Inference Server code is shared on GitHub, so the first approach is to build the SDK directly from source. Clone the repository (git clone https://github.com/NVIDIA/triton-inference-server.git) and run make.
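For reference, the configure step runs CMake from the repository's build directory (the paths in the error below reflect that layout); roughly cd triton-inference-server/build, cmake ., then make, though the exact invocation and targets vary by release.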
While building, I ran into the following error saying OpenSSL could not be found:
CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
  OPENSSL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.10/Modules/FindOpenSSL.cmake:390 (find_package_handle_standard_args)
  cmake/ssl.cmake:45 (find_package)
  CMakeLists.txt:138 (include)
-- Configuring incomplete, errors occurred!
See also "/home/hshwang/triton-library/triton-inference-server/build/grpc/src/grpc-build/CMakeFiles/CMakeOutput.log".
See also "/home/hshwang/triton-library/triton-inference-server/build/grpc/src/grpc-build/CMakeFiles/CMakeError.log".
CMakeFiles/grpc.dir/build.make:108: recipe for target 'grpc/src/grpc-stamp/grpc-configure' failed
make[2]: *** [grpc/src/grpc-stamp/grpc-configure] Error 1
CMakeFiles/Makefile2:222: recipe for target 'CMakeFiles/grpc.dir/all' failed
make[1]: *** [CMakeFiles/grpc.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
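In hindsight, this error typically means the OpenSSL development headers are missing; on Ubuntu they are provided by the libssl-dev package (sudo apt-get install libssl-dev). I did not pursue this route further and tried the prebuilt SDK next.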
Download From GitHub - Fail
The second approach is to download the SDK from the Triton Inference Server release page.
Download the SDK (wget https://github.com/NVIDIA/triton-inference-server/releases/download/v1.13.0/v1.13.0_ubuntu1804.clients.tar.gz).
Extract the archive (tar xvf v1.13.0_ubuntu1804.clients.tar.gz).
Install the client from the whl file (pip3 install tritongrpcclient-1.13.0-py3-none-linux_x86_64.whl).
During installation, I got an error saying tritonclientutils could not be found:
Collecting tritonclientutils (from tritongrpcclient==1.13.0)
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/tritonclientutils/
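The cause is that tritongrpcclient declares a dependency on tritonclientutils, which is not published on PyPI (hence the 404). If the extracted archive also ships a tritonclientutils wheel — I have not verified this for v1.13.0 — pointing pip at the extraction directory should resolve the dependency locally (pip3 install --find-links <extracted-dir> tritongrpcclient-1.13.0-py3-none-linux_x86_64.whl). I moved on to the Docker image instead.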
Download Docker Image From NGC - Success
The approach that finally worked is to use the Docker image provided by NVIDIA GPU Cloud (NGC).
Pull and run the Docker image (docker run -it --rm nvcr.io/nvidia/tritonserver:20.03.1-py3-clientsdk /bin/bash).
Inside the container, checking for tritongrpcclient confirms that it is present:
# python
>>> import tritongrpcclient
>>> print(tritongrpcclient.grpc_service_v2_pb2)
You can then copy the SDK files to the host to extract the SDK modules:
docker cp 792a245e866b:/usr/local/lib/python3.6/dist-packages/tritongrpcclient ./tritongrpcclient
docker cp 792a245e866b:/usr/local/lib/python3.6/dist-packages/tritonclientutils ./tritonclientutils
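Note that the imports in the next section read from triton.tritongrpcclient, so the copied packages are assumed to live under a local triton/ package directory; adjust the path (or PYTHONPATH) to wherever you place them.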
Using the Client SDK
# Import Library
import argparse
import grpc
import numpy as np  # needed below for the raw input bytes and output parsing
from triton.tritongrpcclient import grpc_service_v2_pb2
from triton.tritongrpcclient import grpc_service_v2_pb2_grpc
# Create GRPCInferenceServiceStub
parser = argparse.ArgumentParser()
parser.add_argument('-v',
                    '--verbose',
                    action="store_true",
                    required=False,
                    default=False,
                    help='Enable verbose output')
parser.add_argument('-u',
                    '--url',
                    type=str,
                    required=False,
                    default='localhost:8001',
                    help='Inference server URL. Default is localhost:8001.')
FLAGS = parser.parse_args()
channel = grpc.insecure_channel(FLAGS.url)
grpc_stub = grpc_service_v2_pb2_grpc.GRPCInferenceServiceStub(channel)
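Before building the inference request, it can help to confirm the server is reachable. A minimal sketch using the ServerLive RPC defined in the same grpc_service_v2 proto:
# Check Server Liveness
live_request = grpc_service_v2_pb2.ServerLiveRequest()
live_response = grpc_stub.ServerLive(live_request)
print("Server live: {}".format(live_response.live))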
# Create ModelInferRequest
model_name = 'model_sk'
request = grpc_service_v2_pb2.ModelInferRequest()
request.model_name = model_name
request.model_version = ""
request.id = model_name+"-id-0"
# Set Input Tensors
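# NOTE: `inputs`, `height`, and `width` are used below but not defined in this
# post. A minimal sketch, assuming `inputs` is a preprocessed image batch
# matching the [1, 260, 260, 3] shape declared below, and height/width are the
# original image dimensions (placeholder values here):
height, width = 720, 1280
inputs = np.zeros((1, 260, 260, 3), dtype=np.float32)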
raw_data = inputs.flatten().tobytes()
raw_height = np.array([np.int64(height)]).tobytes()
raw_width = np.array([np.int64(width)]).tobytes()
input_data = grpc_service_v2_pb2.ModelInferRequest().InferInputTensor()
input_data.name = "data"
input_data.shape.extend([1, 260, 260, 3])
input_data_contents = grpc_service_v2_pb2.InferTensorContents()
input_data_contents.raw_contents = raw_data
input_data.contents.CopyFrom(input_data_contents)
input_height = grpc_service_v2_pb2.ModelInferRequest().InferInputTensor()
input_height.name = "height"
input_height.shape.extend([1, 1])
input_height_contents = grpc_service_v2_pb2.InferTensorContents()
input_height_contents.raw_contents = raw_height
input_height.contents.CopyFrom(input_height_contents)
input_width = grpc_service_v2_pb2.ModelInferRequest().InferInputTensor()
input_width.name = "width"
input_width.shape.extend([1, 1])
input_width_contents = grpc_service_v2_pb2.InferTensorContents()
input_width_contents.raw_contents = raw_width
input_width.contents.CopyFrom(input_width_contents)
request.inputs.extend([input_data, input_height, input_width])
output_bbox = grpc_service_v2_pb2.ModelInferRequest().InferRequestedOutputTensor()
output_bbox.name = "tf_op_layer_bboxes"
output_class_id = grpc_service_v2_pb2.ModelInferRequest().InferRequestedOutputTensor()
output_class_id.name = "tf_op_layer_class_id"
output_prob = grpc_service_v2_pb2.ModelInferRequest().InferRequestedOutputTensor()
output_prob.name = "tf_op_layer_prob"
request.outputs.extend([output_bbox, output_class_id, output_prob])
# Request Inference
response = grpc_stub.ModelInfer(request)
# Parse Response
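# NOTE: `get_array` is not defined in this post. A minimal sketch, assuming
# each output tensor returns its raw bytes in contents.raw_contents and its
# dimensions in the `shape` field:
def get_array(output, dtype):
    # Deserialize the raw bytes and restore the tensor's reported shape.
    array = np.frombuffer(output.contents.raw_contents, dtype=dtype)
    return array.reshape(tuple(output.shape))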
bboxes = get_array(response.outputs[0], np.int64)
class_ids = get_array(response.outputs[1], np.int64)
if len(response.outputs[2].contents.raw_contents) < 8:
    dtype = np.int32
else:
    dtype = np.float64  # np.float is a deprecated alias for the 8-byte float
confs = get_array(response.outputs[2], dtype)