
Communicating with NVIDIA Triton Inference Server over gRPC

yeTi 2020. 6. 23. 08:58

Hello, this is yeTi.
Today I'm going to try gRPC communication with NVIDIA Triton Inference Server.

Overview

gRPC is an RPC (Remote Procedure Call) framework developed by Google.

NVIDIA Triton Inference Server provides a Client SDK so that gRPC communication can be done conveniently.

Obtaining the Client SDK

Build Using CMake - Fail

The code is shared at Triton Inference Server - GitHub, and this method builds the SDK directly from that source.

Clone the repository (git clone https://github.com/NVIDIA/triton-inference-server.git) and run make.

In my case, the build failed with the following error saying OpenSSL could not be found.

CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
  OPENSSL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.10/Modules/FindOpenSSL.cmake:390 (find_package_handle_standard_args)
  cmake/ssl.cmake:45 (find_package)
  CMakeLists.txt:138 (include)


-- Configuring incomplete, errors occurred!
See also "/home/hshwang/triton-library/triton-inference-server/build/grpc/src/grpc-build/CMakeFiles/CMakeOutput.log".
See also "/home/hshwang/triton-library/triton-inference-server/build/grpc/src/grpc-build/CMakeFiles/CMakeError.log".
CMakeFiles/grpc.dir/build.make:108: recipe for target 'grpc/src/grpc-stamp/grpc-configure' failed
make[2]: *** [grpc/src/grpc-stamp/grpc-configure] Error 1
CMakeFiles/Makefile2:222: recipe for target 'CMakeFiles/grpc.dir/all' failed
make[1]: *** [CMakeFiles/grpc.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
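This error means CMake cannot find the OpenSSL development files. On Ubuntu, installing them (sudo apt-get install libssl-dev) and re-running the build should get past this point, but I did not pursue this route further and tried the alternatives below instead.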

Download From GitHub - Fail

This method downloads a prebuilt SDK from the Triton Inference Server Release Page.

Download the SDK. (wget https://github.com/NVIDIA/triton-inference-server/releases/download/v1.13.0/v1.13.0_ubuntu1804.clients.tar.gz)

Extract the archive. (tar xvf v1.13.0_ubuntu1804.clients.tar.gz)

Install the client from the whl file. (pip3 install tritongrpcclient-1.13.0-py3-none-linux_x86_64.whl)

During installation, I got an error that tritonclientutils could not be found.

Collecting tritonclientutils (from tritongrpcclient==1.13.0)
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/tritonclientutils/
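The install fails because pip tries to fetch the tritongrpcclient wheel's tritonclientutils dependency from PyPI, where it was not published at the time. The extracted archive should also include a tritonclientutils wheel, so installing that one first may avoid the PyPI lookup; instead, I moved on to the Docker image below.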

Download Docker Image From NGC - Success

This method uses a Docker image provided on NVIDIA GPU Cloud (NGC) that already contains the client.

Pull and run the Docker image. (docker run -it --rm nvcr.io/nvidia/tritonserver:20.03.1-py3-clientsdk /bin/bash)

Checking inside the Docker container, you can confirm that tritongrpcclient exists:

# python
>>> import tritongrpcclient
>>> print(tritongrpcclient.grpc_service_v2_pb2)

You can copy the SDK files to the host to extract the SDK modules.

docker cp 792a245e866b:/usr/local/lib/python3.6/dist-packages/tritongrpcclient ./tritongrpcclient
docker cp 792a245e866b:/usr/local/lib/python3.6/dist-packages/tritonclientutils ./tritonclientutils
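In the example below, I placed the two copied directories under a local package directory named triton, which is why the imports read from triton.tritongrpcclient import .... If you put them somewhere else, adjust the import paths (or PYTHONPATH) accordingly.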

Using the Client SDK

# Import Library
import argparse
import grpc
import numpy as np
from triton.tritongrpcclient import grpc_service_v2_pb2
from triton.tritongrpcclient import grpc_service_v2_pb2_grpc

# Create GRPCInferenceServiceStub
parser = argparse.ArgumentParser()
parser.add_argument('-v',
    '--verbose',
    action="store_true",
    required=False,
    default=False,
    help='Enable verbose output')
parser.add_argument('-u',
    '--url',
    type=str,
    required=False,
    default='localhost:8001',
    help='Inference server URL. Default is localhost:8001.')
FLAGS = parser.parse_args()
channel = grpc.insecure_channel(FLAGS.url)
grpc_stub = grpc_service_v2_pb2_grpc.GRPCInferenceServiceStub(channel)

# Create ModelInferRequest
model_name = 'model_sk'
request = grpc_service_v2_pb2.ModelInferRequest()
request.model_name = model_name
request.model_version = ""
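# An empty model_version lets the server choose a version
# according to its version policy (latest by default).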
request.id = model_name+"-id-0"

# Set input tensors
# NOTE: inputs, height and width come from your own preprocessing;
# as a placeholder, assume a single 260x260 RGB image:
inputs = np.zeros((1, 260, 260, 3), dtype=np.float32)
height, width = 260, 260
raw_data = inputs.flatten().tobytes()
raw_height = np.array([np.int64(height)]).tobytes()
raw_width = np.array([np.int64(width)]).tobytes()

input_data = grpc_service_v2_pb2.ModelInferRequest().InferInputTensor()
input_data.name = "data"
input_data.shape.extend([1, 260, 260, 3])
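# Depending on the server version, you may also need to set
# input_data.datatype (e.g. "FP32") on each input tensor.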

input_data_contents = grpc_service_v2_pb2.InferTensorContents()
input_data_contents.raw_contents = raw_data
input_data.contents.CopyFrom(input_data_contents)

input_height = grpc_service_v2_pb2.ModelInferRequest().InferInputTensor()
input_height.name = "height"
input_height.shape.extend([1, 1])

input_height_contents = grpc_service_v2_pb2.InferTensorContents()
input_height_contents.raw_contents = raw_height
input_height.contents.CopyFrom(input_height_contents)

input_width = grpc_service_v2_pb2.ModelInferRequest().InferInputTensor()
input_width.name = "width"
input_width.shape.extend([1, 1])

input_width_contents = grpc_service_v2_pb2.InferTensorContents()
input_width_contents.raw_contents = raw_width
input_width.contents.CopyFrom(input_width_contents)

request.inputs.extend([input_data, input_height, input_width])

output_bbox = grpc_service_v2_pb2.ModelInferRequest().InferRequestedOutputTensor()
output_bbox.name = "tf_op_layer_bboxes"
output_class_id = grpc_service_v2_pb2.ModelInferRequest().InferRequestedOutputTensor()
output_class_id.name = "tf_op_layer_class_id"
output_prob = grpc_service_v2_pb2.ModelInferRequest().InferRequestedOutputTensor()
output_prob.name = "tf_op_layer_prob"
request.outputs.extend([output_bbox, output_class_id, output_prob])

# Request Inference
response = grpc_stub.ModelInfer(request)

# Parse Response
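# NOTE: get_array is not part of the SDK. A minimal sketch of such a
# helper, assuming each output tensor's bytes arrive in contents.raw_contents:
def get_array(output, dtype):
    # Reinterpret the tensor's raw bytes as a flat numpy array of dtype.
    return np.frombuffer(output.contents.raw_contents, dtype=dtype)
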
bboxes = get_array(response.outputs[0], np.int64)
class_ids = get_array(response.outputs[1], np.int64)
if len(response.outputs[2].contents.raw_contents) < 8:
    dtype = np.int32
else:
    dtype = np.float64
confs = get_array(response.outputs[2], dtype)
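
For reference, each InferOutputTensor in the response also carries a shape field, so the flat arrays from the helper can be reshaped back to their original dimensions (e.g. arr.reshape(response.outputs[0].shape)).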