Monday, September 27, 2021

TensorRT custom ONNX model c++

This article describes how to run a custom ONNX model with TensorRT by modifying the TensorRT sample sampleOnnxMNIST.

Hardware - NVIDIA Jetson Xavier NX
Software - Jetpack 4.6 / TensorRT 8
TensorRT sample - /usr/src/tensorrt/samples/sampleOnnxMNIST
ONNX model - https://github.com/PINTO0309/PINTO_model_zoo/tree/main/081_MiDaS_v2

sampleOnnxMNIST recognizes handwritten digits from 0 to 9. I will make some changes to this sample to get it working with MiDasV2 depth inference.

1. MiDasV2 model

Get PINTO_model_zoo and download the MiDasV2 ONNX model:

git clone https://github.com/PINTO0309/PINTO_model_zoo.git
cd PINTO_model_zoo/081_MiDaS_v2
./download_256x256.sh
cd

After the download succeeds, the file PINTO_model_zoo/081_MiDaS_v2/saved_model/model_float32.onnx is our custom ONNX model.

Now we need to know the input and output dimensions of the model. The netron tool will help here:

pip install netron
export PATH=$PATH:${HOME}/.local/bin
netron PINTO_model_zoo/081_MiDaS_v2/saved_model/model_float32.onnx

netron renders all layers of the model in a browser. Open the URL localhost:8080 in a browser.


Now we know the input layer's name is inputs:0 and its dimensions are 1 x 256 x 256 x 3.
Since the model takes an image input, I assume the four dimensions mean batch x height x width x channel (NHWC).
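
For reference, in an NHWC layout the channel values of one pixel sit next to each other in memory, so the flat buffer index of element (n, h, w, c) follows the formula below. nhwcIndex is just an illustrative helper name; the processInput() change later in this post uses the same arithmetic inline.

// Flat index of element (n, h, w, c) in an NHWC tensor of shape N x H x W x C.
inline int nhwcIndex(int n, int h, int w, int c, int H, int W, int C)
{
    return ((n * H + h) * W + w) * C + c;
}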


Go to the bottom of the page: the output name is Identity:0 and its dimensions are 1 x 256 x 256. Since the model outputs a depth map, I assume the three dimensions mean batch x height x width.

That's all we need to know about the model.
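
If you would rather confirm these names and dimensions programmatically instead of through netron, a minimal sketch like the one below should do it (this assumes the TensorRT 8 and ONNX parser headers shipped with JetPack 4.6; error handling and cleanup are omitted). Link it against -lnvinfer -lnvonnxparser.

#include <iostream>
#include "NvInfer.h"
#include "NvOnnxParser.h"

// Minimal logger the TensorRT builder requires.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    auto builder = nvinfer1::createInferBuilder(gLogger);
    const auto flag = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = builder->createNetworkV2(flag);
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile("model_float32.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    // Print every input and output tensor with its dimensions.
    for (int i = 0; i < network->getNbInputs(); i++)
    {
        nvinfer1::ITensor* t = network->getInput(i);
        std::cout << "input  " << t->getName() << ":";
        for (int d = 0; d < t->getDimensions().nbDims; d++)
            std::cout << " " << t->getDimensions().d[d];
        std::cout << std::endl;
    }
    for (int i = 0; i < network->getNbOutputs(); i++)
    {
        nvinfer1::ITensor* t = network->getOutput(i);
        std::cout << "output " << t->getName() << ":";
        for (int d = 0; d < t->getDimensions().nbDims; d++)
            std::cout << " " << t->getDimensions().d[d];
        std::cout << std::endl;
    }
    return 0;
}

For this model it should report inputs:0 with 1 x 256 x 256 x 3 and Identity:0 with 1 x 256 x 256, matching what netron shows.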

2. Sample code

Become root, since the sample tree under /usr/src/tensorrt is owned by root:

sudo -s

Copy the ONNX model file to the TensorRT sample data folder:

mkdir /usr/src/tensorrt/data/midas
cp PINTO_model_zoo/081_MiDaS_v2/saved_model/model_float32.onnx /usr/src/tensorrt/data/midas/

Copy the source image:

cp PINTO_model_zoo/081_MiDaS_v2/openvino/midasv2_small_256x256/FP16/dog.jpg /usr/src/tensorrt/bin

Create a new sample from sampleOnnxMNIST:

cd /usr/src/tensorrt/samples
cp -a sampleOnnxMNIST sampleOnnxMiDasV2
cd sampleOnnxMiDasV2
mv sampleOnnxMNIST.cpp sampleOnnxMiDasV2.cpp

Modify the Makefile:

--- ../sampleOnnxMNIST/Makefile 2021-06-26 08:17:31.000000000 +0800
+++ Makefile 2021-09-27 17:10:13.212404761 +0800
@@ -1,6 +1,8 @@
-OUTNAME_RELEASE = sample_onnx_mnist
-OUTNAME_DEBUG   = sample_onnx_mnist_debug
+OUTNAME_RELEASE = sample_onnx_midasv2
+OUTNAME_DEBUG   = sample_onnx_midasv2_debug
 EXTRA_DIRECTORIES = ../common
 SAMPLE_DIR_NAME = $(shell basename $(dir $(abspath $(firstword $(MAKEFILE_LIST)))))
+COMMON_FLAGS = -I/usr/include/opencv4/opencv -I/usr/include/opencv4
+EXTRA_LIBS = -L/usr/lib/aarch64-linux-gnu/ -lopencv_dnn -lopencv_gapi -lopencv_highgui -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_video -lopencv_calib3d -lopencv_features2d -lopencv_flann -lopencv_videoio -lopencv_imgcodecs -lopencv_imgproc -lopencv_core
 MAKEFILE ?= ../Makefile.config
 include $(MAKEFILE)

Modify ../Makefile.config so that OpenCV is linked correctly:

$(OUTDIR)/$(OUTNAME_RELEASE) : $(OBJS) $(CUOBJS)
        $(ECHO) Linking: $@
-         $(AT)$(CC) -o $@ $(LFLAGS) -Wl,--start-group $(LIBS) $^ -Wl,--end-group
+        $(AT)$(CC) -o $@ $(LFLAGS) -Wl,--start-group $(LIBS) $^ -Wl,--end-group $(EXTRA_LIBS)

$(OUTDIR)/$(OUTNAME_DEBUG) : $(DOBJS) $(CUDOBJS)
        $(ECHO) Linking: $@
-        $(AT)$(CC) -o $@ $(LFLAGSD) -Wl,--start-group $(DLIBS) $^ -Wl,--end-group
+        $(AT)$(CC) -o $@ $(LFLAGSD) -Wl,--start-group $(DLIBS) $^ -Wl,--end-group $(EXTRA_LIBS)

The whole flow is: read dog.jpg as the input for depth inference, then display both dog.jpg and the resulting depth map on screen.

Source code of sampleOnnxMiDasV2.cpp
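
One caveat worth noting about the preprocessing in that source: cv::imread() returns pixels in BGR order, while MiDaS was trained on RGB input, and no normalization is applied beyond dividing by 255. If the depth map looks wrong on your own images, converting the channel order before packing the tensor is worth a try, for example:

// Hedged tweak: swap OpenCV's default BGR ordering to RGB before resizing.
cv::cvtColor(image, image, cv::COLOR_BGR2RGB);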

3. Build & run

make
cd ../../bin
./sample_onnx_midasv2




4. Diff from sampleOnnxMNIST.cpp

--- ../sampleOnnxMNIST/sampleOnnxMNIST.cpp 2021-06-26 08:17:31.000000000 +0800
+++ sampleOnnxMiDasV2.cpp 2021-09-27 16:49:44.045143887 +0800
@@ -15,11 +15,11 @@
  */
 
 //!
-//! sampleOnnxMNIST.cpp
-//! This file contains the implementation of the ONNX MNIST sample. It creates the network using
-//! the MNIST onnx model.
+//! sampleOnnxMiDasV2.cpp
+//! This file contains the implementation of the ONNX MiDasV2 sample. It creates the network using
+//! the MiDasV2 onnx model.
 //! It can be run with the following command line:
-//! Command: ./sample_onnx_mnist [-h or --help] [-d=/path/to/data/dir or --datadir=/path/to/data/dir]
+//! Command: ./sample_onnx_midasv2 [-h or --help] [-d=/path/to/data/dir or --datadir=/path/to/data/dir]
 //! [--useDLACore=<int>]
 //!
 
@@ -37,18 +37,21 @@
 #include <iostream>
 #include <sstream>
 
+#include <opencv2/opencv.hpp>
+
+
 using samplesCommon::SampleUniquePtr;
 
-const std::string gSampleName = "TensorRT.sample_onnx_mnist";
+const std::string gSampleName = "TensorRT.sample_onnx_midas";
 
-//! \brief  The SampleOnnxMNIST class implements the ONNX MNIST sample
+//! \brief  The SampleOnnxMiDasV2 class implements the ONNX MiDasV2 sample
 //!
 //! \details It creates the network using an ONNX model
 //!
-class SampleOnnxMNIST
+class SampleOnnxMiDasV2
 {
 public:
-    SampleOnnxMNIST(const samplesCommon::OnnxSampleParams& params)
+    SampleOnnxMiDasV2(const samplesCommon::OnnxSampleParams& params)
         : mParams(params)
         , mEngine(nullptr)
     {
@@ -74,7 +77,7 @@
     std::shared_ptr<nvinfer1::ICudaEngine> mEngine; //!< The TensorRT engine used to run the network
 
     //!
-    //! \brief Parses an ONNX model for MNIST and creates a TensorRT network
+    //! \brief Parses an ONNX model for MiDasV2 and creates a TensorRT network
     //!
     bool constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
         SampleUniquePtr<nvinfer1::INetworkDefinition>& network, SampleUniquePtr<nvinfer1::IBuilderConfig>& config,
@@ -83,23 +86,23 @@
     //!
     //! \brief Reads the input  and stores the result in a managed buffer
     //!
-    bool processInput(const samplesCommon::BufferManager& buffers);
+    bool processInput(const samplesCommon::BufferManager& buffers, cv::Mat & image);
 
     //!
     //! \brief Classifies digits and verify result
     //!
-    bool verifyOutput(const samplesCommon::BufferManager& buffers);
+    bool verifyOutput(const samplesCommon::BufferManager& buffers, cv::Mat & originImage);
 };
 
 //!
 //! \brief Creates the network, configures the builder and creates the network engine
 //!
-//! \details This function creates the Onnx MNIST network by parsing the Onnx model and builds
-//!          the engine that will be used to run MNIST (mEngine)
+//! \details This function creates the Onnx MiDasV2 network by parsing the Onnx model and builds
+//!          the engine that will be used to run MiDasV2 (mEngine)
 //!
 //! \return Returns true if the engine was created successfully and false otherwise
 //!
-bool SampleOnnxMNIST::build()
+bool SampleOnnxMiDasV2::build()
 {
     auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()));
     if (!builder)
@@ -162,24 +165,24 @@
 
     ASSERT(network->getNbInputs() == 1);
     mInputDims = network->getInput(0)->getDimensions();
-    ASSERT(mInputDims.nbDims == 4);
+    ASSERT(mInputDims.nbDims == 4); // Input is 1 x 256 x 256 x 3 
 
     ASSERT(network->getNbOutputs() == 1);
     mOutputDims = network->getOutput(0)->getDimensions();
-    ASSERT(mOutputDims.nbDims == 2);
+    ASSERT(mOutputDims.nbDims == 3); // Output is 1 x 256 x 256
 
     return true;
 }
 
 //!
-//! \brief Uses a ONNX parser to create the Onnx MNIST Network and marks the
+//! \brief Uses a ONNX parser to create the Onnx MiDasV2 Network and marks the
 //!        output layers
 //!
-//! \param network Pointer to the network that will be populated with the Onnx MNIST network
+//! \param network Pointer to the network that will be populated with the Onnx MiDasV2 network
 //!
 //! \param builder Pointer to the engine builder
 //!
-bool SampleOnnxMNIST::constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
+bool SampleOnnxMiDasV2::constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
     SampleUniquePtr<nvinfer1::INetworkDefinition>& network, SampleUniquePtr<nvinfer1::IBuilderConfig>& config,
     SampleUniquePtr<nvonnxparser::IParser>& parser)
 {
@@ -212,9 +215,9 @@
 //! \details This function is the main execution function of the sample. It allocates the buffer,
 //!          sets inputs and executes the engine.
 //!
-bool SampleOnnxMNIST::infer()
+bool SampleOnnxMiDasV2::infer()
 {
     // Create RAII buffer manager object
     samplesCommon::BufferManager buffers(mEngine);
 
     auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());
@@ -222,28 +225,29 @@
     {
         return false;
     }
-
+    cv::Mat image = cv::imread("dog.jpg");
+    if (image.empty())
+    {
+        printf("image is empty\n");
+        return false;
+    }
     // Read the input data into the managed buffers
     ASSERT(mParams.inputTensorNames.size() == 1);
-    if (!processInput(buffers))
+    if (!processInput(buffers, image))
     {
         return false;
     }
-
     // Memcpy from host input buffers to device input buffers
     buffers.copyInputToDevice();
-
     bool status = context->executeV2(buffers.getDeviceBindings().data());
     if (!status)
     {
         return false;
     }
-
     // Memcpy from device output buffers to host output buffers
     buffers.copyOutputToHost();
-
     // Verify results
-    if (!verifyOutput(buffers))
+    if (!verifyOutput(buffers, image))
     {
         return false;
     }
@@ -254,31 +258,30 @@
 //!
 //! \brief Reads the input and stores the result in a managed buffer
 //!
-bool SampleOnnxMNIST::processInput(const samplesCommon::BufferManager& buffers)
+bool SampleOnnxMiDasV2::processInput(const samplesCommon::BufferManager& buffers, cv::Mat & image)
 {
-    const int inputH = mInputDims.d[2];
-    const int inputW = mInputDims.d[3];
+    const int inputChannels = mInputDims.d[3];
+    const int inputH = mInputDims.d[1];
+    const int inputW = mInputDims.d[2];
 
-    // Read a random digit file
-    srand(unsigned(time(nullptr)));
-    std::vector<uint8_t> fileData(inputH * inputW);
-    mNumber = rand() % 10;
-    readPGMFile(locateFile(std::to_string(mNumber) + ".pgm", mParams.dataDirs), fileData.data(), inputH, inputW);
-
-    // Print an ascii representation
-    sample::gLogInfo << "Input:" << std::endl;
-    for (int i = 0; i < inputH * inputW; i++)
-    {
-        sample::gLogInfo << (" .:-=+*#%@"[fileData[i] / 26]) << (((i + 1) % inputW) ? "" : "\n");
-    }
-    sample::gLogInfo << std::endl;
+    printf("inputs:0 - %d x %d x %d x %d\n", mInputDims.d[0], mInputDims.d[1], mInputDims.d[2], mInputDims.d[3]);
 
-    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
-    for (int i = 0; i < inputH * inputW; i++)
-    {
-        hostDataBuffer[i] = 1.0 - float(fileData[i] / 255.0);
-    }
+    cv::Mat resized_image;
+    cv::resize(image, resized_image, cv::Size(inputW, inputH));
 
+    int batchIndex = 0;
+    int batchOffset = batchIndex * inputW * inputH * inputChannels;
+    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
+    // input shape [B,H,W,C]
+    // inputs:0 - 1 x 256 x 256 x 3
+    for (int h = 0; h < inputH; h++) {
+        for (int w = 0; w < inputW; w++) {
+            for (int c = 0; c < inputChannels; c++) {
+                hostDataBuffer[batchOffset + (h * inputW + w) * inputChannels + c] =
+                    float(resized_image.at<cv::Vec3b>(h, w)[c]) / 255.0f; // scale uint8 [0,255] to float [0,1]
+            }
+        }
+    }
     return true;
 }
 
@@ -287,39 +290,27 @@
 //!
 //! \return whether the classification output matches expectations
 //!
-bool SampleOnnxMNIST::verifyOutput(const samplesCommon::BufferManager& buffers)
+bool SampleOnnxMiDasV2::verifyOutput(const samplesCommon::BufferManager& buffers, cv::Mat & originImage )
 {
-    const int outputSize = mOutputDims.d[1];
     float* output = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));
-    float val{0.0f};
-    int idx{0};
-
-    // Calculate Softmax
-    float sum{0.0f};
-    for (int i = 0; i < outputSize; i++)
-    {
-        output[i] = exp(output[i]);
-        sum += output[i];
-    }
-
-    sample::gLogInfo << "Output:" << std::endl;
-    for (int i = 0; i < outputSize; i++)
-    {
-        output[i] /= sum;
-        val = std::max(val, output[i]);
-        if (val == output[i])
-        {
-            idx = i;
-        }
-
-        sample::gLogInfo << " Prob " << i << "  " << std::fixed << std::setw(5) << std::setprecision(4) << output[i]
-                         << " "
-                         << "Class " << i << ": " << std::string(int(std::floor(output[i] * 10 + 0.5f)), '*')
-                         << std::endl;
-    }
-    sample::gLogInfo << std::endl;
-
-    return idx == mNumber && val > 0.9f;
+    const int output0_row = mOutputDims.d[1];
+    const int output0_col = mOutputDims.d[2];
+    
+    printf("Identity:0 - %d x %d x %d\n", mOutputDims.d[0], mOutputDims.d[1], mOutputDims.d[2]);
+    
+    cv::Mat image = cv::Mat::zeros(cv::Size(output0_col, output0_row), CV_8U); // cv::Size is (width, height)
+    for (int row = 0; row < output0_row; row++) {
+        for (int col = 0; col < output0_col; col++) {
+            image.at<uint8_t>(row, col) = (uint8_t)(output[row * output0_col + col] / 8); // squeeze raw depth into 8-bit range
+        }
+    }
+
+    cv::imshow("img", image);
+    cv::imshow("orgimg", originImage);
+    cv::waitKey(0);
+    cv::destroyAllWindows();
+
+    return true;
 }
 
 //!
@@ -330,16 +321,15 @@
     samplesCommon::OnnxSampleParams params;
     if (args.dataDirs.empty()) //!< Use default directories if user hasn't provided directory paths
     {
-        params.dataDirs.push_back("data/mnist/");
-        params.dataDirs.push_back("data/samples/mnist/");
+        params.dataDirs.push_back("data/midas/");
     }
     else //!< Use the data directory provided by the user
     {
         params.dataDirs = args.dataDirs;
     }
-    params.onnxFileName = "mnist.onnx";
-    params.inputTensorNames.push_back("Input3");
-    params.outputTensorNames.push_back("Plus214_Output_0");
+    params.onnxFileName = "model_float32.onnx";
+    params.inputTensorNames.push_back("inputs:0");
+    params.outputTensorNames.push_back("Identity:0");
     params.dlaCore = args.useDLACore;
     params.int8 = args.runInInt8;
     params.fp16 = args.runInFp16;
@@ -353,12 +343,12 @@
 void printHelpInfo()
 {
     std::cout
-        << "Usage: ./sample_onnx_mnist [-h or --help] [-d or --datadir=<path to data directory>] [--useDLACore=<int>]"
+        << "Usage: ./sample_onnx_MiDasV2 [-h or --help] [-d or --datadir=<path to data directory>] [--useDLACore=<int>]"
         << std::endl;
     std::cout << "--help          Display help information" << std::endl;
     std::cout << "--datadir       Specify path to a data directory, overriding the default. This option can be used "
                  "multiple times to add multiple directories. If no data directories are given, the default is to use "
-                 "(data/samples/mnist/, data/mnist/)"
+                 "(data/samples/MiDasV2/, data/MiDasV2/)"
               << std::endl;
     std::cout << "--useDLACore=N  Specify a DLA engine for layers that support DLA. Value can range from 0 to n-1, "
                  "where n is the number of DLA engines on the platform."
@@ -387,9 +377,9 @@
 
     sample::gLogger.reportTestStart(sampleTest);
 
-    SampleOnnxMNIST sample(initializeSampleParams(args));
+    SampleOnnxMiDasV2 sample(initializeSampleParams(args));
 
-    sample::gLogInfo << "Building and running a GPU inference engine for Onnx MNIST" << std::endl;
+    sample::gLogInfo << "Building and running a GPU inference engine for Onnx MiDasV2" << std::endl;
 
     if (!sample.build())
     {




Tuesday, June 8, 2021

Jetson nano DeepStream-5.1 YOLOv4

Download pytorch-YOLOv4

  git clone https://github.com/Tianxiaomo/pytorch-YOLOv4.git
  cd pytorch-YOLOv4

Download yolov4.cfg & yolov4.weights

  wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
  wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights

Edit yolov4.cfg (inference uses batch=1; YOLO input width/height must be multiples of 32)

  [net]
  #batch=64
  #subdivisions=8
  batch=1
  subdivisions=1
  # Training
  #width=512
  #height=512
  #width=608
  #height=608
  width=416
  height=416

Install torch

  pip3 install torch
  pip3 install torchvision
  pip3 install onnxruntime

Install onnx

  sudo apt-get install protobuf-compiler libprotoc-dev
  pip3 install onnx -i https://pypi.doubanio.com/simple/

Enable swap

  sudo fallocate -l 4.0G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile

Transform darknet weights to ONNX (the trailing 1 is the batch size)

  export OMP_NUM_THREADS=1
  python3 demo_darknet2onnx.py yolov4.cfg yolov4.weights ./data/giraffe.jpg 1

Transform ONNX to a TensorRT engine

  /usr/src/tensorrt/bin/trtexec --onnx=yolov4_1_3_416_416_static.onnx \
  --explicitBatch --saveEngine=yolov4_1_3_416_416_fp16.engine \
  --workspace=2048 --fp16
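
To sanity-check the engine before wiring it into DeepStream, a minimal C++ sketch like the one below can deserialize it and list its bindings (a sketch under assumptions: it presumes the TensorRT headers on the device, and an engine can only be loaded by the same TensorRT version that built it). Compile with -lnvinfer.

#include <fstream>
#include <iostream>
#include <vector>
#include "NvInfer.h"

// Minimal logger the TensorRT runtime requires.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Read the serialized engine produced by trtexec.
    std::ifstream file("yolov4_1_3_416_416_fp16.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

    auto runtime = nvinfer1::createInferRuntime(gLogger);
    auto engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (!engine)
    {
        std::cout << "failed to deserialize engine" << std::endl;
        return 1;
    }

    // Expect three bindings: input, boxes, confs (see the console output below).
    for (int i = 0; i < engine->getNbBindings(); i++)
    {
        std::cout << (engine->bindingIsInput(i) ? "INPUT  " : "OUTPUT ") << engine->getBindingName(i) << std::endl;
    }
    return 0;
}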

Download yolov4_deepstream

  cd /opt/nvidia/deepstream/deepstream/sources/
  git clone https://github.com/NVIDIA-AI-IOT/yolov4_deepstream.git
  cd yolov4_deepstream/deepstream_yolov4

Copy TensorRT engine

  cp $HOME/pytorch-YOLOv4/yolov4_1_3_416_416_fp16.engine .

Edit deepstream_app_config_yoloV4.txt

  model-engine-file=yolov4_1_3_416_416_fp16.engine

Edit config_infer_primary_yoloV4.txt (network-mode=2 selects FP16, matching the engine we built)

  model-engine-file=yolov4_1_3_416_416_fp16.engine
  network-mode=2

Build

  export CUDA_VER=10.0
  make -C nvdsinfer_custom_impl_Yolo

Run

  unset DISPLAY
  rm -rf $HOME/.cache/gstreamer-1.0/registry.aarch64.bin
  sudo route add -net 224.0.0.0 netmask 255.0.0.0 wlan9
  deepstream-app -c deepstream_app_config_yoloV4.txt

Console output

  Unknown or legacy key specified 'is-classifier' for group [property]
  0:00:09.526815556 25328 0x21dc8d0 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_yolov4/yolov4_1_3_416_416_fp16.engine
  INFO: [Implicit Engine Info]: layers num: 3
  0 INPUT kFLOAT input 3x416x416
  1 OUTPUT kFLOAT boxes 10647x1x4
  2 OUTPUT kFLOAT confs 10647x80
  0:00:09.527013842 25328 0x21dc8d0 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_yolov4/yolov4_1_3_416_416_fp16.engine
  0:00:09.649444159 25328 0x21dc8d0 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_yolov4/config_infer_primary_yoloV4.txt sucessfully
  Runtime commands:
  h: Print this help
  q: Quit
  p: Pause
  r: Resume
  **PERF: FPS 0 (Avg)
  **PERF: 0.00 (0.00)
  ** INFO: <bus_callback:181>: Pipeline ready
  Opening in BLOCKING MODE
  Opening in BLOCKING MODE
  NvMMLiteOpen : Block : BlockType = 261
  NVMEDIA: Reading vendor.tegra.display-size : status: 6
  NvMMLiteBlockCreate : Block : BlockType = 261
  ** INFO: <bus_callback:167>: Pipeline running
  **PERF: 4.92 (4.82)
  **PERF: 4.92 (4.92)
  **PERF: 4.92 (4.88)