In this post, I show how to speed up inference for PyTorch image classification models (from TIMM) using ONNX Runtime and TensorRT optimizations.
Key techniques covered:
- Converting PyTorch models to ONNX format
- Running inference with ONNX Runtime on CPU and GPU
- Leveraging TensorRT for maximum GPU performance
- Baking preprocessing into the ONNX model to eliminate overhead
The tutorial uses Hugging Face TIMM models as examples, but the same approach applies to other PyTorch vision models. I've included code examples and benchmarks throughout.