Tool overview
Reference · livetool:tensorrt
TensorRT overview
NVIDIA's SDK for high-performance deep-learning inference on GPU. Optimises trained models through layer fusion, precision calibration (FP16/INT8), and kernel auto-tuning to minimise latency and maximise throughput. Integrates with PyTorch via torch-tensorrt, ONNX import, and the Triton Inference Server backend ecosystem.
Attributes
displayName
TensorRT
homepageUrl
kind
other
description
NVIDIA's SDK for high-performance deep-learning inference on GPU. Optimises
trained models through layer fusion, precision calibration (FP16/INT8),
and kernel auto-tuning to minimise latency and maximise throughput.
Integrates with PyTorch via torch-tensorrt, ONNX import, and the Triton
Inference Server backend ecosystem.
Outgoing edges
alternative_to (3)
- tool:vllm · Tool · vLLM
- tool:triton-inference · Tool · Triton Inference Server
- tool:onnx-runtime · Tool · ONNX Runtime
belongs_to_language (1)
- language:cpp · Language · C++
tool_used_by (2)
- skill-area:model-serving · SkillArea · Model Serving
- skill-area:model-optimisation · SkillArea · Model Optimisation
used_for (2)
- skill-area:model-serving · SkillArea · Model Serving
- skill-area:ai-evaluation · SkillArea · AI Evaluation
Incoming edges
alternative_to (3)
- tool:vllm · Tool · vLLM
- tool:triton-inference · Tool · Triton Inference Server
- tool:onnx-runtime · Tool · ONNX Runtime
uses_tool (1)
- specialization:gpu-programming · Specialization