II. Tool overview
Reference · livetool:triton-inference
Triton Inference Server overview
NVIDIA's open-source inference serving platform that hosts models from TensorRT, ONNX Runtime, PyTorch, TensorFlow, and vLLM backends behind a unified gRPC/HTTP API. Supports dynamic batching, model ensembles, concurrent model execution, and Kubernetes-native deployment with Prometheus metrics out of the box.
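For illustration, a minimal Python client sketch against the HTTP endpoint (default port 8000) using the tritonclient package; the model name "resnet50" and the tensor names "input"/"output" are placeholders for whatever the served model's config actually declares:

    # Minimal sketch of a Triton HTTP inference request with tritonclient.
    # "resnet50", "input", and "output" are assumed names; match them to
    # the model repository's config.pbtxt.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # One image-shaped FP32 batch of random data as the request payload.
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)

    # The server may merge this request with other concurrent requests
    # via dynamic batching before executing the model.
    response = client.infer(model_name="resnet50", inputs=[infer_input])
    print(response.as_numpy("output").shape)

The same request can be sent over gRPC (default port 8001) by swapping in tritonclient.grpc; the request-building pattern is the same.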
Attributes
displayName: Triton Inference Server
homepageUrl:
kind: other
description: NVIDIA's open-source inference serving platform that hosts models from TensorRT, ONNX Runtime, PyTorch, TensorFlow, and vLLM backends behind a unified gRPC/HTTP API. Supports dynamic batching, model ensembles, concurrent model execution, and Kubernetes-native deployment with Prometheus metrics out of the box.
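The description above also notes Prometheus metrics out of the box; a quick sanity check is to read the metrics endpoint, sketched here assuming the default metrics port (8002) and a locally running server. The metric name filtered for, nv_inference_request_success, is one of the per-model counters Triton exports.

    # Sketch: read Triton's Prometheus metrics endpoint (default port 8002).
    # Assumes a server is already running locally with metrics enabled.
    import urllib.request

    raw = urllib.request.urlopen("http://localhost:8002/metrics").read().decode("utf-8")

    # Print only the per-model inference success counters.
    for line in raw.splitlines():
        if line.startswith("nv_inference_request_success"):
            print(line)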
Outgoing edges
alternative_to (3)
- tool:vllm · Tool · vLLM
- tool:tensorrt · Tool · TensorRT
- tool:onnx-runtime · Tool · ONNX Runtime
belongs_to_language (1)
- language:cpp · Language · C++
tool_used_by (2)
- skill-area:model-serving · SkillArea · Model Serving
- skill-area:llm-infrastructure · SkillArea · LLM Infrastructure
used_for (2)
- skill-area:model-serving · SkillArea · Model Serving
- skill-area:ai-evaluation · SkillArea · AI Evaluation
Incoming edges
alternative_to (3)
- tool:vllm · Tool · vLLM
- tool:tensorrt · Tool · TensorRT
- tool:onnx-runtime · Tool · ONNX Runtime
uses_tool (1)
- specialization:ml-inference-serving · Specialization · ML Inference Serving