SkillArea overview
Reference · skill-area:model-serving
Model Serving overview
Deploying and operating machine learning models in production — inference servers, batching strategies, hardware-aware optimization, autoscaling, and low-latency endpoint design for ML and LLM workloads.
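The batching strategies mentioned above can be illustrated with a minimal dynamic micro-batcher: collect requests until either a size cap or a wait deadline is hit. This is a pure-Python sketch, not the API of any tool listed below; the class and parameter names (`MicroBatcher`, `max_batch_size`, `max_wait_ms`) are illustrative.

```python
import queue
import time


class MicroBatcher:
    """Groups individual inference requests into batches.

    Sketch only: a production dynamic batcher (e.g. in an inference
    server) also handles padding, priorities, and per-model queues.
    """

    def __init__(self, max_batch_size=8, max_wait_ms=5):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_ms / 1000.0
        self.requests = queue.Queue()

    def submit(self, item):
        """Enqueue one request (called from request-handler threads)."""
        self.requests.put(item)

    def next_batch(self):
        """Block for the first request, then fill the batch until
        either the size cap or the wait deadline is reached."""
        batch = [self.requests.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.requests.get(timeout=remaining))
            except queue.Empty:
                break
        return batch
```

The size-or-deadline trade-off shown here is the core of dynamic batching: a larger `max_wait_ms` improves GPU utilization at the cost of per-request latency.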
Attributes
displayName
    Model Serving
description
    Deploying and operating machine learning models in production — inference
    servers, batching strategies, hardware-aware optimization, autoscaling,
    and low-latency endpoint design for ML and LLM workloads.
domains
expertiseLevels
    - intermediate
    - expert
Outgoing edges
applies_to (1)
    - domain:ml-ops · Domain · MLOps
prerequisite_for_learning (4)
    - skill-area:llm-infrastructure · SkillArea · LLM Infrastructure
    - skill-area:model-serving-deployment · SkillArea · Model Serving and Deployment
    - skill-area:model-serving-operations · SkillArea · Model Serving
    - skill-area:model-optimisation · SkillArea · Model Optimisation
Incoming edges
contains (1)
    - specialization:recommendation-infrastructure · Specialization
prerequisite_for_learning (1)
    - skill-area:model-evaluation · SkillArea · Model Evaluation & Selection
requires_expertise (2)
    - responsibility:inference-latency-sla · Responsibility · Inference latency SLA
    - role:machine-learning-ops-engineer · Role · Machine Learning Ops Engineer
tool_used_by (4)
    - tool:vllm · Tool · vLLM
    - tool:tensorrt · Tool · TensorRT
    - tool:triton-inference · Tool · Triton Inference Server
    - tool:onnx-runtime · Tool · ONNX Runtime
used_for (6)
    - tool:mlflow · Tool · MLflow
    - tool:bentoml · Tool · BentoML
    - tool:vllm · Tool · vLLM
    - tool:tensorrt · Tool · TensorRT
    - tool:triton-inference · Tool · Triton Inference Server
    - tool:onnx-runtime · Tool · ONNX Runtime
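The requires_expertise edge to the inference-latency SLA responsibility usually comes down to tracking tail latency rather than averages. A minimal sketch, assuming nearest-rank percentiles over recorded request durations; the function name and sample values are illustrative:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile, a common choice for latency SLOs.

    Returns the smallest sample such that at least pct percent of
    all samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds; the two
# outliers dominate p99 while leaving the median untouched.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 300]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

With these samples, p50 stays in the low teens while p99 lands on an outlier, which is why SLAs for low-latency endpoints are stated against tail percentiles.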