SkillArea overview
Reference · skill-area:model-serving
Model Serving overview
Deploying and operating machine learning models in production — inference servers, batching strategies, hardware-aware optimization, autoscaling, and low-latency endpoint design for ML and LLM workloads.
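The batching strategies mentioned above can be illustrated with a minimal dynamic micro-batcher: collect requests until either a size cap or a wait deadline is hit. This is a pure-Python sketch, not the API of any tool listed below; the class and parameter names (`MicroBatcher`, `max_batch_size`, `max_wait_ms`) are illustrative.

```python
import queue
import time


class MicroBatcher:
    """Groups individual inference requests into batches.

    Sketch only: a production dynamic batcher (e.g. in an inference
    server) also handles padding, priorities, and per-model queues.
    """

    def __init__(self, max_batch_size=8, max_wait_ms=5):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_ms / 1000.0
        self.requests = queue.Queue()

    def submit(self, item):
        """Enqueue one request (called from request-handler threads)."""
        self.requests.put(item)

    def next_batch(self):
        """Block for the first request, then fill the batch until
        either the size cap or the wait deadline is reached."""
        batch = [self.requests.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.requests.get(timeout=remaining))
            except queue.Empty:
                break
        return batch
```

The size-or-deadline trade-off shown here is the core of dynamic batching: a larger `max_wait_ms` improves GPU utilization at the cost of per-request latency.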
Attributes
displayName
    Model Serving
description
    Deploying and operating machine learning models in production — inference
    servers, batching strategies, hardware-aware optimization, autoscaling,
    and low-latency endpoint design for ML and LLM workloads.
domains
expertiseLevels
    - intermediate
    - expert
Outgoing edges
applies_to (1)
    - domain:ml-ops · Domain · MLOps
prerequisite_for_learning (4)
    - skill-area:llm-infrastructure · SkillArea · LLM Infrastructure
    - skill-area:model-serving-deployment · SkillArea · Model Serving and Deployment
    - skill-area:model-serving-operations · SkillArea · Model Serving
    - skill-area:model-optimisation · SkillArea · Model Optimisation
Incoming edges
contains (1)
    - specialization:recommendation-infrastructure · Specialization
prerequisite_for_learning (1)
    - skill-area:model-evaluation · SkillArea · Model Evaluation & Selection
requires_expertise (2)
    - responsibility:inference-latency-sla · Responsibility · Inference latency SLA
    - role:machine-learning-ops-engineer · Role · Machine Learning Ops Engineer
tool_used_by (4)
    - tool:vllm · Tool · vLLM
    - tool:tensorrt · Tool · TensorRT
    - tool:triton-inference · Tool · Triton Inference Server
    - tool:onnx-runtime · Tool · ONNX Runtime
used_for (6)
    - tool:mlflow · Tool · MLflow
    - tool:bentoml · Tool · BentoML
    - tool:vllm · Tool · vLLM
    - tool:tensorrt · Tool · TensorRT
    - tool:triton-inference · Tool · Triton Inference Server
    - tool:onnx-runtime · Tool · ONNX Runtime
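The requires_expertise edge to the inference-latency SLA responsibility usually comes down to tracking tail latency rather than averages. A minimal sketch, assuming nearest-rank percentiles over recorded request durations; the function name and sample values are illustrative:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile, a common choice for latency SLOs.

    Returns the smallest sample such that at least pct percent of
    all samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds; the two
# outliers dominate p99 while leaving the median untouched.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 300]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

With these samples, p50 stays in the low teens while p99 lands on an outlier, which is why SLAs for low-latency endpoints are stated against tail percentiles.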