II. LibraryProcess overview
Reference · livelib-process:gpu-programming--ml-inference-optimization
specializations/gpu-programming/ml-inference-optimization overview
Machine Learning Inference Optimization - Workflow for optimizing GPU-accelerated ML model inference for production deployment, covering quantization, batching, and kernel fusion.
Attributes
displayName
specializations/gpu-programming/ml-inference-optimization
description
Machine Learning Inference Optimization - Workflow for optimizing GPU-accelerated ML model
inference for production deployment, covering quantization, batching, and kernel fusion.
libraryPath
library/specializations/gpu-programming/ml-inference-optimization.js
specialization
gpu-programming
references
- TensorRT Documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- ONNX Runtime: https://onnxruntime.ai/
- PyTorch Quantization: https://pytorch.org/docs/stable/quantization.html
example
const result = await orchestrate('specializations/gpu-programming/ml-inference-optimization', {
  modelName: 'resnet50',    // model to optimize
  framework: 'pytorch',     // source framework of the model
  targetLatency: 5,         // latency target (units not specified here)
  quantization: 'int8'      // requested precision for quantization
});
usesAgents
- ml-inference-optimizer
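Of the techniques named in the description, batching is the one most easily sketched outside the GPU toolchain. Below is a minimal, illustrative dynamic-batching queue in JavaScript; it is not this library's implementation, and DynamicBatcher, runBatch, and the default parameters are hypothetical.

// Minimal dynamic-batching sketch (illustrative; not this library's internals).
// Requests queue up and are flushed when the batch fills or a timer expires,
// trading a small queueing delay for higher GPU throughput.
class DynamicBatcher {
  constructor(runBatch, { maxBatchSize = 8, maxWaitMs = 2 } = {}) {
    this.runBatch = runBatch; // async (inputs) => outputs, one output per input
    this.maxBatchSize = maxBatchSize;
    this.maxWaitMs = maxWaitMs;
    this.pending = [];
    this.timer = null;
  }

  infer(input) {
    return new Promise((resolve, reject) => {
      this.pending.push({ input, resolve, reject });
      if (this.pending.length >= this.maxBatchSize) {
        this.flush(); // batch is full: run immediately
      } else if (!this.timer) {
        this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
      }
    });
  }

  async flush() {
    clearTimeout(this.timer);
    this.timer = null;
    const batch = this.pending.splice(0, this.pending.length);
    if (batch.length === 0) return;
    try {
      const outputs = await this.runBatch(batch.map((r) => r.input));
      batch.forEach((r, i) => r.resolve(outputs[i]));
    } catch (err) {
      batch.forEach((r) => r.reject(err));
    }
  }
}

The maxWaitMs bound is what keeps worst-case queueing delay compatible with a latency budget such as the targetLatency parameter in the example above.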
Outgoing edges
lib_applies_to_domain (1)
- domain:scientific-computing · Domain · Scientific Computing
lib_belongs_to_specialization (1)
- specialization:gpu-programming · Specialization
lib_implements_workflow (1)
- workflow:ml-model-lifecycle · Workflow · ML Model Lifecycle
lib_involves_role (2)
- role:computational-scientist · Role · Computational Scientist
- role:ml-engineer · Role · Machine Learning Engineer
lib_requires_skill_area (2)
- skill-area:cuda-kernels · SkillArea · CUDA Kernel Programming
- skill-area:compute-shaders · SkillArea · Compute Shaders
uses_agent (1)
- lib-agent:gpu-programming--ml-inference-optimizer · LibraryAgent · ml-inference-optimizer
Incoming edges
None.
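As a usage note, the example attribute's call shape also lends itself to comparison sweeps across precision modes. The sketch below reuses that shape verbatim; the 'fp16' mode name and the result's shape are assumptions, not documented values.

// Hypothetical precision sweep built on the call from the example attribute.
// The result object's shape is library-defined and is only logged here.
for (const quantization of ['fp16', 'int8']) {
  const result = await orchestrate('specializations/gpu-programming/ml-inference-optimization', {
    modelName: 'resnet50',
    framework: 'pytorch',
    targetLatency: 5, // same target as the example; units not specified here
    quantization,
  });
  console.log(`quantization=${quantization}`, result);
}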