AIM - AMD Inference Microservice

AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.

What AIM Does

AIM abstracts the complexity of inference deployment by providing:

Simple Service Deployment: Deploy inference endpoints with minimal configuration using AIMService resources
Automatic Optimization: Configure workloads for latency or throughput optimization with preset profiles
Model Catalog Management: Maintain a catalog of available models across cluster and namespace scopes
HTTP Routing Integration: Expose services through Gateway API with customizable path templates
Resource Management: Handle GPU allocation, resource requirements, and scaling automatically

Quick Example

Deploy an inference service:

apiVersion: aim.silogen.ai/v1alpha1
kind: AIMService
metadata:
  name: llama-chat
  namespace: ml-team
spec:
  model:
    image: ghcr.io/silogen/aim-meta-llama-llama-3-1-8b-instruct:0.7.0
  replicas: 2
  routing:
    enabled: true
    gatewayRef:
      name: inference-gateway
      namespace: gateways
    pathTemplate: "{.metadata.namespace}/{.metadata.name}"

AIM Engine automatically:

Resolves the model container image
Selects an appropriate runtime configuration
Deploys a KServe InferenceService
Creates HTTP routing through Gateway API via the path ml-team/llama-chat

Documentation

Usage Guides: Practical guides for deploying and configuring inference services
- Services - Deploy and manage inference endpoints
- Runtime Configuration - Configure credentials and settings
Concepts: Deep dive into AIM Engine architecture and internals
- Models - Model catalog and discovery mechanism
- Templates - Runtime profiles and discovery
- Runtime Config - Resolution algorithm and architecture
- Model Caching - Cache lifecycle and deletion behavior

Getting Started

Deploy a service: Start with the Services usage guide to deploy your first inference endpoint
Configure authentication: Set up credentials for private registries using Runtime Configuration
Explore advanced features: Learn about automatic template selection, model caching, and custom routing in the Concepts documentation

Architecture

AIM builds on Kubernetes and KServe to provide:

Declarative API: Define inference services using Kubernetes custom resources
Multi-tenancy: Namespace-scoped and cluster-scoped resources support team isolation
GitOps-friendly: All configuration expressed as YAML for version control and automation
Gateway API Integration: Modern, standards-based HTTP routing

Support

For issues, questions, or contributions, please refer to the main project repository.