AIMService
AIMService is the primary user-facing resource for deploying inference endpoints. It combines a model image, service template, runtime configuration, and optional HTTP routing to produce a KServe InferenceService with Gateway API integration.
This document describes the AIMService specification, explains how runtime configurations flow into deployed services, describes the template resolution and derivation behavior, and details the routing configuration options.
Specification
The following is a full example specification of the AIMService. The only required field is spec.aimImageName; all other fields are optional.
apiVersion: aim.silogen.ai/v1alpha1
kind: AIMService
metadata:
  name: llama-chat
  namespace: ml-team
spec:
  aimImageName: meta-llama-3-8b
  templateRef: llama-3-8b-latency
  runtimeConfigName: team-config
  replicas: 2
  resources:
    limits:
      cpu: "6"
      memory: 48Gi
    requests:
      cpu: "3"
      memory: 32Gi
  overrides:
    metric: throughput
    precision: fp16
    gpuSelector:
      count: 2
      model: MI300X
  routing:
    enabled: true
    gatewayRef:
      name: inference-gateway
      namespace: gateways
    routeTemplate: "/{.metadata.namespace}/{.metadata.labels['team']}/chat"
Fields
Warning
Model caching is not yet active
| Field | Type | Description |
|---|---|---|
| aimImageName | string | Canonical model identifier that maps to an AIMImage or AIMClusterImage resource. This identifies which model container image to deploy. |
| templateRef | string | Name of an AIMServiceTemplate or AIMClusterServiceTemplate that defines the runtime profile. When omitted, the controller uses the image's defaultServiceTemplate field. If no default is configured, the service enters a Degraded state (see Template resolution and derivation). |
| runtimeConfigName | string | Name of the AIMRuntimeConfig or AIMClusterRuntimeConfig to use for registry credentials, storage defaults, and routing configuration. Defaults to default when omitted. |
| replicas | int32 | Number of inference service replicas to deploy. Defaults to 1. |
| resources | ResourceRequirements | Container resource requirements. When specified, these override template and image defaults. See Resource resolution for the complete precedence order. |
| overrides | AIMServiceOverrides | Template parameter overrides for this service. When specified, the controller creates a derived template incorporating these overrides. See Template derivation for details. |
| env | []EnvVar | Environment variables for model download authentication (e.g., HuggingFace tokens). Applied when templates are derived from this service. |
| imagePullSecrets | []LocalObjectReference | Secrets for pulling private container images. Applied when templates are derived from this service. |
| cacheModel | bool | When true, ensures model artifacts are cached before the service starts. Takes effect when the referenced template does not already enable caching. Defaults to false. |
| routing | AIMServiceRouting | HTTP routing configuration. When enabled, the controller creates an HTTPRoute for Gateway API traffic management. See Routing configuration for details. |
AIMServiceOverrides
The overrides field allows customizing template parameters without creating explicit template resources:
| Field | Type | Description |
|---|---|---|
| metric | string | Optimization goal: latency for interactive workloads, throughput for batch processing. |
| precision | string | Numeric precision: auto, fp4, fp8, fp16, fp32, bf16, int4, int8. Lower precision reduces memory usage and increases throughput. |
| gpuSelector | AimGpuSelector | GPU requirements specifying count (number of GPUs per replica) and model (GPU type such as MI300X or MI325X). |
AIMServiceRouting
The routing field controls HTTP exposure through Gateway API:
| Field | Type | Description |
|---|---|---|
| enabled | bool | Enables HTTP routing management. When true, the controller creates an HTTPRoute resource. |
| gatewayRef | ParentReference | Identifies the Gateway parent for the HTTPRoute. Required when routing is enabled. |
| annotations | map[string]string | Annotations to apply to the created HTTPRoute resource. |
| routeTemplate | string | HTTP path template rendered using JSONPath expressions. See Routing templates for details. |
Runtime configuration resolution
Runtime configurations supply credentials, storage defaults, and routing parameters. The resolution process works as follows:
- The controller examines the runtimeConfigName field (defaults to default when omitted).
- It first searches for an AIMRuntimeConfig in the service's namespace with that name.
- If a namespace config is found, it is used exclusively; there is no field-level merging with cluster configs.
- If no namespace config exists, the controller falls back to an AIMClusterRuntimeConfig with the same name.
- The resolved configuration is recorded in status.effectiveRuntimeConfig for audit purposes.
When a service references a config that does not exist at either scope, reconciliation fails and the service enters a Degraded state with condition reason RuntimeConfigMissing until the config is created.
See Runtime Configuration for complete details on the resolution model and available configuration options.
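The lookup order described above can be illustrated with a minimal Python sketch. This is not the controller's code: the function name and the two dictionaries standing in for API lookups are assumptions made for illustration, and the non-fatal handling of a missing implicit default follows the events table later in this document.

```python
def resolve_runtime_config(requested_name, namespace, namespaced_configs, cluster_configs):
    """Return (scope, config) using namespace-first lookup, or None if the
    implicit default is absent.

    namespaced_configs maps (namespace, name) -> config and cluster_configs
    maps name -> config; both are hypothetical stand-ins for API queries.
    """
    name = requested_name or "default"  # runtimeConfigName defaults to "default"

    # A namespace-scoped config is used exclusively; nothing is merged from cluster scope.
    if (namespace, name) in namespaced_configs:
        return "namespace", namespaced_configs[(namespace, name)]

    # Otherwise fall back to the cluster-scoped config with the same name.
    if name in cluster_configs:
        return "cluster", cluster_configs[name]

    if requested_name is None:
        # Implicit default not found: treated here as non-fatal.
        return None

    # An explicitly referenced config is missing at both scopes.
    raise LookupError(f"RuntimeConfigMissing: {name!r} not found for namespace {namespace!r}")
```

With a config named team-config defined at both scopes, the sketch returns the namespace-scoped one untouched, which mirrors the "no field-level merging" rule.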
Resource resolution
The controller merges resource requirements from three tiers, with higher tiers taking precedence:
- Service-level: spec.resources on the AIMService (highest precedence).
- Template-level: spec.resources on the resolved AIMServiceTemplate.
- Image-level: spec.resources on the AIMImage or AIMClusterImage (lowest precedence).
After merging, if GPU resource requests or limits are still unset, the controller populates them from the discovery metadata stored in the template's status (status.profile.metadata.gpu_count). This ensures the resulting KServe InferenceService always requests the appropriate number of GPU devices unless explicitly overridden.
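As a rough illustration of the three-tier merge and the GPU fallback, consider the following Python sketch. It assumes the merge happens per resource name within requests and limits, and it assumes the GPU resource key is amd.com/gpu; neither detail is stated in this document, so treat both as placeholders.

```python
GPU_RESOURCE = "amd.com/gpu"  # assumed resource name; not confirmed by this document


def merge_resources(image, template, service, discovered_gpu_count=None):
    """Merge resource dicts from lowest to highest precedence.

    Each argument is a dict like {"requests": {...}, "limits": {...}} or None.
    discovered_gpu_count stands in for status.profile.metadata.gpu_count.
    """
    merged = {"requests": {}, "limits": {}}

    # Later tiers overwrite earlier ones: image < template < service.
    for tier in (image, template, service):
        for section in ("requests", "limits"):
            merged[section].update((tier or {}).get(section, {}))

    # Fill in the GPU count only where no tier set it explicitly.
    if discovered_gpu_count:
        for section in ("requests", "limits"):
            merged[section].setdefault(GPU_RESOURCE, str(discovered_gpu_count))

    return merged
```

In this sketch a service that sets only limits.cpu still inherits memory values from the template or image, and an explicit GPU value at any tier suppresses the discovered default for that section.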
Template resolution and derivation
The template resolution process determines which template configuration the service will use. This process handles explicit template references, default template lookup, and automatic template derivation when overrides are specified.
Resolution process
When a service is created or updated, the controller follows this resolution sequence:
- Explicit templateRef: If spec.templateRef is specified, the controller searches for a template with that name (namespace-scoped first, then cluster-scoped). If the template is not found, the service enters a Degraded state.
- Default template lookup: If templateRef is omitted, the controller examines the referenced AIMImage (namespace-scoped first, then cluster-scoped) and uses its defaultServiceTemplate field. If no default template is configured, the service enters a Degraded state with an appropriate error message.
- Override handling: If spec.overrides is specified, the controller modifies the template name by appending a hash suffix and creates a derived template. See Template derivation below.
The controller enforces explicit configuration: services must either reference an existing template via spec.templateRef or rely on a configured defaultServiceTemplate on the image. There is no automatic template creation unless spec.overrides is specified. This prevents unexpected behavior and ensures clear configuration management.
The resolved template reference, including whether it is namespace-scoped or cluster-scoped, is recorded in status.resolvedTemplate.
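The resolution sequence (without override handling) might be sketched in Python as follows. The function name, the set-based stand-ins for API lookups, and the shape of the returned dictionary are all illustrative assumptions rather than the controller's actual types.

```python
def resolve_template(template_ref, image_default, namespace,
                     namespaced_templates, cluster_templates):
    """Choose a template name, then look it up namespace-first.

    namespaced_templates is a set of (namespace, name) pairs and
    cluster_templates a set of names; both are hypothetical stand-ins.
    """
    # An explicit spec.templateRef wins; otherwise use the image's defaultServiceTemplate.
    name = template_ref or image_default
    if not name:
        raise LookupError("TemplateNotFound: no templateRef and no default on the image")

    if (namespace, name) in namespaced_templates:
        return {"name": name, "scope": "namespace", "namespace": namespace}
    if name in cluster_templates:
        return {"name": name, "scope": "cluster"}

    raise LookupError(f'TemplateNotFound: template "{name}" not found')
```

Both failure paths correspond to the TemplateNotFound reason shown in the status examples; in the real controller they surface as a Degraded state rather than an exception.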
Template derivation and overrides
When spec.overrides is specified, the controller automatically creates a derived namespace-scoped template that incorporates the override values. This allows services to customize runtime parameters without manually creating template resources.
The derivation process works as follows:
- The controller resolves the base template name using the resolution process described above.
- It computes a hash of the overrides structure and appends a suffix like -ovr-2926054f to the base template name. This ensures each unique override combination gets its own template while allowing multiple services with identical overrides to share a derived template.
- The controller searches for an existing template that matches the derived spec. If a match is found (either namespace-scoped or cluster-scoped), the service uses that template.
- If no match is found, the controller creates a new namespace-scoped template with the derived name. The template is labeled with app.kubernetes.io/managed-by: aim and aim.silogen.ai/derived-template: "true".
- The derived template undergoes discovery like any other template. Once discovery completes and the template becomes available, the service proceeds with deployment.
Example: A service with templateRef: base-template and overrides: {metric: throughput} might produce a derived template named base-template-ovr-8fa3c921. Multiple services with identical overrides will share this derived template.
Note: Template derivation ensures the final name does not exceed Kubernetes name length limits (63 characters) by truncating the base name if necessary.
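To make the naming scheme concrete, here is a small Python sketch of how a derived name could be computed. The hash function (SHA-256 over canonical JSON) and the 8-character suffix length are assumptions chosen to match the shape of the examples above; the controller's actual hashing is not specified in this document.

```python
import hashlib
import json

MAX_NAME_LENGTH = 63  # Kubernetes name length limit cited above


def derived_template_name(base_name, overrides):
    """Build '<base>-ovr-<hash>' and truncate the base so the result fits."""
    # Canonical JSON makes the hash independent of key order (assumed approach).
    canonical = json.dumps(overrides, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    suffix = f"-ovr-{digest}"

    # Truncate the base name, never the suffix, so the hash stays intact.
    trimmed = base_name[: MAX_NAME_LENGTH - len(suffix)].rstrip("-.")
    return trimmed + suffix
```

Because the suffix depends only on the override values, two services that pass the same overrides against the same base template arrive at the same derived name and can share the template.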
Routing configuration
When spec.routing.enabled is true, the controller creates an HTTPRoute resource that forwards traffic through the specified Gateway. The HTTP path prefix is determined by evaluating the route template.
Routing templates
Route templates use JSONPath expressions wrapped in {...} and are rendered against the entire AIMService object. The controller evaluates templates in this precedence order:
- spec.routing.routeTemplate on the service (highest precedence).
- spec.routing.routeTemplate from the resolved runtime config.
- Default: /<namespace>/<service-uid>.
During rendering, the controller:
- Evaluates each JSONPath expression (e.g., {.metadata.namespace}, {.metadata.labels['team']}).
- Lowercases each path segment and enforces RFC 1123 naming conventions on it.
- Trims duplicate slashes and the trailing slash.
- Validates that the final path is ≤ 200 characters.
If template evaluation fails (invalid JSONPath syntax, missing label/annotation, multi-value result, or path exceeds 200 characters), the service enters a Degraded state with condition reason RouteTemplateInvalid. The controller creates/updates the InferenceService but skips HTTPRoute creation until the template issue is resolved.
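The rendering steps can be approximated with the Python sketch below. It supports only a small subset of JSONPath (dotted fields and ['key'] lookups), and the exact sanitization rule (replace disallowed characters with a hyphen) is an assumption; the controller's real evaluator and normalization may differ.

```python
import re

MAX_PATH_LENGTH = 200
_EXPR = re.compile(r"\{([^{}]*)\}")
_STEP = re.compile(r"\.([A-Za-z_][\w-]*)|\['([^']*)'\]")


def _lookup(obj, expr):
    """Resolve a path such as .metadata.labels['team'] to a single string."""
    current, pos = obj, 0
    while pos < len(expr):
        match = _STEP.match(expr, pos)
        if match is None:
            raise ValueError(f"invalid expression {expr!r}")
        key = match.group(1) or match.group(2)
        if not isinstance(current, dict) or key not in current:
            raise ValueError(f"{expr!r}: {key!r} not found")
        current, pos = current[key], match.end()
    if isinstance(current, (dict, list)):
        raise ValueError(f"{expr!r} did not resolve to a single value")
    return str(current)


def _sanitize(segment):
    """Lowercase and keep only RFC 1123 label characters (assumed rule)."""
    return re.sub(r"[^a-z0-9-]+", "-", segment.lower()).strip("-")


def render_route(template, service):
    """Render a route template against an AIMService-like dict."""
    rendered = _EXPR.sub(lambda m: _lookup(service, m.group(1)), template)
    # Dropping empty segments removes duplicate and trailing slashes.
    segments = [_sanitize(s) for s in rendered.split("/")]
    path = "/" + "/".join(s for s in segments if s)
    if len(path) > MAX_PATH_LENGTH:
        raise ValueError("rendered path exceeds 200 characters")
    return path
```

Any ValueError raised here plays the role of the RouteTemplateInvalid condition: the path is not published and no HTTPRoute would be created from it.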
Routing template examples
Valid template expressions:
# Namespace-based path
routeTemplate: "/{.metadata.namespace}/{.metadata.name}"
# Label-based path (label must exist)
routeTemplate: "/team/{.metadata.labels['team']}/{.metadata.name}"
# Static path with namespace
routeTemplate: "/inference/{.metadata.namespace}/llm"
Invalid template expressions:
# Field doesn't exist - will degrade service
routeTemplate: "/{.spec.model}"
# Missing label - will degrade service if label absent
routeTemplate: "/{.metadata.labels['nonexistent']}"
The resolved HTTP path is published in status.routing.path for reference. To inspect the generated HTTPRoute, use:
Status
The status field reflects reconciliation progress and provides observability into the service lifecycle:
| Field | Type | Description |
|---|---|---|
| status | enum | High-level lifecycle state: Pending, Starting, Running, Failed, Degraded. |
| observedGeneration | int64 | Most recent generation observed by the controller. |
| conditions | []Condition | Detailed conditions including Resolved, RuntimeReady, RoutingReady, CacheReady, Ready, Progressing, Failure. |
| resolvedRuntimeConfig | AIMResolvedRuntimeConfig | Reference to the runtime config used (namespace or cluster scope) and a hash of its spec. |
| resolvedImage | AIMResolvedReference | Reference to the AIMImage or AIMClusterImage resolved for this service. |
| resolvedTemplate | AIMServiceResolvedTemplate | Reference to the template used, including its name, namespace (if applicable), scope, and UID. Shows derived template names when overrides are applied. |
| routing | AIMServiceRoutingStatus | Contains the resolved HTTP path when routing is enabled and successfully configured. |
Status conditions
The controller maintains these condition types:
- Resolved: True when the image, template, and runtime config have been successfully resolved.
- CacheReady: True when required model caches are present or cache warming has completed.
- RuntimeReady: True when the KServe InferenceService is ready to serve traffic.
- RoutingReady: True when routing is enabled and the HTTPRoute has been successfully created and accepted by the Gateway.
- Ready: True when all other conditions are satisfied and the service is fully operational.
- Progressing: True while the controller is actively working toward readiness.
- Failure: True when a terminal or recoverable failure has occurred.
Example status - runtime config missing
status:
  status: Degraded
  conditions:
    - type: Failure
      status: "True"
      reason: RuntimeConfigMissing
      message: AIMRuntimeConfig "team-config" not found in namespace "ml-team"
    - type: RuntimeReady
      status: "False"
      reason: RuntimeConfigMissing
      message: Cannot configure runtime without AIMRuntimeConfig
Example status - route template invalid
status:
  status: Degraded
  conditions:
    - type: Failure
      status: "True"
      reason: RouteTemplateInvalid
      message: 'failed to evaluate route template "{.metadata.labels[''team'']}": label "team" not found'
    - type: RuntimeReady
      status: "True"
      reason: RuntimeReady
      message: InferenceService is ready
    - type: RoutingReady
      status: "False"
      reason: RouteTemplateInvalid
      message: HTTPRoute creation skipped due to invalid route template
Example status - template not found
status:
  status: Degraded
  conditions:
    - type: Failure
      status: "True"
      reason: TemplateNotFound
      message: 'Template "llama-latency" not found. Create the template or verify the template name.'
    - type: Resolved
      status: "False"
      reason: TemplateNotFound
      message: 'Template "llama-latency" not found. Create the template or verify the template name.'
    - type: RuntimeReady
      status: "False"
      reason: TemplateNotFound
      message: Referenced template does not exist
    - type: Progressing
      status: "False"
      reason: TemplateNotFound
      message: Cannot proceed without template
Example status - no default template configured
status:
  status: Degraded
  conditions:
    - type: Failure
      status: "True"
      reason: TemplateNotFound
      message: 'No template reference specified and no default template found on the image. Provide spec.templateRef or configure the image''s defaultServiceTemplate field.'
    - type: Resolved
      status: "False"
      reason: TemplateNotFound
      message: 'No template reference specified and no default template found on the image. Provide spec.templateRef or configure the image''s defaultServiceTemplate field.'
    - type: RuntimeReady
      status: "False"
      reason: TemplateNotFound
      message: Referenced template does not exist
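When triaging a Degraded service, it can help to filter the conditions down to the ones that explain the problem. The Python sketch below does this for a status object already loaded as a dictionary (for example from JSON output of kubectl). The helper name and the choice of which condition types count as readiness gates are assumptions made for illustration.

```python
# Condition types that are expected to be "True" on a healthy service (assumed set).
READINESS_TYPES = {"Resolved", "CacheReady", "RuntimeReady", "RoutingReady", "Ready"}


def blocking_conditions(status):
    """Return (type, reason, message) tuples for conditions that indicate a problem."""
    blocking = []
    for cond in status.get("conditions", []):
        # Failure is a problem when True; readiness gates are a problem when not True.
        failed = cond["type"] == "Failure" and cond["status"] == "True"
        unmet = cond["type"] in READINESS_TYPES and cond["status"] != "True"
        if failed or unmet:
            blocking.append((cond["type"], cond.get("reason", ""), cond.get("message", "")))
    return blocking
```

Applied to the route template example above, this returns the Failure and RoutingReady entries and skips RuntimeReady, which matches the documented behavior that the InferenceService keeps running while HTTPRoute creation is skipped.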
Events and debugging
The controller emits Kubernetes events on the AIMService object to provide visibility into reconciliation activities:
| Event Type | Reason | Description |
|---|---|---|
| Normal | RuntimeConfigResolved | Runtime config successfully resolved and applied. |
| Warning | DefaultRuntimeConfigNotFound | The implicit default runtime config was not found (non-fatal). |
| Warning | RuntimeConfigMissing | An explicitly referenced runtime config does not exist (fatal). |
| Warning | RouteTemplateInvalid | Route template evaluation failed. Includes the specific error. |
| Normal | TemplateResolved | Template successfully resolved or created. |
| Normal | InferenceServiceCreated | KServe InferenceService created. |
| Normal | InferenceServiceUpdated | KServe InferenceService updated. |
Debugging commands
Inspect the service status:
kubectl -n <namespace> get aimservice <name> -o yaml
kubectl -n <namespace> describe aimservice <name>
View the InferenceService:
View the HTTPRoute (when routing is enabled):
Check controller logs:
Related documentation
- Runtime Configuration - Details on AIMRuntimeConfig resolution and configuration options
- Images and Templates - Understanding AIMImage and AIMServiceTemplate resources
- Template Caching - Model artifact caching and pre-warming (if available)