API Reference

Packages

config.kaiwo.silogen.ai/v1alpha1

config.kaiwo.silogen.ai/v1alpha1

Package v1alpha1 contains API schema definitions for the Kaiwo configuration v1alpha1 API group.

Resource Types

KaiwoConfig
KaiwoConfigList

KaiwoConfig

KaiwoConfig manages the Kaiwo operator's configuration, which can be modified at runtime.

Appears in:
- KaiwoConfigList

Field	Description	Default	Validation
`apiVersion` string	`config.kaiwo.silogen.ai/v1alpha1`
`kind` string	`KaiwoConfig`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` KaiwoConfigSpec	Spec defines the desired state for the Kaiwo operator configuration.
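As a rough illustration of how these top-level fields fit together, the following Python sketch builds a minimal KaiwoConfig manifest as a plain dictionary and prints it as JSON. The object name `kaiwo` and the helper name are assumptions, not taken from this reference; spec fields that are left out are expected to take the defaults listed in the tables below.

```python
import json


def kaiwo_config_manifest(name: str = "kaiwo") -> dict:
    """Build a minimal KaiwoConfig manifest as a plain dict.

    The default object name "kaiwo" is a hypothetical choice; spec fields
    omitted here are expected to take their documented defaults.
    """
    return {
        "apiVersion": "config.kaiwo.silogen.ai/v1alpha1",
        "kind": "KaiwoConfig",
        "metadata": {"name": name},
        "spec": {
            # Keep the default cluster queue in sync with cluster capacity.
            "dynamicallyUpdateDefaultClusterQueue": True,
            # Taint GPU nodes with the default GPU taint key.
            "nodes": {"addTaintsToGpuNodes": True},
            # Request more memory for the Ray head pod than the 16Gi default.
            "ray": {"headPodMemory": "32Gi"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(kaiwo_config_manifest(), indent=2))
```

The JSON output can be piped to `kubectl apply -f -`, which accepts JSON as well as YAML.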

KaiwoConfigList

KaiwoConfigList contains a list of KaiwoConfig resources.

Field	Description	Default	Validation
`apiVersion` string	`config.kaiwo.silogen.ai/v1alpha1`
`kind` string	`KaiwoConfigList`
`metadata` ListMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`items` KaiwoConfig array

KaiwoConfigSpec

KaiwoConfigSpec defines the desired configuration of the Kaiwo operator. There should typically be only one KaiwoConfig resource in the cluster.

Appears in:
- KaiwoConfig

Field	Description	Default
`ray` KaiwoRayConfig	Ray defines the Ray-specific settings.	{ }
`data` KaiwoStorageConfig	Data defines the storage-specific settings.	{ }
`nodes` KaiwoNodeConfig	Nodes defines the node configuration settings.	{ }
`scheduling` KaiwoSchedulingConfig	Scheduling contains the configuration Kaiwo uses for workload scheduling.	{ }
`resourceMonitoring` KaiwoResourceMonitoringConfig	ResourceMonitoring defines the resource-monitoring-specific settings.	{ }
`defaultClusterQueueName` string	DefaultClusterQueueName is the name of the default cluster queue that is used for workloads that don't explicitly specify a cluster queue.	kaiwo
`defaultClusterQueueCohortName` string	DefaultClusterQueueCohortName is the name of the default cohort that is used for the default cluster queue. ClusterQueues in the same cohort can share resources.	kaiwo
`dynamicallyUpdateDefaultClusterQueue` boolean	DynamicallyUpdateDefaultClusterQueue defines whether the Kaiwo operator should dynamically update the default "kaiwo" cluster queue. If set to true, the operator keeps the default cluster queue up to date so that it reflects the total resources available: when nodes are added or removed, the operator updates the queue to match the current state of the cluster.	false
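With `dynamicallyUpdateDefaultClusterQueue` enabled, the default cluster queue should track the cluster's total resources as nodes come and go. The Python sketch below shows only the aggregation step this implies: summing an integer-valued resource, such as the default GPU key, over dicts shaped like Kubernetes Node objects (`status.allocatable`). The function name is hypothetical, and Kaiwo's actual reconciliation logic and the ClusterQueue fields it writes are not described in this reference.

```python
def total_allocatable(nodes: list, resource_key: str = "amd.com/gpu") -> int:
    """Sum an integer-valued allocatable resource over all nodes.

    Nodes are plain dicts shaped like Kubernetes Node objects; a node that
    does not expose the resource contributes 0.
    """
    total = 0
    for node in nodes:
        allocatable = node.get("status", {}).get("allocatable", {})
        total += int(allocatable.get(resource_key, "0"))
    return total


nodes = [
    {"status": {"allocatable": {"amd.com/gpu": "8", "cpu": "64"}}},
    {"status": {"allocatable": {"cpu": "32"}}},  # CPU-only node
]
print(total_allocatable(nodes))  # 8
```

Adding or removing a GPU node changes this total, which is why the default queue has to be refreshed when the node set changes.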

KaiwoNodeConfig

Appears in:
- KaiwoConfigSpec

Field	Description	Default
`defaultGpuResourceKey` string	DefaultGpuResourceKey defines the default GPU resource key that is used to reserve GPU capacity for pods.	amd.com/gpu
`defaultGpuTaintKey` string	DefaultGpuTaintKey is the key that is used to taint GPU nodes.	kaiwo.silogen.ai/gpu
`excludeMasterNodesFromNodePools` boolean	ExcludeMasterNodesFromNodePools allows excluding the master node(s) from the node pools.	false
`addTaintsToGpuNodes` boolean	AddTaintsToGpuNodes, if set to true, adds the DefaultGpuTaintKey taint to GPU nodes.	false
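When `addTaintsToGpuNodes` is true, GPU nodes should end up carrying the `defaultGpuTaintKey` taint. The Python sketch below lists the nodes that would still need it, under the assumption that a GPU node is one whose `status.allocatable` reports a nonzero amount of `defaultGpuResourceKey`. The function name and the detection rule are illustrative, and the taint effect Kaiwo applies is not specified in this reference, so the code compares only the key.

```python
def nodes_needing_gpu_taint(
    nodes: list,
    gpu_key: str = "amd.com/gpu",
    taint_key: str = "kaiwo.silogen.ai/gpu",
) -> list:
    """Return names of GPU nodes that do not yet carry the GPU taint key.

    Assumption: a node counts as a GPU node if it has a nonzero allocatable
    amount of gpu_key. Only the taint key is compared, not its effect.
    """
    names = []
    for node in nodes:
        allocatable = node.get("status", {}).get("allocatable", {})
        if int(allocatable.get(gpu_key, "0")) == 0:
            continue  # not a GPU node
        taints = node.get("spec", {}).get("taints", [])
        if not any(t.get("key") == taint_key for t in taints):
            names.append(node["metadata"]["name"])
    return names
```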

KaiwoRayConfig

KaiwoRayConfig contains the Ray-specific configuration that Kaiwo uses.

Appears in:
- KaiwoConfigSpec

Field	Description	Default	Validation
`defaultRayImage` string	DefaultRayImage is the image that is used for Ray workloads if no image is provided in the workload CRD	ghcr.io/silogen/rocm-ray:6.4
`headPodMemory` string	HeadPodMemory is the amount of memory that is requested for the Ray head pod	16Gi
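The fallback rule for `defaultRayImage` and the use of `headPodMemory` as a memory request can be sketched as follows. The helper and its return shape (loosely modeled on a container's `image` and `resources.requests`) are hypothetical; only the default values come from the table above.

```python
DEFAULT_RAY_IMAGE = "ghcr.io/silogen/rocm-ray:6.4"
DEFAULT_HEAD_POD_MEMORY = "16Gi"


def ray_head_settings(workload_image=None, ray_config=None) -> dict:
    """Resolve the Ray head container image and memory request (illustrative)."""
    cfg = ray_config or {}
    # The workload's own image wins; otherwise fall back to defaultRayImage.
    image = workload_image or cfg.get("defaultRayImage", DEFAULT_RAY_IMAGE)
    # headPodMemory is the amount of memory requested for the head pod.
    memory = cfg.get("headPodMemory", DEFAULT_HEAD_POD_MEMORY)
    return {"image": image, "resources": {"requests": {"memory": memory}}}
```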

KaiwoResourceMonitoringConfig

KaiwoResourceMonitoringConfig configures the resource monitoring feature. Note that the following settings must be provided as environment variables to the Kaiwo controller manager, because they cannot be updated without restarting the operator process:

- Enabling the resource monitoring feature (RESOURCE_MONITORING_ENABLED=true)
- Setting the metrics endpoint (RESOURCE_MONITORING_METRICS_ENDPOINT=...)
- Setting the polling interval (RESOURCE_MONITORING_POLLING_INTERVAL=30s)
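Because these three settings live in the controller manager's environment rather than in KaiwoConfig, they are read from the process environment. A minimal Python sketch of that read follows; the helper name is hypothetical, and its parsing rules (for example, treating anything other than `true` as disabled) are assumptions rather than documented Kaiwo behavior.

```python
import os


def resource_monitoring_env(environ=None) -> dict:
    """Read the process-level resource monitoring settings (hypothetical helper).

    In this sketch the values are read once at startup, consistent with the
    note that they cannot change without restarting the operator. Unset or
    non-"true" values of RESOURCE_MONITORING_ENABLED count as disabled
    (an assumption).
    """
    env = os.environ if environ is None else environ
    return {
        "enabled": env.get("RESOURCE_MONITORING_ENABLED", "").strip().lower() == "true",
        "metrics_endpoint": env.get("RESOURCE_MONITORING_METRICS_ENDPOINT"),
        "polling_interval": env.get("RESOURCE_MONITORING_POLLING_INTERVAL"),
    }
```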

Appears in:
- KaiwoConfigSpec

Field	Description	Default	Validation
`lowUtilizationThreshold` float	LowUtilizationThreshold is the threshold below which a workload is considered underutilized. It is interpreted as the percentage utilization relative to the requested capacity.	1	Minimum: 0
`targetNamespaces` string array	TargetNamespaces is a list of namespaces to apply the monitoring to. If not supplied or empty, all namespaces except kube-system are inspected. Only pods associated with KaiwoJobs or KaiwoServices are affected.
`profile` string	Profile chooses the target resource to monitor.	gpu	Enum: [gpu]
`terminateUnderutilized` boolean	TerminateUnderutilized, if set to `true`, terminates workloads that are underutilizing resources.	false
`terminateUnderutilizedAfter` string	TerminateUnderutilizedAfter specifies how long a workload must have been underutilizing resources before it is terminated.	24h	Pattern: `^([0-9]+(s|m|h))+$`
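The `terminateUnderutilizedAfter` pattern accepts one or more integer-and-unit pairs such as `24h` or `1h30m`, with no fractions and no units other than `s`, `m`, and `h`. The Python sketch below converts such a value to seconds and combines it with the other fields into a termination check. It assumes utilization has already been measured as a percentage of the requested capacity and that the workload has been below the threshold for `underutilized_for_s` seconds; the function names are hypothetical and this is not Kaiwo's monitoring code.

```python
import re

# Same regular expression as the terminateUnderutilizedAfter validation.
DURATION_PATTERN = re.compile(r"^([0-9]+(s|m|h))+$")
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a duration such as "24h" or "1h30m" to seconds."""
    if not DURATION_PATTERN.fullmatch(value):
        raise ValueError(f"invalid duration: {value!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"([0-9]+)([smh])", value))


def should_terminate(utilization_pct: float, underutilized_for_s: int, config: dict) -> bool:
    """Decide whether an underutilized workload would be terminated (illustrative)."""
    if not config.get("terminateUnderutilized", False):
        return False  # termination is disabled by default
    if utilization_pct >= config.get("lowUtilizationThreshold", 1):
        return False  # at or above the threshold: not underutilized
    limit_s = parse_duration(config.get("terminateUnderutilizedAfter", "24h"))
    return underutilized_for_s >= limit_s
```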

KaiwoSchedulingConfig

KaiwoSchedulingConfig contains the configuration Kaiwo uses for workload scheduling.

Appears in:
- KaiwoConfigSpec

Field	Description	Default	Validation
`kubeSchedulerName` string	KubeSchedulerName defines the default scheduler name that is used to schedule workloads.	kaiwo-scheduler
`pendingThresholdForPreemption` string	PendingThresholdForPreemption is the threshold used to determine whether a workload is waiting for compute resources to become available. If a workload requests GPUs and has been pending for longer than this threshold, Kaiwo starts preempting workloads that have exceeded their duration deadline and are using GPUs from the same vendor as the pending workload.	5m
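The preemption rule described for `pendingThresholdForPreemption` can be read as three checks: the pending workload requests GPUs, it has been pending longer than the threshold, and each candidate has exceeded its duration deadline while using GPUs from the same vendor. The Python sketch below expresses those checks over hypothetical workload dicts with times in seconds, assuming a deadline is start time plus duration. How Kaiwo orders candidates or decides how many to preempt is not covered.

```python
def preemption_candidates(pending: dict, running: list, now_s: float, threshold_s: float = 300) -> list:
    """Return running workloads that could be preempted for a pending GPU workload.

    All dict keys are hypothetical. threshold_s=300 mirrors the 5m default
    of pendingThresholdForPreemption.
    """
    if pending["gpus"] == 0:
        return []  # only GPU-requesting workloads trigger preemption
    if now_s - pending["pending_since_s"] <= threshold_s:
        return []  # not pending long enough yet
    return [
        w for w in running
        if w["gpus"] > 0
        and w["gpu_vendor"] == pending["gpu_vendor"]
        # Assumed deadline: start time plus requested duration.
        and now_s > w["started_at_s"] + w["duration_s"]
    ]
```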

KaiwoStorageConfig

Appears in:
- KaiwoConfigSpec

Field	Description	Default
`defaultStorageClass` string	DefaultStorageClass is the storage class that is used for workloads that don't explicitly specify a storage class.
`defaultDataMountPath` string	DefaultDataMountPath is the default path at which data storage and downloads are mounted in workload pods. This value can be overridden in the workload CRD.	/workload
`defaultHfMountPath` string	DefaultHfMountPath is the default path at which the Hugging Face cache is mounted in workload pods. The `HF_HOME` environment variable is also set to this value. This value can be overridden in the workload CRD.	/hf_cache
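How a per-workload override interacts with these defaults can be sketched as follows. The helper, its parameters, and the return shape are hypothetical; the sketch also assumes that `HF_HOME` follows the effective Hugging Face mount path, including when the workload overrides it.

```python
def storage_mounts(storage_config=None, data_path=None, hf_path=None) -> dict:
    """Resolve data and Hugging Face mount paths for a workload pod (illustrative).

    data_path / hf_path stand in for per-workload overrides; when unset, the
    KaiwoStorageConfig values (or their documented defaults) are used.
    """
    cfg = storage_config or {}
    data_mount = data_path or cfg.get("defaultDataMountPath", "/workload")
    hf_mount = hf_path or cfg.get("defaultHfMountPath", "/hf_cache")
    return {
        "dataMountPath": data_mount,
        "hfMountPath": hf_mount,
        # Assumption: HF_HOME follows the effective HF mount path, including overrides.
        "env": [{"name": "HF_HOME", "value": hf_mount}],
    }
```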