Maintenance Guide
This section outlines common maintenance tasks for the Kaiwo system.
Upgrading Kaiwo
- Check Release Notes: Before upgrading, review the release notes for the target version on the GitHub Releases page. Pay attention to any breaking changes, dependency updates, or manual migration steps required.
- Backup (Optional but Recommended): Consider backing up relevant configuration, especially your KaiwoQueueConfig resource (kubectl get kaiwoqueueconfig kaiwo -o yaml > kaiwoqueueconfig_backup.yaml).
- Apply New Manifest: Apply the install.yaml manifest for the new version using kubectl apply --server-side:

    export KAIWO_NEW_VERSION=vX.Y.Z # Set to the target version
    kubectl apply -f https://github.com/silogen/kaiwo/releases/download/${KAIWO_NEW_VERSION}/install.yaml --server-side

- Verify Upgrade:
  - Check that the Kaiwo operator pod restarts and runs the new image version.
  - Monitor the operator logs for errors during startup.
  - Ensure that the Kaiwo CRDs remain functional and that workloads continue to be processed.
- Upgrade Dependencies: If the Kaiwo release notes indicate required upgrades for dependencies (Kueue, Ray, Cert-Manager, etc.), perform those upgrades according to their respective documentation.
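The upgrade steps above can be wrapped in a small POSIX shell sketch. It assumes the operator runs in the kaiwo-system namespace (the namespace used elsewhere in this guide). The function names kaiwo_manifest_url and kaiwo_upgrade and the KUBECTL override variable are hypothetical helpers for illustration, not part of Kaiwo.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Kaiwo) sketching the upgrade steps.
# Assumption: the operator runs in the kaiwo-system namespace.
# Set KUBECTL=echo to print the kubectl commands instead of running them.

# Print the release manifest URL for a Kaiwo version such as v1.2.3.
kaiwo_manifest_url() {
  printf 'https://github.com/silogen/kaiwo/releases/download/%s/install.yaml\n' "$1"
}

# Back up the queue config, apply the new manifest server-side,
# then list operator pods with their images for verification.
kaiwo_upgrade() {
  version=$1
  backup_file=${2:-kaiwoqueueconfig_backup.yaml}
  kubectl_cmd=${KUBECTL:-kubectl}   # left unquoted below so it may carry flags

  # Reject anything that does not look like a vX.Y.Z release tag.
  case $version in
    v[0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "expected a version like vX.Y.Z, got '$version'" >&2; return 1 ;;
  esac

  # Back up the KaiwoQueueConfig named "kaiwo".
  $kubectl_cmd get kaiwoqueueconfig kaiwo -o yaml > "$backup_file" || return 1

  # Apply the new release manifest with server-side apply.
  $kubectl_cmd apply -f "$(kaiwo_manifest_url "$version")" --server-side || return 1

  # Show each pod name and its container images to confirm the new version.
  $kubectl_cmd get pods -n kaiwo-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}
```

For example, KUBECTL=echo kaiwo_upgrade "$KAIWO_NEW_VERSION" prints the commands without touching the cluster, while kaiwo_upgrade "$KAIWO_NEW_VERSION" runs them. Startup errors can then be inspected with kubectl logs -n kaiwo-system <operator-pod>, taking the pod name from the listing.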
Certificate Rotation
- Cert-Manager (Default): With the default Cert-Manager setup, webhook certificates are typically rotated automatically based on the Certificate resources created during installation. If issues arise, check the Cert-Manager logs and the certificate expiry (kubectl get certificates -n kaiwo-system).
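To act on the expiry check above, the sketch below lists each Certificate in kaiwo-system with its status.notAfter timestamp (a status field that cert-manager sets) and filters for certificates expiring before a cutoff. The helper names kaiwo_cert_expiry and certs_expiring_before are hypothetical. The filter relies on Kubernetes writing these timestamps as RFC 3339 strings in UTC, which sort chronologically as plain strings.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kaiwo or cert-manager).
# Set KUBECTL=echo to print the kubectl command instead of running it.

# Print "name<TAB>notAfter" for every Certificate in kaiwo-system.
kaiwo_cert_expiry() {
  ${KUBECTL:-kubectl} get certificates -n kaiwo-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.notAfter}{"\n"}{end}'
}

# Read "name<TAB>notAfter" lines on stdin and print the names whose
# notAfter is earlier than the cutoff given as $1 (RFC 3339, UTC, "Z").
# Certificates without a notAfter yet (not issued) are skipped.
certs_expiring_before() {
  awk -F '\t' -v cutoff="$1" '$2 != "" && $2 < cutoff { print $1 }'
}
```

For example, with GNU date: kaiwo_cert_expiry | certs_expiring_before "$(date -u -d '+14 days' +%Y-%m-%dT%H:%M:%SZ)". On macOS or BSD, use date -u -v+14d +%Y-%m-%dT%H:%M:%SZ for the cutoff. Any name printed is a certificate that has not been renewed as expected.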
Operator Pod Management
- Restarting: If the operator becomes unresponsive, restart it by deleting its pod. The Deployment automatically creates a new pod to replace it.
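- Scaling: The Kaiwo operator deployment typically runs a single replica. Leader election (--leader-elect=true) ensures that only one replica actively reconciles at a time, so additional replicas act as standbys rather than adding throughput. Scaling is therefore generally not required; if leader election is disabled (not recommended for production), do not run more than one replica.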
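A sketch of the two operator tasks above, assuming the operator runs in kaiwo-system. kaiwo_restart_operator_pod deletes a named pod so that the Deployment replaces it, and has_leader_election checks a list of container arguments for --leader-elect=true. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kaiwo).
# Assumption: the operator runs in the kaiwo-system namespace.
# Set KUBECTL=echo to print the kubectl command instead of running it.

# Delete the given operator pod; its Deployment then creates a replacement.
# Find the pod name with: kubectl get pods -n kaiwo-system
kaiwo_restart_operator_pod() {
  [ -n "$1" ] || { echo "usage: kaiwo_restart_operator_pod POD_NAME" >&2; return 1; }
  ${KUBECTL:-kubectl} delete pod "$1" -n kaiwo-system
}

# Succeed if any of the given container arguments is --leader-elect=true.
has_leader_election() {
  for arg in "$@"; do
    [ "$arg" = "--leader-elect=true" ] && return 0
  done
  return 1
}
```

To check a live deployment, pass its arguments unquoted so they split into words, assuming the manager is the first container: has_leader_election $(kubectl get deployment <operator-deployment> -n kaiwo-system -o jsonpath='{.spec.template.spec.containers[0].args[*]}'), where <operator-deployment> is a placeholder for the actual Deployment name. As an alternative to deleting the pod, kubectl rollout restart deployment/<operator-deployment> -n kaiwo-system performs a rolling restart.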
Cleaning Up Resources
- Workloads: Users can delete their workloads using kaiwo manage or kubectl delete kaiwojob <name> / kubectl delete kaiwoservice <name>. The Kaiwo operator's finalizers ensure that associated resources (such as the underlying Jobs or Deployments and any PVCs created by Kaiwo download jobs) are cleaned up.
- Kueue Resources: Resources managed by the KaiwoQueueConfig (ResourceFlavors, ClusterQueues, PriorityClasses) are typically deleted when they are removed from the spec of the kaiwo KaiwoQueueConfig.
- Uninstalling Kaiwo: To remove Kaiwo completely, delete the installation manifest, then clean up the CRDs and the namespace:

    # Replace vX.Y.Z with the installed version
    export KAIWO_VERSION=vX.Y.Z
    kubectl delete -f https://github.com/silogen/kaiwo/releases/download/${KAIWO_VERSION}/install.yaml

    # Delete CRDs (use with caution: this deletes ALL KaiwoJob, KaiwoService, and KaiwoQueueConfig resources)
    kubectl delete crd kaiwojobs.kaiwo.silogen.ai
    kubectl delete crd kaiwoservices.kaiwo.silogen.ai
    kubectl delete crd kaiwoqueueconfigs.kaiwo.silogen.ai
    # ... delete other Kaiwo CRDs if any

    # Delete the namespace
    kubectl delete namespace kaiwo-system

Remember to also uninstall dependencies if they are no longer needed.
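After uninstalling, you can check for leftover Kaiwo CRDs, which also covers the "other Kaiwo CRDs" placeholder in the uninstall commands. The sketch assumes every Kaiwo CRD belongs to the kaiwo.silogen.ai API group, as the three listed ones do. The function names kaiwo_crd_names and kaiwo_leftover_crds are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kaiwo).
# Assumption: every Kaiwo CRD is in the kaiwo.silogen.ai API group.
# Set KUBECTL=echo to print the kubectl command instead of running it.

# Read `kubectl get crd -o name` output on stdin, strip the resource-type
# prefix, and print only names ending in .kaiwo.silogen.ai.
kaiwo_crd_names() {
  sed -n 's|^customresourcedefinition\.apiextensions\.k8s\.io/||; /\.kaiwo\.silogen\.ai$/p'
}

# List any Kaiwo CRDs still present in the cluster.
kaiwo_leftover_crds() {
  ${KUBECTL:-kubectl} get crd -o name | kaiwo_crd_names
}
```

An empty result means no Kaiwo CRDs remain. To delete whatever is left, with the same caution as above, pipe the list into kubectl: kaiwo_leftover_crds | xargs -r kubectl delete crd (the -r flag, supported by GNU xargs, skips the run when the list is empty).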