Metrics Evaluation Workloads¶
This Helm chart implements evaluation of LLMs using the BERTScore metric, comparing generated answers to a gold standard.
The necessary Kubernetes and Helm files are stored in /workloads/llm-evaluation-metrics/helm, while the evaluation package source code and Docker image build files are stored in /docker/llm-evaluation.
Helm and Kubernetes files¶
The Helm templates are stored in /workloads/llm-evaluation-metrics/helm/templates, the main workload template being metrics_evaluation_template_with_download.yaml. Default parameters are defined in values.yaml, and user-defined configurations are stored in /overrides. We have included a few example override files for typical use cases.
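As a quick check, you can render the chart locally with one of these overrides to preview the manifests it produces (this is only a sketch: the release name "metrics-eval" is arbitrary, the override filename is a placeholder, and the command assumes it is run from the repository root; see Running below for applying the output to the cluster):

```bash
# Render the chart with an example override and print the generated manifests.
# Adjust the -f path to point at one of the files in the overrides directory.
helm template metrics-eval ./workloads/llm-evaluation-metrics/helm \
  -f overrides/<your-override>.yaml
```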
A few extra resources are defined in templates/.
We use a ConfigMap (templates/configmap.yaml) to mount files directly into the workload when it runs. Anything stored in the mount/ directory will be mounted.
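A minimal sketch of how such a ConfigMap is typically populated from a chart's mount/ directory, assuming the standard Helm .Files.Glob pattern, is shown below; the actual templates/configmap.yaml in this chart may differ in naming and structure:

```yaml
# Sketch only: bundles every file under mount/ into the ConfigMap,
# keyed by filename. The resource name here is illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-mounted-files
data:
{{ (.Files.Glob "mount/*").AsConfig | indent 2 }}
```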
Docker Container¶
We define an associated evaluation package in /docker/llm-evaluation. This contains code to call the inference container and subsequently run the BERTScore metric evaluation, writing results to MinIO storage.
This package is installed into a Docker image, which can be used to run the evaluation container in the Helm template. We use a Makefile to push new images to our GitHub registry. See /docker/llm-evaluation/README.md for more details.
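The Makefile wraps the usual image build and push steps; a rough manual equivalent is sketched below (the image name, organisation, and tag are placeholders, and the actual targets and image name are defined in the Makefile and described in /docker/llm-evaluation/README.md):

```bash
# Build the evaluation image from the package sources and push it to the registry.
# ghcr.io/<org>/llm-evaluation:<tag> is a placeholder, not the real image name.
docker build -t ghcr.io/<org>/llm-evaluation:<tag> ./docker/llm-evaluation
docker push ghcr.io/<org>/llm-evaluation:<tag>
```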
Running¶
To run this evaluation workload with Helm, render the chart with the helm template command and pipe the output to kubectl apply, for example (the release name and override filename below are placeholders):
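```bash
# Render the chart with your chosen override file and apply the result to the cluster.
# Run from the repository root; adjust the -f path to your override file.
helm template metrics-eval ./workloads/llm-evaluation-metrics/helm \
  -f overrides/<your-override>.yaml \
  | kubectl apply -f -
```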