Finetuning with LLaMA-Factory

This is a Helm chart for running a finetuning job using LLaMA-Factory.

The output is saved to MinIO in the directory specified by checkpointsRemote.

Configuration

Include any parameters for LLaMA-Factory in the llamaFactoryConfig parameter. See the override file overrides/finetune-lora.yaml for an example and the LLaMA-Factory documentation for more details.
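For illustration, a minimal override sketch might look like the following. The keys under llamaFactoryConfig are standard LLaMA-Factory training arguments, and the specific values shown here (as well as the checkpointsRemote path) are placeholder assumptions, not the contents of the shipped override file.

llamaFactoryConfig:
  stage: sft
  do_train: true
  finetuning_type: lora
  lora_target: all
  learning_rate: 1.0e-4
  num_train_epochs: 3.0

# Directory in the MinIO bucket where checkpoints are written (placeholder path)
checkpointsRemote: checkpoints/finetune-lora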

Running the workload

The simplest way to run the workload is to run helm template and pipe the result to kubectl create.

Example command using the example override file overrides/finetune-lora.yaml:

helm template workloads/llm-finetune-llama-factory/helm \
  --values workloads/llm-finetune-llama-factory/helm/overrides/finetune-lora.yaml \
  --name-template finetune-lora-llama-factory \
  | kubectl create -f -

Data specification

Specify the name of the dataset used for training as dataset. This can be a dataset predefined in LLaMA-Factory or one defined in datasetInfo. Use commas to separate multiple datasets.

To use other datasets, create an entry in datasetInfo following the LLaMA-Factory dataset info format. Note that LLaMA-Factory directly supports loading datasets from HuggingFace, ModelScope, or S3/GCS cloud storage by setting the URLs according to the documentation.

This workload adds a custom way to load data from MinIO. In datasetInfo specify the path to the dataset in the remote bucket as pathRemote, and the workload will load the file and update the configuration. See the override file overrides/finetune-model_data_from_minio.yaml for an example of finetuning where the data and model are loaded from MinIO.
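As an illustrative sketch, a custom dataset loaded from MinIO could be declared roughly as follows. The dataset name, file path, and column mapping are placeholder assumptions; the entry itself follows the LLaMA-Factory dataset info format, with pathRemote added by this workload.

dataset: my_minio_dataset

datasetInfo:
  my_minio_dataset:
    # Path to the dataset file inside the MinIO bucket (placeholder)
    pathRemote: datasets/my_dataset.json
    formatting: alpaca
    columns:
      prompt: instruction
      query: input
      response: output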

Model specification

To use a base model from HuggingFace or other source directly supported by LLaMA-Factory, specify the model name in modelName.

Alternatively, to use a model from MinIO, specify the path to the model in modelRemote.

Either modelName or modelRemote must be specified. If both are included, the model from modelRemote is used.
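For example (the model name and bucket path below are placeholders):

# Option 1: base model pulled from HuggingFace
modelName: meta-llama/Meta-Llama-3-8B-Instruct

# Option 2: base model loaded from MinIO; takes precedence if both are set
modelRemote: models/Meta-Llama-3-8B-Instruct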

Cleanup

After the jobs are completed, please delete the resources that were created. In particular, for multi-node ray jobs a PersistentVolumeClaim is used as shared storage and persists on the cluster after the job is completed.

To delete the resources, run the same helm template command, replacing kubectl create with kubectl delete, e.g.:

helm template workloads/llm-finetune-llama-factory/helm \
  --values workloads/llm-finetune-llama-factory/helm/overrides/finetune-lora.yaml \
  --name-template finetune-lora-llama-factory \
  | kubectl delete -f -

Multi-node finetuning with ray

The chart supports multi-node jobs: set nodes to an integer greater than 1, which enables ray and creates a RayJob instead. An example config is provided in overrides/finetune-lora-ray.yaml. The example also shows how to use DeepSpeed ZeRO Stage 2 to partition the gradients. To enable DeepSpeed, set the deepspeed parameter in the LLaMA-Factory config either to the path of one of the DeepSpeed configs included in LLaMA-Factory or to a configuration dictionary.

When configuring ray jobs, the resources you request (nodes and gpusPerNode) are passed to LLaMA-Factory automatically and do not need to be repeated in llamaFactoryConfig.
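As a rough sketch, a multi-node run with DeepSpeed ZeRO Stage 2 could be configured along these lines. The node and GPU counts are placeholders, and the deepspeed path is assumed to point at one of the ZeRO Stage 2 configs shipped with LLaMA-Factory; see overrides/finetune-lora-ray.yaml for the actual example.

nodes: 2
gpusPerNode: 4

llamaFactoryConfig:
  finetuning_type: lora
  # Assumed path to a ZeRO Stage 2 config bundled with LLaMA-Factory
  deepspeed: examples/deepspeed/ds_z2_config.json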

Limitations

unsloth and bitsandbytes are not installed in the currently used image, so any functionality using those libraries will not work.