# Finetuning with LLaMA-Factory
This is a Helm chart for running a finetuning job using LLaMA-Factory. The output is saved to MinIO in the directory specified by `checkpointsRemote`.
## Configuration
Include any parameters for LLaMA-Factory in the `llamaFactoryConfig` parameter. See the override file `overrides/finetune-lora.yaml` for an example, and the LLaMA-Factory documentation for more details.
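As a rough illustration, an override might look like the sketch below. The LLaMA-Factory parameter names (`stage`, `finetuning_type`, `lora_rank`, and so on) come from the LLaMA-Factory training configs, but the specific model, dataset, and hyperparameter values are placeholders, not the contents of the bundled override file.

```yaml
# Hypothetical override sketch; see overrides/finetune-lora.yaml for the real example.
modelName: meta-llama/Llama-3.1-8B-Instruct   # example base model (see Model specification)
dataset: alpaca_en_demo                       # example dataset (see Data specification)

llamaFactoryConfig:
  # Any LLaMA-Factory training parameters can be listed here.
  stage: sft
  finetuning_type: lora
  lora_rank: 8
  per_device_train_batch_size: 2
  learning_rate: 1.0e-4
  num_train_epochs: 3
```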
## Running the workload
The simplest way to run the workload is to run `helm template` and pipe the result to `kubectl create`.

Example command using the override file `overrides/finetune-lora.yaml`:
helm template workloads/llm-finetune-llama-factory/helm \
--values workloads/llm-finetune-llama-factory/helm/overrides/finetune-lora.yaml \
--name-template finetune-lora-llama-factory \
| kubectl create -f -
## Data specification
Specify the name of the dataset used for training as `dataset`. This can be a dataset predefined in LLaMA-Factory or one defined in `datasetInfo`. Use commas to separate multiple datasets.
To use other datasets, create an entry in `datasetInfo` following the LLaMA-Factory dataset info format. Note that LLaMA-Factory directly supports loading datasets from HuggingFace, ModelScope, or S3/GCS cloud storage by setting the URLs according to the documentation.
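For instance, an entry that pulls a dataset straight from HuggingFace might look like the sketch below. The keys inside the entry (`hf_hub_url`, `formatting`) follow the LLaMA-Factory dataset info format; the dataset name and repository are made-up examples.

```yaml
# Hypothetical sketch of a datasetInfo entry loaded from HuggingFace.
dataset: my_hf_dataset                 # use commas to train on several datasets,
                                       # e.g. "alpaca_en_demo,my_hf_dataset"
datasetInfo:
  my_hf_dataset:
    hf_hub_url: username/my-dataset    # example HuggingFace repository
    formatting: alpaca                 # or "sharegpt", per the dataset info format
```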
This workload adds a custom way to load data from MinIO. In `datasetInfo`, specify the path to the dataset in the remote bucket as `pathRemote`, and the workload will load the file and update the configuration. See the override file `overrides/finetune-model_data_from_minio.yaml` for an example of finetuning where the data and model are loaded from MinIO.
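A minimal sketch of such an entry, assuming the dataset file sits at `datasets/my-dataset.json` inside the remote bucket (the path and dataset name are illustrative only):

```yaml
# Hypothetical sketch of a datasetInfo entry loaded from MinIO via pathRemote.
dataset: my_minio_dataset
datasetInfo:
  my_minio_dataset:
    pathRemote: datasets/my-dataset.json   # path inside the MinIO bucket (assumed layout)
    formatting: alpaca                     # other dataset info fields can still be set
```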
## Model specification
To use a base model from HuggingFace or another source directly supported by LLaMA-Factory, specify the model name in `modelName`. Alternatively, to use a model from MinIO, specify the path to the model in `modelRemote`.
Either `modelName` or `modelRemote` must be specified. If both are included, the model from `modelRemote` is used.
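To make the two options concrete, a values sketch might look like this; the model name and the bucket path are placeholders, not values from the bundled overrides:

```yaml
# Option 1: base model pulled from HuggingFace (example name).
modelName: meta-llama/Llama-3.1-8B-Instruct

# Option 2: model stored in MinIO (assumed bucket layout).
# modelRemote: models/Llama-3.1-8B-Instruct

# If both were set, the model from modelRemote would take precedence.
```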
## Cleanup
After the jobs are completed, please delete the resources that were created. In particular, for multi-node Ray jobs a `PersistentVolumeClaim` is used as shared storage and persists on the cluster after the job completes.
To delete the resources, you can run the same `helm template` command, replacing `kubectl create` with `kubectl delete`, e.g.:
helm template workloads/llm-finetune-llama-factory/helm \
--values workloads/llm-finetune-llama-factory/helm/overrides/finetune-lora.yaml \
--name-template finetune-lora-llama-factory \
| kubectl delete -f -
## Multi-node finetuning with Ray
The chart supports multi-node jobs: set `nodes` to an integer greater than 1. Doing so enables Ray and creates a RayJob instead. An example config is provided in `overrides/finetune-lora-ray.yaml`. The example also shows how to use DeepSpeed ZeRO Stage 2 to partition the gradients and optimizer states. To enable DeepSpeed, set the `deepspeed` parameter in the LLaMA-Factory config to point either to one of the DeepSpeed configs included in LLaMA-Factory or to a dictionary containing the DeepSpeed settings.
When configuring Ray jobs, the resources you are requesting (`nodes` and `gpusPerNode`) are automatically specified for LLaMA-Factory and do not need to be included separately in the `llamaFactoryConfig`.
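A minimal sketch of a multi-node override, assuming two nodes with four GPUs each; the DeepSpeed config path shown is the ZeRO Stage 2 example shipped in the LLaMA-Factory repository, but check `overrides/finetune-lora-ray.yaml` for the path that actually applies inside the image:

```yaml
# Hypothetical multi-node sketch; resource requests are propagated to LLaMA-Factory.
nodes: 2
gpusPerNode: 4

llamaFactoryConfig:
  # Point at a DeepSpeed ZeRO-2 config (or supply the settings as a dictionary).
  deepspeed: examples/deepspeed/ds_z2_config.json
```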
## Limitations
`unsloth` and `bitsandbytes` are not installed in the currently used image, so any functionality using those libraries will not work.