Finetuning with VeRL

This is a Helm chart for running a finetuning job with VeRL.

The output is saved to MinIO under the directory specified by checkpointsRemote.

Configuration

Pass any VeRL parameters through the verlConfig parameter. See the override file overrides/ppo_qwen_gsm8k.yaml for an example, and the VeRL documentation for more details.
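
For reference, a minimal values override might look like the sketch below. The keys under verlConfig follow VeRL's own configuration schema (data.*, trainer.*, and so on); the specific keys and values shown here are illustrative assumptions, not a copy of the bundled override file.

verlConfig:
  data:
    train_batch_size: 256      # illustrative; tune for your hardware
    max_prompt_length: 512
    max_response_length: 512
  trainer:
    total_epochs: 1
    n_gpus_per_node: 8
checkpointsRemote: checkpoints/ppo-qwen-gsm8k   # MinIO output directory (illustrative path)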

Running the workload

The simplest way to run the workload is to run helm template and pipe the result to kubectl create.

Example command using the example override file overrides/ppo_qwen_gsm8k.yaml:

helm template workloads/llm-finetune-verl/helm \
  --values workloads/llm-finetune-verl/helm/overrides/ppo_qwen_gsm8k.yaml \
  --name-template ppo-qwen-gsm8k-verl \
  | kubectl create -f -
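
Once the resources are created, you can check on the run with standard kubectl commands. The resource name below assumes the job is named after the release given in --name-template; the actual name depends on the chart's templates.

kubectl get jobs
kubectl logs -f job/ppo-qwen-gsm8k-verl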

Data specification

VeRL requires the training data to be prepared in a specific format for policy training.

Some example data preprocessing scripts are provided. To use one of them, set dataset to the name of the dataset used for training. Available datasets are "full_hh_rlhf", "geo3k", "gsm8k", "hellaswag", and "math_dataset".

To use your own dataset from MinIO, specify its path as datasetRemote. It should point to a directory containing files that have already been appropriately processed (train.parquet and test.parquet).
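
Putting the two options together, the relevant part of a values file might look like the following sketch. Set only one of the two keys; the MinIO path is an illustrative assumption.

# Option 1: use a bundled preprocessing script for a named dataset
dataset: gsm8k

# Option 2: point at already-processed parquet files in MinIO
# (the directory must contain train.parquet and test.parquet)
datasetRemote: data/gsm8k-processed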

Model specification

To use a base model from HuggingFace or another source directly supported by VeRL, specify the model name in modelName.

Alternatively, to use a model from MinIO, specify the path to the model in modelRemote.

Either modelName or modelRemote must be specified. If both are included, the model from modelRemote is used.
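
As a sketch, the corresponding values might look like this (the HuggingFace model name and MinIO path are illustrative; recall that modelRemote takes precedence if both are set):

# Option 1: a base model pulled by name from HuggingFace
modelName: Qwen/Qwen2.5-0.5B-Instruct

# Option 2: a model stored in MinIO
modelRemote: models/qwen2.5-0.5b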

Cleanup

After the jobs have completed, please delete the resources that were created. To do so, run the same helm template command as before, replacing kubectl create with kubectl delete, e.g.:

helm template workloads/llm-finetune-verl/helm \
  --values workloads/llm-finetune-verl/helm/overrides/ppo_qwen_gsm8k.yaml \
  --name-template ppo-qwen-gsm8k-verl \
  | kubectl delete -f -