# Pretrained LLM inference with Megatron-LM on MI300X
This Helm chart deploys the LLM Inference Megatron-LM workload.
## Prerequisites
The following prerequisites should be met before deploying this workload:

- Helm: Install `helm`. Refer to the Helm documentation for instructions.
- Secrets: A secret containing the S3 storage provider's HMAC credentials must be created in the namespace where the workload runs. By default, the secret is named `minio-credentials` and has the keys `minio-access-key` and `minio-secret-key` (see the example after this list).
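If you need to create the secret manually, a minimal sketch using `kubectl` is shown below; the credential values and namespace are placeholders to replace with your own.

```bash
# Create the default secret expected by the chart; replace the placeholder
# values and the namespace with your own.
kubectl create secret generic minio-credentials \
  --from-literal=minio-access-key=<your-access-key> \
  --from-literal=minio-secret-key=<your-secret-key> \
  -n <namespace>
```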
## Deploying the Workload
To deploy the workload, pipe the output of the `helm template` command to `kubectl apply`. The general form of the command is:

```bash
helm template [release-name] <helm-chart-dir> [-f <path/to/overrides/xyz.yaml>] [--set <name>=<value>] | kubectl apply -f - [-n namespace]
```

An example that uses the default Helm values and deploys the workload to the default namespace is shown below.
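For instance, assuming the chart directory is `llm-inference-megatron-lm/helm` under the repository root (adjust the path and release name to match your checkout):

```bash
# Render the chart with default values and apply it to the default namespace
helm template llm-inference-megatron-lm ./llm-inference-megatron-lm/helm | kubectl apply -f -
```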
## Interacting with the Deployed Model
### Verify Deployment
Check the deployment status:
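```bash
# Lists deployments in the current namespace; add -n <namespace> if the
# workload was deployed to a non-default namespace.
kubectl get deployments
```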
You should see a deployment with a name starting with the prefix `llm-inference-megatron-lm-` up and running.
To see the service deployed by the workload, run:
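```bash
# Lists services in the current namespace; add -n <namespace> if needed.
kubectl get services
```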
The service should have a name starting with `llm-inference-megatron-lm-`. Note the port exposed by the service; it is expected to be port `80`.
### Port Forwarding
Forward the service port to your local machine, e.g. to port `5000`. Assuming the service is named `llm-inference-megatron-lm-20250522-1521` and the exposed port is `80`, use the following command:
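```bash
# Forward local port 5000 to port 80 of the service named above
kubectl port-forward service/llm-inference-megatron-lm-20250522-1521 5000:80
```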
### Test the model inference service
Send a request to the service to get a reply from the model using the `curl` command:

```bash
curl -X PUT -H "Content-Type: application/json" -d '{"prompts": ["This is a test prompt."], "tokens_to_generate": 50}' http://localhost:5000/api
```
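The same request can also be sent from Python. This is a minimal sketch assuming the port-forward above is active and the `requests` package is installed:

```python
import requests

# Mirrors the curl request above: one prompt, 50 tokens to generate
response = requests.put(
    "http://localhost:5000/api",
    json={"prompts": ["This is a test prompt."], "tokens_to_generate": 50},
)
print(response.status_code)
print(response.json())
```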
Another way to interact with the inference service is to use the `test_manual.py` script located in the `llm-inference-megatron-lm/helm/mount` directory. This script prompts for multiple questions interactively. To run it, use the following command (assuming the repo root is your current directory):
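```bash
# Assumes the script targets the port-forwarded endpoint at localhost:5000;
# check the script itself for any required arguments.
python llm-inference-megatron-lm/helm/mount/test_manual.py
```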
You can also run a quick sanity check of the model's coherence using the `coherence.py` script, also located in the `llm-inference-megatron-lm/helm/mount` directory. The script contains multiple questions along with their expected answers; a point is awarded whenever the model's generated response matches the expected answer, and the total score gives a rough measure of the model's coherence. To run the evaluation, use the following command:
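```bash
# Assumes the evaluation script also targets the port-forwarded endpoint at
# localhost:5000; check the script itself for any required arguments.
python llm-inference-megatron-lm/helm/mount/coherence.py
```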