Pretrained LLM inference with Megatron-LM on MI300X
This Helm chart deploys the LLM Inference Megatron-LM workload.
Prerequisites
The following prerequisites should be met before deploying this workload:
- Helm: Install helm. Refer to the Helm documentation for instructions.
- Secrets: A secret containing the S3 storage provider's HMAC credentials must exist in the namespace where the workload runs. The default secret is named minio-credentials, with the keys minio-access-key and minio-secret-key. An example of creating it is shown after this list.
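For example, the default secret can be created with kubectl before deploying the chart. This is a sketch with placeholder values; substitute your storage provider's HMAC key pair and the target namespace:

kubectl create secret generic minio-credentials --from-literal=minio-access-key=<access-key> --from-literal=minio-secret-key=<secret-key> [-n <namespace>]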
Deploying the Workload
To deploy the workload, pipe the output of the helm template command to kubectl apply. In general, the full command looks as follows:
helm template [release-name] <helm-chart-dir> [-f <path/to/overrides/xyz.yaml>] [--set <name>=<value>] | kubectl apply -f - [-n namespace]
An example command that uses the default Helm values and deploys the workload to the default namespace is shown below.
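This is a minimal sketch, assuming the chart directory is llm-inference-megatron-lm/helm (the directory containing the mount folder referenced below) and an example release name of my-release:

helm template my-release llm-inference-megatron-lm/helm | kubectl apply -f -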
Interacting with the Deployed Model
Verify Deployment
Check the deployment status:
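kubectl get deployments [-n <namespace>]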
You should see a deployment with a name starting with the prefix llm-inference-megatron-lm- up and running.
To see the service deployed by the workload, run:
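kubectl get services [-n <namespace>]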
The service should have a name starting with llm-inference-megatron-lm-. Note the port exposed by the service; it is expected to be port 80.
Port Forwarding
Forward the service port to your local machine, for example to port 5000. Assuming the service is named llm-inference-megatron-lm-20250522-1521 and the exposed port is 80, use the following command:
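kubectl port-forward service/llm-inference-megatron-lm-20250522-1521 5000:80 [-n <namespace>]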
Test the Model Inference Service
Send a request to the service to get a reply from the model using the curl command:
curl -X PUT -H "Content-Type: application/json" -d '{"prompts": ["This is a test prompt."], "tokens_to_generate": 50}' http://localhost:5000/api
Another way to interact with the inference service is to use the test_manual.py script located in the llm-inference-megatron-lm/helm/mount directory. This script interactively prompts you for multiple questions. To run it, use the following command (assuming the repo root is your current directory):
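This is a sketch of the invocation, assuming the script needs no additional arguments; check the script itself for any endpoint or port options it expects (it is assumed here to target the port forwarded in the previous step):

python llm-inference-megatron-lm/helm/mount/test_manual.py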
You can also run a quick sanity check to evaluate the coherence of the model by using the coherence.py script located in the llm-inference-megatron-lm/helm/mount directory. This file contains multiple questions along with their corresponding expected answers. When the model's generated response matches the expected answer, a point is awarded. At the end, the user can assess the model's coherence performance based on the total score. To run the evaluation, use the following command:
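As with test_manual.py, this is a sketch that assumes the script requires no additional arguments:

python llm-inference-megatron-lm/helm/mount/coherence.py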