Deploying PEFT models with the Python client library
To deploy a PEFT model with the watsonx.ai Python client library, first create and deploy the base foundation model asset, and then create and deploy the LoRA adapter model asset. The deployed adapter serves the fine-tuned model for online inferencing, enabling real-time predictions.
Before you begin
- Review requirements for deploying PEFT models, including supported models, hardware and software requirements, and deployment types. For more information, see Requirements for deploying PEFT models with REST API.
- You must authenticate by generating and entering your API key.
- You must enable the auto_update_model option during training so that an asset for the LoRA adapter model is created in the watsonx.ai Runtime repository.
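The samples in this topic assume an authenticated Python client. A minimal setup sketch follows; the endpoint URL, API key, and project ID are placeholders that you must replace with your own values:

```python
# Placeholder credentials; the region URL, API key, and project ID below are
# illustrative values, not defaults.
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "YOUR_API_KEY",
}

# With the ibm_watsonx_ai package installed, create the client and set the
# default project that the assets and deployments belong to:
# from ibm_watsonx_ai import APIClient
# client = APIClient(credentials)
# client.set.default_project("YOUR_PROJECT_ID")
```

The client object created this way is referenced as client in the code samples that follow.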
Creating the base foundation model asset
Use the watsonx.ai Python client library to create a watsonx.ai Runtime model asset with the fine-tuned model details.
The following code sample shows how to create the base foundation model asset for a supported foundation model:
sw_spec_id = client.software_specifications.get_id_by_name('watsonx-cfm-caikit-1.1')
metadata = {
client.repository.ModelMetaNames.NAME: 'Base FT model',
client.repository.ModelMetaNames.SOFTWARE_SPEC_ID: sw_spec_id,
client.repository.ModelMetaNames.TYPE: "base_foundation_model_1.0"
}
stored_base_model_details = client.repository.store_model(model="meta-llama/llama-3-1-8b", meta_props=metadata)
stored_base_model_asset_id = client.repository.get_model_id(stored_base_model_details)
stored_base_model_asset_id
Deploying the base foundation model asset
When you create an online deployment for your base foundation model, you must set the enable_lora parameter to true in the JSON payload so that you can deploy the LoRA or QLoRA adapters by using the base foundation model.
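The relevant fragment of that configuration can be sketched as a Python dictionary; the key names mirror the online deployment configuration used in the deployment step below:

```python
# Sketch of the online-deployment configuration fragment that enables LoRA
# adapters on the base foundation model deployment.
online_configuration = {
    "online": {
        "parameters": {
            "foundation_model": {
                "enable_lora": True,
            },
        },
    },
}
```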
Follow these steps to create an online deployment for the base foundation model asset with the Python client library:
-
Retrieve the hardware specification for your base foundation model asset:
hw_spec_id = client.hardware_specifications.get_id_by_name("WX-S")
-
Deploy the base foundation model by using the base foundation model asset:
from datetime import datetime

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "FT DEPLOYMENT SDK - project Lora",
    client.deployments.ConfigurationMetaNames.ONLINE: {
        "parameters": {
            "foundation_model": {
                "enable_lora": True,
            },
        },
    },
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"ft_sdk_deployment_{datetime.now().strftime('%Y_%m_%d_%H%M%S')}",
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
        "id": hw_spec_id,
        "num_nodes": 1
    },
}

base_model_deployment_details = client.deployments.create(stored_base_model_asset_id, meta_props)
-
Retrieve the base_model_deployment_id from the deployment details:
base_model_deployment_id = base_model_deployment_details['metadata']['id']
base_model_deployment_id
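A newly created deployment is not necessarily ready immediately. A hypothetical helper for checking the state is sketched below; the entity → status → state path is an assumption about the shape of the deployment-details dictionary, so verify it against the output of client.deployments.get_details for your deployment:

```python
# Hypothetical helper: report whether a deployment-details dictionary shows a
# ready deployment. The 'entity' -> 'status' -> 'state' path is an assumption;
# confirm it against the details returned for your own deployment.
def is_deployment_ready(deployment_details):
    state = deployment_details.get("entity", {}).get("status", {}).get("state")
    return state == "ready"

# Illustrative details dictionary, not actual service output:
sample_details = {"entity": {"status": {"state": "ready"}}}
print(is_deployment_ready(sample_details))
```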
Deploying the LoRA or QLoRA adapter model asset
Use the deployed base foundation model to deploy the LoRA adapters as an additional layer on the base foundation model.
Follow these steps to deploy the LoRA or QLoRA adapter model:
-
Create the LoRA or QLoRA deployment by providing the LoRA or QLoRA adapter model asset and the deployment ID of the base foundation model:
from datetime import datetime

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "LORA ADAPTER DEPLOYMENT SDK",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"lora_deployment_{datetime.now().strftime('%Y_%m_%d_%H%M%S')}",
    client.deployments.ConfigurationMetaNames.BASE_DEPLOYMENT_ID: base_model_deployment_id,
}

lora_adapter_deployment_details = client.deployments.create(tuned_model_id, meta_props)
-
Retrieve the lora_adapter_deployment_id from the deployment details:
lora_adapter_deployment_id = lora_adapter_deployment_details['metadata']['id']
lora_adapter_deployment_id
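After the adapter deployment is ready, you can query it for online inferencing. The sketch below is a minimal example; the generation parameter values are illustrative, and the ModelInference usage is an assumption to verify against the ibm_watsonx_ai documentation:

```python
# Illustrative generation parameters for an online inference request; the
# values here are examples, not recommended defaults.
generation_params = {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
}

# Querying the LoRA adapter deployment (assumes ibm_watsonx_ai is installed and
# that `client` and `lora_adapter_deployment_id` are defined as in the steps above):
# from ibm_watsonx_ai.foundation_models import ModelInference
# deployed_model = ModelInference(
#     deployment_id=lora_adapter_deployment_id,
#     api_client=client,
# )
# print(deployed_model.generate_text(prompt="Your prompt here", params=generation_params))
```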
Parent topic: Deploying Parameter-Efficient Fine-Tuned (PEFT) models