Deploying PEFT models with the Python client library
To deploy a PEFT model with the watsonx.ai Python client library, first create and deploy the base foundation model asset, and then create and deploy the LoRA adapter model asset. The deployed adapter serves the fine-tuned model for online inferencing, enabling real-time predictions.
Before you begin
- Review requirements for deploying PEFT models, including supported models, hardware and software requirements, and deployment types. For more information, see Requirements for deploying PEFT models with REST API.
- You must authenticate by generating and entering your API key.
- You must enable the auto_update_model option during training so that an asset for the LoRA adapter model is created in the watsonx.ai Runtime repository.
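The samples in this topic assume an authenticated Python client. A minimal setup sketch follows; the endpoint URL, API key, and project ID are placeholders that you must replace with your own values:

```python
# Placeholder credentials; the region URL, API key, and project ID below are
# illustrative values, not defaults.
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "YOUR_API_KEY",
}

# With the ibm_watsonx_ai package installed, create the client and set the
# default project that the assets and deployments belong to:
# from ibm_watsonx_ai import APIClient
# client = APIClient(credentials)
# client.set.default_project("YOUR_PROJECT_ID")
```

The client object created this way is referenced as client in the code samples that follow.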
Creating the base foundation model asset
Use the watsonx.ai Python client library to create a watsonx.ai Runtime model asset with the fine-tuned model details.
The following code sample shows how to create the base foundation model asset for a supported foundation model:
sw_spec_id = client.software_specifications.get_id_by_name('watsonx-cfm-caikit-1.1')
metadata = {
client.repository.ModelMetaNames.NAME: 'Base FT model',
client.repository.ModelMetaNames.SOFTWARE_SPEC_ID: sw_spec_id,
client.repository.ModelMetaNames.TYPE: "base_foundation_model_1.0"
}
stored_base_model_details = client.repository.store_model(model="meta-llama/llama-3-1-8b", meta_props=metadata)
stored_base_model_asset_id = client.repository.get_model_id(stored_base_model_details)
stored_base_model_asset_id
Deploying the base foundation model asset
When you create an online deployment for your base foundation model, you must set the enable_lora parameter to true in the JSON payload so that you can deploy the LoRA or QLoRA adapters by using the base foundation model.
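The relevant fragment of that configuration can be sketched as a Python dictionary; the key names mirror the online deployment configuration used in the deployment step below:

```python
# Sketch of the online-deployment configuration fragment that enables LoRA
# adapters on the base foundation model deployment.
online_configuration = {
    "online": {
        "parameters": {
            "foundation_model": {
                "enable_lora": True,
            },
        },
    },
}
```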
Follow these steps to create an online deployment for the base foundation model asset with the Python client library:
-
Retrieve the hardware specification for your base foundation model asset:
hw_spec_id = client.hardware_specifications.get_id_by_name("WX-S")
-
Deploy the base foundation model by using the base foundation model asset:
from datetime import datetime

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "FT DEPLOYMENT SDK - project Lora",
    client.deployments.ConfigurationMetaNames.ONLINE: {
        "parameters": {
            "foundation_model": {
                "enable_lora": True,
            },
        },
    },
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"ft_sdk_deployment_{datetime.now().strftime('%Y_%m_%d_%H%M%S')}",
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
        "id": hw_spec_id,
        "num_nodes": 1
    },
}

base_model_deployment_details = client.deployments.create(stored_base_model_asset_id, meta_props)
-
Retrieve the base_model_deployment_id from the deployment details:
base_model_deployment_id = base_model_deployment_details['metadata']['id']
base_model_deployment_id
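A newly created deployment is not necessarily ready immediately. A hypothetical helper for checking the state is sketched below; the entity → status → state path is an assumption about the shape of the deployment-details dictionary, so verify it against the output of client.deployments.get_details for your deployment:

```python
# Hypothetical helper: report whether a deployment-details dictionary shows a
# ready deployment. The 'entity' -> 'status' -> 'state' path is an assumption;
# confirm it against the details returned for your own deployment.
def is_deployment_ready(deployment_details):
    state = deployment_details.get("entity", {}).get("status", {}).get("state")
    return state == "ready"

# Illustrative details dictionary, not actual service output:
sample_details = {"entity": {"status": {"state": "ready"}}}
print(is_deployment_ready(sample_details))
```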
Deploying the LoRA or QLoRA adapter model asset
Use the deployed base foundation model to deploy the LoRA adapters as an additional layer on the base foundation model.
Follow these steps to deploy the LoRA or QLoRA adapter model:
-
Create the LoRA or QLoRA deployment by providing the LoRA or QLoRA adapter model asset and the deployment ID of the base foundation model:
from datetime import datetime

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "LORA ADAPTER DEPLOYMENT SDK",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"lora_deployment_{datetime.now().strftime('%Y_%m_%d_%H%M%S')}",
    client.deployments.ConfigurationMetaNames.BASE_DEPLOYMENT_ID: base_model_deployment_id,
}

lora_adapter_deployment_details = client.deployments.create(tuned_model_id, meta_props)
-
Retrieve the lora_adapter_deployment_id from the deployment details:
lora_adapter_deployment_id = lora_adapter_deployment_details['metadata']['id']
lora_adapter_deployment_id
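After the adapter deployment is ready, you can query it for online inferencing. The sketch below is a minimal example; the generation parameter values are illustrative, and the ModelInference usage is an assumption to verify against the ibm_watsonx_ai documentation:

```python
# Illustrative generation parameters for an online inference request; the
# values here are examples, not recommended defaults.
generation_params = {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
}

# Querying the LoRA adapter deployment (assumes ibm_watsonx_ai is installed and
# that `client` and `lora_adapter_deployment_id` are defined as in the steps above):
# from ibm_watsonx_ai.foundation_models import ModelInference
# deployed_model = ModelInference(
#     deployment_id=lora_adapter_deployment_id,
#     api_client=client,
# )
# print(deployed_model.generate_text(prompt="Your prompt here", params=generation_params))
```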
Parent topic: Deploying Parameter-Efficient Fine-Tuned (PEFT) models