Dear Concrete ML Team,
I am exploring the use of Concrete ML for encrypted fine-tuning of a PyTorch model using LoRA layers. My goal is to train only the LoRA layers under FHE while keeping the rest of the model frozen. I have two specific questions:
- Quantisation and LoRA Layers:
I plan to quantise the model (including the LoRA layers) post-training. This raises the question of how to obtain quantisation parameters for the LoRA layers: if calibration is performed on pre-training data (task 1) with the LoRA layers deactivated, what is the recommended approach to ensure the quantisation parameters for the LoRA layers are accurate when they are used during fine-tuning (task 2)? A minimal sketch of what I mean is included after the second question below.
- Fine-Tuning Setup with FHE:
Instead of using the HybridFHEModel, I would like to set up a system with FHEModelDev, FHEModelClient, and FHEModelServer. The workflow I envision involves:
• The client encrypting the input data and sending it to the server.
• The server fine-tuning only the LoRA layers under FHE.
Is this workflow feasible with the LoRA layers of my PyTorch model as described? If so, are there any key considerations or limitations I should be aware of when implementing this setup? A sketch of the client/server flow I have in mind follows below.
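To make the first question concrete, here is a minimal sketch of the kind of setup I have in mind. The LoraLinear class, the toy dimensions, the bit-width, and the calibration data are purely illustrative, and the compile_torch_model call reflects my current understanding of Concrete ML's post-training-quantisation API (it may well not be the right way to do this, which is partly why I am asking):

```python
import torch
import torch.nn as nn
from concrete.ml.torch.compile import compile_torch_model


class LoraLinear(nn.Module):
    """Illustrative LoRA wrapper: a frozen base layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # the pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # standard LoRA init: the update starts at zero
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


# A toy "pretrained" model whose first linear layer is wrapped with LoRA.
model = nn.Sequential(LoraLinear(nn.Linear(16, 8)), nn.ReLU(), nn.Linear(8, 2))

# Post-training quantisation: as I understand it, compile_torch_model calibrates
# the quantisers on the input set it is given. If that input set comes only from
# task 1 (and lora_b is still zero), I do not see how the LoRA branch can receive
# meaningful quantisation parameters for task 2.
calibration_data = torch.randn(100, 16)  # placeholder for task-1 data
quantized_module = compile_torch_model(model, calibration_data, n_bits=4)
```

With lora_b initialised to zero, the LoRA branch produces all-zero activations on the task-1 calibration data, which is exactly why I am unsure how accurate quantisation parameters for it can be derived at this stage.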
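For the second question, the flow I have in mind mirrors the standard Concrete ML deployment pattern, sketched below for inference only. The directory names are placeholders, quantized_module is the object returned by compile_torch_model in the previous sketch, and I am not certain FHEModelDev accepts that object directly; the calls reflect my reading of the deployment API rather than something I have verified end to end:

```python
import numpy as np
from concrete.ml.deployment import FHEModelDev, FHEModelClient, FHEModelServer

# Developer side: package the compiled model (quantized_module from the previous sketch).
# I am not sure this object is accepted here as-is; this is part of my question.
FHEModelDev(path_dir="deployment", model=quantized_module).save()

# Client side: generate keys, then quantise, encrypt, and serialise an input.
client = FHEModelClient(path_dir="deployment", key_dir="keys")
client.generate_private_and_evaluation_keys()
serialized_evaluation_keys = client.get_serialized_evaluation_keys()
x_clear = np.random.randn(1, 16).astype(np.float32)  # placeholder input
encrypted_input = client.quantize_encrypt_serialize(x_clear)

# Server side: evaluate on the encrypted data under FHE.
server = FHEModelServer(path_dir="deployment")
server.load()
encrypted_result = server.run(encrypted_input, serialized_evaluation_keys)

# Client side: decrypt and dequantise the result.
result = client.deserialize_decrypt_dequantize(encrypted_result)
```

My understanding is that this flow covers encrypted inference only; what I would like to know is whether the same Dev/Client/Server split can be used when the server's job is to update only the LoRA layers under FHE rather than run a forward pass.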
Thank you for your time and kind guidance.
Best regards,
Hamid