Dear Concrete ML Team,
I am exploring the use of Concrete ML for encrypted fine-tuning of a PyTorch model using LoRA layers. My goal is to train only the LoRA layers under FHE while keeping the rest of the model frozen. I have two specific questions:
- Quantisation and LoRA Layers:
I plan to quantise the model (including the LoRA layers) post-training. This raises the question of how to obtain quantisation parameters for the LoRA layers: if calibration is performed on pre-training data (task 1) with the LoRA layers deactivated, what is the recommended approach to ensure the quantisation parameters for the LoRA layers are accurate when they are used during fine-tuning (task 2)? A minimal sketch of what I mean is included after the second question below.
- Fine-Tuning Setup with FHE:
Instead of using the HybridFHEModel, I would like to set up a system with FHEModelDev, FHEModelClient, and FHEModelServer. The workflow I envision involves:
• The client encrypting the input data and sending it to the server.
• The server fine-tuning only the LoRA layers under FHE.
Is this workflow feasible with the LoRA layers of my PyTorch model as described? If so, are there any key considerations or limitations I should be aware of when implementing this setup? A sketch of the client/server flow I have in mind follows below.
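To make the first question concrete, here is a minimal sketch of the kind of setup I have in mind. The LoraLinear class, the toy dimensions, the bit-width, and the calibration data are purely illustrative, and the compile_torch_model call reflects my current understanding of Concrete ML's post-training-quantisation API (it may well not be the right way to do this, which is partly why I am asking):

```python
import torch
import torch.nn as nn
from concrete.ml.torch.compile import compile_torch_model


class LoraLinear(nn.Module):
    """Illustrative LoRA wrapper: a frozen base layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # the pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # standard LoRA init: the update starts at zero
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


# A toy "pretrained" model whose first linear layer is wrapped with LoRA.
model = nn.Sequential(LoraLinear(nn.Linear(16, 8)), nn.ReLU(), nn.Linear(8, 2))

# Post-training quantisation: as I understand it, compile_torch_model calibrates
# the quantisers on the input set it is given. If that input set comes only from
# task 1 (and lora_b is still zero), I do not see how the LoRA branch can receive
# meaningful quantisation parameters for task 2.
calibration_data = torch.randn(100, 16)  # placeholder for task-1 data
quantized_module = compile_torch_model(model, calibration_data, n_bits=4)
```

With lora_b initialised to zero, the LoRA branch produces all-zero activations on the task-1 calibration data, which is exactly why I am unsure how accurate quantisation parameters for it can be derived at this stage.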
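For the second question, the flow I have in mind mirrors the standard Concrete ML deployment pattern, sketched below for inference only. The directory names are placeholders, quantized_module is the object returned by compile_torch_model in the previous sketch, and I am not certain FHEModelDev accepts that object directly; the calls reflect my reading of the deployment API rather than something I have verified end to end:

```python
import numpy as np
from concrete.ml.deployment import FHEModelDev, FHEModelClient, FHEModelServer

# Developer side: package the compiled model (quantized_module from the previous sketch).
# I am not sure this object is accepted here as-is; this is part of my question.
FHEModelDev(path_dir="deployment", model=quantized_module).save()

# Client side: generate keys, then quantise, encrypt, and serialise an input.
client = FHEModelClient(path_dir="deployment", key_dir="keys")
client.generate_private_and_evaluation_keys()
serialized_evaluation_keys = client.get_serialized_evaluation_keys()
x_clear = np.random.randn(1, 16).astype(np.float32)  # placeholder input
encrypted_input = client.quantize_encrypt_serialize(x_clear)

# Server side: evaluate on the encrypted data under FHE.
server = FHEModelServer(path_dir="deployment")
server.load()
encrypted_result = server.run(encrypted_input, serialized_evaluation_keys)

# Client side: decrypt and dequantise the result.
result = client.deserialize_decrypt_dequantize(encrypted_result)
```

My understanding is that this flow covers encrypted inference only; what I would like to know is whether the same Dev/Client/Server split can be used when the server's job is to update only the LoRA layers under FHE rather than run a forward pass.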
Thank you for your time and kind guidance.
Best regards,
Hamid