Dear Support,
Since this is the very first time I am writing here, I would first of all like to thank you for the great work you have all done.
I am using ConcreteXGBClassifier, defined as follows:
from concrete.ml.sklearn import XGBClassifier as ConcreteXGBClassifier
without any problems during training.
Now I need to share the model, which has been trained locally (on the client side), with the server, and make the server aggregate it with similar models coming from different clients.
Is there a way to “merge” all the trees from the clients without breaking FHE encryption?
If it is not the case, is there an aggregation I can perform server side compatible with FHE encrypted models? Can you provide any example?
I have already gone through your example about federated learning but I couldn’t find where FHE is used.
Thanks in advance for any help you may provide.
Giovanna
First of all, do all the trees have the exact same structure? Same number of nodes and same branching structure?
If they do, then you need to do some aggregation on the thresholds and on the leaf values. That is quite easy to do in FHE if the aggregation logic is something like a sum or an average.
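To make the "same structure" case concrete, here is a minimal plaintext sketch of what averaging thresholds and leaf values could look like. The nested-dict tree format is hypothetical, chosen only for illustration; it is not the Concrete ML internal representation.

```python
# Sketch: averaging two decision trees that share exactly the same structure.
# Trees are nested dicts (hypothetical format, for illustration only).

def average_trees(tree_a, tree_b):
    """Recursively average thresholds and leaf values of two same-shaped trees."""
    if "leaf" in tree_a:
        # Leaf node: average the leaf values.
        return {"leaf": (tree_a["leaf"] + tree_b["leaf"]) / 2}
    # Internal node: same split feature assumed, average the thresholds.
    assert tree_a["feature"] == tree_b["feature"], "structures must match"
    return {
        "feature": tree_a["feature"],
        "threshold": (tree_a["threshold"] + tree_b["threshold"]) / 2,
        "left": average_trees(tree_a["left"], tree_b["left"]),
        "right": average_trees(tree_a["right"], tree_b["right"]),
    }

t1 = {"feature": 0, "threshold": 0.4,
      "left": {"leaf": 1.0}, "right": {"leaf": -1.0}}
t2 = {"feature": 0, "threshold": 0.6,
      "left": {"leaf": 3.0}, "right": {"leaf": 1.0}}

merged = average_trees(t1, t2)
# merged["threshold"] -> 0.5, merged["left"]["leaf"] -> 2.0
```

Since sums and averages (division by a public constant) are FHE-friendly operations, the same recursion could in principle be evaluated over encrypted thresholds and leaf values.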
If the trees have different structures, what is the aggregation algorithm?
Let me add some details about federated-learning aggregation by means of bagging (of course this is just one option out of many), assuming no protection is applied:
1. xgboost is run with exactly the same hyperparameters on all M clients
2. when a client finishes local training, it sends its model to the server; each model “contains” N trees
3. the server receives all M models, defines a new model from the M*N trees, and sends it back to all clients
4. at round i, each client receives an aggregated model “containing” i*M*N trees; it runs local training once again and sends only the N newly trained trees back to the server
5. go back to step 3
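The server-side part of the protocol above can be sketched in a few lines of plain Python, with each model represented as a list of trees (placeholder strings here, no FHE and no real xgboost objects involved):

```python
# Sketch of the plaintext bagging aggregation described above:
# each "model" is a list of trees; aggregation is simple concatenation.

def server_aggregate(client_models):
    """Concatenate the N new trees sent by each of the M clients."""
    aggregated = []
    for trees in client_models:
        aggregated.extend(trees)
    return aggregated

# One round: M = 3 clients, each sending N = 2 trees (placeholders).
round_models = [["c0_t0", "c0_t1"],
                ["c1_t0", "c1_t1"],
                ["c2_t0", "c2_t1"]]
global_model = server_aggregate(round_models)
# global_model has M * N = 6 trees
```

The point of the sketch is that the aggregation itself never looks inside a tree; it only rearranges them, which is what makes the question about doing it on encrypted models plausible.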
Now my solution should be able to:
encrypt the xgboost model and run training without leaving encryption (already done in Concrete ML's XGBoost)
merge two encrypted xgboost models, returning one aggregated encrypted xgboost model (encryption is never left)
load the new aggregated model and continue training
Ideally, these steps are equivalent to:
extract the encrypted sets of trees from encrypted models A, B, C
merge the encrypted sets into a single new encrypted set of trees (as if I were merging encrypted lists of trees into one)
load the new model back and continue training; a new model X is obtained at the end of training
extract the last N trees from the encrypted new model X, returning a still-encrypted object (a set of trees)
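The workflow above can be sketched by treating each encrypted tree as an opaque token that is moved around but never inspected. All names here are hypothetical and the tokens are just strings; in a real system they would be ciphertext objects produced by the FHE library:

```python
# Sketch of the desired encrypted workflow, with encrypted trees as
# opaque tokens (hypothetical placeholders, not real ciphertexts).

def merge_encrypted_models(*models):
    """Merge encrypted tree sets without ever decrypting them."""
    merged = []
    for trees in models:
        merged.extend(trees)  # the opaque objects are never inspected
    return merged

def extract_new_trees(model, n):
    """Take the last n trees, i.e. those appended during local training."""
    return model[-n:]

model_a = ["encA_0", "encA_1"]
model_b = ["encB_0", "encB_1"]
model_c = ["encC_0", "encC_1"]

model_x = merge_encrypted_models(model_a, model_b, model_c)
# ... local training would append N new encrypted trees here ...
model_x += ["encX_0", "encX_1"]
new_trees = extract_new_trees(model_x, 2)
```

The list manipulations themselves involve no FHE computation at all; the open question is whether the "continue training on an encrypted model" step in the middle is supported.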
I hope I was able to explain it a bit better.
Of course, any pointers to which API to use in particular would be greatly appreciated. Any alternative solution is welcome as well.