Concrete-ML lets users call model.predict(X_batch, execute_in_fhe=True). However, FHE inference is currently done one example at a time, in a loop.
The main reason is that Concrete-ML currently compiles models with a batch size of 1 (the generic case), which builds a static FHE circuit that cannot run on inputs whose shape differs from the one given at compilation time.
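To make the "static shape" point concrete, here is a toy sketch in plain Python. It is NOT the Concrete-ML API: StaticShapeCircuit, predict_one_at_a_time and toy_model are hypothetical names standing in for a circuit compiled for shape (1, n_features). The circuit rejects any other shape, so a batch has to be fed through it one row at a time.

```python
# Toy model (hypothetical names, NOT the Concrete-ML API) of a circuit that was
# compiled for one fixed input shape, mimicking compilation with a batch size of 1.
from typing import Callable, List, Sequence, Tuple


class StaticShapeCircuit:
    """Wraps a function and only accepts inputs of the shape fixed at compile time."""

    def __init__(self, fn: Callable[[List[List[int]]], List[int]],
                 input_shape: Tuple[int, int]):
        self.fn = fn
        self.input_shape = input_shape  # e.g. (1, n_features)

    def run(self, x: List[List[int]]) -> List[int]:
        shape = (len(x), len(x[0]) if x else 0)
        if shape != self.input_shape:
            raise ValueError(f"circuit compiled for shape {self.input_shape}, got {shape}")
        return self.fn(x)


def predict_one_at_a_time(circuit: StaticShapeCircuit,
                          x_batch: Sequence[Sequence[int]]) -> List[int]:
    """Runs a batch through a batch-of-1 circuit by looping over the samples."""
    predictions: List[int] = []
    for sample in x_batch:
        # Wrap each sample as a (1, n_features) input to match the compiled shape.
        predictions.extend(circuit.run([list(sample)]))
    return predictions


def toy_model(x: List[List[int]]) -> List[int]:
    # Stand-in for the encrypted computation: sum of the features of each row.
    return [sum(row) for row in x]


circuit = StaticShapeCircuit(toy_model, input_shape=(1, 3))
x_batch = [[1, 2, 3], [4, 5, 6]]
print(predict_one_at_a_time(circuit, x_batch))

try:
    circuit.run(x_batch)  # shape (2, 3) does not match the compiled (1, 3)
except ValueError as err:
    print("rejected:", err)
```

The loop keeps only one sample in flight at a time, which is the behaviour described above for execute_in_fhe=True.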
But the samples come from the same distribution, and the integer bounds of the circuit are calculated from the training inputset, so batching multiple samples should work. Maybe I am missing something; please correct me if I am wrong.
You are right that batching could work, but it would not bring the speed-ups it does in deep learning frameworks. On top of that, the memory consumption of a single inference call would grow proportionally to the batch size. Since the memory consumption of some models is already in the gigabytes, batching is not a very attractive approach.
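The trade-off in the answer can be illustrated with a back-of-the-envelope cost model. This is a sketch under the answer's own assumptions (no per-sample speed-up from batching, memory proportional to batch size), not a measurement of Concrete-ML; the function names and the per-sample figures (2.0 GB, 1.5 s) are hypothetical placeholders.

```python
# Back-of-the-envelope cost model (an assumption, not a measurement): batching is
# assumed to give no per-sample speed-up, while every sample in the batch has to
# be held in memory at the same time.
from dataclasses import dataclass


@dataclass
class CostEstimate:
    peak_memory_gb: float
    total_time_s: float
    time_per_sample_s: float


def estimate_batched_cost(batch_size: int, per_sample_memory_gb: float,
                          per_sample_time_s: float) -> CostEstimate:
    """One inference call on the whole batch: peak memory scales with batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    total_time = batch_size * per_sample_time_s  # no speed-up assumed
    return CostEstimate(batch_size * per_sample_memory_gb, total_time,
                        total_time / batch_size)


def estimate_sequential_cost(batch_size: int, per_sample_memory_gb: float,
                             per_sample_time_s: float) -> CostEstimate:
    """Loop over samples with a batch-of-1 circuit: peak memory stays constant."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    total_time = batch_size * per_sample_time_s
    return CostEstimate(per_sample_memory_gb, total_time, per_sample_time_s)


# Hypothetical per-sample figures, used only to show how the costs scale.
for bs in (1, 8, 32):
    batched = estimate_batched_cost(bs, per_sample_memory_gb=2.0, per_sample_time_s=1.5)
    looped = estimate_sequential_cost(bs, per_sample_memory_gb=2.0, per_sample_time_s=1.5)
    print(f"batch={bs}: batched peak {batched.peak_memory_gb} GB, "
          f"looped peak {looped.peak_memory_gb} GB, "
          f"time/sample {batched.time_per_sample_s} s in both cases")
```

Under these assumptions the total time is identical for both strategies, so the only thing batching changes is peak memory, which is why looping over a batch-of-1 circuit is the safer default.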