While using Concrete-ML to run a DNN application, I encountered some results that left me puzzled. When I passed `device='cuda'` to the `compile_torch_model` function, I noticed that most of the PBS operations still ran on the CPU backend, while only a small portion used the CUDA backend (including PBS generated by the round and relu operations).
Moreover, these results depended on the parameter choices. For example, when the `rounding_threshold_bits` argument was not set, no PBS from round operations were generated, and all PBS for relu operations used the CUDA backend.
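For reference, here is a minimal sketch of the kind of compilation call I'm describing. The toy model, the `n_bits` value, and `rounding_threshold_bits=6` are illustrative stand-ins for my actual setup:

```python
import torch
from concrete.ml.torch.compile import compile_torch_model

# Toy DNN standing in for the real application model
class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 32)
        self.fc2 = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Representative calibration inputset
inputset = torch.randn(100, 10)

# The two parameters whose interaction I'm asking about:
# - rounding_threshold_bits (omitting it changes which PBS are generated)
# - device='cuda' (yet most PBS still appear to run on the CPU backend)
quantized_module = compile_torch_model(
    TinyNet(),
    inputset,
    n_bits=6,
    rounding_threshold_bits=6,
    device="cuda",
)
```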
This leads to my questions: Is this backend selection intentional behavior? At what level is the decision to switch between backends made? Is it controlled in Concrete-ML (I couldn't locate this logic so far), or is it determined in the Concrete runtime? And is there any way to explicitly specify which backend should be used?