I noticed that Concrete-ML uses all the cores of my computer, and I would like to ask how it achieves parallel execution. I don't think it is using OpenMP, because setting OMP_NUM_THREADS doesn't affect the execution time. I suspect tfhe-rs implements parallelism using rayon. I looked into the implementation of programmable_bootstrap_lwe_ciphertext_mem_optimized, and it ultimately calls functions like polynomial_wrapping_monic_monomial_mul_and_subtract. The implementation of these functions is a plain for loop:
for ((dst, src), src_orig) in dst.iter_mut().zip(src).zip(src_orig) {
    *dst = src.wrapping_neg().wrapping_sub(*src_orig);
}
But this loop doesn't seem to execute in parallel, since it doesn't use rayon. So how exactly does Concrete achieve parallel execution? Is it tfhe-rs using rayon? If so, then functions like polynomial_wrapping_monic_monomial_mul_and_subtract wouldn't be the main computational functions. Where can I find the files that contain the main computational functions?
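To make the question more concrete, here is a minimal sketch of the structure I imagine might be involved: a sequential inner kernel (modeled on the loop quoted above) called from an outer driver that spreads independent polynomials across threads. This is purely my assumption and not taken from tfhe-rs or Concrete. The names neg_and_sub and process_batch are hypothetical, u64 is an arbitrary choice of scalar type, and std::thread::scope stands in for whatever mechanism (rayon or otherwise) is actually used.

```rust
use std::thread;

/// Sequential kernel modeled on the loop quoted above.
/// Hypothetical name; u64 is chosen only for illustration.
fn neg_and_sub(dst: &mut [u64], src: &[u64], src_orig: &[u64]) {
    for ((dst, src), src_orig) in dst.iter_mut().zip(src).zip(src_orig) {
        *dst = src.wrapping_neg().wrapping_sub(*src_orig);
    }
}

/// Hypothetical outer driver: splits a batch of independent polynomials
/// into contiguous chunks and processes each chunk on its own thread.
/// The kernel itself stays single-threaded.
fn process_batch(
    dsts: &mut [Vec<u64>],
    srcs: &[Vec<u64>],
    origs: &[Vec<u64>],
    n_threads: usize,
) {
    let n_threads = n_threads.max(1);
    // Ceiling division; clamp to 1 because chunks(0) panics.
    let chunk = ((dsts.len() + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        let work = dsts
            .chunks_mut(chunk)
            .zip(srcs.chunks(chunk))
            .zip(origs.chunks(chunk));
        for ((d, sr), o) in work {
            s.spawn(move || {
                for ((dst, src), orig) in d.iter_mut().zip(sr).zip(o) {
                    neg_and_sub(dst, src, orig);
                }
            });
        }
    });
}

fn main() {
    let srcs = vec![vec![1u64, 2, 3]; 8];
    let origs = vec![vec![10u64, 20, 30]; 8];
    let mut dsts = vec![vec![0u64; 3]; 8];

    process_batch(&mut dsts, &srcs, &origs, 4);

    // Each entry is -(src) - src_orig, wrapped modulo 2^64.
    assert_eq!(dsts[0][0], 0u64.wrapping_sub(11));
    println!("first output polynomial: {:?}", dsts[0]);
}
```

If the real code is organized along these lines, the for loop I found would be the computational kernel after all, and the parallelism would live in a caller further up the stack. That caller is what I am trying to locate.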