I noticed that Concrete-ML utilizes all the cores of my computer. I would like to ask how it achieves parallel execution. I don’t think it’s using OpenMP because setting OMP_NUM_THREADS doesn’t affect the execution time. I suspect tfhe-rs implements parallelism using rayon. I looked into the implementation of `programmable_bootstrap_lwe_ciphertext_mem_optimized`

, and it ultimately calls functions like `polynomial_wrapping_monic_monomial_mul_and_subtract`

. The implementation of these functions is a for loop:

```
for ((dst, src), src_orig) in dst.iter_mut().zip(src).zip(src_orig) {
*dst = src.wrapping_neg().wrapping_sub(*src_orig);
}
```

But it doesn’t seem to be executing in parallel as it doesn’t use rayon. So, I would like to ask, how exactly does Concrete achieve parallel execution? Is it tfhe-rs using rayon? If so, then functions like `polynomial_wrapping_monic_monomial_mul_and_subtract`

wouldn’t be the main computational functions. Where can I find the files that contain the main computational functions?