Unexpected results with GPU enabled

Hello,
I’m performing statistical operations on encrypted data using Concrete. I’m running on an AWS instance with a GPU, and I’m comparing the results with use_gpu and dataflow_parallelize.

I noticed that both compilation and execution take a lot longer when I set use_gpu=True. With dataflow_parallelize=True, however, they are comparatively faster.

Could you please confirm whether this is expected behavior, or whether I’m missing something when setting use_gpu=True?
Also, I noticed that I couldn’t enable use_gpu and dataflow_parallelize at the same time. Is use_gpu parallelized by default? If not, is it possible to parallelize it?
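
To make the comparison concrete, here is a minimal sketch of how compilation and execution could be timed separately for each setting. It is plain Python; `time_variants`, `compile_with`, and `run` are hypothetical names for illustration, not part of Concrete.

```python
import time

def time_variants(compile_with, run, variants):
    """Return {label: (compile_seconds, run_seconds)} for each flag setting."""
    timings = {}
    for label, flags in variants.items():
        # Time the compilation step on its own.
        start = time.perf_counter()
        compiled = compile_with(**flags)  # hypothetical wrapper around the compile call
        compile_s = time.perf_counter() - start

        # Time a single execution on the compiled object.
        start = time.perf_counter()
        run(compiled)  # hypothetical wrapper around one encrypted run
        run_s = time.perf_counter() - start

        timings[label] = (compile_s, run_s)
    return timings

# One entry per configuration under test; the flags are the ones named above.
variants = {
    "baseline": {},
    "gpu": {"use_gpu": True},
    "dataflow": {"dataflow_parallelize": True},
}
```

With Concrete, `compile_with` could be something like `lambda **flags: Mean.compile({"calculate_mean": inputset}, **flags)`, assuming the compile call accepts these options as keyword arguments. Reporting the two numbers separately would show whether the slowdown comes from compilation, from execution, or from both.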

This is the code I’m currently testing:

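import numpy as np
from concrete import fhe

@fhe.module()
class Mean:
    @fhe.function({"array": "encrypted"})
    def calculate_mean(array):
        # Accumulate the encrypted elements; this snippet returns the sum only.
        total = fhe.zero()
        for a in array:
            total += a

        return fhe.refresh(total)

l_s = 16384  # example array length
lrange = 64  # example upper bound passed to randint (exclusive)
inputset = [np.random.randint(0, lrange, size=l_s) for _ in range(5)]
m_compile = Mean.compile({"calculate_mean": inputset})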

As an example, the array length (l_s) is 16384 and the value range is 1 - 64.
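
In case it is relevant to the timings, here is a rough sketch of how large the accumulated sum can get with these parameters. It is plain Python arithmetic, not Concrete API; `max_sum_bits` is a hypothetical helper, and the result is only a worst-case upper bound.

```python
def max_sum_bits(length, max_value):
    """Bits needed to hold the largest possible unsigned sum of `length` values."""
    return (length * max_value).bit_length()

bits_inclusive = max_sum_bits(16384, 64)  # if the value 64 itself can occur
bits_exclusive = max_sum_bits(16384, 63)  # if the upper bound of 64 is exclusive
print(bits_inclusive, bits_exclusive)  # prints "21 20"
```

So the sum can need around 20 to 21 bits in the worst case, depending on whether the upper bound is inclusive.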

Thank you