I have implemented a few statistical operations with Concrete and I'm testing them on a GPU machine, but I noticed that performance is better when the GPU is not used (`use_gpu=False`).
I have tensorized my code for calculating the variance (numerator only):
```python
import numpy as np
from concrete import fhe

# Placeholder values; in the original script these are defined elsewhere
lrange = 7            # maximum input value
lsize = 8             # array length
parallelize = False   # dataflow_parallelize setting
gpu = False           # use_gpu setting

@fhe.compiler({"array": "encrypted"})
def calculate_variance_numerator_tensorized(array):
    n = array.size
    # Vectorized operations
    array_sq = array * array            # elementwise square
    array_sum = np.sum(array)           # sum of all elements
    array_sq_sum = np.sum(array_sq)     # sum of squares
    # list_into_sum = Σ(array[i] * array_sum)
    list_into_sum = np.sum(array * array_sum)
    component_one = array_sq_sum * (n * n)
    component_two = list_into_sum * (n * 2)
    component_three = array_sum * array_sum * n
    return fhe.refresh((component_three + component_one) - component_two)

inputset = [np.random.randint(0, lrange, size=lsize) for _ in range(5)]
inputset.append(np.full(lsize, lrange))
circuit = calculate_variance_numerator_tensorized.compile(
    inputset, dataflow_parallelize=parallelize, use_gpu=gpu
)
```
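As a plaintext sanity check (plain NumPy, no encryption, not part of the original post): with S = Σx, the expression expands to n²·Σx² − 2n·S² + n·S² = n·(n·Σx² − S²), which equals n²·Σ(x − mean)², i.e. n³ times the population variance. The helper name below is hypothetical; it simply mirrors the encrypted function step by step so its result can be compared with the decrypted output.

```python
import numpy as np

def variance_numerator_plain(array):
    """Hypothetical plaintext mirror of calculate_variance_numerator_tensorized."""
    array = np.asarray(array, dtype=np.int64)
    n = array.size
    array_sum = int(np.sum(array))                  # S
    array_sq_sum = int(np.sum(array * array))       # Σx²
    list_into_sum = int(np.sum(array * array_sum))  # Σ(x_i * S) == S²
    component_one = array_sq_sum * (n * n)
    component_two = list_into_sum * (n * 2)
    component_three = array_sum * array_sum * n
    return (component_three + component_one) - component_two

x = np.array([1, 2, 3, 4, 7])
num = variance_numerator_plain(x)
# Dividing by n**3 should recover the population variance computed by np.var
print(num, num / x.size**3, np.var(x))
```

For the sample variance (divisor n − 1) the same numerator would be divided by n²·(n − 1) instead.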
Could you explain why performance is better on the CPU than on the GPU? I also noticed that `dataflow_parallelize` doesn't work on tensorized code. Is that because the code is already parallelized?
It's hard to say why the CPU is faster than the GPU in this case. It often depends on how much of the computation subgraph can be offloaded to the GPU and on the actual GPU workload: if the workload is not large enough, you can lose more time transferring data to the GPU than you would by keeping the computation on the CPU.
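The transfer-cost argument can be illustrated with a toy cost model (an illustration only, not how Concrete decides what to offload): if each operation costs `t_cpu` on the CPU and `t_gpu` on the GPU, and moving data to and from the GPU adds a fixed `transfer` cost, the GPU only wins once the number of operations exceeds `transfer / (t_cpu - t_gpu)`. The function names and all timings are hypothetical placeholders, not measurements; integer units (e.g. milliseconds) keep the arithmetic exact.

```python
def cpu_time(n_ops: int, t_cpu: int) -> int:
    """Total time if every operation runs on the CPU."""
    return n_ops * t_cpu

def gpu_time(n_ops: int, t_gpu: int, transfer: int) -> int:
    """Total time if operations run on the GPU after a fixed transfer overhead."""
    return transfer + n_ops * t_gpu

def break_even_ops(t_cpu: int, t_gpu: int, transfer: int):
    """Smallest number of operations for which the GPU is strictly faster."""
    if t_gpu >= t_cpu:
        return None  # in this model the GPU never catches up
    # GPU faster when transfer + n*t_gpu < n*t_cpu, i.e. n > transfer / (t_cpu - t_gpu)
    return transfer // (t_cpu - t_gpu) + 1

# Hypothetical costs: 10 ms/op on CPU, 2 ms/op on GPU, 100 ms of transfer overhead
print(break_even_ops(t_cpu=10, t_gpu=2, transfer=100))  # 13
print(cpu_time(5, 10), gpu_time(5, 2, 100))             # small workload: CPU wins
```

In this model a small circuit (few operations) sits below the break-even point, which matches the observation that the CPU can be faster for small inputs.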
What do you mean by "`dataflow_parallelize` doesn't work on tensorized code"? The two should actually be orthogonal.
I have a dedicated AWS instance with one GPU for testing the code. Does that mean I need more GPUs, or is it just the offloading that's increasing the time?
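One way to separate a fixed offloading overhead from raw throughput is to time the same circuit at several input sizes with `use_gpu` on and off: if the GPU falls behind only on small inputs and catches up as `lsize` grows, the overhead, rather than the number of GPUs, is the more likely cause. Below is a minimal, stdlib-only timing helper for that comparison; the helper name and the commented Concrete usage are assumptions for illustration, not an official benchmarking API.

```python
import statistics
import time

def median_runtime(fn, *args, repeats=5, warmup=1):
    """Median wall-clock time of fn(*args) in seconds, after discarding warm-up calls."""
    for _ in range(warmup):
        fn(*args)  # early calls may include one-off costs (e.g. key generation)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical usage with two circuits compiled from the same function:
#   cpu_circuit = f.compile(inputset, use_gpu=False)
#   gpu_circuit = f.compile(inputset, use_gpu=True)
#   sample = np.random.randint(0, lrange, size=lsize)
#   print(median_runtime(cpu_circuit.encrypt_run_decrypt, sample),
#         median_runtime(gpu_circuit.encrypt_run_decrypt, sample))
```

Timing `encrypt_run_decrypt` also includes encryption and decryption on the client side; timing `circuit.run` on pre-encrypted arguments would isolate the server-side computation, which is the part the GPU accelerates.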
As for `dataflow_parallelize`, when I try to enable it on the tensorized code (variance numerator), I get the following error:
Does it have something to do with NumPy methods?
For a separate statistical operation (a five-number summary) I am using `np.max()` and `np.min()`. With `dataflow_parallelize`, it returns:
```
python3: symbol lookup error: /tmp/tmpz5k6t7o4/sharedlib.so: undefined symbol: _dfr_make_ready_future
```