Hello,
I’ve implemented the bitonic sort algorithm using Concrete and have been trying to optimize it. However, execution time grows quickly as the array size increases, particularly with larger input value ranges. Here is my implementation:
```python
import numpy as np
from concrete import fhe

# Helper function to compare and swap two elements
def compare_and_swap(a, b, direction):
    # c is an encrypted 0/1 flag: 1 when the pair is out of order
    c = (a > b) if direction == 1 else (a < b)
    # Branch-free swap: pick b when c == 1, keep a when c == 0
    a_new = c * b + (1 - c) * a
    b_new = c * a + (1 - c) * b
    return a_new, b_new

# Helper function to perform the bitonic merge
def bitonic_merge(arr, low, cnt, direction):
    if cnt > 1:
        k = cnt // 2
        for i in range(low, low + k):
            arr[i], arr[i + k] = compare_and_swap(arr[i], arr[i + k], direction)
        bitonic_merge(arr, low, k, direction)
        bitonic_merge(arr, low + k, k, direction)

# Helper function to perform bitonic sort recursively
def bitonic_sort(arr, low, cnt, direction):
    if cnt > 1:
        k = cnt // 2
        bitonic_sort(arr, low, k, 1)             # Sort first half in ascending order
        bitonic_sort(arr, low + k, k, 0)         # Sort second half in descending order
        bitonic_merge(arr, low, cnt, direction)  # Merge both halves
    return arr

@fhe.compiler({"array": "encrypted"})
def sort_array_bitonic(array):
    array = bitonic_sort(array, 0, array.size, 1)
    return array
```
I tried to replace the compare_and_swap logic with fhe.if_then_else(), but I get this error:

```
RuntimeError: A subgraph within the function you are trying to compile cannot be fused because of a node, which is marked explicitly as non-fusable
```
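As a possible workaround, I was considering rewriting the helper with np.minimum / np.maximum, which, as far as I understand, Concrete supports on encrypted values. This is just a sketch (the helper name compare_and_swap_minmax is only for illustration), and I haven't verified that it avoids the fusing error:

```python
import numpy as np

def compare_and_swap_minmax(a, b, direction):
    # Sketch: let the compiler handle comparison and selection together via
    # the supported np.minimum / np.maximum operations, instead of building
    # the multiplexer manually with encrypted multiplications.
    lo = np.minimum(a, b)
    hi = np.maximum(a, b)
    # direction is a plain Python constant, so this branch is resolved at
    # trace time, not on encrypted data
    return (lo, hi) if direction == 1 else (hi, lo)
```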
I also attempted to reduce the number of multiplications using a community-suggested approach:

```python
a_new = c * (b - a) + a
b_new = a + b - a_new
```

But this version actually increases the execution time.
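For completeness, the full variant I tested looks like this, as a drop-in replacement for the original helper (the name compare_and_swap_fewer_muls is only for illustration):

```python
def compare_and_swap_fewer_muls(a, b, direction):
    # Community-suggested variant: one encrypted multiplication instead of two
    c = (a > b) if direction == 1 else (a < b)
    a_new = c * (b - a) + a   # selects b when c == 1, keeps a when c == 0
    b_new = a + b - a_new     # the remaining element of the pair
    return a_new, b_new
```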
I’ve been testing the algorithm with the following input set:

```python
inputset = [np.random.randint(1, lrange, size=lsize) for _ in range(5)]
```

I’ve tried tweaking these parameters (e.g., changing lrange or lsize), but unfortunately this didn’t improve performance or reduce compilation time.
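For reference, this is roughly how I compile and run the circuit; the values of lrange and lsize below are only illustrative:

```python
lsize = 8     # illustrative array length (a power of two, as this bitonic sort requires)
lrange = 16   # illustrative exclusive upper bound for the random inputs

inputset = [np.random.randint(1, lrange, size=lsize) for _ in range(5)]

circuit = sort_array_bitonic.compile(inputset)
sample = np.random.randint(1, lrange, size=lsize)
result = circuit.encrypt_run_decrypt(sample)
```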
Questions:
- Why does execution time increase when using the arithmetic version that should be more efficient (fewer multiplications)?
- Is there a recommended approach to make this work efficiently for arrays of length 32 or more? I’ve attempted larger arrays, but compilation either takes extremely long or never completes.
- Would a more powerful system (e.g., a high-memory cloud instance or a GPU-enabled environment) help get past the long compilation phase, or is the bottleneck more algorithmic or FHE-specific than hardware-related?