Optimizing Bitonic Sort for Larger Arrays in Concrete FHE

Hello,

I’ve implemented bitonic sort algorithm using concrete and have been trying to optimize it. However, it takes a long time as the array size increases, particularly with larger input ranges.

# Helper function to compare and swap two elements
def compare_and_swap(a, b, direction):
    c = (a > b) if direction == 1 else (a < b)
    a_new = c * b + (1 - c) * a
    b_new = c * a + (1 - c) * b
    return a_new, b_new


# Helper function to perform the bitonic merge
def bitonic_merge(arr, low, cnt, direction):
    if cnt > 1:
        k = cnt // 2
        for i in range(low, low + k):
            arr[i], arr[i + k] = compare_and_swap(arr[i], arr[i + k], direction)
        bitonic_merge(arr, low, k, direction)
        bitonic_merge(arr, low + k, k, direction)


# Helper function to perform bitonic sort recursively
def bitonic_sort(arr, low, cnt, direction):
    if cnt > 1:
        k = cnt // 2
        bitonic_sort(arr, low, k, 1)  # Sort first half in ascending order
        bitonic_sort(arr, low + k, k, 0)  # Sort second half in descending order
        bitonic_merge(arr, low, cnt, direction)  # Merge both halves

    return arr


@fhe.compiler({"array": "encrypted"})
def sort_array_bitonic(array):
    array = bitonic_sort(array, 0, array.size, 1)
    return array

I tried to replace compare_and_swap logic with fhe.if_then_else() but I get this error

RuntimeError: A subgraph within the function you are trying to compile cannot be fused because of a node, which is marked explicitly as non-fusable

I also attempted to reduce multiplications using a community-suggested approach:

a_new = c * (b - a) + a
b_new = a + b - a_new

But this version actually increases the execution time.

I’ve been testing the algorithm with the following input set:

inputset = [np.random.randint(1, lrange, size=lsize) for _ in range(5)]

I’ve tried tweaking these parameters (e.g., changing the range or size), but unfortunately, this didn’t help improve performance or reduce compilation time.

Questions:

  1. Why does execution time increase when using the more efficient arithmetic logic (fewer multiplications)?
  2. Is there a recommended approach to make this work efficiently for arrays of length 32 or more?
    I’ve attempted larger arrays, but the compilation either takes extremely long or never completes.
  3. Would using a more powerful system (e.g., high-memory cloud instance or GPU-enabled environment) help get past the long compilation phase?
    Or is the bottleneck more algorithmic or FHE-specific than hardware-related?

Hello,

For the sake of clarity, it’s a followup of this thread: Implementing Comparison Strategies

Hello @rraj,

Sorry, it’s taking a bit of time, but people will look at your question. My 2 cents now, but people directly from the Concrete compiler will be more precise:

  1. Why does execution time increase when using the more efficient arithmetic logic (fewer multiplications)?

Concrete is all about compiling a circuit. Sometimes, a circuit may have less operations on encrypted values but be smaller because somehow, the contraints on these operations are higher, and so crypto params need to be larger and the circuit is slower.

  1. Is there a recommended approach to make this work efficiently for arrays of length 32 or more?

Sorting encrypted arrays will never be very fast, I’m afraid.

I’ve attempted larger arrays, but the compilation either takes extremely long or never completes.

Certainly because your circuit becomes more complex with large arrays. You might want to look at composable functions and modules [Combining compiled functions | Concrete], to see how you could have functions which are callable again and again without having circuits which are larger and larger.

  1. Would using a more powerful system (e.g., high-memory cloud instance or GPU-enabled environment) help get past the long compilation phase?
    Or is the bottleneck more algorithmic or FHE-specific than hardware-related?

Yes, it’s better to take the most powerful machine you can. Also, there is a GPU flag in Concrete: depending on the circuit, it can be much faster or not. Give it a try if you have the right machine.

Cheers