RuntimeError: Lowering from FHE to TFHE failed (GPU)

Hello,

I have implemented a bitonic sort algorithm with Concrete and I'm testing it on GPU. I get an error when I pass an array of length 128 or higher, but the same code works fine when I'm not using the GPU.
The results are also not accurate, even though the same algorithm sorts a plaintext array correctly. The error is below, followed by the code:

loc("/home/ubuntu/concreteOct06/bitonic_tensorized.py":70:0): error: failed to legalize unresolved materialization from 'tensor<8x!FHE.eint<10>>' to 'tensor<8x3x!TFHE.glwe<sk?>>' that remained live after conversion
Traceback (most recent call last):
  File "/home/ubuntu/concreteOct06/bitonic_tensorized.py", line 83, in
    sort_compile = sort_array.compile(inputset, dataflow_parallelize=parallelize, use_gpu=gpu)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/conc/lib/python3.12/site-packages/concrete/fhe/compilation/decorators.py", line 156, in compile
    return self.compiler.compile(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/conc/lib/python3.12/site-packages/concrete/fhe/compilation/compiler.py", line 203, in compile
    fhe_module = self._module_compiler.compile(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/conc/lib/python3.12/site-packages/concrete/fhe/compilation/module_compiler.py", line 437, in compile
    output = FheModule(
             ^^^^^^^^^^
  File "/home/ubuntu/conc/lib/python3.12/site-packages/concrete/fhe/compilation/module.py", line 759, in init
    self.execution_runtime.init()
  File "/home/ubuntu/conc/lib/python3.12/site-packages/concrete/fhe/compilation/utils.py", line 58, in init
    self._val = self._init()
                ^^^^^^^^^^^^
  File "/home/ubuntu/conc/lib/python3.12/site-packages/concrete/fhe/compilation/module.py", line 738, in init_execution
    execution_server = Server.create(
                       ^^^^^^^^^^^^^^
  File "/home/ubuntu/conc/lib/python3.12/site-packages/concrete/fhe/compilation/server.py", line 213, in create
    library = compiler.compile(
              ^^^^^^^^^^^^^^^^^
RuntimeError: Lowering from FHE to TFHE failed

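import numpy as np


def compare_and_swap_vectorized(arr, j, k, direction):
    # One compare-and-swap stage of the bitonic network.
    # j: partner distance, k: size of the current bitonic block,
    # direction: 0 inverts the mask (descending), anything else is ascending.
    n = arr.size
    idx = np.arange(n)
    ixj = np.bitwise_xor(idx, j)  # partner index of each position
    sel = ixj > idx  # handle each (i, i ^ j) pair once, from its lower index
    i_sel = idx[sel]
    l_sel = ixj[sel]
    a = arr[i_sel]
    b = arr[l_sel]

    # local dir bit: 1 where the pair's block is sorted ascending, as an int
    dir_bit = ((i_sel & k) == 0).astype(np.int64)
    gt = (a > b).astype(np.int64)
    lt = (a < b).astype(np.int64)
    # swap if out of order for the block's local direction
    swap_mask = dir_bit * gt + (1 - dir_bit) * lt
    if direction == 0:
        swap_mask = 1 - swap_mask  # invert mask for descending
    # use integer mask instead of boolean: arithmetic select, no branching
    arr[i_sel] = swap_mask * b + (1 - swap_mask) * a
    arr[l_sel] = swap_mask * a + (1 - swap_mask) * b

    return arr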
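
For reference, here is a minimal plaintext sketch of how a stage function like the one above is usually driven. It is not the compiled circuit from bitonic_tensorized.py: the driver name `bitonic_sort_plain` and the loop structure are assumptions based on the standard bitonic network (block size k doubling, partner distance j halving), and the array length is assumed to be a power of two. The stage function is restated so the snippet runs on its own with only NumPy.

```python
import numpy as np


def compare_and_swap_vectorized(arr, j, k, direction):
    # Stage function restated from the post so this snippet is self-contained.
    idx = np.arange(arr.size)
    ixj = np.bitwise_xor(idx, j)
    sel = ixj > idx
    i_sel, l_sel = idx[sel], ixj[sel]
    a, b = arr[i_sel], arr[l_sel]
    dir_bit = ((i_sel & k) == 0).astype(np.int64)
    gt = (a > b).astype(np.int64)
    lt = (a < b).astype(np.int64)
    swap_mask = dir_bit * gt + (1 - dir_bit) * lt
    if direction == 0:
        swap_mask = 1 - swap_mask
    arr[i_sel] = swap_mask * b + (1 - swap_mask) * a
    arr[l_sel] = swap_mask * a + (1 - swap_mask) * b
    return arr


def bitonic_sort_plain(values, direction=1):
    # Hypothetical plaintext driver (not from the post): runs every stage
    # of the bitonic network on a copy of the input.
    arr = np.array(values, dtype=np.int64)
    n = arr.size
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:          # size of the bitonic blocks being merged
        j = k // 2
        while j >= 1:      # distance between compared partners
            arr = compare_and_swap_vectorized(arr, j, k, direction)
            j //= 2
        k *= 2
    return arr


# Plaintext check on the array length that fails on GPU (128),
# using 10-bit values to match the eint<10> type in the error message.
rng = np.random.default_rng(0)
data = rng.integers(0, 1024, size=128)
ascending = bitonic_sort_plain(data, direction=1)
descending = bitonic_sort_plain(data, direction=0)
```

Comparing `ascending` against `np.sort(data)` (and `descending` against its reverse) gives a plaintext baseline to check the encrypted output against, separately from the GPU lowering error.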