Hello,

I am using trivial examples before tackling my big project, to make sure I understand the best way to do things. Along those lines, I thought I'd ask: what is the best way to get the fastest performance (and maybe parallelization) when averaging a large list of fhe.int16 pairs?

See the things I've tried below. Basically, I have two files of int16 samples, and I want to integer-average the pairs: (a+b)//2. There are about 80k samples, and it is taking hours and hours.
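For reference, here is the plaintext version of the computation in NumPy (just a sketch; note the widening to int32, since a+b can overflow int16 before the division):

```python
import numpy as np

def average_pairs(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Integer average (a + b) // 2 of two int16 arrays.

    The sum is computed in int32 because a + b can overflow int16
    (e.g. 30000 + 30000); the result itself always fits in int16.
    """
    return ((a.astype(np.int32) + b.astype(np.int32)) // 2).astype(np.int16)

a = np.array([1, -3, 30000], dtype=np.int16)
b = np.array([2, 5, 30000], dtype=np.int16)
print(average_pairs(a, b).tolist())  # → [1, 1, 30000]
```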

I have a Threadripper CPU with 64 cores and an NVIDIA RTX 3070, and I am looking for any way to optimize the execution. One thought was to enable data parallelization, but I can't see how to do that with a direct circuit, or even whether it's a good idea.

Does anyone have thoughts on the optimal way to do this?

See below for some of the approaches I’ve tried.

First I was doing this:

```
import itertools
import struct

from concrete import fhe

with open("new_line1.raw", mode="rb") as file_line1:
    line1 = file_line1.read()
with open("new_line2.raw", mode="rb") as file_line2:
    line2 = file_line2.read()
outfile = open("out_linear_fhe_onestep.raw", mode="wb")

stream1 = struct.unpack("h" * (len(line1) // 2), line1)
stream2 = struct.unpack("h" * (len(line2) // 2), line2)

@fhe.circuit({"s1": "encrypted", "s2": "encrypted"})
def circuit(s1: fhe.int16, s2: fhe.int16):
    return (s1 + s2) // 2

print(circuit)

# Pad the shorter stream with zeros, one sample at a time
for e1, e2 in itertools.zip_longest(stream1, stream2):
    if e1 is None:
        e1 = 0
    if e2 is None:
        e2 = 0
    outfile.write(struct.pack("h", circuit.decrypt(circuit.run(circuit.encrypt(e1, e2)))))
```

But that doesn’t lend itself to parallelization.

Now I am doing:

```
import struct

import numpy as np

BATCH_SIZE = 1024

with open("data/line1_8k.raw", mode="rb") as file_line1:
    line1 = file_line1.read()
with open("data/line2_8k.raw", mode="rb") as file_line2:
    line2 = file_line2.read()
outfile = open("out_linear_fhe_np.raw", mode="wb")

stream1 = struct.unpack("h" * (len(line1) // 2), line1)
stream2 = struct.unpack("h" * (len(line2) // 2), line2)
np_stream1 = np.array(stream1, dtype="i2")
np_stream2 = np.array(stream2, dtype="i2")

# Zero-pad both streams to the next multiple of BATCH_SIZE
# (my first version left `new` undefined when the sizes were equal)
longest = max(np_stream1.size, np_stream2.size)
new = -(-longest // BATCH_SIZE) * BATCH_SIZE  # ceiling division
np_stream1.resize(new)
np_stream2.resize(new)

stream = np.stack((np_stream1, np_stream2), axis=1)
loops = np_stream1.size // BATCH_SIZE
split_stream = np.split(stream, loops)
```

```
@fhe.circuit({"a": "encrypted"})
def circuit(a: fhe.tensor[fhe.int16, BATCH_SIZE, 2]):
    foo = np.floor_divide(np.sum(a, axis=1), 2).astype(fhe.int16)
    return foo

print(circuit)

enc_stream = []
for arr in split_stream:
    enc_stream.append(circuit.encrypt(arr))

mix_stream = []
for param in enc_stream:
    mix_stream.append(circuit.run(param))

for param in mix_stream:
    out = circuit.decrypt(param)
    out.astype("int16").tofile(outfile)
```
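For the parallelization idea, the best I've come up with so far is a sketch like the one below, fanning the batches out across a pool of workers. `run_batch` here is just a plaintext stand-in for `circuit.run` so the snippet is self-contained, and I don't know whether Concrete's runtime is thread-safe or releases the GIL, so a process pool might be needed instead of threads:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(batch):
    # Stand-in for circuit.run(encrypted_batch): plaintext pairwise average.
    return [(a + b) // 2 for a, b in batch]

def run_all(batches, max_workers=8):
    # executor.map preserves input order, so results line up with batches.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_batch, batches))

batches = [[(1, 3), (10, 20)], [(-4, 2)]]
print(run_all(batches))  # → [[2, 15], [-1]]
```

Is something along these lines sensible, or does the runtime already use the cores internally?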

Any thoughts?

All help extremely appreciated.

Thank you.

Ron.