# Calculating Mean Value of Two-Dimensional Matrix

Hello. I have already created a topic for my use case. I need to calculate the mean value of a two-dimensional matrix (like data_mean in the sample code below). As a NumPy operation for returning mean value is not supported what I have to do for this issue.

`````` def _calculate_msr(self, data, rows, cols):
"""Calculate the mean squared residues of the rows, of the columns and of the full data matrix."""
sub_data = data[rows][:, cols]

data_mean = np.mean(sub_data)
row_means = np.mean(sub_data, axis=1)
col_means = np.mean(sub_data, axis=0)

residues = sub_data - row_means[:, np.newaxis] - col_means + data_mean
squared_residues = residues * residues

msr = np.mean(squared_residues)
row_msr = np.mean(squared_residues, axis=1)
col_msr = np.mean(squared_residues, axis=0)

return msr, row_msr, col_msr
``````

Any help is greatly appreciated!
I would be happy to provide more detailed information if needed.

Thank you!

Hello,

The new version of Concrete Numpy will support `np.sum`. So, you’ll be able to do:

• `np.sum(x) // x.size` for `data_mean`
• `np.sum(x, axis=1) // x.shape[1]` for `row_means`
• `np.sum(x, axis=0) // x.shape[0]` for `col_means`

Other than that, you might need to do `squared_residues = residues ** 2` as we don’t support encrypted multiplications yet.
Lastly, we don’t support `np.newaxis` in indexing so you might need to use reshape there.

The new version should be available in the upcoming months.

Let us know if you have any other questions!
Thanks.

2 Likes

Thank you for your response. I tried to first modify a simple function written in README of CNP (current version 0.5.0) for computing mean value, I faced these following issues:

1. `np.sum(x) // x.size ` terminated by this error `AttributeError: 'NPTracer' object has no attribute 'size'`

2. inputset in my case is a two dimensional numpy array and I need a similar input for the function but results in `RuntimeError: argument #0has not the expected number of dimension, got 2 expected 1` (encrypting circuit on the input) and `TypeError: 'NoneType' object is not iterable` (compiling inputset).

• inputset: `np.random.randint(0,600, size=(2884,17))`

• input: `sub_data = data[rows][:, cols] where |rows| < 2884 & |cols| < 17`

1. As the resulting encrypted data_mean should be used later for further computation, how it would be possible to do so (maybe using `encrypted_result`:
`encrypted_result = circuit.run(public_args)`)?

Hello, I would appreciate someone answer the above issue.

Hello @ShVS, sorry for keeping you waiting.

With the release of concrete-numpy `v0.6.0` today, you can use `np.sum` so, you should be able to do what you wanted. Here is an example:

``````import concrete.numpy as cnp
import numpy as np

@cnp.compiler({"x": "encrypted"})
def function(x):
return np.sum(x) // x.size

inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
circuit = function.compile(inputset)

sample = np.array([[10, 20], [4, 6]])
assert circuit.encrypt_run_decrypt(sample) == 10
``````

Let us know if it’s working as you expect in `v0.6.0`.
Umut

1 Like

Hi and thank you for the response. Hopefully the sample code here is working with the latest version of concrete-numpy.

According to my original issue, still I have some relevant issues:

• you mentioned that `np.newaxis` in indexing is not supported so I have to use reshape here but when I apply reshape on encrypted vector:
``````AttributeError: 'PublicResult' object has no attribute 'reshape'
``````
• Even for squaring like `squared_residues = residues ** 2` I got error:
``````TypeError: unsupported operand type(s) for ** or pow(): 'PublicResult' and 'int'
``````
• and I would appreciate it if you could possibly suggest me how I can proceed with addition and subtraction here as input (sample) / inputsets are making troubles and no appropriate results can be achieved so far.

I tried to rewrite the above mentioned code into CNP like this:

``````  def cnp_datamean(self, data):
return np.sum(data) // data.size

def cnp_rowmean(self, data):
return np.sum(data, axis=1) // data.shape[1]

def cnp_colmean(self, data):
return np.sum(data, axis=0) // data.shape[0]

return value1 + value2

def cnp_square(self, value1):
return value1 ** 2

def cnp_sub(self, value1, value2):
return value1 - value2

def getEnc_mean(self, inputset, data):
compiler = cnp.Compiler(self.cnp_datamean, {"data": "encrypted"})
circuit = compiler.compile(inputset)
data = data.astype('uint8')
print(circuit.encrypt_run_decrypt(data))
circuit.keygen()
public_args = circuit.encrypt(data)
encrypted_datamean = circuit.run(public_args)
return encrypted_datamean

def getEnc_rowmean(self, inputset, data):
compiler = cnp.Compiler(self.cnp_rowmean, {"data": "encrypted"})
circuit = compiler.compile(inputset)
data = data.astype('uint8')
circuit.keygen()
public_args = circuit.encrypt(data)
encrypted_rowmean = circuit.run(public_args)
return encrypted_rowmean

def getEnc_colmean(self, inputset, data):
compiler = cnp.Compiler(self.cnp_colmean, {"data": "encrypted"})
circuit = compiler.compile(inputset)
data = data.astype('uint8')
circuit.keygen()
public_args = circuit.encrypt(data)
encrypted_colmean = circuit.run(public_args)
return encrypted_colmean

def getEnc_subtraction(self, vec1, vec2):
compiler = cnp.Compiler(self.cnp_sub, {"value1": "encrypted", "value2": "encrypted"})
inputset = [
((np.random.randint(0, 30, size=(2, 2)) for _ in range(10)), (np.random.randint(0, 30) for _ in range(10)))
]
circuit = compiler.compile(inputset)
sample = [
(vec1, vec2),
]
# sample = sample.astype(int)
circuit.keygen()
public_args = circuit.encrypt(sample)
compiler = cnp.Compiler(self.cnp_add, {"value1": "encrypted", "value2": "encrypted"})
inputset = [
((np.random.randint(0, 30, size=(2, 2)) for _ in range(10)), (np.random.randint(0, 30) for _ in range(10)))
]
circuit = compiler.compile(inputset)
sample = [
(vec1, vec2),
]
# sample = sample.astype(int)
circuit.keygen()
public_args = circuit.encrypt(sample)

def getEnc_square(self, vec1):
compiler = cnp.Compiler(self.cnp_square(), {"value1": "encrypted"})
inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
circuit = compiler.compile(inputset)
sample = [vec1]
# sample = sample.astype(int)
circuit.keygen()
public_args = circuit.encrypt(sample)
encrypted_square = circuit.run(public_args)
return encrypted_square

def getEnc_msr(self, squared_residues):
compiler = cnp.Compiler(self.cnp_datamean, {"data": "encrypted"})
inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
circuit = compiler.compile(inputset)
sample = squared_residues
sample = sample.astype(int)
circuit.keygen()
public_args = circuit.encrypt(sample)
encrypted_datamean = circuit.run(public_args)
return encrypted_datamean

def getEnc_rowmsr(self, squared_residues):
compiler = cnp.Compiler(self.cnp_rowmean, {"data": "encrypted"})
inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
circuit = compiler.compile(inputset)
sample = squared_residues
sample= sample.astype(int)
circuit.keygen()
public_args = circuit.encrypt(sample)
encrypted_rowmean = circuit.run(public_args)
return encrypted_rowmean

def getEnc_colmsr(self, squared_residues):
compiler = cnp.Compiler(self.cnp_colmean, {"data": "encrypted"})
inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
circuit = compiler.compile(inputset)
sample = squared_residues
sample = sample.astype(int)
circuit.keygen()
public_args = circuit.encrypt(sample)
encrypted_colmean = circuit.run(public_args)
return encrypted_colmean

def _calculate_msr(self, data, rows, cols):
"""Calculate the mean squared residues of the rows, of the columns and of the full data matrix."""

sub_data = data[rows][:, cols]

inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]

data_mean = self.getEnc_mean(inputset, sub_data)

row_means = self.getEnc_rowmean(inputset, sub_data)

col_means = self.getEnc_colmean(inputset, sub_data)

row_means = row_means.reshape((sub_data.shape[0], 1))

residues_p1 = self.getEnc_subtraction(sub_data, row_means)
residues =  self.getEnc_subtraction(residues_p1, residues_p2)

squared_residues = self.getEnc_square(residues)

msr = self.getEnc_msr(squared_residues)

row_msr = self.getEnc_rowmsr(squared_residues)

col_msr = self.getEnc_colmsr(squared_residues)

return msr, row_msr, col_msr

``````

Hi @ShVS,

Currently, with concrete-numpy, you cannot take the output of a circuit and give it to another circuit.

In your code, I see you are trying to do:

``````data_mean = self.getEnc_mean(inputset, sub_data)
row_means = self.getEnc_rowmean(inputset, sub_data)
col_means = self.getEnc_colmean(inputset, sub_data)

row_means = row_means.reshape((sub_data.shape[0], 1))

residues_p1 = self.getEnc_subtraction(sub_data, row_means)
residues =  self.getEnc_subtraction(residues_p1, residues_p2)
``````

Which means you are compiling a function, evaluating it on some data, and then you try to use the encrypted result of that evaluation in another circuit.

Like I said, this is not possible. However, you can create one huge circuit that does all of these operations in one go:

``````import concrete.numpy as cnp
import numpy as np

@cnp.compiler({"data": "encrypted"})
def calculate(data):
data_mean = np.sum(data) // data.size
row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
col_means = np.sum(data, axis=0) // data.shape[0]

residues_p1 = data + (-row_means)
residues_p2 = col_means + data_mean
residues = residues_p1 + (-residues_p2)

squared_residues = residues ** 2

msr = np.sum(squared_residues) // squared_residues.size
row_msr = np.sum(squared_residues, axis=1, keepdims=True) // squared_residues.shape[1]
col_msr = np.sum(squared_residues, axis=0) // squared_residues.shape[0]

return msr, row_msr, col_msr

inputset = [np.random.randint(0, 10, size=(2, 2)) for _ in range(10)]
circuit = calculate.compile(inputset)

sample = [[3, 2], [5, 1]]
print(circuit.encrypt_run_decrypt(sample))
``````

Right now, the only problem with this code is that we don’t support multiple return values. To work around it, you can create and compile 3 different functions for each return value:

``````import concrete.numpy as cnp
import numpy as np

class EncryptedMsrCalculator:
msr_circuit: cnp.Circuit
row_msr_circuit: cnp.Circuit
column_msr_circuit: cnp.Circuit

def __init__(self, inputset):
@cnp.compiler({"data": "encrypted"})
def msr_calculator(data):
data_mean = np.sum(data) // data.size
row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
col_means = np.sum(data, axis=0) // data.shape[0]

residues_p1 = data + (-row_means)
residues_p2 = col_means + data_mean
residues = residues_p1 + (-residues_p2)

squared_residues = residues ** 2
return np.sum(squared_residues) // squared_residues.size

self.msr_circuit = msr_calculator.compile(inputset)

@cnp.compiler({"data": "encrypted"})
def row_msr_calculator(data):
data_mean = np.sum(data) // data.size
row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
col_means = np.sum(data, axis=0) // data.shape[0]

residues_p1 = data + (-row_means)
residues_p2 = col_means + data_mean
residues = residues_p1 + (-residues_p2)

squared_residues = residues ** 2
return np.sum(squared_residues, axis=1) // squared_residues.shape[1]

self.row_msr_circuit = row_msr_calculator.compile(inputset)

@cnp.compiler({"data": "encrypted"})
def column_msr_calculator(data):
data_mean = np.sum(data) // data.size
row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
col_means = np.sum(data, axis=0) // data.shape[0]

residues_p1 = data + (-row_means)
residues_p2 = col_means + data_mean
residues = residues_p1 + (-residues_p2)

squared_residues = residues ** 2
return np.sum(squared_residues, axis=0) // squared_residues.shape[0]

self.column_msr_circuit = column_msr_calculator.compile(inputset)

def evaluate(self, sample):
return (
self.msr_circuit.encrypt_run_decrypt(sample),
self.row_msr_circuit.encrypt_run_decrypt(sample),
self.column_msr_circuit.encrypt_run_decrypt(sample),
)

inputset = [np.random.randint(0, 5, size=(2, 2)) for _ in range(10)]
calculator = EncryptedMsrCalculator(inputset)

sample = [[3, 2], [4, 1]]
print(calculator.evaluate(sample))
``````

In the future, you’ll be able to return multiple values from a single circuit.

Let me know if you have more questions!
Umut

1 Like

I sincerely appreciate your help and comprehensive explanation by giving an example for calculating mean values.

Thank you!
Shokofeh

1 Like

Let us know if you have any other questions!

Umut

Thanks a lot, it’s working on very small data set but my case is on large volume of gene expression data sets; I will create other relevant issues later on

Thank you!
Shokofeh

Thank you for helping me find the mean value. I want to ask for one practical approach as you said before for working with large data set (300,50). Here I bring a short example of how possibly it might work for a matrix whose dimensions are changing in each iteration.

``````@cnp.compiler({"data": "encrypted"})
def msr_calculator(data):
if data.size <= 2:
return mean_cal(data)

gcd = math.gcd(data.shape[0], data.shape[1])
new_x_axis = data.shape[0] // gcd
new_y_axis = data.shape[1] // gcd
new_dis = data.reshape((new_x_axis, new_y_axis))
mean_cal(new_dis)
return msr_calculator(new_dis)

@cnp.compiler({"data": "encrypted"})
def mean_cal(data):
data_mean = np.sum(data) // data.size
row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
col_means = np.sum(data, axis=0) // data.shape[0]

residues_p1 = data + (-row_means)
residues_p2 = col_means + data_mean
residues = residues_p1 + (-residues_p2)

squared_residues = residues ** 2
return np.sum(squared_residues) // squared_residues.size

inputset = [np.random.randint(0, 30, size=(300,50)) for _ in range(10)]

self.msr_circuit = msr_calculator.compile(
inputset,
cnp.Configuration(
dump_artifacts_on_unexpected_failures=False,
enable_unsafe_features=True,
use_insecure_key_cache=True,
insecure_key_cache_location=".keys",
),
)

``````

Thank you again for your brilliant and outstanding solutions and there’s no need to answer before weekend

Shokofeh

1 Like

Let me know if you have more questions.

Umut

1 Like

Thank you! my question in the previous comment was about managing larger dataset and at the same time considering bit width limit. I would be happy to provide more detailed information if needed.

Oh I haven’t realized it was a question I’ll provide a generic implementation sometime this week

1 Like

Hey @ShVS,

Here is a generic implementation of calculating lots of different means:

``````import concrete.numpy as cnp
import numpy as np

def smallest_prime_divisor(n):
if n % 2 == 0:
return 2

for i in range(3, int(np.sqrt(n)) + 1):
if n % i == 0:
return i

return n

def mean_of_vector(x):
assert x.size != 0
if x.size == 1:
return x[0]

group_size = smallest_prime_divisor(x.size)
if x.size == group_size:
return np.round(np.sum(x) / x.size).astype(np.int64)

groups = []
for i in range(x.size // group_size):
start = i * group_size
end = start + group_size
groups.append(x[start:end])

mean_of_groups = []
for group in groups:
mean_of_groups.append(np.round(np.sum(group) / group_size).astype(np.int64))

return mean_of_vector(cnp.array(mean_of_groups))

def mean_of_matrix(x):
return mean_of_vector(x.flatten())

def mean_of_rows_of_matrix(x):
means = []
for i in range(x.shape[0]):
means.append(mean_of_vector(x[i]))
return cnp.array(means)

def mean_of_columns_of_matrix(x):
means = []
for i in range(x.shape[1]):
means.append(mean_of_vector(x[:, i]))
return cnp.array(means)

@cnp.compiler({"x": "encrypted"})
def function(x):
return mean_of_vector(x)  # you can utilize any of the functions above

inputset = [
np.random.randint(0, 50, size=(100,))
for _ in range(10)
]

artifacts = cnp.DebugArtifacts(".artifacts")
circuit = function.compile(
inputset,
cnp.Configuration(
enable_unsafe_features=True,
use_insecure_key_cache=True,
insecure_key_cache_location=".keys",
),
artifacts,
)
artifacts.export()

for i in range(10):
sample = np.random.randint(0, 50, size=(100,))
print(np.mean(sample), "became", circuit.encrypt_run_decrypt(sample))
``````

An example run outputs:

``````24.94 became 25
24.71 became 25
22.97 became 23
23.55 became 23
25.25 became 25
25.74 became 26
26.28 became 26
27.11 became 27
26.08 became 26
25.97 became 26
``````

Keep in mind that key generation is taking a bit long with this example

Also, this is not a perfect solution, the biggest obstacle is, it’s using prime divisors for grouping, and if you have 31 or 62 elements for example, it’ll end up with adding up 31 numbers, so you’ll encounter bit-width issues.

This situation can be improved with some compromises, you can do the calculation with the first 30 and ignore the last one for example, or you can include the last one in one of the small 2-sums.

Since those approaches are for specific situations, I haven’t implemented anything like it above. To play around with it, just modify `mean_of_vector` function above.

Hope this helps
Umut

1 Like

Thank you so much for the whole explanation
I tried to integrate the proposed solution with the code but here are some problems I faced:

• Still the main obstacle is squaring (`squared_residues = residues ** 2` ), should be other ways around except quantization?

• Previously we get a mean according to the axis=1 for rows and 0 for columns. I wanted to make these changes in the last piece of code `mean_of_rows_of_matrix` and `mean_of_columns_of_matrix` but resulting in bound error.

• and lastly for computing means of rows, keepdims was set to True, the similar `np.sum` function gives an error: `ValueError: Encrypted arrays can only be created from scalars`

Many thanks,
Shokofeh

Hi @ShVS,

You’re welcome

• I didn’t understand what you mean by `should be other ways around`, could you elaborate further.
• This implementation doesn’t need to use the axis, as row/column extraction is done manually. `mean_of_columns_of_matrix(x)` has the same semantics as `np.mean(x, axis=0)`, and `mean_of_rows_of_matrix(x)` has the same semantics as `np.mean(x, axis=1)`.
• For the last error, it’s because `cnp.array(...)` extension doesn’t support creating arrays from other arrays. You can use `np.concatenate(...)` instead, but I don’t think it’s necessary for your use case, as you’ll see below

Lastly, if you want to have a similar API to keepdims, you can modify the functions above as:

``````def mean_of_rows_of_matrix(x, keepdims=False):
means = []
for i in range(x.shape[0]):
mean = mean_of_vector(x[i])
means.append(mean if not keepdims else [mean])
return cnp.array(means)

def mean_of_columns_of_matrix(x, keepdims=False):
means = []
for i in range(x.shape[1]):
mean = mean_of_vector(x[:, i])
means.append(mean)
return cnp.array(means if not keepdims else [means])
``````

Or you can reshape after you calculate means:

``````row_means_without_keeping_dims = mean_of_rows_of_matrix(x)
row_means_with_keeping_dims = row_means_without_keeping_dims.reshape((-1, 1))
``````

Which corresponds to:

``````np.mean(x, axis=1).reshape((-1, 1)) == np.mean(x, axis=1, keepdims=True)
``````

Hope this helps

Let me know if you have any other questions!
Umut

1 Like

Regarding the first point, what I mean is that getting a mean value can be solved by grouping. Another obstacle then is squaring. Earlier we had `squared_residues = residues ** 2` which increases the bit width. For maintaining the permitted bits here, is there any possible solutions?