Calculating Mean Value of Two-Dimensional Matrix

ShVS · August 8, 2022, 9:58am

Hello. I have already created a topic for my use case. I need to calculate the mean value of a two-dimensional matrix (like data_mean in the sample code below). As a NumPy operation for returning mean value is not supported what I have to do for this issue.

 def _calculate_msr(self, data, rows, cols):
        """Calculate the mean squared residues of the rows, of the columns and of the full data matrix."""
        sub_data = data[rows][:, cols]

        data_mean = np.mean(sub_data)
        row_means = np.mean(sub_data, axis=1)
        col_means = np.mean(sub_data, axis=0)

        residues = sub_data - row_means[:, np.newaxis] - col_means + data_mean
        squared_residues = residues * residues

        msr = np.mean(squared_residues)
        row_msr = np.mean(squared_residues, axis=1)
        col_msr = np.mean(squared_residues, axis=0)

        return msr, row_msr, col_msr

Any help is greatly appreciated!
I would be happy to provide more detailed information if needed.

Thank you!

umutsahin · August 9, 2022, 9:11am

Hello,

The new version of Concrete Numpy will support np.sum. So, you’ll be able to do:

np.sum(x) // x.size for data_mean
np.sum(x, axis=1) // x.shape[1] for row_means
np.sum(x, axis=0) // x.shape[0] for col_means

Other than that, you might need to do squared_residues = residues ** 2 as we don’t support encrypted multiplications yet.
Lastly, we don’t support np.newaxis in indexing so you might need to use reshape there.

The new version should be available in the upcoming months.

Let us know if you have any other questions!
Thanks.

ShVS · August 9, 2022, 2:23pm

Thank you for your response. I tried to first modify a simple function written in README of CNP (current version 0.5.0) for computing mean value, I faced these following issues:

np.sum(x) // x.size terminated by this error AttributeError: 'NPTracer' object has no attribute 'size'
inputset in my case is a two dimensional numpy array and I need a similar input for the function but results in RuntimeError: argument #0has not the expected number of dimension, got 2 expected 1 (encrypting circuit on the input) and TypeError: 'NoneType' object is not iterable (compiling inputset).

inputset: np.random.randint(0,600, size=(2884,17))
input: sub_data = data[rows][:, cols] where |rows| < 2884 & |cols| < 17

As the resulting encrypted data_mean should be used later for further computation, how it would be possible to do so (maybe using encrypted_result:
encrypted_result = circuit.run(public_args))?

Thank you in advance!

ShVS · August 30, 2022, 2:49pm

Hello, I would appreciate someone answer the above issue.

umutsahin · August 31, 2022, 2:18pm

Hello @ShVS, sorry for keeping you waiting.

With the release of concrete-numpy v0.6.0 today, you can use np.sum so, you should be able to do what you wanted. Here is an example:

import concrete.numpy as cnp
import numpy as np

@cnp.compiler({"x": "encrypted"})
def function(x):
    return np.sum(x) // x.size

inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
circuit = function.compile(inputset)

sample = np.array([[10, 20], [4, 6]])
assert circuit.encrypt_run_decrypt(sample) == 10

Let us know if it’s working as you expect in v0.6.0.
Umut

ShVS · September 6, 2022, 9:51am

Hi and thank you for the response. Hopefully the sample code here is working with the latest version of concrete-numpy.

According to my original issue, still I have some relevant issues:

you mentioned that np.newaxis in indexing is not supported so I have to use reshape here but when I apply reshape on encrypted vector:

AttributeError: 'PublicResult' object has no attribute 'reshape'

Even for squaring like squared_residues = residues ** 2 I got error:

TypeError: unsupported operand type(s) for ** or pow(): 'PublicResult' and 'int'

and I would appreciate it if you could possibly suggest me how I can proceed with addition and subtraction here as input (sample) / inputsets are making troubles and no appropriate results can be achieved so far.

I tried to rewrite the above mentioned code into CNP like this:

  def cnp_datamean(self, data):
        return np.sum(data) // data.size

    def cnp_rowmean(self, data):
        return np.sum(data, axis=1) // data.shape[1]

    def cnp_colmean(self, data):
        return np.sum(data, axis=0) // data.shape[0]

    def cnp_add(self, value1, value2):
        return value1 + value2

    def cnp_square(self, value1):
        return value1 ** 2

    def cnp_sub(self, value1, value2):
        return value1 - value2

    def getEnc_mean(self, inputset, data):
        compiler = cnp.Compiler(self.cnp_datamean, {"data": "encrypted"})
        circuit = compiler.compile(inputset)
        data = data.astype('uint8')
        print(circuit.encrypt_run_decrypt(data))
        circuit.keygen()
        public_args = circuit.encrypt(data)
        encrypted_datamean = circuit.run(public_args)
        return encrypted_datamean

    def getEnc_rowmean(self, inputset, data):
        compiler = cnp.Compiler(self.cnp_rowmean, {"data": "encrypted"})
        circuit = compiler.compile(inputset)
        data = data.astype('uint8')
        circuit.keygen()
        public_args = circuit.encrypt(data)
        encrypted_rowmean = circuit.run(public_args)
        return encrypted_rowmean

    def getEnc_colmean(self, inputset, data):
        compiler = cnp.Compiler(self.cnp_colmean, {"data": "encrypted"})
        circuit = compiler.compile(inputset)
        data = data.astype('uint8')
        circuit.keygen()
        public_args = circuit.encrypt(data)
        encrypted_colmean = circuit.run(public_args)
        return encrypted_colmean

    def getEnc_subtraction(self, vec1, vec2):
        compiler = cnp.Compiler(self.cnp_sub, {"value1": "encrypted", "value2": "encrypted"})
        inputset = [
            ((np.random.randint(0, 30, size=(2, 2)) for _ in range(10)), (np.random.randint(0, 30) for _ in range(10)))
        ]
        circuit = compiler.compile(inputset)
        sample = [
            (vec1, vec2),
        ]
        # sample = sample.astype(int)
        circuit.keygen()
        public_args = circuit.encrypt(sample)
        encrypted_addition = circuit.run(public_args)
        return encrypted_addition
def getEnc_addition(self, vec1, vec2):
        compiler = cnp.Compiler(self.cnp_add, {"value1": "encrypted", "value2": "encrypted"})
        inputset = [
            ((np.random.randint(0, 30, size=(2, 2)) for _ in range(10)), (np.random.randint(0, 30) for _ in range(10)))
        ]
        circuit = compiler.compile(inputset)
        sample = [
            (vec1, vec2),
        ]
        # sample = sample.astype(int)
        circuit.keygen()
        public_args = circuit.encrypt(sample)
        encrypted_addition = circuit.run(public_args)
        return encrypted_addition

    def getEnc_square(self, vec1):
        compiler = cnp.Compiler(self.cnp_square(), {"value1": "encrypted"})
        inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
        circuit = compiler.compile(inputset)
        sample = [vec1]
        # sample = sample.astype(int)
        circuit.keygen()
        public_args = circuit.encrypt(sample)
        encrypted_square = circuit.run(public_args)
        return encrypted_square

    def getEnc_msr(self, squared_residues):
        compiler = cnp.Compiler(self.cnp_datamean, {"data": "encrypted"})
        inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
        circuit = compiler.compile(inputset)
        sample = squared_residues
        sample = sample.astype(int)
        circuit.keygen()
        public_args = circuit.encrypt(sample)
        encrypted_datamean = circuit.run(public_args)
        return encrypted_datamean

    def getEnc_rowmsr(self, squared_residues):
        compiler = cnp.Compiler(self.cnp_rowmean, {"data": "encrypted"})
        inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
        circuit = compiler.compile(inputset)
        sample = squared_residues
        sample= sample.astype(int)
        circuit.keygen()
        public_args = circuit.encrypt(sample)
        encrypted_rowmean = circuit.run(public_args)
        return encrypted_rowmean

    def getEnc_colmsr(self, squared_residues):
        compiler = cnp.Compiler(self.cnp_colmean, {"data": "encrypted"})
        inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]
        circuit = compiler.compile(inputset)
        sample = squared_residues
        sample = sample.astype(int)
        circuit.keygen()
        public_args = circuit.encrypt(sample)
        encrypted_colmean = circuit.run(public_args)
        return encrypted_colmean


    def _calculate_msr(self, data, rows, cols):
        """Calculate the mean squared residues of the rows, of the columns and of the full data matrix."""

        sub_data = data[rows][:, cols]

        inputset = [np.random.randint(0, 30, size=(2, 2)) for _ in range(10)]

        data_mean = self.getEnc_mean(inputset, sub_data)

        row_means = self.getEnc_rowmean(inputset, sub_data)
     
        col_means = self.getEnc_colmean(inputset, sub_data)

        row_means = row_means.reshape((sub_data.shape[0], 1))

        residues_p1 = self.getEnc_subtraction(sub_data, row_means)
        residues_p2 = self.getEnc_addition(col_means, data_mean)
        residues =  self.getEnc_subtraction(residues_p1, residues_p2)

        squared_residues = self.getEnc_square(residues)

        msr = self.getEnc_msr(squared_residues)
        
        row_msr = self.getEnc_rowmsr(squared_residues)
        
        col_msr = self.getEnc_colmsr(squared_residues)

        return msr, row_msr, col_msr

umutsahin · September 8, 2022, 7:05am

Hi @ShVS,

Currently, with concrete-numpy, you cannot take the output of a circuit and give it to another circuit.

In your code, I see you are trying to do:

data_mean = self.getEnc_mean(inputset, sub_data)
row_means = self.getEnc_rowmean(inputset, sub_data)
col_means = self.getEnc_colmean(inputset, sub_data)

row_means = row_means.reshape((sub_data.shape[0], 1))

residues_p1 = self.getEnc_subtraction(sub_data, row_means)
residues_p2 = self.getEnc_addition(col_means, data_mean)
residues =  self.getEnc_subtraction(residues_p1, residues_p2)

Which means you are compiling a function, evaluating it on some data, and then you try to use the encrypted result of that evaluation in another circuit.

Like I said, this is not possible. However, you can create one huge circuit that does all of these operations in one go:

import concrete.numpy as cnp
import numpy as np

@cnp.compiler({"data": "encrypted"})
def calculate(data):
    data_mean = np.sum(data) // data.size
    row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
    col_means = np.sum(data, axis=0) // data.shape[0]

    residues_p1 = data + (-row_means)
    residues_p2 = col_means + data_mean
    residues = residues_p1 + (-residues_p2)

    squared_residues = residues ** 2

    msr = np.sum(squared_residues) // squared_residues.size
    row_msr = np.sum(squared_residues, axis=1, keepdims=True) // squared_residues.shape[1]
    col_msr = np.sum(squared_residues, axis=0) // squared_residues.shape[0]

    return msr, row_msr, col_msr

inputset = [np.random.randint(0, 10, size=(2, 2)) for _ in range(10)]
circuit = calculate.compile(inputset)

sample = [[3, 2], [5, 1]]
print(circuit.encrypt_run_decrypt(sample))

Right now, the only problem with this code is that we don’t support multiple return values. To work around it, you can create and compile 3 different functions for each return value:

import concrete.numpy as cnp
import numpy as np

class EncryptedMsrCalculator:
    msr_circuit: cnp.Circuit
    row_msr_circuit: cnp.Circuit
    column_msr_circuit: cnp.Circuit

    def __init__(self, inputset):
        @cnp.compiler({"data": "encrypted"})
        def msr_calculator(data):
            data_mean = np.sum(data) // data.size
            row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
            col_means = np.sum(data, axis=0) // data.shape[0]

            residues_p1 = data + (-row_means)
            residues_p2 = col_means + data_mean
            residues = residues_p1 + (-residues_p2)

            squared_residues = residues ** 2
            return np.sum(squared_residues) // squared_residues.size

        self.msr_circuit = msr_calculator.compile(inputset)

        @cnp.compiler({"data": "encrypted"})
        def row_msr_calculator(data):
            data_mean = np.sum(data) // data.size
            row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
            col_means = np.sum(data, axis=0) // data.shape[0]

            residues_p1 = data + (-row_means)
            residues_p2 = col_means + data_mean
            residues = residues_p1 + (-residues_p2)

            squared_residues = residues ** 2
            return np.sum(squared_residues, axis=1) // squared_residues.shape[1]

        self.row_msr_circuit = row_msr_calculator.compile(inputset)

        @cnp.compiler({"data": "encrypted"})
        def column_msr_calculator(data):
            data_mean = np.sum(data) // data.size
            row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
            col_means = np.sum(data, axis=0) // data.shape[0]

            residues_p1 = data + (-row_means)
            residues_p2 = col_means + data_mean
            residues = residues_p1 + (-residues_p2)

            squared_residues = residues ** 2
            return np.sum(squared_residues, axis=0) // squared_residues.shape[0]

        self.column_msr_circuit = column_msr_calculator.compile(inputset)

    def evaluate(self, sample):
        return (
            self.msr_circuit.encrypt_run_decrypt(sample),
            self.row_msr_circuit.encrypt_run_decrypt(sample),
            self.column_msr_circuit.encrypt_run_decrypt(sample),
        )

inputset = [np.random.randint(0, 5, size=(2, 2)) for _ in range(10)]
calculator = EncryptedMsrCalculator(inputset)

sample = [[3, 2], [4, 1]]
print(calculator.evaluate(sample))

In the future, you’ll be able to return multiple values from a single circuit.

Let me know if you have more questions!
Umut

ShVS · September 9, 2022, 11:26am

Hi @umutsahin

I sincerely appreciate your help and comprehensive explanation by giving an example for calculating mean values.

Thank you!
Shokofeh

umutsahin · September 9, 2022, 11:29am

Glad to help

Let us know if you have any other questions!

Umut

ShVS · September 9, 2022, 11:31am

Thanks a lot, it’s working on very small data set but my case is on large volume of gene expression data sets; I will create other relevant issues later on

Thank you!
Shokofeh

ShVS · September 23, 2022, 3:29pm

Hi @umutsahin

Thank you for helping me find the mean value. I want to ask for one practical approach as you said before for working with large data set (300,50). Here I bring a short example of how possibly it might work for a matrix whose dimensions are changing in each iteration.

@cnp.compiler({"data": "encrypted"})
def msr_calculator(data):
    if data.size <= 2:
        return mean_cal(data)

    gcd = math.gcd(data.shape[0], data.shape[1])
    new_x_axis = data.shape[0] // gcd
    new_y_axis = data.shape[1] // gcd
    new_dis = data.reshape((new_x_axis, new_y_axis))
    mean_cal(new_dis)
    return msr_calculator(new_dis)

@cnp.compiler({"data": "encrypted"})
def mean_cal(data):
    data_mean = np.sum(data) // data.size
    row_means = np.sum(data, axis=1, keepdims=True) // data.shape[1]
    col_means = np.sum(data, axis=0) // data.shape[0]

    residues_p1 = data + (-row_means)
    residues_p2 = col_means + data_mean
    residues = residues_p1 + (-residues_p2)

    squared_residues = residues ** 2
    return np.sum(squared_residues) // squared_residues.size

inputset = [np.random.randint(0, 30, size=(300,50)) for _ in range(10)]

self.msr_circuit = msr_calculator.compile(
    inputset,
    cnp.Configuration(
        dump_artifacts_on_unexpected_failures=False,
        enable_unsafe_features=True,
        use_insecure_key_cache=True,
        insecure_key_cache_location=".keys",
    ),
)

Thank you again for your brilliant and outstanding solutions and there’s no need to answer before weekend

Shokofeh

umutsahin · September 26, 2022, 1:19pm

Glad to help

Let me know if you have more questions.

Umut

ShVS · September 26, 2022, 1:52pm

Thank you! my question in the previous comment was about managing larger dataset and at the same time considering bit width limit. I would be happy to provide more detailed information if needed.

umutsahin · September 26, 2022, 2:34pm

Oh I haven’t realized it was a question I’ll provide a generic implementation sometime this week

umutsahin · September 28, 2022, 8:28am

Hey @ShVS,

Here is a generic implementation of calculating lots of different means:

import concrete.numpy as cnp
import numpy as np

def smallest_prime_divisor(n):
    if n % 2 == 0:
        return 2

    for i in range(3, int(np.sqrt(n)) + 1):
        if n % i == 0:
            return i

    return n

def mean_of_vector(x):
    assert x.size != 0
    if x.size == 1:
        return x[0]

    group_size = smallest_prime_divisor(x.size)
    if x.size == group_size:
        return np.round(np.sum(x) / x.size).astype(np.int64)

    groups = []
    for i in range(x.size // group_size):
        start = i * group_size
        end = start + group_size
        groups.append(x[start:end])

    mean_of_groups = []
    for group in groups:
        mean_of_groups.append(np.round(np.sum(group) / group_size).astype(np.int64))

    return mean_of_vector(cnp.array(mean_of_groups))

def mean_of_matrix(x):
    return mean_of_vector(x.flatten())

def mean_of_rows_of_matrix(x):
    means = []
    for i in range(x.shape[0]):
        means.append(mean_of_vector(x[i]))
    return cnp.array(means)

def mean_of_columns_of_matrix(x):
    means = []
    for i in range(x.shape[1]):
        means.append(mean_of_vector(x[:, i]))
    return cnp.array(means)

@cnp.compiler({"x": "encrypted"})
def function(x):
    return mean_of_vector(x)  # you can utilize any of the functions above

inputset = [
    np.random.randint(0, 50, size=(100,))
    for _ in range(10)
]

artifacts = cnp.DebugArtifacts(".artifacts")
circuit = function.compile(
    inputset,
    cnp.Configuration(
        enable_unsafe_features=True,
        use_insecure_key_cache=True,
        insecure_key_cache_location=".keys",
    ),
    artifacts,
)
artifacts.export()

for i in range(10):
    sample = np.random.randint(0, 50, size=(100,))
    print(np.mean(sample), "became", circuit.encrypt_run_decrypt(sample))

An example run outputs:

24.94 became 25
24.71 became 25
22.97 became 23
23.55 became 23
25.25 became 25
25.74 became 26
26.28 became 26
27.11 became 27
26.08 became 26
25.97 became 26

Keep in mind that key generation is taking a bit long with this example

Also, this is not a perfect solution, the biggest obstacle is, it’s using prime divisors for grouping, and if you have 31 or 62 elements for example, it’ll end up with adding up 31 numbers, so you’ll encounter bit-width issues.

This situation can be improved with some compromises, you can do the calculation with the first 30 and ignore the last one for example, or you can include the last one in one of the small 2-sums.

Since those approaches are for specific situations, I haven’t implemented anything like it above. To play around with it, just modify mean_of_vector function above.

Hope this helps
Umut

ShVS · October 11, 2022, 3:05pm

Hi @umutsahin

Thank you so much for the whole explanation
I tried to integrate the proposed solution with the code but here are some problems I faced:

Still the main obstacle is squaring (squared_residues = residues ** 2 ), should be other ways around except quantization?
Previously we get a mean according to the axis=1 for rows and 0 for columns. I wanted to make these changes in the last piece of code mean_of_rows_of_matrix and mean_of_columns_of_matrix but resulting in bound error.
and lastly for computing means of rows, keepdims was set to True, the similar np.sum function gives an error: ValueError: Encrypted arrays can only be created from scalars

Many thanks,
Shokofeh

umutsahin · October 12, 2022, 10:27am

Hi @ShVS,

You’re welcome

I didn’t understand what you mean by should be other ways around, could you elaborate further.
This implementation doesn’t need to use the axis, as row/column extraction is done manually. mean_of_columns_of_matrix(x) has the same semantics as np.mean(x, axis=0), and mean_of_rows_of_matrix(x) has the same semantics as np.mean(x, axis=1).
For the last error, it’s because cnp.array(...) extension doesn’t support creating arrays from other arrays. You can use np.concatenate(...) instead, but I don’t think it’s necessary for your use case, as you’ll see below

Lastly, if you want to have a similar API to keepdims, you can modify the functions above as:

def mean_of_rows_of_matrix(x, keepdims=False):
    means = []
    for i in range(x.shape[0]):
        mean = mean_of_vector(x[i])
        means.append(mean if not keepdims else [mean])
    return cnp.array(means)

def mean_of_columns_of_matrix(x, keepdims=False):
    means = []
    for i in range(x.shape[1]):
        mean = mean_of_vector(x[:, i])
        means.append(mean)
    return cnp.array(means if not keepdims else [means])

Or you can reshape after you calculate means:

row_means_without_keeping_dims = mean_of_rows_of_matrix(x)
row_means_with_keeping_dims = row_means_without_keeping_dims.reshape((-1, 1))

Which corresponds to:

np.mean(x, axis=1).reshape((-1, 1)) == np.mean(x, axis=1, keepdims=True)

Hope this helps

Let me know if you have any other questions!
Umut

ShVS · October 12, 2022, 4:35pm

Perfect! that was very helpful and answers my questions.
Regarding the first point, what I mean is that getting a mean value can be solved by grouping. Another obstacle then is squaring. Earlier we had squared_residues = residues ** 2 which increases the bit width. For maintaining the permitted bits here, is there any possible solutions?

umutsahin · October 13, 2022, 8:37am

Unfortunately, I don’t think there is a way around this one. However, we will have support for 16-bit unsigned integers with table lookups in the near future, so you might accomplish what you need.

Let me know if you have any other questions
Umut