Concrete Numpy Application in ML

ShVS · August 1, 2022, 12:54pm

Hello. I have a project to apply FHE to gene expression data (a NumPy array of rows and columns: https://arep.med.harvard.edu/biclustering/yeast.matrix) and to analyze the encrypted input data with a biclustering algorithm (such as this sample algorithm: biclustlib/cca.py at master · padilha/biclustlib · GitHub). I have read the Concrete Numpy documentation, but I do not know how I should proceed (e.g., how to define the functions).
Some parts of the algorithm that should be computed homomorphically:

    def _calculate_msr(self, data, rows, cols):
        """Calculate the mean squared residues of the rows, of the columns and of the full data matrix."""
        sub_data = data[rows][:, cols]

        data_mean = np.mean(sub_data)
        row_means = np.mean(sub_data, axis=1)
        col_means = np.mean(sub_data, axis=0)

        residues = sub_data - row_means[:, np.newaxis] - col_means + data_mean
        squared_residues = residues * residues

        msr = np.mean(squared_residues)
        row_msr = np.mean(squared_residues, axis=1)
        col_msr = np.mean(squared_residues, axis=0)

        return msr, row_msr, col_msr

    def _calculate_msr_col_addition(self, data, rows, cols):
        """Calculate the mean squared residues of the columns for the node addition step."""
        sub_data = data[rows][:, cols]
        sub_data_rows = data[rows]

        data_mean = np.mean(sub_data)
        row_means = np.mean(sub_data, axis=1)
        col_means = np.mean(sub_data_rows, axis=0)

        col_residues = sub_data_rows - row_means[:, np.newaxis] - col_means + data_mean
        col_squared_residues = col_residues * col_residues
        col_msr = np.mean(col_squared_residues, axis=0)

        return col_msr

    def _calculate_msr_row_addition(self, data, rows, cols):
        """Calculate the mean squared residues of the rows and of the inverse of the rows for
        the node addition step."""
        sub_data = data[rows][:, cols]
        sub_data_cols = data[:, cols]

        data_mean = np.mean(sub_data)
        row_means = np.mean(sub_data_cols, axis=1)
        col_means = np.mean(sub_data, axis=0)

        row_residues = sub_data_cols - row_means[:, np.newaxis] - col_means + data_mean
        row_squared_residues = row_residues * row_residues
        row_msr = np.mean(row_squared_residues, axis=1)

        inverse_residues = -sub_data_cols + row_means[:, np.newaxis] - col_means + data_mean
        row_inverse_squared_residues = inverse_residues * inverse_residues
        row_inverse_msr = np.mean(row_inverse_squared_residues, axis=1)

        return row_msr, row_inverse_msr

Any help is greatly appreciated!
I would be happy to provide more detailed information if needed.

Thank you!

ayoub · August 1, 2022, 1:48pm

Hello @ShVS,

You can follow these steps to be able to compile and execute your computation using CNP:

1. Define clearly the variables that your computation requires (and which ones should be encrypted), then define a Python function that takes these variables as arguments.
2. In this function, you can call already implemented methods and functions to achieve your target computation.
3. Compile your function as described in the CNP README: compiler = hnp.NPFHECompiler(function_name, {"arg1": "encrypted", ...}), replacing the arguments with the ones you defined in your function.

Some computations might not be supported and may require updating the original algorithm to use different operations while keeping the same target result. Feel free to open other discussions specifically on how to handle particular unsupported operations.
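
As one illustration of using different operations while keeping the same target result, the sketch below removes the divisions hidden in np.mean from the residue computation in _calculate_msr by scaling every term by the number of entries. It is plain NumPy, not Concrete Numpy code. The function name scaled_residues and the assumption that the data has already been quantized to integers are hypothetical, and whether each operation is supported by the compiler would still need to be checked.

```python
import numpy as np

def scaled_residues(sub_data):
    """Return (num_rows * num_cols) * residues using only integer sums and products."""
    num_rows, num_cols = sub_data.shape  # public plaintext integers
    row_sums = np.sum(sub_data, axis=1)
    col_sums = np.sum(sub_data, axis=0)
    total = np.sum(sub_data)
    return (
        num_rows * num_cols * sub_data
        - num_rows * row_sums[:, np.newaxis]
        - num_cols * col_sums
        + total
    )

# Hypothetical quantized sub-matrix standing in for gene expression values.
sub_data = np.array([[1, 4, 2], [3, 0, 5]], dtype=np.int64)
scaled = scaled_residues(sub_data)

# Reference residues computed as in _calculate_msr, with floating-point means.
reference = (
    sub_data
    - np.mean(sub_data, axis=1)[:, np.newaxis]
    - np.mean(sub_data, axis=0)
    + np.mean(sub_data)
)
assert np.allclose(scaled, sub_data.size * reference)
```

Under this scaling, the mean squared residues differ from the original ones only by public constants (powers of num_rows * num_cols), so the remaining division could in principle be applied after decryption.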

ShVS · August 5, 2022, 12:55pm

Hello,

Thank you for your helpful information. I tried to start by writing my first function (for instance, finding the shape of two-dimensional data).

The data should be encrypted.
The result of the function has to be returned homomorphically.

class concretefun:  # want to use CNP

    def __init__(self, input_data):
        self.data = input_data

    def shape(self):
        return self.data.shape

    def compile_shape(self):
        compiler = hnp.NPFHECompiler(self.shape, {"x": "encrypted", "y": "encrypted"})
        circuit = compiler.compile_on_inputset(self.data)
        return circuit

class secca:  # want to use the encrypted shape of the data

    cnp = concretefun(data)
    num_rows, num_cols = cnp.compile_shape()

It terminated with this error: “AttributeError: ‘int’ object has no attribute ‘traced_computation’”.
I am not sure that I have understood how to use CNP.
Is there any similar implementation in machine learning that you would suggest reading before diving deep into coding?

Thank you in advance!

ayoub · August 5, 2022, 1:41pm

Hello,

I don’t think I understand the use case. Even if the data is encrypted, its shape is public information; it doesn’t get encrypted. The values of the inputs to the function can be encrypted and used in the computation, but not their shape. If you want to compute on an encrypted tensor’s shape, then you would have to treat it as an input to the computation and pass it encrypted.
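
The point about the shape can be seen in plain NumPy, without any FHE library: shape is a tuple of ordinary Python integers describing the container, not a value derived from the entries. The sketch below is only an illustration; the helper name scaled_total is hypothetical, and it shows a dimension being passed as an explicit argument when it must take part in the computation.

```python
import numpy as np

data = np.array([[1, 4, 2], [3, 0, 5]])

# The shape is metadata: plain Python ints, independent of the array's values.
num_rows, num_cols = data.shape
assert isinstance(num_rows, int) and isinstance(num_cols, int)

def scaled_total(x, n):
    """Hypothetical example: a dimension enters the computation as an ordinary argument."""
    return np.sum(x) * n

# n is supplied explicitly, so it could be described as an input
# of the function like any other argument.
result = scaled_total(data, num_rows)
```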

We do have another, higher-level package that is more focused on machine learning, if that’s what you are looking for: GitHub - zama-ai/concrete-ml. Concrete-ML is an open-source set of tools which aims to simplify the use of fully homomorphic encryption (FHE) for data scientists. Particular care was given to the simplicity of our Python package in order to make it usable by any data scientist, even those without prior cryptography knowledge.

ShVS · August 5, 2022, 2:05pm

I came to concrete-numpy because my unsupervised machine learning algorithm (i.e., a biclustering algorithm) is not part of concrete-ml, so I would like to implement my own code from scratch with concrete-numpy.
More precisely, the block of code was meant to ask how I can return the encrypted result (is it the result of the circuit?), and by a similar implementation in ML I meant one built with concrete-numpy or concrete-ml.

Thank you!

ayoub · August 5, 2022, 2:27pm

I would advise you to start from the simple example in the README and try updating the compiled function to your target computation, while being aware of a few additional points:

The inputset is a list of inputs, each of which you should be able to unpack and call the function with.
{"x": "encrypted", "y": "encrypted"} is the description of the function’s inputs, so update it according to your function.
A returned value is not encrypted if it only represents a plaintext value; if it is the result of a computation on encrypted data, then it is encrypted.
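
The first two points can be checked in plain Python before involving the compiler. In the sketch below, the function add and the inputset values are illustrative. The only Concrete Numpy calls mentioned are the ones quoted in this thread, and they are left as comments because the exact API may depend on the installed version.

```python
import inspect

def add(x, y):
    return x + y

# One entry per sample; each entry unpacks into the function's arguments.
inputset = [(2, 3), (0, 0), (1, 6), (7, 7)]

# The input description should name exactly the function's parameters.
input_description = {"x": "encrypted", "y": "encrypted"}
parameter_names = list(inspect.signature(add).parameters)
assert set(input_description) == set(parameter_names)

# Every sample must be callable as function(*sample).
clear_results = [add(*sample) for sample in inputset]

# Calls as quoted in this thread (not executed here):
# compiler = hnp.NPFHECompiler(add, input_description)
# circuit = compiler.compile_on_inputset(inputset)
```

By the same check, the compile_shape snippet earlier in the thread would not pass: shape takes no x or y parameters, and self.data is a single array rather than a list of samples.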