Hello. I have a project to apply FHE to gene expression data (a NumPy array of rows and columns: https://arep.med.harvard.edu/biclustering/yeast.matrix) and to analyze the encrypted input with a biclustering algorithm (like this sample algorithm: biclustlib/cca.py at master · padilha/biclustlib · GitHub). I have read the Concrete Numpy documentation, but I do not know how to proceed (e.g., how to define the functions).
Some parts of the algorithm that should be done homomorphically:
def _calculate_msr(self, data, rows, cols):
    """Calculate the mean squared residues of the rows, of the columns and of the full data matrix."""
    sub_data = data[rows][:, cols]
    data_mean = np.mean(sub_data)
    row_means = np.mean(sub_data, axis=1)
    col_means = np.mean(sub_data, axis=0)
    residues = sub_data - row_means[:, np.newaxis] - col_means + data_mean
    squared_residues = residues * residues
    msr = np.mean(squared_residues)
    row_msr = np.mean(squared_residues, axis=1)
    col_msr = np.mean(squared_residues, axis=0)
    return msr, row_msr, col_msr

def _calculate_msr_col_addition(self, data, rows, cols):
    """Calculate the mean squared residues of the columns for the node addition step."""
    sub_data = data[rows][:, cols]
    sub_data_rows = data[rows]
    data_mean = np.mean(sub_data)
    row_means = np.mean(sub_data, axis=1)
    col_means = np.mean(sub_data_rows, axis=0)
    col_residues = sub_data_rows - row_means[:, np.newaxis] - col_means + data_mean
    col_squared_residues = col_residues * col_residues
    col_msr = np.mean(col_squared_residues, axis=0)
    return col_msr

def _calculate_msr_row_addition(self, data, rows, cols):
    """Calculate the mean squared residues of the rows and of the inverse of the rows for
    the node addition step."""
    sub_data = data[rows][:, cols]
    sub_data_cols = data[:, cols]
    data_mean = np.mean(sub_data)
    row_means = np.mean(sub_data_cols, axis=1)
    col_means = np.mean(sub_data, axis=0)
    row_residues = sub_data_cols - row_means[:, np.newaxis] - col_means + data_mean
    row_squared_residues = row_residues * row_residues
    row_msr = np.mean(row_squared_residues, axis=1)
    inverse_residues = -sub_data_cols + row_means[:, np.newaxis] - col_means + data_mean
    row_inverse_squared_residues = inverse_residues * inverse_residues
    row_inverse_msr = np.mean(row_inverse_squared_residues, axis=1)
    return row_msr, row_inverse_msr
Any help is greatly appreciated!
I would be happy to provide more detailed information if needed.
You can follow these steps to be able to compile and execute your computation using CNP:
1. Clearly define the variables that your computation requires (and which of them should be encrypted), then define a Python function that takes these variables as parameters.
2. In this function, you can call already implemented methods and functions to achieve your target computation.
3. Compile your function as described in the README of CNP: compiler = hnp.NPFHECompiler(function_name, {"arg1": "encrypted", ...}), replacing the arguments with the ones you defined in your function.
Some computations might not be supported and may require changing the original algorithm to use different operations while keeping the same target result. Feel free to open other discussions specifically on how to go about some unsupported operations.
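As an illustration of "different operations, same target result", here is a minimal NumPy sketch that rewrites the residue computation of _calculate_msr without any division, assuming the submatrix has already been selected and holds integers. Every mean is replaced by a sum, and the residues are scaled by n_rows * n_cols, both of which are public. The function names scaled_msr_sums and reference_msr and the sample values are made up for this example. The sketch does not claim that every operation used is supported by a given CNP version. The scaled values also grow quickly, so whether they fit the available integer precision has to be checked separately.

```python
import numpy as np


def scaled_msr_sums(sub_data):
    """Division-free variant of the MSR computation (hypothetical helper).

    Returns sums of squared residues, where each residue has been
    multiplied by n_rows * n_cols so that only integer arithmetic is used.
    """
    n_rows, n_cols = sub_data.shape      # public values, read in the clear
    row_sums = np.sum(sub_data, axis=1)  # shape (n_rows,)
    col_sums = np.sum(sub_data, axis=0)  # shape (n_cols,)
    total = np.sum(sub_data)

    # Equals n_rows * n_cols * (a_ij - row_mean_i - col_mean_j + data_mean)
    scaled_residues = (
        n_rows * n_cols * sub_data
        - n_rows * row_sums[:, np.newaxis]
        - n_cols * col_sums
        + total
    )
    squared = scaled_residues * scaled_residues
    return np.sum(squared), np.sum(squared, axis=1), np.sum(squared, axis=0)


def reference_msr(sub_data):
    """Floating-point computation as in the original _calculate_msr."""
    data_mean = np.mean(sub_data)
    row_means = np.mean(sub_data, axis=1)
    col_means = np.mean(sub_data, axis=0)
    residues = sub_data - row_means[:, np.newaxis] - col_means + data_mean
    squared = residues * residues
    return np.mean(squared), np.mean(squared, axis=1), np.mean(squared, axis=0)


# Small made-up integer submatrix, used only to compare both versions.
sub_data = np.array([[1, 2, 4], [3, 0, 5]], dtype=np.int64)
n_rows, n_cols = sub_data.shape
scale = (n_rows * n_cols) ** 2

total_sq, row_sq, col_sq = scaled_msr_sums(sub_data)

# The divisions use public constants only, so they can be done in the clear
# (for example after decryption).
msr = total_sq / (scale * n_rows * n_cols)
row_msr = row_sq / (scale * n_cols)
col_msr = col_sq / (scale * n_rows)

ref_msr, ref_row_msr, ref_col_msr = reference_msr(sub_data)
```

Comparing msr, row_msr and col_msr with the reference values (for instance with np.allclose) is a quick way to confirm in the clear that the rewritten function still gives the same result before trying to compile it.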
Thank you for your helpful information. I tried writing my first function (for instance, finding the shape of two-dimensional data).
The data should be encrypted.
The result of the function has to be returned homomorphically.
class concretefun:  # want to use CNP
    def __init__(self, input_data):
        self.data = input_data

    def shape(self):
        return self.data.shape

    def compile_shape(self):
        compiler = hnp.NPFHECompiler(self.shape, {"x": "encrypted", "y": "encrypted"})
        circuit = compiler.compile_on_inputset(self.data)
        return circuit

class secca:  # want to use encrypted shape of data
    cnp = concretefun(data)
    num_rows, num_cols = cnp.compile_shape()
It terminated with this error: “AttributeError: ‘int’ object has no attribute ‘traced_computation’”.
I am not sure I have understood how to use CNP.
Is there a similar implementation in machine learning that you would suggest I read before diving deep into coding?
I don’t think I understand the use case. Even if the data is encrypted, its shape is public information; it doesn’t get encrypted. The values of the inputs to the function can be encrypted and used, but not their shape. If you want to compute on an encrypted tensor’s shape, you would have to treat the shape as an input to the computation and pass it encrypted.
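A minimal plain-NumPy sketch of that distinction, with made-up data and hypothetical function names (row_sums, row_sums_plus_count): the shape is read in the clear outside the function, and a dimension that must take part in the computation is passed as an ordinary input. This only shows how the functions would be structured. It does not compile anything with CNP.

```python
import numpy as np

# Made-up example data standing in for the gene expression matrix.
data = np.array([[3, 1, 4], [1, 5, 9]], dtype=np.int64)

# The shape is metadata: it is read in the clear, outside the function
# that would be compiled.
num_rows, num_cols = data.shape


def row_sums(x):
    """Candidate for compilation: only the values of x would be encrypted."""
    return np.sum(x, axis=1)


def row_sums_plus_count(x, n_rows):
    """Here the row count is an explicit input, so it could be marked
    "encrypted" in the input description like any other argument."""
    return np.sum(x, axis=1) + n_rows


sums = row_sums(data)
shifted = row_sums_plus_count(data, num_rows)
```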
I came across concrete-numpy because my unsupervised machine learning algorithm (i.e., a biclustering algorithm) is not part of concrete-ml, so I would have to implement my own code from scratch with concrete-numpy.
More precisely, with the block of code I wanted to ask how I can return the encrypted result (is it the result of the circuit?), and by a similar implementation in ML I meant one built with concrete-numpy or concrete-ml.
I would advise you to start from the simple example in the README and try to adapt the compiled function to your target computation, while keeping a few additional points in mind:
The inputset is a list of inputs, each of which you should be able to unpack and call the function with.
{"x": "encrypted", "y": "encrypted"} is the description of the function’s inputs, so update it according to your function.
A returned value is not encrypted if it simply represents a plaintext value; if it is the result of a computation on encrypted data, then it is encrypted.
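The first two points can be checked in the clear before any compilation. In this small sketch, the function add, the sample values and the name input_description are made up for illustration. The compile calls are the ones quoted earlier in the thread and are left as comments because they need CNP installed. The import line and the hnp alias are an assumption based on how the thread refers to the library.

```python
import inspect


def add(x, y):
    """Function to compile; its parameter names match the description keys."""
    return x + y


# Description of the inputs: one key per parameter of `add`.
input_description = {"x": "encrypted", "y": "encrypted"}

# Each sample is a tuple that unpacks into the parameters of `add`.
inputset = [(2, 3), (0, 0), (1, 6), (7, 7)]

# Check 1: the description names exactly the parameters of the function.
parameter_names = list(inspect.signature(add).parameters)
description_matches = parameter_names == list(input_description)

# Check 2: every sample can be unpacked and used to call the function.
plain_results = [add(*sample) for sample in inputset]

# Compile step as quoted in the thread (requires CNP; alias is an assumption):
# import concrete.numpy as hnp
# compiler = hnp.NPFHECompiler(add, input_description)
# circuit = compiler.compile_on_inputset(inputset)
```

The snippet posted earlier in the thread fails both checks: shape takes no x or y parameter, and self.data is passed as the inputset directly instead of a list of samples.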