I have a question about the integer value range. The documentation says: “Integer: Integers are supported within a specific range determined by the encryption scheme’s quantization parameters. Default range is 1 to 15. 0 being used for the NaN. Values outside this range will cause a ValueError to be raised during the pre-processing stage.” How can I change the default values?
Are you looking to increase the range of supported values (e.g. allow 1…255 or more)? Unfortunately, that is not easy to do for now. The DataFrame API is in the early stages of development, but we’re eager to learn how you would like to use it:
What range would you like to have? Could you describe your use case further?
Below is a short simulated dataset of financial transactions. Typically, the dataset is more complex. The data is also used to detect fraudulent activity, assess risk, and so on. I used this dataset for some experiments with Concrete ML and got the “ValueError”.
Let me know if you have questions.
action,maxCount,minCount,averageAmount,stdAmount,freq
CASH_IN,1,1,121136.63,0,0.07869
CASH_IN,2,2,101944.59,59537.83,0.03256
CASH_IN,3,3,108129.39,86582.83,0.01928
CASH_IN,4,4,91965.32,85176.53,0.01348
CASH_IN,5,9,87777.76,96453.9,0.04352
CASH_IN,10,99,72111.89,109280.58,0.36453
CASH_IN,100,299,73228.2,125384.24,0.35864
CASH_IN,300,799,90330.96,161652.35,0.08657
CASH_IN,800,1499,120144.18,215191.42,0.00264
CASH_IN,1500,4999,65051.8,101197.32,0.00009
CASH_OUT,1,1,65547.63,0,0.49649
CASH_OUT,2,2,69549.99,35030.76,0.23037
CASH_OUT,3,3,72851,48778.45,0.11437
CASH_OUT,4,4,77646.85,58970.61,0.0594
CASH_OUT,5,9,88133.19,75468.76,0.07386
CASH_OUT,10,99,118107.22,123933.12,0.01678
CASH_OUT,100,299,77875,125699.79,0.00705
CASH_OUT,300,799,81103.09,141223.71,0.00166
CASH_OUT,800,1499,84557.16,158316.02,0.00003
DEBIT,1,1,1976.98,0,0.64197
DEBIT,2,2,1542.23,386.6,0.1638
DEBIT,3,3,1608.57,605.66,0.06916
DEBIT,4,4,1772.27,779.09,0.03653
DEBIT,5,9,2830.88,1399.18,0.05778
DEBIT,10,99,2047.51,1350.27,0.03021
DEBIT,100,299,1392.06,731.55,0.00042
DEBIT,10000,200000,2096.53,11344.42,0.00013
DEPOSIT,10000,200000,3273435.88,55547514.33,1
PAYMENT,1,1,2335.99,0,0.29392
PAYMENT,2,2,2252.28,1160.55,0.15869
PAYMENT,3,3,2196.74,1548.54,0.10713
PAYMENT,4,4,2138.32,1761.19,0.07943
PAYMENT,5,9,2092.43,2018.36,0.30748
PAYMENT,10,99,1942.24,2350.24,0.05306
PAYMENT,100,299,1819.17,3490.51,0.00028
PAYMENT,300,799,898.89,1779.32,0.00002
TRANSFER,1,1,65898.1,0,0.53248
TRANSFER,2,2,76741.79,39507.71,0.19408
TRANSFER,3,3,91898.09,62612.8,0.09332
TRANSFER,4,4,106291.08,80543.74,0.05189
TRANSFER,5,9,153781.46,136010.43,0.08671
TRANSFER,10,99,287313.47,286424.17,0.04112
TRANSFER,100,299,751405.55,912376.26,0.00037
TRANSFER,300,799,1265876.84,1417984.63,0.00003
TRANSFER,800,1499,1326202.6,1215892.27,0.00001
TRANSFER,1500,4999,103783.5,188002.5,0.00001
For now, DataFrame only supports 4-bit integers, and your minCount/maxCount columns require up to 18 bits (the largest value is 200000). Furthermore, we only support quantizing floating-point values to 4 bits, and the floating-point values you gave have a dynamic range that may quantize poorly to 4 bits. Still, you can make this work.
I would suggest the following workarounds:
- Convert the minCount/maxCount columns to floating point so that they go through quantization.
- Apply ordinal encoding to the minCount/maxCount values so that you only have 15 possible values. For example, encoded value 1 corresponds to real values 0-20, 2 ← 20-50, … 15 ← 4000…5000.
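The ordinal-encoding workaround could be sketched roughly as follows. This is only an illustration in plain Python: the `THRESHOLDS` list and the `encode_count` helper are hypothetical names, and the cut points are an assumption chosen to match the bucket boundaries in the dataset above, not values prescribed by the library.

```python
from bisect import bisect_right

# Hypothetical cut points (an assumption, picked from the dataset's bucket
# boundaries). 14 cut points split the counts into 15 buckets, which fit
# the supported integer range 1..15 (0 stays reserved for NaN).
THRESHOLDS = [2, 3, 4, 5, 10, 100, 300, 800, 1500, 5000,
              10_000, 50_000, 100_000, 200_000]


def encode_count(value):
    """Map a raw count (>= 1) to an ordinal code between 1 and 15."""
    if value < 1:
        raise ValueError(f"count must be >= 1, got {value}")
    # bisect_right counts how many cut points are <= value;
    # adding 1 shifts the result from 0..14 to 1..15.
    return bisect_right(THRESHOLDS, value) + 1


codes = [encode_count(v) for v in [1, 2, 9, 99, 4999, 200000]]
print(codes)
```

With a pandas DataFrame, the same helper could be applied per column, e.g. `df["minCount"].map(encode_count)`. For the first workaround, `df["minCount"].astype(float)` would convert the column to floating point. Any count above the last cut point saturates at code 15, so the cut points should be chosen to cover the range that matters for the model.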