Could somebody clarify how LUT (Lookup Table) rotation for bootstrapping works?

So the first part is correct we have to make sure the padding bit is 0, it requires good knowledge of what you are doing or tools, like our shortint API which will make sure that padding bit stays at 0 (unless you use unchecked ops but that should only happen if you know what you are doing).

For the negative numbers I encourage you to look at 2’s complement encoding of integers : Two's complement - Wikipedia

As for the last question you can manage the modulus as long as the function is negacyclic, meaning you cannot use any function, only negacyclic ones, it may not be suitable for certain use case, because as you don’t know the content of the padding bit you can never rely on the fact it’s 0 and so need to always have negacyclic function that can work with that padding bit being 1 or 0 without being known in advance

So the magic I don’t understand about a negacyclic polynomial is how it can actually handle the padding bit. Of course, if the negacyclic polynomial’s degree is 4, it can encode 4 positive coefficients and 4 negative coefficients, total 8 different numbers. However, the reason why it cannot implement a modulus is because if we rotate the polynomial coefficient passing the boundary between X^0 and X^n, it suddenly becomes negative, which breaks the modulus sequence we want.

In the example of a 4-degree polynomial V, we ideally want its rotations give the following outputs in order for us to implement a modulus 8 domain {-4, -3, -2, -1, 0, 1, 2, 3}:
0 left-shift: The constant coefficient should be 0
1 left-shift: The constant coefficient should be 1
2 left-shift: The constant coefficient should be 2
3 left-shift: The constant coefficient should be 3
4 left-shift: The constant coefficient should be -4 (≡ 4 mod 8)
5 left-shift: The constant coefficient should be -3 (≡ 5 mod 8)
6 left-shift: The constant coefficient should be -2 (≡ 6 mod 8)
7 left-shift: The constant coefficient should be -1 (≡ 7 mod 8)

This way, as we increase the rotation counts infinitely, the output coefficient will wrap around like this: 0, 1, 2, 3,-4, -3, -2, -1, 0, 1, 2, 3, -4, -3, -2, -1, … (These are the correct number-wrapping sequence of modulo 8)

But in reality, such a 4-degree negacyclic polynomial does not exist. In reality, we have this kind of polynomial V:

V = 0 + X + 2X^2 + 3X^3

As we left-shift 5~8 positions, the corresponding coefficient values should be a negative of those at left-shift 0~4. In the above example, if the left-shifts 0~4 gave the coefficient values 0,2,3,4, then the left-shifts 5~8 will give -0, -1, -2, -3. This is not -4,-3,-2,-1, which we wanted to implement the modulus 8. In other words, it would give the following outputs:
0 left-shift: The constant coefficient is 0
1 left-shift: The constant coefficient is 1
2 left-shift: The constant coefficient is 2
3 left-shift: The constant coefficient is 3
4 left-shift: The constant coefficient is -0 (Wrong)
5 left-shift: The constant coefficient is -1 (Wrong)
6 left-shift: The constant coefficient is -2 (Wrong)
7 left-shift: The constant coefficient is -3 (Wrong)

If we continue the rotation, the output coefficient will have the following broken sequence: 0, 1, 2, 3, 0, -1, -2, -3, 0, 1, 2, 3, -0, -1, -2, -3, …
These are not the correct wrapping number sequence of modulus 8.

If we have a different polynomial like this:
V = 1 + 2X + 3X^2 + 4X^3

Rotating this still does not give use the correct modulo 8 sequence:
1, 2, 3, 4, -1, -2, -3, -4, 1, 2, 3, 4, … (This number sequence does not wrap around 8 consecutive numbers)

I wonder if you could give a simple working example of 4-degree negacyclic polynomial that can successfully implement modulo 8. (Or even modulo 4).

This does not mean it’s broken it means the Identity LUT, i.e. [0, 1, 2, 3] is not a negacyclic function :slight_smile:

It says that the definition of a negacyclic polynomial is that a cyclic s hift of its coefficients, where the leading coefficient moves to the constant term and is negated, still results in a polynomial within the same system or ring, adhering to the ring’s operations and rules. Thus, in case of

V(X) = 0 + 1X + 2X^2 + 3X^3 in Z_8[ X ] / (X^4 + 1), where Z_8 = {-4, -3, -2 ,-1, 0, 1, 2, 3}

All of its 2N = 8 possible shifts belong to the same polynomial ring:
[0, 1, 2, 3]
[1, 2, 3, 0]
[2, 3, 0, -1]
[3, 0, -1, -2]
[0, -1, -2, -3]
[-1. -2. -3, 0]
[-2, -3, 0, 1]
[-3, 0, 1, 2]

Also in case of V[ X ] = 1 + 2X + 3X^2 + 4X^3, all the following 2N=8 possible shifts are in the same polynomial ring:
[1, 2, 3, 4]
[2, 3, 4, -1]
[3, 4, -1, -2]
[4, -1, -2, -3]
[-1, -2, -3, -4]
[-2, -3, -4, 1]
[-3, -4, 1, 2]
[-4, 1, 2, 3]

Nonetheless, their constant-coefficient terms do not give our desired modulo-8 rotation sequence of {-4 -3, -2, -1, 0, 1, 2, 3, -4, -3, -2, -1, 0, 1, 2, 3, …}

The first V(X) gives the rotation sequence {0, 1, 2, 3, 0,. -1, -2, -3, …}
The second V(X) gives rotation sequence {1, 2, 3, 4, -1, -2, -3, -4}

What V on earth can correctly encode modulo 8 when it is rotated, given Z_8 = {-4, -3, -2, -1, 0, 1, 2, 3}? Could there be just one example by any chance…?

well the link you shared TFHE Deep Dive - Part IV - Programmable Bootstrapping indicates that the sign function they evaluate ctrl + F sign or negacyclic is negacyclic and returns either q/8 for positive values and -q/8 for negative values

for a given N

any f such that f(x + N) = -f(x) is negacyclic

Yes, the link indeed explains an AND logic gate’s bootstrapping and this makes sense. However, I wonder if there is any one example that works for a number domain instead of a bit-wise binary domain.

The link’s example works because its plaintext has only 2 states: + and -, and it placed the + part in the positive side of the negacyclic polynomial rotation, and - to the other side. That’s why it works so easily. But if the plaintext has more than 2 states, how LUT actually works?

The link’s article explained everything about TFHE assuming that the plaintext m is a number in a large number ring, like Z_8 or even Z_1024, but I failed to come up with a working LUT polynomial that would work for a number domain.

I spent a week to understand how LUT would work for a number domain (even for simple p=4, n=4, q=8 is fine), but failed to come up with a working LUT polynomial example…

See https://eprint.iacr.org/2021/1402.pdf

Ctrl + f negacyclic it breaks things down, a lot of functions are negacyclic and should work for « number domains » they are not necessarily useful and in practice for power of 2 moduli at least in TFHE-rs uses exclusively LUTs with the assumption that the padding bit is 0

Maybe the bit you are missing is needing to map

[q/2; q[ to [-q/2; 0[

Thanks very much, now I understood. In your link, the LUT with programmable bootstrapping is as follows:

V(X) = f(0)X^0 + f(1)X^1 + … + f(n-1)X^{n-1}

, where f(z) is a programmable function.

If we want to use the entire range [0, p-1) for \mu, then f(z) should work as follows:

  • If the rotation count ‘r’ of V(X) is within 0 <= r mod 2n < n, then f(z) returns z
  • If the rotation count ‘r’ of V(X) is within n <= r mod 2n < 2n, then f(z) returns -z (in order to cancel out the sign-flipping side effect of rotating Z(X) ).

Importantly, f(z) should be a homomorphically computable logic (e.g., homomorphic addition, homomorphic multiply), because we are assuming to do blind rotation during bootstrapping. I wonder if my understanding is correct.

===========
My Proposed Alternative Solution
And as a another solution besides using the negacyclic function, I wonder if the following works:

Let’s define a new n-degree function f as follows whose coefficient is 1 for all terms:

f(Z) = 1 + 1X + 1X^2 + … + 1X^{n-1}

Step 1. During bootstrapping, when we rotate V(X) by r positions (where r = \rescaledDelta * m + e), we also rotate f(Z) by r positions.

Step 2. We extract the constant position’s coefficient V(X).

  • If 0 <= r mod 2n < n, then we must have extracted LWE(\delta * m)
  • If n <= r mod 2n < 2n, then we must have extracted LWE(-\delta * m), due to the weird (negacyclic) coefficients behavior we get when rotating V(X).

Step 3. We also extract the constant position’s coefficient f(X).

  • If 0 <= r mod 2n < n, then we must have extracted LWE(1).
  • If n <= r mod 2n < 2n, then we must have extracted LWE(-1).

Step 4. We homomorhically multiply the outputs of Step 2 and Step 3.

  • If 0 <= r mod 2n < n, then we get LWE(\delta m) * LWE (1) = LWE(\delta * m * 1) = LWE(\delta * m)
  • If n <= r mod 2n < 2n, then we get LWE(-\delta m) * LWE (-1) = LWE(-\delta * m * -1) LWE(\delta * m)

Therefore, if we multiply rotated V(X)'s constant-term coefficient with that of f(Z), then the negative signs cancel out when n <= r mod 2n < 2n, thus we are always guaranteed to get LWE(\delta m), which can eliminate the sign-flipping problem.

Do you think my solution works?

Strictly the function can be anything that can be expressed as a look up tavle, the result will have a flipped sign if the padding bit was 1, depending on your use case it is not desirable and in practice again we don’t do that in TFHE-rs

Only issue with your second solution is that you need two PBS one for the function, one for the sign but most importantly the homomorphic multiplication between 2 LWEs is not defined as they both represent an element of the torus and multiplication is not defined between two torus element, there are tricks to compute the multiplication nonetheless but they are expensive and as one of them uses PBSes it just might not work because of the padding bit not being 0

All that to say that we can sometime use the negacyclic property of the PBS as a trick but most of the time, at least for now, we side step the problem and just use the N usable values of the LUT when the padding bit is 0

I see, so using a padding bit is the most practical solution. But if you use a padding, I still wonder how you would deal with negative numbers. Last time you showed a link about two’s complement, but I didn’t get how this can be a solution if we have to do some subtractions that would result in negative values, like:

2 - 3

The result will be -1, which is 11 in the two-bit world (e.g., Z_4). If we assume all data types are unsigned values, then 2 - 3 will be 3, which is again 11 in the two-bit world. If we subtract numbers, there are cases where the MSB will be 1. How do you handle this example’s case?

The link you shared Two's complement - Wikipedia says:

When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.

Thus, if we enforce an application not to use the MSB as 1, this means we will have to give up using negative numbers.

Other available options of using MSB (for both 0 and 1) are implementing a bit-wise logic circuit, or piping the output of LUT to a negacyclic function.

Is my reasoning correct?

well no we can choose our encoding e.g. with a padding bit always set to 0 and 2 bits we can choose to represent negative values in a similar way

000 = 0
001 = 1
010 = -2
011 = -1

q = 4 it’s just a way to see the modular representation in [-q/2; q/2[ instead of [0; q[

and if you use the padding bit, then as said earlier you map [q/2; q[ to [-q/2; 0[

000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1

q = 8

then you need to make sure you are using negacyclic luts properly which respect the encoding

I saw you asked a question here about bivariate lut Which division algorithm does Zama for TFHE?

the way it works is we concatenate things

over 2 bits to compute the AND gate for example, MSB is LHS and LSB is RHS

00 (both inputs false) => 0
01 (lhs input false) => 0
10 (rhs input false) => 0
11(both input true) => 1

then to compute the lut we do

PBS(lhs * 2 + rhs) = PBS(concat(lhs, rhs))

it is just a way to encode a lut for concatenated inputs

this generalizes to bigger inputs over more bits

00 | 11 ← this would be an LHS over 2 bits = 0 concatenated with an RHS over 2 bits = 3

and you would do PBS(lhs * 4 + rhs)

Thanks, your every answer is invaluable, as these pieces of knowledge can be hardly learned from elsewhere. It’ also amazing the you already know all of these flawlessly.

Regarding your explanation on division, you mean each bit is a standalone THFE ciphertext and we must build a logic circuit for binary division like this, right?
image
Is this also what Zama is doing, instead of doing division on a ciphertext as a whole number? Although other techniques like Ralph-Newton can divide a number as a whole, Zama is not using this because this is computationally inefficient?

==============

Regarding the 0 padding bit:
Case 1. if we do not use a negacyclic function, then our ciphertext’s MSB should remain to be zero, meaning that we should not rotate LUT V more than N-1 positions to the left. In this case, as we rotate V to the left by {0, 1, 2, … N-1} positions, the rotated V’s constant term coefficient will be {m_0, … m_1, … m_2, … m_{p/2 - 1} }. Here, we have repetitions between each m_{i} and m_{i+1} pair group, because we encounter for multiple noisy ciphertext states (m_i + e_{1,2,3,…} } that are mapped to the same plaintext m_i. And importantly, we should not rotate V(X) more than N-1 positions, because if we rotate V by {N, N+1, N+2, … 2N-1} positions, then the rotated V’s constant term coefficient will be {-m_0, … -m_1, … -m_2, … -m_{p/2 - 1} }, whose signs are flipped.

Case 2. On the other hand, if we use a negacyclic function, then the ciphertext’s MSB is okay to be 1. This means we are okay to rotate V(X) up to 2N positions to the left. So here I have a question. As we rotate V(X) to the left by {0, 1, … N-1} positions, the rotated V(X)'s constant term coefficients will be, like before, {m_0, … m_1, … m_{p/2 - 1} }. Then, as we continue to {N, N+1, … 2N-1} rotations, this time we will get {m_0, … m_1, … m_{p/2 - 1} }, whose negative signs are cancelled out by the negacyclic function. Note that although V(X) can have total 2N coefficient states, the latter N states are identical with the first N states (as the negacyclic function removed the negative signs). This is a problem, because now we never get the other half of plaintext states: {m_{p/2}, m_{p/2 + 1}, … m_{p-1} }. How do we get these states?

I am sorry for my continued question, but I really need to understand this correctly… I always appreciate you patient and considerate answers.

To be fair I have been worky daily on crypto related implementations for something like 3 years now, makes things easier to learn when you need to get them right and working :slight_smile:

division: we process things bit by bit but we could have several bits in a ciphertext, a look up table can be used to mask and extract the right bit we are interested in even if there are several in a ciphertext block(I would recommend keeping on the other post for it as it’s a whole topic on its own)

Case 1 : yes

Case 2 : the big difficulty that you just discovered is that we need a negacyclic function meaning we really have the freedom for only half of all the possible values, the negacyclic property then forces our hand for the other half but if your computations are compatible with it then you can use the full p not only p/2 which arguably is more efficient as we bootstrap more useful bits than with the padding bit = 0 but on the other hand it’s a small nightmare in itself to use, so most of the time (i.e. for now almost always) we keep the padding bit set to 0, as then the N values we have control over map exactly to the p/2 plaintext space

My pleasure

Thanks, I got the division part. I think I am almost understanding the whole bootstrapping now, but one small uncertain part I am concerned about is the following:

This makes sense if you are talking about programmable bootstrapping of mapping the input to the output in LUT (e.g., 011 mapped to -1).

However, do you also mean that we can represent a whole plaintext number as the above? For example, suppose we have a 4-bit plaintext and we want to compute 2 - 3, each of which will be represented as 0010 (2) and 0011 (3). If we do a binary subtraction of 0010 - 0011 = 1111 (-1), now we have 1 at the MSB. Our 0-padding scheme wanted to ensure that the MSB always stays 0. However, even if we want to encode -1 as 0111, our computer CPUs will compute 0010 - 0011 = 1111, because this is the way how our CPU and machine code are designed to do binary subtraction given the 4-bit data type. Therefore, if our application wants to use negative numbers, it seems that we will inevitably have 1 at the MSB after computations like 2 - 3 (i.e., 0010 - 0011). How can we ensure the MSB to be still 0 after such computation like 2 - 3, or 1 - 4?

==================
p.s. I think the solution is not necessarily enforcing 0 to the MSB, but enforcing the application to rotate the LUT V(X) only at most n-1 positions. This way, the application will read the constant term coefficient in V(X) only one version for each coefficient (i.e., only one-signed version, either + or -), not two versions (i.e., two contradicting signs of + and - for the same coefficient). For example, given q = 16, we can allow the application to use the following 8 plaintext numbers instead of 16 numbers:

1010: -4
1011: -3
1110: -2
1111: -1
0000: 0
0001: 1
0010: 2
0011: 3

Suppose our LUT V(X) is an 8-degree polynomial, and let’s ignore the noise (e) for now (i.e., the noise does not contribute to the ciphertext). Then, using the above 8 numbers, V(X) will be rotated to at most n-1 different positions. Therefore, we can prevent any coefficients from being reused after a sign-flip. However, for example, if our application used one more number as below:

0100: 4

Then, the polynomial V(X) can be rotated maximum 9 positions. But since it has only 8 terms, some term would be reused after its coefficient’s sign gets flipped. Then, we will get a contradicting bootstrapping map for this sign-flipping coefficient. So, we should ensure that the application uses only continous half of p values in the plaintext domain (while the MSB is allowed to be both 1 and 0). If the application abides by this, then we can avoid the negacyclic rotation problem. Is my understanding correct?

(x - 3) mod 4

Power of 2 should work nicely with the modulus

Or you first add the modulus then subtract to keep the proper modulus

(x + 4 - 3) mod 4

I am sorry, but I think I got a little lost… Is your formula (x - 3) mod 4 talking about how to implement this encoding?

You said (x - 3) mod 4, and do we plug in x={0, 1, -2, -1}? Then, we get:
x=0 : -3 mod 4 = 1 (01)
x=1 : -2 mod 4 = 2 (10)
x=-2 : -5 mod 4 = 3 (11)
x=-1 : -4 mod 4 = 0 (00)

Because 3 is in reality -1 with q = 4 so -3 = -(-1) = +1

If you need more help in the crypto side of things consider asking questions in the beginner channel of fhe.org discord : https://discord.fhe.org/

The community forum is more meant for usage of Zama’s products

Cheers

Thanks for the Discord forum which seems to be useful.

While I was working on TFHE implementations with Zama I wanted to get some of my understanding clear about the logic behind it. While I was taking my private note of summary, more questions came up, so I asked:

Thanks for all your clarifications and they were truly great advice and guidance to me.

Glad it was useful to you