Efficiency dropdown

Norrin_Radix · February 13, 2025, 3:40pm

Hi, I had two functions that took 330 s and 140 s respectively to run last Sunday. On Monday, over the course of the day, their computation time increased steadily until it had multiplied by 4. They consist mostly of repeated calls to mul_lsb on shortint ciphertexts. On Monday I was still using tfhe-rs 10.2, and I had not touched the structure of the code. At the end of the day I moved to 11.2, with no difference. I have since refactored my code, but the slowdown appeared before the refactoring. Do you have any idea what could cause this performance drop? It’s really annoying because I quite liked the original run times. It seems that a first set of calculations takes approximately 3 s per mul_lsb, while a second set takes 4 times as long per call, and the operations in that second set happen much more frequently (but they are exactly the same type of operation, so it is quite a mystery).

IceTDrinker · February 13, 2025, 3:47pm

Hello @Norrin_Radix

Without code and machine specs we cannot say…

One possibility if you are running on a laptop is heat, meaning the machine will slow down because of power and temperature target limits.

Another possibility is that you are accumulating a lot of values in RAM and at some point it becomes too much to handle.

Recommendation in that case: don’t benchmark on laptops. Restarting may help.
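
One way to check whether the machine slows down over time (heat or memory pressure) is to time the same operation repeatedly and compare early runs with late ones. The following is only a minimal sketch using the standard library; `time_runs` and `slowdown_ratio` are hypothetical helper names, and the closure passed to `time_runs` stands in for whatever call is being measured (for instance a single `mul_lsb`).

```rust
use std::time::{Duration, Instant};

/// Runs `op` `runs` times and returns the wall-clock duration of each run.
fn time_runs<F: FnMut()>(runs: usize, mut op: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect()
}

/// Mean of the last half of the timings divided by the mean of the first half.
/// A value well above 1.0 means the later runs were slower than the earlier ones.
fn slowdown_ratio(timings: &[Duration]) -> f64 {
    let half = timings.len() / 2;
    if half == 0 {
        // Not enough samples to compare two halves.
        return 1.0;
    }
    let mean = |d: &[Duration]| d.iter().map(Duration::as_secs_f64).sum::<f64>() / d.len() as f64;
    mean(&timings[timings.len() - half..]) / mean(&timings[..half])
}
```

If the ratio stays close to 1.0 over a long run, throttling is less likely to be the explanation. A ratio that climbs as the run goes on points back at heat or memory.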

Norrin_Radix · February 13, 2025, 3:52pm

I thought it could be the RAM, but restarting did not change anything. I’ll try again now that the big function is refactored.

The computations that take time are these:

let v_index_11:Vec<usize> = vec![0,1,2,3,4,5,6,7,8,9,10,11,12,13];
        let v_ap7:Vec<Ciphertext> = v_index_11.par_iter().map(|&i|
                {
                        println!("in the process {}",i);
                        if      i == 0 { sk.mul_lsb(&v_first_prod[0],&sum_07)   }
                        else if i == 1 { sk.mul_lsb(&v_first_prod[7],&sum_17)   }
                        else if i == 2 { sk.mul_lsb(&v_first_prod[13],&sum_27)  }
                        else if i == 3 { sk.mul_lsb(&v_first_prod[22],&sum_37)  }
                        else if i == 4 { sk.mul_lsb(&v_first_prod[25],&sum_47)  }
                        else if i == 5 { sk.mul_lsb(&v_first_prod[27],&sum_57)  }
                        else if i == 6 { sk.mul_lsb(&v_first_prod[1],&sum_67)   }
                        else if i == 7 { sk.mul_lsb(&v_first_prod[8],&sum_77)   }
                        else if i == 8 { sk.mul_lsb(&v_first_prod[23],&sum_87)  }
                        else if i == 9 { sk.mul_lsb(&v_first_prod[26],&sum_97)  }
                        else if i ==10 { sk.mul_lsb(&v_first_prod[2],&sum_107)  }
                        else if i ==11 { sk.mul_lsb(&v_first_prod[24],&sum_117) }
                        else if i ==12 { sk.mul_lsb(&v_second_prod[1],&sum_127) }
                        else           { sk.mul_lsb(&v_second_prod[21],&sum_137)}
                }
                ).collect();

And yes, it’s a laptop, a pretty old one from 2015.

IceTDrinker · February 13, 2025, 3:57pm

@Norrin_Radix

You could refactor the if/else if chain into a match in Rust (though it probably won’t make much of a difference).
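
Besides a match, the branches could also be driven by a table of index pairs. The sketch below is only an illustration with the standard library: `products_from_table` is a hypothetical helper, `T` stands in for the ciphertext type, and `mul` stands in for a call such as `sk.mul_lsb`. The first three pairs in `SECOND_PROD_PAIRS_START` are copied from the `v_second_prod` code in this thread; the rest of the table is left out.

```rust
/// First three index pairs of the second product set, as posted in this thread.
const SECOND_PROD_PAIRS_START: [(usize, usize); 3] = [(0, 7), (0, 13), (0, 14)];

/// Applies `mul` to every `(left, right)` index pair, in table order.
/// `T` stands in for the ciphertext type and `mul` for the homomorphic product.
fn products_from_table<T>(
    inputs: &[T],
    pairs: &[(usize, usize)],
    mul: impl Fn(&T, &T) -> T,
) -> Vec<T> {
    pairs
        .iter()
        .map(|&(left, right)| mul(&inputs[left], &inputs[right]))
        .collect()
}
```

With rayon in scope, `pairs.par_iter()` could presumably replace `pairs.iter()`, as long as the closure and `T` can be shared across threads. This changes readability, not the number of multiplications.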

The fact that it’s a laptop means it will heat up. I can tell you that when we develop algorithms, we never benchmark on development laptops because of heat-induced performance fluctuations.

Are you sure the EXACT same code was run on Friday and today? Any change (especially with threading) could induce a large performance overhead if misused.

Norrin_Radix · February 13, 2025, 4:04pm

Well, I don’t recall changing much. I have had this slowdown since Monday, and I have changed the code a lot since then with neither improvement nor degradation. I let the computer cool down for the whole night and it was still too slow in the morning. So there is a first series of products between ciphertexts, and from those a second series of products of the resulting ciphertexts. If I had enough threads, this would take only the time of 2 multiplications, but I merely have a dual core. Each multiplication in the second series happens to take much longer, and I do not get why.
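
As a rough back-of-the-envelope check of the "2 multiplication time" remark, the sketch below estimates wall-clock time for a batch of independent products. It assumes that every product costs the same and occupies exactly one worker thread, which may not match what tfhe-rs or rayon actually do. `estimated_wall_time` is a hypothetical helper. The counts 28 and 22 are the lengths of `v_index_1` and `v_index_2` in the code posted here, and 3 s is the per-call figure from the first post.

```rust
/// Estimated wall-clock seconds for `n_ops` independent operations of equal cost
/// spread over `threads` workers (assumption: one operation keeps one worker busy).
fn estimated_wall_time(n_ops: usize, threads: usize, secs_per_op: f64) -> f64 {
    assert!(threads > 0, "at least one worker thread is required");
    // Number of sequential "rounds": ceil(n_ops / threads).
    let rounds = (n_ops + threads - 1) / threads;
    rounds as f64 * secs_per_op
}
```

Under these assumptions, `estimated_wall_time(28, 2, 3.0)` gives 42 s for the first products and `estimated_wall_time(22, 2, 3.0)` gives 33 s for the second, against 3 s each with one thread per product.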

first set looks like

  let v_index_1:Vec<usize> = vec![0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27];
        let v_first_prod:Vec<tfhe::shortint::Ciphertext> = v_index_1.par_iter().map( |&i|
        {        
                println!("in the process {}",i); 
                if      i < 7 { sk.mul_lsb(&pol1.v[0],&pol1.v[1+i])}
                else if i < 13{ sk.mul_lsb(&pol1.v[1],&pol1.v[2+(i%7)])}
                else if i < 18{ sk.mul_lsb(&pol1.v[2],&pol1.v[3+(i%13)])}
                else if i < 22{ sk.mul_lsb(&pol1.v[3],&pol1.v[4+(i%18)])}
                else if i < 25{ sk.mul_lsb(&pol1.v[4],&pol1.v[5+(i%22)])}
                else if i < 27{ sk.mul_lsb(&pol1.v[5],&pol1.v[6+(i%25)])}
                else          { sk.mul_lsb(&pol1.v[6],&pol1.v[7]) }
        }
        ).collect();
        println!("first_prod computed");
        let v_index_2:Vec<usize> = vec![0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21];
        let v_second_prod:Vec<Ciphertext> = v_index_2.par_iter().map(|&i| 
        {  
                println!("in the process {}",i);      
                if      i == 0 { sk.mul_lsb(&v_first_prod[0],&v_first_prod[7])  }   //a0a1*a1a2
                else if i == 1 { sk.mul_lsb(&v_first_prod[0],&v_first_prod[13]) }   //a0a1*a2a3
                else if i == 2 { sk.mul_lsb(&v_first_prod[0],&v_first_prod[14]) }   //a0a1*a2a4
                else if i == 3 { sk.mul_lsb(&v_first_prod[0],&v_first_prod[15]) }   //a0a1*a2a5
                else if i == 4 { sk.mul_lsb(&v_first_prod[0],&v_first_prod[17]) }   //a0a1*a2a7
                else if i == 5 { sk.mul_lsb(&v_first_prod[0],&v_first_prod[18]) }   //a0a1*a3a4
                else if i == 6 { sk.mul_lsb(&v_first_prod[1],&v_first_prod[13]) }   //a0a2*a2a3
                else if i == 7 { sk.mul_lsb(&v_first_prod[1],&v_first_prod[18]) }   //a0a2*a3a4
                else if i == 8 { sk.mul_lsb(&v_first_prod[7],&v_first_prod[13]) }   //a1a2*a2a3
                else if i == 9 { sk.mul_lsb(&v_first_prod[7],&v_first_prod[18]) }   //a1a2*a3a4
                else if i ==10 { sk.mul_lsb(&v_first_prod[7],&v_first_prod[19]) }   //a1a2*a3a5
                else if i ==11 { sk.mul_lsb(&v_first_prod[8],&v_first_prod[27]) }   //a1a3*a6a7
                else if i ==12 { sk.mul_lsb(&v_first_prod[13],&v_first_prod[21]) }  //a2a3*a3a7
                else if i ==13 { sk.mul_lsb(&v_first_prod[18],&v_first_prod[27]) }  //a3a4*a6a7
                else if i ==14 { sk.mul_lsb(&v_first_prod[19],&v_first_prod[25]) }  //a3a5*a5a6
                else if i ==15 { sk.mul_lsb(&v_first_prod[19],&v_first_prod[26]) }  //a3a5*a5a7
                else if i ==16 { sk.mul_lsb(&v_first_prod[21],&v_first_prod[25]) }  //a3a7*a5a6
                else if i ==17 { sk.mul_lsb(&v_first_prod[22],&v_first_prod[25]) }  //a4a5*a5a6
                else if i ==18 { sk.mul_lsb(&v_first_prod[22],&v_first_prod[26]) }  //a4a5*a5a7
                else if i ==19 { sk.mul_lsb(&v_first_prod[22],&v_first_prod[27]) }  //a4a5*a6a7
                else if i ==20 { sk.mul_lsb(&v_first_prod[23],&v_first_prod[27]) }  //a4a6*a6a7
                else           { sk.mul_lsb(&v_first_prod[25],&v_first_prod[27]) } //a5a6*a6a7
        }
        ).collect();        
        println!("second_prod computed");
        (v_first_prod,v_second_prod)
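
The if/else chain that builds `v_first_prod` above enumerates every pair of indices `(j, k)` with `j < k` over the 8 entries `pol1.v[0]` to `pol1.v[7]`. A minimal sketch of generating that same table with the standard library follows; `ordered_pairs` is a hypothetical helper name.

```rust
/// Lists every pair `(j, k)` with `j < k < n`, ordered by `j` and then by `k`.
fn ordered_pairs(n: usize) -> Vec<(usize, usize)> {
    (0..n)
        .flat_map(|j| (j + 1..n).map(move |k| (j, k)))
        .collect()
}
```

For `n = 8` this yields 28 pairs in the same order as the chain (index 7 is `(1, 2)`, index 13 is `(2, 3)`, index 27 is `(6, 7)`). Each product would then be something like `sk.mul_lsb(&pol1.v[j], &pol1.v[k])` for each `(j, k)` in the table.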

and the second series looks like

let v_index_4:Vec<usize> = vec![0,1,2,3,4,5,6,7,8,9,10,11,12,13];
        let v_ap1:Vec<Ciphertext> = v_index_4.par_iter().map(|&i| 
                {      
                        println!("in the process {}",i);  
                        if      i == 0 { sk.mul_lsb(&v_first_prod[0],&sum_01) }   
                        else if i == 1 { sk.mul_lsb(&v_first_prod[7],&sum_11) }   
                        else if i == 2 { sk.mul_lsb(&v_first_prod[13],&sum_21) }   
                        else if i == 3 { sk.mul_lsb(&v_first_prod[22],&sum_31) }
                        else if i == 4 { sk.mul_lsb(&v_first_prod[25],&sum_41) }  
                        else if i == 5 { sk.mul_lsb(&v_first_prod[1],&sum_51) }   
                        else if i == 6 { sk.mul_lsb(&v_first_prod[8],&sum_61) }   
                        else if i == 7 { sk.mul_lsb(&v_first_prod[27],&sum_71) }  
                        else if i == 8 { sk.mul_lsb(&v_first_prod[23],&sum_81) }   
                        else if i == 9 { sk.mul_lsb(&v_first_prod[26],&sum_91) }  
                        else if i ==10 { sk.mul_lsb(&v_second_prod[1],&sum_101)}
                        else if i ==11 { sk.mul_lsb(&v_first_prod[2],&sum_111)}   
                        else if i ==12 { sk.mul_lsb(&v_first_prod[24],&sum_121)}   
                        else           { sk.mul_lsb(&v_first_prod[9],&v_second_prod[21])}  
                }
                        
                ).collect();

I do not see the difference

IceTDrinker · February 13, 2025, 4:33pm

For the indices you could do

(start..=stop).into_par_iter().map(|i|...)

The only problem with what you’ve shared is that we can’t do anything about it, because we don’t know what the code was before and what changed in the meantime… Make sure your laptop is plugged into a socket: performance also changes depending on whether the laptop is on battery or plugged in.