Let’s take QGPT2Evaluate.ipynb, specifically the multi-head attention GPT-2 model, as an example.
I see two ways to analyze the computational complexity of FHE: quantitatively, by measuring the actual execution time, and theoretically, by estimating the computational complexity on paper (this second approach is where my doubt lies).
1. Quantitatively analyze the actual execution time
1.1 Execute it without FHE
import time

proj_12_heads_qgpt2.set_fhe_mode(fhe="disable")  # run the quantized model in the clear
t1 = time.time()
output_logits_clear = proj_12_heads_qgpt2(input_ids).logits
elapsed_clear = time.time() - t1
print(f'Inference time without FHE (fhe="disable"): {elapsed_clear:.2f} s')
1.2 Execute it with FHE
circuit_12_heads = proj_12_heads_qgpt2.compile(input_ids)  # compile the model to an FHE circuit
proj_12_heads_qgpt2.set_fhe_mode(fhe="execute")  # run the same inference fully under FHE
t2 = time.time()
output_logits_executed = proj_12_heads_qgpt2(input_ids).logits
elapsed_fhe = time.time() - t2
print(f'Inference time with FHE (fhe="execute"): {elapsed_fhe:.2f} s')
By comparing these two timings (e.g., via the slowdown ratio above), we can quantify the overhead introduced by FHE. However, running the model with fhe="execute" can take a very long time. So I’m curious whether there is a theoretical way to estimate the computational complexity of this multi-head attention model in QGPT2Evaluate.ipynb without actually running it.
2. How can the computational complexity of FHE be evaluated theoretically?
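For instance, would counting the arithmetic operations of one attention block be a reasonable starting point? Below is a rough back-of-the-envelope sketch of what I have in mind; the helper name attention_mac_count is just illustrative, the hyper-parameters (d_model = 768, 12 heads, example sequence length of 16) are my own assumptions for GPT-2 small, and this ignores FHE-specific costs such as programmable bootstrapping:

def attention_mac_count(seq_len, d_model=768, n_head=12):
    # Multiply-accumulate count for one multi-head attention block (clear model)
    d_head = d_model // n_head
    qkv_proj = 3 * seq_len * d_model * d_model          # Q, K, V projections
    scores = n_head * seq_len * seq_len * d_head        # Q @ K^T for each head
    weighted_sum = n_head * seq_len * seq_len * d_head  # softmax(scores) @ V for each head
    out_proj = seq_len * d_model * d_model              # final output projection
    return qkv_proj + scores + weighted_sum + out_proj

print(f"Approximate MACs for seq_len=16: {attention_mac_count(16):,}")

My doubt is how to go from such a cleartext operation count to the actual FHE cost, since each quantized operation seems to map to much more expensive homomorphic operations.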
Thank you very much. I would greatly appreciate any information you can provide.