Hello, I'm trying the hybrid model example, but while testing inference with the GPT-2 model I hit a deserialization error. Concrete ML is installed via Docker. The client output is as follows:
Using device: cpu
Number of tokens:
Prompt:
**********
**********
8 tokens in 'Computations on encrypted data can help'
**********
**********
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Computations on encrypted data can Traceback (most recent call last):
File "infer_hybrid_llm_generate.py", line 78, in <module>
output_ids = model.generate(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1479, in generate
return self.greedy_search(
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 2340, in greedy_search
outputs = self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 1074, in forward
transformer_outputs = self.transformer(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 888, in forward
outputs = block(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 390, in forward
attn_outputs = self.attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 312, in forward
query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/concrete/ml/torch/hybrid_model.py", line 269, in forward
y = self.remote_call(x)
File "/usr/local/lib/python3.8/dist-packages/concrete/ml/torch/hybrid_model.py", line 332, in remote_call
decrypted_prediction = client.deserialize_decrypt_dequantize(encrypted_result)[0]
File "/usr/local/lib/python3.8/dist-packages/concrete/ml/deployment/fhe_client_server.py", line 353, in deserialize_decrypt_dequantize
deserialized_decrypted_quantized_result = self.deserialize_decrypt(
File "/usr/local/lib/python3.8/dist-packages/concrete/ml/deployment/fhe_client_server.py", line 329, in deserialize_decrypt
deserialized_encrypted_quantized_result = fhe.Value.deserialize(
File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/value.py", line 47, in deserialize
return Value(NativeValue.deserialize(serialized_data))
File "/usr/local/lib/python3.8/dist-packages/concrete/compiler/value.py", line 68, in deserialize
return Value.wrap(_Value.deserialize(serialized_value))
RuntimeError: Failed to deserialize Value
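The failure is in the very last client-side step: `fhe.Value.deserialize` cannot parse the bytes returned by the server. To rule out my local environment, I would expect a single-process serialize/deserialize round-trip like the following to succeed (a minimal sketch with a placeholder circuit, independent of the hybrid-model server):

```python
# Minimal sketch: round-trip an encrypted value through serialize ->
# fhe.Value.deserialize in a single process, to check that the local
# concrete-python install can parse its own ciphertexts. The circuit
# below is just a placeholder, not the hybrid-model circuit.
from concrete import fhe

@fhe.compiler({"x": "encrypted"})
def f(x):
    return x + 1

circuit = f.compile(range(8))  # placeholder inputset
circuit.keygen()

encrypted_x = circuit.encrypt(3)
encrypted_result = circuit.run(encrypted_x)

# Same serialize -> fhe.Value.deserialize path the hybrid client uses
blob = encrypted_result.serialize()
restored = fhe.Value.deserialize(blob)
print(circuit.decrypt(restored))  # expected: 4
```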
The server side seems to work fine; its output is as follows:
INFO: 127.0.0.1:44578 - "GET /list_shapes HTTP/1.1" 200 OK
INFO: 127.0.0.1:44580 - "GET /get_client HTTP/1.1" 200 OK
INFO: 127.0.0.1:44582 - "POST /add_key HTTP/1.1" 200 OK
INFO: 127.0.0.1:44584 - "GET /get_client HTTP/1.1" 200 OK
INFO: 127.0.0.1:44586 - "POST /add_key HTTP/1.1" 200 OK
[... 17 more identical "GET /get_client" / "POST /add_key" pairs, all 200 OK ...]
2024-03-12 07:23:02.771 | INFO | __main__:compute:167 - Reading uploaded data...
2024-03-12 07:23:02.817 | INFO | __main__:compute:170 - Uploaded data read in 0.04540824890136719 seconds
2024-03-12 07:23:02.817 | INFO | concrete.ml.torch.hybrid_model:compute:820 - It took 0.0001571178436279297 seconds to load the key
2024-03-12 07:23:02.882 | INFO | concrete.ml.torch.hybrid_model:compute:826 - It took 0.06443285942077637 seconds to load the circuit
2024-03-12 07:23:09.478 | INFO | concrete.ml.torch.hybrid_model:compute:836 - fhe inference of input of shape (1, 8, 768) took 6.595885515213013
2024-03-12 07:23:09.478 | INFO | concrete.ml.torch.hybrid_model:compute:837 - Results size is 178.87525939941406 Mb
INFO: 127.0.0.1:34118 - "POST /compute HTTP/1.1" 200 OK
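One thing that stands out is the ~179 MB result for a single (1, 8, 768) input. A hypothetical one-line check just before the failing call in `remote_call` (names taken from the traceback above, and assuming `encrypted_result` holds the raw serialized bytes) would show whether all of those bytes actually reach the client:

```python
# Hypothetical debug print, added in concrete/ml/torch/hybrid_model.py
# inside remote_call, right before the line that raises: compare the
# number of bytes received with the ~179 MB the server logged.
print(f"received {len(encrypted_result) / 1e6:.2f} MB")
decrypted_prediction = client.deserialize_decrypt_dequantize(encrypted_result)[0]
```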
I have tried both the Docker image zamafhe/concrete-ml:1.4.1 and a pip install under Python 3.10, and both fail with the same error.
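For completeness, this is how I compare package versions between the client and the server container (a standard-library sketch; as far as I understand, serialized ciphertexts are not portable across concrete-python versions, so a client/server mismatch could produce exactly this error):

```python
# Sketch: print the installed versions of the relevant packages.
# Run this both on the client and inside the server container and compare.
from importlib.metadata import version

for pkg in ("concrete-ml", "concrete-python"):
    print(pkg, version(pkg))
```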