Observing difference in outputs from decoder with IO bindings.
Hi @Ki6an, I was trying to implement IO bindings for the decoder part of the model. I used the same code from your repo to convert the model to ONNX. After loading the model, predictions made by calling the decoder session directly look fine, but with the inputs bound through IO binding the result is different.
Below is the code for the IO bindings:
import numpy as np
import torch


def dec_pred_with_io_bindings(input_ids, attention_mask, encoder_output, past_key_values_dict, dec_session):
    dec_io_binding = dec_session.io_binding()

    # Inputs are bound by raw device pointer, so every tensor must already be on cuda:0.
    dec_io_binding.bind_input(name="input_ids",
                              device_type="cuda",
                              device_id=0,
                              element_type=np.longlong,
                              shape=list(input_ids.shape),
                              buffer_ptr=input_ids.data_ptr())
    dec_io_binding.bind_input(name="encoder_attention_mask",
                              device_type="cuda",
                              device_id=0,
                              element_type=np.longlong,
                              shape=list(attention_mask.shape),
                              buffer_ptr=attention_mask.data_ptr())
    dec_io_binding.bind_input(name="encoder_hidden_states",
                              device_type="cuda",
                              device_id=0,
                              element_type=np.float32,
                              shape=list(encoder_output.shape),
                              buffer_ptr=encoder_output.data_ptr())
    for key, val in past_key_values_dict.items():
        dec_io_binding.bind_input(name=key,
                                  device_type="cuda",
                                  device_id=0,
                                  element_type=np.float32,
                                  shape=list(val.shape),
                                  buffer_ptr=val.data_ptr())

    # Bind outputs. The session is the function argument; there is no `self` here.
    for arg in dec_session.get_outputs():
        dec_io_binding.bind_output(arg.name, "cuda")

    dec_session.run_with_iobinding(dec_io_binding)
    ort_output = dec_io_binding.get_outputs()
    logits = ort_output[0]

    # x.numpy() copies each output to the host before it is moved back to the GPU.
    list_pkv = tuple(torch.from_numpy(x.numpy()).cuda() for x in ort_output[1:])
    # Group the flat tuple into a tuple of 4-element tuples (6x4 for this model).
    out_past_key_values = tuple(
        list_pkv[i : i + 4] for i in range(0, len(list_pkv), 4)
    )
    return torch.from_numpy(logits.numpy()).cuda(), out_past_key_values
Top GitHub Comments
@Ki6an Thanks for your time and the answer. By the way, binding ORT values instead of tensors also solves the issue. But this is clearly efficient.
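For anyone who wants to try the ORT-value route mentioned above, here is a minimal sketch of what it could look like. `bind_ortvalue_input` is the onnxruntime IO-binding method for this; the helper name `bind_inputs_as_ortvalues` and the `make_ortvalue` parameter are hypothetical, introduced only so the sketch can run without a GPU. This is not necessarily the exact code the commenter used.

```python
def bind_inputs_as_ortvalues(io_binding, arrays, make_ortvalue):
    """Bind each named array as an OrtValue instead of a raw device pointer.

    io_binding    -- object returned by InferenceSession.io_binding()
    arrays        -- dict mapping ONNX input name to a NumPy array
    make_ortvalue -- hypothetical hook that turns an array into an OrtValue
    """
    bound = {}
    for name, array in arrays.items():
        value = make_ortvalue(array)
        io_binding.bind_ortvalue_input(name, value)
        # Keep a reference so the underlying buffer outlives the run call.
        bound[name] = value
    return bound
```

With onnxruntime installed, `make_ortvalue` would typically be something like `lambda a: onnxruntime.OrtValue.ortvalue_from_numpy(a, "cuda", 0)`, which copies the array into a buffer owned by onnxruntime on the chosen device.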
For io_binding, see https://github.com/microsoft/onnxruntime/issues/10992. This should fix it.
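One common way raw-pointer bindings go wrong, and it is only an assumption that it applies to this issue, is memory layout: a binding made from `data_ptr()` plus a shape treats the buffer as dense and row-major, so a tensor that is a strided view (for example, a transposed one) gets read in the wrong order. The pure-Python sketch below illustrates the effect with hypothetical helpers `read_strided` and `read_dense`; it does not call onnxruntime or PyTorch.

```python
def read_strided(buffer, shape, strides):
    """Read a 2-D view the way the tensor library does, honoring strides."""
    rows, cols = shape
    return [[buffer[r * strides[0] + c * strides[1]] for c in range(cols)]
            for r in range(rows)]


def read_dense(buffer, shape):
    """Read the same buffer as a raw pointer plus shape: dense, row-major."""
    rows, cols = shape
    return [[buffer[r * cols + c] for c in range(cols)] for r in range(rows)]


# Storage of a 2x3 row-major matrix [[0, 1, 2], [3, 4, 5]].
storage = [0, 1, 2, 3, 4, 5]

# Its transpose is a 3x2 view over the same storage with strides (1, 3).
view = read_strided(storage, (3, 2), (1, 3))

# A pointer-and-shape binding ignores the strides and sees different values.
misread = read_dense(storage, (3, 2))
```

Here `view` and `misread` have the same shape but different contents. In PyTorch, `tensor.is_contiguous()` reports whether a tensor has this problem, and `tensor.contiguous()` returns a dense copy whose `data_ptr()` can be bound safely.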