
Observing difference in outputs from decoder with IO bindings.

Hi @Ki6an, I was trying to implement IO binding for the decoder part of the model, using the same code from your repo to convert the model to ONNX. After loading the model, predictions made by running the decoder session directly look fine, but when the inputs are bound with IO binding the results are different.
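One pitfall worth ruling out when IO-bound results differ from a plain session run: `bind_input` receives only a raw `buffer_ptr` plus a shape and element type, so ONNX Runtime has to assume the memory is a dense, row-major (C-contiguous) block of exactly that dtype. A non-contiguous PyTorch tensor (for example a transposed or sliced view) still reports a `data_ptr()`, but its elements are not laid out that way. Whether this is what happened here is an assumption; the thread excerpt does not say. The NumPy sketch below simulates the effect without a GPU; `c_strides` and `read_like_raw_pointer` are hypothetical helpers written for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def c_strides(shape, itemsize):
    """Byte strides of a dense, row-major (C-contiguous) array of this shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def read_like_raw_pointer(arr):
    """Hypothetical helper: view the memory starting at arr's data pointer as a
    dense row-major array of arr's shape and dtype, ignoring arr's real strides.
    This mimics a consumer that only receives (pointer, shape, element type)."""
    return as_strided(arr, shape=arr.shape,
                      strides=c_strides(arr.shape, arr.itemsize),
                      writeable=False)


base = np.arange(6, dtype=np.int64).reshape(2, 3)
view = base.T  # non-contiguous view that shares base's buffer

print(view.flags["C_CONTIGUOUS"])            # False
print(view.tolist())                         # [[0, 3], [1, 4], [2, 5]]
print(read_like_raw_pointer(view).tolist())  # [[0, 1], [2, 3], [4, 5]]

# A dense copy (the NumPy analogue of tensor.contiguous()) is read correctly.
fixed = np.ascontiguousarray(view)
print(np.array_equal(read_like_raw_pointer(fixed), fixed))  # True
```

In PyTorch the corresponding check and fix are `tensor.is_contiguous()` and `tensor.contiguous()`. A dtype mismatch (say an `int32` mask bound as `np.longlong`) would be misread in a similar way, because the pointer carries no type information.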

Below is the code for the IO bindings:

import numpy as np
import torch


def dec_pred_with_io_bindings(input_ids, attention_mask, encoder_output,
                              past_key_values_dict, dec_session):
    dec_io_binding = dec_session.io_binding()

    # Bind the decoder inputs directly from the CUDA tensors' memory.
    dec_io_binding.bind_input(name="input_ids",
                              device_type="cuda",
                              device_id=0,
                              element_type=np.longlong,
                              shape=list(input_ids.shape),
                              buffer_ptr=input_ids.data_ptr())
    dec_io_binding.bind_input(name="encoder_attention_mask",
                              device_type="cuda",
                              device_id=0,
                              element_type=np.longlong,
                              shape=list(attention_mask.shape),
                              buffer_ptr=attention_mask.data_ptr())
    dec_io_binding.bind_input(name="encoder_hidden_states",
                              device_type="cuda",
                              device_id=0,
                              element_type=np.float32,
                              shape=list(encoder_output.shape),
                              buffer_ptr=encoder_output.data_ptr())

    # Bind each past key/value tensor under its ONNX input name.
    for key, val in past_key_values_dict.items():
        dec_io_binding.bind_input(name=key,
                                  device_type="cuda",
                                  device_id=0,
                                  element_type=np.float32,
                                  shape=list(val.shape),
                                  buffer_ptr=val.data_ptr())

    # Bind outputs; ONNX Runtime allocates them on the GPU.
    # (Was `self.decoder.get_outputs()`, which fails outside a class.)
    for arg in dec_session.get_outputs():
        dec_io_binding.bind_output(arg.name, "cuda")

    dec_session.run_with_iobinding(dec_io_binding)
    ort_output = dec_io_binding.get_outputs()

    logits = ort_output[0]

    # OrtValue.numpy() copies each output to host memory; move it back to the GPU.
    list_pkv = tuple(torch.from_numpy(x.numpy()).cuda() for x in ort_output[1:])

    # Creates a tuple of tuples of shape 6x4 from the above tuple
    out_past_key_values = tuple(
        list_pkv[i : i + 4] for i in range(0, len(list_pkv), 4)
    )

    return torch.from_numpy(logits.numpy()).cuda(), out_past_key_values
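The last step of the function regroups the flat tuple of present key/value outputs into per-layer chunks of four, matching the "6x4" comment (in Hugging Face's T5 these four are usually the self-attention key/value and cross-attention key/value of one layer). A slightly more defensive version of that step, sketched here in plain Python, refuses to emit a short final group, which would otherwise silently misalign every later layer. `group_past_key_values` is a hypothetical name, and the sketch assumes the exported model lists its outputs layer by layer.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def group_past_key_values(flat: Sequence[T],
                          per_layer: int = 4) -> Tuple[Tuple[T, ...], ...]:
    """Split a flat sequence of present key/value outputs into one tuple per layer.

    Assumes the outputs are ordered layer by layer, with `per_layer` tensors
    for each decoder layer (4 in the original code).
    """
    if per_layer <= 0:
        raise ValueError("per_layer must be positive")
    if len(flat) % per_layer != 0:
        # A partial last group would shift every following layer's tensors.
        raise ValueError(
            f"{len(flat)} outputs cannot be split evenly into groups of {per_layer}"
        )
    return tuple(tuple(flat[i:i + per_layer])
                 for i in range(0, len(flat), per_layer))


# Stand-in names instead of tensors, for a 6-layer decoder.
names = [f"pkv_{i}" for i in range(24)]
groups = group_past_key_values(names)
print(len(groups))  # 6
print(groups[1])    # ('pkv_4', 'pkv_5', 'pkv_6', 'pkv_7')
```

In the function above, `group_past_key_values(list_pkv)` would replace the inline tuple comprehension; the behavior is identical whenever the output count is a multiple of four.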

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

VikasOjha666 commented, Apr 5, 2022

@Ki6an Thanks for your time and the answer. By the way, binding ORT values instead of tensors also solves the issue, but this is clearly more efficient.

Ki6an commented, Apr 5, 2022
