Observing difference in outputs from decoder with IO bindings.

Hi @Ki6an, I was trying to implement IO bindings for the decoder part of the model. I used the same code from your repo to convert the model to ONNX. After loading the model, predictions made by calling the decoder session directly look fine, but when the inputs are bound through IO bindings the result is different.

Below is the code for the IO bindings:

import numpy as np
import torch


def dec_pred_with_io_bindings(input_ids, attention_mask, encoder_output, past_key_values_dict, dec_session):
  # All tensors are expected to live on cuda:0. bind_input receives only a raw
  # pointer and a shape, so each tensor must be contiguous in memory.
  dec_io_binding = dec_session.io_binding()
  dec_io_binding.bind_input(name="input_ids",
                            device_type="cuda",
                            device_id=0,
                            element_type=np.int64,
                            shape=list(input_ids.shape),
                            buffer_ptr=input_ids.data_ptr())
  dec_io_binding.bind_input(name="encoder_attention_mask",
                            device_type="cuda",
                            device_id=0,
                            element_type=np.int64,
                            shape=list(attention_mask.shape),
                            buffer_ptr=attention_mask.data_ptr())
  dec_io_binding.bind_input(name="encoder_hidden_states",
                            device_type="cuda",
                            device_id=0,
                            element_type=np.float32,
                            shape=list(encoder_output.shape),
                            buffer_ptr=encoder_output.data_ptr())

  # Bind every cached key/value tensor under its ONNX input name.
  for key, val in past_key_values_dict.items():
    dec_io_binding.bind_input(name=key,
                              device_type="cuda",
                              device_id=0,
                              element_type=np.float32,
                              shape=list(val.shape),
                              buffer_ptr=val.data_ptr())

  # Bind outputs and let ONNX Runtime allocate them on the GPU.
  # (The original iterated over self.decoder, which is undefined in a plain function.)
  for arg in dec_session.get_outputs():
    dec_io_binding.bind_output(arg.name, "cuda")

  dec_session.run_with_iobinding(dec_io_binding)
  ort_output = dec_io_binding.get_outputs()

  logits = ort_output[0]

  # OrtValue.numpy() copies each output to the host; .cuda() moves it back to the GPU.
  list_pkv = tuple(torch.from_numpy(x.numpy()).cuda() for x in ort_output[1:])

  # creates a tuple of tuples of shape 6x4 from the above tuple
  out_past_key_values = tuple(
      list_pkv[i : i + 4] for i in range(0, len(list_pkv), 4)
  )

  return torch.from_numpy(logits.numpy()).cuda(), out_past_key_values
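
The thread does not state the root cause of the mismatch. One thing worth ruling out when binding by raw pointer is memory layout: the binding call is given only a base address and a shape, so a tensor that is a non-contiguous view (for example, the result of a transpose) would be read in memory order rather than logical order. The sketch below illustrates that effect with NumPy arrays standing in for GPU tensors; the array names are made up for the illustration, and whether this is what happened in the issue is an assumption.

```python
import numpy as np

# Six values laid out densely in memory, standing in for a tensor's storage.
storage = np.arange(6, dtype=np.int64)

# A transposed view: same memory, but logical order differs from memory order.
view = storage.reshape(2, 3).T  # shape (3, 2), not C-contiguous

# What a consumer sees when it is handed only the base pointer and view.shape.
misread = storage.reshape(view.shape)

# A dense copy stores its elements in the same order as its logical layout,
# so reading it back by pointer and shape reproduces the view.
dense = np.ascontiguousarray(view)
reread = dense.ravel().reshape(view.shape)

print("view is contiguous:", view.flags["C_CONTIGUOUS"])
print("pointer + shape read matches view:", np.array_equal(misread, view))
print("dense copy read matches view:", np.array_equal(reread, view))
```

With PyTorch tensors the analogous check is tensor.is_contiguous(), and tensor.contiguous() returns a dense tensor whose data_ptr() can be bound safely.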

Top GitHub Comments

VikasOjha666 commented, Apr 5, 2022

@Ki6an Thanks for your time and the answer. By the way, binding ORT values instead of tensors also solves the issue. But this is clearly efficient.
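
The comment above does not show what binding ORT values looked like. A minimal sketch of one way to do it follows, assuming the inputs are available as host NumPy arrays (for a PyTorch tensor, tensor.cpu().numpy()). It relies on OrtValue.ortvalue_from_numpy and bind_ortvalue_input from the ONNX Runtime Python API; the helper name bind_inputs_as_ortvalues and the make_ortvalue parameter are hypothetical and exist only so the sketch can be exercised without a GPU build.

```python
import numpy as np


def bind_inputs_as_ortvalues(io_binding, named_arrays, device_type="cuda",
                             device_id=0, make_ortvalue=None):
    """Bind each host array as an OrtValue placed on the target device.

    make_ortvalue is a hypothetical hook; by default it resolves to
    onnxruntime.OrtValue.ortvalue_from_numpy.
    """
    if make_ortvalue is None:
        import onnxruntime as ort  # imported lazily, only when really binding
        make_ortvalue = ort.OrtValue.ortvalue_from_numpy

    bound = {}
    for name, array in named_arrays.items():
        # Force a dense layout before handing the data to ONNX Runtime.
        dense = np.ascontiguousarray(array)
        value = make_ortvalue(dense, device_type, device_id)
        io_binding.bind_ortvalue_input(name, value)
        # Keep a reference so the device buffer stays alive until the run ends.
        bound[name] = value
    return bound
```

A call might look like bind_inputs_as_ortvalues(dec_session.io_binding(), {"input_ids": input_ids.cpu().numpy()}), followed by binding the outputs and calling run_with_iobinding as in the original function. Going through host arrays adds a device-to-host-to-device round trip, so it trades some speed for not handling raw pointers.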

Ki6an commented, Apr 5, 2022

For io_binding, see https://github.com/microsoft/onnxruntime/issues/10992; this should fix it.
