
Add possibility to output attentions from the ONNX models

See original GitHub issue

Feature request

Hello!

I am interested in the possibility of retrieving not only the logits from the ONNX-exported model, but also the attentions (which can be requested via output_attentions=True in transformers). I have tried setting output_attentions=True in the config.json file of the model repository, but an exception is raised when the onnxruntime InferenceSession is initialized.

Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ort_model_optimized.onnx failed:/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pip-req-build-u762x_rp/onnxruntime/core/graph/graph.cc:1236 void onnxruntime::Graph::InitializeStateFromModelFileGraphProto() This is an invalid model. Graph output (attentions) does not exist in the graph
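For context, a minimal sketch of the setup that reportedly triggers this error; the model id and file names are placeholders rather than details from the issue, and the export/optimization step is elided.

```python
import onnxruntime
from transformers import AutoConfig

# Ask for attention weights in the model config, as one would in plain transformers
# (placeholder model id).
config = AutoConfig.from_pretrained("bert-base-uncased")
config.output_attentions = True
config.save_pretrained("exported_model")  # writes config.json with "output_attentions": true

# ... export the model to ONNX and optimize it (e.g. with optimum) ...

# Initializing the runtime session on the optimized graph then raises, because the
# optimized graph no longer contains an output named "attentions":
session = onnxruntime.InferenceSession("ort_model_optimized.onnx")
```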

Motivation

I am using the attention scores to compute importance scores, but it is currently impossible to extract them in our production environment because we use ONNX-exported transformers models.

Your contribution

If you could guide me a bit on what to do, I could submit a PR.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
jegork commented, Aug 2, 2022

Hello @JingyaHuang, after taking a deeper look it turned out that attention fusion was causing the problem described above, so simply using OptimizationConfig(disable_attention=True) worked well enough (with attention fusion disabled, the remaining fusion optimizations still apply correctly).

Thank you for your time and guidance!
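For reference, a minimal sketch of the workaround described in the comment above. Only OptimizationConfig(disable_attention=True) comes from the thread; the model id, paths, and the exact ORTModel/ORTOptimizer calls are assumptions that vary across optimum versions (newer releases name the flag disable_attention_fusion and use export=True instead of from_transformers=True).

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export a transformers model to ONNX through optimum (placeholder model id;
# the export flag name depends on the optimum version).
model = ORTModelForSequenceClassification.from_pretrained("bert-base-uncased", export=True)

# Keep the other graph fusions, but skip attention fusion so the "attentions"
# output is not folded away during optimization.
optimization_config = OptimizationConfig(optimization_level=2, disable_attention=True)

optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(save_dir="onnx_optimized", optimization_config=optimization_config)
```

The reason this helps, per the thread, is that attention fusion rewrites the attention subgraph into a single fused operator, so the intermediate attention-weight tensors that would feed an "attentions" output disappear from the optimized graph.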

0 reactions
JingyaHuang commented, Aug 2, 2022

Hi @jegork, glad to hear that you have found a way to solve it. Setting optimize_with_onnxruntime_only=True turns off all of the fusion optimizations. Have you tried specifying FusionOptions (which can be set within your OptimizationConfig) to pinpoint which fusion is causing the failure? That might let you keep the other, non-fatal fusions enabled.
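As an illustration of that suggestion, here is a sketch using the standalone onnxruntime.transformers optimizer, whose FusionOptions switches correspond to the disable_* flags exposed by optimum's OptimizationConfig; the model path, model type, head count, and hidden size are placeholders, not values from the thread.

```python
from onnxruntime.transformers import optimizer
from onnxruntime.transformers.fusion_options import FusionOptions

# Start from all fusions enabled and switch them off one at a time to locate
# the fusion that removes the "attentions" graph output.
fusion_options = FusionOptions("bert")
fusion_options.enable_attention = False        # the fusion identified above
# fusion_options.enable_skip_layer_norm = False  # toggle others individually if needed
# fusion_options.enable_gelu = False

optimized = optimizer.optimize_model(
    "model.onnx",            # placeholder path to the exported ONNX model
    model_type="bert",
    num_heads=12,
    hidden_size=768,
    optimization_options=fusion_options,
)
optimized.save_model_to_file("model_optimized.onnx")
```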

Read more comments on GitHub >
