Add possibility to output attentions from the ONNX models
Feature request
Hello!
I am interested in the possibility of retrieving not only the logits from the ONNX exported model, but also the attentions (which can be set via output_attentions=True in transformers). I have tried setting output_attentions=True in the config.json file of the model repository, but an exception is raised when the onnxruntime InferenceSession is initialized.
Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ort_model_optimized.onnx failed:/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pip-req-build-u762x_rp/onnxruntime/core/graph/graph.cc:1236 void onnxruntime::Graph::InitializeStateFromModelFileGraphProto() This is an invalid model. Graph output (attentions) does not exist in the graph
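For reference, a minimal sketch (not the built-in transformers.onnx exporter, and separate from the config.json approach above) of how the per-layer attention tensors could be declared as graph outputs when exporting with torch.onnx.export; the model name, file name, and output names below are illustrative:

```python
# Sketch: export a BERT-like classifier so each layer's attention tensor
# becomes an ONNX graph output alongside the logits.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "bert-base-uncased"  # illustrative checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, output_attentions=True, return_dict=False
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
dummy = tokenizer("dummy input", return_tensors="pt")

# One attention tensor per layer, flattened out of the model's tuple output.
attention_names = [f"attentions.{i}" for i in range(model.config.num_hidden_layers)]

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model_with_attentions.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"] + attention_names,  # attentions must be declared here
    dynamic_axes={name: {0: "batch", 1: "sequence"}
                  for name in ("input_ids", "attention_mask")},
    opset_version=13,
)
```

With such an export, an InferenceSession would expose `logits` plus one `attentions.{i}` output per layer via `session.run`.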
Motivation
I am using attention scores to compute importance, but it is currently impossible to extract them in our production environment, as we use ONNX-exported transformers models.
Your contribution
If you could guide me a bit on what to do, I could submit a PR.
Top GitHub Comments
Hello @JingyaHuang, after taking a deeper look it turned out that the attention fusion was causing the problem described above, so using `OptimizationConfig(disable_attention=True)` worked well enough (the attention fusion is skipped, but the remaining fusion optimizations still apply correctly). Thank you for your time and guidance!
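For context, a minimal sketch of the configuration described in this comment, assuming the OptimizationConfig fields available in the optimum version discussed here (field names may differ in newer releases):

```python
from optimum.onnxruntime.configuration import OptimizationConfig

# Keep the other graph fusions but skip the attention fusion, which was
# stripping the attention tensors declared as graph outputs.
optimization_config = OptimizationConfig(
    optimization_level=2,
    disable_attention=True,
)
# The config is then passed to ORTOptimizer; the exact optimizer call
# depends on the installed optimum version.
```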
Hi @jegork, glad to hear that you have found a way to solve it. Setting `optimize_with_onnxruntime_only=True` turns off the whole fusion optimization. Have you tried specifying `FusionOptions`, which can be set within your `OptimizationConfig`, to locate which fusion is leading to the failure? This might help enable some other non-fatal fusions.
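To illustrate the suggestion, a hedged sketch of narrowing down the failing fusion by enabling the disable_* switches one at a time (the flag names are assumptions mirroring onnxruntime's FusionOptions for this optimum version):

```python
from optimum.onnxruntime.configuration import OptimizationConfig

# Candidate fusion switches to test individually (names assumed to mirror
# onnxruntime's FusionOptions in this optimum version).
fusion_flags = [
    "disable_gelu",
    "disable_layer_norm",
    "disable_attention",
    "disable_skip_layer_norm",
    "disable_bias_gelu",
]

for flag in fusion_flags:
    config = OptimizationConfig(optimization_level=2, **{flag: True})
    # Re-run ORTOptimizer and onnxruntime.InferenceSession with `config`;
    # the single flag that makes loading succeed points at the fusion
    # responsible for the failure.
    print(f"Testing optimization with {flag}=True")
```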