
Inference tensors cannot be saved for backward.

See original GitHub issue

I don’t know if this is a bug. I’m trying to follow the official tutorial using the PyTorch engine.

Here are my code and the exception.

String modelUrl = "https://resources.djl.ai/test-models/traced_distilbert_wikipedia_uncased.zip";
Criteria<NDList, NDList> criteria = Criteria.builder()
    .optApplication(Application.NLP.WORD_EMBEDDING)
    .setTypes(NDList.class, NDList.class)
    .optModelUrls(modelUrl)
    .optProgress(new ProgressBar())
    .build();

ZooModel<NDList, NDList> embedding = criteria.loadModel();
Predictor<NDList, NDList> embedder = embedding.newPredictor();
SequentialBlock classifier = new SequentialBlock()
    .add(
            ndList -> {
                NDArray data = ndList.singletonOrThrow();
                long batchSize = data.getShape().get(0);
                long maxLen = data.getShape().get(1);
                NDList inputs = new NDList();
                inputs.add(data.toType(DataType.INT64, false));
                inputs.add(data.getManager().full(data.getShape(), 1, DataType.INT64));
                inputs.add(data.getManager().arange(maxLen).toType(DataType.INT64, false).broadcast(data.getShape()));
                try {
                    return embedder.predict(inputs);
                } catch (TranslateException e) {
                    throw new RuntimeException(e);
                }
            }
    )
    .add(Linear.builder().setUnits(768).build())
    .add(Activation::relu)
    .add(Dropout.builder().optRate(0.2f).build())
    .add(Linear.builder().setUnits(5).build())
    .addSingleton(nd -> nd.get(":,0"));

Model model = Model.newInstance("review_classification");
model.setBlock(classifier);

DefaultVocabulary vocabulary = DefaultVocabulary.builder()
    .addFromTextFile(embedding.getArtifact("vocab.txt"))
    .optUnknownToken("[UNK]")
    .build();

int maxTokenLen = 64;
int batchSize = 8;
int limit = Integer.MAX_VALUE;

BertFullTokenizer tokenizer = new BertFullTokenizer(vocabulary, true);
CsvDataset awsDataset = getDataset(batchSize, tokenizer, maxTokenLen, limit);
RandomAccessDataset[] datasets = awsDataset.randomSplit(7, 3);
RandomAccessDataset trainDataset = datasets[0];
RandomAccessDataset evalDataset = datasets[1];

SaveModelTrainingListener listener = new SaveModelTrainingListener("build/model");
listener.setSaveModelCallback(
    trainer -> {
        TrainingResult result = trainer.getTrainingResult();
        Model trainerModel = trainer.getModel();
        float acc = result.getValidateEvaluation("Accuracy");
        trainerModel.setProperty("Accuracy", String.format("%.5f", acc));
        trainerModel.setProperty("Loss", String.format("%.5f", result.getValidateLoss()));
    }
);

DefaultTrainingConfig trainingConfig = new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
    .addEvaluator(new Accuracy())
    .addTrainingListeners(TrainingListener.Defaults.logging("build/model"))
    .addTrainingListeners(listener);

int epoch = 2;
Trainer trainer = model.newTrainer(trainingConfig);
trainer.setMetrics(new Metrics());
Shape shape = new Shape(batchSize, maxTokenLen);
trainer.initialize(shape);
EasyTrain.fit(trainer, epoch, trainDataset, evalDataset);
System.out.println(trainer.getTrainingResult());
model.save(Paths.get("build/model"), "aws-review-rank");
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of inter-op threads is 6
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of intra-op threads is 6
[main] INFO ai.djl.training.listener.LoggingTrainingListener - Training on: cpu().
[main] INFO ai.djl.training.listener.LoggingTrainingListener - Load PyTorch Engine Version 1.12.1 in 0.079 ms.
Exception in thread "main" ai.djl.engine.EngineException: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
	at ai.djl.pytorch.jni.PyTorchLibrary.torchNNLinear(Native Method)
	at ai.djl.pytorch.jni.JniUtils.linear(JniUtils.java:1189)
	at ai.djl.pytorch.engine.PtNDArrayEx.linear(PtNDArrayEx.java:390)
	at ai.djl.nn.core.Linear.linear(Linear.java:183)
	at ai.djl.nn.core.Linear.forwardInternal(Linear.java:88)
	at ai.djl.nn.AbstractBaseBlock.forwardInternal(AbstractBaseBlock.java:126)
	at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:91)
	at ai.djl.nn.SequentialBlock.forwardInternal(SequentialBlock.java:209)
	at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:91)
	at ai.djl.training.Trainer.forward(Trainer.java:175)
	at ai.djl.training.EasyTrain.trainSplit(EasyTrain.java:122)
	at ai.djl.training.EasyTrain.trainBatch(EasyTrain.java:110)
	at ai.djl.training.EasyTrain.fit(EasyTrain.java:58)
	at cn.amberdata.misc.djl.rankcls.Main.main(Main.java:114)
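The root cause is PyTorch's InferenceMode: the DJL `Predictor` produces inference tensors, and autograd refuses to save those for backward when the trainable `Linear` layer that follows computes its gradients. The same error can be reproduced in a few lines of plain PyTorch (shown in Python rather than Java, since the behavior comes from the engine, not from DJL):

```python
import torch

# Create a tensor under inference mode, as the DJL predictor effectively does.
with torch.inference_mode():
    x = torch.randn(2, 3)  # x is now an "inference tensor"

w = torch.randn(3, 1, requires_grad=True)

err = ""
try:
    y = x @ w          # autograd must save x for backward -> raises
    y.sum().backward()
except RuntimeError as e:
    err = str(e)
print(err)  # "Inference tensors cannot be saved for backward. ..."

# The workaround from the error message: clone to get a normal tensor.
x2 = x.clone()
y2 = x2 @ w
y2.sum().backward()
print(w.grad.shape)
```

This is exactly what the translator workaround in the accepted comment below does on the Java side: it clones each output tensor before handing it to the rest of the network.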

And here are my dependencies.

        <!-- https://mvnrepository.com/artifact/ai.djl/api -->
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>api</artifactId>
            <version>0.19.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-simple -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.36</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/ai.djl.pytorch/pytorch-engine -->
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-engine</artifactId>
            <version>0.19.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/ai.djl/basicdataset -->
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>basicdataset</artifactId>
            <version>0.19.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/ai.djl/model-zoo -->
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>model-zoo</artifactId>
            <version>0.19.0</version>
        </dependency>

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

1 reaction
siddvenk commented, Nov 9, 2022

Thanks for pointing out this bug to us — this is an issue in the NoopTranslator when using the PyTorch engine. We will take a look and determine the best fix.

For now, you can adjust your code as follows to get around the exception:

  1. Define your own custom translator that extends the NoopTranslator and overrides the processOutput method:
public static final class MyTranslator extends NoopTranslator {

    @Override
    public NDList processOutput(TranslatorContext ctx, NDList input) {
        return new NDList(
            input.stream().map(NDArray::duplicate).collect(Collectors.toList()));
    }
}
  2. When building the Criteria object, add the optTranslator builder method and pass in the custom translator:
Criteria<NDList, NDList> criteria = Criteria.builder()
    .optApplication(Application.NLP.WORD_EMBEDDING)
    .setTypes(NDList.class, NDList.class)
    .optModelUrls(modelUrl)
    .optProgress(new ProgressBar())
    .optTranslator(new MyTranslator()) // Add this line
    .build();
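Alternatively, if you would rather not define a translator class, the same clone-based workaround can be applied directly inside the SequentialBlock lambda from the original code, by duplicating the predictor's output before returning it (a sketch of the same idea, not a separately confirmed fix; `cloned` is a name introduced here for illustration):

```java
// Inside the SequentialBlock lambda, replace "return embedder.predict(inputs);" with:
try {
    NDList embedded = embedder.predict(inputs);
    // duplicate() copies each inference tensor into a normal tensor
    // that autograd is allowed to save for backward
    NDList cloned = new NDList();
    embedded.forEach(array -> cloned.add(array.duplicate()));
    return cloned;
} catch (TranslateException e) {
    throw new RuntimeException(e);
}
```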
0 reactions
KexinFeng commented, Dec 2, 2022

I’m glad to know that the answers helped!

By the way, the newest snapshot version is not part of the released 0.20.0 version. It can be accessed via:

dependencies {
    implementation platform("ai.djl:bom:<UPCOMING VERSION>-SNAPSHOT")
}

Currently, the <UPCOMING VERSION> = 0.21.0. See https://github.com/deepjavalibrary/djl/blob/master/docs/get.md#using-built-from-source-version-in-another-project
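For a Maven build like the original poster's, the rough equivalent would be importing the snapshot BOM (a sketch based on the DJL docs linked above; the snapshot repository URL is an assumption — verify it against that page):

```xml
<repositories>
    <repository>
        <id>djl.ai</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    </repository>
</repositories>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>bom</artifactId>
            <version>0.21.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```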

