Exception during runtime TrainMnist.java

Description

Expected Behavior

build success

Error Message

`“C:\Program Files\Java\jdk1.8.0_251\bin\java.exe” “-javaagent:D:\JetBrains\IntelliJ IDEA 2020.1.2\lib\idea_rt.jar=54222:D:\JetBrains\IntelliJ IDEA 2020.1.2\bin” -Dfile.encoding=UTF-8 -classpath “C:\Program Files\Java\jdk1.8.0_251\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\resources.jar;C:\Program 
Files\Java\jdk1.8.0_251\jre\lib\rt.jar;D:\demo\target\classes;C:\Users\Administrator.m2\repository\com\vmware\vijava\5.1\vijava-5.1.jar;C:\Users\Administrator.m2\repository\dom4j\dom4j\1.6.1\dom4j-1.6.1.jar;C:\Users\Administrator.m2\repository\xml-apis\xml-apis\1.0.b2\xml-apis-1.0.b2.jar;C:\Users\Administrator.m2\repository\commons-cli\commons-cli\1.4\commons-cli-1.4.jar;C:\Users\Administrator.m2\repository\org\apache\logging\log4j\log4j-slf4j-impl\2.12.1\log4j-slf4j-impl-2.12.1.jar;C:\Users\Administrator.m2\repository\org\slf4j\slf4j-api\1.7.25\slf4j-api-1.7.25.jar;C:\Users\Administrator.m2\repository\org\apache\logging\log4j\log4j-api\2.12.1\log4j-api-2.12.1.jar;C:\Users\Administrator.m2\repository\org\apache\logging\log4j\log4j-core\2.12.1\log4j-core-2.12.1.jar;C:\Users\Administrator.m2\repository\com\google\code\gson\gson\2.8.5\gson-2.8.5.jar;C:\Users\Administrator.m2\repository\ai\djl\api\0.6.0\api-0.6.0.jar;C:\Users\Administrator.m2\repository\net\java\dev\jna\jna\5.3.0\jna-5.3.0.jar;C:\Users\Administrator.m2\repository\org\apache\commons\commons-compress\1.20\commons-compress-1.20.jar;C:\Users\Administrator.m2\repository\ai\djl\basicdataset\0.6.0\basicdataset-0.6.0.jar;C:\Users\Administrator.m2\repository\ai\djl\model-zoo\0.6.0\model-zoo-0.6.0.jar;C:\Users\Administrator.m2\repository\ai\djl\mxnet\mxnet-model-zoo\0.6.0\mxnet-model-zoo-0.6.0.jar;C:\Users\Administrator.m2\repository\ai\djl\mxnet\mxnet-engine\0.6.0\mxnet-engine-0.6.0.jar;C:\Users\Administrator.m2\repository\ai\djl\mxnet\mxnet-native-auto\1.7.0-b\mxnet-native-auto-1.7.0-b.jar” com.zhaowei.training.TrainMnist [INFO ] - Training on: cpu(). [INFO ] - Load MXNet Engine Version 1.7.0 in 0.211 ms. 
Training: 17% |███████ | Accuracy: 0.86, SoftmaxCrossEntropyLoss: 0.50, speed: 1416.17 items/sec
[INFO ] - train P50: 23.255 ms, P90: 30.021 ms
[INFO ] - forward P50: 0.874 ms, P90: 1.021 ms
[INFO ] - training-metrics P50: 0.027 ms, P90: 0.035 ms
[INFO ] - backward P50: 1.305 ms, P90: 1.576 ms
[INFO ] - step P50: 1.676 ms, P90: 2.138 ms

Exception in thread "main" ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: can't alloc
	at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788)
	at ai.djl.mxnet.jna.JnaUtils.syncCopyToCPU(JnaUtils.java:473)
	at ai.djl.mxnet.engine.MxNDArray.toByteBuffer(MxNDArray.java:294)
	at ai.djl.ndarray.NDArray.toLongArray(NDArray.java:300)
	at ai.djl.ndarray.NDArray.getLong(NDArray.java:558)
	at ai.djl.training.evaluator.AbstractAccuracy.lambda$updateAccumulator$1(AbstractAccuracy.java:85)
	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1877)
	at ai.djl.training.evaluator.AbstractAccuracy.updateAccumulator(AbstractAccuracy.java:85)
	at ai.djl.training.listener.EvaluatorTrainingListener.updateEvaluators(EvaluatorTrainingListener.java:153)
	at ai.djl.training.listener.EvaluatorTrainingListener.onTrainingBatch(EvaluatorTrainingListener.java:112)
	at ai.djl.training.EasyTrain.lambda$trainBatch$1(EasyTrain.java:86)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at ai.djl.training.Trainer.notifyListeners(Trainer.java:249)
	at ai.djl.training.EasyTrain.trainBatch(EasyTrain.java:86)
	at ai.djl.training.EasyTrain.fit(EasyTrain.java:39)
	at com.zhaowei.training.TrainMnist.runExample(TrainMnist.java:84)
	at com.zhaowei.training.TrainMnist.main(TrainMnist.java:49)
	Suppressed: java.lang.NullPointerException
		at com.zhaowei.training.TrainMnist.lambda$setupTrainingConfig$0(TrainMnist.java:98)
		at ai.djl.training.listener.CheckpointsTrainingListener.saveModel(CheckpointsTrainingListener.java:144)
		at ai.djl.training.listener.CheckpointsTrainingListener.onTrainingEnd(CheckpointsTrainingListener.java:102)
		at ai.djl.training.Trainer.lambda$close$2(Trainer.java:295)
		at java.util.ArrayList.forEach(ArrayList.java:1257)
		at ai.djl.training.Trainer.notifyListeners(Trainer.java:249)
		at ai.djl.training.Trainer.close(Trainer.java:295)
		at com.zhaowei.training.TrainMnist.runExample(TrainMnist.java:87)
		... 1 more
	Suppressed: ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: can't alloc
		at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788)
		at ai.djl.mxnet.jna.JnaUtils.waitAll(JnaUtils.java:466)
		at ai.djl.mxnet.engine.MxModel.close(MxModel.java:176)
		at com.zhaowei.training.TrainMnist.runExample(TrainMnist.java:88)
		... 1 more

Process finished with exit code 1 `
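
A note on reading the trace: the two `Suppressed:` entries are standard Java try-with-resources behavior. When the body of the `try` throws and a resource's `close()` then also throws, the second exception is attached to the first instead of replacing it. The sketch below reproduces that shape with a hypothetical `FailingResource` class standing in for the trainer and model; it does not use DJL, and the messages are made up for illustration.

```java
final class SuppressedDemo {
    /** Hypothetical resource whose close() always fails (stand-in for Trainer/Model). */
    static final class FailingResource implements AutoCloseable {
        private final String name;

        FailingResource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            throw new IllegalStateException(name + " failed to close");
        }
    }

    /** Throws inside nested try-with-resources and returns the exception that escapes. */
    static RuntimeException run() {
        RuntimeException caught = null;
        try {
            try (FailingResource model = new FailingResource("model")) {
                try (FailingResource trainer = new FailingResource("trainer")) {
                    // Primary failure, analogous to the allocation error during training.
                    throw new RuntimeException("primary failure");
                }
            }
        } catch (RuntimeException e) {
            caught = e;
        }
        return caught;
    }

    public static void main(String[] args) {
        RuntimeException e = run();
        System.out.println("primary: " + e.getMessage());
        // Resources close innermost first, so the trainer's failure is listed before the model's.
        for (Throwable s : e.getSuppressed()) {
            System.out.println("suppressed: " + s.getMessage());
        }
    }
}
```

Read this way, the first exception in the trace is the one to investigate; the suppressed ones are follow-on failures raised while the trainer and model were being closed.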

Environment Info

  • JDK 8
  • Windows 10 X64
  • CPU I5
  • 8G
  • Maven Compile

`<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>demo</name>
<description>Demo project for Spring Boot</description>

<properties>
	<java.version>8</java.version>
	<maven.compiler.source>1.8</maven.compiler.source>
	<maven.compiler.target>1.8</maven.compiler.target>
	<djl.version>0.6.0</djl.version>
</properties>

<repositories>
	<repository>
		<id>djl.ai</id>
		<url>https://oss.sonatype.org/content/repositories/snapshots/</url>
	</repository>
</repositories>

<dependencies>
	<dependency>
		<groupId>com.vmware</groupId>
		<artifactId>vijava</artifactId>
		<version>5.1</version>
	</dependency>
	<dependency>
		<groupId>commons-cli</groupId>
		<artifactId>commons-cli</artifactId>
		<version>1.4</version>
	</dependency>
	<dependency>
		<groupId>org.apache.logging.log4j</groupId>
		<artifactId>log4j-slf4j-impl</artifactId>
		<version>2.12.1</version>
	</dependency>
	<dependency>
		<groupId>com.google.code.gson</groupId>
		<artifactId>gson</artifactId>
		<version>2.8.5</version>
	</dependency>

	<dependency>
		<groupId>ai.djl</groupId>
		<artifactId>api</artifactId>
		<version>${djl.version}</version>
	</dependency>
	<dependency>
		<groupId>ai.djl</groupId>
		<artifactId>basicdataset</artifactId>
		<version>${djl.version}</version>
	</dependency>
	<dependency>
		<groupId>ai.djl</groupId>
		<artifactId>model-zoo</artifactId>
		<version>${djl.version}</version>
	</dependency>
	<dependency>
		<groupId>ai.djl.mxnet</groupId>
		<artifactId>mxnet-model-zoo</artifactId>
		<version>${djl.version}</version>
	</dependency>
	<dependency>
		<groupId>ai.djl.mxnet</groupId>
		<artifactId>mxnet-engine</artifactId>
		<version>${djl.version}</version>
	</dependency>
	<dependency>
		<groupId>ai.djl.mxnet</groupId>
		<artifactId>mxnet-native-auto</artifactId>
		<version>1.7.0-b</version>
		<scope>runtime</scope>
	</dependency>

</dependencies>

<build>
	<plugins>
		<plugin>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-maven-plugin</artifactId>
		</plugin>
	</plugins>
</build>

</project>`

` package com.zhaowei.training;

import ai.djl.Device;
import ai.djl.Model;
import ai.djl.basicdataset.Mnist;
import ai.djl.basicmodelzoo.basic.Mlp;
import com.zhaowei.training.util.Arguments;
import ai.djl.metric.Metrics;
import ai.djl.ndarray.types.Shape;
import ai.djl.nn.Block;
import ai.djl.training.DefaultTrainingConfig;
import ai.djl.training.EasyTrain;
import ai.djl.training.Trainer;
import ai.djl.training.TrainingResult;
import ai.djl.training.dataset.Dataset;
import ai.djl.training.dataset.RandomAccessDataset;
import ai.djl.training.evaluator.Accuracy;
import ai.djl.training.listener.CheckpointsTrainingListener;
import ai.djl.training.listener.TrainingListener;
import ai.djl.training.loss.Loss;
import ai.djl.training.util.ProgressBar;
import java.io.IOException;
import org.apache.commons.cli.ParseException;

public final class TrainMnist {

private TrainMnist() {}

public static void main(String[] args) throws IOException, ParseException {
    TrainMnist.runExample(args);
}

public static TrainingResult runExample(String[] args) throws IOException, ParseException {
    Arguments arguments = Arguments.parseArgs(args);

    // Construct neural network
    Block block =
            new Mlp(
                    Mnist.IMAGE_HEIGHT * Mnist.IMAGE_WIDTH,
                    Mnist.NUM_CLASSES,
                    new int[] {128, 64});

    try (Model model = Model.newInstance("mlp")) {
        model.setBlock(block);

        // get training and validation dataset
        RandomAccessDataset trainingSet = getDataset(Dataset.Usage.TRAIN, arguments);
        RandomAccessDataset validateSet = getDataset(Dataset.Usage.TEST, arguments);

        // setup training configuration
        DefaultTrainingConfig config = setupTrainingConfig(arguments);

        try (Trainer trainer = model.newTrainer(config)) {
            trainer.setMetrics(new Metrics());

            /*
             * MNIST is 28x28 grayscale image and pre processed into 28 * 28 NDArray.
             * 1st axis is batch axis, we can use 1 for initialization.
             */
            Shape inputShape = new Shape(1, Mnist.IMAGE_HEIGHT * Mnist.IMAGE_WIDTH);

            // initialize trainer with proper input shape
            trainer.initialize(inputShape);

            EasyTrain.fit(trainer, arguments.getEpoch(), trainingSet, validateSet);

            return trainer.getTrainingResult();
        }
    }
}

private static DefaultTrainingConfig setupTrainingConfig(Arguments arguments) {
    String outputDir = arguments.getOutputDir();
    CheckpointsTrainingListener listener = new CheckpointsTrainingListener(outputDir);
    listener.setSaveModelCallback(
            trainer -> {
                TrainingResult result = trainer.getTrainingResult();
                Model model = trainer.getModel();
                float accuracy = result.getValidateEvaluation("Accuracy");
                model.setProperty("Accuracy", String.format("%.5f", accuracy));
                model.setProperty("Loss", String.format("%.5f", result.getValidateLoss()));
            });
    return new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
            .addEvaluator(new Accuracy())
            .optDevices(Device.getDevices(arguments.getMaxGpus()))
            .addTrainingListeners(TrainingListener.Defaults.logging(outputDir))
            .addTrainingListeners(listener);
}

private static RandomAccessDataset getDataset(Dataset.Usage usage, Arguments arguments)
        throws IOException {
    Mnist mnist =
            Mnist.builder()
                    .optUsage(usage)
                    .setSampling(arguments.getBatchSize(), true)
                    .optLimit(arguments.getLimit())
                    .build();
    mnist.prepare(new ProgressBar());
    return mnist;
}

} `
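
The suppressed `NullPointerException` in the trace is raised inside the save-model callback (`lambda$setupTrainingConfig$0`). One plausible cause, stated here as an assumption because the posted code carries no line numbers, is that training aborted before any validation ran, so the validation accuracy was never recorded and assigning the missing value to a primitive `float` failed on unboxing. The sketch below shows the null-safe pattern using a plain `Map<String, Float>` as a hypothetical stand-in for the training result; `formatMetric` and the `"n/a"` placeholder are illustrative names, not DJL API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class NullSafeMetricDemo {
    /** Formats a metric that may be missing, instead of unboxing a null Float. */
    static String formatMetric(Map<String, Float> evaluations, String name) {
        Float value = evaluations.get(name); // boxed: null when the metric was never recorded
        return value == null ? "n/a" : String.format(Locale.ROOT, "%.5f", value);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a training result with no validation pass yet.
        Map<String, Float> evaluations = new HashMap<>();
        System.out.println(formatMetric(evaluations, "Accuracy"));

        evaluations.put("Accuracy", 0.5f);
        System.out.println(formatMetric(evaluations, "Accuracy"));
    }
}
```

Applying the same check inside the callback would not fix the allocation failure itself, but it would keep the callback from adding a second, misleading exception to the trace.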

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
Jzow commented, Jul 28, 2020

16G should be sufficient

I have a suggestion: the official documentation could list recommended hardware, such as memory and CPU. Thank you.

0 reactions
stu1130 commented, Jul 28, 2020

16G should be sufficient
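
Since the thread points at available memory (the report lists 8G, and the maintainer suggests 16G), it can help to check what the machine actually has before training. The sketch below is one way to do that from Java. It assumes a HotSpot-style JVM that exposes `com.sun.management.OperatingSystemMXBean`, and it returns -1 where that interface is unavailable. `MemoryCheck` and its method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class MemoryCheck {
    private static final double BYTES_PER_GIB = 1024.0 * 1024.0 * 1024.0;

    /** Converts a byte count to GiB for display. */
    static double toGiB(long bytes) {
        return bytes / BYTES_PER_GIB;
    }

    /** Total physical memory in bytes, or -1 if this JVM does not expose it. */
    static long totalPhysicalMemoryBytes() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getTotalPhysicalMemorySize();
        }
        return -1L;
    }

    /** Currently free physical memory in bytes, or -1 if this JVM does not expose it. */
    static long freePhysicalMemoryBytes() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getFreePhysicalMemorySize();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.printf("Total physical memory: %.1f GiB%n", toGiB(totalPhysicalMemoryBytes()));
        System.out.printf("Free physical memory:  %.1f GiB%n", toGiB(freePhysicalMemoryBytes()));
        // The Java heap limit is reported separately from physical memory.
        System.out.printf("Max Java heap:         %.1f GiB%n", toGiB(Runtime.getRuntime().maxMemory()));
    }
}
```

The `can't alloc` error is raised by the MXNet engine through JNA, so it appears to concern memory allocated on the native side. For that reason the free physical memory figure is likely the more relevant number here than the Java heap limit.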
