
MXNet GPU CUDA mismatch problem?

See original GitHub issue

Hey, I'm new here. Today I successfully installed the IJava kernel in Google Colab and Java runs fine, BUT when I train I get this ERROR:

ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Compile with USE_CUDA=1 to enable GPU usage
Stack trace:
  File "src/storage/storage.cc", line 119
    at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1909)
    at ai.djl.mxnet.jna.JnaUtils.createNdArray(JnaUtils.java:349)
    at ai.djl.mxnet.engine.MxNDManager.create(MxNDManager.java:91)
    at ai.djl.mxnet.engine.MxNDManager.create(MxNDManager.java:34)
    at ai.djl.ndarray.NDManager.create(NDManager.java:526)
    at ai.djl.mxnet.engine.MxNDArray.duplicate(MxNDArray.java:184)
    at ai.djl.mxnet.engine.MxNDArray.toDevice(MxNDArray.java:197)
    at ai.djl.training.ParameterStore.getValue(ParameterStore.java:110)
    at ai.djl.training.Trainer.lambda$initialize$1(Trainer.java:120)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at ai.djl.training.Trainer.initialize(Trainer.java:117)
    at .(#76:1)

Here's my training phase:

int batchSize = 32;
int limit = Integer.MAX_VALUE; // change this to a small value for a dry run
// int limit = 160; // limit to 160 records in the dataset for a dry run

Pipeline pipeline = new Pipeline(
        new ToTensor(),
        new Normalize(new float[] {0.4914f, 0.4822f, 0.4465f},
                      new float[] {0.2023f, 0.1994f, 0.2010f}));

Cifar10 trainDataset = Cifar10.builder()
        .setSampling(batchSize, true)
        .optUsage(Dataset.Usage.TRAIN)
        .optLimit(limit)
        .optPipeline(pipeline)
        .build();
trainDataset.prepare(new ProgressBar());

DefaultTrainingConfig config =
        new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss()) // softmaxCrossEntropyLoss is a standard loss for classification problems
                .addEvaluator(new Accuracy()) // use accuracy so we humans can understand how accurate the model is
                .optDevices(new Device[] {Device.gpu(0)}) // limit to one GPU; using more GPUs can actually slow down convergence
                .addTrainingListeners(TrainingListener.Defaults.logging());

// Now that we have our training configuration, we can create a trainer for our model
Trainer trainer = model.newTrainer(config);

int epoch = 10;
Shape inputShape = new Shape(1, 3, 32, 32);
trainer.initialize(inputShape);

for (int i = 0; i < epoch; ++i) {
    for (Batch batch : trainer.iterateDataset(trainDataset)) {
        EasyTrain.trainBatch(trainer, batch);
        trainer.step();
        batch.close();
    }
    // reset training and validation evaluators at the end of each epoch
    trainer.notifyListeners(listener -> listener.onEpoch(trainer));
}

I know it's a CUDA-related error. Google Colab shows cuda-10.0 installed, and I have also tried installing mxnet-cu90 with this command: !pip install mxnet-cu90. Still not working… Please help me through this?
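As a first diagnostic step, it can help to check which CUDA toolkit the Colab VM actually ships before choosing an mxnet-cuXX wheel. The sketch below assumes the common convention of toolkits living under /usr/local/cuda-<version>; the path and the mapping to wheel names are assumptions, not something verified in this thread:

```python
import glob
import os

def detect_cuda_versions(prefix="/usr/local"):
    """Return CUDA toolkit versions found under `prefix`.

    Relies on the assumed convention that toolkits are installed as
    directories named like /usr/local/cuda-10.1.
    """
    versions = []
    for path in sorted(glob.glob(os.path.join(prefix, "cuda-*"))):
        versions.append(os.path.basename(path).replace("cuda-", ""))
    return versions

# On Colab this might report e.g. ['10.0', '10.1']; the matching wheel
# would then be mxnet-cu100 or mxnet-cu101 rather than mxnet-cu90.
print(detect_cuda_versions())
```

If the reported version does not match the mxnet-cuXX wheel you installed, that mismatch alone can produce the "Compile with USE_CUDA=1" error, because a CPU-only or wrong-CUDA build of MXNet ends up being loaded.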

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 28 (12 by maintainers)

Top GitHub Comments

1 reaction
lanking520 commented, Apr 9, 2021

@aksrajvanshi as a further action, can you update the colab instruction in D2L book so we can automate this process next time?

1 reaction
nikkisingh111333 commented, Apr 9, 2021

Awesome, thanks! Please keep me updated if you change or do anything special with Colab so that I can adapt quickly… I'm gladly looking forward to seeing it.

On Fri 9 Apr, 2021, 8:57 PM aksrajvanshi, @.***> wrote:

@nikkisingh111333 https://github.com/nikkisingh111333 Great! So this wasn't exactly your problem. This is more of a Colab problem. First of all, MXNet needs CUDA 10.1 or 10.2 to work.

Secondly, DJL needs the libcudart file at the $LD_LIBRARY_PATH environment variable which wasn’t there. Initially LD_LIBRARY_PATH was pointing to /usr/lib64-nvidia. We created a symbolic link that allowed DJL to locate the libcudart file.
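The symlink workaround described above could be scripted along these lines. This is only a sketch of the idea: the library soname, the CUDA directory, and the target directory on LD_LIBRARY_PATH are illustrative assumptions and will differ depending on the installed CUDA version:

```python
import os

def link_libcudart(cuda_lib_dir, ld_dir, soname="libcudart.so.10.1"):
    """Symlink libcudart from the CUDA toolkit into a directory that is
    already on LD_LIBRARY_PATH, so DJL's native loader can find it.

    All paths here are illustrative; on Colab the source might be
    /usr/local/cuda-10.1/lib64 and the target /usr/lib64-nvidia.
    """
    src = os.path.join(cuda_lib_dir, soname)
    dst = os.path.join(ld_dir, soname)
    if not os.path.exists(dst):
        os.symlink(src, dst)
    return dst

# Hypothetical Colab invocation (paths assumed, not verified):
# link_libcudart("/usr/local/cuda-10.1/lib64", "/usr/lib64-nvidia")
```

The point of the symlink, per the comment above, is simply that DJL resolves libcudart through LD_LIBRARY_PATH, so the file must be visible in one of the directories that variable lists.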

We can try to do something that would make it easier for users to use DJL on Colab along with a GPU 😃

Also, if you’re starting with Deep Learning, you can start with this book. (https://d2l.djl.ai/)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/awslabs/djl/issues/824#issuecomment-816763189, or unsubscribe https://github.com/notifications/unsubscribe-auth/AL3BT7A4TIEO7FCV2QPAWWLTH4MFTANCNFSM42NFUF7Q .

Read more comments on GitHub >

Top Results From Across the Web

Mxnet and CUDA mismatch on Colab #1865 - GitHub
ERROR: Incomplete installation for leveraging GPUs for computations. Please make sure you have CUDA installed and run the following line in ...
Read more >
Environment Variables | Apache MXNet
If set to 1 , MXNet will utilize CUDA graphs when executing models on the GPU when possible. For CUDA graphs execution, one...
Read more >
Deep Learning with Nvidia GPUs in Cloudera Machine Learning
CUDA 11.0 libraries may have issues with CUDA 11.1 for example. If there is a mismatch between the CUDA version installed and what...
Read more >
GPU not supported Windows DLS 2.0.6
Yes I have another python in the path and it was the problem. ... I installed new version 2.0.7 with CUDA but still...
Read more >
mxnet Changelog - pyup.io
Update CustomOp doc with changes for GPU support (17486) ... Workaround problem with fusion in CUDA 9 (17028) (17035) ... Fix mismatch shapes...
Read more >
