
MXNet GPU CUDA mismatch problem?

See original GitHub issue

Hey, I'm new here. Today I successfully installed the IJava kernel in Google Colab and Java runs fine, BUT when I train I get this ERROR:

ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Compile with USE_CUDA=1 to enable GPU usage
Stack trace:
  File "src/storage/storage.cc", line 119
    at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1909)
    at ai.djl.mxnet.jna.JnaUtils.createNdArray(JnaUtils.java:349)
    at ai.djl.mxnet.engine.MxNDManager.create(MxNDManager.java:91)
    at ai.djl.mxnet.engine.MxNDManager.create(MxNDManager.java:34)
    at ai.djl.ndarray.NDManager.create(NDManager.java:526)
    at ai.djl.mxnet.engine.MxNDArray.duplicate(MxNDArray.java:184)
    at ai.djl.mxnet.engine.MxNDArray.toDevice(MxNDArray.java:197)
    at ai.djl.training.ParameterStore.getValue(ParameterStore.java:110)
    at ai.djl.training.Trainer.lambda$initialize$1(Trainer.java:120)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at ai.djl.training.Trainer.initialize(Trainer.java:117)
    at .(#76:1)

Here's my training phase:

int batchSize = 32;
int limit = Integer.MAX_VALUE; // change this to a small value for a dry run
// int limit = 160; // limit to 160 records in the dataset for a dry run

Pipeline pipeline = new Pipeline(
        new ToTensor(),
        new Normalize(new float[] {0.4914f, 0.4822f, 0.4465f},
                      new float[] {0.2023f, 0.1994f, 0.2010f}));

Cifar10 trainDataset = Cifar10.builder()
        .setSampling(batchSize, true)
        .optUsage(Dataset.Usage.TRAIN)
        .optLimit(limit)
        .optPipeline(pipeline)
        .build();
trainDataset.prepare(new ProgressBar());

DefaultTrainingConfig config =
        new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss()) // softmaxCrossEntropyLoss is a standard loss for classification problems
                .addEvaluator(new Accuracy()) // use accuracy so we humans can understand how accurate the model is
                .optDevices(new Device[] {Device.gpu(0)}) // limit to one GPU; using more GPUs can actually slow down convergence
                .addTrainingListeners(TrainingListener.Defaults.logging());

// Now that we have our training configuration, we can create a trainer for our model
Trainer trainer = model.newTrainer(config);

int epoch = 10;
Shape inputShape = new Shape(1, 3, 32, 32);
trainer.initialize(inputShape);

for (int i = 0; i < epoch; ++i) {
    for (Batch batch : trainer.iterateDataset(trainDataset)) {
        EasyTrain.trainBatch(trainer, batch);
        trainer.step();
        batch.close();
    }
    // reset training and validation evaluators at the end of each epoch
    trainer.notifyListeners(listener -> listener.onEpoch(trainer));
}

I know it's a CUDA-related error. Google Colab shows cuda-10.0 installed, and I have also tried installing mxnet-cu90 with this command: !pip install mxnet-cu90. Still not working… Please help me through this?
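As a first diagnostic step, it can help to check which CUDA toolkit the Colab VM actually ships before choosing an mxnet-cuXX wheel. The sketch below assumes the common convention of toolkits living under /usr/local/cuda-<version>; the path and the mapping to wheel names are assumptions, not something verified in this thread:

```python
import glob
import os

def detect_cuda_versions(prefix="/usr/local"):
    """Return CUDA toolkit versions found under `prefix`.

    Relies on the assumed convention that toolkits are installed as
    directories named like /usr/local/cuda-10.1.
    """
    versions = []
    for path in sorted(glob.glob(os.path.join(prefix, "cuda-*"))):
        versions.append(os.path.basename(path).replace("cuda-", ""))
    return versions

# On Colab this might report e.g. ['10.0', '10.1']; the matching wheel
# would then be mxnet-cu100 or mxnet-cu101 rather than mxnet-cu90.
print(detect_cuda_versions())
```

If the reported version does not match the mxnet-cuXX wheel you installed, that mismatch alone can produce the "Compile with USE_CUDA=1" error, because a CPU-only or wrong-CUDA build of MXNet ends up being loaded.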

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 28 (12 by maintainers)

Top GitHub Comments

1 reaction
lanking520 commented, Apr 9, 2021

@aksrajvanshi as a further action, can you update the colab instruction in D2L book so we can automate this process next time?

1 reaction
nikkisingh111333 commented, Apr 9, 2021

Awesome, thanks! Please keep me updated if you change or do anything special with Colab so that I can adapt quickly… I'm gladly looking forward to seeing it.

On Fri 9 Apr, 2021, 8:57 PM aksrajvanshi, @.***> wrote:

@nikkisingh111333 https://github.com/nikkisingh111333 Great! So this wasn't exactly your problem. This is more of a Colab problem. First of all, MXNet needs CUDA 10.1 or 10.2 to work.

Secondly, DJL needs the libcudart file at the $LD_LIBRARY_PATH environment variable which wasn’t there. Initially LD_LIBRARY_PATH was pointing to /usr/lib64-nvidia. We created a symbolic link that allowed DJL to locate the libcudart file.
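The symlink workaround described above could be scripted along these lines. This is only a sketch of the idea: the library soname, the CUDA directory, and the target directory on LD_LIBRARY_PATH are illustrative assumptions and will differ depending on the installed CUDA version:

```python
import os

def link_libcudart(cuda_lib_dir, ld_dir, soname="libcudart.so.10.1"):
    """Symlink libcudart from the CUDA toolkit into a directory that is
    already on LD_LIBRARY_PATH, so DJL's native loader can find it.

    All paths here are illustrative; on Colab the source might be
    /usr/local/cuda-10.1/lib64 and the target /usr/lib64-nvidia.
    """
    src = os.path.join(cuda_lib_dir, soname)
    dst = os.path.join(ld_dir, soname)
    if not os.path.exists(dst):
        os.symlink(src, dst)
    return dst

# Hypothetical Colab invocation (paths assumed, not verified):
# link_libcudart("/usr/local/cuda-10.1/lib64", "/usr/lib64-nvidia")
```

The point of the symlink, per the comment above, is simply that DJL resolves libcudart through LD_LIBRARY_PATH, so the file must be visible in one of the directories that variable lists.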

We can try to do something that would make it easier for users to use DJL on Colab along with a GPU 😃

Also, if you’re starting with Deep Learning, you can start with this book. (https://d2l.djl.ai/)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/awslabs/djl/issues/824#issuecomment-816763189, or unsubscribe https://github.com/notifications/unsubscribe-auth/AL3BT7A4TIEO7FCV2QPAWWLTH4MFTANCNFSM42NFUF7Q .

Read more comments on GitHub >

Top Results From Across the Web

Mxnet and CUDA mismatch on Colab #1865 - GitHub
ERROR: Incomplete installation for leveraging GPUs for computations. Please make sure you have CUDA installed and run the following line in ...
Read more >
Environment Variables | Apache MXNet
If set to 1 , MXNet will utilize CUDA graphs when executing models on the GPU when possible. For CUDA graphs execution, one...
Read more >
Deep Learning with Nvidia GPUs in Cloudera Machine Learning
CUDA 11.0 libraries may have issues with CUDA 11.1 for example. If there is a mismatch between the CUDA version installed and what...
Read more >
GPU not supported Windows DLS 2.0.6
Yes I have another python in the path and it was the problem. ... I installed new version 2.0.7 with CUDA but still...
Read more >
mxnet Changelog - pyup.io
Update CustomOp doc with changes for GPU support (17486) ... Workaround problem with fusion in CUDA 9 (17028) (17035) ... Fix mismatch shapes...
Read more >
