MXNet GPU CUDA mismatch problem?
See original GitHub issue

Hey, I'm new here. Today I successfully installed the IJava kernel in Google Colab and Java runs fine, BUT when I train I get this ERROR:
```
ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Compile with USE_CUDA=1 to enable GPU usage
Stack trace:
  File "src/storage/storage.cc", line 119
	at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1909)
	at ai.djl.mxnet.jna.JnaUtils.createNdArray(JnaUtils.java:349)
	at ai.djl.mxnet.engine.MxNDManager.create(MxNDManager.java:91)
	at ai.djl.mxnet.engine.MxNDManager.create(MxNDManager.java:34)
	at ai.djl.ndarray.NDManager.create(NDManager.java:526)
	at ai.djl.mxnet.engine.MxNDArray.duplicate(MxNDArray.java:184)
	at ai.djl.mxnet.engine.MxNDArray.toDevice(MxNDArray.java:197)
	at ai.djl.training.ParameterStore.getValue(ParameterStore.java:110)
	at ai.djl.training.Trainer.lambda$initialize$1(Trainer.java:120)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at ai.djl.training.Trainer.initialize(Trainer.java:117)
	at .(#76:1)
```
Here's my training phase:
```java
int batchSize = 32;
int limit = Integer.MAX_VALUE; // change this to a small value for a dry run
// int limit = 160; // limit 160 records in the dataset for a dry run

Pipeline pipeline = new Pipeline(
        new ToTensor(),
        new Normalize(new float[] {0.4914f, 0.4822f, 0.4465f},
                      new float[] {0.2023f, 0.1994f, 0.2010f}));

Cifar10 trainDataset = Cifar10.builder()
        .setSampling(batchSize, true)
        .optUsage(Dataset.Usage.TRAIN)
        .optLimit(limit)
        .optPipeline(pipeline)
        .build();
trainDataset.prepare(new ProgressBar());

DefaultTrainingConfig config = new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
        // softmaxCrossEntropyLoss is a standard loss for classification problems
        .addEvaluator(new Accuracy()) // use accuracy so we humans can understand how accurate the model is
        .optDevices(new Device[] {Device.gpu(0)}) // limit to one GPU; using more GPUs can actually slow down convergence
        .addTrainingListeners(TrainingListener.Defaults.logging());

// Now that we have our training configuration, we should create a new trainer for our model
Trainer trainer = model.newTrainer(config);

int epoch = 10;
Shape inputShape = new Shape(1, 3, 32, 32);
trainer.initialize(inputShape);

for (int i = 0; i < epoch; ++i) {
    for (Batch batch : trainer.iterateDataset(trainDataset)) {
        EasyTrain.trainBatch(trainer, batch);
        trainer.step();
        batch.close();
    }
    // reset training and validation evaluators at end of epoch
    trainer.notifyListeners(listener -> listener.onEpoch(trainer));
}
```
I know it's a CUDA-related error. Google Colab shows cuda-10.0 installed, and I have also tried installing mxnet-cu90 with this command:

```shell
!pip install mxnet-cu90
```

Still not working …
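This error usually means the installed MXNet native library was built without CUDA support, or its CUDA version does not match the toolkit on the machine. MXNet's GPU wheels encode the targeted CUDA version in their name: `mxnet-cu90` targets CUDA 9.0, while `mxnet-cu100` targets CUDA 10.0, so with CUDA 10.0 on Colab, `mxnet-cu90` is the wrong wheel. As a minimal sketch of that naming convention (the class and helper name here are mine, not a DJL or MXNet API):

```java
// Sketch: the MXNet GPU wheel suffix is the CUDA version with the dot removed,
// e.g. CUDA 9.0 -> mxnet-cu90, CUDA 10.0 -> mxnet-cu100, CUDA 10.1 -> mxnet-cu101.
public class MxnetWheel {
    static String wheelForCuda(String cudaVersion) {
        return "mxnet-cu" + cudaVersion.replace(".", "");
    }

    public static void main(String[] args) {
        // Colab reported CUDA 10.0, so the matching wheel is mxnet-cu100,
        // not the mxnet-cu90 installed above.
        System.out.println(wheelForCuda("10.0")); // mxnet-cu100
    }
}
```

Under this assumption, `pip install mxnet-cu100` would be the wheel matching Colab's CUDA 10.0 toolkit.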
Please help me through this?
Issue Analytics
- Created 2 years ago
- Comments: 28 (12 by maintainers)
Top GitHub Comments
@aksrajvanshi as a further action, can you update the Colab instructions in the D2L book so we can automate this process next time?
Awesome, thanks! Please keep me updated if you change or do anything special with Colab so that I can adapt quickly … I'm gladly looking forward to seeing it.