
All trials fail when running the MNIST example (PyTorch) under a folder with a Chinese name

See original GitHub issue

Describe the issue: I ran the MNIST PyTorch example, but all trials failed.

Environment:

  • NNI version: 2.8

  • Training service (local|remote|pai|aml|etc): local

  • Client OS: Windows 11

  • Server OS (for remote mode only): None

  • Python version: 3.9.7

  • PyTorch version: torch 1.10.2 (using the Python interpreter bundled with Anaconda)

  • Is conda/virtualenv/venv used?: conda env (base)

  • Is running in Docker?: No

The trial error log: there is no trial log in the trial directory:
{"error":"File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\trial.log"}

Log message:

  • nnimanager.log:
    [2022-08-25 23:39:41] INFO (main) Start NNI manager
    [2022-08-25 23:39:41] INFO (NNIDataStore) Datastore initialization done
    [2022-08-25 23:39:41] INFO (RestServer) Starting REST server at port 8080, URL prefix: "/"
    [2022-08-25 23:39:41] INFO (RestServer) REST server started.
    [2022-08-25 23:39:42] INFO (NNIManager) Starting experiment: jnl15sro
    [2022-08-25 23:39:42] INFO (NNIManager) Setup training service...
    [2022-08-25 23:39:42] INFO (LocalTrainingService) Construct local machine training service.
    [2022-08-25 23:39:42] INFO (NNIManager) Setup tuner...
    [2022-08-25 23:39:42] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
    [2022-08-25 23:39:43] INFO (NNIManager) Add event listeners
    [2022-08-25 23:39:43] INFO (LocalTrainingService) Run local machine training service.
    [2022-08-25 23:39:43] INFO (NNIManager) NNIManager received command from dispatcher: ID,
    [2022-08-25 23:39:43] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 1024, "lr": 0.001, "momentum": 0.9625271719636488}, "parameter_index": 0}
    [2022-08-25 23:39:48] INFO (NNIManager) submitTrialJob: form: { sequenceId: 0, hyperParameters: { value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 1024, "lr": 0.001, "momentum": 0.9625271719636488}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:39:58] INFO (NNIManager) Trial job XJ3S3 status changed from WAITING to FAILED
    [2022-08-25 23:39:58] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 512, "lr": 0.0001, "momentum": 0.17049052927161668}, "parameter_index": 0}
    [2022-08-25 23:40:03] INFO (NNIManager) submitTrialJob: form: { sequenceId: 1, hyperParameters: { value: '{"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 512, "lr": 0.0001, "momentum": 0.17049052927161668}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:40:08] INFO (NNIManager) Trial job YZUnM status changed from WAITING to FAILED
    [2022-08-25 23:40:08] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.0001, "momentum": 0.4641360565997277}, "parameter_index": 0}
    [2022-08-25 23:40:13] INFO (NNIManager) submitTrialJob: form: { sequenceId: 2, hyperParameters: { value: '{"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.0001, "momentum": 0.4641360565997277}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:40:19] INFO (NNIManager) Trial job VZJ4V status changed from WAITING to FAILED
    [2022-08-25 23:40:19] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 3, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.1, "momentum": 0.09260907090425263}, "parameter_index": 0}
    [2022-08-25 23:40:24] INFO (NNIManager) submitTrialJob: form: { sequenceId: 3, hyperParameters: { value: '{"parameter_id": 3, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.1, "momentum": 0.09260907090425263}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:40:29] INFO (NNIManager) Trial job IwZ3y status changed from WAITING to FAILED
    [2022-08-25 23:40:29] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 4, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 256, "lr": 0.01, "momentum": 0.6022108230948574}, "parameter_index": 0}
    [2022-08-25 23:40:34] INFO (NNIManager) submitTrialJob: form: { sequenceId: 4, hyperParameters: { value: '{"parameter_id": 4, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 256, "lr": 0.01, "momentum": 0.6022108230948574}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:40:39] INFO (NNIManager) Trial job ffqVg status changed from WAITING to FAILED
    [2022-08-25 23:40:39] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 5, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 128, "lr": 0.0001, "momentum": 0.9069749674371951}, "parameter_index": 0}
    [2022-08-25 23:40:44] INFO (NNIManager) submitTrialJob: form: { sequenceId: 5, hyperParameters: { value: '{"parameter_id": 5, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 128, "lr": 0.0001, "momentum": 0.9069749674371951}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:40:49] INFO (NNIManager) Trial job YQ54k status changed from WAITING to FAILED
    [2022-08-25 23:40:49] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 6, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 512, "lr": 0.01, "momentum": 0.4989949571130713}, "parameter_index": 0}
    [2022-08-25 23:40:54] INFO (NNIManager) submitTrialJob: form: { sequenceId: 6, hyperParameters: { value: '{"parameter_id": 6, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 512, "lr": 0.01, "momentum": 0.4989949571130713}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:40:59] INFO (NNIManager) Trial job O7UIq status changed from WAITING to FAILED
    [2022-08-25 23:41:00] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 7, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.01, "momentum": 0.11755641505627679}, "parameter_index": 0}
    [2022-08-25 23:41:05] INFO (NNIManager) submitTrialJob: form: { sequenceId: 7, hyperParameters: { value: '{"parameter_id": 7, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.01, "momentum": 0.11755641505627679}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:41:10] INFO (NNIManager) Trial job S975K status changed from WAITING to FAILED
    [2022-08-25 23:41:10] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 8, "parameter_source": "algorithm", "parameters": {"batch_size": 16, "hidden_size": 256, "lr": 0.0001, "momentum": 0.23273431921548116}, "parameter_index": 0}
    [2022-08-25 23:41:15] INFO (NNIManager) submitTrialJob: form: { sequenceId: 8, hyperParameters: { value: '{"parameter_id": 8, "parameter_source": "algorithm", "parameters": {"batch_size": 16, "hidden_size": 256, "lr": 0.0001, "momentum": 0.23273431921548116}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:41:16] ERROR (NNIRestHandler) Error: File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\stderr
        at LocalTrainingService.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\training_service\local\localTrainingService.js:146:19)
        at NNIManager.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\core\nnimanager.js:333:37)
        at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\rest_server\restHandler.js:284:29
        at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
        at next (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:137:13)
        at Route.dispatch (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:112:3)
        at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
        at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:281:22
        at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:360:14)
        at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:371:14)
    [2022-08-25 23:41:20] INFO (NNIManager) Trial job i1Qt1 status changed from WAITING to FAILED
    [2022-08-25 23:41:20] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 9, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.001, "momentum": 0.9310719853939587}, "parameter_index": 0}
    [2022-08-25 23:41:25] INFO (NNIManager) submitTrialJob: form: { sequenceId: 9, hyperParameters: { value: '{"parameter_id": 9, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.001, "momentum": 0.9310719853939587}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
    [2022-08-25 23:41:30] INFO (NNIManager) Trial job GKOE7 status changed from WAITING to FAILED
    [2022-08-25 23:41:30] INFO (NNIManager) Change NNIManager status from: RUNNING to: NO_MORE_TRIAL
    [2022-08-25 23:41:30] INFO (NNIManager) Change NNIManager status from: NO_MORE_TRIAL to: DONE
    [2022-08-25 23:41:30] INFO (NNIManager) Experiment done.
    [2022-08-26 00:00:27] ERROR (NNIRestHandler) Error: File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\trial.log
        at LocalTrainingService.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\training_service\local\localTrainingService.js:146:19)
        at NNIManager.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\core\nnimanager.js:333:37)
        at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\rest_server\restHandler.js:284:29
        at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
        at next (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:137:13)
        at Route.dispatch (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:112:3)
        at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
        at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:281:22
        at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:360:14)
        at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:371:14)
    [2022-08-26 00:01:47] ERROR (NNIRestHandler) Error: File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\stderr
        at LocalTrainingService.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\training_service\local\localTrainingService.js:146:19)
        at NNIManager.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\core\nnimanager.js:333:37)
        at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\rest_server\restHandler.js:284:29
        at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
        at next (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:137:13)
        at Route.dispatch (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:112:3)
        at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
        at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:281:22
        at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:360:14)
        at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:371:14)

  • dispatcher.log:
    [2022-08-25 23:39:42] INFO (numexpr.utils/MainThread) Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
    [2022-08-25 23:39:42] INFO (numexpr.utils/MainThread) NumExpr defaulting to 8 threads.
    [2022-08-25 23:39:43] INFO (nni.tuner.tpe/MainThread) Using random seed 1939897128
    [2022-08-25 23:39:43] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started

  • nnictl_stderr: Experiment jnl15sro start: 2022-08-25 23:39:41.343385

  • nnictl_stdout:
    Experiment jnl15sro start: 2022-08-25 23:39:41.343385
    run: $env:PATH='D:\PythonProgram\Anaconda3;D:\PythonProgram\Anaconda3\Library\mingw-w64\bin;D:\PythonProgram\Anaconda3\Library\usr\bin;D:\PythonProgram\Anaconda3\Library\bin;D:\PythonProgram\Anaconda3\Scripts;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;D:\MATLAB\R2021b\runtime\win64;D:\MATLAB\R2021b\bin;C:\Program Files\NVIDIA Corporation\Nsight Compute 2021.1.0;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files (x86)\Microsoft SQL Server\90\Tools\binn;D:\AllSoftwares\Softwares\SVN\bin;C:\Users\唐勇强\AppData\Local\Microsoft\WindowsApps;;D:\PythonProgram\PyCharm Community Edition 2021.3.2\bin;'
    $env:NNI_PLATFORM="local"
    $env:NNI_EXP_ID="jnl15sro"
    $env:NNI_SYS_DIR="C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM"
    $env:NNI_TRIAL_JOB_ID="YZUnM"
    $env:NNI_OUTPUT_DIR="C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM"
    $env:NNI_TRIAL_SEQ_ID="1"
    $env:NNI_CODE_DIR="D:\深度学习\mnist"
    $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'}
    cd $env:NNI_CODE_DIR
    cmd.exe /c 'python mnist.py' 1>C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM\stdout 2>C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM\stderr
    $NOW_DATE = [int64](([datetime]::UtcNow)-(get-date "1/1/1970")).TotalSeconds
    $NOW_DATE = "$NOW_DATE" + (Get-Date -Format fff).ToString()
    Write $LASTEXITCODE " " $NOW_DATE | Out-File "C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM\.nni\state" -NoNewline -encoding utf8
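To check which trials are missing the files the web UI asked for, a small script can walk the experiment's trials directory and also flag path components that contain non-ASCII characters. This is a minimal sketch, not part of NNI: the helper names non_ascii_parts and missing_trial_files are hypothetical, the expected file names (stdout, stderr, trial.log) are taken from the logs above, and the default location assumes the experiment directory shown there (the home directory's nni-experiments folder, experiment ID jnl15sro).

```python
from pathlib import Path, PureWindowsPath

# File names taken from the logs above: the run script redirects output to
# stdout/stderr, and the web UI requested stderr and trial.log.
EXPECTED_FILES = ("stdout", "stderr", "trial.log")


def non_ascii_parts(path: str) -> list[str]:
    """Return the components of a Windows-style path that contain non-ASCII characters."""
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


def missing_trial_files(trials_dir: Path) -> dict[str, list[str]]:
    """Map each trial ID under trials_dir to the expected files it lacks."""
    if not trials_dir.is_dir():
        return {}
    report: dict[str, list[str]] = {}
    for trial in sorted(p for p in trials_dir.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED_FILES if not (trial / name).exists()]
        if missing:
            report[trial.name] = missing
    return report


if __name__ == "__main__":
    # Hypothetical usage: experiment ID from the logs, default location under the home directory.
    exp_dir = Path.home() / "nni-experiments" / "jnl15sro"
    print("non-ASCII path components:", non_ascii_parts(str(exp_dir)))
    for trial_id, missing in missing_trial_files(exp_dir / "trials").items():
        print(f"{trial_id}: missing {', '.join(missing)}")
```

A non-empty list of non-ASCII components together with trials that lack stderr or trial.log would be consistent with the path problem discussed in the comments, but on its own it does not prove the cause.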

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
Tyqfat commented, Aug 26, 2022

Replying to Lijiaoa: "It looks like the trials failed for different reasons. By "English directory" I mean that the log directory path is in English rather than Chinese."

When I changed experimentWorkingDirectory: "C:\\Users\\唐勇强\\nni-experiments" to experimentWorkingDirectory: "D:\\mnist-pytorch\\nni-experiments", the trials ran successfully. Thank you very much for your help!
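The workaround in this comment comes down to choosing an experimentWorkingDirectory whose path is ASCII-only and writing it into the YAML experiment config with escaped backslashes. The sketch below illustrates that under stated assumptions: pick_working_directory and working_directory_line are hypothetical helpers, not NNI APIs, and the two paths are the ones quoted in the comment.

```python
import json
from pathlib import PureWindowsPath


def pick_working_directory(preferred: str, fallback: str) -> str:
    """Return preferred if every path component is ASCII-only, else fallback."""
    if all(part.isascii() for part in PureWindowsPath(preferred).parts):
        return preferred
    return fallback


def working_directory_line(path: str) -> str:
    """Render an experimentWorkingDirectory entry for a YAML config file.

    json.dumps produces a double-quoted string with each backslash doubled,
    which YAML also accepts as a double-quoted scalar.
    """
    return f"experimentWorkingDirectory: {json.dumps(path)}"


chosen = pick_working_directory(
    r"C:\Users\唐勇强\nni-experiments",   # original location, contains Chinese characters
    r"D:\mnist-pytorch\nni-experiments",  # ASCII-only location that worked in this thread
)
print(working_directory_line(chosen))
# experimentWorkingDirectory: "D:\\mnist-pytorch\\nni-experiments"
```

The doubled backslashes matter because, inside a YAML double-quoted string, a single backslash starts an escape sequence, which is why the paths in the comment are written with `\\`.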

1 reaction
Lijiaoa commented, Aug 26, 2022

It looks like the trials failed for different reasons. By "English directory" I mean that the log directory path is in English rather than Chinese.
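The thread does not establish exactly why a Chinese directory name breaks the trials. One plausible mechanism, offered here as an assumption rather than a diagnosis, is an encoding mismatch: the launch script shown in nnictl_stdout passes non-ASCII paths through PowerShell and cmd.exe redirection, and on Windows configured for Simplified Chinese the legacy code page is 936 (GBK) rather than UTF-8. This Python sketch only shows how the same directory name turns into different bytes, and a different string, depending on which encoding reads it.

```python
# Illustration only: one directory name from this thread, encoded two ways.
name = "唐勇强"

utf8_bytes = name.encode("utf-8")  # 3 bytes per character for these characters
gbk_bytes = name.encode("gbk")     # 2 bytes per character (code page 936)

# If one program writes the path as UTF-8 bytes and another decodes those bytes
# as GBK, the reader ends up with a different (garbled) name.
mangled = utf8_bytes.decode("gbk", errors="replace")

print(len(utf8_bytes), len(gbk_bytes))  # 9 6
print(mangled == name)                  # False
```

An ASCII-only path encodes to the same bytes under UTF-8, GBK, and most other code pages, which is one reason moving the working directory to an English path sidesteps this class of problem.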


