All trials under a Chinese-named folder failed when running the MNIST example (PyTorch)
Describe the issue: I ran the MNIST PyTorch example, but all trials failed.

Environment:
- NNI version: 2.80
- Training service (local|remote|pai|aml|etc): local
- Client OS: Windows 11
- Server OS (for remote mode only): None
- Python version: 3.9.7
- PyTorch version: torch 1.10.2 (using the Anaconda internal Python interpreter)
- Is conda/virtualenv/venv used?: conda env (base)
- Is running in Docker?: None

The trial error log: there is no trial log in the trials directory.
{"error":"File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\trial.log"}

Log message:
- nnimanager.log:
[2022-08-25 23:39:41] INFO (main) Start NNI manager
[2022-08-25 23:39:41] INFO (NNIDataStore) Datastore initialization done
[2022-08-25 23:39:41] INFO (RestServer) Starting REST server at port 8080, URL prefix: "/"
[2022-08-25 23:39:41] INFO (RestServer) REST server started.
[2022-08-25 23:39:42] INFO (NNIManager) Starting experiment: jnl15sro
[2022-08-25 23:39:42] INFO (NNIManager) Setup training service...
[2022-08-25 23:39:42] INFO (LocalTrainingService) Construct local machine training service.
[2022-08-25 23:39:42] INFO (NNIManager) Setup tuner...
[2022-08-25 23:39:42] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2022-08-25 23:39:43] INFO (NNIManager) Add event listeners
[2022-08-25 23:39:43] INFO (LocalTrainingService) Run local machine training service.
[2022-08-25 23:39:43] INFO (NNIManager) NNIManager received command from dispatcher: ID,
[2022-08-25 23:39:43] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 1024, "lr": 0.001, "momentum": 0.9625271719636488}, "parameter_index": 0}
[2022-08-25 23:39:48] INFO (NNIManager) submitTrialJob: form: { sequenceId: 0, hyperParameters: { value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 1024, "lr": 0.001, "momentum": 0.9625271719636488}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:39:58] INFO (NNIManager) Trial job XJ3S3 status changed from WAITING to FAILED
[2022-08-25 23:39:58] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 512, "lr": 0.0001, "momentum": 0.17049052927161668}, "parameter_index": 0}
[2022-08-25 23:40:03] INFO (NNIManager) submitTrialJob: form: { sequenceId: 1, hyperParameters: { value: '{"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 512, "lr": 0.0001, "momentum": 0.17049052927161668}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:40:08] INFO (NNIManager) Trial job YZUnM status changed from WAITING to FAILED
[2022-08-25 23:40:08] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.0001, "momentum": 0.4641360565997277}, "parameter_index": 0}
[2022-08-25 23:40:13] INFO (NNIManager) submitTrialJob: form: { sequenceId: 2, hyperParameters: { value: '{"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.0001, "momentum": 0.4641360565997277}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:40:19] INFO (NNIManager) Trial job VZJ4V status changed from WAITING to FAILED
[2022-08-25 23:40:19] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 3, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.1, "momentum": 0.09260907090425263}, "parameter_index": 0}
[2022-08-25 23:40:24] INFO (NNIManager) submitTrialJob: form: { sequenceId: 3, hyperParameters: { value: '{"parameter_id": 3, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.1, "momentum": 0.09260907090425263}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:40:29] INFO (NNIManager) Trial job IwZ3y status changed from WAITING to FAILED
[2022-08-25 23:40:29] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 4, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 256, "lr": 0.01, "momentum": 0.6022108230948574}, "parameter_index": 0}
[2022-08-25 23:40:34] INFO (NNIManager) submitTrialJob: form: { sequenceId: 4, hyperParameters: { value: '{"parameter_id": 4, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 256, "lr": 0.01, "momentum": 0.6022108230948574}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:40:39] INFO (NNIManager) Trial job ffqVg status changed from WAITING to FAILED
[2022-08-25 23:40:39] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 5, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 128, "lr": 0.0001, "momentum": 0.9069749674371951}, "parameter_index": 0}
[2022-08-25 23:40:44] INFO (NNIManager) submitTrialJob: form: { sequenceId: 5, hyperParameters: { value: '{"parameter_id": 5, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 128, "lr": 0.0001, "momentum": 0.9069749674371951}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:40:49] INFO (NNIManager) Trial job YQ54k status changed from WAITING to FAILED
[2022-08-25 23:40:49] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 6, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 512, "lr": 0.01, "momentum": 0.4989949571130713}, "parameter_index": 0}
[2022-08-25 23:40:54] INFO (NNIManager) submitTrialJob: form: { sequenceId: 6, hyperParameters: { value: '{"parameter_id": 6, "parameter_source": "algorithm", "parameters": {"batch_size": 64, "hidden_size": 512, "lr": 0.01, "momentum": 0.4989949571130713}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:40:59] INFO (NNIManager) Trial job O7UIq status changed from WAITING to FAILED
[2022-08-25 23:41:00] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 7, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.01, "momentum": 0.11755641505627679}, "parameter_index": 0}
[2022-08-25 23:41:05] INFO (NNIManager) submitTrialJob: form: { sequenceId: 7, hyperParameters: { value: '{"parameter_id": 7, "parameter_source": "algorithm", "parameters": {"batch_size": 32, "hidden_size": 1024, "lr": 0.01, "momentum": 0.11755641505627679}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:41:10] INFO (NNIManager) Trial job S975K status changed from WAITING to FAILED
[2022-08-25 23:41:10] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 8, "parameter_source": "algorithm", "parameters": {"batch_size": 16, "hidden_size": 256, "lr": 0.0001, "momentum": 0.23273431921548116}, "parameter_index": 0}
[2022-08-25 23:41:15] INFO (NNIManager) submitTrialJob: form: { sequenceId: 8, hyperParameters: { value: '{"parameter_id": 8, "parameter_source": "algorithm", "parameters": {"batch_size": 16, "hidden_size": 256, "lr": 0.0001, "momentum": 0.23273431921548116}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:41:16] ERROR (NNIRestHandler) Error: File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\stderr
    at LocalTrainingService.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\training_service\local\localTrainingService.js:146:19)
    at NNIManager.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\core\nnimanager.js:333:37)
    at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\rest_server\restHandler.js:284:29
    at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
    at next (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:137:13)
    at Route.dispatch (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:112:3)
    at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
    at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:281:22
    at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:360:14)
    at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:371:14)
[2022-08-25 23:41:20] INFO (NNIManager) Trial job i1Qt1 status changed from WAITING to FAILED
[2022-08-25 23:41:20] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 9, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.001, "momentum": 0.9310719853939587}, "parameter_index": 0}
[2022-08-25 23:41:25] INFO (NNIManager) submitTrialJob: form: { sequenceId: 9, hyperParameters: { value: '{"parameter_id": 9, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 256, "lr": 0.001, "momentum": 0.9310719853939587}, "parameter_index": 0}', index: 0 }, placementConstraint: { type: 'None', gpus: [] } }
[2022-08-25 23:41:30] INFO (NNIManager) Trial job GKOE7 status changed from WAITING to FAILED
[2022-08-25 23:41:30] INFO (NNIManager) Change NNIManager status from: RUNNING to: NO_MORE_TRIAL
[2022-08-25 23:41:30] INFO (NNIManager) Change NNIManager status from: NO_MORE_TRIAL to: DONE
[2022-08-25 23:41:30] INFO (NNIManager) Experiment done.
[2022-08-26 00:00:27] ERROR (NNIRestHandler) Error: File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\trial.log
    at LocalTrainingService.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\training_service\local\localTrainingService.js:146:19)
    at NNIManager.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\core\nnimanager.js:333:37)
    at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\rest_server\restHandler.js:284:29
    at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
    at next (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:137:13)
    at Route.dispatch (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:112:3)
    at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
    at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:281:22
    at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:360:14)
    at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:371:14)
[2022-08-26 00:01:47] ERROR (NNIRestHandler) Error: File not found: C:\Users\唐勇强\nni-experiments\jnl15sro\trials\XJ3S3\stderr
    at LocalTrainingService.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\training_service\local\localTrainingService.js:146:19)
    at NNIManager.getTrialFile (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\core\nnimanager.js:333:37)
    at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\rest_server\restHandler.js:284:29
    at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
    at next (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:137:13)
    at Route.dispatch (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\route.js:112:3)
    at Layer.handle [as handle_request] (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\layer.js:95:5)
    at D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:281:22
    at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:360:14)
    at param (D:\PythonProgram\Anaconda3\lib\site-packages\nni_node\node_modules\express\lib\router\index.js:371:14)
- dispatcher.log:
[2022-08-25 23:39:42] INFO (numexpr.utils/MainThread) Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2022-08-25 23:39:42] INFO (numexpr.utils/MainThread) NumExpr defaulting to 8 threads.
[2022-08-25 23:39:43] INFO (nni.tuner.tpe/MainThread) Using random seed 1939897128
[2022-08-25 23:39:43] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
- nnictl_stderr: Experiment jnl15sro start: 2022-08-25 23:39:41.343385
- nnictl_stdout: Experiment jnl15sro start: 2022-08-25 23:39:41.343385
- run:
$env:PATH='D:\PythonProgram\Anaconda3;D:\PythonProgram\Anaconda3\Library\mingw-w64\bin;D:\PythonProgram\Anaconda3\Library\usr\bin;D:\PythonProgram\Anaconda3\Library\bin;D:\PythonProgram\Anaconda3\Scripts;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;D:\MATLAB\R2021b\runtime\win64;D:\MATLAB\R2021b\bin;C:\Program Files\NVIDIA Corporation\Nsight Compute 2021.1.0;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files (x86)\Microsoft SQL Server\90\Tools\binn;D:\AllSoftwares\Softwares\SVN\bin;C:\Users\唐勇强\AppData\Local\Microsoft\WindowsApps;;D:\PythonProgram\PyCharm Community Edition 2021.3.2\bin;'
$env:NNI_PLATFORM="local"
$env:NNI_EXP_ID="jnl15sro"
$env:NNI_SYS_DIR="C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM"
$env:NNI_TRIAL_JOB_ID="YZUnM"
$env:NNI_OUTPUT_DIR="C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM"
$env:NNI_TRIAL_SEQ_ID="1"
$env:NNI_CODE_DIR="D:\深度学习\mnist"
$PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'}
cd $env:NNI_CODE_DIR
cmd.exe /c 'python mnist.py' 1>C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM\stdout 2>C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM\stderr
$NOW_DATE = [int64](([datetime]::UtcNow)-(get-date "1/1/1970")).TotalSeconds
$NOW_DATE = "$NOW_DATE" + (Get-Date -Format fff).ToString()
Write $LASTEXITCODE " " $NOW_DATE | Out-File "C:\Users\唐勇强\nni-experiments\jnl15sro\trials\YZUnM.nni\state" -NoNewline -encoding utf8
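Since the REST handler reports `trial.log` and `stderr` as missing, it may help to check which files were actually written under each trial directory. The following is a minimal sketch using only the Python standard library; `list_trial_files` is a hypothetical helper (not part of NNI), and the `trials` subfolder layout is assumed from the paths in the logs above.

```python
from pathlib import Path


def list_trial_files(experiment_dir):
    """Map each trial ID to the sorted names of the files in its directory.

    Assumes the layout seen in the logs: <experiment_dir>/trials/<trial_id>/.
    Returns an empty dict if there is no trials folder.
    """
    trials_root = Path(experiment_dir) / "trials"
    if not trials_root.is_dir():
        return {}
    return {
        trial.name: sorted(p.name for p in trial.iterdir() if p.is_file())
        for trial in sorted(trials_root.iterdir())
        if trial.is_dir()
    }


if __name__ == "__main__":
    # Experiment ID taken from the logs above; adjust the path for your machine.
    experiment = Path.home() / "nni-experiments" / "jnl15sro"
    for trial_id, files in list_trial_files(experiment).items():
        print(trial_id, files)
```

A trial whose list lacks `stdout` or `stderr` suggests the launch script never got as far as writing its output, rather than `mnist.py` itself failing.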
Issue Analytics
- Created a year ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
When I changed
experimentWorkingDirectory: "C:\\Users\\唐勇强\\nni-experiments"
to
experimentWorkingDirectory: "D:\\mnist-pytorch\\nni-experiments"
trials ran successfully. Thank you very much for your help!

Looks like trials failed for different reasons.
"English directory"
means that the log directory has an English name rather than a Chinese one.
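The workaround reported in this thread was to move the experiment working directory to a path without Chinese characters. As a minimal sketch, the check below flags the non-ASCII components of a Windows path before you start an experiment. `non_ascii_parts` is a hypothetical helper, not an NNI function, and the assumption that non-ASCII path components are the trigger comes only from this thread.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path):
    """Return the components of a Windows path that contain non-ASCII characters."""
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


# Paths taken from the issue: the failing and the working experiment directories.
print(non_ascii_parts(r"C:\Users\唐勇强\nni-experiments"))   # ['唐勇强']
print(non_ascii_parts(r"D:\mnist-pytorch\nni-experiments"))  # []
```

`PureWindowsPath` parses Windows-style paths on any operating system, so the check can also be run from a Linux or macOS machine. `str.isascii()` requires Python 3.7 or later.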