Nightly Ludwig seems to use the Ray AIR API incorrectly
Describe the bug
hyperopt throws an error at the end of the run, and the cause appears to be incorrect usage of the Ray AIR API.
To Reproduce Steps to reproduce the behavior:
    # Imports assumed here; they were not shown in the original report.
    from ludwig.automl import create_auto_config, train_with_config
    from ludwig.datasets import mushroom_edibility

    # TODO: write our own data loader
    mushroom_edibility_df = mushroom_edibility.load()
    automl_config = create_auto_config(
        dataset=mushroom_edibility_df,
        target=target_column,
        time_limit_s=time_limit_s,
        tune_for_memory=tune_for_memory,
        user_config=None,
        random_seed=default_random_seed,
        use_reference_config=False,
    )
    auto_train_results = train_with_config(
        dataset=mushroom_edibility_df,
        config=automl_config,
        output_directory=output_dir,
        random_seed=default_random_seed,
    )
Logs:
2022-09-13 21:02:26,099 INFO timeout.py:54 -- Reached timeout of 180 seconds. Stopping all trials.
== Status ==
Current time: 2022-09-13 21:02:26 (running for 00:03:00.88)
Memory usage on this node: 3.6/7.7 GiB
Using AsyncHyperBand: num_stopped=2
Bracket: Iter 72.000: 1.0
Resources requested: 0/6 CPUs, 0/0 GPUs, 0.0/4.36 GiB heap, 0.0/2.18 GiB objects
Current best trial: f5a3c492 with metric_score=1.0 and parameters={'trainer.learning_rate': 0.005, 'trainer.decay_rate': 0.8, 'trainer.decay_steps': 500, 'combiner.size': 16, 'combiner.output_size': 16, 'combiner.num_steps': 4, 'combiner.relaxation_factor': 1.0, 'combiner.sparsity': 0.0001, 'combiner.bn_virtual_bs': 256, 'combiner.bn_momentum': 0.2}
Result logdir: /tmp/automl/artifact/hyperopt
Number of trials: 9/10 (9 TERMINATED)
+----------------+------------+----------------+------------------------+------------------------+----------------------+------------------------+------------------------+-----------------+---------------------+----------------------+-----------------------+------------------------+--------+------------------+----------------+
| Trial name | status | loc | combiner.bn_momentum | combiner.bn_virtu... | combiner.num_steps | combiner.output_size | combiner.relaxati... | combiner.size | combiner.sparsity | trainer.decay_rate | trainer.decay_steps | trainer.learning_... | iter | total time (s) | metric_score |
|----------------+------------+----------------+------------------------+------------------------+----------------------+------------------------+------------------------+-----------------+---------------------+----------------------+-----------------------+------------------------+--------+------------------+----------------|
| trial_f2fdbd9c | TERMINATED | 172.17.0.2:362 | 0.05 | 512 | 3 | 8 | 2 | 16 | 0.1 | 0.95 | 500 | 0.025 | 54 | 173.326 | 0.997537 |
| trial_f59ff54c | TERMINATED | 172.17.0.2:396 | 0.4 | 2048 | 9 | 8 | 1.5 | 8 | 0.001 | 0.95 | 10000 | 0.005 | 13 | 76.7784 | 0.985222 |
| trial_f5a3c492 | TERMINATED | 172.17.0.2:398 | 0.2 | 256 | 4 | 16 | 1 | 16 | 0.0001 | 0.8 | 500 | 0.005 | 44 | 167.486 | 1 |
| trial_f5a82dd4 | TERMINATED | 172.17.0.2:400 | 0.2 | 2048 | 6 | 128 | 1 | 24 | 0 | 0.95 | 20000 | 0.01 | 24 | 159.313 | 1 |
| trial_f5acaa94 | TERMINATED | 172.17.0.2:404 | 0.4 | 512 | 9 | 32 | 2 | 24 | 1e-06 | 0.95 | 20000 | 0.005 | 23 | 162.771 | 1 |
| trial_f5b17f6a | TERMINATED | 172.17.0.2:406 | 0.3 | 256 | 4 | 24 | 1.2 | 32 | 0.0001 | 0.9 | 8000 | 0.005 | 45 | 167.616 | 1 |
| trial_f5b87b3a | TERMINATED | 172.17.0.2:725 | 0.3 | 512 | 7 | 128 | 1 | 8 | 0.001 | 0.95 | 10000 | 0.025 | 8 | 72.1376 | 0.997537 |
| trial_285573b8 | TERMINATED | 172.17.0.2:892 | 0.02 | 1024 | 6 | 32 | 2 | 8 | 0 | 0.9 | 10000 | 0.01 | 1 | 5.96841 | 0.561576 |
| trial_582786d0 | TERMINATED | | 0.4 | 256 | 4 | 16 | 2 | 24 | 0.01 | 0.95 | 2000 | 0.005 | | | |
+----------------+------------+----------------+------------------------+------------------------+----------------------+------------------------+------------------------+-----------------+---------------------+----------------------+-----------------------+------------------------+--------+------------------+----------------+
2022-09-13 21:04:26,204 INFO tune.py:758 -- Total run time: 300.97 seconds (180.84 seconds for the tuning loop).
Traceback (most recent call last):
  File "auto_train.py", line 182, in <module>
    run_job(args.dataset, args.target, args.time_limit_s, args.tune_for_memory ,args.output_directory)
  File "auto_train.py", line 48, in run_job
    auto_train_results = train_with_config(
  File "/usr/local/lib/python3.8/site-packages/ludwig/automl/automl.py", line 209, in train_with_config
    hyperopt_results = _train(
  File "/usr/local/lib/python3.8/site-packages/ludwig/automl/automl.py", line 308, in _train
    hyperopt_results = hyperopt(
  File "/usr/local/lib/python3.8/site-packages/ludwig/hyperopt/run.py", line 329, in hyperopt
    hyperopt_results = hyperopt_executor.execute(
  File "/usr/local/lib/python3.8/site-packages/ludwig/hyperopt/execution.py", line 828, in execute
    self._evaluate_best_model(
  File "/usr/local/lib/python3.8/site-packages/ludwig/hyperopt/execution.py", line 385, in _evaluate_best_model
    os.path.join(best_model_path, "model"),
  File "/usr/local/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
  File "/usr/local/lib/python3.8/site-packages/ray/air/checkpoint.py", line 616, in __fspath__
    raise TypeError(
TypeError: You cannot use `air.Checkpoint` objects directly as paths. Use `Checkpoint.to_directory()` or `Checkpoint.as_directory()` instead.
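The traceback shows `os.path.join` being handed a checkpoint object where it expects a path, and the error message names the two methods to call instead. The sketch below reproduces that failure mode and the suggested remedy without requiring Ray. `StandInCheckpoint` and `model_dir_from_checkpoint` are hypothetical names invented for illustration; this is neither Ray's implementation nor the change made in #2485, only an assumption about what a fix along the lines of the error message could look like.

```python
import os
import tempfile
from contextlib import contextmanager


class StandInCheckpoint:
    """Hypothetical stand-in mimicking the behavior seen in the traceback."""

    def __init__(self, files):
        # Mapping of relative file name -> text content held by the checkpoint.
        self._files = files

    def __fspath__(self):
        # Same failure mode as above: the object refuses to act as a path.
        raise TypeError("You cannot use checkpoint objects directly as paths.")

    def to_directory(self, path=None):
        # Write the checkpoint contents to disk and return the directory.
        path = path or tempfile.mkdtemp()
        for name, content in self._files.items():
            target = os.path.join(path, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "w") as f:
                f.write(content)
        return path

    @contextmanager
    def as_directory(self):
        # Temporary directory that is removed when the block exits.
        with tempfile.TemporaryDirectory() as tmp:
            yield self.to_directory(tmp)


def model_dir_from_checkpoint(checkpoint):
    # Convert the checkpoint to a real directory first, then join.
    return os.path.join(checkpoint.to_directory(), "model")


ckpt = StandInCheckpoint({"model/weights.txt": "placeholder"})

try:
    os.path.join(ckpt, "model")  # the pattern that fails in the traceback
except TypeError as err:
    print("fails as in the report:", err)

print("model directory:", model_dir_from_checkpoint(ckpt))
```

`to_directory()` leaves the files on disk for the caller to manage, while `as_directory()` suits short-lived reads because the directory is cleaned up on exit.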
Expected behavior The job should finish successfully.
Environment:
- OS: Linux
- Version: same as the Ludwig container image
- Python version: 3.8
- Ludwig version: ludwig:release-0.6, ludwig:nightly
Top GitHub Comments
@Jeffwan #2485 should address the issue. Let me know if you still run into any issues!
Thanks for verifying @Jeffwan! I’ve now merged this into the 0.6 release branch in #2491.