Issues on getting antmaze-medium-play-v0 results with iql
See original GitHub issue

Hi there,

Thank you for releasing the CORL benchmark. I cloned the latest repo and ran the antmaze-medium-play-v0 experiment with the parameters below. However, I get a near-0 normalized reward for the first 430,000 gradient steps.
I did not change the code except for these parameters:
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Experiment
    device: str = "cpu"
    env: str = "antmaze-medium-play-v0"  # OpenAI gym environment name
    seed: int = 0  # Sets Gym, PyTorch and Numpy seeds
    eval_freq: int = int(1e4)  # How often (time steps) we evaluate
    n_episodes: int = 100  # How many episodes run during evaluation
    max_timesteps: int = int(1e6)  # Max time steps to run environment
    checkpoints_path: str = "./models/iql"  # Save path
    load_model: str = ""  # Model load file name, "" doesn't load
    # IQL
    buffer_size: int = 10_000_000  # Replay buffer size
    batch_size: int = 256  # Batch size for all networks
    discount: float = 0.99  # Discount factor
    tau: float = 0.005  # Target network update rate
    beta: float = 10.0  # Inverse temperature. Small beta -> BC, big beta -> maximizing Q
    iql_tau: float = 0.9  # Coefficient for asymmetric loss
    iql_deterministic: bool = False  # Use deterministic actor
    normalize: bool = True  # Normalize states
    normalize_reward: bool = False  # Normalize reward
    # Wandb logging
    project: str = "CORL-default"
    group: str = "IQL-D4RL"
    name: str = "IQL"
And the results are as follows:
% python iql.py
objc[33597]: Class GLFWApplicationDelegate is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13778) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc7e8). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWWindowDelegate is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13700) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc810). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWContentView is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa137a0) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc860). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWWindow is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13818) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc8d8). One of the two will be used. Which one is undefined.
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
pybullet build time: Oct 16 2022 01:59:14
/opt/anaconda3/envs/iql2/lib/python3.10/site-packages/gym/envs/registration.py:505: UserWarning: WARN: The environment antmaze-medium-play-v0 is out of date. You should consider upgrading to version `v2` with the environment ID `antmaze-medium-play-v2`.
logger.warn(
/Users/xxx/Documents/project_offlineexploration/D4RL_6330b4e09e36a80f4b706a3885d59d97745c05a9/d4rl/locomotion/ant.py:180: UserWarning: This environment is deprecated. Please use the most recent version of this environment.
offline_env.OfflineEnv.__init__(self, **kwargs)
Target Goal: (20.64647417679362, 21.089515421327548)
/opt/anaconda3/envs/iql2/lib/python3.10/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
load datafile: 100%|██████████| 8/8 [00:03<00:00, 2.14it/s]
Dataset size: 999092
Checkpoints path: ./models/iql
---------------------------------------
Training IQL, Env: antmaze-medium-play-v0, Seed: 0
---------------------------------------
wandb: Currently logged in as: lxu. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.13.4 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.21
wandb: Run data is saved locally in /Users/xxx/Documents/default_repo/CORL/algorithms/wandb/run-20221019_133015-2d1a2d9d-8f35-4295-bac7-e39fa293699c
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run IQL
wandb: ⭐️ View project at https://wandb.ai/xxx/CORL-default
wandb: 🚀 View run at https://wandb.ai/xxx/CORL-default/runs/2d1a2d9d-8f35-4295-bac7-e39fa293699c
wandb: WARNING Calling wandb.run.save without any arguments is deprecated.Changes to attributes are automatically persisted.
Issue Analytics
- State:
- Created: a year ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments

My problem is solved by setting normalize_reward to True.

Thanks for your report. Antmaze configs are fixed now: https://github.com/tinkoff-ai/CORL/pull/8
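For anyone hitting the same symptom: with normalize_reward=True, the sparse antmaze rewards get shifted before training, which is the standard IQL trick for these tasks. A rough sketch of the idea (the function name and exact form are my paraphrase; see the linked PR for the actual change):

```python
import numpy as np

def shift_antmaze_rewards(rewards: np.ndarray, env_name: str) -> np.ndarray:
    # Antmaze rewards are sparse 0/1; subtracting 1 turns them into -1/0,
    # so every non-goal step carries a negative signal instead of zero.
    # Other locomotion tasks are left unchanged in this sketch.
    if "antmaze" in env_name:
        return rewards - 1.0
    return rewards
```

For example, shift_antmaze_rewards(np.array([0.0, 1.0]), "antmaze-medium-play-v0") yields array([-1., 0.]).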