Issues on getting antmaze-medium-play-v0 results with iql
See original GitHub issue

Hi there,

Thank you for releasing the CORL benchmark. I cloned the latest repo and ran the antmaze-medium-play-v0 experiment with the parameters below. However, I get a near-0 normalized reward for the first 430,000 gradient steps.
I did not change the code except for these parameters:
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Experiment
    device: str = "cpu"
    env: str = "antmaze-medium-play-v0"  # OpenAI gym environment name
    seed: int = 0  # Sets Gym, PyTorch and Numpy seeds
    eval_freq: int = int(1e4)  # How often (time steps) we evaluate
    n_episodes: int = 100  # How many episodes run during evaluation
    max_timesteps: int = int(1e6)  # Max time steps to run environment
    checkpoints_path: str = "./models/iql"  # Save path
    load_model: str = ""  # Model load file name, "" doesn't load
    # IQL
    buffer_size: int = 10_000_000  # Replay buffer size
    batch_size: int = 256  # Batch size for all networks
    discount: float = 0.99  # Discount factor
    tau: float = 0.005  # Target network update rate
    beta: float = 10.0  # Inverse temperature. Small beta -> BC, big beta -> maximizing Q
    iql_tau: float = 0.9  # Coefficient for asymmetric loss
    iql_deterministic: bool = False  # Use deterministic actor
    normalize: bool = True  # Normalize states
    normalize_reward: bool = False  # Normalize reward
    # Wandb logging
    project: str = "CORL-default"
    group: str = "IQL-D4RL"
    name: str = "IQL"
And the results are as follows:
% python iql.py
objc[33597]: Class GLFWApplicationDelegate is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13778) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc7e8). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWWindowDelegate is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13700) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc810). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWContentView is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa137a0) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc860). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWWindow is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13818) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc8d8). One of the two will be used. Which one is undefined.
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
pybullet build time: Oct 16 2022 01:59:14
/opt/anaconda3/envs/iql2/lib/python3.10/site-packages/gym/envs/registration.py:505: UserWarning: WARN: The environment antmaze-medium-play-v0 is out of date. You should consider upgrading to version `v2` with the environment ID `antmaze-medium-play-v2`.
logger.warn(
/Users/xxx/Documents/project_offlineexploration/D4RL_6330b4e09e36a80f4b706a3885d59d97745c05a9/d4rl/locomotion/ant.py:180: UserWarning: This environment is deprecated. Please use the most recent version of this environment.
offline_env.OfflineEnv.__init__(self, **kwargs)
Target Goal: (20.64647417679362, 21.089515421327548)
/opt/anaconda3/envs/iql2/lib/python3.10/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
load datafile: 100%|██████████| 8/8 [00:03<00:00, 2.14it/s]
Dataset size: 999092
Checkpoints path: ./models/iql
---------------------------------------
Training IQL, Env: antmaze-medium-play-v0, Seed: 0
---------------------------------------
wandb: Currently logged in as: lxu. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.13.4 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.21
wandb: Run data is saved locally in /Users/xxx/Documents/default_repo/CORL/algorithms/wandb/run-20221019_133015-2d1a2d9d-8f35-4295-bac7-e39fa293699c
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run IQL
wandb: ⭐️ View project at https://wandb.ai/xxx/CORL-default
wandb: 🚀 View run at https://wandb.ai/xxx/CORL-default/runs/2d1a2d9d-8f35-4295-bac7-e39fa293699c
wandb: WARNING Calling wandb.run.save without any arguments is deprecated.Changes to attributes are automatically persisted.
Issue Analytics
- State:
- Created: a year ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments

My problem is solved by setting normalize_reward to True.

Thanks for your report. Antmaze configs are fixed now: https://github.com/tinkoff-ai/CORL/pull/8
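For anyone hitting the same symptom: with normalize_reward=True, the sparse antmaze rewards get shifted before training, which is the standard IQL trick for these tasks. A rough sketch of the idea (the function name and exact form are my paraphrase; see the linked PR for the actual change):

```python
import numpy as np

def shift_antmaze_rewards(rewards: np.ndarray, env_name: str) -> np.ndarray:
    # Antmaze rewards are sparse 0/1; subtracting 1 turns them into -1/0,
    # so every non-goal step carries a negative signal instead of zero.
    # Other locomotion tasks are left unchanged in this sketch.
    if "antmaze" in env_name:
        return rewards - 1.0
    return rewards
```

For example, shift_antmaze_rewards(np.array([0.0, 1.0]), "antmaze-medium-play-v0") yields array([-1., 0.]).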