DictReplayBuffer: TypeError: 'dict' object cannot be interpreted as an integer
I am using DDPG with HER on a custom env. The error happened in `DictReplayBuffer`, where `self.obs_shape.items()` is `dict_items([('achieved_goal', (3,)), ('desired_goal', (3,)), ('observation', (64, 64, 4))])` (also printed in the traceback below):
```python
self.observations = {
    key: np.zeros((self.buffer_size, self.n_envs) + _obs_shape, dtype=observation_space[key].dtype)
    for key, _obs_shape in self.obs_shape.items()
}
```
I printed `self.obs_shape.items()` as above at the point where the error happened. I also tried to reproduce the dict comprehension separately with the same parameters, and it worked.
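For what it's worth, `np.zeros` raises exactly this message whenever an element of its shape tuple is a dict instead of an int, which points at `self.buffer_size` (or `self.n_envs`) not being an integer here rather than at the observation shapes. A minimal sketch of the failure mode (the dict value is made up for illustration):

```python
import numpy as np

# Every entry of the shape tuple must be an integer; a dict in the
# buffer_size slot reproduces the reported error verbatim.
np.zeros(({'n_sampled_goal': 4}, 1) + (3,))
# TypeError: 'dict' object cannot be interpreted as an integer
```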
Observation space:
```python
@classmethod
def getObservationSpace(cls):
    """Return observation_space for gym Env class."""
    return gym.spaces.Dict(dict(
        desired_goal=gym.spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
        achieved_goal=gym.spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
        observation=gym.spaces.Box(low=0.0, high=1.0, shape=(Robot.CAMERA_PIXEL_HEIGHT, Robot.CAMERA_PIXEL_WIDTH, 4), dtype=np.float32),
    ))
```
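As a sanity check (a standalone sketch; 64 × 64 stands in for `Robot.CAMERA_PIXEL_HEIGHT`/`Robot.CAMERA_PIXEL_WIDTH`, whose values are not shown in the issue), the space itself reports exactly the shapes printed before the traceback, so the space definition looks well-formed:

```python
import gym
import numpy as np

space = gym.spaces.Dict(dict(
    desired_goal=gym.spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    achieved_goal=gym.spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    observation=gym.spaces.Box(low=0.0, high=1.0, shape=(64, 64, 4), dtype=np.float32),
))
print({key: subspace.shape for key, subspace in space.spaces.items()})
# {'achieved_goal': (3,), 'desired_goal': (3,), 'observation': (64, 64, 4)}
```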
Reset and step:

```python
def reset(self):
    """Reset environment."""
    self.steps = 0
    self.episode_rewards = 0
    p.resetSimulation()
    # p.setTimeStep(1.0 / 240.0)
    p.setGravity(0, 0, self.GRAVITY)
    self.plane_id = p.loadURDF('plane.urdf')
    self.robot = self.robot_model()
    self.object_ids = []
    for i, (pos, orn) in enumerate(self._generateObjectPositions(num=(self.num_foods + self.num_fakes), radius_scale=self.object_radius_scale, radius_offset=self.object_radius_offset, angle_scale=self.object_angle_scale)):
        if i < self.num_foods:
            urdfPath = 'food_sphere.urdf'
            self.goal = pos
        else:
            urdfPath = 'food_cube.urdf'
        object_id = p.loadURDF(urdfPath, pos, orn, globalScaling=self.object_size)
        self.object_ids.append(object_id)
    for i in range(self.BULLET_STEPS):
        p.stepSimulation()
    obs = self._getObservation()
    # self.robot.printAllJointInfo()
    return obs

def step(self, action):
    """Apply action to environment, then return observation and reward."""
    self.steps += 1
    self.robot.setAction(action)
    # reward = -1.0 * float(self.num_foods) / float(self.max_steps)  # so agent needs to eat foods quickly
    reward = 0
    obs = self._getObservation()  # get obs first
    done = False
    info = {
        'is_success': self.issuccess(obs['achieved_goal'], self.goal),
    }
    for i in range(self.BULLET_STEPS):
        p.stepSimulation()
    reward += self.compute_reward(obs['achieved_goal'], self.goal, info)
    self.episode_rewards += reward
    return obs, reward, done, info  # return the standard gym 4-tuple expected by stable-baselines3
```
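`compute_reward` and `issuccess` are not shown. For HER, the former must accept `(achieved_goal, desired_goal, info)` and, because goals get relabeled in batches, should also work on arrays of goals. A typical sparse version, as a sketch only (the `distance_threshold` attribute is an assumption, not taken from the issue):

```python
def compute_reward(self, achieved_goal, desired_goal, info):
    # HER relabels goals in batches, so handle single goals and arrays alike.
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    # Sparse reward: 0.0 within the threshold, -1.0 otherwise.
    return -(distance > self.distance_threshold).astype(np.float32)
```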
Code for running
```python
import argparse
import gym
import gym_foodhunting
import numpy as np
import time
import tensorflow
import torch

from stable_baselines3 import DDPG, DQN, SAC, TD3
from stable_baselines3.her.her_replay_buffer import HerReplayBuffer
from stable_baselines3.her.goal_selection_strategy import GoalSelectionStrategy
from stable_baselines3.td3.policies import CnnPolicy, MlpPolicy, MultiInputPolicy
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_checker import check_env

model_class = DDPG

def learn(env_name, save_file, total_timesteps):
    # env = make_robotics_env(env_name, seed=0)
    env = gym.make(env_name)
    check_env(env)
    # env = HERGoalEnvWrapper(env)
    goal_selection_strategy = 'future'
    online_sampling = True
    replay_buffer_class = HerReplayBuffer
    replay_buffer_kwargs = dict(
        n_sampled_goal=4,
        goal_selection_strategy=goal_selection_strategy,
        online_sampling=online_sampling,
        max_episode_length=80,
    )
    model = model_class(
        'MultiInputPolicy',
        env,
        replay_buffer_class,
        # Parameters for HER
        replay_buffer_kwargs,
        verbose=1,
    )
    # env = SubprocVecEnv([make_env(env_name, i, seed=0) for i in range(8)])
    model.learn(total_timesteps=total_timesteps)
    model.save(save_file)
    del model
    env.close()
```
Traceback:
```
pybullet build time: Jul 4 2021 01:27:40
2021-07-04 01:39:22.108965: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/usr/local/lib/python3.7/dist-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/content/gdrive/My Drive/hsr/stable-baselines3/stable_baselines3/common/env_checker.py:27: UserWarning: It seems that your observation is an image but the `dtype` of your observation_space is not `np.uint8`. If your observation is not an image, we recommend you to flatten the observation to have only a 1D vector
  f"It seems that your observation {key} is an image but the `dtype` "
/content/gdrive/My Drive/hsr/stable-baselines3/stable_baselines3/common/env_checker.py:35: UserWarning: It seems that your observation space is an image but the upper and lower bounds are not in [0, 255]. Because the CNN policy normalize automatically the observation you may encounter issue if the values are not in that range.
  f"It seems that your observation space {key} is an image but the "
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
dict_items([('achieved_goal', (3,)), ('desired_goal', (3,)), ('observation', (64, 64, 4))])
Traceback (most recent call last):
  File "examples/example_rl1.py", line 77, in <module>
    learn(args.env_name, args.filename, args.total_timesteps)
  File "examples/example_rl1.py", line 39, in learn
    verbose=1,
  File "/content/gdrive/My Drive/hsr/stable-baselines3/stable_baselines3/ddpg/ddpg.py", line 115, in __init__
    self._setup_model()
  File "/content/gdrive/My Drive/hsr/stable-baselines3/stable_baselines3/td3/td3.py", line 125, in _setup_model
    super(TD3, self)._setup_model()
  File "/content/gdrive/My Drive/hsr/stable-baselines3/stable_baselines3/common/off_policy_algorithm.py", line 218, in _setup_model
    **self.replay_buffer_kwargs,
  File "/content/gdrive/My Drive/hsr/stable-baselines3/stable_baselines3/common/buffers.py", line 510, in __init__
    for key, _obs_shape in self.obs_shape.items()
  File "/content/gdrive/My Drive/hsr/stable-baselines3/stable_baselines3/common/buffers.py", line 510, in <dictcomp>
    for key, _obs_shape in self.obs_shape.items()
TypeError: 'dict' object cannot be interpreted as an integer
```
I made adjustments based on the warnings from check_env and the same error happened, so it seems the error does not come from the custom env itself.
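Reading the traceback against the `learn` function above: `replay_buffer_class` and `replay_buffer_kwargs` are passed to `model_class(...)` positionally, so they land in DDPG's `learning_rate` and `buffer_size` parameters. That would make `self.buffer_size` inside `DictReplayBuffer` the HER kwargs dict, which is exactly the `np.zeros` failure shown above. Passing them as keyword arguments should fix it; a sketch of the corrected call:

```python
model = model_class(
    'MultiInputPolicy',
    env,
    replay_buffer_class=HerReplayBuffer,        # keyword, not positional
    replay_buffer_kwargs=replay_buffer_kwargs,  # otherwise this dict lands in buffer_size
    verbose=1,
)
```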
Top GitHub Comments
@JimLiu1213 Unfortunately we do not offer custom tech support like this, but I think the issue lies in using images with HER (the replay buffer takes a lot of space). You can either upgrade and get more memory, or you could try using a smaller replay buffer size.
Thank you… Also, when running my custom env with DDPG+HER, this showed up. For my env, the space is dict_items([('achieved_goal', (3,)), ('desired_goal', (3,)), ('observation', (64, 64, 4))]), and the observation is an image. I don't think this could take more than the 12 GB of RAM that Colab provides? I ran the multi_input_env with the same code and it worked, but not my env. Could you please give me some hint about the issue, because I have googled a lot without finding any answer. Do I need to upgrade to Colab Pro?
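For context on the maintainer's memory point: with stable-baselines3's default buffer_size of one million transitions, the (64, 64, 4) float32 observations alone need far more than Colab's ~12 GB, even before the buffer also stores next observations. A rough back-of-envelope check:

```python
# Rough memory for the default buffer_size=1_000_000 with a (64, 64, 4) float32 observation
obs_bytes = 64 * 64 * 4 * 4                # 65,536 bytes per stored frame
total_gib = 1_000_000 * obs_bytes / 1024**3
print(f"{total_gib:.0f} GiB per copy")     # ~61 GiB, and next_obs is stored as well
```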