Gym Retro Stable Baselines ACKTR: Cannot parse tensor from proto: dtype: DT_FLOAT [BUG]
Describe the bug
Using Gym Retro with ACKTR causes a TensorFlow error.
The error is raised by this line:
model = ACKTR(MlpPolicy,env,verbose=1)
Error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot parse tensor from proto: dtype: DT_FLOAT
tensor_shape {
  dim {
    size: 215040
  }
  dim {
    size: 215040
  }
}
float_val: 0
    [[{{node kfac/mul}}]]
During handling of the above exception, another exception occurred:
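For context on where that shape comes from, here is a minimal sketch (not part of the original report) of the arithmetic, assuming the default gym-retro Genesis frame of roughly 224 x 320 x 3. MlpPolicy flattens each observation into one long vector, and ACKTR's K-FAC optimizer then builds a factor matrix with one row and one column per input feature, which matches the 215040 x 215040 tensor in the error.

# Rough arithmetic behind the error shape (assumes the default 224x320 RGB
# observation that gym-retro returns for Genesis games; not taken from the issue).
height, width, channels = 224, 320, 3
flat_features = height * width * channels
print(flat_features)                  # 215040 -> length of MlpPolicy's flattened input
print(flat_features * flat_features)  # ~4.6e10 entries in a 215040x215040 K-FAC factor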
Error Reproduction:
- Install Anaconda
- Create a conda environment for Stable Baselines
- Install TensorFlow 1.14.0
- Install Stable Baselines
- Install Gym Retro
- Run this code:
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 5 08:59:03 2020
@author: MasterTrader
"""
import gym
import retro
import numpy as np
from stable_baselines import PPO2, A2C, ACKTR
from stable_baselines.common.policies import MlpPolicy, MlpLstmPolicy, MlpLnLstmPolicy, CnnLnLstmPolicy, CnnPolicy, CnnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv, DummyVecEnv
from stable_baselines.common import make_vec_env


class Discretizer(gym.ActionWrapper):
    """
    Wrap a gym-retro environment and make it use discrete
    actions for the Sonic game.
    """

    def __init__(self, env):
        super(Discretizer, self).__init__(env)
        buttons = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]
        actions = [['LEFT'], ['RIGHT'], ['LEFT', 'DOWN'], ['RIGHT', 'DOWN'], ['DOWN'],
                   ['DOWN', 'B'], ['B']]
        self._actions = []
        # For each action in actions:
        #   - create an array of 12 False entries (12 = number of buttons),
        #   - set the index of every button used by that action to True.
        # The result is one boolean array per action, where the True entries
        # are the buttons pressed.
        for action in actions:
            arr = np.array([False] * 12)
            for button in action:
                arr[buttons.index(button)] = True
            self._actions.append(arr)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, a):  # pylint: disable=W0221
        return self._actions[a].copy()


def main():
    print("Nasa Main")
    n_cpu = 10
    env = SubprocVecEnv([lambda: Discretizer(retro.make(game='Airstriker-Genesis')) for i in range(n_cpu)])
    model = ACKTR(MlpPolicy, env, verbose=1)
    print("model", model)
    model.learn(total_timesteps=100000)
    obs = env.reset()
    while True:
        action, _states = model.predict(obs)
        print(action)
        obs, reward, dones, info = env.step(action)
        env.render()


if __name__ == "__main__":
    main()
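As a quick check (an addition, not part of the original report), the environment's observation space can be inspected before choosing a policy; a 3D (height, width, channels) shape indicates an image observation, for which CnnPolicy is the usual choice in Stable Baselines.

# Assumption-labelled check (not in the original report): a 3D observation shape
# means the environment returns images rather than a flat feature vector.
import retro

env = retro.make(game='Airstriker-Genesis')
print(env.observation_space.shape)  # expected to be roughly (224, 320, 3) for Genesis games
env.close()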
Log File: ACKTRLog.txt
System Info
Describe the characteristics of your environment:
- How the library was installed (pip, docker, source, …): pip. TensorFlow was installed using Anaconda Navigator.
- GPU models and configuration: CPU only (Ryzen 2700X)
- Python version: 3.7.6, from Anaconda Navigator (1.9.12)
- TensorFlow version: 1.14.0
- Versions of any other relevant libraries: see the attached conda list (codnalist.txt)
Additional context: I am using Windows 10, 64-bit.
Top GitHub Comments

The issue lies in the policy you selected: MlpPolicy treats the input as a 1D vector, which in this case is huge, because the environment returns images. Changing to CnnPolicy lets the code run, but it easily runs out of memory (the images are still big, so you need resizing and so on). I also recommend trying out simpler algorithms first (A2C, PPO); ACKTR and ACER are more complex and slower to run.

Thank you, I will look into that. Maybe in the future model.learn(render=yes) could be added as a parameter?
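To make that suggestion concrete, here is a minimal sketch (not from the issue thread) of what the fix could look like. It assumes the WarpFrame wrapper from stable_baselines.common.atari_wrappers is available to resize frames to 84 x 84 grayscale, reuses the Discretizer wrapper defined in the report, and starts with A2C as the simpler algorithm.

# Hedged sketch of the suggested fix: CnnPolicy plus frame resizing, and a simpler
# algorithm (A2C) first. WarpFrame (84x84 grayscale resize) is assumed to be
# importable from stable_baselines.common.atari_wrappers; adjust if it is not.
import retro
from stable_baselines import A2C
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.atari_wrappers import WarpFrame
from stable_baselines.common.vec_env import DummyVecEnv

def make_env():
    env = retro.make(game='Airstriker-Genesis')
    env = Discretizer(env)   # the action wrapper defined in the issue code above
    env = WarpFrame(env)     # shrink raw ~224x320x3 frames to 84x84x1 for the CNN
    return env

env = DummyVecEnv([make_env])
model = A2C(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=100000)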