[Question] VecEnv GPU optimizations
See original GitHub issue
Question
Are the vector envs in stable-baselines3 GPU-optimizable? I note that a model's parameters can be loaded into GPU memory via the device
attribute. However, during training, the tensors passed between the policy and the env undergo GPU <-> CPU as well as PyTorch <-> NumPy conversions.
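For reference, the device argument is the standard way of placing the policy on the GPU in stable-baselines3; the snippet below is just a minimal illustration of that usage (the env itself still exchanges NumPy arrays on the CPU):
```python
from stable_baselines3 import PPO

# The policy parameters live on the GPU thanks to the `device` argument...
model = PPO("MlpPolicy", "CartPole-v1", device="cuda")
# ...but every env.step() during learn() still crosses the
# GPU <-> CPU and torch <-> NumPy boundary.
model.learn(total_timesteps=1_000)
```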
Additional context
For example in OnPolicyAlgorithm.collect_rollouts():
```python
with th.no_grad():
    # Convert to pytorch tensor
    obs_tensor = th.as_tensor(self._last_obs).to(self.device)
    actions, values, log_probs = self.policy.forward(obs_tensor)
actions = actions.cpu().numpy()  # <--

# Rescale and perform action
clipped_actions = actions
# Clip the actions to avoid out of bound error
if isinstance(self.action_space, gym.spaces.Box):
    clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)

new_obs, rewards, dones, infos = env.step(clipped_actions)
```
the actions tensor is moved off the GPU and converted to a NumPy array. It seems that if there were a VecEnv
that supported tensors, this step could be forgone and the data could stay on the CUDA device, unless I am misinterpreting something.
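To make the idea concrete, here is a minimal, hypothetical sketch (not part of stable-baselines3) of a VecEnv-like class whose step() and reset() exchange torch tensors, so the rollout never calls .cpu().numpy(); the class name, placeholder dynamics, and the toy policy are illustrative assumptions:
```python
import torch as th


class TensorVecEnv:
    """Hypothetical batched env whose state lives entirely on one device."""

    def __init__(self, num_envs: int, obs_dim: int, device: str = "cuda"):
        self.num_envs = num_envs
        self.device = th.device(device)
        self.obs = th.zeros(num_envs, obs_dim, device=self.device)

    def reset(self) -> th.Tensor:
        self.obs.zero_()
        return self.obs

    def step(self, actions: th.Tensor):
        # Placeholder dynamics: a real env would apply batched tensor ops here.
        self.obs = self.obs + 0.01 * actions
        rewards = th.ones(self.num_envs, device=self.device)
        dones = th.zeros(self.num_envs, dtype=th.bool, device=self.device)
        return self.obs, rewards, dones, [{} for _ in range(self.num_envs)]


if th.cuda.is_available():
    env = TensorVecEnv(num_envs=64, obs_dim=4)
    policy = th.nn.Linear(4, 4).to(env.device)  # stand-in for an SB3 policy
    obs = env.reset()
    with th.no_grad():
        actions = policy(obs)
    # th.clamp replaces np.clip, so the actions never leave the GPU
    clipped_actions = th.clamp(actions, -1.0, 1.0)
    new_obs, rewards, dones, infos = env.step(clipped_actions)
```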
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Issue Analytics
- State: closed
- Created 3 years ago
- Comments: 9 (1 by maintainers)
Top Results From Across the Web
- The 37 Implementation Details of Proximal Policy Optimization: "According to a GitHub issue, one maintainer suggests ppo2 should offer better GPU utilization by batching observations from multiple simulation ..."
- FAQ — ElegantRL 0.3.1 documentation: "This document contains the most frequently asked questions related ... GPU_ids to None (you cannot use GPU-accelerated VecEnv in this case)."
- Question about Vectorized Environments and GPU/Cuda training: "I have a bunch of questions that I can't seem to find any good answers to, ..."
- Stable Baselines Documentation - Read the Docs: "A best practice when you apply RL to a new problem is to do automatic hyperparameter optimization. Again, this is included in the ..."
- VectorEnv API — Ray 2.2.0: "rllib.env.vector_env.VectorEnv# · make_env – Factory that produces a new gym. · existing_envs – Optional list of already instantiated sub environments. · num_envs ..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Wow! Thanks all for the discussion. Seems like I’ve just uncovered the tip of the iceberg 🤓
Heeding the advice, I decided to publish my code to a fork here for any curious future readers. No doubt I will be continuing the discussion in other places, but will close the issue from here for now 👍
Thanks for the pointers! I managed to get some benchmarks. I trained the PPO model on a custom VecEnv version of CartPole that vectorizes the step() and reset() methods, which eliminated the loop in VecEnv.step_wait(). Then, as you mentioned, there were some modifications to the rollout buffers to store tensors and compute the advantage, as well as numerous conversions from numpy to torch operations throughout the codebase to support it. I timed the execution of PPO.learn() with a high number of environments across a few different batch sizes (all runs were done on an NVIDIA TITAN RTX). The speed-up is a nice result, but these particular hyperparameters may be uncommon (particularly for CartPole). With a smaller number of parallel environments the optimization is not as profound, but still quicker. The average reward of the PPO was roughly the same.
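As a rough illustration only (not the commenter's actual fork), a batched CartPole step() written with torch tensors might look like the sketch below; the constants mirror the classic Gym CartPole dynamics, while the function name and the reset-on-done behaviour are assumptions:
```python
import math

import torch as th

# Classic CartPole constants (as in Gym's cartpole.py)
GRAVITY, MASSCART, MASSPOLE, HALF_LENGTH = 9.8, 1.0, 0.1, 0.5
TOTAL_MASS = MASSCART + MASSPOLE
POLEMASS_LENGTH = MASSPOLE * HALF_LENGTH
FORCE_MAG, TAU = 10.0, 0.02
X_THRESHOLD, THETA_THRESHOLD = 2.4, 12 * math.pi / 180


def batched_cartpole_step(state: th.Tensor, action: th.Tensor):
    """Steps every environment at once; `state` is (num_envs, 4) on the GPU."""
    x, x_dot, theta, theta_dot = state.unbind(dim=1)
    force = FORCE_MAG * (2.0 * action.float() - 1.0)  # discrete action in {0, 1}
    costheta, sintheta = th.cos(theta), th.sin(theta)
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sintheta) / TOTAL_MASS
    thetaacc = (GRAVITY * sintheta - costheta * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASSPOLE * costheta ** 2 / TOTAL_MASS)
    )
    xacc = temp - POLEMASS_LENGTH * thetaacc * costheta / TOTAL_MASS
    # Euler integration, entirely on-device
    x = x + TAU * x_dot
    x_dot = x_dot + TAU * xacc
    theta = theta + TAU * theta_dot
    theta_dot = theta_dot + TAU * thetaacc
    new_state = th.stack([x, x_dot, theta, theta_dot], dim=1)
    done = (x.abs() > X_THRESHOLD) | (theta.abs() > THETA_THRESHOLD)
    reward = th.ones_like(x)
    # Reset finished environments without a Python loop
    new_state = th.where(done.unsqueeze(1), th.zeros_like(new_state), new_state)
    return new_state, reward, done


if th.cuda.is_available():
    states = th.zeros(4096, 4, device="cuda")
    actions = th.randint(0, 2, (4096,), device="cuda")
    states, rewards, dones = batched_cartpole_step(states, actions)
```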
The majority of the work is in building the tensor version of the environment; like you mentioned, stable-baselines may not be the place for it. But it seems like it would speed up policy development.