I want to know the difference between two approaches to creating a vectorized environment in Stable Baselines3
I have a custom env made for a single agent; below is a small snippet:
🤖 Custom Gym Environment
```python
def step(self, action):
    k = self.rad_curvature
    u1 = self.insertion_depth
    # un-normalise the action from [-1, 1] to [0, 2*pi]
    u2 = math.pi * (action[0] + 1)
    print("action", u2)
    # current pose: position (x, y, z) and orientation quaternion (q1..q4)
    xi = self.state[0]
    yi = self.state[1]
    zi = self.state[2]
    qi1 = self.state[3]
    qi2 = self.state[4]
    qi3 = self.state[5]
    qi4 = self.state[6]
    # build the homogeneous transform of the current needle pose and
    # propagate it through the bicycle needle model
    curr_needle_mtx = self.action_model.homogenous_matrix(xi, yi, zi, qi1, qi2, qi3, qi4)
    next_needle_mtx = self.action_model.NeedleModelBicycle(curr_needle_mtx, k, u2, u1)
    # extract the new position and orientation (as a quaternion)
    xnew = next_needle_mtx[0, 3]
    ynew = next_needle_mtx[1, 3]
    znew = next_needle_mtx[2, 3]
    rot_mtx = next_needle_mtx[0:3, 0:3]
    ar = R.from_matrix(rot_mtx)
    Q = ar.as_quat()
    self.state_temp[0] = xnew
    self.state_temp[1] = ynew
    self.state_temp[2] = znew
    self.state_temp[3] = Q[0]
    self.state_temp[4] = Q[1]
    self.state_temp[5] = Q[2]
    self.state_temp[6] = Q[3]
    print("tempstate: ", self.state_temp)
    self.new_state = self.state_temp.copy()
    done = False
    self.num_steps += 1
    # compute reward / termination from the candidate state
    self.reward, done, res, self.num_times_tum_reached = state_reward_validator(
        self.img_arr, curr_state=self.state_temp, prev_state=self.state,
        reward=self.reward, num_steps=self.num_steps, steps_to_reset=self.steps_to_reset,
        done=done, goal_point=self.goal_point,
        num_times_tum_reached=self.num_times_tum_reached)
```
### Describe the bug
1) What I want to know is the difference between the two approaches given in Stable Baselines3 for creating a vectorized environment.
a) This approach is given in the Stable Baselines3 examples (link), using `def make_env(rank, seed=0):`
### Code example
```python
from stable_baselines3 import DDPG
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.utils import set_random_seed

def make_env2(rank, seed=0):
    def _init():
        env = NeuroRL4(label_name)
        env.seed(seed + rank)
        return env
    set_random_seed(seed)
    return _init

num_cpu = 4
# Create the vectorized environment (one distinct env instance per rank)
env = DummyVecEnv([make_env2(i) for i in range(num_cpu)])
model = DDPG('MlpPolicy', env, train_freq=1, gradient_steps=-1, verbose=1)
model.learn(total_timesteps=250)
```
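For reference, the same factory pattern from the Stable Baselines3 example also works with `SubprocVecEnv`, which steps each env in its own process, whereas `DummyVecEnv` steps all envs sequentially in the main process. A minimal sketch reusing the names above:

```python
from stable_baselines3.common.vec_env import SubprocVecEnv

# Each factory builds its own NeuroRL4 instance in a separate worker process
env = SubprocVecEnv([make_env2(i) for i in range(num_cpu)])
```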
b) This approach was suggested in my last GitHub issue (link):
```python
from stable_baselines3.common.env_util import make_vec_env

env2 = NeuroRL4(label_name)
env2 = make_vec_env(lambda: env2, n_envs=4)
model = DDPG('MlpPolicy', env2, train_freq=1, gradient_steps=-1, verbose=1)
model.learn(total_timesteps=250)
```
The real difference shows up in method (a): I can clearly see 4 environments being created, because my env prints the start and goal points every time an environment is instantiated, so 4 sets of points were printed. In method (b) I don't see any such output, and it seems to run on only 1 environment.
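A likely explanation (not spelled out in the excerpt above): in method (b), `lambda: env2` closes over the single `NeuroRL4` instance created beforehand, so every worker gets the same object and the constructor (with its start/goal print) runs only once. Passing a factory that builds a new env on each call would give 4 independent instances, as in this sketch:

```python
from stable_baselines3.common.env_util import make_vec_env

# Each call of the lambda constructs a fresh NeuroRL4, so 4 separate envs are created
env2 = make_vec_env(lambda: NeuroRL4(label_name), n_envs=4)
```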
### System Info
My environment consists of a 3D numpy array containing obstacles and a target; my plan is to make my agent, which follows an action model, reach the target.
- I am using Colab
- How the library was installed: `!pip install stable-baselines3[extra]`
- Python: 3.7.14
- Stable-Baselines3: 1.6.1
- PyTorch: 1.12.1+cu113
- GPU Enabled: True
- Numpy: 1.21.6
- Gym: 0.21.0

### Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
- I have checked my env using the env checker (required)
### Top GitHub Comments
In the colab, it is fine as we are creating only one env. But it should be updated to avoid misleading users.
@araffin Thanks a lot, it worked. Also, can we directly add noise (like OU noise) as we did for a single env, or do we now need to add vectorized noise?
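The excerpt does not include an answer, but for illustration, here is a sketch of how OU noise could be wrapped for multiple envs in Stable Baselines3 (assuming a 1-D continuous action space as in the env above; whether explicit wrapping is required may depend on the SB3 version):

```python
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise, VectorizedActionNoise

n_actions = env.action_space.shape[-1]
base_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
# Wrap the single-env OU noise so each of the 4 envs gets its own noise process
action_noise = VectorizedActionNoise(base_noise, n_envs=4)

model = DDPG('MlpPolicy', env, action_noise=action_noise, train_freq=1, gradient_steps=-1, verbose=1)
```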