
Test phase on custom environment


Hi,

I’m experimenting with a custom env in rlpyt. I intend to use different data for training and testing (the env shows novel states during testing/evaluation versus training).

I have been using example_1 as a stepping stone, and so far so good, but I’m not sure how to achieve this last bit: testing.

After running runner.train() (as in example_1, inside the logger context), I think I should use runner.evaluate_agent() inside a loop to evaluate the agent several times (or maybe pass eval_max_steps=NumEvals as a SerialSampler argument?).

But I’m a bit (more) lost on how to “send” the test signal to my environment from here, since only the SerialSampler ‘knows’ where the environment class is, and I cannot find a way to use it to pass arguments to the env in this phase.

In short (I don’t know if I’m being clear), I need to know how:

  1. to test/evaluate my agent, and
  2. to send a “test” argument to my custom env during the testing phase.

Thank you!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
astooke commented, May 21, 2020

Actually, the evaluation is performed using a separate instance of your environment, and it can be instantiated with different kwargs than the environment used for training. See the env_kwargs and eval_env_kwargs passed into the sampler. Sounds like exactly what you need. 😃

https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/examples/example_1.py#L29

Thanks for moving your question over!
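To make that concrete, here is a minimal sketch loosely following example_1. CustomEnv, env_id, the mode kwarg, and the algo/agent variables are placeholders for your own setup, and the eval settings are just example values; env_kwargs, eval_env_kwargs, eval_n_envs, eval_max_steps, and eval_max_trajectories are standard SerialSampler arguments, and MinibatchRlEval (the runner used in example_1) runs the evaluation environments at every logging interval:

from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.runners.minibatch_rl import MinibatchRlEval

sampler = SerialSampler(
    EnvCls=CustomEnv,                              # your environment class
    env_kwargs=dict(id=env_id, mode="train"),      # instances used for training
    eval_env_kwargs=dict(id=env_id, mode="test"),  # separate instances used only for evaluation
    batch_T=1,
    batch_B=1,
    eval_n_envs=5,               # number of evaluation env instances
    eval_max_steps=int(25e3),    # total env steps per evaluation phase
    eval_max_trajectories=50,    # stop early after this many eval episodes
)

runner = MinibatchRlEval(        # evaluates on the eval envs at each log interval
    algo=algo,
    agent=agent,
    sampler=sampler,
    n_steps=1e6,
    log_interval_steps=1e4,
)
runner.train()

With this setup the training environments never see the test configuration: evaluation episodes come only from the instances built with eval_env_kwargs, and the agent is switched to eval_mode() for the duration of each evaluation.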

0 reactions
LecJackS commented, Jun 7, 2020

Just for the record, I ended up solving the last issue like this:

Keep track of the episode number in each environment instance (passing the log interval as a parameter), define a summary writer, and log from the environment on each test call, as well as for the last 50 training runs before each call to test.

from torch.utils.tensorboard import SummaryWriter  # or tensorboardX, depending on your setup
from rlpyt.samplers.serial.sampler import SerialSampler

writer = SummaryWriter()
epi_len = 65               # number of steps per episode
log_int = 500 * epi_len    # log every 500 episodes, expressed in env steps

sampler = SerialSampler(
    EnvCls=CustomEnv,
    env_kwargs=dict(id=env_id, mode="train", writer=writer, logInt=log_int),
    eval_env_kwargs=dict(id=env_id, mode="test", writer=writer, logInt=log_int),
    # ... remaining sampler kwargs (batch_T, batch_B, eval_n_envs, eval_max_steps, ...)
)

And in the environment:

def plot_stats(self, reward):
    # Tensorboard debugging
    if self.mode == 'test':
        # Plot all rewards
        self.writer.add_scalars('data/' + str(self.mode), {'reward': reward},
                                self.episode)
        # Save last numEvals for statistics
        self.rewardHist[self.numLog % self.numEvals] = reward

        if ((self.episode + 1) % self.epiLogInt) == 0:
            # Plot statistics of last self.numEvals
            self.writer.add_scalars('data/' + str(self.mode),
                                    {'mean':   np.mean(self.rewardHist),
                                     'median': np.median(self.rewardHist),
                                     'max':    np.max(self.rewardHist),
                                     'min':    np.min(self.rewardHist)},
                                    self.episode)
            self.writer.add_scalars('data/' + str(self.mode) + "/",
                                    {'std': np.std(self.rewardHist)},
                                    self.episode)
            # Keep same count as training episodes
            self.episode += self.epiLogInt - 1  # self.numEvals
        self.numLog += 1
    else:
        # Want 5 training episodes before the 5 testing episodes
        if self.episode > (self.episode - self.numEvals - 1) \
                and ((self.episode - self.numEvals) % self.epiLogInt) >= 0 \
                and ((self.episode - self.numEvals) % self.epiLogInt) < self.numEvals:
            # Plot all rewards
            # self.writer.add_scalars('data/' + str(self.mode), {'reward': reward},
            #                         self.episode)
            # Save last numEvals for statistics
            self.rewardHist[self.numLog % self.numEvals] = reward
            # Plot statistics of last self.numEvals
            if (self.numLog % self.numEvals) == (self.numEvals - 1):
                self.writer.add_scalars('data/' + str(self.mode),
                                        {'mean':   np.mean(self.rewardHist),
                                         'median': np.median(self.rewardHist),
                                         'max':    np.max(self.rewardHist),
                                         'min':    np.min(self.rewardHist),
                                         'maxAllTime': self.maxRecord},
                                        self.episode)
                self.writer.add_scalars('data/' + str(self.mode) + "/",
                                        {'std': np.std(self.rewardHist)},
                                        self.episode)
            self.numLog += 1

Ugly as f*ck, but it works, so I’m closing this issue 😃
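For completeness, here is a hypothetical sketch of how the environment's constructor might wire up the kwargs the sampler passes in. The attribute names (mode, writer, epiLogInt, numEvals, rewardHist, episode, numLog, maxRecord) are taken from plot_stats above; the constructor signature, default values, and everything else are assumptions, and step()/reset()/space definitions are omitted:

import numpy as np

class CustomEnv:  # in rlpyt this would subclass rlpyt.envs.base.Env (or wrap a gym env)
    def __init__(self, id=None, mode="train", writer=None, logInt=1, numEvals=5):
        self.id = id
        self.mode = mode              # "train" or "test", from env_kwargs / eval_env_kwargs
        self.writer = writer          # shared SummaryWriter instance
        self.epiLogInt = logInt       # logging interval (the post passes it in steps; convert to episodes if needed)
        self.numEvals = numEvals      # window size for the reward statistics
        self.rewardHist = np.zeros(numEvals)
        self.episode = 0              # episode counter, incremented by the env on each reset
        self.numLog = 0               # how many rewards have been logged so far
        self.maxRecord = -np.inf      # best reward seen so far (used as 'maxAllTime')
        # ... load the train or test data for the given mode, define spaces, etc.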


