neptune.ai logger console error: X-coordinates must be strictly increasing
🐛 Bug
When using the neptune.ai logger, an epoch channel gets logged automatically, even though I never explicitly told it to do so, and it is logged every training step rather than once per epoch. I also get the following error, although everything else seems to be logged correctly:
WARNING:neptune.internal.channels.channels_values_sender:Failed to send channel value: Received batch errors sending channels' values to experiment BAC-32. Cause: Error(code=400, message='X-coordinates must be strictly increasing for channel: 2262414e-e5fc-4a8f-b3f8-4a8d84d7a5e2. Invalid point: InputChannelValue(2020-03-18T17:18:04.233Z,Some(164062.27599999998),None,Some(Epoch 35: 99%|#######', type=None) (metricId: '2262414e-e5fc-4a8f-b3f8-4a8d84d7a5e2', x: 164062.27599999998) Skipping 3 values.
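For context, the server-side constraint behind that 400 error is that every point sent to a Neptune channel must have a strictly larger x-coordinate than the previous point on that channel. Below is a minimal sketch of how this surfaces, assuming the legacy neptune-client API in use at the time (neptune.init / create_experiment); the project name and channel name are placeholders:

    import neptune

    neptune.init(project_qualified_name='my-workspace/sandbox')  # placeholder project
    exp = neptune.create_experiment(name='x-monotonicity-demo')

    exp.log_metric('demo/loss', x=1, y=0.9)  # accepted
    exp.log_metric('demo/loss', x=2, y=0.7)  # accepted: x increased
    exp.log_metric('demo/loss', x=2, y=0.5)  # x did not increase: the background sender
                                             # later emits the same "X-coordinates must be
                                             # strictly increasing" warning and skips the point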
To Reproduce
Steps to reproduce the behavior:
Write a LightningModule and train it with the neptune.ai logger; see my code below.
Code sample
from argparse import ArgumentParser

import numpy as np
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
from pytorch_lightning.loggers import NeptuneLogger
from torch.utils.data import DataLoader

# `dg` (the dataset generators) and `RecurrentModel` are project-local modules
# that are not part of this report.


class MemoryTest(pl.LightningModule):
    # Main Testing Unit for Experiments on Recurrent Cells
    def __init__(self, hp):
        super(MemoryTest, self).__init__()
        self.predict_col = hp.predict_col
        self.n_datasamples = hp.n_datasamples
        self.dataset = hp.dataset
        if self.dataset == 'rand':  # `==`, not `is`: string identity is not guaranteed
            self.seq_len = None
        else:
            self.seq_len = hp.seq_len
        self.hparams = hp
        self.learning_rate = hp.learning_rate
        self.training_losses = []
        self.final_loss = None
        self.model = RecurrentModel(1, hp.n_cells, hp.n_layers, celltype=hp.celltype)

    def forward(self, input, input_len):
        return self.model(input, input_len)

    def training_step(self, batch, batch_idx):
        x, y, input_len = batch
        features_y = self.forward(x, input_len)
        loss = F.mse_loss(features_y, y)
        mean_absolute_loss = F.l1_loss(features_y, y)
        self.training_losses.append(mean_absolute_loss.item())
        tensorboard_logs = {'batch/train_loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss}
        return {'loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss, 'log': tensorboard_logs}

    def on_epoch_end(self):
        train_loss_mean = np.mean(self.training_losses)
        self.final_loss = train_loss_mean
        self.logger.experiment.log_metric('epoch/mean_absolute_loss', y=train_loss_mean, x=self.current_epoch)
        self.training_losses = []  # reset for next epoch

    def on_train_end(self):
        self.logger.experiment.log_text('network/final_loss', str(self.final_loss))

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.learning_rate)

    @pl.data_loader
    def train_dataloader(self):
        train_dataset = dg.RandomDataset(self.predict_col, self.n_datasamples)
        if self.dataset == 'rand_fix':
            train_dataset = dg.RandomDatasetFix(self.predict_col, self.n_datasamples, self.seq_len)
        if self.dataset == 'correlated':
            train_dataset = dg.CorrelatedDataset(self.predict_col, self.n_datasamples)
        train_loader = DataLoader(dataset=train_dataset, batch_size=1)
        return train_loader

    @staticmethod
    def add_model_specific_args(parent_parser):
        # MODEL specific
        model_parser = ArgumentParser(parents=[parent_parser])
        model_parser.add_argument('--learning_rate', default=1e-2, type=float)
        model_parser.add_argument('--n_layers', default=1, type=int)
        model_parser.add_argument('--n_cells', default=5, type=int)
        model_parser.add_argument('--celltype', default='LSTM', type=str)
        # training specific (for this model)
        model_parser.add_argument('--epochs', default=500, type=int)
        model_parser.add_argument('--patience', default=200, type=int)
        model_parser.add_argument('--min_delta', default=0.01, type=float)
        # data specific
        model_parser.add_argument('--n_datasamples', default=1000, type=int)
        model_parser.add_argument('--seq_len', default=10, type=int)
        model_parser.add_argument('--dataset', default='rand', type=str)
        model_parser.add_argument('--predict_col', default=2, type=int)
        return model_parser


def main(hparams):
    neptune_logger = NeptuneLogger(
        project_name="dunrar/bachelor-thesis",
        params=vars(hparams),
    )
    early_stopping = EarlyStopping('batch/mean_absolute_loss', min_delta=hparams.min_delta, patience=hparams.patience)
    model = MemoryTest(hparams)
    trainer = pl.Trainer(logger=neptune_logger,
                         gpus=hparams.cuda,  # `--cuda` is assumed to come from the parent parser
                         val_percent_check=0,
                         early_stop_callback=early_stopping,
                         max_epochs=hparams.epochs)
    trainer.fit(model)
Expected behavior
Epochs should not be logged without my explicit instruction, and running this code should not produce an error.
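Note that the rejected point in the warning carries a tqdm progress-bar fragment ("Epoch 35: 99%|#######"), which suggests the failing channel is Neptune's capture of stdout rather than one of the metrics logged by hand. One mitigation worth trying, sketched under the assumption that the 0.7.x-era Trainer still accepts `show_progress_bar`, is to disable the progress bar so the stdout capture never sees tqdm's carriage-return updates; this is a workaround to experiment with, not a confirmed fix:

    trainer = pl.Trainer(logger=neptune_logger,
                         gpus=hparams.cuda,
                         val_percent_check=0,
                         early_stop_callback=early_stopping,
                         max_epochs=hparams.epochs,
                         show_progress_bar=False)  # hypothetical mitigation: keep tqdm output
                                                   # out of Neptune's captured stdout channel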
Issue Analytics
- Created: 4 years ago
- Comments: 45 (27 by maintainers)
Okay, I’ll compare, make a few updates/downgrades and report back
Sorry for the trouble @Dunrar 😃 We will work on a better Windows experience.
I will post updates about this here once I have them.