
Running out of memory during training

See original GitHub issue

When training with a custom eval metric (Pearson correlation), my Colab session runs out of memory after the first evaluation.

What is the current behavior? Training of TabNetRegressor starts fine, but after the first evaluation round I run out of memory. I am training the model on a 16 GB GPU with roughly 40 GB of free RAM. RAM consumption increases steadily during training. I am training on a fairly large dataset (11 GB).

Expected behavior

I would expect RAM consumption to stay more or less constant during training once the model is initialized.

Screenshots

import numpy as np
import torch
from pytorch_tabnet.tab_model import TabNetRegressor
from pytorch_tabnet.metrics import Metric


def corr_score(y_true, y_pred):
    return "score", np.corrcoef(y_true, y_pred)[0, 1], True


class PearsonCorrMetric(Metric):
    def __init__(self):
        self._name = "pearson_corr"
        self._maximize = True

    def __call__(self, y_true, y_score):
        return corr_score(y_true, y_score)[1]


max_epochs = 2
batch_size = 1028

# factors_train / factors_test: pandas DataFrames with the feature columns and a 'target' column (defined elsewhere)
model = TabNetRegressor(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=1e-2),
)

model.fit(
    X_train=factors_train[features].to_numpy(),
    y_train=factors_train.target.to_numpy().reshape((-1, 1)),
    eval_set=[(factors_test[features].to_numpy(),
               factors_test.target.to_numpy().reshape((-1, 1)))],
    eval_name=['test'],
    eval_metric=[PearsonCorrMetric],
    max_epochs=max_epochs,
    patience=5,
    batch_size=batch_size,
    virtual_batch_size=128,
    num_workers=0,
    drop_last=False,
)

Other relevant information:
  • poetry version: ?
  • Python version: 3.8
  • Operating System: Ubuntu
  • Additional tools:

Additional context

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    24W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 13

Top GitHub Comments

1 reaction
Kayne88 commented, Sep 6, 2022

TRAIN (1914562, 1214) - TEST (476390, 1214) RMSE actually works 😃
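
For reference, a minimal sketch of what the fit call might look like with the library's built-in RMSE metric ('rmse' is a built-in pytorch-tabnet metric name) in place of the custom class:

model.fit(
    X_train=factors_train[features].to_numpy(),
    y_train=factors_train.target.to_numpy().reshape((-1, 1)),
    eval_set=[(factors_test[features].to_numpy(),
               factors_test.target.to_numpy().reshape((-1, 1)))],
    eval_name=['test'],
    eval_metric=['rmse'],  # built-in metric name instead of the custom Metric class
    max_epochs=max_epochs,
    patience=5,
    batch_size=batch_size,
    virtual_batch_size=128,
    num_workers=0,
    drop_last=False,
)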

0 reactions
Optimox commented, Sep 12, 2022

@Kayne88 thank you very much for sharing your results.

The model learns to pay attention to specific features in order to minimize the loss function. Some features might end up masked out if they correlate too strongly with a better feature; however, you have no guarantee that this will be the case. You could simply remove those features before training.

However, you can play with hyperparameters to get closer to what you want:

  • lambda_sparse: the bigger this is, the sparser your masks will be, so setting it to a value > 0 might keep the model from looking at two correlated features.
  • gamma: a large gamma (I'd recommend staying between 1 and 5 at most) prevents the model from reusing the same features at different steps, so if you don't want weakly correlated features to be used by the model you can set a high gamma.
  • n_steps: the more steps, the more features your model will be able to pick at some point.

None of these recommendations is guaranteed to work; this is just my general understanding, so you should experiment with them and see how it goes.

Good luck!
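
These three knobs correspond directly to TabNetRegressor constructor arguments. A minimal sketch with purely illustrative values (assumptions, not recommendations from the thread):

from pytorch_tabnet.tab_model import TabNetRegressor

# Illustrative values only -- tune per dataset.
model = TabNetRegressor(
    n_steps=5,           # more steps: more features can be attended to overall
    gamma=1.5,           # values > 1 discourage reusing the same feature across steps
    lambda_sparse=1e-3,  # larger values push the attention masks to be sparser
)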

Read more comments on GitHub >

Top Results From Across the Web

  • Running out of memory while training machine learning model
    You could do a few things: (1) reduce the size of training set by randomly selecting rows, assuming you have a randomly selected...
  • Out of memory error during evaluation but training works fine!
    Surprisingly my old programs are throwing an out of memory error during evaluation (in eval() mode) but training works just fine.
  • Running out of memory during training #40 - GitHub
    The memory issue might be due to the multi-scale training. We change the input resolution during training randomly and with the increasing ...
  • Out of memory error when using validation while training a ...
    This issue is not a result of the increased training set size. One workaround is to train by splitting the training set into...
  • Cuda out of memory during evaluation but training is fine
    By default the Trainer accumulated all predictions on the host before sending them to the CPU (because it's faster) but if you run...
