Multiple synchronization of a running offline job
Hi,
I’m working with Slurm, and my compute nodes do not have access to the internet, so I’m using Neptune’s offline mode.
I was wondering whether it is possible to sync an offline job while it is still running. In my case Neptune writes to a shared disk, so I can sync the .neptune folder from another node that has internet access. I’d like to monitor my training while it runs, to see early on how it performs.
I have tried neptune sync and it seems to work, but only the first time. When I synchronize again, it does not upload any more logs until the program ends.
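For context, this is roughly how I would automate the manual step from the node with internet access. It is only a minimal sketch: the function name sync_periodically and its parameters are made up for illustration, and it simply runs the same neptune sync command from the directory that contains the .neptune folder. It does not work around the behaviour described in this issue, it only repeats the command.

```python
import subprocess
import time
from typing import Callable


def sync_periodically(
    workdir: str,
    interval_s: float,
    max_rounds: int,
    runner: Callable[..., object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `neptune sync` in `workdir` up to `max_rounds` times.

    `workdir` is assumed to be the directory containing the .neptune folder
    (the shared disk, seen from the node with internet access).
    `runner` and `sleep` are injectable so the loop can be tested without
    calling the real CLI or actually waiting.
    """
    rounds = 0
    for i in range(max_rounds):
        # Same command as typed by hand in the second shell.
        runner(["neptune", "sync"], cwd=workdir, check=False)
        rounds += 1
        # Wait between rounds, but not after the last one.
        if i < max_rounds - 1:
            sleep(interval_s)
    return rounds
```

For example, sync_periodically("/path/to/shared/job_dir", 300, 12) would attempt a sync every five minutes for about an hour (the path is a placeholder).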
Reproduction
Here is an example that reproduces what I experienced, on a single computer with internet access:
import neptune.new as neptune

run = neptune.init(mode="offline")

while True:
    x = float(input("New log:"))
    run["inputs"].log(x)
In a shell, let’s run the script and enter a few values:
# Shell 1
$ python my_script.py
New log: 0.5
New log: 1.5
New log: 2.5
After this, let’s synchronize this experiment while keeping the script running:
# Shell 2
$ neptune sync
Offline run run__01ee0d02-dfcc-4424-a0ce-d99cec75733a registered as raphaelreme/MyExp/TRAC-10
Synchronising raphaelreme/MyExp/TRAC-10
Synchronization of run raphaelreme/MyExp/TRAC-10 completed.
Now I see my new run, with three inputs: 0.5, 1.5, 2.5
But if I continue to log data in my script:
# Shell 1
New log: 2.5 # From before
New log: 3.5
New log: 4.5
And try to sync it again:
# Shell 2
$ neptune sync
Synchronising raphaelreme/MyExp/TRAC-10
Synchronization of run raphaelreme/MyExp/TRAC-10 completed.
Nothing new shows up in my dashboard: there are still only three inputs (0.5, 1.5, 2.5) and no sign of 3.5 or 4.5.
I found out that stopping the script and then synchronizing uploads the missing data. But if my training is long, I don’t want to wait until the end to look at the training curves. Even though I have other ways to see the logs, it would be nice to be able to sync Neptune manually while the training is running.
Issue Analytics
- Created a year ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
Hi @raphaelreme,
Prince here, from Neptune.ai
Thanks for reaching out!
That’s an interesting use case. I will run a few tests, perhaps talk to the devs, and get back to you!
Hi @raphaelreme,
I’m closing this issue as it’s a bit stale.
Feel free to leave a comment here if you still need help with this, so we can re-open the issue.
Until then, have a great day.