Too many open files under RustBoard (EMFILE)
I am getting a lot of warnings about too many open files – is there a way to reduce or cap the number of open file descriptors?
[2021-05-11T14:31:46Z WARN rustboard_core::run] Failed to open event file EventFileBuf("[RUN NAME]"): Os { code: 24, kind: Other, message: "Too many open files" }
I don’t have that many runs (~2000), so it shouldn’t really be an issue. Using lsof to count the number of open FDs shows over 12k being used…
>> lsof | awk '{print $1}' | sort | uniq -c | sort -r -n | head
6210 tokio-run
6210 Reloader-
1035 StdinWatc
1035 server
1035 Reloader
184 gmain
168 gdbus
134 grpc_glob
85 bash
80 snapd
Compared to <500 in “slow” mode.
>> lsof | awk '{print $1}' | sort | uniq -c | sort -r -n | head
427 tensorboa
184 gmain
168 gdbus
85 bash
80 snapd
72 systemd
71 screen
52 dconf\x20
51 dbus-daem
48 llvmpipe-
In my case, the “slow” mode actually loads files faster since it doesn’t run into this issue.
_Originally posted by @Raphtor in https://github.com/tensorflow/tensorboard/issues/4784#issuecomment-838599948_
I’ve also encountered the problem and found that raising the “open files” limit by executing e.g.
ulimit -n 50000
solves the problem for me (without requiring superuser permissions).
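For anyone who launches TensorBoard from a Python wrapper rather than a shell, the same idea can be sketched with the standard-library resource module. This is only an illustration: the function name raise_nofile_limit and the default of 50000 (copied from the ulimit example above) are assumptions, and it only works without superuser rights when the hard limit is at least as high as the target.

```python
import resource


def raise_nofile_limit(target=50000):
    """Raise the soft open-files limit toward `target`, capped at the hard limit.

    Hypothetical helper: roughly what `ulimit -n <target>` does for a shell,
    but for the current Python process and the children it spawns.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process may raise its soft limit only up to the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)

    # Never lower an existing limit, and leave an unlimited soft limit alone.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = raise_nofile_limit()
    print(f"open-files limit: soft={soft}, hard={hard}")
```

The new limit applies only to the calling process and to processes started from it afterwards, so TensorBoard would have to be launched from the same process (for example via subprocess) for it to take effect.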
Hi @Raphtor! Thanks for the report and the helpful info. Some questions:
Could you run diagnose_tensorboard.py in the same environment from which you usually run TensorBoard and post the full output in a comment (sanitizing as desired if you want to redact anything)?
Are you able to share the log directory with us? If not, could you describe the structure of the event files? You say that you only have ~2000 runs, but I wonder if each run tends to have many event files (can happen if your training workers restart a lot). If so, it’s possible that that explains the difference, since the particulars around how we handle multiple event files in the same directory differ somewhat.
Broadly, there are three potential behaviors. In all cases, we read all event files in lexicographical order. When we hit EOF on an event file, we keep polling it iff…
TensorBoard with --load_fast=false uses last-file mode by default (and can also be told to use multifile mode), but with --load_fast=true uses all-files mode.
Can you also reproduce the issue when running TensorBoard with […]? Same train of thought as above; this enables multifile mode with an unbounded age threshold, making it equivalent to all-files mode. If this reproduces the issue, we can probably fix this by making --load_fast=true also implement last-file and/or multifile modes, which would be nice, anyway.
What lsof do you have? My lsof (4.93.2, Linux) uses the first column for the command name, but (e.g.) tensorboard and bash are process names whereas Reloader and StdinWatcher are thread names. So my lsof output has lines like: … and I don’t see how your lsof | awk '{ print $1 }' is giving the output that you’re seeing. Probably just a reporting thing, but I’d like to be able to reproduce your interaction if possible.
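Because lsof may print one row per thread rather than per process, grouping on its first column can count the same descriptors several times. In the output above, 6210 is exactly 6 × 1035, which would be consistent with roughly 1035 descriptors being listed once per thread, though that is an inference and not something confirmed in the thread. A minimal sketch of a per-process count that sidesteps this, assuming Linux with /proc mounted (the helper name count_open_fds is hypothetical):

```python
import os


def count_open_fds(pid="self"):
    """Count the open file descriptors of one process via /proc (Linux only).

    /proc/<pid>/fd holds one entry per descriptor for the whole process,
    so threads sharing the descriptor table are not counted repeatedly.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Pass a real PID (e.g. the TensorBoard server's) to inspect another
    # process; reading another user's process requires suitable permissions.
    print(f"open descriptors in this process: {count_open_fds()}")
```

Comparing this number against the soft limit reported by ulimit -n shows how close a process is to hitting EMFILE.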