New Files don't get recognized on Docker SMB-Volume
Hi,
my scanner puts all scanned documents in an SMB share. This share is then mounted via SMB into the container.
At startup all documents get OCR'ed, but any documents added after startup are not recognized.
In my limited understanding, I think that inotify does not work with Docker and SMB shares.
Thank you very much… mg
docker-compose
version: '3'
services:
  ######## ocrmypdf-auto ########
  ocrmypdf-auto:
    container_name: "ocrmypdf-auto"
    image: cmccambridge/ocrmypdf-auto
    restart: always
    environment:
      - TZ=Europe/Berlin
      - 'OCR_LANGUAGES=deu eng'
      - OCR_OUTPUT_MODE=SINGLE_FOLDER
      - OCR_PROCESS_EXISTING_ON_START=1
      - OCR_ACTION_ON_SUCCESS=NOTHING
      - UID=1000
      - GID=1000
      - USERMAP_UID=1000
      - USERMAP_GIH=1000
    volumes:
      - scan_input:/input
      - scan_output:/output
      - config:/config
######## Volumes ########
volumes:
  config:
  scan_input:
    driver: local
    driver_opts:
      type: "cifs"
      o: "user=ocrmypdf,password=XXXXX,rw"
      device: "//192.168.2.36/scans"
  scan_output:
    driver: local
    driver_opts:
      type: "cifs"
      o: "user=ocrmypdf,password=XXXXX,rw"
      device: "//192.168.2.36/scans/output"
Log after last startup
ocrmypdf-auto | 2021-01-21 14:27:49 - Watching /input
ocrmypdf-auto | 2021-01-21 14:27:49 - Processing: /input/20210109_000066.pdf -> /output/20210109_000066.pdf
ocrmypdf-auto | 2021-01-21 14:27:49 - Processing: /input/20210112_000122.pdf -> /output/20210112_000122.pdf
ocrmypdf-auto | 2021-01-21 14:27:49 - Processing: /input/20210109_000070.pdf -> /output/20210109_000070.pdf
ocrmypdf-auto | 2021-01-21 14:27:53 - Processing complete in 3.720000 seconds with status 5: /input/20210109_000066.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210109_000066.pdf/output/20210109_000066.pdf53.720000
ocrmypdf-auto | 2021-01-21 14:27:53 - Processing: /input/20210112_000108.pdf -> /output/20210112_000108.pdf
ocrmypdf-auto | 2021-01-21 14:27:53 - Processing complete in 3.720000 seconds with status 5: /input/20210109_000070.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210109_000070.pdf/output/20210109_000070.pdf53.720000
ocrmypdf-auto | 2021-01-21 14:27:53 - Processing: /input/20210109_000077.pdf -> /output/20210109_000077.pdf
ocrmypdf-auto | 2021-01-21 14:27:53 - Processing complete in 3.760000 seconds with status 5: /input/20210112_000122.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210112_000122.pdf/output/20210112_000122.pdf53.760000
ocrmypdf-auto | 2021-01-21 14:27:53 - Processing: /input/20210115_000222.pdf -> /output/20210115_000222.pdf
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing complete in 0.760000 seconds with status 5: /input/20210109_000077.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210109_000077.pdf/output/20210109_000077.pdf50.760000
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing: /input/20210121_000249.pdf -> /output/20210121_000249.pdf
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing complete in 0.830000 seconds with status 5: /input/20210112_000108.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210112_000108.pdf/output/20210112_000108.pdf50.830000
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing: /input/20210118_000237.pdf -> /output/20210118_000237.pdf
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing complete in 0.790000 seconds with status 5: /input/20210115_000222.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210115_000222.pdf/output/20210115_000222.pdf50.790000
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing: /input/20210112_000217.pdf -> /output/20210112_000217.pdf
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing complete in 0.780000 seconds with status 5: /input/20210121_000249.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210121_000249.pdf/output/20210121_000249.pdf50.780000
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing: /input/20210112_000172.pdf -> /output/20210112_000172.pdf
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing complete in 0.750000 seconds with status 5: /input/20210118_000237.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210118_000237.pdf/output/20210118_000237.pdf50.750000
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing: /input/20210111_000105.pdf -> /output/20210111_000105.pdf
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing complete in 0.780000 seconds with status 5: /input/20210112_000217.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210112_000217.pdf/output/20210112_000217.pdf50.780000
ocrmypdf-auto | 2021-01-21 14:27:54 - Processing: /input/20210109_000072.pdf -> /output/20210109_000072.pdf
ocrmypdf-auto | 2021-01-21 14:27:55 - Processing complete in 0.750000 seconds with status 5: /input/20210112_000172.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210112_000172.pdf/output/20210112_000172.pdf50.750000
ocrmypdf-auto | 2021-01-21 14:27:55 - Processing: /input/20210121_000243.pdf -> /output/20210121_000243.pdf
ocrmypdf-auto | 2021-01-21 14:27:55 - Processing complete in 0.790000 seconds with status 5: /input/20210111_000105.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210111_000105.pdf/output/20210111_000105.pdf50.790000
ocrmypdf-auto | 2021-01-21 14:27:55 - Processing: /input/20210115_000226.pdf -> /output/20210115_000226.pdf
ocrmypdf-auto | 2021-01-21 14:27:55 - Processing complete in 0.770000 seconds with status 5: /input/20210109_000072.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210109_000072.pdf/output/20210109_000072.pdf50.770000
ocrmypdf-auto | 2021-01-21 14:27:55 - Processing: /input/20210109_000056.pdf -> /output/20210109_000056.pdf
ocrmypdf-auto | 2021-01-21 14:27:56 - Processing complete in 0.730000 seconds with status 5: /input/20210121_000243.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210121_000243.pdf/output/20210121_000243.pdf50.730000
ocrmypdf-auto | 2021-01-21 14:27:56 - Processing: /input/20210119_000239.pdf -> /output/20210119_000239.pdf
ocrmypdf-auto | 2021-01-21 14:27:56 - Processing complete in 0.770000 seconds with status 5: /input/20210115_000226.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210115_000226.pdf/output/20210115_000226.pdf50.770000
ocrmypdf-auto | 2021-01-21 14:27:56 - Processing: /input/20210112_000215.pdf -> /output/20210112_000215.pdf
ocrmypdf-auto | 2021-01-21 14:27:56 - Processing complete in 0.780000 seconds with status 5: /input/20210109_000056.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210109_000056.pdf/output/20210109_000056.pdf50.780000
ocrmypdf-auto | 2021-01-21 14:27:56 - Processing: /input/20210112_000203.pdf -> /output/20210112_000203.pdf
ocrmypdf-auto | 2021-01-21 14:27:57 - Processing complete in 0.880000 seconds with status 5: /input/20210119_000239.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210119_000239.pdf/output/20210119_000239.pdf50.880000
ocrmypdf-auto | 2021-01-21 14:27:57 - Processing: /input/20210114_000218.pdf -> /output/20210114_000218.pdf
ocrmypdf-auto | 2021-01-21 14:27:57 - Processing complete in 0.850000 seconds with status 5: /input/20210112_000203.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210112_000203.pdf/output/20210112_000203.pdf50.850000
ocrmypdf-auto | 2021-01-21 14:27:57 - Processing: /input/20210112_000161.pdf -> /output/20210112_000161.pdf
ocrmypdf-auto | 2021-01-21 14:27:57 - Processing complete in 0.870000 seconds with status 5: /input/20210112_000215.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210112_000215.pdf/output/20210112_000215.pdf50.870000
ocrmypdf-auto | 2021-01-21 14:27:57 - Processing: /input/20210121_000245.pdf -> /output/20210121_000245.pdf
ocrmypdf-auto | 2021-01-21 14:27:58 - Processing complete in 0.780000 seconds with status 5: /input/20210114_000218.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210114_000218.pdf/output/20210114_000218.pdf50.780000
ocrmypdf-auto | 2021-01-21 14:27:58 - Processing complete in 0.780000 seconds with status 5: /input/20210112_000161.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210112_000161.pdf/output/20210112_000161.pdf50.780000
ocrmypdf-auto | 2021-01-21 14:27:58 - Processing complete in 0.810000 seconds with status 5: /input/20210121_000245.pdf
ocrmypdf-auto | TESTOCR_PROCESS_RESULT/input/20210121_000245.pdf/output/20210121_000245.pdf50.810000
^CGracefully stopping... (press Ctrl+C again to force)
Issue Analytics
- State:
- Created 3 years ago
- Comments:12 (7 by maintainers)
Top GitHub Comments
@quotengrote
There is! All exit codes are passed through directly from ocrmypdf: https://ocrmypdf.readthedocs.io/en/latest/advanced.html#return-code-policy
@jo-me
This is likely expected behavior. To enable processing of existing files at start up, you need to set the environment variable OCR_PROCESS_EXISTING_ON_START=1. (If you have that set and it’s still not working, let me know… but the process of enumerating existing files is independent of which observer gets started, and should work with either the standard inotify or with the new Polling watchdog.)
Just to confirm: What you’re describing here was with the polling container, and with OCR_USE_POLLING_SCHEDULER=1 set, right? And when files are available in /input again, that’s looking at the input mount from a shell inside the container? I would have hoped for that to work, especially with the polling observer… but perhaps the in between state when “all files suddenly disappeared” did indeed cause the polling observer to crash.
I believe both observers (inotify and polling) suffer from this problem, though I do imagine it will be worse with the polling observer. I wrote a “coalescing delay” into the code to handle this situation with the standard observer. By default it is 3 seconds (the container will wait 3 seconds after any modification to a given file, and any new modifications coming in will cause the 3-second timer to start over), but you can override this by setting environment variable OCR_PROCESSING_DELAY=15.0 (for example, to set 15 second delay), to any number of seconds that you would like to wait before starting to process a file. You could try increasing this for your network environment where file transfers take a long time… BUT, realize that even short transfers that complete quickly will still wait for this amount of time before processing. There’s no way to know that a producer of data is done writing a file… he may close the file and then open it one second later to add another page, for example. So, the container just uses this timer heuristic as a best guess.
This might be explained by the slow network transfer. If that file were still in use by the source or destination side of a transfer, I could see the container failing to delete it… and yet when you come back later through the shell to do the delete manually, all remaining users have closed their handles and the file can be deleted successfully.
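For readers who want to see the "coalescing delay" idea in concrete form, here is a minimal Python sketch. It is not the container's actual code: the class name CoalescingTimer, its methods, and the injectable clock are all hypothetical, chosen only to illustrate the timer heuristic described above (each new modification restarts the per-file wait, and a file is handed off only once it has been quiet for the full delay).

```python
import time
from typing import Callable, Dict, List


class CoalescingTimer:
    """Hypothetical helper: report files only after they have been quiet for `delay` seconds."""

    def __init__(self, delay: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_touch: Dict[str, float] = {}

    def touch(self, path: str) -> None:
        # Every modification event restarts the timer for that file.
        self._last_touch[path] = self.clock()

    def pop_ready(self) -> List[str]:
        # Return (and forget) the files whose quiet period has elapsed.
        now = self.clock()
        ready = sorted(p for p, t in self._last_touch.items()
                       if now - t >= self.delay)
        for p in ready:
            del self._last_touch[p]
        return ready


if __name__ == "__main__":
    fake_now = [0.0]
    timer = CoalescingTimer(delay=3.0, clock=lambda: fake_now[0])
    timer.touch("/input/scan.pdf")
    fake_now[0] = 2.0
    timer.touch("/input/scan.pdf")  # a second write restarts the 3-second wait
    fake_now[0] = 4.0
    print(timer.pop_ready())        # still within the quiet period
    fake_now[0] = 5.0
    print(timer.pop_ready())        # 3 seconds after the last write
```

The trade-off is the one mentioned above: a longer delay tolerates slow network transfers, but every file, even one written instantly, waits that long before processing starts.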
Interesting idea re mountpoint and df… Though if it works for @quotengrote, I may simply leave this as an opt-in feature rather than digging too deep into auto-detection. We might be able to find a solution that properly autodetects CIFS, only to learn that it breaks someone else’s NFS flow or breaks docker on WSL2 or some other complicated scenario 😄. Unfortunately testing any of these scenarios in an automated way will be pretty tricky.
Hi @quotengrote - Yes, thank you for the reminder!
I’ve merged this new polling observer mode from the same branch that you have been testing. In other words: The same configuration is required to opt-in to the polling behavior, by setting OCR_USE_POLLING_SCHEDULER=1.
Now that the change is merged to main, you can go back to the original container image in your docker-compose file, i.e. :latest, or no explicit tag.
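As a rough illustration of why polling can work on a CIFS mount where inotify events never arrive, here is a small Python sketch of the general technique: take periodic snapshots of the directory and compare them. This is an assumption-laden example, not the container's implementation; the function names take_snapshot and changed_files are hypothetical, and a real watcher would call them in a loop with a sleep in between.

```python
import os
import tempfile
from typing import Dict, Set, Tuple

# path -> (modification time, size in bytes)
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(directory: str) -> Snapshot:
    """Map each regular file directly inside `directory` to its (mtime, size)."""
    snapshot: Snapshot = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                snapshot[entry.path] = (info.st_mtime, info.st_size)
    return snapshot


def changed_files(before: Snapshot, after: Snapshot) -> Set[str]:
    """Return paths that are new, or whose mtime/size differ between two snapshots."""
    return {path for path, meta in after.items() if before.get(path) != meta}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as watch_dir:
        before = take_snapshot(watch_dir)
        with open(os.path.join(watch_dir, "scan.pdf"), "wb") as handle:
            handle.write(b"%PDF-1.4")
        after = take_snapshot(watch_dir)
        print(changed_files(before, after))
```

Because this only relies on listing the directory and reading file metadata, it does not depend on the kernel delivering change notifications for the mount, at the cost of rescanning the directory on every poll.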