New files don't get recognized on Docker SMB volume

Hi,

My scanner puts all scanned documents on an SMB share. This share is then mounted into the container via SMB (CIFS).

At startup all documents get OCR'ed, but any documents added after startup are not recognized.

In my limited understanding, I think that inotify does not work with Docker and SMB shares.

Thank you very much… mg

docker-compose

version: '3'
services:
######## ocrmypdf-auto ########
  ocrmypdf-auto:
    container_name: "ocrmypdf-auto"
    image: cmccambridge/ocrmypdf-auto
    restart: always
    environment:
      - TZ=Europe/Berlin
      - 'OCR_LANGUAGES=deu eng'
      - OCR_OUTPUT_MODE=SINGLE_FOLDER
      - OCR_PROCESS_EXISTING_ON_START=1
      - OCR_ACTION_ON_SUCCESS=NOTHING
      - UID=1000
      - GID=1000
      - USERMAP_UID=1000
      - USERMAP_GID=1000
    volumes:
      - scan_input:/input
      - scan_output:/output
      - config:/config


######## Volumes ########
volumes:
  config:
  scan_input:
    driver: local
    driver_opts:
      type: "cifs"
      o: "user=ocrmypdf,password=XXXXX,rw"
      device: "//192.168.2.36/scans"
  scan_output:
    driver: local
    driver_opts:
      type: "cifs"
      o: "user=ocrmypdf,password=XXXXX,rw"
      device: "//192.168.2.36/scans/output"


Log after last startup

ocrmypdf-auto    | 2021-01-21 14:27:49 - Watching /input
ocrmypdf-auto    | 2021-01-21 14:27:49 - Processing: /input/20210109_000066.pdf -> /output/20210109_000066.pdf
ocrmypdf-auto    | 2021-01-21 14:27:49 - Processing: /input/20210112_000122.pdf -> /output/20210112_000122.pdf
ocrmypdf-auto    | 2021-01-21 14:27:49 - Processing: /input/20210109_000070.pdf -> /output/20210109_000070.pdf
ocrmypdf-auto    | 2021-01-21 14:27:53 - Processing complete in 3.720000 seconds with status 5: /input/20210109_000066.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210109_000066.pdf/output/20210109_000066.pdf53.720000
ocrmypdf-auto    | 2021-01-21 14:27:53 - Processing: /input/20210112_000108.pdf -> /output/20210112_000108.pdf
ocrmypdf-auto    | 2021-01-21 14:27:53 - Processing complete in 3.720000 seconds with status 5: /input/20210109_000070.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210109_000070.pdf/output/20210109_000070.pdf53.720000
ocrmypdf-auto    | 2021-01-21 14:27:53 - Processing: /input/20210109_000077.pdf -> /output/20210109_000077.pdf
ocrmypdf-auto    | 2021-01-21 14:27:53 - Processing complete in 3.760000 seconds with status 5: /input/20210112_000122.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210112_000122.pdf/output/20210112_000122.pdf53.760000
ocrmypdf-auto    | 2021-01-21 14:27:53 - Processing: /input/20210115_000222.pdf -> /output/20210115_000222.pdf
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing complete in 0.760000 seconds with status 5: /input/20210109_000077.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210109_000077.pdf/output/20210109_000077.pdf50.760000
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing: /input/20210121_000249.pdf -> /output/20210121_000249.pdf
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing complete in 0.830000 seconds with status 5: /input/20210112_000108.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210112_000108.pdf/output/20210112_000108.pdf50.830000
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing: /input/20210118_000237.pdf -> /output/20210118_000237.pdf
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing complete in 0.790000 seconds with status 5: /input/20210115_000222.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210115_000222.pdf/output/20210115_000222.pdf50.790000
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing: /input/20210112_000217.pdf -> /output/20210112_000217.pdf
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing complete in 0.780000 seconds with status 5: /input/20210121_000249.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210121_000249.pdf/output/20210121_000249.pdf50.780000
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing: /input/20210112_000172.pdf -> /output/20210112_000172.pdf
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing complete in 0.750000 seconds with status 5: /input/20210118_000237.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210118_000237.pdf/output/20210118_000237.pdf50.750000
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing: /input/20210111_000105.pdf -> /output/20210111_000105.pdf
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing complete in 0.780000 seconds with status 5: /input/20210112_000217.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210112_000217.pdf/output/20210112_000217.pdf50.780000
ocrmypdf-auto    | 2021-01-21 14:27:54 - Processing: /input/20210109_000072.pdf -> /output/20210109_000072.pdf
ocrmypdf-auto    | 2021-01-21 14:27:55 - Processing complete in 0.750000 seconds with status 5: /input/20210112_000172.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210112_000172.pdf/output/20210112_000172.pdf50.750000
ocrmypdf-auto    | 2021-01-21 14:27:55 - Processing: /input/20210121_000243.pdf -> /output/20210121_000243.pdf
ocrmypdf-auto    | 2021-01-21 14:27:55 - Processing complete in 0.790000 seconds with status 5: /input/20210111_000105.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210111_000105.pdf/output/20210111_000105.pdf50.790000
ocrmypdf-auto    | 2021-01-21 14:27:55 - Processing: /input/20210115_000226.pdf -> /output/20210115_000226.pdf
ocrmypdf-auto    | 2021-01-21 14:27:55 - Processing complete in 0.770000 seconds with status 5: /input/20210109_000072.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210109_000072.pdf/output/20210109_000072.pdf50.770000
ocrmypdf-auto    | 2021-01-21 14:27:55 - Processing: /input/20210109_000056.pdf -> /output/20210109_000056.pdf
ocrmypdf-auto    | 2021-01-21 14:27:56 - Processing complete in 0.730000 seconds with status 5: /input/20210121_000243.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210121_000243.pdf/output/20210121_000243.pdf50.730000
ocrmypdf-auto    | 2021-01-21 14:27:56 - Processing: /input/20210119_000239.pdf -> /output/20210119_000239.pdf
ocrmypdf-auto    | 2021-01-21 14:27:56 - Processing complete in 0.770000 seconds with status 5: /input/20210115_000226.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210115_000226.pdf/output/20210115_000226.pdf50.770000
ocrmypdf-auto    | 2021-01-21 14:27:56 - Processing: /input/20210112_000215.pdf -> /output/20210112_000215.pdf
ocrmypdf-auto    | 2021-01-21 14:27:56 - Processing complete in 0.780000 seconds with status 5: /input/20210109_000056.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210109_000056.pdf/output/20210109_000056.pdf50.780000
ocrmypdf-auto    | 2021-01-21 14:27:56 - Processing: /input/20210112_000203.pdf -> /output/20210112_000203.pdf
ocrmypdf-auto    | 2021-01-21 14:27:57 - Processing complete in 0.880000 seconds with status 5: /input/20210119_000239.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210119_000239.pdf/output/20210119_000239.pdf50.880000
ocrmypdf-auto    | 2021-01-21 14:27:57 - Processing: /input/20210114_000218.pdf -> /output/20210114_000218.pdf
ocrmypdf-auto    | 2021-01-21 14:27:57 - Processing complete in 0.850000 seconds with status 5: /input/20210112_000203.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210112_000203.pdf/output/20210112_000203.pdf50.850000
ocrmypdf-auto    | 2021-01-21 14:27:57 - Processing: /input/20210112_000161.pdf -> /output/20210112_000161.pdf
ocrmypdf-auto    | 2021-01-21 14:27:57 - Processing complete in 0.870000 seconds with status 5: /input/20210112_000215.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210112_000215.pdf/output/20210112_000215.pdf50.870000
ocrmypdf-auto    | 2021-01-21 14:27:57 - Processing: /input/20210121_000245.pdf -> /output/20210121_000245.pdf
ocrmypdf-auto    | 2021-01-21 14:27:58 - Processing complete in 0.780000 seconds with status 5: /input/20210114_000218.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210114_000218.pdf/output/20210114_000218.pdf50.780000
ocrmypdf-auto    | 2021-01-21 14:27:58 - Processing complete in 0.780000 seconds with status 5: /input/20210112_000161.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210112_000161.pdf/output/20210112_000161.pdf50.780000
ocrmypdf-auto    | 2021-01-21 14:27:58 - Processing complete in 0.810000 seconds with status 5: /input/20210121_000245.pdf
ocrmypdf-auto    | TESTOCR_PROCESS_RESULT/input/20210121_000245.pdf/output/20210121_000245.pdf50.810000
^CGracefully stopping... (press Ctrl+C again to force)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

2 reactions
cmccambridge commented, Jan 26, 2021

@quotengrote

is there an overview what exit-code means what?

There is! All exit codes are passed through directly from ocrmypdf: https://ocrmypdf.readthedocs.io/en/latest/advanced.html#return-code-policy
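For quick reference while reading the log above, here is a minimal Python sketch of a lookup table for those return codes. The values are transcribed from the ocrmypdf return-code policy as I understand it, so treat the linked page as authoritative; `OcrmypdfExitCode` and `describe_status` are hypothetical names for this example and are not part of ocrmypdf-auto.

```python
from enum import IntEnum


class OcrmypdfExitCode(IntEnum):
    """Return codes as listed in the ocrmypdf docs (verify against the link)."""
    OK = 0
    BAD_ARGS = 1
    INPUT_FILE = 2
    MISSING_DEPENDENCY = 3
    INVALID_OUTPUT_PDF = 4
    FILE_ACCESS_ERROR = 5
    ALREADY_DONE_OCR = 6
    CHILD_PROCESS_ERROR = 7
    ENCRYPTED_PDF = 8
    INVALID_CONFIG = 9
    PDFA_CONVERSION_FAILED = 10
    OTHER_ERROR = 15
    CTRL_C = 130


def describe_status(status: int) -> str:
    """Translate a numeric status from the container log into a readable name."""
    try:
        return OcrmypdfExitCode(status).name.lower()
    except ValueError:
        # Codes not in the table are reported as-is rather than guessed at.
        return f"unknown ({status})"
```

Under that table, the "status 5" lines in the startup log would correspond to a file access error, which is worth checking against the permissions on the CIFS mounts.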

@jo-me

the files that already were in the directory from previous scans were not picked up at all by ocrmypdf-auto. Newly added files (while the container was running) are picked up ok.

This is likely expected behavior. To enable processing of existing files at startup, you need to set the environment variable OCR_PROCESS_EXISTING_ON_START=1. (If you have that set and it's still not working, let me know… but the process of enumerating existing files is independent of which observer gets started, and should work with either the standard inotify observer or the new polling observer.)

After turning the printer back on, the /input directory shows the files again, but ocrmypdf-auto is not picking up anything - also no new files

Just to confirm: what you're describing here was with the polling container, and with OCR_USE_POLLING_SCHEDULER=1 set, right? And when you say the files are available in /input again, is that from looking at the input mount in a shell inside the container? I would have hoped for that to work, especially with the polling observer… but perhaps the in-between state when "all files suddenly disappeared" did indeed cause the polling observer to crash.

how does it detect that the new file is complete? When copying a larger PDF over the network it might take some seconds. And the polling observer might notice it while it is still being written.

I believe both observers (inotify and polling) suffer from this problem, though I imagine it will be worse with the polling observer. I wrote a "coalescing delay" into the code to handle this situation with the standard observer. By default it is 3 seconds: the container waits 3 seconds after any modification to a given file, and any new modification that arrives restarts the 3-second timer. You can override this by setting the environment variable OCR_PROCESSING_DELAY to the number of seconds to wait before processing a file, for example OCR_PROCESSING_DELAY=15.0 for a 15-second delay. You could try increasing this for a network environment where file transfers take a long time… BUT realize that even transfers that complete quickly will still wait for this amount of time before processing. There's no way to know that a producer of data is done writing a file: it may close the file and then reopen it one second later to add another page, for example. So the container just uses this timer heuristic as a best guess.
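To make the timer heuristic concrete, here is a minimal Python sketch of a coalescing delay. It is not the ocrmypdf-auto implementation: the class `CoalescingQueue`, its methods, and the injectable `clock` are all hypothetical names chosen for illustration. Only the 3-second default and the "every new modification restarts the timer" rule come from the comment above.

```python
import time
from typing import Callable, Dict, List


class CoalescingQueue:
    """Release a path only after it has been quiet for `delay` seconds."""

    def __init__(self, delay: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay
        self.clock = clock
        # Maps each pending path to the time of its most recent event.
        self._last_event: Dict[str, float] = {}

    def touch(self, path: str) -> None:
        """Record a create/modify event; this restarts the timer for `path`."""
        self._last_event[path] = self.clock()

    def pop_ready(self) -> List[str]:
        """Return, and forget, every path with no events for `delay` seconds."""
        now = self.clock()
        ready = sorted(path for path, seen in self._last_event.items()
                       if now - seen >= self.delay)
        for path in ready:
            del self._last_event[path]
        return ready
```

A watcher loop would call `touch()` for each file event and periodically hand the result of `pop_ready()` to the OCR step. A larger `delay` tolerates slower transfers at the cost of making every file wait that long, which is the trade-off described above.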

the file could not be deleted after processing, but it can be deleted using the shell in the container… strange

This might be explained by the slow network transfer. If that file were still in use by the source or destination side of a transfer, I could see the container failing to delete it… and yet when you come back later through the shell to do the delete manually, all remaining users have closed their handles and the file can be deleted successfully.

Interesting idea re mountpoint and df… Though if it works for @quotengrote, I may simply leave this as an opt-in feature rather than digging too deep into auto-detection. We might be able to find a solution that properly autodetects CIFS, only to learn that it breaks someone else’s NFS flow or breaks docker on WSL2 or some other complicated scenario 😄. Unfortunately testing any of these scenarios in an automated way will be pretty tricky.
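For anyone who wants to experiment with auto-detection anyway, here is a rough Python sketch of the idea, assuming a Linux container where mount information is readable in the `/proc/mounts` format (device, mount point, filesystem type, options). The function names and the list of "network" filesystem types are my own assumptions, and as noted above such a check may misbehave with NFS, WSL2, or other setups.

```python
import posixpath
from typing import Optional, Tuple


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the fs type of the longest mount point that contains `path`."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Note: mount points containing spaces appear escaped (\040) and
        # are not unescaped by this sketch.
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def needs_polling(path: str, mounts_text: str,
                  network_types: Tuple[str, ...] = ("cifs", "smb3", "nfs", "nfs4")) -> bool:
    """Guess whether inotify is unlikely to see remote changes under `path`."""
    return filesystem_type(path, mounts_text) in network_types
```

Inside the container this could be tried with `needs_polling("/input", open("/proc/mounts").read())`.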

1 reaction
cmccambridge commented, Feb 4, 2021

Hi @quotengrote - Yes, thank you for the reminder!

I’ve merged this new polling observer mode from the same branch that you have been testing. In other words: The same configuration is required to opt-in to the polling behavior, by setting OCR_USE_POLLING_SCHEDULER=1.

Now that the change is merged to main, you can go back to the original container image in your docker-compose file, i.e. :latest, or no explicit tag.
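For readers wondering why polling helps on an SMB/CIFS mount: instead of waiting for kernel notifications, which may never arrive for changes made on the remote side, a polling observer periodically lists the directory and compares the result with the previous listing. The Python sketch below illustrates that idea with the standard library only. It is not the code used by ocrmypdf-auto or by the watchdog library (which, as far as I know, ships its own `PollingObserver`), and the function names are hypothetical.

```python
import os
from typing import Dict, Set, Tuple

# path -> (modification time, size in bytes)
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(directory: str) -> Snapshot:
    """Map each regular file in `directory` to its (mtime, size)."""
    snapshot: Snapshot = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                snapshot[entry.path] = (info.st_mtime, info.st_size)
    return snapshot


def changed_paths(old: Snapshot, new: Snapshot) -> Set[str]:
    """Paths that are new, or whose mtime or size differs from `old`."""
    return {path for path, meta in new.items() if old.get(path) != meta}
```

A watcher would call `take_snapshot("/input")` every few seconds and treat each path returned by `changed_paths(previous, current)` as a file event.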
