CVAT does not work when annotating PDFs
Hi,
I am trying to annotate PDF documents instead of images with CVAT and have run into several problems that I cannot resolve on my own. I am using the develop branch, because the CVAT Docker image does not build successfully on the master branch.
- I can upload a single PDF document (with many pages), but not several PDF documents. The error message says that only a single PDF can be uploaded, but it would help to understand the rationale for this:
ValueError: Only one video, archive, pdf or many image, directory can be used simultaneously, but 0 image(s), 0 video(s), 0 archive(s), 2 pdf(s), 0 directory(s) found.
- The conversion from PDF to images with pdf2image does not work, because poppler is missing from the Dockerfile. I fixed it by adding it to the Dockerfile:
# Install poppler-utils, which pdf2image needs to convert PDFs to images
RUN apt-get update && apt-get install -y --no-install-recommends poppler-utils && rm -rf /var/lib/apt/lists/*
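pdf2image does its conversion by calling poppler's command-line tools, which is why the missing package breaks it. A small check for those tools can make that failure explicit instead of obscure. This is a sketch, assuming the default pdftoppm backend plus pdfinfo; the helper name missing_poppler_tools is hypothetical and not part of CVAT or pdf2image:

```python
import shutil
from typing import List

# pdf2image shells out to these poppler-utils binaries (default backend).
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def missing_poppler_tools() -> List[str]:
    """Return the poppler binaries that cannot be found on PATH."""
    return [tool for tool in POPPLER_TOOLS if shutil.which(tool) is None]
```

Running it inside the container, and raising an error at worker start-up if the returned list is non-empty, would confirm that the Dockerfile change took effect.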
- After annotating a few items, I tried to dump the annotations, and it fails no matter which format I use. Dumping annotations for PNG images works perfectly, so the problem seems specific to PDFs. Here is the error message:
2019-12-07 23:45:12,475 DEBG 'rqworker_default_1' stderr output:
23:45:12 default: cvat.apps.engine.annotation.dump_task_data('5', <SimpleLazyObject: <User: admin>>, '/home/django/data/5/5_IDP.admin.2019_12_07_23_45_12.zip', <AnnotationDumper: AnnotationDumper object (YOLO ZIP 1.0)>, 'http', 'localhost:8080') (admin@/api/v1/tasks/5/annotations/YOLO ZIP 1.0/5_IDP)
2019-12-07 23:45:12,574 DEBG 'rqworker_default_1' stderr output:
23:45:12 cvat.apps.engine.utils.InterpreterError: ValueError at line 308: '.upload' is not in list
Traceback (most recent call last):
File "/home/django/cvat/apps/engine/utils.py", line 45, in execute_python_code
exec(source_code, global_vars, local_vars)
File "<string>", line 1, in <module>
File "<string>", line 104, in dump
File "/home/django/cvat/apps/annotation/annotation.py", line 325, in group_by_frame
_get_frame(annotations, shape).labeled_shapes.append(self._export_labeled_shape(shape))
File "/home/django/cvat/apps/annotation/annotation.py", line 308, in _get_frame
rpath = os.path.sep.join(rpath[rpath.index(".upload")+1:])
ValueError: '.upload' is not in list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/rq/worker.py", line 812, in perform_job
rv = job.perform()
File "/usr/local/lib/python3.5/dist-packages/rq/job.py", line 588, in perform
self._result = self._execute()
File "/usr/local/lib/python3.5/dist-packages/rq/job.py", line 594, in _execute
return self.func(*self.args, **self.kwargs)
File "/home/django/cvat/apps/engine/annotation.py", line 135, in dump_task_data
annotation.dump(filename, dumper, scheme, host)
File "/home/django/cvat/apps/engine/annotation.py", line 740, in dump
execute_python_code("{}(file_object, annotations)".format(dumper.handler), global_vars)
File "/home/django/cvat/apps/engine/utils.py", line 60, in execute_python_code
raise InterpreterError("{} at line {}: {}".format(error_class, line_number, details))
cvat.apps.engine.utils.InterpreterError: ValueError at line 308: '.upload' is not in list
2019-12-07 23:45:15,528 DEBG 'runserver' stderr output:
[Sat Dec 07 23:45:15.528224 2019] [wsgi:error] [pid 151:tid 139962191009536] [remote 172.19.0.1:33606] [2019-12-07 23:45:15,528] ERROR django.request: Internal Server Error: /api/v1/tasks/5/annotations/5_IDP
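The traceback points at line 308 of cvat/apps/annotation/annotation.py, which keeps only the part of a frame's path after a .upload directory. Pages rendered from a PDF apparently are not stored under .upload, so rpath.index(".upload") raises ValueError. A minimal defensive sketch, assuming that falling back to the bare file name is acceptable for the dump (the helper relative_frame_path is hypothetical and is not CVAT's actual fix):

```python
import os


def relative_frame_path(path: str, marker: str = ".upload") -> str:
    """Return the part of `path` after the `marker` directory.

    If `marker` does not occur in the path (as seems to happen for PDF
    pages), fall back to the file name instead of raising ValueError.
    """
    parts = os.path.normpath(path).split(os.path.sep)
    if marker in parts:
        return os.path.sep.join(parts[parts.index(marker) + 1:])
    return os.path.basename(path)
```

Whether the bare file name is unique enough depends on how the PDF pages are named; a dumper that needs full paths would have to handle PDF tasks explicitly.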
Top GitHub Comments
Thanks for your detailed response.
Unfortunately, the code fails silently. Although it reports that the task has been created, the task does not appear in the overview and is not available for annotation.
The following output from the logs shows that no frames have been created for the task.
What is more, no .jpg file is saved in the data folder when I upload PDFs (project ID 1), whereas uploading images (project ID 2) works as expected:
I am using your code with only minimal adaptations: cvat/cvat/apps/engine/media_extractors.py
Complete Code: https://github.com/philippschw/cvat
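Because the task is reported as created even though no frames exist, a post-condition check after extraction would at least turn the silent failure into a visible error. A minimal sketch, assuming frames are written as .jpg files somewhere under the task's data directory (the function name and the directory layout are assumptions, not CVAT's code):

```python
import os


def ensure_frames_written(task_data_dir: str, extension: str = ".jpg") -> int:
    """Count frame files under `task_data_dir`; raise if there are none."""
    total = 0
    for _root, _dirs, files in os.walk(task_data_dir):
        total += sum(1 for name in files if name.lower().endswith(extension))
    if total == 0:
        # Fail loudly instead of creating a task with no frames.
        raise RuntimeError("no {} frames found under {}".format(extension, task_data_dir))
    return total
```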
I can’t speak to the dumper errors.
As for the rationale behind only being able to load a single PDF: I submitted this feature while working on a job for a client. All the client needed was the ability to upload a single PDF per task, and I had many, many other responsibilities 😃.
The upload code can easily be extended to account for your use case.
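Extending the upload also means relaxing the check behind the ValueError quoted in the original report. The message suggests a counting rule: video, archive and pdf are "one file only" types, while images and directories may be combined. The sketch below reconstructs that rule from the message text alone; UNIQUE_TYPES, MULTIPLE_TYPES, validate_media_counts and the allow_many_pdfs flag are all hypothetical, not CVAT's code:

```python
from typing import Dict

# Assumption: these types allow at most one file per task...
UNIQUE_TYPES = ("video", "archive", "pdf")
# ...while any number of these may be combined.
MULTIPLE_TYPES = ("image", "directory")
REPORT_ORDER = ("image", "video", "archive", "pdf", "directory")


def validate_media_counts(counts: Dict[str, int], allow_many_pdfs: bool = False) -> None:
    """Raise ValueError unless the upload is one unique file or only images/directories."""
    unique = sum(counts.get(t, 0) for t in UNIQUE_TYPES)
    multiple = sum(counts.get(t, 0) for t in MULTIPLE_TYPES)
    if allow_many_pdfs and counts.get("pdf", 0) > 1:
        # Hypothetical relaxation: several PDFs count as one unique entry,
        # so they still cannot be mixed with any other media type.
        unique -= counts["pdf"] - 1
    if unique > 1 or (unique == 1 and multiple > 0):
        found = ", ".join("{} {}(s)".format(counts.get(t, 0), t) for t in REPORT_ORDER)
        raise ValueError(
            "Only one {} or many {} can be used simultaneously, but {} found.".format(
                ", ".join(UNIQUE_TYPES), ", ".join(MULTIPLE_TYPES), found
            )
        )
```

With the default flag, two PDFs reproduce the error from the report; with allow_many_pdfs=True they pass, which is only safe once the extractor itself loops over all files.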
You would need to wrap lines 92-97 in a for loop. Line 92 is linked below: https://github.com/opencv/cvat/blob/1ec89b5f6a445aaa86854356cb73deb7e070d346/cvat/apps/engine/media_extractors.py#L92
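A rough sketch of what such a loop could look like, written as a standalone function rather than as a patch to the linked file. The name render_pdfs and the file-naming scheme are hypothetical. The convert parameter defaults to pdf2image's convert_from_path, which returns one PIL image per page; it can be swapped for a stub when testing without poppler:

```python
import os
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def render_pdfs(
    pdf_paths: Sequence[str],
    out_dir: str,
    convert: Optional[Callable[[str], Iterable]] = None,
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Render every page of every PDF to a JPEG in out_dir.

    Returns the written frame paths and each page's (width, height),
    in upload order.
    """
    if convert is None:
        # Imported lazily; pdf2image needs the poppler-utils binaries.
        from pdf2image import convert_from_path as convert

    frames = []  # type: List[str]
    sizes = []  # type: List[Tuple[int, int]]
    # Outer loop over all uploaded PDFs instead of only the first one.
    for doc_index, pdf_path in enumerate(pdf_paths):
        basename = os.path.splitext(os.path.basename(pdf_path))[0]
        for page_num, page in enumerate(convert(pdf_path)):
            # Zero-padded indices keep names unique across PDFs and make
            # lexicographic order match upload and page order.
            name = "{:03d}_{}_{:04d}.jpg".format(doc_index, basename, page_num)
            output = os.path.join(out_dir, name)
            page.save(output, "JPEG")
            sizes.append(page.size)
            frames.append(output)
    return frames, sizes
```

The zero-padding matters because names like doc10.jpg sort before doc2.jpg in plain lexicographic order, which would scramble the page order.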
I think DirectoryExtractor has a somewhat relevant example. The only difference is that file_ in file_ = convert_from_path(self._source_path) is a little mislabeled: I believe file_ is a list of images, one per page, that each need to be handled. The relevant section of the DirectoryExtractor code is linked below: https://github.com/opencv/cvat/blob/1ec89b5f6a445aaa86854356cb73deb7e070d346/cvat/apps/engine/media_extractors.py#L129
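In the same spirit as a directory-based extractor (this is not DirectoryExtractor's actual code), the rendered pages can be treated as a flat list of image files, each handled on its own. One detail to watch is ordering: if pages are named without zero-padding, a natural sort keeps page 2 before page 10. A sketch, where collect_images and the extension filter are assumptions:

```python
import os
import re
from typing import List

# Assumption: filter by file extension rather than by MIME type.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def _natural_key(path: str) -> list:
    # "doc10.jpg" -> ["doc", 10, ".jpg"], so numeric parts compare as numbers
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", path)]


def collect_images(directory: str) -> List[str]:
    """Return all image files under `directory` in natural sort order."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(root, name))
    return sorted(paths, key=_natural_key)
```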
Below is a take on my comments from above.
But I don’t have any way to test the above code currently.