Potential memory leak / infinite loop scenario
So I have a problem which is quite similar to #923. There is an HTML document that drives CPU usage up to 100% and causes memory usage issues when I call the .write_pdf function on it.
I have actually spent the last few days creating a quasi-minimal HTML document with cleaned data to reproduce the problem and share it here, and this is it:
Some of the css markers in there might not be fully needed to reproduce the problem. I hope you enjoy the images I’ve included 😃. The thing which is important to replicate the issue is that those images are of certain sizes (I’m not sure if it’s related only to width, or to both width and height of the images).
The problem occurs on an env with latest WeasyPrint installed:
pip freeze
cairocffi==1.1.0
CairoSVG==2.4.2
cffi==1.14.1
cssselect2==0.3.0
defusedxml==0.6.0
html5lib==1.1
Pillow==7.2.0
pkg-resources==0.0.0
pycparser==2.20
Pyphen==0.9.5
six==1.15.0
tinycss2==1.0.2
WeasyPrint==51
webencodings==0.5.1
The Python version I’ve tested it on is 3.6.9. I have also tested it on Python 3.8.4 with the same results (the production env is running on that one).
OS is Ubuntu 18.04:
uname -a
Linux ubuntu-bionic 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I have managed to “fix” the problem for now by setting the width of the table element to 95% instead of 100%.
Steps to replicate the problem in python console:
content = """<content from the .txt file attached to this bug report>"""
from weasyprint import HTML
from tempfile import NamedTemporaryFile
html = HTML(string=content, encoding="utf-8")
temp_file = NamedTemporaryFile()
html.write_pdf(target=temp_file)
Maybe there’s some way of raising an error when an infinite loop of operations is detected (e.g. limiting the number of calls that can be made to a certain function), or some other fail-safe mechanism that will not allow memory usage to keep rising? I understand that this is quite a specific (and pretty weird) edge case, but I’m a bit worried that there are more of those around.
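Until something like that exists in the library, one way to get a fail-safe from the calling side is to render in a separate process with a time limit and, on POSIX systems, an address-space limit. The following is only a sketch under assumptions: `run_guarded`, `render_pdf_guarded`, `RENDER_SCRIPT` and the default limits are hypothetical names and values chosen for illustration, not part of WeasyPrint. It relies on `HTML(filename=...)` and `write_pdf(target=...)` from WeasyPrint, and on `subprocess.run` and `resource.setrlimit` from the standard library.

```python
import subprocess
import sys

# Hypothetical child script: renders one HTML file to a PDF file.
# The paths arrive via sys.argv, so file names need no quoting.
RENDER_SCRIPT = (
    "import sys\n"
    "from weasyprint import HTML\n"
    "HTML(filename=sys.argv[1]).write_pdf(target=sys.argv[2])\n"
)


def run_guarded(script, args=(), timeout_s=60, max_memory_bytes=None):
    """Run `script` in a fresh Python process with a time limit.

    Raises subprocess.TimeoutExpired if the child runs longer than
    `timeout_s` seconds (the child is killed first), and
    subprocess.CalledProcessError if it exits with a non-zero status.
    `max_memory_bytes` is optional and POSIX-only.
    """

    def limit_memory():
        # Runs in the child just before the interpreter starts.
        import resource

        resource.setrlimit(
            resource.RLIMIT_AS, (max_memory_bytes, max_memory_bytes)
        )

    return subprocess.run(
        [sys.executable, "-c", script, *args],
        timeout=timeout_s,
        preexec_fn=limit_memory if max_memory_bytes else None,
        check=True,
    )


def render_pdf_guarded(html_path, pdf_path, timeout_s=60, max_memory_bytes=None):
    """Render `html_path` to `pdf_path` without risking the parent process."""
    return run_guarded(
        RENDER_SCRIPT, [html_path, pdf_path], timeout_s, max_memory_bytes
    )
```

A caller would wrap `render_pdf_guarded("report.html", "report.pdf", timeout_s=120)` in a `try` block and treat `subprocess.TimeoutExpired` as "this document cannot be rendered". The trade-off is the cost of starting one extra interpreter per document, in exchange for a hard upper bound on CPU time that does not depend on any check inside the layout code.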
Issue Analytics
- Created 3 years ago
- Comments: 5 (3 by maintainers)
This bug is kind of fixed by da146c639ad6e7c91c5feb04dde0f0afc14eba7e in the master branch.
Instead of looping forever, WeasyPrint now generates two pages for the table: the first page contains the table header only, and the second page contains the header and the images.
The reason is that the row with the images, together with the header above it, is taller than the page, and WeasyPrint is unable to paginate/split rows which cross the page margin (see #36).
When a row doesn’t fit on the current page, WeasyPrint pushes the row onto the next page. But the header has already been placed on the page that was just generated.
Before da146c639ad6e7c91c5feb04dde0f0afc14eba7e, WeasyPrint never ceased to push the large row forward to the next page…
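To make the loop easier to picture, here is a toy model of the behaviour described above. It is not WeasyPrint's code and does not reproduce the logic of the commit: `paginate`, `force_progress` and `max_pages` are hypothetical names, heights are in arbitrary units, and it assumes the header is repeated at the top of every page and that rows cannot be split.

```python
def paginate(row_heights, page_height, header_height,
             force_progress, max_pages=20):
    """Distribute unsplittable table rows over pages (toy model).

    Each page is a list that starts with "header", followed by the
    indexes of the rows placed on it. With force_progress=False a row
    taller than the space left under the header is pushed forward
    forever; max_pages turns that into a RuntimeError.
    """
    pages = []
    i = 0
    stalled = False  # True when the previous page held the header only
    while i < len(row_heights):
        if len(pages) >= max_pages:
            raise RuntimeError("no progress: row %d never fits" % i)
        page, used = ["header"], header_height
        while i < len(row_heights):
            fits = used + row_heights[i] <= page_height
            # After a header-only page, place the row even if it overflows.
            forced = force_progress and stalled and len(page) == 1
            if not (fits or forced):
                break  # push the row to the next page
            page.append(i)
            used += row_heights[i]
            i += 1
        stalled = len(page) == 1
        pages.append(page)
    return pages
```

With a page height of 1000, a header of 200 and a single row of 900, `paginate([900], 1000, 200, force_progress=True)` produces a header-only page followed by a page with the header and the row, which matches the two-page output described above. With `force_progress=False` the same input keeps producing header-only pages until the `max_pages` guard raises.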
Thanks to both of you!