snap not working on some pages
Hi,
I'm currently trying to automate downloading a newspaper into my archive, because each issue is only available online for a limited time (7 days). I decided to take some screenshots first to check whether things work as I expect. Unfortunately, in headless mode the snap step hangs, and both PhantomJS and Chrome sit at 100% CPU. I also tried enabling debug mode, but it didn't give me much information. I'm using headless mode because the normal (visible) mode apparently cannot render the later pages (the snap is just empty; I'd say those pages are relatively complex).
tagui@51e7de6e5311:/s$ /tagui/src/tagui sz headless debug
START - automation started - Tue May 22 2018 15:24:56 GMT+0000 (UTC)
[info] [phantom] Starting...
[info] [phantom] Running suite: 18 steps
[debug] [phantom] opening url: about:blank, HTTP GET
[debug] [phantom] url changed to ""
[debug] [phantom] Successfully injected Casper client-side utilities
https://epaper.sueddeutsche.de/login - SZID - Login
[info] [phantom] Step anonymous 2/18: done in 1953ms.
wait 10 seconds
[info] [phantom] Step anonymous 3/18: done in 1958ms.
[info] [phantom] Step _step 4/18: done in 1977ms.
[info] [phantom] wait() finished waiting for 10000ms.
type id_login as username
[info] [phantom] Step _step 5/19: done in 11994ms.
[info] [phantom] waitFor() finished in 229ms.
[info] [phantom] Step then 6/20: done in 13154ms.
type id_password as password
[info] [phantom] Step anonymous 7/20: done in 13156ms.
[info] [phantom] Step _step 8/21: done in 13176ms.
[info] [phantom] waitFor() finished in 224ms.
[info] [phantom] Step then 9/22: done in 14330ms.
click authentication-button
[info] [phantom] Step anonymous 10/22: done in 14331ms.
[info] [phantom] Step _step 11/23: done in 14351ms.
[info] [phantom] waitFor() finished in 224ms.
[info] [phantom] Step then 12/24: done in 16600ms.
wait 10 seconds
[info] [phantom] Step anonymous 13/24: done in 16605ms.
[info] [phantom] Step _step 14/24: done in 16625ms.
[info] [phantom] wait() finished waiting for 10000ms.
snap page to page1.pdf
(here it hangs, with both processes at 100% CPU)
The flow looks like this. I added the long wait times to see whether they help.
https://epaper.sueddeutsche.de/login
wait 10 seconds
type id_login as username
type id_password as password
click authentication-button
wait 10 seconds
snap page to page1.pdf
https://epaper.sueddeutsche.de/Stadtausgabe/2018-05-22
wait 10 seconds
snap page to page4.pdf
click issue__cover
wait 10 seconds
snap page to page2.pdf
click sz-daily-download-thumb-tray-control
click //a[text()="Ganze Ausgabe speichern"]
wait 10 seconds
snap page to page3.pdf
I'm using the https://raw.githubusercontent.com/tebelorg/Tump/master/TagUI_Linux.zip package downloaded yesterday.
I run TagUI in Docker, which lets me push it to my GitLab and schedule a recurring job without worrying about the system environment; it can run on any available runner. The Dockerfile looks like this:
FROM debian:latest
RUN apt-get update \
&& apt-get -y install \
php-cli \
python \
unzip \
wget \
curl \
procps \
&& wget https://raw.githubusercontent.com/tebelorg/Tump/master/TagUI_Linux.zip \
&& unzip TagUI_Linux.zip \
&& rm TagUI_Linux.zip \
&& wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
&& (dpkg -i google-chrome-stable_current_amd64.deb || true) \
&& apt-get install -f -y \
&& dpkg -i google-chrome-stable_current_amd64.deb \
&& rm google-chrome-stable_current_amd64.deb \
&& apt-get -y remove unzip \
&& rm -rf /var/lib/apt/lists
RUN adduser tagui && chmod +r -R /tagui && chown tagui -R /tagui
USER tagui
WORKDIR /s
#ENTRYPOINT ["/tagui/src/tagui"]
I start the container with docker run -it --rm --privileged --shm-size 256m -v "$PWD/s":/s tagui bash
and then invoke TagUI inside it with /tagui/src/tagui sz headless debug
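Since the hung snap step keeps both processes at 100% CPU indefinitely, a scheduled GitLab job may want a hard time limit around the TagUI call so it does not block a runner. The following is a minimal sketch, assuming GNU coreutils `timeout` is available in the image (it ships with Debian); `run_with_limit` is a hypothetical helper name and the limits are arbitrary. It does not fix the hang itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, but stop it once it exceeds a time limit.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
  local limit="$1"
  shift
  local status=0
  # Send TERM after $limit seconds; if the command is still alive 10 s later, send KILL.
  timeout --kill-after=10 "$limit" "$@" || status=$?
  # GNU timeout exits with 124 on a timeout, or 137 (128+9) if KILL was needed.
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}
```

For example, inside the container: run_with_limit 600 /tagui/src/tagui sz headless debug. Whether both PhantomJS and Chrome are actually cleaned up after a timeout is worth confirming with ps on the runner (procps is already installed in the image).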
Edit: I tried a minimal flow that snaps google.com, and that works fine:
tagui@f2de344576a0:/s$ cat google
https://www.google.com
snap page to google.pdf
~/D/d/d/tagui docker run -it --rm --privileged --shm-size 256m -v "$PWD/s":/s tagui bash
tagui@f2de344576a0:/s$ /tagui/src/tagui google headless debug
START - automation started - Tue May 22 2018 15:56:50 GMT+0000 (UTC)
[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: about:blank, HTTP GET
[debug] [phantom] url changed to ""
[debug] [phantom] Successfully injected Casper client-side utilities
https://www.google.com - Google
[info] [phantom] Step anonymous 2/4: done in 1653ms.
snap page to google.pdf
[info] [phantom] Step anonymous 3/4: done in 1763ms.
https://www.google.com/ - Google
FINISH - automation finished - 2.0s
[info] [phantom] Step anonymous 4/4: done in 1967ms.
[info] [phantom] Done 4 steps in 1967ms
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"
Issue created 5 years ago; 15 comments (8 by maintainers).
Top GitHub Comments
OK, I'll look at it again another day and see whether I can find out something new or come up with another idea.
Alternatively, with visible Chrome, users can manually allow pop-ups for the website while the flow is running, using the prompt in the top right-hand corner of the Chrome window.
Not sure why it is not working for that particular website. By default, TagUI sets the download path to the flow's directory for any webpage opened with an http or https step.
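Because snaps and downloads land in the flow directory by default, and the visible-mode snaps in this issue came out empty, a scheduled job could check that directory after the run and fail loudly when an expected file is missing or zero-sized. This is a minimal sketch; `check_outputs` is a hypothetical helper, and the file names in the usage example are simply the ones from the flow above.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check: every expected output file must exist
# in the given directory and be non-empty.
# Usage: check_outputs DIR FILE [FILE...]
check_outputs() {
  local dir="$1"
  shift
  local missing=0
  local name
  for name in "$@"; do
    # -s is true only if the file exists and its size is greater than zero.
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, inside the container after TagUI finishes: check_outputs /s page1.pdf page2.pdf page3.pdf page4.pdf. A non-zero exit status then marks the scheduled job as failed.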