Released version 2.1.8 failed on March-28-2021

Description

pip3 install git+https://github.com/PaulMcInnis/JobFunnel.git@2.1.8

I have been using this for a couple of years now. For some reason, this failed. What I have done so far:

  1. Deleted all the data files in search (master_list.csv, jobfunnel.log, jobs_2021-03-22.pkl, jobs_2021-03-28.pkl, filter_list.json).
  2. Disabled adblocker.

Error

admin@Admins-MacBook-Pro ~ % bash job.sh                                                        
finding you jobs
jobfunnel initialized at 2021-03-28
no master-list, filter-list was not updated
jobfunnel indeed to pickle running @ 2021-03-28
failed to scrape Indeed: 'NoneType' object has no attribute 'contents'
jobfunnel monster to pickle running @ 2021-03-28
failed to scrape Monster: 'NoneType' object has no attribute 'text'
jobfunnel glassdoor to pickle running @ 2021-03-28
failed to scrape GlassDoor: 'NoneType' object has no attribute 'text'
Traceback (most recent call last):
  File "/usr/local/bin/funnel", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/jobfunnel/__main__.py", line 48, in main
    jp.update_masterlist()
  File "/usr/local/lib/python3.8/site-packages/jobfunnel/jobfunnel.py", line 358, in update_masterlist
    raise ValueError("No scraped jobs, cannot update masterlist")
ValueError: No scraped jobs, cannot update masterlist
DONE
admin@Admins-MacBook-Pro ~ % 

Apart from Google and YouTube insisting on a captcha every 3 hours for my IP, this has become unusable. The traffic coming from my machine is this code running.
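For context, an error such as 'NoneType' object has no attribute 'text' usually means a lookup for a page element returned nothing, and the code then read an attribute from that missing result. This can happen when a site changes its markup or serves a captcha page instead of results. The sketch below reproduces the pattern with the standard library's xml.etree.ElementTree as a stand-in for an HTML parser. The markup, the element names, and both helper functions are hypothetical and are not taken from JobFunnel.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup: NEW_PAGE has no <title> element, mimicking a site
# layout change or a captcha page returned instead of search results.
OLD_PAGE = "<job><title>Phlebotomist</title></job>"
NEW_PAGE = "<job><heading>Phlebotomist</heading></job>"


def title_unsafe(markup: str) -> str:
    # find() returns None when nothing matches, so reading .text on the
    # result raises AttributeError for NEW_PAGE.
    return ET.fromstring(markup).find("title").text


def title_safe(markup: str) -> Optional[str]:
    # Check for None explicitly before touching the element.
    node = ET.fromstring(markup).find("title")
    return node.text if node is not None else None


if __name__ == "__main__":
    print(title_safe(OLD_PAGE))
    print(title_safe(NEW_PAGE))
    try:
        title_unsafe(NEW_PAGE)
    except AttributeError as err:
        print(f"failed to scrape: {err}")
```

A guard like title_safe only hides the symptom. If the selectors no longer match the site, the scraper itself needs updating, which is consistent with a newer release working where 2.1.8 did not.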

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
Nllii commented, Mar 30, 2021

Awesome, thanks. Released version 3.0.2 works.

bash job.sh       
finding you jobs
[2021-03-29 19:31:31,209] [INFO] JobFunnel: Scraping local providers with: ['IndeedScraperUSAEng']
[2021-03-29 19:31:40,281] [INFO] IndeedScraperUSAEng: Found 4 pages of search results for query=Phlebotomist
[2021-03-29 19:31:48,047] [INFO] IndeedScraperUSAEng: Scraped 188 job listings from search results pages
100%|#######################################################| 188/188 [02:18<00:00,  1.35it/s]
[2021-03-29 19:34:07,240] [INFO] JobFunnel: Completed all scraping, found 188 new jobs.
[2021-03-29 19:34:07,394] [INFO] JobFunnel: Done. View your current jobs in demo_job_search_results/demo_search.csv
DONE
1 reaction
Nllii commented, Mar 30, 2021

"Are you able to test the current master of repository?"

Yes, I checked out the master repo last year, but I had to revert to 2.1.8. 2.1.8 was faster and straightforward, nothing fancy.

I don’t know if this helps, but if the end user already has a copy of 2.1.8 on Kaggle and re-runs it, this is the outcome: https://www.kaggle.com/bellphegor/job-search

  1. It filters the jobs with --max_listing_days 2 and finds jobs on Indeed to add to the CSV file after filtering.
  2. Then it fails when the end user runs it again.
  3. Why does it fail the second time it is run? I will try to get the current master list and filter lists from Kaggle to reproduce the outcome.
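To illustrate what a filter such as --max_listing_days 2 does conceptually, here is a minimal sketch that keeps only jobs posted within the last N days. It is not JobFunnel's implementation. The function name, the "post_date" key, and the sample jobs are all assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Dict, List


def filter_by_max_days(jobs: List[Dict], max_listing_days: int, today: date) -> List[Dict]:
    """Keep jobs whose (hypothetical) post_date is at most max_listing_days old."""
    cutoff = today - timedelta(days=max_listing_days)
    return [job for job in jobs if job["post_date"] >= cutoff]


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = [
        {"title": "Phlebotomist I", "post_date": date(2021, 3, 28)},
        {"title": "Phlebotomist II", "post_date": date(2021, 3, 20)},
    ]
    for job in filter_by_max_days(sample, 2, today=date(2021, 3, 29)):
        print(job["title"])
```

Passing today explicitly, rather than calling date.today() inside the function, keeps repeated runs reproducible. That helps when checking why a second run behaves differently from the first.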

jobfunnel indeed to pickle running @ 2021-03-29
Found 4 indeed results for query=phlebotomist
getting indeed page 0 : http://www.indeed.com/jobs?q=phlebotomist&l=HOUSTON%2C+TX&radius=25&limit=50&filter=0&start=0
getting indeed page 1 : http://www.indeed.com/jobs?q=phlebotomist&l=HOUSTON%2C+TX&radius=25&limit=50&filter=0&start=50
getting indeed page 2 : http://www.indeed.com/jobs?q=phlebotomist&l=HOUSTON%2C+TX&radius=25&limit=50&filter=0&start=100
getting indeed page 3 : http://www.indeed.com/jobs?q=phlebotomist&l=HOUSTON%2C+TX&radius=25&limit=50&filter=0&start=150
date_filter running

delay of 10.00s, getting indeed search: http://www.indeed.com/viewjob?jk=ac44060dadbe32b3
delay of 10.00s, getting indeed search: http://www.indeed.com/viewjob?jk=a028d791865bb433
delay of 10.00s, getting indeed search: http://www.indeed.com/viewjob?jk=db0271a737679ea2
indeed scrape job took 206.649s
jobfunnel monster to pickle running @ 2021-03-29
failed to scrape Monster: 'NoneType' object has no attribute 'text'
no jobs filtered, missing search/data/filter_list.json
removed 0 jobs in blacklist from master-list
Found and removed 6 re-posts/duplicates via TFIDF cosine similarity!
no masterlist detected, added 5 jobs to search/master_list.csv
done. see un-archived jobs in search/master_list.csv
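
The log above mentions removing re-posts "via TFIDF cosine similarity". As a rough illustration of that general technique, and not of JobFunnel's actual code, the sketch below builds TF-IDF vectors for job descriptions in plain Python and flags pairs whose cosine similarity meets a threshold. The whitespace tokenizer, the smoothed IDF formula, the 0.75 threshold, and the sample descriptions are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def find_duplicates(descriptions: List[str], threshold: float = 0.75) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    vectors = tfidf_vectors(descriptions)
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


if __name__ == "__main__":
    # Invented descriptions for illustration only.
    posts = [
        "draw blood samples from patients in clinic",
        "draw blood samples from patients in clinic",
        "drive delivery truck across town",
    ]
    print(find_duplicates(posts))
```

Comparing every pair is quadratic in the number of listings. That is fine for a few hundred jobs, but a vectorized library would be the usual choice at larger scale.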