Glassdoor.com is not working

Issue Template

Description

Just today I discovered that JobFunnel fails when scraping Glassdoor.com.

Steps to Reproduce

  1. Comment out Indeed and Monster in the providers option in settings.yaml, like so:
        # - 'Indeed'
        # - 'Monster'
        - 'GlassDoor'
  2. Run JobFunnel: funnel -s settings.yaml

Expected behavior

Scrape Glassdoor.com and store jobs in master_list.csv

Actual behavior

JobFunnel output:

jobfunnel initialized at 2020-05-05
no master-list, filter-list was not updated
jobfunnel glassdoor to pickle running @ 2020-05-05
failed to scrape GlassDoor: 'NoneType' object has no attribute 'text'
Traceback (most recent call last):
  File "/usr/local/bin/funnel", line 11, in <module>
    load_entry_point('JobFunnel==2.1.6', 'console_scripts', 'funnel')()
  File "/usr/local/lib/python3.6/dist-packages/jobfunnel/__main__.py", line 55, in main
    jf.update_masterlist()
  File "/usr/local/lib/python3.6/dist-packages/jobfunnel/jobfunnel.py", line 291, in update_masterlist
    raise ValueError('No scraped jobs, cannot update masterlist')
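
The "'NoneType' object has no attribute 'text'" message in the output above is what Python reports when an element lookup returns None and the code then reads .text from it. Below is a minimal sketch of that failure mode and a guard against it. It uses the standard library's ElementTree as a stand-in for BeautifulSoup (whose find() also returns None on a miss). The page strings, the 'count' class, and read_job_count are hypothetical and are not taken from glassdoor.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins: a page that only contains a script (as a
# JavaScript-gated response might) and a page with the expected element.
EMPTY_PAGE = "<html><body><script>/* challenge */</script></body></html>"
FULL_PAGE = "<html><body><div class='count'>42 jobs</div></body></html>"


def read_job_count(page: str) -> str:
    """Return the text of the job-count element, failing loudly if absent."""
    root = ET.fromstring(page)
    node = root.find(".//div[@class='count']")
    # Without this check, node.text raises AttributeError:
    # 'NoneType' object has no attribute 'text'
    if node is None:
        raise ValueError("job-count element not found; the page may need JavaScript")
    return node.text


if __name__ == "__main__":
    print(read_job_count(FULL_PAGE))
    try:
        read_job_count(EMPTY_PAGE)
    except ValueError as err:
        print("scrape failed:", err)
```

A check like this turns the opaque AttributeError into a message that points at the likely cause.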

Environment

  • Operating system and version: Linux Mint (Ubuntu 18.04)
  • Desktop Environment and/or Window Manager: Cinnamon
  • Tested on .com (United States domain) and .ca (Canada domain)

NOTE: I also ran JobFunnel in an isolated Docker container (Ubuntu 18.04) and the issue persisted.

I discovered this while inspecting glassdoor.py for testing. I will try my best to tackle this issue in the upcoming days. Hopefully we’ll fix it soon!

Cheers!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 6

Top GitHub Comments

studentbrad commented, May 6, 2020 (2 reactions)

I can confirm that Selenium is a solution to this problem. Glassdoor seems to run JavaScript before bringing up the page, which is why we can't get any HTML data. With a webdriver you can bring up the page with minimal effort, but it slows down scraping.

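from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import WebDriverException

# initialize the webdriver: try Chrome first, then fall back to Firefox
# (Selenium raises WebDriverException, not FileNotFoundError, when a driver is missing)
try:
    self.driver = webdriver.Chrome()
except WebDriverException:
    try:
        self.driver = webdriver.Firefox()
    except WebDriverException:
        raise FileNotFoundError('Sorry, chromedriver or geckodriver must be installed to scrape')

# get the search url
search = self.get_search_url()

# load the page in the browser so that its JavaScript runs
self.driver.get(search)

# create the soup base from the rendered page source (parser set by self.bs4_parser, e.g. lxml)
soup_base = BeautifulSoup(self.driver.page_source, self.bs4_parser)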

You must first implement the get method for Glassdoor, as I have done:

if method == 'get':
    # form job search url
    search = ('https://www.glassdoor.{0}/Job/jobs.htm?'
              'clickSource=searchBtn&sc.keyword={1}&locT=C&locId={2}&jobType=&radius={3}'.format(
        self.search_terms['region']['domain'],
        self.query,
        location_response[0]['locationId'],
        self.convert_radius(
            self.search_terms['region']['radius'])))
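
The URL above interpolates the query straight into the string, so a keyword containing spaces or '&' would not be escaped. As a hedged alternative, here is a minimal sketch that builds the same query string with urllib.parse.urlencode. The parameter names are copied from the snippet above. The function name build_search_url and the example location id are hypothetical and are not part of JobFunnel.

```python
from urllib.parse import urlencode


def build_search_url(domain, keyword, location_id, radius):
    """Build a Glassdoor job-search URL with percent-encoded parameters."""
    params = {
        "clickSource": "searchBtn",
        "sc.keyword": keyword,
        "locT": "C",
        "locId": location_id,
        "jobType": "",
        "radius": radius,
    }
    # urlencode escapes spaces, '&', '+', and other reserved characters.
    return "https://www.glassdoor.{0}/Job/jobs.htm?{1}".format(
        domain, urlencode(params))


if __name__ == "__main__":
    # 1234567 is a made-up location id, used only for illustration.
    print(build_search_url("ca", "data scientist", 1234567, 25))
```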

We can keep the other scrapers unchanged and modify only the Glassdoor one. If the user enables Glassdoor scraping in the YAML settings, we will have to warn them up front that chromedriver or geckodriver is required.
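
One way to give that warning is to check for the driver binaries before scraping starts. The sketch below uses shutil.which and assumes the providers list has already been read from settings.yaml. The function driver_warning and its message are hypothetical. The provider name 'GlassDoor' is taken from the settings shown in the issue.

```python
import shutil

# Executable names that Selenium looks for on PATH for Chrome and Firefox.
DRIVER_BINARIES = ("chromedriver", "geckodriver")


def driver_warning(providers, which=shutil.which):
    """Return a warning string if GlassDoor is enabled but no driver is found.

    `which` is injectable so that the check can be exercised without touching
    the real PATH; it defaults to shutil.which.
    """
    if "GlassDoor" not in providers:
        return None
    if any(which(name) for name in DRIVER_BINARIES):
        return None
    return ("GlassDoor scraping requires chromedriver or geckodriver "
            "on PATH; neither was found.")


if __name__ == "__main__":
    message = driver_warning(["GlassDoor"])
    if message:
        print("warning:", message)
```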

Check out my branch to see the changes: https://github.com/PaulMcInnis/JobFunnel/tree/studentbrad/glassdoor

studentbrad commented, May 6, 2020 (1 reaction)

No problem! I’m really thankful that you’re willing to help out on this 😄
