misuse of abstract base classes + monolithic JobFunnel class + schema validation + localisation


Description

Currently we are using the JobFunnel class for too much. I want to break it down into the following:

import datetime
from abc import ABC, abstractmethod
from typing import List


class Job:
    def __init__(self, title: str, company: str, location: str, tags: List[str],
                 post_date: datetime.date, key_id: str, url: str) -> None:
        ...


class Scraper(ABC):

    @abstractmethod
    def scrape(self) -> List[Job]:
        pass


def main():

    # instantiate scrapers

    # run filter on list of Job

    # dump pickle

    # write out CSV
    ...
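To make the proposed split concrete, here is a minimal, self-contained sketch of how the pieces could fit together. It is not JobFunnel's actual implementation: `StaticScraper`, `filter_jobs`, `run`, and the output file names are hypothetical, and the stand-in scraper returns canned data instead of making network requests. It also pickles plain dicts rather than Job instances, which is an assumption made here so the cache file does not depend on the class definition.

```python
import csv
import datetime
import pickle
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, fields
from pathlib import Path
from typing import List


@dataclass
class Job:
    title: str
    company: str
    location: str
    tags: List[str]
    post_date: datetime.date
    key_id: str
    url: str


class Scraper(ABC):
    @abstractmethod
    def scrape(self) -> List[Job]:
        """Return every job this source can find."""


class StaticScraper(Scraper):
    """Hypothetical stand-in for a real scraper; returns canned data."""

    def scrape(self) -> List[Job]:
        day = datetime.date(2020, 8, 29)
        return [
            Job("Python Developer", "Acme", "Waterloo, ON", ["python"], day,
                "a1", "https://example.com/a1"),
            Job("Sales Associate", "Acme", "Waterloo, ON", [], day,
                "b2", "https://example.com/b2"),
        ]


def filter_jobs(jobs: List[Job], blocked_words: List[str]) -> List[Job]:
    """Drop jobs whose title contains any blocked word (case-insensitive)."""
    blocked = [word.lower() for word in blocked_words]
    return [job for job in jobs
            if not any(word in job.title.lower() for word in blocked)]


def run(out_dir: Path) -> List[Job]:
    # instantiate scrapers and collect their results
    scrapers: List[Scraper] = [StaticScraper()]
    jobs = [job for scraper in scrapers for job in scraper.scrape()]

    # run filter on list of Job
    jobs = filter_jobs(jobs, blocked_words=["sales"])

    # dump pickle (plain dicts, see the assumption noted above)
    with open(out_dir / "jobs.pkl", "wb") as pkl_file:
        pickle.dump([asdict(job) for job in jobs], pkl_file)

    # write out CSV, one column per Job field
    with open(out_dir / "jobs.csv", "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=[fld.name for fld in fields(Job)])
        writer.writeheader()
        for job in jobs:
            writer.writerow(asdict(job))
    return jobs
```

With this shape, main() only orchestrates: adding a new job source means adding a Scraper subclass, not touching the filtering or output code.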

Note: if I get to it, I’d also like our filters to be an ABC.
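As a rough illustration of what an ABC for filters might look like, consider the sketch below. The names `BaseFilter`, `keep`, `apply`, `CompanyBlockList`, `TitleKeyword`, and `apply_filters` are all hypothetical, and the two-field Job is a trimmed-down stand-in so the example stays self-contained.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Job:
    # Trimmed-down stand-in for the full Job object.
    title: str
    company: str


class BaseFilter(ABC):
    @abstractmethod
    def keep(self, job: Job) -> bool:
        """Return True if the job should stay in the results."""

    def apply(self, jobs: Iterable[Job]) -> List[Job]:
        # Shared logic lives in the base class; subclasses only decide keep().
        return [job for job in jobs if self.keep(job)]


class CompanyBlockList(BaseFilter):
    def __init__(self, blocked: Iterable[str]) -> None:
        self.blocked = {name.lower() for name in blocked}

    def keep(self, job: Job) -> bool:
        return job.company.lower() not in self.blocked


class TitleKeyword(BaseFilter):
    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()

    def keep(self, job: Job) -> bool:
        return self.keyword in job.title.lower()


def apply_filters(jobs: Iterable[Job], filters: Iterable[BaseFilter]) -> List[Job]:
    """Run each filter in order; a job must pass all of them to survive."""
    remaining = list(jobs)
    for job_filter in filters:
        remaining = job_filter.apply(remaining)
    return remaining
```

A special-case filter (such as a similarity filter that needs to see the whole list at once) could override `apply` instead of `keep`, which is one reason to keep both methods on the base class.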

Steps to Reproduce

This is a structural technical debt issue. (n/a)

Expected behavior

The abstract base class should not be halfway abstract. We need separation between JobFunnel, main(), and the inherited scrapers.
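The difference between a "halfway abstract" base class and a properly abstract one can be shown with a small sketch. The class names below are hypothetical and do not come from the JobFunnel code base. A base class that inherits from ABC but marks nothing with @abstractmethod can still be instantiated, so a missing method only surfaces when it is called. With @abstractmethod, Python refuses to instantiate an incomplete subclass at all.

```python
from abc import ABC, abstractmethod


class HalfAbstract(ABC):
    """Inherits from ABC, but nothing is marked abstract."""

    def scrape(self):
        # The contract is enforced only when the method is finally called.
        raise NotImplementedError


class FullyAbstract(ABC):
    @abstractmethod
    def scrape(self):
        """Subclasses must implement this."""


class Forgetful(FullyAbstract):
    """Subclass that forgot to implement scrape()."""


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if Python raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here `can_instantiate(HalfAbstract)` is True while `can_instantiate(Forgetful)` is False, so the fully abstract version catches the mistake at construction time.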

Actual behavior

JobFunnel being monolithic and half-abstract has allowed us to implement three script-like scrapers which share too many methods, without an actual Job object.

Environment

n/a


Current Status:

  • Job Object

  • Support for Internationalization

  • BaseScraper with get/set scraping logic

  • New YAML and CLI implemented

  • Schema Validation with Cerberus

  • Caching

  • Filtering with lists

  • Indeed

  • Monster

  • GlassDoorStatic (works but seems like it has bugs so fixing this).

  • Wage Scraping

  • GlassDoor Dynamic/Driven

  • Duplicates list file support

  • Integrate TFIDF similarity filter (special case filter)

  • Prevent writing out empty CSVs in --no-scrape mode

  • Prevent delayed get/set for jobs which fail filters

  • Fix multi-page Monster scraping

  • handle duplicated jobs special case

  • Make JobFilter class

  • Add TAG scraping to Monster

  • Implement job filtering as own class

  • Fix paths from the -s YAML being overwritten by CLI defaults

  • Fix concurrency issue with dependencies for get/set

  • Monkey / general usability testing

  • Update main README

  • Update other READMEs + tutorials

  • Add versioning to cache files (i.e., a wrapper for the dict with metadata)

  • Review various FIXMEs in the code

  • Fix build (Travis CI)

  • Test setup.py

  • Fix demo GIF

  • Document how to write new scrapers with localization
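The cache-versioning item above (a wrapper for the dict with metadata) could take a shape like the following sketch. `CACHE_VERSION`, `save_cache`, `load_cache`, and the payload layout are assumptions for illustration only, not JobFunnel's actual cache format.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

CACHE_VERSION = 1  # hypothetical; bump whenever the cached layout changes


def save_cache(path: Path, jobs: Dict[str, Any]) -> None:
    """Wrap the jobs dict with metadata before pickling it."""
    payload = {"version": CACHE_VERSION, "jobs": jobs}
    with open(path, "wb") as cache_file:
        pickle.dump(payload, cache_file)


def load_cache(path: Path) -> Dict[str, Any]:
    """Load a cache file, rejecting files written with another layout."""
    with open(path, "rb") as cache_file:
        payload = pickle.load(cache_file)
    if not isinstance(payload, dict) or payload.get("version") != CACHE_VERSION:
        raise ValueError(f"Unsupported cache format in {path}")
    return payload["jobs"]
```

Failing loudly on a version mismatch lets the caller decide whether to re-scrape or migrate, rather than reading stale data with the wrong structure.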


Future work:

  • Google jobs scraper
  • Ycombinator job scraper
  • Assess the update experience from V2.0 --> V3.0, provide a guide
  • Cut a release
  • Add WAGE scraping to Indeed
  • Add REMOTE scraping to Indeed
  • Add REMOTE scraping to Monster

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 31 (22 by maintainers)

Top GitHub Comments

1 reaction
PaulMcInnis commented, Aug 29, 2020

@thebigG FYI, I’ve just fixed a bug with the CLI parser where YAML paths were not being respected.
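The thread does not say what the fix was. One common cause of this kind of bug is argparse defaults silently overriding values loaded from a settings file, and the sketch below shows one way to avoid that. Everything in it is hypothetical (the option names, `DEFAULTS`, `build_parser`, `resolve_config`), and a plain dict stands in for the parsed YAML so the example needs no third-party packages.

```python
import argparse
from typing import Any, Dict, List

# Hypothetical built-in defaults, applied last in priority.
DEFAULTS: Dict[str, Any] = {"master_csv": "master_list.csv", "cache_dir": ".cache"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not passed" apart from "passed explicitly".
    parser.add_argument("--master-csv", dest="master_csv", default=None)
    parser.add_argument("--cache-dir", dest="cache_dir", default=None)
    return parser


def resolve_config(yaml_settings: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Merge settings with priority: explicit CLI > YAML > defaults."""
    cli_args = vars(build_parser().parse_args(argv))
    config = dict(DEFAULTS)
    config.update(yaml_settings)
    # Only values the user actually typed on the command line may override.
    config.update({key: value for key, value in cli_args.items() if value is not None})
    return config
```

For example, `resolve_config({"master_csv": "my.csv"}, [])` keeps the YAML path, because the parser never injects a default of its own.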

1 reaction
thebigG commented, Aug 28, 2020

I have already started testing. Like I said before, I am starting with cli.py and will take it from there.

Love the new JobFunnel architecture by the way, great job 🚀
