misuse of abstract base classes + monolithic JobFunnel class + schema validation + localisation


Description

Currently we are using the JobFunnel class for too much. I want to break it down into the following:

import datetime
from abc import ABC, abstractmethod
from typing import List


class Job:
    def __init__(self, title: str, company: str, location: str, tags: List[str], post_date: datetime.date, key_id: str, url: str) -> None:
        ...


class Scraper(ABC):

    @abstractmethod
    def scrape(self) -> List[Job]:
        pass


def main():

    # instantiate scrapers

    # run filter on list of Job

    # dump pickle

    # write out CSV
    ...
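
One way the outline above might be filled in, as a minimal sketch only. `StaticScraper`, `run`, the blocked-word title filter, the canned job data and the output file names are all hypothetical stand-ins, not JobFunnel's actual code. `Job` is written as a dataclass here purely to keep the example short.

```python
import csv
import datetime
import pickle
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class Job:
    title: str
    company: str
    location: str
    tags: List[str]
    post_date: datetime.date
    key_id: str
    url: str


class Scraper(ABC):
    @abstractmethod
    def scrape(self) -> List[Job]:
        """Return the jobs found by this provider."""


class StaticScraper(Scraper):
    """Hypothetical scraper that returns canned data instead of hitting a website."""

    def scrape(self) -> List[Job]:
        day = datetime.date(2020, 8, 29)
        return [
            Job("Python Developer", "Acme", "Waterloo, ON", ["python"], day, "a1", "https://example.com/a1"),
            Job("Sales Associate", "Acme", "Waterloo, ON", [], day, "b2", "https://example.com/b2"),
        ]


def run(scrapers: List[Scraper], out_dir: Path, blocked_words: List[str]) -> List[Job]:
    # Collect jobs from every scraper.
    jobs = [job for scraper in scrapers for job in scraper.scrape()]
    # Filter: drop jobs whose title contains a blocked word (assumed rule).
    kept = [j for j in jobs if not any(w.lower() in j.title.lower() for w in blocked_words)]
    # Dump pickle; plain dicts are stored so the cache does not depend on the Job class.
    with open(out_dir / "jobs.pkl", "wb") as f:
        pickle.dump([asdict(j) for j in kept], f)
    # Write out CSV with one column per Job field.
    with open(out_dir / "jobs.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(Job.__dataclass_fields__))
        writer.writeheader()
        for job in kept:
            writer.writerow(asdict(job))
    return kept
```

With this split, `run` never needs to know which site a scraper targets; it only relies on `scrape()` returning `Job` objects.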

Note: if I get to it, I’d also like our filters to be an ABC.
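
A filter ABC could look roughly like the sketch below. The names `JobFilterBase`, `CompanyBlockList` and `apply_filters` are hypothetical, and jobs are represented as plain dicts only to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class JobFilterBase(ABC):
    """Hypothetical filter interface: one keep/drop decision per job."""

    @abstractmethod
    def keep(self, job: Dict[str, str]) -> bool:
        """Return True if the job should stay in the results."""


class CompanyBlockList(JobFilterBase):
    """Drops jobs posted by any company on a block list (case-insensitive)."""

    def __init__(self, blocked: List[str]) -> None:
        self.blocked = {name.lower() for name in blocked}

    def keep(self, job: Dict[str, str]) -> bool:
        return job["company"].lower() not in self.blocked


def apply_filters(jobs: List[Dict[str, str]], filters: List[JobFilterBase]) -> List[Dict[str, str]]:
    # A job survives only if every filter agrees to keep it.
    return [job for job in jobs if all(f.keep(job) for f in filters)]
```

Keeping the base class free of concrete logic means each new filter only has to answer one question, and the chaining lives in a single function.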

Steps to Reproduce

This is a structural technical debt issue. (n/a)

Expected behavior

The abstract base class should not be halfway abstract. We need separation between JobFunnel, main(), and the inherited scrapers.
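
To illustrate what a fully abstract base buys, here is a small sketch using Python's standard `abc` module. The class names and the `can_instantiate` helper are hypothetical and the scraper returns canned titles; the point is only that a subclass which forgets an abstract method cannot be instantiated.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseScraper(ABC):
    @abstractmethod
    def scrape(self) -> List[str]:
        """Return scraped job titles."""


class IncompleteScraper(BaseScraper):
    """Forgets to implement scrape()."""


class CompleteScraper(BaseScraper):
    def scrape(self) -> List[str]:
        return ["Python Developer"]


def can_instantiate(cls: type) -> bool:
    # ABCs raise TypeError on instantiation while abstract methods remain.
    try:
        cls()
    except TypeError:
        return False
    return True
```

A half-abstract base, by contrast, lets incomplete scrapers be created and only fail later, when a missing method is finally called.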

Actual behavior

JobFunnel being monolithic and half-abstract has allowed us to implement three script-like scrapers which share too many methods, without an actual Job object.

Environment

n/a

Current Status:

Job Object
Support for Internationalization
BaseScraper with get/set scraping logic
New YAML and CLI implemented
Schema Validation with Cerberus
Caching
Filtering with lists
Indeed
Monster
GlassDoorStatic (works, but appears to have bugs that are being fixed).
Wage Scraping
GlassDoor Dynamic/Driven
Duplicates list file support
Integrate TFIDF similarity filter (special case filter)
Prevent writing out empty CSVs in --no-scrape mode
Prevent delayed get/set for jobs which fail filters
Fix multi-page Monster scraping
Handle duplicated jobs special case
Make JobFilter class
Add TAG scraping to Monster
Implement job filtering as own class
Fix paths from -s yaml being overwritten with defaults with CLI
Fix concurrency issue with dependencies for get/set
Monkey / general usability testing
Update main README
Update other READMEs + tutorials
Add versioning to cache files (i.e. a wrapper for the dict with metadata)
Review various FIXMES in-code
Fix build (Travis CI)
Test setup.py
Fix demo GIF
Document how to write new scrapers with localization
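
The cache versioning item in the list above ("wrapper for dict with metadata") might be sketched as follows. `CACHE_VERSION`, `save_cache` and `load_cache` are hypothetical names, and the payload layout is an assumption rather than JobFunnel's actual cache format.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

CACHE_VERSION = 1  # hypothetical; bump whenever the cached layout changes


def save_cache(path: Path, jobs: Dict[str, Any]) -> None:
    # Wrap the jobs dict with metadata so readers can detect stale files.
    payload = {"version": CACHE_VERSION, "jobs": jobs}
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_cache(path: Path) -> Dict[str, Any]:
    with open(path, "rb") as f:
        payload = pickle.load(f)
    # Unwrapped caches from before versioning have no "version" key.
    found = payload.get("version") if isinstance(payload, dict) else None
    if found != CACHE_VERSION:
        raise ValueError(f"cache version {found!r} does not match expected {CACHE_VERSION}")
    return payload["jobs"]
```

Failing loudly on a version mismatch is one possible policy; migrating old caches in place would be another.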

Future work:

Google jobs scraper
Ycombinator job scraper
Assess the update experience from V2.0 --> V3.0, provide a guide
Cut a release
Add WAGE scraping to Indeed
Add REMOTE scraping to Indeed
Add REMOTE scraping to Monster

Issue Analytics

State:
Created 3 years ago
Comments: 31 (22 by maintainers)

Top GitHub Comments

1 reaction

PaulMcInnis commented, Aug 29, 2020

@thebigG FYI, I’ve just fixed a bug with the CLI parser where YAML paths were not being respected.

1 reaction

thebigG commented, Aug 28, 2020

I have already started testing. Like I said before, I'm starting with cli.py and will take it from there.

Love the new JobFunnel architecture by the way, great job 🚀
