Spider: Illinois Department of Corrections Advisory Board

URL: https://www2.illinois.gov/idoc/aboutus/advisoryboard/Pages/default.aspx
Spider Name: il_corrections
Agency Name: Illinois Department of Corrections Advisory Board

See the contribution guide for information on how to get started.

Some information will need to be parsed from PDFs, so it may be useful to look at the chi_human_relations spider for an example of how to handle this.
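As a rough illustration of the PDF step, here is a minimal sketch that assumes the schedule PDF has already been fetched and that meeting dates appear in it as full dates such as "March 4, 2021". The text extraction relies on pdfminer.six's `extract_text`; the helper names `pdf_bytes_to_text` and `parse_meeting_dates` and the date format are hypothetical, not taken from chi_human_relations.

```python
import re
from datetime import datetime
from io import BytesIO
from typing import List


def pdf_bytes_to_text(body: bytes) -> str:
    """Convert a PDF response body to plain text (requires pdfminer.six)."""
    # Imported lazily so the parsing helper below works without pdfminer installed.
    from pdfminer.high_level import extract_text

    return extract_text(BytesIO(body))


# Hypothetical format: assumes each meeting appears as a full date such as
# "March 4, 2021". \s+ also matches line breaks introduced by PDF extraction.
DATE_RE = re.compile(
    r"(January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+(\d{1,2}),\s+(\d{4})"
)


def parse_meeting_dates(text: str) -> List[datetime]:
    """Return every date found in the extracted text, in document order."""
    dates = []
    for month, day, year in DATE_RE.findall(text):
        dates.append(datetime.strptime(f"{month} {day} {year}", "%B %d %Y"))
    return dates
```

Inside a spider callback for the PDF request, this might be used as `parse_meeting_dates(pdf_bytes_to_text(response.body))`. Keeping the text parsing separate from the extraction makes it easy to test against plain strings.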

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 25 (8 by maintainers)

Top GitHub Comments

1 reaction
pjsier commented, Dec 14, 2020

@cherdeman Sure! Scrapy requests are made asynchronously, so if you need data from one response to be available in another method, you’ll need to chain callbacks so that they’re called in the right order.

You can see an example of this in chi_human_relations. The first request (in start_urls) goes to a department detail page. Because parse is the default callback, the detail page response is handled there: it looks for a PDF link and yields a request to that link, which is handled by _parse_schedule. _parse_schedule parses the response body and then chains another request to _parse_documents, so by the time _parse_documents runs, the schedule details are available and can be used to yield meetings.

From an initial look at your scraper, it seems like you could follow a similar pattern: first parse all of the links, then yield a request to each meeting page, pull the PDF, and then potentially yield back to the meeting (which is where this one might get a bit circular). Let me know if that’s helpful! The Scrapy docs are also generally good, so let me know if there’s anything in them I can help with.

1 reaction
pjsier commented, Dec 7, 2020

@cherdeman Great! I’ll assign you.

