problem with 'mailto' links


Mandatory

  • I read the documentation (readme and wiki).
  • I searched other issues (including closed issues) and could not find any to be related. If you find related issues post them below or directly add your issue to the most related one.

Related issues:

  • I did not find any related issues in news-please and scrapy

Describe your question

I am trying to crawl the site “http://www.ladigetto.it” and I get the following error from scrapy:

  File "/usr/local/lib/python3.6/dist-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/http/request/__init__.py", line 69, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: mailto:redazione@ladigetto.it

Looking at the content of the site's start page, I can see the href that triggers the error: <a href="mailto:redazione@ladigetto.it">redazione@ladigetto.it</a>

Is it possible to force news-please not to follow this kind of link? Can this be done through the configuration file?

Versions (please complete the following information):

  • OS: Ubuntu 16.04.6 LTS
  • Python Version: 3.6.8
  • news-please Version: 1.5.3

Intent (optional; we’ll use this info to prioritize upcoming tasks to work on)

  • personal
  • academic
  • business
  • other
  • Some information on your project:

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

5 reactions
tobiasstrauss commented, Aug 27, 2020

For me the problem was solved by defining an ignore_regex in the config.cfg:

  ignore_regex = "(mail[tT]o)|([jJ]avascript)|(tel)|(fax)"
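
To see what the suggested ignore_regex would filter out, here is a minimal standalone sketch. It does not use news-please or scrapy; the helper names `is_ignored` and `followable` and the sample links (other than the mailto link from the question) are hypothetical. It also assumes the pattern is matched at the start of each link with `re.match`, which may not be exactly how news-please applies it.

```python
import re

# Pattern copied from the comment above (the value of ignore_regex in config.cfg).
IGNORE_REGEX = r"(mail[tT]o)|([jJ]avascript)|(tel)|(fax)"


def is_ignored(href, pattern=IGNORE_REGEX):
    """Return True if the link starts with one of the ignored prefixes.

    Assumption: the pattern is anchored at the start of the link (re.match).
    """
    return re.match(pattern, href) is not None


def followable(hrefs, pattern=IGNORE_REGEX):
    """Keep only the links that the pattern does not exclude."""
    return [href for href in hrefs if not is_ignored(href, pattern)]


# Hypothetical sample of hrefs as they might appear on a start page.
links = [
    "mailto:redazione@ladigetto.it",
    "http://www.ladigetto.it/",
    "javascript:void(0)",
    "tel:123",
]

print(followable(links))
```

If the pattern were applied with `re.search` instead, the bare `tel` alternative would also match links that merely contain those letters (for example a path containing "hotel"), so a stricter pattern such as `^(mailto|javascript|tel|fax):` may be worth considering.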

0 reactions
fhamborg commented, Oct 12, 2022

added it 😃
