StatusUpdaterBolt to use provided domain name for routing

When configured with routing byDomain, StatusUpdaterBolt should use the routing key from the metadata if one is provided in the field defined by es.status.routing.fieldname. Updates of the public suffix list (included in the crawler-commons dependency) may change the domain name derived for a URL, and with it the routing key. This can cause duplicate status records in the index and needless refetches of the same URL (cf. commoncrawl/news-crawl#28).

The simplest solution is to use the provided routing key, similar to how it’s done for routing byIP. This would only require changes in URLPartitioner. Alternatively, StatusUpdaterBolt could check whether the routing key has changed and, if so, send a deletion request using the original routing key and then update the status document with the new routing key.
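
The two options above could be sketched roughly as follows. This is a standalone illustration, not StormCrawler code: the class `RoutingKeySketch`, its methods, and the field name `routing.key` (standing in for whatever field es.status.routing.fieldname names) are all hypothetical. The host-based `deriveKey` is only a placeholder for the public-suffix-based domain extraction.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: none of these names come from StormCrawler. */
class RoutingKeySketch {

    /** Hypothetical metadata field holding a routing key supplied with the URL. */
    static final String ROUTING_FIELD = "routing.key";

    /**
     * Option 1: prefer the routing key provided in the metadata and derive
     * one from the URL only when none was provided.
     */
    static String routingKey(String url, Map<String, String> metadata) {
        String provided = metadata.get(ROUTING_FIELD);
        if (provided != null && !provided.isEmpty()) {
            return provided;
        }
        return deriveKey(url);
    }

    /**
     * Placeholder derivation: returns the lower-cased host name. A real
     * crawler would consult the public suffix list here, which is why the
     * derived value can change when that list is updated.
     */
    static String deriveKey(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase(Locale.ROOT);
    }

    /**
     * Option 2: if the key has changed, plan a delete under the old key
     * before indexing under the new one. Operations are returned as plain
     * strings so the sketch stays independent of any client library.
     */
    static List<String> planUpdate(String docId, String oldKey, String newKey) {
        List<String> ops = new ArrayList<>();
        if (oldKey != null && !oldKey.equals(newKey)) {
            ops.add("DELETE " + docId + " routing=" + oldKey);
        }
        ops.add("INDEX " + docId + " routing=" + newKey);
        return ops;
    }
}
```

With option 1, a key that was stored once keeps being reused even if the derivation later yields something different. Option 2 accepts the new key but removes the record stored under the old one, at the cost of an extra request per changed URL.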

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
jnioche commented, Feb 28, 2019

The way I see it, it would copy to a new index. Aliasing could be used to preserve a generic name, e.g. status, if needed. Reindexing could also be useful, e.g. for changing the sharding logic or the number of shards.

1 reaction
sebastian-nagel commented, Feb 26, 2019

Thanks, I’ll test it over the next few days. Picking the _routing value from ES is of course the most reliable solution, maybe better than using metadata.hostname (see es.status.bucket.field or es.status.routing.fieldname).

