
Should we normalize the scheme and hostname of a URI to lowercase?


Datasets are defined by their URI.

We don’t really spell out what exactly our URI standard is, apart from reserving the airflow scheme and restricting URIs to ASCII.

For websites, the scheme and hostname are case insensitive, while everything else is case sensitive.

Should we normalize the scheme and hostname, or allow case-sensitive differentiation?

My inclination is that we should allow everything to be case sensitive, except possibly the scheme. The reason is that we don’t know exactly what “hostname” will mean for a dataset. If, for example, it’s a database object, it could be case sensitive.
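To make the two options concrete, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `normalize_uri` and its `lower_host` flag are hypothetical and not part of Airflow; the sketch only illustrates the difference between lowercasing the scheme alone and lowercasing the scheme plus the host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_uri(uri: str, lower_host: bool = False) -> str:
    """Lowercase the scheme, and optionally the host, of a URI.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    parts = urlsplit(uri)
    # urlsplit already lowercases the scheme; lower() makes the intent explicit.
    scheme = parts.scheme.lower()
    netloc = parts.netloc
    if lower_host:
        # Keep any userinfo ("user:password@") untouched, since it may be
        # case sensitive, and lowercase only the host[:port] part.
        userinfo, sep, hostport = netloc.rpartition("@")
        netloc = userinfo + sep + hostport.lower()
    # Path, query and fragment are passed through unchanged.
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))
```

With `lower_host=False`, a URI such as `S3://Bucket/Key` keeps `Bucket` as written and only the scheme changes; with `lower_host=True`, the bucket name is lowercased too, which is exactly the behavior that could be wrong when the “hostname” is a case-sensitive database object.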

If we do implement some normalization, we introduce somewhat of a problem when we do SQLAlchemy lookups by URI, because we can’t just do Dataset.uri == uri; we’d have to normalize the incoming URI value first. One possibility is to split out the scheme into a different column in the db that has case-insensitive (CI) collation, which would avoid this issue, though it would force you to decompose the URI. But this is messy too.
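The split-column idea can be sketched as follows. To stay self-contained, this uses the standard library's `sqlite3` and SQLite's built-in `NOCASE` collation instead of SQLAlchemy and Airflow's real schema; the table layout and the helpers `split_uri`, `add_dataset` and `find_dataset` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the scheme lives in its own case-insensitive column,
# while the remainder of the URI keeps the default case-sensitive collation.
conn.execute(
    "CREATE TABLE dataset ("
    " id INTEGER PRIMARY KEY,"
    " scheme TEXT NOT NULL COLLATE NOCASE,"
    " rest TEXT NOT NULL,"
    " UNIQUE (scheme, rest))"
)


def split_uri(uri: str) -> tuple[str, str]:
    """Decompose a URI into (scheme, everything after the first colon)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"URI has no scheme: {uri!r}")
    return scheme, rest


def add_dataset(uri: str) -> int:
    """Store a dataset URI and return its row id."""
    cur = conn.execute(
        "INSERT INTO dataset (scheme, rest) VALUES (?, ?)", split_uri(uri)
    )
    return cur.lastrowid


def find_dataset(uri: str):
    """Look up a dataset id without normalizing the incoming URI first."""
    row = conn.execute(
        "SELECT id FROM dataset WHERE scheme = ? AND rest = ?", split_uri(uri)
    ).fetchone()
    return row[0] if row else None
```

The lookup no longer needs to normalize its input, because the database compares the scheme case-insensitively. The cost is the one noted above: every insert and every query has to decompose the URI into two parts.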

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 24 (24 by maintainers)

Top GitHub Comments

2 reactions
potiuk commented, Jul 20, 2022

Yep. It’s a mess. But I think if we are going to be compliant with OpenLineage and others, being compliant with the RFC is best. The worst thing we can do is invent yet another interpretation of the standard, “our” own:

[image]

0 reactions
dstandish commented, Aug 2, 2022

Haha nice


Top Results From Across the Web

Should we normalize further the URLs · Issue #120 - GitHub
I've got a minor concern about using URL for the normalize() method parameter, and using the URI.normalize() method. When we were doing...

Should URL be case sensitive? - Stack Overflow
Section 6.2.2.1 of RFC 3986 says that "scheme and host are case-insensitive and therefore should be normalized to lowercase.

URI Normalization
Host Name. Note that, per RFC 3986, any host name component of a URI is considered case-insensitive, and is normalized to lower case....

HTTP::path -normalized (TMOS 13) issue? - DevCentral
According to the Wiki, HTTP::path -normalized should do: the normalization involves lower-casing, removing unnecessary directory traversals, ...

RFC 3986: Uniform Resource Identifier (URI): Generic Syntax
Advice for designers of new URI schemes can be found in [RFC2718]. ... The ability to transcribe a resource identifier from one medium...
