Incorrect domains returned by EffectiveTldFinder
The method EffectiveTldFinder.getAssignedDomain(hostname) returns an incorrect domain in a couple of cases. This was detected by comparing its output with the Python module tldextract, which is also based on the public suffix list.
- Exceptions to wildcard rules are not recognized as valid domains. For example, with the rules “*.kawasaki.jp” and “!city.kawasaki.jp”, the domain for “www.city.kawasaki.jp” should be “city.kawasaki.jp”.
- Domain detection fails entirely for the .za top-level domain: “blogs.uct.ac.za” should give “uct.ac.za”. The check whether the last element “za” is a valid TLD fails, because the .za registry does not allow .za alone as a suffix, so the list only contains second-level entries such as “ac.za”.
- Detection fails for internationalized domain names (IDNs) in their punycoded (ASCII) form: for “спб.бесплатныеобъявления.рф” the domain “бесплатныеобъявления.рф” is correctly returned, but not for the Punycode equivalent of the same hostname. The public suffix list contains the UTF-8 representation of IDNs, so its entries must (additionally) be converted to the ASCII/Punycode form.
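To make the first two cases concrete, here is a minimal sketch of public suffix matching with wildcard and exception rules. It is not the crawler-commons implementation: the class `PublicSuffixSketch`, its method names, and the four-entry rule set are all hypothetical and chosen only to reproduce the examples above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only; this is not the crawler-commons implementation. */
final class PublicSuffixSketch {

    // Hand-picked rules for the examples in this issue. Note: no bare "za" rule.
    static final Set<String> RULES = new HashSet<>(Arrays.asList(
            "jp", "*.kawasaki.jp", "!city.kawasaki.jp", "ac.za"));

    /** Joins labels[from..end] with dots. */
    static String join(String[] labels, int from) {
        return String.join(".", Arrays.copyOfRange(labels, from, labels.length));
    }

    /** Returns the registrable domain, or null if the host is itself a public suffix. */
    static String assignedDomain(String hostname) {
        String[] labels = hostname.toLowerCase(Locale.ROOT).split("\\.");
        int n = labels.length;
        int suffixStart = n - 1; // default rule "*": the last label is the suffix
        boolean matched = false;

        // Exception rules take priority: the suffix is the rule minus its leftmost label.
        for (int i = 0; i < n - 1 && !matched; i++) {
            if (RULES.contains("!" + join(labels, i))) {
                suffixStart = i + 1;
                matched = true;
            }
        }
        // Otherwise the longest exact or wildcard match wins.
        for (int i = 0; i < n && !matched; i++) {
            boolean exact = RULES.contains(join(labels, i));
            boolean wildcard = i + 1 < n && RULES.contains("*." + join(labels, i + 1));
            if (exact || wildcard) {
                suffixStart = i;
                matched = true;
            }
        }
        // The registrable domain is the public suffix plus one more label.
        return suffixStart == 0 ? null : join(labels, suffixStart - 1);
    }
}
```

With these rules, `assignedDomain("www.city.kawasaki.jp")` yields “city.kawasaki.jp” via the exception rule, and `assignedDomain("blogs.uct.ac.za")` yields “uct.ac.za” without ever requiring “za” to be a rule of its own.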
Issue Analytics
- Created 6 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Let me work on it today. Merging currently fails; I’ll try to get a PR ready this evening.
Done, except for IDNs, which are not processed yet. The punycoded form works, however. I’ll open a separate issue for this: it requires more changes, especially to handle mixed usage of IDN and punycoded parts.
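One possible way to handle mixed IDN and punycoded parts is to normalize both the rules and the hostname to ASCII before the lookup. The sketch below uses the standard `java.net.IDN` class; the class `IdnRuleSketch` and its methods are hypothetical and not part of crawler-commons. Non-ASCII characters are written as Unicode escapes so the snippet compiles regardless of source encoding.

```java
import java.net.IDN;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of crawler-commons. */
final class IdnRuleSketch {

    /**
     * Converts a hostname or suffix rule to lowercase ASCII.
     * Labels that are already ASCII (including "xn--" labels) pass through unchanged,
     * so mixed Unicode/Punycode input ends up in one canonical form.
     */
    static String toAscii(String name) {
        return IDN.toASCII(name).toLowerCase(Locale.ROOT);
    }

    /** Indexes every rule under both its original (Unicode) and its ASCII form. */
    static Set<String> indexRules(List<String> unicodeRules) {
        Set<String> index = new HashSet<>();
        for (String rule : unicodeRules) {
            index.add(rule);
            index.add(toAscii(rule));
        }
        return index;
    }

    public static void main(String[] args) {
        // "\u0440\u0444" is the Cyrillic TLD of the example hostname in this issue.
        Set<String> index = indexRules(List.of("\u0440\u0444"));
        System.out.println(index.contains("xn--p1ai"));
    }
}
```

A lookup would then call `toAscii` on the incoming hostname first, so that a fully Unicode name, a fully punycoded name, and a name mixing both resolve against the same ASCII rule.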