
parsing fonts.googleapis.com incorrect #4

Closed
amitmtrn opened this issue Dec 6, 2015 · 5 comments
Comments

@amitmtrn

amitmtrn commented Dec 6, 2015

Parsing fonts.googleapis.com doesn't work correctly. The subdomain should be fonts, the domain should be googleapis, and the TLD should be com.
This is what I got when parsing the domain:
{ tld: 'googleapis.com', domain: 'fonts', subdomain: '' }

@jhnns
Member

jhnns commented Jan 2, 2016

Mhmm ... that's tricky, because googleapis.com is listed at publicsuffix, since the "user controlled" part of the URL is placed before googleapis.com. The same goes for blogspot.com.

That's why parseDomain() treats it as a TLD although it's technically not one. I probably used the term TLD incorrectly, since a TLD is always the last portion of the hostname (like uk in .co.uk). However, most users care more about the "user controlled" part, which can be very subjective depending on the use case.

Browser vendors, however, use publicsuffix as source to determine whether the entered string is a URL or a search keyword and even for restricting cookie access.

Honestly, I don't know how to proceed. Can we define an expected behavior? Maybe publicsuffix is the wrong source, maybe we should leverage DNS?
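
To make the behavior described here concrete, this is a minimal sketch of longest-suffix matching. It is not parse-domain's actual implementation: `SUFFIXES` is a tiny hand-picked stand-in for the real list, and `splitHost` is a hypothetical name used only for illustration.

```javascript
// Hypothetical, hand-picked stand-in for the public suffix list.
const SUFFIXES = new Set(['com', 'uk', 'co.uk', 'googleapis.com', 'blogspot.com']);

// Split a hostname at the longest suffix found in SUFFIXES.
function splitHost(hostname) {
  const labels = hostname.toLowerCase().split('.');
  // Try the longest candidate first by dropping labels from the left.
  // i starts at 1 so that at least one label remains as the domain.
  for (let i = 1; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (SUFFIXES.has(candidate)) {
      return {
        tld: candidate,
        domain: labels[i - 1],
        subdomain: labels.slice(0, i - 1).join('.'),
      };
    }
  }
  return null; // no known suffix, or nothing left of it
}

console.log(splitHost('fonts.googleapis.com'));
```

Because `googleapis.com` is itself an entry in the list, it wins over the shorter `com`, which leaves `fonts` as the domain and nothing for the subdomain.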

@amitmtrn
Author

amitmtrn commented Jan 2, 2016

That's interesting, I didn't know about the concept of a public suffix. I used this module to separate out the domain name and then use whois to check who owns the domain. I think it might be best to return both the domain's TLD and its public suffix.

For example, something.blogspot.com would be parsed to:

{
domain: 'blogspot',
subdomain: 'something',
tld: 'com',
publicsuffix: 'blogspot.com'
}
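
A rough sketch of how such a result could be produced, reporting the last label separately from the public suffix. Everything here is an assumption for illustration: `PUBLIC_SUFFIXES` is a small sample set and `describeHost` is a hypothetical function, not part of the module.

```javascript
// Hypothetical sample of public suffix entries.
const PUBLIC_SUFFIXES = new Set(['com', 'blogspot.com', 'googleapis.com']);

// Report the last label as `tld` and the longest matching
// list entry as `publicsuffix`.
function describeHost(hostname) {
  const labels = hostname.toLowerCase().split('.');
  let publicsuffix = null;
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (PUBLIC_SUFFIXES.has(candidate)) {
      publicsuffix = candidate;
      break; // candidates shrink as i grows, so the first hit is the longest
    }
  }
  return {
    domain: labels.length > 1 ? labels[labels.length - 2] : '',
    subdomain: labels.slice(0, -2).join('.'),
    tld: labels[labels.length - 1],
    publicsuffix,
  };
}

console.log(describeHost('something.blogspot.com'));
```

One caveat with this shape: the domain is always taken as the label before the last one, so a host such as example.co.uk would come back with `domain: 'co'`.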

@hongkongkiwi

It seems I am having the same issue as above (#6). It would be great to separate out publicsuffix, because honestly that's just a guess and we will never catch them all, whereas the TLD is very clear: there can only be a finite number of TLDs. In my case I'm most interested in the tld, domain, subdomain breakdown.

@jhnns
Member

jhnns commented Feb 3, 2016

Well, as far as I can tell, the TLD is just the last portion of the URL. I don't know if there is a rule that separates the .co.uk case from the blogspot.com case. That's why publicsuffix.org calls itself "a list of effective TLDs".

However, since browser vendors use this list even for restricting cookie access, it's not just an arbitrary list. But I have to admit that it's somewhat surprising...

I'm thinking about using DNS resolution to distinguish between co.uk and blogspot.com. The former returns "Can't find co.uk: No answer" while the latter returns an IP.
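
A minimal sketch of that DNS idea, using Node's built-in `dns.promises.resolve4`. The helper name `hasARecord` and the injectable `resolve4` parameter are assumptions added for illustration, and whether this check separates the two cases reliably is not established here.

```javascript
const dns = require('dns');

// Resolve to true if `hostname` has at least one A record.
// Any lookup error (for example ENODATA or ENOTFOUND, but also a
// network failure) is treated as "no record" and yields false.
// `resolve4` is injectable so the check can be tried without network access.
async function hasARecord(hostname, resolve4 = (h) => dns.promises.resolve4(h)) {
  try {
    const addresses = await resolve4(hostname);
    return addresses.length > 0;
  } catch (err) {
    return false;
  }
}

// Usage (performs real DNS queries):
// hasARecord('blogspot.com').then(console.log);
// hasARecord('co.uk').then(console.log);
```

Note that this turns parsing into an asynchronous, network-dependent operation, which is a real trade-off compared with a static list.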

@jhnns
Member

jhnns commented Oct 14, 2016

parse-domain excludes private domains by default now. Shipped with 1.0.0.
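
The Public Suffix List is divided into an ICANN section and a private section, and entries such as googleapis.com and blogspot.com belong to the latter. A minimal sketch of what excluding private domains by default means for the host from this issue follows. The sample sets, the `parseHost` function, and the `includePrivate` option are all hypothetical, not parse-domain's actual code or option names.

```javascript
// Hand-picked sample entries; the real list is far longer.
const ICANN_SUFFIXES = new Set(['com', 'uk', 'co.uk']);
const PRIVATE_SUFFIXES = new Set(['googleapis.com', 'blogspot.com']);

// Longest-suffix match that consults the private section only on request.
function parseHost(hostname, { includePrivate = false } = {}) {
  const labels = hostname.toLowerCase().split('.');
  for (let i = 1; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    const known = ICANN_SUFFIXES.has(candidate) ||
      (includePrivate && PRIVATE_SUFFIXES.has(candidate));
    if (known) {
      return {
        tld: candidate,
        domain: labels[i - 1],
        subdomain: labels.slice(0, i - 1).join('.'),
      };
    }
  }
  return null;
}

console.log(parseHost('fonts.googleapis.com'));
console.log(parseHost('fonts.googleapis.com', { includePrivate: true }));
```

With the private section skipped, the sketch yields the breakdown originally asked for (subdomain fonts, domain googleapis, tld com); opting in reproduces the originally reported result.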

jhnns closed this as completed Oct 14, 2016