Re: Extracting hostname from URI column
| От | John Summerfield |
|---|---|
| Тема | Re: Extracting hostname from URI column |
| Дата | |
| Msg-id | 46F0B452.1050104@herakles.homelinux.org обсуждение исходный текст |
| Ответ на | Re: Extracting hostname from URI column (Paul Lambert <paul.lambert@autoledgers.com.au>) |
| Список | pgsql-sql |
Paul Lambert wrote: > Paul Lambert wrote: >> chester c young wrote: >>>> I'm trying to use substr() and position() functions to extract the >>>> full host name (and later a domain) from a column that holds URLs. >>> >>> substring( href from '.*://\([^/]*)' ); >>> >> >> Ok, your solution looks better than mine... but I have no idea how to >> interpret that, time to consult some manuals. >> > > OK - following on from this, I'm doing a similar thing to the OP to > analyze my proxy's traffic (never occured to me to do it in a db until > that post) > > I've done the above regex to pull out the domain part of the URL and am > left with results such as: > "acvs.mediaonenetwork.net" > "profile.ak.facebook.com" > "www.bankwest.com.au" > > What I want to do next is pull out the major domain part of the URL > I.e. for the above three records I should end up with > "mediaonenetwork.net" > "facebook.com" > "bankwest.com.au" > > What would be the best way to do something like that? I assume it won't > be a simple regex like the above due to the country codes on the end of > some domains. My thought is look at the last portion of the domain, if > it's 2 characters long then assume it's a country code and grab the last > three sections, if it's not three characters long then assume it's an > international domain and grab the last two... but that sounds a bit dodgy. and csiro.au? There may be a few others in .au too, legacies from before we went commercial.
В списке pgsql-sql по дате отправления: