Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores
From | Robert Haas |
---|---|
Subject | Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores |
Date | |
Msg-id | 603c8f070910220929i14dbcdcfw648e0c1a7ae19ef@mail.gmail.com |
In reply to | BUG #5021: ts_parse doesn't recognize email addresses with underscores ("Dan O'Hara" <danarasoftware@gmail.com>) |
Responses |
Re: BUG #5021: ts_parse doesn't recognize email addresses with
underscores
Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores |
List | pgsql-bugs |
On Fri, Aug 28, 2009 at 9:59 AM, Dan O'Hara <danarasoftware@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5021
> Logged by:          Dan O'Hara
> Email address:      danarasoftware@gmail.com
> PostgreSQL version: 8.3.7
> Operating system:   win32
> Description:        ts_parse doesn't recognize email addresses with
> underscores
> Details:
>
> In the following example,
>
> select distinct token as email
> from ts_parse('default', ' first_last@yahoo.com ' )
> where tokid = 4
>
> ts_parse returns last@yahoo.com rather than first_last@yahoo.com  It seems
> that any text prior to the underscore is truncated.  If the portion
> following the underscore is only numeric, such as this example,
>
> select distinct token as email
> from ts_parse('default', ' bill_2000@yahoo.com ' )
> where tokid = 4
>
> then ts_parse returns nothing at all.
>
> section 3.2.3 of RFC 5322 indicates that underscores are valid characters in
> an email address.
>
> http://tools.ietf.org/html/rfc5322

I don't think this has much to do with email addresses.  If you do:

select token from ts_parse('default', 'a_b');

...you get three tokens.  In your case you're pulling out the fourth
token, but some of your examples don't have four tokens, so then you
get nothing at all.

I'm not real familiar with ts_parse(), but I'm thinking that it
doesn't have any special casing for email addresses and is just
intended to parse text for full-text-search - in which case splitting
on _ is a pretty good algorithm.

...Robert
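
One way to check how the parser classifies each piece of the input is to list the tokens together with their type names. The sketch below joins ts_parse against ts_token_type for the 'default' parser. Per the PostgreSQL documentation, tokid in the ts_parse output is the token type id, so it is worth confirming what type 4 stands for before filtering on it. The input string is the one from the report, and the aliases p and t are only illustrative. Results may differ between server versions, so no output is shown.

```sql
-- Token types known to the default parser (id, alias, description).
SELECT tokid, alias, description
FROM ts_token_type('default');

-- Tokens produced for the reported input, labelled with their type alias.
SELECT p.tokid, t.alias, p.token
FROM ts_parse('default', 'first_last@yahoo.com') AS p
JOIN ts_token_type('default') AS t USING (tokid);
```

Running the second query with 'bill_2000@yahoo.com' and 'a_b' as well shows how each input is split and which type each piece gets.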