Re: Bug with Tsearch and tsvector
From | Tom Lane |
---|---|
Subject | Re: Bug with Tsearch and tsvector |
Date | |
Msg-id | 9738.1272307397@sss.pgh.pa.us |
In response to | Re: Bug with Tsearch and tsvector ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses | Re: Bug with Tsearch and tsvector |
List | pgsql-bugs |
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ie the critical point seems to be that url_path is willing to soak >> up a string containing "<" and ">", so the span tags don't get >> recognized as separate lexemes. While that's "obviously" the >> wrong thing in this particular example, I'm not sure if it's the >> wrong thing in general. Can anyone comment on the frequency of >> usage of those two symbols in URLs? > http://www.ietf.org/rfc/rfc2396.txt section 2.4.3 "delims" expressly > forbids their use in URIs. > In spite of the above prohibition, I notice that firefox and wget > both seem to *try* to use such characters if they're included. Hmm, thanks for the reference, but I'm not sure this is specifying quite what we want to get at. In particular I note that it excludes '%' on the grounds that that ought to be escaped, so I guess this is specifying the characters allowed in an underlying URI, *not* the textual representation of a URI. Still, it seems like this is a sufficient defense against any complaints we might get for not treating "<" or ">" as part of a URL. I wonder whether we ought to reject any of the other characters listed here too. Right now, the InURLPath state seems to eat everything until a space, quote, or double quote mark. We could easily make it stop at "<" or ">" too, but what else? regards, tom lane