Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore
From | Euler Taveira de Oliveira |
---|---|
Subject | Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date | |
Msg-id | 4ABAAFC8.7030108@timbira.com |
In response to | BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore ("Marek Lewczuk" <marek@lewczuk.com>) |
Responses |
Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore
Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
List | pgsql-bugs |
Marek Lewczuk wrote:
> Please execute following example:
> select * from ts_debug('english', '<img width="182" height="120"
> align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
>
> As the result you will see, that <img/> is not identified as XML tag, but
> rather splitted as words, blank spaces etc. The reason for that is the fact,
> that last attribute "test_aa" contains underscore in its name - when the
> underscore is removed, then img tag is properly identified as XML tag.
>
> XML definition allows using underscore in tag and attribute names.
>

The problem is that we already allow it in tag names but not in attribute
names. So the proper fix is to allow the underscore when the state is
TPS_InTag; according to the XML spec [1], the underscore is a valid character
in attribute names.

A possible downside is that we don't have underscores in HTML attribute names.
In this case, should it fail? I don't think so but...

The problem exists in 8.3, 8.4 and HEAD. It is a trivial fix, so I don't see a
problem with back-patching it.

[1] http://www.w3.org/TR/REC-xml/#sec-common-syn

--
Euler Taveira de Oliveira
http://www.timbira.com/

Index: wparser_def.c
===================================================================
RCS file: /a/pgsql/dev/anoncvs/pgsql/src/backend/tsearch/wparser_def.c,v
retrieving revision 1.24
diff -c -r1.24 wparser_def.c
*** wparser_def.c 16 Jul 2009 06:33:44 -0000 1.24
--- wparser_def.c 23 Sep 2009 23:19:28 -0000
***************
*** 1225,1230 ****
--- 1225,1231 ----
  {p_isdigit, 0, A_NEXT, TPS_Null, 0, NULL},
  {p_iseqC, '=', A_NEXT, TPS_Null, 0, NULL},
  {p_iseqC, '-', A_NEXT, TPS_Null, 0, NULL},
+ {p_iseqC, '_', A_NEXT, TPS_Null, 0, NULL},
  {p_iseqC, '#', A_NEXT, TPS_Null, 0, NULL},
  {p_iseqC, '/', A_NEXT, TPS_Null, 0, NULL},
  {p_iseqC, ':', A_NEXT, TPS_Null, 0, NULL},
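For readers unfamiliar with the table-driven parser, the effect of the one-line patch can be illustrated with a much-simplified sketch. This is not the code in wparser_def.c: the names `TagRule`, `is_alpha`, `is_digit`, `is_eq`, `rules_unpatched`, `rules_patched` and `stays_in_tag` are hypothetical, and the tables contain only the rows visible in the diff plus an assumed alphabetic rule. The real TPS_InTag table has further rows and actions that are not modelled here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one row of a parser state table:
 * a character predicate plus the argument it is compared against. */
typedef bool (*CharTest)(unsigned char c, unsigned char arg);

typedef struct
{
    CharTest      test;
    unsigned char arg;
} TagRule;

static bool is_alpha(unsigned char c, unsigned char arg) { (void) arg; return isalpha(c) != 0; }
static bool is_digit(unsigned char c, unsigned char arg) { (void) arg; return isdigit(c) != 0; }
static bool is_eq(unsigned char c, unsigned char arg)    { return c == arg; }

/* Rows modelled on the context lines of the diff (assumed subset). */
static const TagRule rules_unpatched[] = {
    {is_alpha, 0}, {is_digit, 0},
    {is_eq, '='}, {is_eq, '-'},
    {is_eq, '#'}, {is_eq, '/'}, {is_eq, ':'},
};

/* Same rows with the '_' rule that the patch adds. */
static const TagRule rules_patched[] = {
    {is_alpha, 0}, {is_digit, 0},
    {is_eq, '='}, {is_eq, '-'},
    {is_eq, '_'},
    {is_eq, '#'}, {is_eq, '/'}, {is_eq, ':'},
};

#define N_RULES(t) (sizeof(t) / sizeof((t)[0]))

/* Returns true if every character of text matches at least one rule,
 * i.e. the simplified parser would never fall out of the "in tag" state. */
static bool stays_in_tag(const TagRule *rules, size_t nrules, const char *text)
{
    for (const char *p = text; *p != '\0'; p++)
    {
        bool matched = false;

        for (size_t i = 0; i < nrules; i++)
        {
            if (rules[i].test((unsigned char) *p, rules[i].arg))
            {
                matched = true;
                break;
            }
        }
        if (!matched)
            return false;   /* no rule accepts this character */
    }
    return true;
}
```

With these tables, `stays_in_tag(rules_unpatched, N_RULES(rules_unpatched), "test_aa=")` is false because no rule accepts `_`, while the same call with `rules_patched` is true. This mirrors the reported behaviour: the whole tag stops being recognized as soon as one attribute name contains an underscore.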