Re: tsvector string representation and parsing
От | Johannes Graën |
---|---|
Тема | Re: tsvector string representation and parsing |
Дата | |
Msg-id | 54593f15-6b3b-086e-ce1e-69f0af9d38a9@selfnet.de обсуждение исходный текст |
Ответ на | Re: tsvector string representation and parsing (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: tsvector string representation and parsing
|
Список | pgsql-general |
Thanks, Tom. On 23/02/2022 23.30, Tom Lane wrote: > =?UTF-8?Q?Johannes_Gra=c3=abn?= <johannes@selfnet.de> writes: >> This is a minimal example that goes wrong (but shouldn't IMHO: > >>> SELECT format('%L:1', '\:')::tsvector > > format(%L) is designed to produce a SQL literal, which does not > have the same requirements as a tsvector element ... yeah, they're > close, but not close enough. In this particular example, > what you get is > > =# SELECT format('%L:1', '\:'); > format > ---------- > E'\\:':1 > (1 row) > > because format() adds an E prefix for the avoidance of doubt about > what to do with the backslashes. tsvector doesn't like that. I see. > I don't think we have any prefab function that does what you're > looking for here, and TBH I'm not sure I see the point of it. > Pretty much any tsvector you'd be dealing with in practice is > going to have come from one of the to_tsvector family of > functions, and those tend to drop punctuation. Applied to normal texts in a standard language, I believe that's true. I'd like to use FTS in a setting where I need to control the positional attributes as I'm specifying more than one lexeme at the same position (which works great btw). That's why I can't use to_tsvector() but need to cast it from a string to tsvector. >> My understanding is that everything inside >> single quotes is taken as a lexeme. From the documentation: > >>> To represent lexemes containing whitespace or punctuation, surround them with quotes > > See also the next bit about having to double quotes and backslashes > within those quotes. So what you'd actually need is > > =# select $$'\\:':1$$::tsvector; > tsvector > ---------- > '\\:':1 > (1 row) > > If you write just one backslash, it has the effect of quoting > the next character, which in this case doesn't need quoting. Before using format(), I tried just generating those strings by doubling any single quote or backslash and enclosing the whole string in single quotes, but that didn't seem a safe way, though it works in principle: > SELECT format($$'%s':%s$$, replace(replace(s, $$'$$, $$''$$), '\', '\\'), i)::tsvector > FROM (SELECT $$\:$$ s, 1 i) x; Would that be the way to go if to_tsvector is not an option? Regards Johannes
В списке pgsql-general по дате отправления: