Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit
From | Teodor Sigaev |
---|---|
Subject | Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit |
Date | |
Msg-id | 47D14998.3080304@sigaev.ru |
In reply to | Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit (Bruce Momjian <bruce@momjian.us>) |
List | pgsql-patches |
To be precise about tsvector:

1) A GiST index is lossy for any kind of tsearch query. A GIN index is not lossy for the @@ operator, but it is lossy for @@@.

2) The number of positions per word is limited to 256. A bigger number of positions does not help ranking, but it does produce a big tsvector. If a word has a lot of positions in a document, then it is close to being a stopword. We could easily increase this limit to 65536 positions.

3) The maximum value of a position is 2^14, because a position is stored in a uint16, and 2 bits of that integer have to be reserved for the weight of the position. It is possible to widen the 16-bit integer to 32 bits, but that would double the tsvector size, which is impractical, I suppose. So the part of a document used for ranking contains its first 16384 words, which is about the first 50-100 kilobytes.

4) The limit on the total size of a tsvector comes from the WordEntry->pos field (ts_type.h). It contains the number of bytes between the first lexeme in the tsvector and the needed lexeme. So the limitation applies to the total length of the lexemes plus their positional information.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
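
The arithmetic in point 3 can be illustrated with a minimal C sketch: a 16-bit value with 2 bits reserved for the weight leaves 14 bits for the position, hence the 2^14 = 16384 limit. The names below (`POS_BITS`, `pack_pos`, `unpack_weight`, `unpack_position`) are hypothetical and chosen for illustration; they are not the identifiers from ts_type.h. Placing the weight in the top 2 bits and clamping oversized positions to the largest storable value are both assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers from ts_type.h. */
#define POS_BITS   14                 /* 16 bits minus 2 bits for the weight */
#define POS_LIMIT  (1u << POS_BITS)   /* 2^14 = 16384 distinct position values */
#define POS_MASK   (POS_LIMIT - 1u)   /* 0x3fff, the largest storable position */

/*
 * Pack a 2-bit weight and a word position into one 16-bit value.
 * Assumption of this sketch: the weight sits in the top 2 bits, and
 * positions past the limit are clamped to the largest storable value.
 */
uint16_t pack_pos(unsigned weight, unsigned position)
{
    assert(weight < 4);               /* only 2 bits are available */
    if (position > POS_MASK)
        position = POS_MASK;
    return (uint16_t) ((weight << POS_BITS) | position);
}

/* Extract the 2-bit weight from the top of the packed value. */
unsigned unpack_weight(uint16_t packed)
{
    return (unsigned) packed >> POS_BITS;
}

/* Extract the 14-bit position from the bottom of the packed value. */
unsigned unpack_position(uint16_t packed)
{
    return (unsigned) packed & POS_MASK;
}
```

With this layout, widening the packed value to 32 bits would leave 30 bits for the position, at the cost of doubling the space taken by every stored position, which is the trade-off described in point 3.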