Re: GIN improvements part2: fast scan
From: Heikki Linnakangas
Subject: Re: GIN improvements part2: fast scan
Date:
Msg-id: 52EA7FFA.60001@vmware.com
In reply to: Re: GIN improvements part2: fast scan (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: GIN improvements part2: fast scan
List: pgsql-hackers
On 01/30/2014 01:53 AM, Tomas Vondra wrote:
> (3) A file with explain plans for 4 queries suffering ~2x slowdown,
> and explain plans with 9.4 master and Heikki's patches is available
> here:
>
> http://www.fuzzy.cz/tmp/gin/queries.txt
>
> All the queries have 6 common words, and the explain plans look
> just fine to me - exactly like the plans for other queries.
>
> Two things now caught my eye. First some of these queries actually
> have words repeated - either exactly like "database & database" or
> in negated form like "!anything & anything". Second, while
> generating the queries, I use "dumb" frequency, where only exact
> matches count. I.e. "write != written" etc. But the actual number
> of hits may be much higher - for example "write" matches exactly
> just 5% documents, but using @@ it matches more than 20%.
>
> I don't know if that's the actual cause though.

I tried these queries with the data set you posted here:
http://www.postgresql.org/message-id/52E4141E.60609@fuzzy.cz. The
reason for the slowdown is the extra consistent calls it causes.
That's expected - the patch certainly does call consistent in
situations where it didn't before, and if the "pre-consistent" checks
are not able to eliminate many tuples, you lose.

So, what can we do to mitigate that? Some ideas:

1. Implement the catalog changes from Alexander's patch. That ought to
remove the overhead, as you only need to call the consistent function
once, not "both ways". OTOH, currently we only call the consistent
function if there is only one unknown column. If, with the catalog
changes, we always call the consistent function even when there are
more unknown columns, we might end up calling it even more often.

2. Cache the result of the consistent function.

3. Make the consistent function cheaper. (How? Magic?)

4. Use some kind of a heuristic, and stop doing the pre-consistent
checks if they're not effective. Not sure what the heuristic would
look like.

The caching we could easily do. It's very simple and very effective,
as long as the number of entries is limited. The amount of space
required to cache all combinations grows exponentially, so it's only
feasible for up to 10 or so entries; a rough sketch of such a cache is
below.

- Heikki
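To make idea (2) concrete, here is a minimal sketch of such a cache,
assuming a plain lookup table indexed by a bitmask of the entries'
match states. ConsistentCache, cache_init and cached_consistent are
hypothetical names for illustration only, not actual GIN code:

/*
 * Hypothetical sketch of idea (2): memoize consistent-function
 * results.  One byte per combination of entry states; with n entries
 * the table needs 2^n slots, hence the ~10-entry practical limit.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CACHED_ENTRIES 10       /* 2^10 = 1024 slots is still cheap */

typedef struct ConsistentCache
{
    int     nentries;               /* number of entries in the query */
    /* 0 = not computed yet, 1 = consistent returned false, 2 = true */
    uint8_t results[1 << MAX_CACHED_ENTRIES];
} ConsistentCache;

static void
cache_init(ConsistentCache *cache, int nentries)
{
    cache->nentries = nentries;
    memset(cache->results, 0, (size_t) 1 << nentries);
}

/*
 * 'key' has bit i set iff entry i matches the current item;
 * 'consistent' stands in for the opclass's real consistent function.
 */
static bool
cached_consistent(ConsistentCache *cache, uint32_t key,
                  bool (*consistent) (uint32_t key))
{
    if (cache->results[key] == 0)
        cache->results[key] = consistent(key) ? 2 : 1;
    return cache->results[key] == 2;
}

The table costs 2^nentries bytes and the real consistent function runs
at most once per distinct combination of entry states per scan, which
is where the feasibility limit above comes from.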