GiST kNN search queue (Re: KNN-GiST with recheck)
From: Heikki Linnakangas
Subject: GiST kNN search queue (Re: KNN-GiST with recheck)
Date:
Msg-id: 54886BB8.9040000@vmware.com
In reply to: Re: KNN-GiST with recheck (Alexander Korotkov <aekorotkov@gmail.com>)
Responses: Re: GiST kNN search queue (Re: KNN-GiST with recheck)
           Re: GiST kNN search queue (Re: KNN-GiST with recheck)
List: pgsql-hackers
On 01/28/2014 04:12 PM, Alexander Korotkov wrote:
>> > 3. A binary heap would be a better data structure to buffer the
>> > rechecked values. A Red-Black tree allows random insertions and
>> > deletions, but in this case you need to insert arbitrary values but
>> > only remove the minimum item. That's exactly what a binary heap
>> > excels at. We have a nice binary heap implementation in the backend
>> > that you can use, see src/backend/lib/binaryheap.c.
>
> Hmm. For me binary heap would be a better data structure for KNN-GiST at
> all :-)

I decided to give this a shot, replacing the red-black tree in GiST with
the binary heap we have in lib/binaryheap.c. It made the GiST code somewhat
simpler, as the binaryheap interface is simpler than the red-black tree
one. Unfortunately, performance was somewhat worse. That was quite
surprising, as insertions and deletions are both O(log N) in both data
structures, but the red-black tree implementation is more complicated.

I implemented another data structure called a pairing heap. It's also a
fairly simple data structure, but insertions are O(1) instead of O(log N).
It also performs fairly well in practice. With that, I got a small but
measurable improvement.

To test, I created a table like this:

create table gisttest (id integer, p point);
insert into gisttest select id, point(random(), random())
  from generate_series(1, 1000000) id;
create index i_gisttest on gisttest using gist (p);

And I ran this query with pgbench:

select id from gisttest order by p <-> '(0,0)' limit 1000;

With unpatched master, I got about 650 TPS, and with the patch 720 TPS.
That's a nice little improvement, but perhaps more importantly, the
pairing heap implementation consumes less memory. To measure that, I put
a MemoryContextStats(so->queueCtx) call into gistendscan. With the above
query, but without the "limit" clause, on master I got:

GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 chunks); 21296 used

And with the patch:

GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks); 21072 used

That's 2MB vs. 1MB. While that's not much in absolute terms, it'd be nice
to reduce that memory consumption, as there is no hard upper bound on how
much might be needed. If the GiST tree is really disorganized for some
reason, a query might need a lot more.

So all in all, I quite like this patch, even though it doesn't do anything
too phenomenal. It adds some code, in the form of the new pairing heap
implementation, but it makes the GiST code a little bit simpler. And it
gives a small performance gain, and reduces memory usage a bit.

- Heikki
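For illustration, here is a minimal standalone sketch of a two-pass
pairing heap. It is not the patch's code: the names (phnode, ph_merge,
ph_insert, ph_delete_min) and the bare int keys are invented for the
example, whereas the real implementation has to plug into GiST's distance
comparator and memory contexts. The point it shows is why insertion is
O(1): an insert is a single "meld" of the current root with a one-node
heap, and all rebalancing is deferred to delete-min, which recombines the
root's children pairwise.

/*
 * Standalone sketch of a two-pass pairing heap (min-heap).
 * Illustration only: invented names, int keys, plain malloc/free.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct phnode
{
	int			key;
	struct phnode *first_child;		/* leftmost child */
	struct phnode *next_sibling;	/* next child of the same parent */
} phnode;

/* Meld two heaps: the root with the larger key becomes a child of the other. */
static phnode *
ph_merge(phnode *a, phnode *b)
{
	if (a == NULL)
		return b;
	if (b == NULL)
		return a;
	if (b->key < a->key)
	{
		phnode *tmp = a;
		a = b;
		b = tmp;
	}
	/* a has the smaller key; push b onto a's child list */
	b->next_sibling = a->first_child;
	a->first_child = b;
	return a;
}

/* Insertion is just a meld with a one-node heap, hence O(1). */
static phnode *
ph_insert(phnode *root, int key)
{
	phnode	   *n = malloc(sizeof(phnode));

	n->key = key;
	n->first_child = NULL;
	n->next_sibling = NULL;
	return ph_merge(root, n);
}

/*
 * Two-pass recombination of a child list: pair up siblings left to
 * right, then merge the pairs together as the recursion unwinds.
 */
static phnode *
ph_merge_pairs(phnode *first)
{
	phnode	   *second,
			   *rest;

	if (first == NULL || first->next_sibling == NULL)
		return first;
	second = first->next_sibling;
	rest = second->next_sibling;
	first->next_sibling = NULL;
	second->next_sibling = NULL;
	return ph_merge(ph_merge(first, second), ph_merge_pairs(rest));
}

/* Remove the minimum (the root); the real work happens in ph_merge_pairs. */
static phnode *
ph_delete_min(phnode *root, int *key)
{
	phnode	   *children = root->first_child;

	*key = root->key;
	free(root);
	return ph_merge_pairs(children);
}

int
main(void)
{
	phnode	   *heap = NULL;
	int			vals[] = {42, 7, 19, 3, 25};
	int			i,
				k;

	for (i = 0; i < 5; i++)
		heap = ph_insert(heap, vals[i]);
	while (heap != NULL)
	{
		heap = ph_delete_min(heap, &k);
		printf("%d\n", k);		/* prints 3 7 19 25 42 */
	}
	return 0;
}

Delete-min is still O(log N) amortized, so the total work of a k-NN scan
is comparable to a binary heap's, but each insertion avoids the sift-up
and array-resizing costs, which is where the constant-factor win comes
from.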