Re: Trying out read streams in pgvector (an extension)
От | Heikki Linnakangas |
---|---|
Тема | Re: Trying out read streams in pgvector (an extension) |
Дата | |
Msg-id | e6c01b35-fb3f-4a36-a765-d731752f1b6c@iki.fi обсуждение исходный текст |
Ответ на | Trying out read streams in pgvector (an extension) (Thomas Munro <thomas.munro@gmail.com>) |
Список | pgsql-hackers |
On 11/06/2024 07:53, Thomas Munro wrote: > Someone involved in that project mentioned that it's probably not a > great topic to research in practice, because real world users of HNSW > use fully cached ie prewarmed indexes, because the performance is so > bad otherwise. (Though maybe that argument is a little circular...). I think that's true in practice for *building* an HNSW index, but faster *searching* when the index is not in memory seems quite useful. And of course, faster is always better, even if it's only in a non-optimal scenario. > So although this patch clearly speeds up cold HSNW searches to a > degree controlled by effective_io_concurrency, I'll probably look for > something else. Suggestions for interesting index types to look at > streamifying are very welcome! GiST and GIN? > Hmm. If that's really true about HNSW though, then there may still be > an opportunity to do automatic memory prefetching[1]. But then in the > case of index building, "stream" is NULL in this patch anyway. It > surely must also be possible to find some good places to put > profitable explicit pg_mem_prefetch() calls given the predictability > and the need to get only ~60ns ahead for that usage. I didn't look > into that because I was trying to prove things about read_stream.c, > not get involved in another project :-D > > Here ends my science experiment report, which I'm dropping here just > in case others see useful ideas here. The main thing I learned about > the read stream API is that it'd be nice to be able to reset the > stream but preserve the distance (something that came up on the > streaming sequential scan thread for a different reason), to deal with > cases where look-ahead opportunities come in bursts but you want a > longer lived stream than I used here. That is the reason the patch > creates and destroys temporary streams in a loop; doh. It also > provides an interesting case study for what speculative random > look-ahead support might need to look like. This reminds me of a prototype I wrote earlier, see https://github.com/pgvector/pgvector/pull/386, 1st commit. It reorganizes HnswSearchLayer() so that it in iteration, it first collects all the neighbors to visit, and then visits them, somewhat similar to your patch. -- Heikki Linnakangas Neon (https://neon.tech)
В списке pgsql-hackers по дате отправления: