semi-PoC: kNN-gist for cubes
From | Jay Levitt |
---|---|
Subject | semi-PoC: kNN-gist for cubes |
Date | |
Msg-id | 4F30616D.3030509@gmail.com |
Responses | Re: semi-PoC: kNN-gist for cubes |
List | pgsql-hackers |
I have a rough proof-of-concept for getting nearest-neighbor searches working with cubes. When I say "rough", I mean "I have no idea what I'm doing and I haven't written C for 15 years but I hear it got standardized please don't hurt me".

It seems to be about 400x faster for a 3D cube with 1 million rows, more like 10-30x for a 6D cube with 10 million rows.

The patch adds operator <-> (which is just the existing cube_distance function) and support function 8, distance (which is just g_cube_distance, a wrapper around cube_distance).

The code is in no way production-quality; it is in fact right around "look! it compiles!", complete with pasted-in, commented-out code from something I was mimicking.

I thought I'd share at this early stage in the hopes I might get some pointers, such as:

- What unintended consequences should I be looking for?
- What benchmarks should I do?
- What kind of edge cases might I consider?
- I'm just wrapping cube_distance and calling it through DirectFunctionCall; it's probably more proper to extract out the "real" function and call it from both cube_distance and g_cube_distance. Right?
- What else don't I know? (Besides C, funny man.)

The patch, such as it is, is at:

https://github.com/jaylevitt/postgres/commit/9cae4ea6bd4b2e582b95d7e1452de0a7aec12857

with an even-messier test at

https://github.com/jaylevitt/postgres/commit/daa33e30acaa2c99fe554d88a99dd7d78ff6c784

I initially thought this patch made inserting and indexing slower, but then I realized the fast version was doing 1 million rows, and the slow one did 10 million rows. Which means: dinnertime.

Jay Levitt