Re: Efficient rows filter for array inclusion with gin index
От | Shanti-Dominique |
---|---|
Тема | Re: Efficient rows filter for array inclusion with gin index |
Дата | |
Msg-id | f2dd527d-89d0-4bc0-a0a5-52b41f109c4f@mildred.fr обсуждение исходный текст |
Ответ на | Efficient rows filter for array inclusion with gin index (Shanti-Dominique <shanti@mildred.fr>) |
Список | pgsql-general |
Replying to myself after more investigation. On 28/02/2024 12:05, Shanti-Dominique wrote: > > > 2) > SELECT * > FROM items i1 > JOIN item_paths p1 ON i1.ref_id = p1.ref_id > JOIN items i2 ON ARRAY[i2.ref_id] <@ p1.item_path > WHERE ... > > 2) uses the operator <@ which is supported by the gin index, the test > for inclusion is fast and the query does not run a sequential scan > over the whole "item_paths" table. However, because of the > ARRAY[i2.ref_id] construct, it performs a sequential scan on i2. I was under the assumption that the ARRAY[] construct prevented postgresql from efficiently using the index on the other side of the operator, but I think I was mistaken. On a database full of data, I tried getting around this but did not see any improvement of performance. First I tried to add an index on the single element array: CREATE FUNCTION uuidarr(ref_id uuid) RETURNS uuid[] LANGUAGE SQL IMMUTABLE RETURNS NULL ON NULL INPUT RETURN ARRAY[ref_id]; CREATE INDEX items_ref_id_arr2_idx ON items USING gin (uuidarr(ref_id)); EXPLAIN SELECT i2.* FROM items i1 JOIN item_paths p1 ON i1.ref_id = p1.ref_id JOIN items i2 ON uuidarr(i2.ref_id) <@ p1.item_path WHERE i1.name = 'a'; The performance was even worse. Then I tried with a generated column: CREATE TABLE items ( ref_id uuid DEFAULT public.gen_random_uuid() NOT NULL, ref_id_array uuid[] GENERATED ALWAYS AS (uuidarr(ref_id)) STORED, name character varying, parent_ref_id uuid ); CREATE INDEX items_ref_id_array_idx ON items USING gin (ref_id_array); EXPLAIN SELECT i2.* FROM items i1 JOIN item_paths p1 ON i1.ref_id = p1.ref_id JOIN items i2 ON i2.ref_id_array <@ p1.item_path WHERE i1.name = 'a'; The performance was very similar to the query with ARRAY[...] It seems there is no good solution for the general case, apart from changing the structure of my dataset and removing the use of arrays entirely. I think I'll update my codebase and use <@ where it makes sense and = ANY in other places, but it'll be difficult to know for sure without running the query which one will be better.
В списке pgsql-general по дате отправления: