Обсуждение: OR with multi-key indexes
Here is a comment in path/indxpath.c that says they don't want to use multi-key indexes with OR clauses. Of course, we now support multi-key indexes, and this code was disabled anyway because it was broken. (In fact, it was disabled by having SingleAttributeIndex() always return false. Is there any reason we can't use multi-key indexes if the first key matches our OR column? I don't see why not. Also, I don't know how to handle the case where we specify the first key of a multi-key index in an AND clause, and specify the second key in an OR clause because the AND's are handled in a separate are of the code. Any ideas how to implement this? (Of course, the code would still use the index for the AND, but I don't know how to bring the AND case into my OR index clause processing area.) --------------------------------------------------------------------------- /* * 1. If this index has only one key, try matching it against * subclauses of an 'or' clause. The fields of the clauseinfo nodes * are marked with lists of the matching indices no path are actually * created. * * XXX NOTE: Currently btrees dos not support indices with > 1 key, so * the following test will always be true for now but we have decided * not to support index-scans on disjunction . -- lp */ if (SingleAttributeIndex(index)) { -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian wrote: > > Here is a comment in path/indxpath.c that says they don't want to use > multi-key indexes with OR clauses. Of course, we now support multi-key > indexes, and this code was disabled anyway because it was broken. (In > fact, it was disabled by having SingleAttributeIndex() always return > false. > > Is there any reason we can't use multi-key indexes if the first key > matches our OR column? I don't see why not. Also, I don't know how to ^^^^^^^^^^^^^^^^^^^ Me too. > handle the case where we specify the first key of a multi-key index in > an AND clause, and specify the second key in an OR clause because the What do you mean? Example please... > AND's are handled in a separate are of the code. Any ideas how to > implement this? (Of course, the code would still use the index for the > AND, but I don't know how to bring the AND case into my OR index clause > processing area.) Vadim
> Bruce Momjian wrote: > > > > Here is a comment in path/indxpath.c that says they don't want to use > > multi-key indexes with OR clauses. Of course, we now support multi-key > > indexes, and this code was disabled anyway because it was broken. (In > > fact, it was disabled by having SingleAttributeIndex() always return > > false. > > > > Is there any reason we can't use multi-key indexes if the first key > > matches our OR column? I don't see why not. Also, I don't know how to > ^^^^^^^^^^^^^^^^^^^ > Me too. Good. Code restriction removed. > > > handle the case where we specify the first key of a multi-key index in > > an AND clause, and specify the second key in an OR clause because the > > What do you mean? Example please... > > > AND's are handled in a separate are of the code. Any ideas how to > > implement this? (Of course, the code would still use the index for the > > AND, but I don't know how to bring the AND case into my OR index clause > > processing area.) create table test (x int4, y int4); create index i_test on test(x,y); insert into test values(1,2); select * from test where x=3 and (y=1 or y=2); This is going to use the i_test index, but only with key x=3, and do a scan of the index looking for y=1 or y=2, and will not use the second key of the index. I don't know how to pass information to the OR code so it will know the first part of the index is matched, and it can compare the second key. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian wrote: > > create table test (x int4, y int4); > create index i_test on test(x,y); > insert into test values(1,2); > select * from test where x=3 and (y=1 or y=2); > > This is going to use the i_test index, but only with key x=3, and do a > scan of the index looking for y=1 or y=2, and will not use the second ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Server will fetch heap tuple for each tuple with x = 3 returned by index access methods and call ExecQual... > key of the index. There are two ways. I. Rewrite this into (x = 3 and y = 1) or (x = 3 and y = 2) and scan index twice using both keys for finding index tuples. II. Extend multi-key indexing: (y = 1 or y = 2) could be qualified by index access methods itself because of Y is one of index keys. Only first key would be used for finding index tuples but additional qualification could decrease number of heap_fetch calls and this would be nice! This feature would be also usefull for: create index on table (a,b,c); select * from table where a = 1 and c = 2; ^^^^^ additional qualification would be performed on index level Personally, I would like to see II implemented first because of it works for both query examples. Vadim
> Bruce Momjian wrote: > > > > create table test (x int4, y int4); > > create index i_test on test(x,y); > > insert into test values(1,2); > > select * from test where x=3 and (y=1 or y=2); > > > > This is going to use the i_test index, but only with key x=3, and do a > > scan of the index looking for y=1 or y=2, and will not use the second > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > Server will fetch heap tuple for each tuple with x = 3 > returned by index access methods and call ExecQual... > > > key of the index. > > There are two ways. > > I. Rewrite this into > > (x = 3 and y = 1) or (x = 3 and y = 2) > > and scan index twice using both keys for finding index tuples. Yes. One way to do this may be to have OR processed AFTER AND in the optimizer, and use the existing index matches to look for matches further down the key. I am not sure where the actual multiple index plans are created, so I am not sure how to implement this. > II. Extend multi-key indexing: (y = 1 or y = 2) could be > qualified by index access methods itself because of Y is > one of index keys. Only first key would be used for finding > index tuples but additional qualification could decrease > number of heap_fetch calls and this would be nice! I see what you are saying. In the first case, you look for key1,key2a, then for key1,key2b, etc. In the second case, you look for key1, then sitting on key1, you can get key2a, key2b, etc, without having to find key1 for every OR. > This feature would be also usefull for: > > create index on table (a,b,c); > select * from table where a = 1 and c = 2; > ^^^^^ > additional qualification would be performed on index level > > Personally, I would like to see II implemented first because > of it works for both query examples. Doesn't the existing code already use both keys in the above query. What is gained by moving this to the index access methods? -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian wrote: > > > > create index i_test on test(x,y); > > > insert into test values(1,2); > > > select * from test where x=3 and (y=1 or y=2); > > > II. Extend multi-key indexing: (y = 1 or y = 2) could be > > qualified by index access methods itself because of Y is > > one of index keys. Only first key would be used for finding > > index tuples but additional qualification could decrease > > number of heap_fetch calls and this would be nice! > > > This feature would be also usefull for: > > > > create index on table (a,b,c); > > select * from table where a = 1 and c = 2; > > ^^^^^ > > additional qualification would be performed on index level > > > > Personally, I would like to see II implemented first because > > of it works for both query examples. > > Doesn't the existing code already use both keys in the above query. > What is gained by moving this to the index access methods? I hadn't time to implement this year ago... Let's say we have 1000 tuples with a = 1 and only 10 with a = 1 and c = 2 - currently, all 1000 index tuples will be returned to Executor and all corresponding 1000 heap tuples will be fetched... Having this feature, only 10 index tuples would be returned and heap_fetch would be called 10 times only. Vadim
> Bruce Momjian wrote: > > > > > > create index i_test on test(x,y); > > > > insert into test values(1,2); > > > > select * from test where x=3 and (y=1 or y=2); > > > > > II. Extend multi-key indexing: (y = 1 or y = 2) could be > > > qualified by index access methods itself because of Y is > > > one of index keys. Only first key would be used for finding > > > index tuples but additional qualification could decrease > > > number of heap_fetch calls and this would be nice! > > > > > This feature would be also usefull for: > > > > > > create index on table (a,b,c); > > > select * from table where a = 1 and c = 2; > > > ^^^^^ > > > additional qualification would be performed on index level > > > > > > Personally, I would like to see II implemented first because > > > of it works for both query examples. > > > > Doesn't the existing code already use both keys in the above query. > > What is gained by moving this to the index access methods? > > I hadn't time to implement this year ago... > > Let's say we have 1000 tuples with a = 1 and only 10 with > a = 1 and c = 2 - currently, all 1000 index tuples will be returned > to Executor and all corresponding 1000 heap tuples will be fetched... > Having this feature, only 10 index tuples would be returned > and heap_fetch would be called 10 times only. OK, stupid question, but why do we have multi-key indexes then? Just to allow UNIQUE index failure? -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian wrote: > > > > > II. Extend multi-key indexing: (y = 1 or y = 2) could be > > > > qualified by index access methods itself because of Y is > > > > one of index keys. Only first key would be used for finding > > > > index tuples but additional qualification could decrease > > > > number of heap_fetch calls and this would be nice! > > > > > > > This feature would be also usefull for: > > > > > > > > create index on table (a,b,c); > > > > select * from table where a = 1 and c = 2; > > > > ^^^^^ > > > > additional qualification would be performed on index level > > > > > > > > Personally, I would like to see II implemented first because > > > > of it works for both query examples. > > > > > > Doesn't the existing code already use both keys in the above query. > > > What is gained by moving this to the index access methods? > > > > I hadn't time to implement this year ago... > > > > Let's say we have 1000 tuples with a = 1 and only 10 with > > a = 1 and c = 2 - currently, all 1000 index tuples will be returned > > to Executor and all corresponding 1000 heap tuples will be fetched... > > Having this feature, only 10 index tuples would be returned > > and heap_fetch would be called 10 times only. > > OK, stupid question, but why do we have multi-key indexes then? Just to > allow UNIQUE index failure? In the example above only 1st and _3rd_ index keys are used in WHERE and so only 1st key will be used by index. For cases like WHERE a = 1 and b = 3 index will use 1st and 2nd keys. For cases like WHERE a = 1 and b = 3 and c = 2 index will use all three keys. Extending II means not using 3rd key to _find_ index tuples but using 3rd key to reduce # of index tuples returned to executor. Btree will read all tuples with a = 3 but will not return index tuple with a = 3 and c = 0 (ie). Vadim
> Bruce Momjian wrote: > > > > > > > II. Extend multi-key indexing: (y = 1 or y = 2) could be > > > > > qualified by index access methods itself because of Y is > > > > > one of index keys. Only first key would be used for finding > > > > > index tuples but additional qualification could decrease > > > > > number of heap_fetch calls and this would be nice! > > > > > > > > > This feature would be also usefull for: > > > > > > > > > > create index on table (a,b,c); > > > > > select * from table where a = 1 and c = 2; > > > > > ^^^^^ > > > > > additional qualification would be performed on index level > > > > > > > > > > Personally, I would like to see II implemented first because > > > > > of it works for both query examples. > > > > > > > > Doesn't the existing code already use both keys in the above query. > > > > What is gained by moving this to the index access methods? > > > > > > I hadn't time to implement this year ago... > > > > > > Let's say we have 1000 tuples with a = 1 and only 10 with > > > a = 1 and c = 2 - currently, all 1000 index tuples will be returned > > > to Executor and all corresponding 1000 heap tuples will be fetched... > > > Having this feature, only 10 index tuples would be returned > > > and heap_fetch would be called 10 times only. > > > > OK, stupid question, but why do we have multi-key indexes then? Just to > > allow UNIQUE index failure? > > In the example above only 1st and _3rd_ index keys are used > in WHERE and so only 1st key will be used by index. > For cases like WHERE a = 1 and b = 3 index will use > 1st and 2nd keys. > For cases like WHERE a = 1 and b = 3 and c = 2 index will use > all three keys. > > Extending II means not using 3rd key to _find_ index tuples > but using 3rd key to reduce # of index tuples returned > to executor. Btree will read all tuples with a = 3 but > will not return index tuple with a = 3 and c = 0 (ie). Oh, stupid me. I see now. Hardly seems worth adding the extra code to reduce a match. I agree, let the executor handle it. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> Bruce Momjian wrote: > > > > create table test (x int4, y int4); > > create index i_test on test(x,y); > > insert into test values(1,2); > > select * from test where x=3 and (y=1 or y=2); > > > > This is going to use the i_test index, but only with key x=3, and do a > > scan of the index looking for y=1 or y=2, and will not use the second > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > Server will fetch heap tuple for each tuple with x = 3 > returned by index access methods and call ExecQual... > > > key of the index. > > There are two ways. > > I. Rewrite this into > > (x = 3 and y = 1) or (x = 3 and y = 2) > > and scan index twice using both keys for finding index tuples. > > II. Extend multi-key indexing: (y = 1 or y = 2) could be > qualified by index access methods itself because of Y is > one of index keys. Only first key would be used for finding > index tuples but additional qualification could decrease > number of heap_fetch calls and this would be nice! > > This feature would be also usefull for: > > create index on table (a,b,c); > select * from table where a = 1 and c = 2; > ^^^^^ > additional qualification would be performed on index level > > Personally, I would like to see II implemented first because > of it works for both query examples. I now understand how this feature could be used in both cases where we match key1 and key3, and when we use OR on second key. I see what you are saying, that the values are already in the index, so you could scan the index looking for the third key or the OR clause, and not have to fetch the heap to look it up. I will add this to the TODO list under performance. It is sort of exotic, but why not add it: * use index to restrict rows returned by multi-key index for non-consecutive keys or OR clauses -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Bruce, I have been following this thread great enthusiasm. I seem to have missed something though. Have you been CVSing these changes, and as such, should I be able to use indexes on ORs?
> Bruce, > > I have been following this thread great enthusiasm. I seem to have missed > something though. Have you been CVSing these changes, and as such, should I > be able to use indexes on ORs? > > > > Yes. Do an update because I have been fixing some minor problems. Should work now. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)