Discussion: Use of additional index columns in rows filtering
Hi All,

I'd like to report what seems to be a missing optimization opportunity, or understand why it is not possible to achieve.

TLDR; an additional index column B specified in CREATE INDEX .. (A) INCLUDE (B) is not used to filter rows in queries like WHERE B = $1 ORDER BY A during an IndexScan. https://dbfiddle.uk/iehtq44L

Take the following database:

CREATE TABLE t(
  a integer NOT NULL,
  b integer NOT NULL,
  d integer NOT NULL
);

CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);
-- I'd expect the index above to behave as the index below for the purpose
-- of this query
-- CREATE UNIQUE INDEX ON t(a,b);

INSERT INTO t(
  SELECT random() * 100000000 as a,
         random() * 3 as b,
         generate_series as d
  FROM generate_series(1,200000)
) ON CONFLICT DO NOTHING;

If we filter on the `a` and `b` columns while scanning the index created as `(a) INCLUDE (b)`, it seems to be fetching tuples from the heap to check the condition `b = 4`, despite both columns being available in the index:

SELECT * FROM t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;

Here is the plan (notice the high "shared hit"):

 Limit  (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
   Output: a, b, d
   Buffers: shared hit=198307
   ->  Index Scan using t_a_include_b on public.t  (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
         Output: a, b, d
         Index Cond: (t.a > 1000000)
         Filter: (t.b = 4)
         Rows Removed by Filter: 197805
         Buffers: shared hit=198307
 Planning:
   Buffers: shared hit=30
 Planning Time: 0.201 ms
 Execution Time: 84.303 ms

And here is the plan with an index on (a,b).
 Limit  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
   Output: a, b, d
   Buffers: shared hit=613
   ->  Index Scan using t_a_b_idx on public.t  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
         Output: a, b, d
         Index Cond: ((t.a > 1000000) AND (t.b = 4))
         Buffers: shared hit=613
 Planning:
   Buffers: shared hit=41
 Planning Time: 0.314 ms
 Execution Time: 6.910 ms

Because the query doesn't sort on `b`, only filters on it while sorting on `a`, I'd expect the indexes `(a) INCLUDE (b)` and `(a,b)` to behave exactly the same with this particular query.

Interestingly, an IndexOnlyScan is capable of using additional columns to filter rows without fetching them from the heap, but only for visible tuples:

VACUUM FREEZE t;
SELECT a,b FROM t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;

 Limit  (cost=0.42..6619.76 rows=1 width=8) (actual time=18.479..18.480 rows=0 loops=1)
   Output: a, b
   Buffers: shared hit=662
   ->  Index Only Scan using t_a_include_b on public.t  (cost=0.42..6619.76 rows=1 width=8) (actual time=18.477..18.477 rows=0 loops=1)
         Output: a, b
         Index Cond: (t.a > 1000000)
         Filter: (t.b = 4)
         Rows Removed by Filter: 197771
         Heap Fetches: 0
         Buffers: shared hit=662

Removing the VACUUM makes it behave like an IndexScan and fetch candidate tuples from the heap, all while returning zero rows in the result.

To make the query plans comparable I had to force an index scan on both with:

SET enable_bitmapscan to off;
SET enable_seqscan to off;
SET max_parallel_workers_per_gather = 0;

A self-contained, fully reproducible example is at https://dbfiddle.uk/iehtq44L

Regards,
Maxim
On 2/15/23 09:57, Maxim Ivanov wrote:
> Hi All,
>
> I'd like to report what seems to be a missing optimization
> opportunity or understand why it is not possible to achieve.
>
> TLDR; additional index column B specified in CREATE INDEX .. (A)
> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1
> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L
>
> ...
>
> Here is the plan (notice high "shared hit"):
>
>  Limit  (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
>    Output: a, b, d
>    Buffers: shared hit=198307
>    ->  Index Scan using t_a_include_b on public.t  (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
>          Output: a, b, d
>          Index Cond: (t.a > 1000000)
>          Filter: (t.b = 4)
>          Rows Removed by Filter: 197805
>          Buffers: shared hit=198307
>  Planning:
>    Buffers: shared hit=30
>  Planning Time: 0.201 ms
>  Execution Time: 84.303 ms
>

Yeah. The reason for this behavior is pretty simple:

1) When matching clauses to indexes in match_clause_to_index(), we only
look at key columns (nkeycolumns). We'd need to check all columns
(ncolumns) and remember if the clause matched a key or an included one.

2) index_getnext_slot would need to get "candidate" TIDs using
conditions on keys, and then check the clauses on included columns.

Seems doable, unless I'm missing some fatal issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 2/15/23 09:57, Maxim Ivanov wrote:
>> TLDR; additional index column B specified in CREATE INDEX .. (A)
>> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1
>> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L

> Seems doable, unless I'm missing some fatal issue.

Partly this is lack of round tuits, but there's another significant
issue: there very likely are index entries corresponding to dead heap
rows. Applying random user-defined quals to values found in such rows
could produce semantic anomalies; for example, divide-by-zero failures
even though you deleted all the rows having a zero in that column.

This isn't a problem for operators found in operator families, because
we trust those to not have undesirable side effects like raising
data-dependent errors. But it'd be an issue if we started to apply
quals that aren't index quals directly to index rows before doing
the heap liveness check. (And, of course, once you've fetched the
heap row there's no point in having a special path for columns
available from the index.)

			regards, tom lane
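[Editor's note] The hazard Tom describes can be illustrated with a toy Python model; none of these names correspond to actual PostgreSQL code, and the "index", "heap" and qual are all made up for the sketch:

```python
# Toy model: the index still holds entries for dead heap rows. Applying an
# arbitrary user qual such as 100 / b > 0 before the liveness check can fail
# on a dead entry with b = 0, even though every *live* row is safe.
index_entries = [
    {"tid": 1, "b": 2},   # live row
    {"tid": 2, "b": 0},   # DELETEd row; index entry not yet removed by VACUUM
    {"tid": 3, "b": 5},   # live row
]
live_tids = {1, 3}        # what the heap visibility check would tell us

def qual(b):
    return 100 / b > 0    # user-defined qual; raises on b = 0

# Unsafe order: qual before the liveness check -> error on the dead entry.
def scan_qual_first():
    return [e["tid"] for e in index_entries
            if qual(e["b"]) and e["tid"] in live_tids]

# Safe order: liveness check first, as the executor does today.
def scan_liveness_first():
    return [e["tid"] for e in index_entries
            if e["tid"] in live_tids and qual(e["b"])]

print(scan_liveness_first())  # [1, 3]
try:
    scan_qual_first()
except ZeroDivisionError:
    print("qual-first order failed on a dead index entry")
```

This is also why restricting the optimization to operator-family operators (trusted not to raise data-dependent errors) sidesteps the problem.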
On 2/15/23 16:18, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> On 2/15/23 09:57, Maxim Ivanov wrote:
>>> TLDR; additional index column B specified in CREATE INDEX .. (A)
>>> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1
>>> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L
>
>> Seems doable, unless I'm missing some fatal issue.
>
> Partly this is lack of round tuits, but there's another significant
> issue: there very likely are index entries corresponding to dead heap
> rows. Applying random user-defined quals to values found in such rows
> could produce semantic anomalies; for example, divide-by-zero failures
> even though you deleted all the rows having a zero in that column.
>
> This isn't a problem for operators found in operator families, because
> we trust those to not have undesirable side effects like raising
> data-dependent errors. But it'd be an issue if we started to apply
> quals that aren't index quals directly to index rows before doing
> the heap liveness check. (And, of course, once you've fetched the
> heap row there's no point in having a special path for columns
> available from the index.)

Sure, but we can do the same VM check as index-only scan, right? That
would save some of the I/O to fetch the heap tuple, as long as the page
is all-visible and the filter eliminates the tuples.

It makes the costing a bit trickier, because it needs to consider both
how many pages are all-visible and selectivity of the condition.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
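[Editor's note] The costing interaction mentioned above can be sketched as back-of-the-envelope arithmetic; this is not PostgreSQL's actual cost model, just an illustration of why both the all-visible fraction and the filter selectivity matter:

```python
# With the VM check, a heap fetch is needed for every candidate tuple on a
# non-all-visible page, but on all-visible pages only for tuples that *pass*
# the index filter.
def expected_heap_fetches(matched_tuples, frac_all_visible, filter_selectivity):
    non_visible = matched_tuples * (1 - frac_all_visible)
    visible_pass = matched_tuples * frac_all_visible * filter_selectivity
    return non_visible + visible_pass

# 200k matched index tuples, 95% of pages all-visible, filter keeps 0.1%:
# ~10190 heap fetches instead of 200000 without the optimization.
print(round(expected_heap_fetches(200_000, 0.95, 0.001)))  # 10190
```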
> This isn't a problem for operators found in operator families, because
> we trust those to not have undesirable side effects like raising
> data-dependent errors. But it'd be an issue if we started to apply
> quals that aren't index quals directly to index rows before doing
> the heap liveness check. (And, of course, once you've fetched the
> heap row there's no point in having a special path for columns
> available from the index.)

Assuming operators are pure and don't have global side effects, is it
possible to ignore any error during that check? If the tuple is not
visible, it shouldn't matter; if it is visible, then the error will be
reported by the same routine which does the filtering now (ExecQual?).

If not, then limiting this optimization to builtin ops is something I
can live with :)
Hi,

I took a stab at this and implemented the trick with the VM - during
index scan, we also extract the filters that only need the indexed
attributes (just like in IOS). And then, during the execution we:

1) scan the index using the scan keys (as before)

2) if the heap page is all-visible, we check the new filters that can
be evaluated on the index tuple

3) fetch the heap tuple and evaluate the filters

This is pretty much exactly the same thing we do for IOS, so I don't see
why this would be incorrect while IOS is correct.

This also adds "Index Filter" to explain output, to show which filters
are executed on the index tuple (at the moment the filters are a subset
of "Filter"), so if the index tuple matches we'll execute them again on
the heap tuple. I guess that could be fixed by having two "filter"
lists, depending on whether we were able to evaluate the index filters.

Most of the patch is pretty mechanical - particularly the planning part
is about identifying filters that can be evaluated on the index tuple,
and that code was mostly shamelessly copied from index-only scan.

The matching of filters to index is done in check_index_filter(), and
it's simpler than match_clause_to_indexcol() as it does not need to
consider operators etc. (I think). But maybe it should be careful about
other things, not sure.

The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
earlier, the idea is to check the VM and evaluate the filters on the
index tuple if possible, similar to index-only scans. Except that we
then have to fetch the heap tuple. Unfortunately, this means the code
can't use index_getnext_slot() anymore. Perhaps we should invent a new
variant that'd allow evaluating the index filters in between.
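[Editor's note] The three steps above can be sketched as Python pseudocode; this is a toy model of the idea, not the C implementation, and the dict-based "index tuples" and "heap" are invented for illustration:

```python
def index_scan(index_tuples, all_visible_tids, heap, index_filters, heap_filters):
    """Return (matching rows, number of heap fetches)."""
    results, heap_fetches = [], 0
    for itup in index_tuples:                      # 1) scan the index using scan keys
        if itup["tid"] in all_visible_tids:        # 2) page is all-visible: evaluate the
            if not all(f(itup) for f in index_filters):  # index filters on the index tuple
                continue                           #    filtered out with no heap access
        heap_fetches += 1
        htup = heap.get(itup["tid"])               # 3) fetch the heap tuple, apply filters
        if htup is not None and all(f(htup) for f in heap_filters):
            results.append(htup)
    return results, heap_fetches

index_tuples = [{"tid": t, "b": t % 4} for t in range(8)]
heap = {t: {"tid": t, "b": t % 4, "d": t * 10} for t in range(8)}
b_is_1 = [lambda row: row["b"] == 1]

# All pages all-visible: only the two b=1 tuples ever touch the heap.
rows, fetches = index_scan(index_tuples, set(range(8)), heap, b_is_1, b_is_1)
print(len(rows), fetches)  # 2 2

# No pages all-visible: every candidate tuple must be fetched, as today.
rows, fetches = index_scan(index_tuples, set(), heap, b_is_1, b_is_1)
print(len(rows), fetches)  # 2 8
```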
With the patch applied, the query plan changes from:

                            QUERY PLAN
 -------------------------------------------------------------------
  Limit  (cost=0.42..10929.89 rows=1 width=12)
         (actual time=94.649..94.653 rows=0 loops=1)
    Buffers: shared hit=197575 read=661
    ->  Index Scan using t_a_include_b on t
        (cost=0.42..10929.89 rows=1 width=12)
        (actual time=94.646..94.647 rows=0 loops=1)
          Index Cond: (a > 1000000)
          Filter: (b = 4)
          Rows Removed by Filter: 197780
          Buffers: shared hit=197575 read=661
  Planning Time: 0.091 ms
  Execution Time: 94.674 ms
 (9 rows)

to

                            QUERY PLAN
 -------------------------------------------------------------------
  Limit  (cost=0.42..3662.15 rows=1 width=12)
         (actual time=13.663..13.667 rows=0 loops=1)
    Buffers: shared hit=544
    ->  Index Scan using t_a_include_b on t
        (cost=0.42..3662.15 rows=1 width=12)
        (actual time=13.659..13.660 rows=0 loops=1)
          Index Cond: (a > 1000000)
          Index Filter: (b = 4)
          Rows Removed by Index Recheck: 197780
          Filter: (b = 4)
          Buffers: shared hit=544
  Planning Time: 0.105 ms
  Execution Time: 13.690 ms
 (10 rows)

which is much closer to the "best" case:

                            QUERY PLAN
 -------------------------------------------------------------------
  Limit  (cost=0.42..4155.90 rows=1 width=12)
         (actual time=10.152..10.156 rows=0 loops=1)
    Buffers: shared read=543
    ->  Index Scan using t_a_b_idx on t
        (cost=0.42..4155.90 rows=1 width=12)
        (actual time=10.148..10.150 rows=0 loops=1)
          Index Cond: ((a > 1000000) AND (b = 4))
          Buffers: shared read=543
  Planning Time: 0.089 ms
  Execution Time: 10.176 ms
 (7 rows)

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello,

I've cc'd Jeff Davis on this due to a conversation we had at PGCon
about applying filters on index tuples during index scans.

I've also cc'd Andres Freund because I think this relates to his
musing in [1] that:

> One thing I have been wondering around this is whether we should not have
> split the code for IOS and plain indexscans...

I think I remember Peter Geoghegan also wondering (I can't remember if
this was in conversation at PGCon about index skip scans or in a
hackers thread) about how we compose these various index scan
optimizations.

To be certain, this is probably a thing to tackle as a follow-on to
this patch, but it does seem to me that what we are implicitly
realizing is that (unlike with bitmap scans, I think) it doesn't
really make a lot of conceptual sense to have index only scans be a
separate node from index scans. Instead it's likely better to consider
it an optimization to index scans that can dynamically kick in when
it's able to be of use. That would allow it to compose with e.g.
prefetching in the aforelinked thread. At the very least we would need
pragmatic (e.g., cost of dynamically applying optimizations) rather
than conceptual reasons to argue they should continue to be separate.

Apologies for that lengthy preamble; on to the patch under discussion:

On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I took a stab at this and implemented the trick with the VM - during
> index scan, we also extract the filters that only need the indexed
> attributes (just like in IOS). And then, during the execution we:
>
> 1) scan the index using the scan keys (as before)
>
> 2) if the heap page is all-visible, we check the new filters that can
> be evaluated on the index tuple
>
> 3) fetch the heap tuple and evaluate the filters

Thanks for working on this; I'm excited about this class of work
(along with index prefetching and other ideas I think there's a lot of
potential for improving index scans).
> This is pretty much exactly the same thing we do for IOS, so I don't see
> why this would be incorrect while IOS is correct.
>
> This also adds "Index Filter" to explain output, to show which filters
> are executed on the index tuple (at the moment the filters are a subset
> of "Filter"), so if the index tuple matches we'll execute them again on
> the heap tuple. I guess that could be fixed by having two "filter"
> lists, depending on whether we were able to evaluate the index filters.

Given that we show index filters and heap filters separately, it seems
like we might want to maintain separate instrumentation counts of how
many tuples were filtered by each set of filters.

> Most of the patch is pretty mechanical - particularly the planning part
> is about identifying filters that can be evaluated on the index tuple,
> and that code was mostly shamelessly copied from index-only scan.
>
> The matching of filters to index is done in check_index_filter(), and
> it's simpler than match_clause_to_indexcol() as it does not need to
> consider operators etc. (I think). But maybe it should be careful about
> other things, not sure.

This would end up requiring some refactoring of the existing index
matching code (or alternative caching on IndexOptInfo), but
match_filter_to_index() calling check_index_filter() results in
constructing a bitmapset of index columns for every possible filter,
which seems wasteful (I recognize this is a bit of a proof-of-concept
level v1).

> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
> earlier, the idea is to check VM and evaluate the filters on the index
> tuple if possible, similar to index-only scans. Except that we then have
> to fetch the heap tuple. Unfortunately, this means the code can't use
> index_getnext_slot() anymore. Perhaps we should invent a new variant
> that'd allow evaluating the index filters in between.

It does seem there are some refactoring opportunities there.
> With the patch applied, the query plan changes from:
>
> ...
>
> to
>
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Limit  (cost=0.42..3662.15 rows=1 width=12)
>         (actual time=13.663..13.667 rows=0 loops=1)
>    Buffers: shared hit=544
>    ->  Index Scan using t_a_include_b on t
>        (cost=0.42..3662.15 rows=1 width=12)
>        (actual time=13.659..13.660 rows=0 loops=1)
>          Index Cond: (a > 1000000)
>          Index Filter: (b = 4)
>          Rows Removed by Index Recheck: 197780
>          Filter: (b = 4)
>          Buffers: shared hit=544
>  Planning Time: 0.105 ms
>  Execution Time: 13.690 ms
> (10 rows)
>
> ...

I did also confirm that this properly identifies cases Jeff had
mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
= 4))".

I noticed also you still had questions/TODOs about handling index
scans for join clauses.

Regards,
James Coleman

1: https://www.postgresql.org/message-id/20230609000600.syqy447e6metnvyj%40awork3.anarazel.de
On 6/21/23 14:45, James Coleman wrote:
> Hello,
>
> I've cc'd Jeff Davis on this due to a conversation we had at PGCon
> about applying filters on index tuples during index scans.
>
> I've also cc'd Andres Freund because I think this relates to his
> musing in [1] that:
>> One thing I have been wondering around this is whether we should not have
>> split the code for IOS and plain indexscans...
>
> I think I remember Peter Geoghegan also wondering (I can't remember if
> this was in conversation at PGCon about index skip scans or in a
> hackers thread) about how we compose these various index scan
> optimizations.
>
> To be certain this is probably a thing to tackle as a follow-on to
> this patch, but it does seem to me that what we are implicitly
> realizing is that (unlike with bitmap scans, I think) it doesn't
> really make a lot of conceptual sense to have index only scans be a
> separate node from index scans. Instead it's likely better to consider
> it an optimization to index scans that can dynamically kick in when
> it's able to be of use. That would allow it to compose with e.g.
> prefetching in the aforelinked thread. At the very least we would need
> pragmatic (e.g., cost of dynamically applying optimizations) rather
> than conceptual reasons to argue they should continue to be separate.
>

I agree it seems a bit weird to have IOS as a separate node. In a way, I
think there are two dimensions for "index-only" scans - which pages can
be scanned like that, and which clauses can be evaluated with only the
index tuple. The current approach focuses on page visibility, but
ignores the other aspect entirely. Or more precisely, it disables IOS
entirely as soon as there's a single condition requiring the heap tuple.

I agree it's probably better to see this as a single node with various
optimizations that can be applied when possible / efficient (based on
the planner and/or dynamically).
I'm not sure I see a direct link to the prefetching patch, but it's true
that it needs to deal with TIDs (instead of slots), just like IOS. So if
the node worked with TIDs, maybe the prefetching could be done at that
level (which I now realize may be what Andres meant by doing prefetching
in the executor).

> Apologies for that lengthy preamble; on to the patch under discussion:
>
> On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> I took a stab at this and implemented the trick with the VM - during
>> index scan, we also extract the filters that only need the indexed
>> attributes (just like in IOS). And then, during the execution we:
>>
>> 1) scan the index using the scan keys (as before)
>>
>> 2) if the heap page is all-visible, we check the new filters that can
>> be evaluated on the index tuple
>>
>> 3) fetch the heap tuple and evaluate the filters
>
> Thanks for working on this; I'm excited about this class of work
> (along with index prefetching and other ideas I think there's a lot of
> potential for improving index scans).
>
>> This is pretty much exactly the same thing we do for IOS, so I don't see
>> why this would be incorrect while IOS is correct.
>>
>> This also adds "Index Filter" to explain output, to show which filters
>> are executed on the index tuple (at the moment the filters are a subset
>> of "Filter"), so if the index tuple matches we'll execute them again on
>> the heap tuple. I guess that could be fixed by having two "filter"
>> lists, depending on whether we were able to evaluate the index filters.
>
> Given that we show index filters and heap filters separately it seems
> like we might want to maintain separate instrumentation counts of how
> many tuples were filtered by each set of filters.
>

Yeah, separate instrumentation counters would be useful. What I was
talking about was more about the conditions themselves, because right
now we re-evaluate the index-only clauses on the heap tuple.
Imagine an index on t(a) and a query that has WHERE (a = 1) AND (b = 2).
The patch splits this into two lists:

index-only clauses: (a=1)
clauses: (a=1) AND (b=2)

So we evaluate (a=1) first, and then we fetch the heap tuple and check
"clauses" again, which however includes the (a=1) again. For cheap
clauses that's not a big deal (especially when (a=1) eliminated a lot of
tuples using just the index), but for expensive clauses it might hurt.

It's fixable, we'd just need to keep two versions of the "clauses" list,
one for IOS mode (when index-only clauses were checked) and a complete
one for when we need to check all clauses.

>> Most of the patch is pretty mechanical - particularly the planning part
>> is about identifying filters that can be evaluated on the index tuple,
>> and that code was mostly shamelessly copied from index-only scan.
>>
>> The matching of filters to index is done in check_index_filter(), and
>> it's simpler than match_clause_to_indexcol() as it does not need to
>> consider operators etc. (I think). But maybe it should be careful about
>> other things, not sure.
>
> This would end up requiring some refactoring of the existing index
> matching code (or alternative caching on IndexOptInfo), but
> match_filter_to_index() calling check_index_filter() results in
> constructing a bitmapset of index columns for every possible filter,
> which seems wasteful (I recognize this is a bit of a proof-of-concept
> level v1).
>

Probably, I'm sure there's a couple other places where the current API
is a bit cumbersome and we could optimize.

>> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
>> earlier, the idea is to check VM and evaluate the filters on the index
>> tuple if possible, similar to index-only scans. Except that we then have
>> to fetch the heap tuple. Unfortunately, this means the code can't use
>> index_getnext_slot() anymore. Perhaps we should invent a new variant
>> that'd allow evaluating the index filters in between.
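[Editor's note] The two-clause-list idea described earlier in this message can be sketched in a few lines of Python; the names and the dict-as-tuple representation are invented for the sketch:

```python
# Keep two versions of the clause list: a "remaining" list for when the
# index-only clauses were already checked on the index tuple, and the
# complete list for when they were not (e.g. page not all-visible).
index_only_clauses = [lambda r: r["a"] == 1]
all_clauses        = [lambda r: r["a"] == 1, lambda r: r["b"] == 2]
remaining_clauses  = [lambda r: r["b"] == 2]   # all_clauses minus index_only_clauses

def evaluate_heap_tuple(row, index_filters_checked):
    # If the index filters already passed on the index tuple, only the
    # remaining clauses need to run against the heap tuple; (a=1) is
    # not re-evaluated.
    clauses = remaining_clauses if index_filters_checked else all_clauses
    return all(c(row) for c in clauses)

print(evaluate_heap_tuple({"a": 1, "b": 2}, index_filters_checked=True))   # True
print(evaluate_heap_tuple({"a": 3, "b": 2}, index_filters_checked=False))  # False
```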
>
> It does seem there are some refactoring opportunities there.
>

Actually, I realized maybe we should switch this to index_getnext_tid()
because of the prefetching patch. That would allow us to introduce a
"buffer" of TIDs, populated by index_getnext_tid(), and then do
prefetching based on that. It's similar to what bitmap scans do, except
that instead of the tbm iterator we get items from index_getnext_tid().

I haven't tried implementing this yet, but I kinda like the idea, as it
works no matter what exactly the AM does (i.e. it'd work even for cases
like GiST with distance searches).

>> With the patch applied, the query plan changes from:
>>
>> ...
>>
>> to
>>
>>                             QUERY PLAN
>> -------------------------------------------------------------------
>>  Limit  (cost=0.42..3662.15 rows=1 width=12)
>>         (actual time=13.663..13.667 rows=0 loops=1)
>>    Buffers: shared hit=544
>>    ->  Index Scan using t_a_include_b on t
>>        (cost=0.42..3662.15 rows=1 width=12)
>>        (actual time=13.659..13.660 rows=0 loops=1)
>>          Index Cond: (a > 1000000)
>>          Index Filter: (b = 4)
>>          Rows Removed by Index Recheck: 197780
>>          Filter: (b = 4)
>>          Buffers: shared hit=544
>>  Planning Time: 0.105 ms
>>  Execution Time: 13.690 ms
>> (10 rows)
>>
>> ...
>
> I did also confirm that this properly identifies cases Jeff had
> mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
> = 4))".
>

Good!

> I noticed also you still had questions/TODOs about handling index
> scans for join clauses.
>

Not sure which questions/TODOs you refer to, but I don't recall any
issues with join clauses. But maybe I just forgot.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
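[Editor's note] The TID-buffer idea sketched in the message above (pull TIDs from the AM, prefetch ahead, consume in order) can be modeled roughly as follows; the function names are illustrative, not the actual executor API:

```python
from collections import deque

def scan_with_prefetch(tid_source, prefetch, fetch, distance=4):
    """Consume TIDs in order while issuing prefetch hints `distance` ahead."""
    buf = deque()
    tids = iter(tid_source)
    for tid in tids:            # fill the initial window of TIDs
        buf.append(tid)
        if len(buf) >= distance:
            break
    for tid in buf:
        prefetch(tid)           # hint the buffer pool / kernel ahead of use
    while buf:
        cur = buf.popleft()
        nxt = next(tids, None)  # keep the window topped up
        if nxt is not None:
            buf.append(nxt)
            prefetch(nxt)
        yield fetch(cur)        # heap fetch happens only when the TID is consumed

prefetched = []
out = list(scan_with_prefetch(range(10), prefetched.append, lambda tid: tid))
print(out == list(range(10)), prefetched[:6])  # True [0, 1, 2, 3, 4, 5]
```

Since the buffer is populated from a plain TID stream, this structure is agnostic to what the AM does to produce the TIDs, which matches the point about GiST distance searches.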
On Wed, Jun 21, 2023 at 11:28 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 6/21/23 14:45, James Coleman wrote:
> > Hello,
> >
> > I've cc'd Jeff Davis on this due to a conversation we had at PGCon
> > about applying filters on index tuples during index scans.
> >
> > I've also cc'd Andres Freund because I think this relates to his
> > musing in [1] that:
> >> One thing I have been wondering around this is whether we should not have
> >> split the code for IOS and plain indexscans...
> >
> > I think I remember Peter Geoghegan also wondering (I can't remember if
> > this was in conversation at PGCon about index skip scans or in a
> > hackers thread) about how we compose these various index scan
> > optimizations.
> >
> > To be certain this is probably a thing to tackle as a follow-on to
> > this patch, but it does seem to me that what we are implicitly
> > realizing is that (unlike with bitmap scans, I think) it doesn't
> > really make a lot of conceptual sense to have index only scans be a
> > separate node from index scans. Instead it's likely better to consider
> > it an optimization to index scans that can dynamically kick in when
> > it's able to be of use. That would allow it to compose with e.g.
> > prefetching in the aforelinked thread. At the very least we would need
> > pragmatic (e.g., cost of dynamically applying optimizations) rather
> > than conceptual reasons to argue they should continue to be separate.
> >
>
> I agree it seems a bit weird to have IOS as a separate node. In a way, I
> think there are two dimensions for "index-only" scans - which pages can
> be scanned like that, and which clauses can be evaluated with only the
> index tuple. The current approach focuses on page visibility, but
> ignores the other aspect entirely. Or more precisely, it disables IOS
> entirely as soon as there's a single condition requiring heap tuple.
>
> I agree it's probably better to see this as a single node with various
> optimizations that can be applied when possible / efficient (based on
> planner and/or dynamically).
>
> I'm not sure I see a direct link to the prefetching patch, but it's true
> that needs to deal with tids (instead of slots), just like IOS. So if
> the node worked with tids, maybe the prefetching could be done at that
> level (which I now realize may be what Andres meant by doing prefetching
> in the executor).

The link to prefetching is that IOS (as a separate node) won't benefit
from prefetching (I think) with your current prefetching patch (in the
case where the VM doesn't allow us to just use the index tuple), whereas
if the nodes were combined that would more naturally be composable.

> > Apologies for that lengthy preamble; on to the patch under discussion:
> >
> > On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> Hi,
> >>
> >> I took a stab at this and implemented the trick with the VM - during
> >> index scan, we also extract the filters that only need the indexed
> >> attributes (just like in IOS). And then, during the execution we:
> >>
> >> 1) scan the index using the scan keys (as before)
> >>
> >> 2) if the heap page is all-visible, we check the new filters that can
> >> be evaluated on the index tuple
> >>
> >> 3) fetch the heap tuple and evaluate the filters
> >
> > Thanks for working on this; I'm excited about this class of work
> > (along with index prefetching and other ideas I think there's a lot of
> > potential for improving index scans).
> >
> >> This is pretty much exactly the same thing we do for IOS, so I don't see
> >> why this would be incorrect while IOS is correct.
> >>
> >> This also adds "Index Filter" to explain output, to show which filters
> >> are executed on the index tuple (at the moment the filters are a subset
> >> of "Filter"), so if the index tuple matches we'll execute them again on
> >> the heap tuple. I guess that could be fixed by having two "filter"
> >> lists, depending on whether we were able to evaluate the index filters.
> >
> > Given that we show index filters and heap filters separately it seems
> > like we might want to maintain separate instrumentation counts of how
> > many tuples were filtered by each set of filters.
> >
>
> Yeah, separate instrumentation counters would be useful. What I was
> talking about was more about the conditions themselves, because right
> now we re-evaluate the index-only clauses on the heap tuple.
>
> Imagine an index on t(a) and a query that has WHERE (a = 1) AND (b = 2).
> The patch splits this into two lists:
>
> index-only clauses: (a=1)
> clauses: (a=1) AND (b=2)
>
> So we evaluate (a=1) first, and then we fetch the heap tuple and check
> "clauses" again, which however includes the (a=1) again. For cheap
> clauses that's not a big deal, but for expensive clauses it might hurt.
>
> It's fixable, we'd just need to keep two versions of the "clauses" list,
> one for IOS mode (when index-only clauses were checked) and a complete
> one when we need to check all clauses.

In some cases (where the VM doesn't allow us to use just the index
tuple) we'd have to execute both lists against the heap tuple, right?

> >> Most of the patch is pretty mechanical - particularly the planning part
> >> is about identifying filters that can be evaluated on the index tuple,
> >> and that code was mostly shamelessly copied from index-only scan.
> >>
> >> The matching of filters to index is done in check_index_filter(), and
> >> it's simpler than match_clause_to_indexcol() as it does not need to
> >> consider operators etc. (I think).
> >> But maybe it should be careful about
> >> other things, not sure.
> >
> > This would end up requiring some refactoring of the existing index
> > matching code (or alternative caching on IndexOptInfo), but
> > match_filter_to_index() calling check_index_filter() results in
> > constructing a bitmapset of index columns for every possible filter,
> > which seems wasteful (I recognize this is a bit of a proof-of-concept
> > level v1).
> >
>
> Probably, I'm sure there's a couple other places where the current API
> was a bit cumbersome and we could optimize.
>
> >> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
> >> earlier, the idea is to check VM and evaluate the filters on the index
> >> tuple if possible, similar to index-only scans. Except that we then have
> >> to fetch the heap tuple. Unfortunately, this means the code can't use
> >> index_getnext_slot() anymore. Perhaps we should invent a new variant
> >> that'd allow evaluating the index filters in between.
>
> Actually, I realized maybe we should switch this to index_getnext_tid()
> because of the prefetching patch. That would allow us to introduce a
> "buffer" of TIDs, populated by index_getnext_tid(), and then do
> prefetching based on that. It's similar to what bitmap scans do, except
> that instead of the tbm iterator we get items from index_getnext_tid().
>
> I haven't tried implementing this yet, but I kinda like the idea as it
> works no matter what exactly the AM does (i.e. it'd work even for cases
> like GiST with distance searches).

Oh, interesting, I'll let you keep chewing on that then.

> >> With the patch applied, the query plan changes from:
> >>
> >> ...
> >>
> >> to
> >>
> >>                             QUERY PLAN
> >> -------------------------------------------------------------------
> >>  Limit  (cost=0.42..3662.15 rows=1 width=12)
> >>         (actual time=13.663..13.667 rows=0 loops=1)
> >>    Buffers: shared hit=544
> >>    ->  Index Scan using t_a_include_b on t
> >>        (cost=0.42..3662.15 rows=1 width=12)
> >>        (actual time=13.659..13.660 rows=0 loops=1)
> >>          Index Cond: (a > 1000000)
> >>          Index Filter: (b = 4)
> >>          Rows Removed by Index Recheck: 197780
> >>          Filter: (b = 4)
> >>          Buffers: shared hit=544
> >>  Planning Time: 0.105 ms
> >>  Execution Time: 13.690 ms
> >> (10 rows)
> >>
> >> ...
> >
> > I did also confirm that this properly identifies cases Jeff had
> > mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
> > = 4))".
> >
>
> Good!
>
> > I noticed also you still had questions/TODOs about handling index
> > scans for join clauses.
> >
>
> Not sure which questions/TODOs you refer to, but I don't recall any
> issues with join clauses. But maybe I just forgot.

I was referring to the comment:

+ * FIXME Maybe this should fill the filterset too?

above match_eclass_clauses_to_index()'s definition.

Regards,
James
On 6/21/23 18:17, James Coleman wrote: > On Wed, Jun 21, 2023 at 11:28 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> >> >> On 6/21/23 14:45, James Coleman wrote: >>> Hello, >>> >>> I've cc'd Jeff Davis on this due to a conversation we had at PGCon >>> about applying filters on index tuples during index scans. >>> >>> I've also cc'd Andres Freund because I think this relates to his >>> musing in [1] that: >>>> One thing I have been wondering around this is whether we should not have >>>> split the code for IOS and plain indexscans... >>> >>> I think I remember Peter Geoghegan also wondering (I can't remember if >>> this was in conversation at PGCon about index skip scans or in a >>> hackers thread) about how we compose these various index scan >>> optimizations. >>> >>> To be certain this is probably a thing to tackle as a follow-on to >>> this patch, but it does seem to me that what we are implicitly >>> realizing is that (unlike with bitmap scans, I think) it doesn't >>> really make a lot of conceptual sense to have index only scans be a >>> separate node from index scans. Instead it's likely better to consider >>> it an optimization to index scans that can dynamically kick in when >>> it's able to be of use. That would allow it to compose with e.g. >>> prefetching in the aforelinked thread. At the very least we would need >>> pragmatic (e.g., cost of dynamically applying optimizations) rather >>> than conceptual reasons to argue they should continue to be separate. >>> >> >> I agree it seems a bit weird to have IOS as a separate node. In a way, I >> think there are two dimensions for "index-only" scans - which pages can >> be scanned like that, and which clauses can be evaluated with only the >> index tuple. The current approach focuses on page visibility, but >> ignores the other aspect entirely. Or more precisely, it disables IOS >> entirely as soon as there's a single condition requiring heap tuple. 
>> >> I agree it's probably better to see this as a single node with various >> optimizations that can be applied when possible / efficient (based on >> planner and/or dynamically). >> >> I'm not sure I see a direct link to the prefetching patch, but it's true >> that needs to deal with tids (instead of slots), just like IOS. So if >> the node worked with tids, maybe the prefetching could be done at that >> level (which I now realize may be what Andres meant by doing prefetching >> in the executor). > > The link to prefetching is that IOS (as a separate node) won't benefit > from prefetching (I think) with your current prefetching patch (in the > case where the VM doesn't allow us to just use the index tuple), > whereas if the nodes were combined that would more naturally be > composable. > Yeah, mostly. Although just unifying "regular" indexscans and IOS would not allow prefetching for IOS. The reason why the current design does not allow doing prefetching for IOS is that the prefetching happens deep in indexam.c, and down there we don't know which TIDs are not from all-visible pages and would need prefetching. Which is another good reason to do the prefetching at the executor level, I believe. >>> Apologies for that lengthy preamble; on to the patch under discussion: >>> >>> On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra >>> <tomas.vondra@enterprisedb.com> wrote: >>>> >>>> Hi, >>>> >>>> I took a stab at this and implemented the trick with the VM - during >>>> index scan, we also extract the filters that only need the indexed >>>> attributes (just like in IOS). 
And then, during the execution we: >>>> >>>> 1) scan the index using the scan keys (as before) >>>> >>>> 2) if the heap page is all-visible, we check the new filters that can >>>> be evaluated on the index tuple >>>> >>>> 3) fetch the heap tuple and evaluate the filters >>> >>> Thanks for working on this; I'm excited about this class of work >>> (along with index prefetching and other ideas I think there's a lot of >>> potential for improving index scans). >>> >>>> This is pretty much exactly the same thing we do for IOS, so I don't see >>>> why this would be incorrect while IOS is correct. >>>> >>>> This also adds "Index Filter" to explain output, to show which filters >>>> are executed on the index tuple (at the moment the filters are a subset >>>> of "Filter"), so if the index tuple matches we'll execute them again on >>>> the heap tuple. I guess that could be fixed by having two "filter" >>>> lists, depending on whether we were able to evaluate the index filters. >>> >>> Given that we show index filters and heap filters separately it seems >>> like we might want to maintain separate instrumentation counts of how >>> many tuples were filtered by each set of filters. >>> >> >> Yeah, separate instrumentation counters would be useful. What I was >> talking about was more about the conditions themselves, because right now we >> re-evaluate the index-only clauses on the heap tuple. >> >> Imagine an index on t(a) and a query that has WHERE (a = 1) AND (b = 1). >> The patch splits this into two lists: >> >> index-only clauses: (a=1) >> clauses: (a=1) AND (b=1) >> >> So we evaluate (a=1) first, and then we fetch the heap tuple and check >> "clauses" again, which however includes the (a=1) again. For cheap >> clauses that's not a big deal (especially when (a=1) eliminated a lot of tuples using just the index), >> but for expensive clauses it might hurt.
>> >> It's fixable, we'd just need to keep two versions of the "clauses" list, >> one for IOS mode (when index-only clauses were checked) and a complete >> one when we need to check all clauses. > > In some cases (where the VM doesn't allow us to use just the index > tuple) we'd have to execute both lists against the heap tuple, right? > Not quite. I suspect you imagine we'd have two lists:

L1: (a=1)
L2: (b=1)

and that for all-visible pages we'd check L1 on index, and then maybe L2 on heap. And for non-all-visible pages we'd check both on heap. But that doesn't work, because L1 has references to attnums in the index tuple, while L2 has attnums referencing the heap. So we'd need:

L1: (a=1) -> against index tuple
L2: (b=1) -> against heap tuple
L3: (a=1) AND (b=1) -> against heap tuple

And for non-all-visible pages we'd only use L3. (I wonder if we could check if the tuple is visible and then go back and check L1 on index tuple, but I doubt it'd be really more efficient.) >>>> Most of the patch is pretty mechanical - particularly the planning part >>>> is about identifying filters that can be evaluated on the index tuple, >>>> and that code was mostly shamelessly copied from index-only scan. >>>> >>>> The matching of filters to index is done in check_index_filter(), and >>>> it's simpler than match_clause_to_indexcol() as it does not need to >>>> consider operators etc. (I think). But maybe it should be careful about >>>> other things, not sure. >>> >>> This would end up requiring some refactoring of the existing index >>> matching code (or alternative caching on IndexOptInfo), but >>> match_filter_to_index() calling check_index_filter() results in >>> constructing a bitmapset of index columns for every possible filter >>> which seems wasteful (I recognize this is a bit of a proof-of-concept >>> level v1). >>> >> >> Probably, I'm sure there's a couple other places where the current API >> was a bit cumbersome and we could optimize.
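To make the three-list scheme concrete, here is a toy scan loop in Python (purely illustrative; `index_scan`, the dict-based "heap", the visibility map, and the lambda filters below are all hypothetical stand-ins for the real executor machinery with slots and expression states):

```python
# Toy sketch of the three-list filter scheme: L1 runs against the index
# tuple (all-visible pages only), L2/L3 run against the heap tuple.

def index_scan(index_tuples, heap, vm_all_visible, L1, L2, L3):
    """Yield matching heap tuples, skipping heap fetches where possible."""
    heap_fetches = 0
    results = []
    for tid, index_tuple in index_tuples:
        page = tid[0]
        if vm_all_visible[page]:
            # All-visible page: evaluate the index-only clauses (L1) on the
            # index tuple first; on a mismatch, skip the heap fetch entirely.
            if not L1(index_tuple):
                continue
            heap_fetches += 1
            heap_tuple = heap[tid]
            if L2(heap_tuple):          # only the remaining clauses
                results.append(heap_tuple)
        else:
            # Not all-visible: must fetch the heap tuple and evaluate the
            # complete clause list (L3) on it.
            heap_fetches += 1
            heap_tuple = heap[tid]
            if L3(heap_tuple):
                results.append(heap_tuple)
    return results, heap_fetches

# Demo: query WHERE a = 1 AND b = 1; page 0 is all-visible, page 1 is not.
heap = {
    (0, 1): {'a': 1, 'b': 1},
    (0, 2): {'a': 1, 'b': 2},
    (0, 3): {'a': 5, 'b': 1},   # fails L1 on the index tuple: no heap fetch
    (1, 1): {'a': 1, 'b': 1},
}
index_tuples = [(tid, (row['a'], row['b'])) for tid, row in heap.items()]
vm_all_visible = {0: True, 1: False}

L1 = lambda it: it[0] == 1                      # (a=1) against index tuple
L2 = lambda ht: ht['b'] == 1                    # (b=1) against heap tuple
L3 = lambda ht: ht['a'] == 1 and ht['b'] == 1   # full list against heap tuple

rows, fetches = index_scan(index_tuples, heap, vm_all_visible, L1, L2, L3)
```

Only three of the four candidate tuples are fetched from the heap; the row rejected by L1 on an all-visible page never touches the heap at all, which is the whole point of the optimization.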
>> >>>> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned >>>> earlier, the idea is to check VM and evaluate the filters on the index >>>> tuple if possible, similar to index-only scans. Except that we then have >>>> to fetch the heap tuple. Unfortunately, this means the code can't use >>>> index_getnext_slot() anymore. Perhaps we should invent a new variant >>>> that'd allow evaluating the index filters in between. >>> >>> It does seem there are some refactoring opportunities there. >>> >> >> Actually, I realized maybe we should switch this to index_getnext_tid() >> because of the prefetching patch. That would allow us to introduce a >> "buffer" of TIDs, populated by the index_getnext_tid(), and then do >> prefetching based on that. It's similar to what bitmap scans do, except >> that instead of the tbm iterator we get items from index_getnext_tid(). >> >> I haven't tried implementing this yet, but I kinda like the idea as it >> works no matter what exactly the AM does (i.e. it'd work even for cases >> like GiST with distance searches). > > Oh, interesting, I'll let you keep chewing on that then. > Cool! >>>> With the patch applied, the query plan changes from: >>>> >>>> ... >>>> >>>> to >>>> >>>> QUERY PLAN >>>> ------------------------------------------------------------------- >>>> Limit (cost=0.42..3662.15 rows=1 width=12) >>>> (actual time=13.663..13.667 rows=0 loops=1) >>>> Buffers: shared hit=544 >>>> -> Index Scan using t_a_include_b on t >>>> (cost=0.42..3662.15 rows=1 width=12) >>>> (actual time=13.659..13.660 rows=0 loops=1) >>>> Index Cond: (a > 1000000) >>>> Index Filter: (b = 4) >>>> Rows Removed by Index Recheck: 197780 >>>> Filter: (b = 4) >>>> Buffers: shared hit=544 >>>> Planning Time: 0.105 ms >>>> Execution Time: 13.690 ms >>>> (10 rows) >>>> >>>> ... >>> >>> I did also confirm that this properly identifies cases Jeff had >>> mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10) >>> = 4))". >>> >> >> Good!
>> >>> I noticed also you still had questions/TODOs about handling index >>> scans for join clauses. >>> >> >> Not sure which questions/TODOs you refer to, but I don't recall any >> issues with join clauses. But maybe I just forgot. > > I was referring to the comment: > > + * FIXME Maybe this should fill the filterset too? > > above match_eclass_clauses_to_index()'s definition. > Ah, yeah. I'm sure there are some loose ends in the matching code. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, here's a minor update of the patch, rebased to a current master and addressing a couple issues reported by cfbot. Most are minor tweaks, but the last one (4) is a somewhat more serious issue.

1) "tid" might not have been initialized in the IndexNext loop

2) add enable_indexonlyfilter GUC to postgresql.conf.sample (which is checked by one regression test)

3) accepts a couple plan changes, either switching to index scan (thanks to the costing changes) or showing the extra index-only filters in the explain output. The plan changes seem reasonable.

4) problems with opcintype != opckeytype (name_ops)

While running the tests, I ran into an issue with name_ops, causing failures for \dT and other catalog queries. The root cause is that name_ops has opcintype = name, but opckeytype = cstring. The index-only clauses are copied from the table, with Vars mutated to reference the INDEX_VAR. But the type is not, so when we get to evaluating the expressions, CheckVarSlotCompatibility() fails because the Var has name, but the iss_IndexSlot (created with index tuple descriptor) has cstring. The rebased patch fixes this by explicitly adjusting types of the descriptor in ExecInitIndexScan(). However, maybe this indicates the very idea of evaluating expressions using a slot with the index tuple descriptor is misguided. This made me look at regular index-only scan (nodeIndexonlyscan.c), and that uses a slot with the "table" structure, and instead of evaluating the expression on the index tuple it expands the index tuple into the table slot. Which is what StoreIndexTuple() does. So maybe this should do what IOS does - expand the index tuple into "table slot" and evaluate the expression on that. That'd also make the INDEX_VAR tweak in createplan.c unnecessary - in fact, that seemed a bit strange anyway, so ditching fix_indexfilter_mutator would be good. However, I wonder if the stuff StoreIndexTuple() is doing is actually safe.
I mean, it's essentially copying values from the index tuple into the slot, ignoring the type difference. What if opcintype and opckeytype are not binary compatible? Is it possible to define an opclass with such opckeytype? I haven't noticed any check enforcing such compatibility ... Also, it's a bit confusing that the SGML docs say opckeytype is not supported for btree, but name_ops clearly does that. Later I found it's actually mentioned in pg_opclass.dat as a hack, to save space in catalogs. But then btree also has amstorage=false ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
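To make the IOS-style alternative concrete, here is a rough sketch of the "expand the index tuple into a table-shaped slot" step (a Python toy with hypothetical names; the real StoreIndexTuple() copies datums between tuple descriptors, and the binary-compatibility question above applies exactly to that copy):

```python
# Toy sketch of expanding an index tuple into a table-width slot, similar
# in spirit to StoreIndexTuple() in nodeIndexonlyscan.c.

def store_index_tuple(index_tuple, index_attnums, table_natts):
    """Scatter index tuple values into a table-shaped slot.

    index_attnums[i] is the 1-based table attribute number that index
    column i was built from. Unindexed attributes stay NULL (None), which
    is fine as long as the filters only reference indexed columns.
    """
    slot = [None] * table_natts
    for value, attnum in zip(index_tuple, index_attnums):
        # NB: this blindly copies the value; it is only safe if the index
        # storage type (opckeytype) is binary-compatible with the table
        # column type (opcintype) -- the question raised above.
        slot[attnum - 1] = value
    return slot

# Index on (a) INCLUDE (b) over table t(a, b, d): index columns map to
# table attributes 1 and 2; attribute 3 (d) is not in the index.
slot = store_index_tuple(index_tuple=(42, 7), index_attnums=[1, 2],
                         table_natts=3)
```

Because the resulting slot has the table's shape, the filter expressions can keep their original heap attribute numbers, which is what makes the INDEX_VAR rewrite (and fix_indexfilter_mutator) unnecessary.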
Attachments
On 7/15/23 16:20, Tomas Vondra wrote: > > ... > > 4) problems with opcintype != opckeytype (name_ops) > > While running the tests, I ran into an issue with name_ops, causing > failures for \dT and other catalog queries. The root cause is that > name_ops has opcintype = name, but opckeytype = cstring. The index-only > clauses are copied from the table, with Vars mutated to reference the > INDEX_VAR. But the type is not, so when we get to evaluating the > expressions, CheckVarSlotCompatibility() fails because the Var has name, > but the iss_IndexSlot (created with index tuple descriptor) has cstring. > > The rebased patch fixes this by explicitly adjusting types of the > descriptor in ExecInitIndexScan(). > > However, maybe this indicates the very idea of evaluating expressions > using slot with index tuple descriptor is misguided. This made me look > at regular index-only scan (nodeIndexonlyscan.c), and that uses a slot > with the "table" structure, and instead of evaluating the expression on > the index tuple it expands the index tuple into the table slot. > Which is what StoreIndexTuple() does. > > So maybe this should do what IOS does - expand the index tuple into > "table slot" and evaluate the expression on that. That'd also make the > INDEX_VAR tweak in createplan.c unnecessary - in fact, that seemed a bit > strange anyway, so ditching fix_indexfilter_mutator would be good. > This kept bothering me, so I looked at it today, and reworked it to use the IOS approach. It's a bit more complicated because for IOS both slots have the same overall structure, except for the data types. But for regular index scans that's not the case - the code has to "expand" the index tuple into the larger "table slot". This works, and in general I think the result is much cleaner - in particular, it means we don't need to switch the Var nodes to reference the INDEX_VAR. While experimenting with this I realized again that we're not matching expressions to IOS.
So if you have an expression index on (a+b), that can't be used even if the query only uses this particular expression. The same limitation applies to index-only filters, of course. It's not the fault of this patch, but perhaps it'd be an interesting improvement. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
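For illustration, the missing matching step amounts to comparing the qual's expression against the index's column expressions (a toy Python sketch; the real planner would compare canonicalized expression trees with equal() on Expr nodes, not strings, and all names here are hypothetical):

```python
# Toy sketch of matching a qual expression against an expression index's
# columns, e.g. matching a qual on (a + b) to an index on ((a + b)).
# String normalization below just stands in for real expression-tree
# comparison.

def match_expr_to_index(qual_expr, index_exprs):
    """Return the matching index column number, or None if none matches."""
    normalize = lambda e: e.replace(" ", "")
    for colno, index_expr in enumerate(index_exprs):
        if normalize(qual_expr) == normalize(index_expr):
            return colno
    return None

# An index on ((a + b)) could satisfy a filter on (a + b) ...
matched = match_expr_to_index("a + b", ["a+b"])
# ... but not one on (a * b), which would still need the heap tuple.
unmatched = match_expr_to_index("a * b", ["a+b"])
```

A matched expression could then be treated like a plain indexed column for the purpose of index-only filters (or IOS), which is the improvement suggested above.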
Attachments
Hi, On Sun, 2023-07-16 at 22:36 +0200, Tomas Vondra wrote: > This kept bothering me, so I looked at it today, and reworked it to > use > the IOS approach. Initial comments on patch 20230716: * check_index_filter() already looks at "canreturn", which should mean that you don't need to later check for opcintype<>opckeytype. But there's a comment in IndexNext() indicating that's a problem -- under what conditions is it a problem? * (may be a matter of taste) Recomputing the bitmapset from the canreturn array in check_index_filter() for each call seems awkward. I would just iterate through the bitmapset and check that all are set true in the amcanreturn array. * There are some tiny functions that don't seem to add much value or have slightly weird APIs. For instance, match_filter_to_index() could probably just return a boolean, and maybe doesn't even need to exist because it's such a thin wrapper over check_index_filter(). Similarly for fix_indexfilter_clause(). I'm OK with tiny functions even if the only value is a comment, but I didn't find these particularly helpful. * fix_indexfilter_references() could use a better comment. Perhaps refactor so that you can share code with fix_indexqual_references()? * it looks like index filters are duplicated with ordinary filters, is there a reason for that? * I'm confused about the relationship of an IOS to an index filter. It seems like the index filter only works for an ordinary index scan? Why is that? Regards, Jeff Davis
On 7/18/23 22:21, Jeff Davis wrote: > Hi, > > > On Sun, 2023-07-16 at 22:36 +0200, Tomas Vondra wrote: >> This kept bothering me, so I looked at it today, and reworked it to >> use >> the IOS approach. > > Initial comments on patch 20230716: > > * check_index_filter() already looks at "canreturn", which should mean > that you don't need to later check for opcintype<>opckeytype. But > there's a comment in IndexNext() indicating that's a problem -- under > what conditions is it a problem? > The comment in IndexNext() is a bit obsolete. There was an issue when using a slot matching the index, because then StoreIndexTuple might fail because of type mismatch (as explained in [1]). But that's no longer an issue, thanks to switching to the table slot in the last patch version. > * (may be a matter of taste) Recomputing the bitmapset from the > canreturn array in check_index_filter() for each call seems awkward. I > would just iterate through the bitmapset and check that all are set > true in the amcanreturn array. > check_index_filter() is a simplified version of check_index_only(), and that calculates the bitmap this way. > * There are some tiny functions that don't seem to add much value or > have slightly weird APIs. For instance, match_filter_to_index() could > probably just return a boolean, and maybe doesn't even need to exist > because it's such a thin wrapper over check_index_filter(). Similarly > for fix_indexfilter_clause(). I'm OK with tiny functions even if the > only value is a comment, but I didn't find these particularly helpful. > Yes, I agree some of this could be simplified. I only did the bare minimum to get this bit working. > * fix_indexfilter_references() could use a better comment. Perhaps > refactor so that you can share code with fix_indexqual_references()? > I don't think this can share code with fix_indexqual_references(), because that changes the Var nodes to point to the index (because it then gets translated to scan keys).
The filters don't need that. > * it looks like index filters are duplicated with ordinary filters, is > there a reason for that? > Good point. That used to be necessary, because the index-only filters can be evaluated only on all-visible pages, and the filters had Vars referencing the index tuple. We'd have to maintain another list of clauses, which didn't seem worth it. But now that the filters reference the heap tuple, we could skip including them in the second list. > * I'm confused about the relationship of an IOS to an index filter. It > seems like the index filter only works for an ordinary index scan? Why > is that? What would it do for IOS? IOS evaluates all filters on the index tuple, and it does not need the heap tuple at all (assuming allvisible=true). Index-only filters try to do something like that even for regular index scans, by evaluating as many expressions as possible on the index tuple, but may still require fetching the heap tuple in the end. regards [1] https://www.postgresql.org/message-id/97985ef2-ef9b-e62e-6fd4-e00a573d4ead@enterprisedb.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2023-07-19 at 00:36 +0200, Tomas Vondra wrote: > > * I'm confused about the relationship of an IOS to an index filter. > > It > > seems like the index filter only works for an ordinary index scan? > > Why > > is that? > > What would it do for IOS? The way it's presented is slightly confusing. If you have table x with an index on column i, then: EXPLAIN (ANALYZE, BUFFERS) SELECT i, j FROM x WHERE i = 7 and (i % 1000 = 7); Index Scan using x_idx on x (cost=0.42..8.45 rows=1 width=8) (actual time=0.094..0.098 rows=1 loops=1) Index Cond: (i = 7) Index Filter: ((i % 1000) = 7) But if you remove "j" from the target list, you get: EXPLAIN (ANALYZE, BUFFERS) SELECT i FROM x WHERE i = 7 and (i % 1000 = 7); Index Only Scan using x_idx on x (cost=0.42..4.45 rows=1 width=4) (actual time=0.085..0.088 rows=1 loops=1) Index Cond: (i = 7) Filter: ((i % 1000) = 7) That confused me at first because the "Filter" in the second plan means the same thing as the "Index Filter" in the first plan. Should we call the filter in an IOS an "Index Filter" too? Or is that redundant? Regards, Jeff Davis
On 7/19/23 01:22, Jeff Davis wrote: > On Wed, 2023-07-19 at 00:36 +0200, Tomas Vondra wrote: >>> * I'm confused about the relationship of an IOS to an index filter. >>> It >>> seems like the index filter only works for an ordinary index scan? >>> Why >>> is that? >> >> What would it do for IOS? > > The way it's presented is slightly confusing. If you have table x with > an index on column i, then: > > EXPLAIN (ANALYZE, BUFFERS) > SELECT i, j FROM x WHERE i = 7 and (i % 1000 = 7); > > Index Scan using x_idx on x (cost=0.42..8.45 rows=1 width=8) > (actual time=0.094..0.098 rows=1 loops=1) > Index Cond: (i = 7) > Index Filter: ((i % 1000) = 7) > > But if you remove "j" from the target list, you get: > > EXPLAIN (ANALYZE, BUFFERS) > SELECT i FROM x WHERE i = 7 and (i % 1000 = 7); > > Index Only Scan using x_idx on x (cost=0.42..4.45 rows=1 width=4) > (actual time=0.085..0.088 rows=1 loops=1) > Index Cond: (i = 7) > Filter: ((i % 1000) = 7) > > That confused me at first because the "Filter" in the second plan means > the same thing as the "Index Filter" in the first plan. Should we call > the filter in an IOS an "Index Filter" too? Or is that redundant? > I agree the naming in explain is a bit confusing. I wonder if Andres was right (in the index prefetch thread) that splitting regular index scans and index-only scans may not be ideal. In a way, this patch moves those nodes closer, both in capability and code (because now both use index_getnext_tid etc.). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2023-07-19 at 11:16 +0200, Tomas Vondra wrote: > I wonder if Andres was right (in the index prefetch thread) that > splitting regular index scans and index-only scans may not be ideal. > In > a way, this patch moves those nodes closer, both in capability and > code > (because now both use index_getnext_tid etc.). Yeah. I could also imagine decomposing the index scan node into more pieces, but I don't think it would work out to be a clean data flow. Either way, probably out of scope for this patch. For this patch I think we should just tweak the EXPLAIN output so that it's a little more clear what parts are index-only (at least if VM bit is set) and what parts need to go to the heap. Regards, Jeff Davis
On 7/19/23 19:17, Jeff Davis wrote: > On Wed, 2023-07-19 at 11:16 +0200, Tomas Vondra wrote: >> I wonder if Andres was right (in the index prefetch thread) that >> splitting regular index scans and index-only scans may not be ideal. >> In >> a way, this patch moves those nodes closer, both in capability and >> code >> (because now both use index_getnext_tid etc.). > > Yeah. I could also imagine decomposing the index scan node into more > pieces, but I don't think it would work out to be a clean data flow. > Either way, probably out of scope for this patch. > OK > For this patch I think we should just tweak the EXPLAIN output so that > it's a little more clear what parts are index-only (at least if VM bit > is set) and what parts need to go to the heap. > Makes sense, I also need to think about maybe not having duplicate clauses in the two lists. What annoys me about that is that it partially prevents the cost-based reordering done by order_qual_clauses(). So maybe we should have three lists ... Also, some of the expressions could be fairly expensive. BTW could you double-check how I expanded the index_getnext_slot()? I recall I wasn't entirely confident the result is correct, and I wanted to try getting rid of the "while (true)" loop. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote: > Makes sense, I also need to think about maybe not having duplicate > clauses in the two lists. What annoys me about that is that it partially > prevents > the cost-based reordering done by order_qual_clauses(). So maybe we > should have three lists ... Also, some of the expressions could be > fairly expensive. Can we just calculate the costs of the pushdown and do it when it's a win? If the random_page_cost savings exceed the costs from evaluating the clause earlier, then push down. > BTW could you double-check how I expanded the index_getnext_slot()? I > recall I wasn't entirely confident the result is correct, and I > wanted > to try getting rid of the "while (true)" loop. I suggest refactoring slightly to have the two loops in different functions (rather than nested loops in the same function) to make control flow a bit more clear. I'm not sure if the new function for the inner loop should be defined in nodeIndexscan.c or indexam.c; I suppose it depends on how clean the signature looks. Also please expand the tests a bit to show more EXPLAIN plans that illustrate the different cases. Regards, Jeff Davis
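The "push down when it's a win" test can be expressed as a back-of-the-envelope cost check (a toy Python sketch with entirely hypothetical numbers and names; real costing involves selectivity estimates, the all-visible fraction from pg_class, and per-operator CPU costs):

```python
# Toy sketch of the pushdown decision: compare the I/O saved by rejecting
# tuples on the index tuple against the cost of evaluating the clause
# earlier (and therefore on more tuples). All numbers are hypothetical.

def pushdown_is_win(ntuples, clause_selectivity, clause_cost,
                    all_visible_frac, random_page_cost=4.0):
    # Heap fetches avoided: tuples on all-visible pages that the
    # pushed-down clause rejects are never fetched from the heap.
    rejected_on_index = (ntuples * all_visible_frac
                         * (1.0 - clause_selectivity))
    io_savings = rejected_on_index * random_page_cost  # pessimistic: 1 page/tuple
    # Extra cost: the clause now runs on every index tuple from an
    # all-visible page, not only on tuples that survived other quals.
    extra_eval_cost = ntuples * all_visible_frac * clause_cost
    return io_savings > extra_eval_cost

# A cheap, selective clause on mostly all-visible pages: clearly a win.
cheap_selective = pushdown_is_win(200_000, 0.01, 0.0025, 0.95)
# An expensive clause that rejects almost nothing: not worth pushing down.
expensive_lax = pushdown_is_win(200_000, 0.99, 1.5, 0.95)
```

This also illustrates the risk discussed later in the thread: pushing down an expensive clause that eliminates few rows costs more than the heap fetches it avoids.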
On Wed, Jul 19, 2023 at 1:46 PM Jeff Davis <pgsql@j-davis.com> wrote: > On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote: > > Makes sense, I also need to think about maybe not having duplicate > > clauses in the two lists. What annoys me about that is that it partially > > prevents > > the cost-based reordering done by order_qual_clauses(). So maybe we > > should have three lists ... Also, some of the expressions could be > > fairly expensive. > > Can we just calculate the costs of the pushdown and do it when it's a > win? If the random_page_cost savings exceed the costs from evaluating > the clause earlier, then push down. My patch that teaches nbtree to execute ScalarArrayOps intelligently (by dynamically choosing to not re-descend the btree to perform another "primitive index scan" when the data we need is located on the same leaf page as the current ScalarArrayOps arrays) took an interesting turn recently -- one that seems related. I found that certain kinds of queries are dramatically faster once you teach the optimizer to accept that multi-column ScalarArrayOps can be trusted to return tuples in logical/index order, at least under some circumstances. For example: pg@regression:5555 [583930]=# create index order_by_saop on tenk1(two,four,twenty); CREATE INDEX pg@regression:5555 [583930]=# EXPLAIN (ANALYZE, BUFFERS) select ctid, thousand from tenk1 where two in (0,1) and four in (1,2) and twenty in (1,2) order by two, four, twenty limit 20; This shows "Buffers: shared hit=1377" on HEAD, versus "Buffers: shared hit=13" with my patch. All because we can safely terminate the scan early now. The vast majority of the buffer hits the patch will avoid are against heap pages, even though I started out with the intention of eliminating unnecessary repeat index page accesses.
Note that build_index_paths() currently refuses to allow SAOP clauses to be recognized as ordered with a multi-column index and a query with a clause for more than the leading column -- that is something that the patch needs to address (to get this particular improvement, at least). Allowing such an index path to have useful pathkeys is typically safe (in the sense that it won't lead to wrong answers to queries), but we still make a conservative assumption that they can lead to wrong answers. There are comments about "equality constraints" that describe the restrictions right now. But it's not just the question of basic correctness -- the optimizer is very hesitant to use multi-column SAOPs, even in cases that don't care about ordering. So it's also (I think, implicitly) a question of *risk*. The risk of getting very inefficient SAOP execution in nbtree, when it turns out that a huge number of "primitive index scans" are needed. But, if nbtree is taught to "coalesce together" primitive index scans at runtime, that risk goes way down. Anyway, this seems related to what you're talking about because the relationship between selectivity and ordering seems particularly important in this context. And because it suggests that there is at least some scope for adding "run time insurance" to the executor, which is valuable in the optimizer if it bounds the potential downside. If you can be practically certain that there is essentially zero risk of serious problems when the costing miscalculates (for a limited subset of cases), then life might be a lot easier -- clearly we should be biased in one particular direction with a case that has that kind of general profile. My current understanding of the optimizer side of things -- particularly things like costing for "filter quals/qpquals" versus costing for "true index quals" -- is rather basic. That will have to change.
Curious to hear your thoughts (if any) on how what you're discussing may relate to what I need to do with my patch. Right now my patch assumes that making SAOP clauses into proper index quals (that usually preserve index ordering) is an unalloyed good (when safe!). This assumption is approximately true on average, as far as I can tell. But it's probably quite untrue in various specific cases, that somebody is bound to care about. -- Peter Geoghegan
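The reason multi-column SAOPs can preserve index order is visible in a small sketch: walking sorted IN-lists in nested (odometer) order yields composite keys in exactly lexicographic order, i.e. the order of an index on the same columns, which is what lets a LIMIT terminate the scan early (a Python toy, unrelated to the actual nbtree implementation):

```python
# Toy sketch of why multi-column ScalarArrayOp quals can preserve index
# order: iterating sorted IN-lists in nested order produces key tuples in
# lexicographic order, matching an index on (two, four, twenty).
from itertools import islice, product

def saop_keys(*in_lists):
    """Generate composite keys in index order from sorted IN-lists."""
    yield from product(*(sorted(vals) for vals in in_lists))

# WHERE two IN (0,1) AND four IN (1,2) AND twenty IN (1,2)
# ORDER BY two, four, twenty LIMIT 3 -- only the first three key groups
# ever need to be visited; the rest of the scan can be skipped.
first_three = list(islice(saop_keys((0, 1), (1, 2), (1, 2)), 3))
all_keys = list(saop_keys((0, 1), (1, 2), (1, 2)))
```

Since `all_keys` comes out already sorted, an ordered index scan over these key groups satisfies the ORDER BY without a sort node, and the LIMIT stops after the first few groups, which is where the buffer-hit savings quoted above come from.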
On 7/19/23 23:38, Peter Geoghegan wrote: > On Wed, Jul 19, 2023 at 1:46 PM Jeff Davis <pgsql@j-davis.com> wrote: >> On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote: >>> Makes sense, I also need to think about maybe not having duplicate >>> clauses in the two lists. What annoys me on that it partially >>> prevents >>> the cost-based reordering done by order_qual_clauses(). So maybe we >>> should have three lists ... Also, some of the expressions count be >>> fairly expensive. >> >> Can we just calculate the costs of the pushdown and do it when it's a >> win? If the random_page_cost savings exceed the costs from evaluating >> the clause earlier, then push down. > > My patch that teaches nbtree to execute ScalarArrayOps intelligently > (by dynamically choosing to not re-descend the btree to perform > another "primitive index scan" when the data we need is located on the > same leaf page as the current ScalarArrayOps arrays) took an > interesting turn recently -- one that seems related. > > I found that certain kinds of queries are dramatically faster once you > teach the optimizer to accept that multi-column ScalarArrayOps can be > trusted to return tuples in logical/index order, at least under some > circumstances. For example: > > pg@regression:5555 [583930]=# create index order_by_saop on > tenk1(two,four,twenty); > CREATE INDEX > > pg@regression:5555 [583930]=# EXPLAIN (ANALYZE, BUFFERS) > select ctid, thousand from tenk1 > where two in (0,1) and four in (1,2) and twenty in (1,2) > order by two, four, twenty limit 20; > > This shows "Buffers: shared hit=1377" on HEAD, versus "Buffers: shared > hit=13" with my patch. All because we can safely terminate the scan > early now. The vast majority of the buffer hits the patch will avoid > are against heap pages, even though I started out with the intention > of eliminating unnecessary repeat index page accesses. 
> > Note that build_index_paths() currently refuses to allow SAOP clauses > to be recognized as ordered with a multi-column index and a query with > a clause for more than the leading column -- that is something that > the patch needs to address (to get this particular improvement, at > least). Allowing such an index path to have useful pathkeys is > typically safe (in the sense that it won't lead to wrong answers to > queries), but we still make a conservative assumption that they can > lead to wrong answers. There are comments about "equality constraints" > that describe the restrictions right now. > > But it's not just the question of basic correctness -- the optimizer > is very hesitant to use multi-column SAOPs, even in cases that don't > care about ordering. So it's also (I think, implicitly) a question of > *risk*. The risk of getting very inefficient SAOP execution in nbtree, > when it turns out that a huge number of "primitive index scans" are > needed. But, if nbtree is taught to "coalesce together" primitive > index scans at runtime, that risk goes way down. > > Anyway, this seems related to what you're talking about because the > relationship between selectivity and ordering seems particularly > important in this context. And because it suggests that there is at > least some scope for adding "run time insurance" to the executor, > which is valuable in the optimizer if it bounds the potential > downside. If you can be practically certain that there is essentially > zero risk of serious problems when the costing miscalculates (for a > limited subset of cases), then life might be a lot easier -- clearly > we should be biased in one particular direction with a case that has > that kind of general profile. > > My current understanding of the optimizer side of things -- > particularly things like costing for "filter quals/qpquals" versus > costing for "true index quals" -- is rather basic. That will have to > change.
Curious to hear your thoughts (if any) on how what you're > discussing may relate to what I need to do with my patch. Right now my > patch assumes that making SAOP clauses into proper index quals (that > usually preserve index ordering) is an unalloyed good (when safe!). > This assumption is approximately true on average, as far as I can > tell. But it's probably quite untrue in various specific cases, that > somebody is bound to care about. > I think the SAOP patch may need to be much more careful about this, but for this patch it's simpler because it doesn't really change any of the index internals, or the indexscan in general. If I simplify that a bit, we're just reordering the clauses in a way to maybe eliminate the heap fetch. The main risk seems to be that this will force an expensive qual to the front of the list, just because it can be evaluated on the index tuple. But the difference would need to be worse than what we save by not doing the I/O - considering how expensive the I/O is, that seems unlikely. Could happen for expensive quals that don't really eliminate many rows, I guess. Anyway, I see this as extension of what order_qual_clauses() does. The main issue is that even order_qual_clauses() admits the estimates are somewhat unreliable, so we can't expect to make perfect decisions. FWIW: While reading order_qual_clauses() I realized the code may need to be more careful about leakproof stuff. Essentially, if any of the non-index clauses is leakproof, we can only do leakproof quals on the index. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 20, 2023 at 4:35 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I think the SAOP patch may need to be much more careful about this, but > for this patch it's simpler because it doesn't really change any of the > index internals, or the indexscan in general. It's true that the SAOP patch needs relatively complicated infrastructure to assess whether or not the technique is safe with a given set of quals. You cannot safely get an ordered index scan for something like "select * from table where a < 5 and b in (1,2,3) order by a, b". With or without my patch. My patch isn't really changing all that much about the behavior in nbtree, as these things go. It's *surprising* how little has to change about the high level structure of index scans, in fact. (Actually, I'm glossing over a lot. The MDAM paper describes techniques that'd make even the really challenging cases safe, through a process of converting quals from conjunctive normal form into disjunctive normal form. This is more or less the form that the state machine implemented by _bt_advance_array_keys() produces already, today. But even with all this practically all of the heavy lifting takes place before the index scan even begins, during preprocessing -- so you still require surprisingly few changes to index scans themselves.) > If I simplify that a bit, we're just reordering the clauses in a way to > maybe eliminate the heap fetch. The main risk seems to be that this will > force an expensive qual to the front of the list, just because it can be > evaluated on the index tuple. My example query might have been poorly chosen, because it involved a limit. What I'm thinking of is more general than that. > But the difference would need to be worse > than what we save by not doing the I/O - considering how expensive the > I/O is, that seems unlikely. Could happen for expensive quals that don't > really eliminate many rows, I guess. That sounds like the same principle that I was talking about. 
I think that it can be pushed quite far, though. I am mostly talking about the worst case, and it seems like you might not be. You can easily construct examples where some kind of skew causes big problems with a multi-column index. I'm thinking of indexes whose leading columns are low cardinality, and queries where including the second column as an index qual looks kind of marginal to the optimizer. Each grouping represented in the most significant index column might easily have its own unique characteristics; the distribution of values in subsequent columns might be quite inconsistent across each grouping, in whatever way. Since nothing stops a qual on a lower order column having a wildly different selectivity for one particular grouping, it might not make sense to say that a problem in this area is due to a bad selectivity estimate. Even if we have perfect estimates, what good can they do if the optimal strategy is to vary our strategy at runtime, *within* an individual index scan, as different parts of the key space (different groupings) are traversed through? To skip or not to skip, say. This isn't about picking the cheapest plan, really. That's another huge advantage of index quals -- they can (at least in principle) allow you skip over big parts of the index when it just ends up making sense, in whatever way, for whatever reason. In the index, and in the heap. Often both. You'd likely always prefer to err in the direction of having more index quals rather than fewer, when doing so doesn't substantially change the plan itself. It could be very cheap insurance, even without any limit. (It would probably also be a lot faster, but it needn't be.) > Anyway, I see this as extension of what order_qual_clauses() does. The > main issue is that even order_qual_clauses() admits the estimates are > somewhat unreliable, so we can't expect to make perfect decisions. 
The attribute value independence assumption is wishful thinking, in no small part -- it's quite surprising that it works as well as it does, really. -- Peter Geoghegan
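The weakness of the attribute value independence assumption is easy to demonstrate outside the database. The following sketch (illustrative data only, not tied to any PostgreSQL code) builds two functionally dependent columns and compares the planner-style estimate -- the product of the per-column selectivities -- against the true joint selectivity:

```python
import random

random.seed(1)
N = 100_000

# Two functionally dependent columns: b is fully determined by a, so the
# independence assumption sel(a) * sel(b) is maximally wrong.
rows = [(a, a % 10) for a in (random.randrange(100) for _ in range(N))]

sel_a = sum(1 for a, _ in rows if a == 7) / N   # ~1/100
sel_b = sum(1 for _, b in rows if b == 7) / N   # ~1/10
est_joint = sel_a * sel_b                       # independence assumption: ~1/1000
act_joint = sum(1 for a, b in rows if a == 7 and b == 7) / N

# a == 7 implies b == 7, so the true joint selectivity equals sel_a;
# the independence-based estimate is roughly 10x too low.
print(f"estimated {est_joint:.5f} vs actual {act_joint:.5f}")
```

A multi-column index on such data is exactly the situation described above: the per-group distribution of the second column bears no resemblance to what the marginal statistics suggest.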
On 7/21/23 05:32, Peter Geoghegan wrote: > On Thu, Jul 20, 2023 at 4:35 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> I think the SAOP patch may need to be much more careful about this, but >> for this patch it's simpler because it doesn't really change any of the >> index internals, or the indexscan in general. > > It's true that the SAOP patch needs relatively complicated > infrastructure to assess whether or not the technique is safe with a > given set of quals. You cannot safely get an ordered index scan for > something like "select * from table where a < 5 and b in (1,2,3) order > by a, b". With or without my patch. My patch isn't really changing all > that much about the behavior in nbtree, as these things go. It's > *surprising* how little has to change about the high level structure > of index scans, in fact. > > (Actually, I'm glossing over a lot. The MDAM paper describes > techniques that'd make even the really challenging cases safe, through > a process of converting quals from conjunctive normal form into > disjunctive normal form. This is more or less the form that the state > machine implemented by _bt_advance_array_keys() produces already, > today. But even with all this practically all of the heavy lifting > takes place before the index scan even begins, during preprocessing -- > so you still require surprisingly few changes to index scans > themselves.) > Ah, OK. I was assuming the execution might be more complex. But I was thinking more about the costing part - if you convert the clauses in some way, does that affect the reliability of estimates somehow? If the conversion from AND to OR makes the list of clauses more complex, that might be an issue ... The index-only filter does no such conversion, it just uses the same clauses as before. >> If I simplify that a bit, we're just reordering the clauses in a way to >> maybe eliminate the heap fetch. 
The main risk seems to be that this will >> force an expensive qual to the front of the list, just because it can be >> evaluated on the index tuple. > > My example query might have been poorly chosen, because it involved a > limit. What I'm thinking of is more general than that. > I wasn't really thinking about LIMIT, and I don't think it changes the overall behavior very much (sure, it's damn difficult to estimate for skewed data sets, but meh). The case I had in mind is something like this: CREATE TABLE t (a int, b int, c int); CREATE INDEX ON t (a); INSERT INTO t SELECT i, i, i FROM generate_series(1,1000000) s(i); SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a; where bad_qual is expensive and matches almost all rows. Without the index-only filters, we'd evaluate the conditions in this order [b < 1], [c < 1], [bad_qual(a)] but with naive index-only filters we do this: [bad_qual(a)], [b < 1], [c < 1] which is bad as it runs the expensive thing on every row. FWIW the "ORDER BY" is necessary, because otherwise we may not even build the index path (no index keys, no interesting pathkeys). It's just an opportunistic optimization - if already doing index scan, try doing this too. I wonder if we should relax that ... >> But the difference would need to be worse >> than what we save by not doing the I/O - considering how expensive the >> I/O is, that seems unlikely. Could happen for expensive quals that don't >> really eliminate many rows, I guess. > > That sounds like the same principle that I was talking about. I think > that it can be pushed quite far, though. I am mostly talking about the > worst case, and it seems like you might not be. > Yeah, I was proposing the usual costing approach, based on "average" behavior. It has the usual weakness that if the estimates are far off, we can have issues. There have been various discussions about maybe considering how reliable the estimates are, to defend against that. Which is tough to do. 
In a way, focusing on the worst case does that by assuming the worst combination - which is fine, although it may choose the slower (but safer) approach in some cases. > You can easily construct examples where some kind of skew causes big > problems with a multi-column index. I'm thinking of indexes whose > leading columns are low cardinality, and queries where including the > second column as an index qual looks kind of marginal to the > optimizer. Each grouping represented in the most significant index > column might easily have its own unique characteristics; the > distribution of values in subsequent columns might be quite > inconsistent across each grouping, in whatever way. > > Since nothing stops a qual on a lower order column having a wildly > different selectivity for one particular grouping, it might not make > sense to say that a problem in this area is due to a bad selectivity > estimate. Even if we have perfect estimates, what good can they do if > the optimal strategy is to vary our strategy at runtime, *within* an > individual index scan, as different parts of the key space (different > groupings) are traversed through? To skip or not to skip, say. This > isn't about picking the cheapest plan, really. > Well, yeah. It's one thing to assign some cost estimate to the plan, and another thing what happens at runtime. It would be nice to be able to reflect the expected runtime behavior in the cost (otherwise you get confused users complaining that we pick the "wrong" plan). > That's another huge advantage of index quals -- they can (at least in > principle) allow you skip over big parts of the index when it just > ends up making sense, in whatever way, for whatever reason. In the > index, and in the heap. Often both. You'd likely always prefer to err > in the direction of having more index quals rather than fewer, when > doing so doesn't substantially change the plan itself. It could be > very cheap insurance, even without any limit. 
(It would probably also > be a lot faster, but it needn't be.) > True, although my patch doesn't change the number of index quals. It's just about the other quals. >> Anyway, I see this as extension of what order_qual_clauses() does. The >> main issue is that even order_qual_clauses() admits the estimates are >> somewhat unreliable, so we can't expect to make perfect decisions. > > The attribute value independence assumption is wishful thinking, in no > small part -- it's quite surprising that it works as well as it does, > really. > Yeah. For OLTP it works pretty well, for OLAP not so much. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
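The ordering hazard in the bad_qual example above -- a naive "index filters first" rule promoting an expensive, non-selective qual ahead of cheap heap filters -- can be stated as a simple expected-cost model. This is only a sketch with invented cost numbers, not PostgreSQL's actual order_qual_clauses() or costing code:

```python
from dataclasses import dataclass

@dataclass
class Qual:
    name: str
    cost: float         # cost to evaluate the qual once
    selectivity: float  # fraction of rows that pass it
    index_only: bool    # can be checked on the index tuple alone

def expected_cost(quals, heap_fetch_cost, naive_pushdown):
    """Expected per-row cost of an index scan evaluating `quals`.
    With naive_pushdown, all index-only quals are hoisted to the front
    (rows they reject never pay the heap fetch), regardless of cost."""
    order = sorted(quals, key=lambda q: q.cost)  # cheapest-first baseline
    if naive_pushdown:
        order = ([q for q in order if q.index_only]
                 + [q for q in order if not q.index_only])
    total, alive, fetched = 0.0, 1.0, False
    for q in order:
        if not q.index_only and not fetched:
            total += alive * heap_fetch_cost   # first heap qual forces a fetch
            fetched = True
        total += alive * q.cost
        alive *= q.selectivity
    if not fetched:
        total += alive * heap_fetch_cost       # surviving rows still get fetched
    return total

HEAP = 4.0

# The hazard: an expensive index-only qual that filters almost nothing.
bad = [Qual("bad_qual(a)", 100.0, 0.99, True), Qual("b < 1", 0.01, 0.001, False)]
hazard_off = expected_cost(bad, HEAP, naive_pushdown=False)  # ~4.11 per row
hazard_on = expected_cost(bad, HEAP, naive_pushdown=True)    # ~103.97 per row

# The intended win: a cheap, selective index-only qual skips most fetches.
good = [Qual("d < 1", 0.01, 0.5, False), Qual("b = 4", 0.01, 0.001, True)]
win_off = expected_cost(good, HEAP, naive_pushdown=False)    # ~4.015 per row
win_on = expected_cost(good, HEAP, naive_pushdown=True)      # ~0.014 per row
```

The same arithmetic is what a cost-based pushdown decision would compare: hoist an index-only qual ahead of the heap fetch only when its evaluation cost, weighted by the fraction of rows still alive, beats the heap fetches it avoids.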
On Fri, Jul 21, 2023 at 4:52 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > (Actually, I'm glossing over a lot. The MDAM paper describes > > techniques that'd make even the really challenging cases safe, through > > a process of converting quals from conjunctive normal form into > > disjunctive normal form. This is more or less the form that the state > > machine implemented by _bt_advance_array_keys() produces already, > > today. But even with all this practically all of the heavy lifting > > takes place before the index scan even begins, during preprocessing -- > > so you still require surprisingly few changes to index scans > > themselves.) > > > > Ah, OK. I was assuming the execution might be more complex. Sort of. Execution of individual "primitive index scans" effectively works the same way as it does already -- the preprocessing is required to generate disjunctive primitive index scans that look like one big index scan when combined (which is how native execution of SAOPs by nbtree works today). The challenge during execution of index scans (execution proper, not preprocessing) comes from processing a "flattened" DNF representation of your original quals efficiently. If you have (say) 3 SAOPs, then the total number of distinct DNF quals is the cartesian product of the 3 arrays -- which is multiplicative. But, you can skip over the single-value DNF quals quickly when they have no matches. Which isn't all that hard. We get some very useful invariants with these DNF quals: you can have at most one individual DNF qual as a match for any individual index tuple. Plus the quals are materialized in key space order, which is ideally suited to processing by an ordered scan. So just as you can use the array keys to skip over parts of the index when searching for an index tuple, you can use an index tuple to skip over the arrays when searching for the next relevant set of array keys. It works both ways! 
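The "it works both ways" idea above can be sketched in miniature. The flattened DNF of a multi-column SAOP is just the cartesian product of the (sorted) arrays, which is naturally emitted in index key space order; and given an index tuple, binary search over the arrays finds the next combination that could possibly match. This is a toy stand-in for what _bt_advance_array_keys() does, not the real algorithm:

```python
import bisect
from itertools import product

# Arrays from a qual like: a IN (0,1) AND b IN (1,2,3,4) AND c IN (1,5,9)
arrays = [[0, 1], [1, 2, 3, 4], [1, 5, 9]]

# The flattened DNF is the cartesian product: 2 * 4 * 3 = 24 "single-value"
# quals.  With each array sorted, product() yields them in key space order,
# so one ordered index scan can match all of them in a single pass.
combos = list(product(*arrays))
assert combos == sorted(combos)

def advance_array_keys(tup, arrays):
    """Smallest combination of array values >= tup (lexicographically),
    or None when the scan is exhausted.  This is how an index tuple can
    be used to skip over the arrays, instead of stepping through every
    combination one at a time."""
    n = len(arrays)
    choice = [0] * n
    i = 0
    while i < n:
        j = bisect.bisect_left(arrays[i], tup[i])
        if j < len(arrays[i]) and arrays[i][j] == tup[i]:
            choice[i] = j          # exact match: constrain, move right
            i += 1
        elif j < len(arrays[i]):
            choice[i] = j          # next larger value: minimize the rest
            return tuple(arrays[k][choice[k]] if k <= i else arrays[k][0]
                         for k in range(n))
        else:
            while i > 0:           # column exhausted: bump an earlier one
                i -= 1
                if choice[i] + 1 < len(arrays[i]):
                    choice[i] += 1
                    return tuple(arrays[k][choice[k]] if k <= i else arrays[k][0]
                                 for k in range(n))
            return None
    return tuple(arrays[k][choice[k]] for k in range(n))
```

For example, an index tuple (0, 2, 10) skips straight to the candidate combination (0, 3, 1), bypassing every (0, 2, *) entry with no possible match.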
> But I was thinking more about the costing part - if you convert the clauses in some way, does that affect the reliability of estimates somehow?

Obviously, it doesn't affect the selectivity at all. That seems most important (you kinda said the same thing yourself).

> If the conversion from AND to OR makes the list of clauses more complex, that might be an issue ...

That's definitely a concern. Even still, the biggest problem by far in this general area is selectivity estimation. Which, in a way, can be made a lot easier by this general approach.

Let's say we have the tenk1 table, with the same composite index as in my example upthread (on "(two,four,twenty)"). Further suppose you have a very simple query: "select count(*) from tenk1". On master (and with the patch) that's going to give you an index-only scan on the composite index (assuming it's the only index), which is quite efficient despite being a full index scan -- 11 buffer hits.

This much more complicated count(*) query is where it gets interesting:

select count(*), two, four, twenty
from tenk1_dyn_saop
where two in (0, 1)
  and four in (1, 2, 3, 4)
  and twenty in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
group by two, four, twenty
order by two, four, twenty;

It's inherently very difficult to predict how selective this query will be using basic statistics. But maybe it doesn't need to matter so much, so often.

The patch can execute this with an index-only scan + GroupAggregate. What ends up happening is that the patch gets 9 buffer hits -- so pretty close to 11. Master can use almost the same query plan (it uses quals but needs to hashagg + sort). It ends up getting 245 buffer hits -- vastly more than what we see for the full index scan case (and nothing to do with the sort/an issue with a limit). That's nearly as many hits as you'd get with a sequential scan. (BTW, I don't need to coax the query planner to get this result on master.)
With the patch you can vary the predicate in whatever way, so that the selectivity shifts up or down. Occasionally you'll get maybe one extra buffer access relative to the base full index scan case, but overall the patch makes the worst case look very much like a full index scan (plus some relatively tiny CPU overhead). This is just common sense, in a way; selectivities are always between 0.0 and 1.0. Why shouldn't we be able to think about it like that?

> I wasn't really thinking about LIMIT, and I don't think it changes the overall behavior very much (sure, it's damn difficult to estimate for skewed data sets, but meh).
>
> The case I had in mind is something like this:
>
> CREATE TABLE t (a int, b int, c int);
> CREATE INDEX ON t (a);
> INSERT INTO t SELECT i, i, i FROM generate_series(1,1000000) s(i);
>
> SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a;
>
> where bad_qual is expensive and matches almost all rows.

You must distinguish between quals that can become required scan keys (or can terminate the scan), and all other quals. This is really important for my pending SAOP patch, but I think it might be important here too. I wonder if the best place to address the possibility of such a regression is in the index AM itself.

Let's make your example a bit more concrete: let's assume that bad_qual is a very expensive integer comparison, against a column that has only one possible value. So now your example becomes:

CREATE TABLE t (a expensive_int, b int, c int);
CREATE INDEX ON t (a);
INSERT INTO t SELECT 42, i, i FROM generate_series(1,1000000) s(i);
SELECT * FROM t WHERE a IN (7, 42) AND b < 1 AND c < 1 ORDER BY a;

(I'm using a SAOP here because the planner will more or less disregard the ORDER BY if I make it "= 42" instead. Maybe that makes it simpler.)

Sure, you're getting a full index scan here, and you get all these useless comparisons on "a" -- that's a real risk. But AFAICT there is no real need for it.
There is another nbtree patch that might help. A patch that teaches nbtree's _bt_readpage function to skip useless comparisons like this: https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571@garret.ru In order for this kind of optimization to be possible at all, we must be able to reason about "a" as a column whose values will always be in key space order. That is, nbtree must recognize that "a" is the most significant key column, not (say) a low-order column from a composite index -- it's a required column in both directions. If _bt_readpage can determine that the first tuple on a leaf page has the value "42", and the high key has that same value, then we can skip all of the comparisons of "a" for that page, right away (in fact we don't require any comparisons). Now it doesn't matter that they're super expensive comparisons (or it hardly matters). It's natural to think of things like this _bt_readpage optimization as something that makes existing types of plan shapes run faster. But you can also think of them as things that make new and fundamentally better plan shapes feasible, by making risky things much less risky. > FWIW the "ORDER BY" is necessary, because otherwise we may not even > build the index path (no index keys, no interesting pathkeys). It's just > an opportunistic optimization - if already doing index scan, try doing > this too. I wonder if we should relax that ... I'm kinda doing the same thing with ordering in my own patch. In general, even if the query really doesn't care about the index order, there may be a lot of value in making the nbtree code understand that this is an ordered index scan. That's what enables skipping, in all its forms (skipping individual comparisons, skipping whole subsections of the index, etc). I'm not saying that this is 100% problem free. But it seems like a promising high level direction. 
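The page-level shortcut sketched above can be modeled directly: if the first tuple on a (sorted) leaf page and the page high key agree on the leading column's value, every tuple on the page shares that value, so the expensive comparison need not run once per tuple. This toy model ignores the required-in-both-directions precondition discussed above and is not the actual _bt_readpage() code:

```python
def scan_leaf_page(page, highkey, key_value):
    """Return (matching tuples, number of leading-column comparisons).
    `page` is a sorted list of index tuples; `highkey` is the page's
    upper-bound separator tuple.  If the first tuple equals the high key
    on the leading column, the page is homogeneous on that column and a
    single comparison covers all of it."""
    ncmp = 0
    def eq(v):
        nonlocal ncmp
        ncmp += 1            # stand-in for a very expensive comparison
        return v == key_value
    if page and highkey is not None and eq(page[0][0]) and page[0][0] == highkey[0]:
        return list(page), ncmp         # whole page matches: one comparison
    return [t for t in page if eq(t[0])], ncmp

# A page where every tuple has a=42 (the bad_qual scenario):
# one comparison instead of one hundred.
page = [(42, i) for i in range(100)]
hits, ncmp = scan_leaf_page(page, highkey=(42, 999), key_value=42)

# A page that is not homogeneous falls back to per-tuple comparisons.
mixed = [(41, 0), (42, 1), (42, 2)]
hits2, ncmp2 = scan_leaf_page(mixed, highkey=(43, 0), key_value=42)
```

The savings scale with tuples per page: the super-expensive comparison on "a" is paid per page, not per tuple, exactly the property that makes the risky full-scan plan much less risky.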
> In a way, focusing on the worst case does that by assuming the worst > combination - which is fine, although it may choose the slower (but > safer) approach in some cases. I don't think that it has to be slower on average (even by a tiny bit). It might just end up being slightly faster on average, and way faster on occasion. -- Peter Geoghegan
On 7/21/23 21:17, Peter Geoghegan wrote: > ... >> But I was >> thinking more about the costing part - if you convert the clauses in >> some way, does that affect the reliability of estimates somehow? > > Obviously, it doesn't affect the selectivity at all. That seems most > important (you kinda said the same thing yourself). > Sorry, I think I meant 'cost estimates', not the selectivity estimates. If you convert the original "simple" clauses into the more complex list, presumably we'd cost that differently, right? I may be entirely wrong, but my intuition is that costing these tiny clauses will be much more difficult than costing the original clauses. >> If the >> conversion from AND to OR makes the list of clauses more complex, that >> might be an issue ... > > That's definitely a concern. Even still, the biggest problem by far in > this general area is selectivity estimation. Which, in a way, can be > made a lot easier by this general approach. > > Let's say we have the tenk1 table, with the same composite index as in > my example upthread (on "(two,four,twenty)"). Further suppose you have > a very simple query: "select count(*) from tenk1". On master (and with > the patch) that's going to give you an index-only scan on the > composite index (assuming it's the only index), which is quite > efficient despite being a full index scan -- 11 buffer hits. > > This much more complicated count(*) query is where it gets interesting: > > select > count(*), > two, > four, > twenty > from > tenk1_dyn_saop > where > two in (0, 1) > and four in (1, 2, 3, 4) > and twenty in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) > group by > two, > four, > twenty > order by > two, > four, > twenty; > > It's inherently very difficult to predict how selective this query > will be using basic statistics. But maybe it doesn't need to matter so > much, so often. > > The patch can execute this with an index-only scan + GroupAggregate. 
> What ends up happening is that the patch gets 9 buffer hits -- so pretty close to 11. Master can use almost the same query plan (it uses quals but needs to hashagg + sort). It ends up getting 245 buffer hits -- vastly more than what we see for the full index scan case (and nothing to do with the sort/an issue with a limit). That's nearly as many hits as you'd get with a sequential scan. (BTW, I don't need to coax the query planner to get this result on master.)
>
> With the patch you can vary the predicate in whatever way, so that the selectivity shifts up or down. Occasionally you'll get maybe one extra buffer access relative to the base full index scan case, but overall the patch makes the worst case look very much like a full index scan (plus some relatively tiny CPU overhead). This is just common sense, in a way; selectivities are always between 0.0 and 1.0. Why shouldn't we be able to think about it like that?

Right, I agree with this reasoning in principle.

But I'm getting a bit lost regarding what's the proposed costing strategy. It's hard to follow threads spanning days, with various other distractions, etc.

In principle, I think:

a) If we estimate the scan to return almost everything (or rather if we expect it to visit almost the whole index), it makes perfect sense to cost it as a full index scan.

b) What should we do if we expect to read only a fraction of the index? If we're optimistic and cost it according to the estimates, but then end up reading most of the index, how bad could it be (compared to the optimal plan choice)? Similarly, if we're pessimistic/defensive and cost it as a full index scan, how many "good" cases would we reject based on the artificially high cost estimate?

I don't have a very good idea how sensitive the cost is to selectivity changes, which I think is crucial for making judgments.
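One way to frame point (a): since no leaf page can be visited twice by a coalesced SAOP scan, the cost estimate can saturate at the cost of one full index scan instead of growing with the number of primitive scans. The sketch below uses invented cost units, not the real genericcostestimate/btcostestimate arithmetic:

```python
def legacy_saop_cost(n_combinations, descent_cost, rows_per_probe, row_cost):
    # Nested-loop-style costing: each array-value combination is costed
    # like a separate index probe, so the estimate grows without bound
    # as the arrays get bigger.
    return n_combinations * (descent_cost + rows_per_probe * row_cost)

def saturated_saop_cost(n_combinations, descent_cost, rows_per_probe,
                        row_cost, leaf_pages, page_cost):
    # With coalesced primitive scans, each leaf page is read at most
    # once: the I/O estimate can never exceed one full pass over the
    # index, however complicated the predicate gets.
    full_scan = descent_cost + leaf_pages * page_cost
    return min(legacy_saop_cost(n_combinations, descent_cost,
                                rows_per_probe, row_cost),
               full_scan)

# 2 * 4 * 15 = 120 combinations, as in the tenk1 query upthread.
legacy = legacy_saop_cost(120, 3.0, 10, 0.01)
capped = saturated_saop_cost(120, 3.0, 10, 0.01,
                             leaf_pages=30, page_cost=1.0)
```

Under this framing, a wildly wrong selectivity estimate moves the cost around only between zero and the full-scan ceiling, which bounds how badly a miscalculation can hurt.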
>> I wasn't really thinking about LIMIT, and I don't think it changes the >> overall behavior very much (sure, it's damn difficult to estimate for >> skewed data sets, but meh). >> >> The case I had in mind is something like this: >> >> CREATE TABLE t (a int, b int, c int); >> CREATE INDEX ON t (a); >> INSERT INTO t SELECT i, i, i FROM generate_series(1,1000000) s(i); >> >> SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a; >> >> where bad_qual is expensive and matches almost all rows. > > You must distinguish between quals that can become required scan keys > (or can terminate the scan), and all other quals. This is really > important for my pending SAOP patch, but I think it might be important > here too. I wonder if the best place to address the possibility of > such a regression is in the index AM itself. > > Let's make your example a bit more concrete: let's assume that > bad_qual is a very expensive integer comparison, against a column that > has only one possible value. So now your example becomes: > > CREATE TABLE t (a expensive_int, b int, c int); > CREATE INDEX ON t (a); > INSERT INTO t SELECT 42, i, i FROM generate_series(1,1000000) s(i); > SELECT * FROM t a in (7, 42) AND b < 1 AND c < 1 ORDER BY a; > > (I'm using a SAOP here because the planner will more or less disregard > the ORDER BY if I make it "= 42" instead. Maybe that makes it > simpler.) > > Sure, you're getting a full index scan here, and you get all these > useless comparisons on "a" -- that's a real risk. But AFAICT there is > no real need for it. There is another nbtree patch that might help. A > patch that teaches nbtree's _bt_readpage function to skip useless > comparisons like this: > > https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571@garret.ru > > In order for this kind of optimization to be possible at all, we must > be able to reason about "a" as a column whose values will always be in > key space order. 
That is, nbtree must recognize that "a" is the most > significant key column, not (say) a low-order column from a composite > index -- it's a required column in both directions. If _bt_readpage > can determine that the first tuple on a leaf page has the value "42", > and the high key has that same value, then we can skip all of the > comparisons of "a" for that page, right away (in fact we don't require > any comparisons). Now it doesn't matter that they're super expensive > comparisons (or it hardly matters). > > It's natural to think of things like this _bt_readpage optimization as > something that makes existing types of plan shapes run faster. But you > can also think of them as things that make new and fundamentally > better plan shapes feasible, by making risky things much less risky. > That'd be an interesting optimization, but I don't think that matters for this patch, as it's not messing with index scan keys at all. I mean, it does not affect what scan keys get passed to the AM at all, or what scan keys are required. And it does not influence what the AM does. So all this seems interesting, but rather orthogonal to this patch. >> FWIW the "ORDER BY" is necessary, because otherwise we may not even >> build the index path (no index keys, no interesting pathkeys). It's just >> an opportunistic optimization - if already doing index scan, try doing >> this too. I wonder if we should relax that ... > > I'm kinda doing the same thing with ordering in my own patch. In > general, even if the query really doesn't care about the index order, > there may be a lot of value in making the nbtree code understand that > this is an ordered index scan. That's what enables skipping, in all > its forms (skipping individual comparisons, skipping whole subsections > of the index, etc). > > I'm not saying that this is 100% problem free. But it seems like a > promising high level direction. 
> I was rather thinking about maybe relaxing the rules about which index paths we create, to include indexes that might be interesting thanks to index-only filters (unless already picked thanks to sorting). >> In a way, focusing on the worst case does that by assuming the worst >> combination - which is fine, although it may choose the slower (but >> safer) approach in some cases. > > I don't think that it has to be slower on average (even by a tiny > bit). It might just end up being slightly faster on average, and way > faster on occasion. > OK, that'd be nice. I don't have a very good intuition about behavior for these queries, I'd need to play with & benchmark it the way I did for the prefetching patch etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jul 23, 2023 at 5:04 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Sorry, I think I meant 'cost estimates', not the selectivity estimates. > If you convert the original "simple" clauses into the more complex list, > presumably we'd cost that differently, right? I may be entirely wrong, > but my intuition is that costing these tiny clauses will be much more > difficult than costing the original clauses. I think that that's definitely true (it is more difficult), but that there may be a bigger picture. > Right, I agree with this reasoning in principle. > > But I'm getting a bit lost regarding what's the proposed costing > strategy. To be clear, I don't have a clue how to better estimate the cardinality of multiple attributes from a composite index. The big and immediate change to the SAOP costing with my patch is that genericcostestimate/btcostestimate can safely assume that visiting each leaf page more than once is a physical impossibility. Because it is. It is no longer necessary to treat SAOPs similarly to a nested loop join during costing, which is how it works today. Now, whenever you add increasingly complicated clauses to a multi-column SAOP query (like the ones I've shown you), it makes sense for the cost to "saturate" at a certain point. That should be representative of the physical reality, for both CPU costs and I/O costs. Right now the worst case is really relevant to the average case, since the risk of the costs just exploding at runtime is very real. If the only problem in this area was the repeated accesses to the same leaf pages (accesses that happen in very close succession anyway), then all of this would be a nice win, but not much more. It certainly wouldn't be expected to change the way we think about stuff that isn't directly and obviously relevant. But, it's not just the index pages. Once you start to consider the interactions with filter/qpquals, it gets much more interesting. 
Now you're talking about completely avoiding physical I/Os for heap accesses, which has the potential to make a dramatic difference to some types of queries, particularly in the worst case.

> It's hard to follow threads spanning days, with various other
> distractions, etc.

I have to admit that my thinking on this very high level stuff is rather unrefined. As much as anything else, I'm trying to invent (or discover) a shared vocabulary for discussing these issues. I might have gone about it clumsily, at times. I appreciate being able to bounce this stuff off you.

> I don't have a very good idea how sensitive the cost is to selectivity
> changes, which I think is crucial for making judgments.

I'm not trying to find a way for the optimizer to make better judgments about costing with a multi-column index. What I'm suggesting (rather tentatively) is to find a way for it to make fewer (even no) judgments at all.

If you can find a way of reducing the number of possible choices without any real downside -- in particular by just not producing index paths that cannot possibly be a win in some cases -- then you reduce the number of bad choices. The challenge is making that kind of approach in the optimizer actually representative of the executor in the real world. The executor has to have robust performance under a variety of runtime conditions for any of this to make sense.

> > It's natural to think of things like this _bt_readpage optimization as
> > something that makes existing types of plan shapes run faster. But you
> > can also think of them as things that make new and fundamentally
> > better plan shapes feasible, by making risky things much less risky.
>
> That'd be an interesting optimization, but I don't think that matters
> for this patch, as it's not messing with index scan keys at all. I mean,
> it does not affect what scan keys get passed to the AM at all, or what
> scan keys are required. And it does not influence what the AM does. So
> all this seems interesting, but rather orthogonal to this patch.

Your patch is approximately the opposite of what I'm talking about, in terms of its structure. The important commonality is that each patch adds "superfluous" quals that can be very useful at runtime, under the right circumstances -- which can be hard to predict. Another similarity is that both patches inspire some of the same kind of lingering doubts about extreme cases -- cases where (somehow) the usual/expected cost asymmetry that usually works in our favor doesn't apply.

My current plan is to post v1 of my patch early next week. It would be better to discuss this on the thread that I create for that patch. You're right that "exploiting index ordering" on the index AM side is totally unrelated to your patch. Sorry about that.

> I was rather thinking about maybe relaxing the rules about which index
> paths we create, to include indexes that might be interesting thanks to
> index-only filters (unless already picked thanks to sorting).

That seems like it would make sense. In general I think that we overuse and over-rely on bitmap index scans -- we should try to chip away at the artificial advantages that bitmap index scans have, so that we can get the benefit of index scans more often -- accessing the heap/VM inline does open up a lot of possibilities that bitmap scans will never be able to match. (BTW, I'm hoping that your work on index prefetching will help with that.)
I see that your patch has this diff change (which is 1 out of only 2 or 3 plan changes needed by the patch):

--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1838,18 +1838,13 @@ DROP TABLE onek_with_null;
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
-                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
- Bitmap Heap Scan on tenk1
-   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42)))
-   ->  BitmapOr
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 1))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 3))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 42))
-(9 rows)
+                              QUERY PLAN
+-----------------------------------------------------------------------
+ Index Scan using tenk1_thous_tenthous on tenk1
+   Index Cond: (thousand = 42)
+   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
+   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
+(4 rows)

That does seem like an improvement to me. But, an even better plan is possible. Or would be possible once this SAOP-OR-transformation patch is in place (if it was then combined with my own SAOP patch):

https://www.postgresql.org/message-id/flat/919bfbcb-f812-758d-d687-71f89f0d9a68%40postgrespro.ru#9d877caf48c4e331e507b5c63914228e

That could give us an index scan plan that is perfectly efficient. Like the new plan shown here, it would pin/lock a single leaf page from the tenk1_thous_tenthous index, once. But unlike the plan shown here, it would be able to terminate as soon as the index scan reached an index tuple where thousand=42 and tenthous>42.
That makes a significant difference if we have to do heap accesses for those extra tuples.

Plus this hypothetical other plan of mine would be more robust: it would tolerate misestimates. It happens to be the case that there just aren't that many tuples with thousand=42 -- they take up only a fraction of one leaf page. But...why take a chance on that being true, if we don't have to? The old bitmap index scan plan has this same advantage already -- it is robust in the very same way, because it managed to pass down specific "tenthous" index quals. It would be nice to have both advantages, together.

(To be clear, I'm certainly not suggesting that the example argues against what you want to do here -- it's just an example that jumped out at me.)

Perhaps this example will make my confusion about the boundaries between each of our patches a bit more understandable. I was confused -- and I still am. I look forward to being less confused at some point in the future.

--
Peter Geoghegan
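[The plan shape described above can be sketched in SQL. This is an illustration, not output from either patch: it assumes the OR-to-SAOP transformation rewrites the three OR arms into a single ScalarArrayOpExpr, which Peter's SAOP patch would then execute natively.]

```sql
-- Manual equivalent of what the OR-to-SAOP transformation would produce:
-- the OR list on tenthous collapses into one "= ANY" clause, so both
-- columns of tenk1_thous_tenthous can be passed down as index quals.
-- Combined with the SAOP executor work, the scan could then terminate as
-- soon as it reaches an index tuple with thousand = 42 AND tenthous > 42.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1
WHERE thousand = 42
  AND tenthous = ANY (ARRAY[1, 3, 42]);
```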
On Sun, 2023-07-23 at 14:04 +0200, Tomas Vondra wrote:
> But I'm getting a bit lost regarding what's the proposed costing
> strategy. It's hard to follow threads spanning days, with various
> other distractions, etc.

I'm getting a bit lost in this discussion as well -- for the purposes of this feature, we only need to know whether to push down a clause as an Index Filter or not, right?

Could we start out conservatively and push down as an Index Filter unless there is some other clause ahead of it that can't be pushed down? That would allow users to have some control by writing clauses in the desired order or wrapping them in functions with a declared cost.

Regards,
Jeff Davis
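[The "functions with a declared cost" idea above could look like the following sketch. The function name, body, and cost figure are made up for illustration; only table t comes from the thread.]

```sql
-- Hypothetical wrapper: declaring a high cost makes order_qual_clauses()
-- sort this qual after cheaper ones.  (The default COST is 100 for
-- SQL-language functions, so 10000 marks it as clearly expensive.)
CREATE FUNCTION expensive_check(x integer) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    COST 10000
AS $$ SELECT x % 97 = 0 $$;

-- The planner should evaluate the cheap qual "b = 4" before the
-- expensive wrapped one:
SELECT * FROM t WHERE b = 4 AND expensive_check(d);
```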
On Mon, Jul 24, 2023 at 10:36 AM Jeff Davis <pgsql@j-davis.com> wrote:
> I'm getting a bit lost in this discussion as well -- for the purposes
> of this feature, we only need to know whether to push down a clause as
> an Index Filter or not, right?

I think so.

> Could we start out conservatively and push down as an Index Filter
> unless there is some other clause ahead of it that can't be pushed
> down? That would allow users to have some control by writing clauses in
> the desired order or wrapping them in functions with a declared cost.

I'm a bit concerned about cases like the one I described from the regression tests. The case in question shows a cheaper plan replacing a more expensive plan -- so it's a win by every conventional measure. But, the new plan is less robust in the sense that I described yesterday: it will be much slower than the current plan when there happens to be many more "thousand = 42" tuples than expected. We have a very high chance of a small benefit (we save repeated index page accesses), but a very low chance of a high cost (we incur many more heap accesses). Which seems less than ideal.

One obvious way of avoiding that problem (that's probably overly conservative) is to just focus on the original complaint from Maxim. The original concern was limited to non-key columns from INCLUDE indexes. If you only apply the optimization there then you don't run the risk of generating a path that "out-competes" a more robust path in the sense that I've described. This is obviously true because there can't possibly be index quals/scan keys for non-key columns within the index AM.

--
Peter Geoghegan
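[In terms of Maxim's schema from the start of the thread, the conservative rule above would apply only to quals like this one, where the column can never generate an index qual in the first place:]

```sql
-- b is a non-key (INCLUDE) column, so "b = 4" can never become an index
-- qual/scan key for this index -- it can only ever be a filter.  Pushing
-- it down as an Index Filter therefore cannot out-compete any path that
-- uses b as an "Access Predicate", because no such path can exist.
CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);

SELECT * FROM t
WHERE a > 1000000 AND b = 4
ORDER BY a ASC LIMIT 10;
```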
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Jul 24, 2023 at 10:36 AM Jeff Davis <pgsql@j-davis.com> wrote:
>> Could we start out conservatively and push down as an Index Filter
>> unless there is some other clause ahead of it that can't be pushed
>> down? That would allow users to have some control by writing clauses in
>> the desired order or wrapping them in functions with a declared cost.

> I'm a bit concerned about cases like the one I described from the
> regression tests.

Please do not put in any code that assumes that restriction clause order is preserved, or encourages users to think it is. There are already cases where that's not so, eg equivalence clauses tend to get shuffled around.

			regards, tom lane
On Mon, 2023-07-24 at 13:58 -0400, Tom Lane wrote:
> Please do not put in any code that assumes that restriction clause
> order is preserved, or encourages users to think it is.

Agreed. I didn't mean to add any extra guarantee of preserving clause order; just to follow the current way order_qual_clauses() works, which has a comment saying:

"So we just order by security level then estimated per-tuple cost, being careful not to change the order when (as is often the case) the estimates are identical."

I assumed that the reason for "being careful" above was to not unnecessarily override how the user writes the qual clauses, but perhaps there's another reason?

Regardless, my point was just to make minimal changes now that are unlikely to cause regressions. If we come up with better ways of ordering the clauses later, that could be part of a separate change. (I think Peter G. is pointing out a complication with that idea, to which I'll respond separately.)

Regards,
Jeff Davis
On Mon, 2023-07-24 at 10:54 -0700, Peter Geoghegan wrote:
> The case in question shows a cheaper plan replacing a more expensive
> plan -- so it's a win by every conventional measure. But, the new plan
> is less robust in the sense that I described yesterday: it will be
> much slower than the current plan when there happens to be many more
> "thousand = 42" tuples than expected. We have a very high chance of a
> small benefit (we save repeated index page accesses), but a very low
> chance of a high cost (we incur many more heap accesses). Which seems
> less than ideal.

I see. You're concerned that lowering the cost of an index scan path too much, due to pushing down a clause as an Index Filter, could cause it to out-compete a more "robust" plan.

That might be true but I'm not sure what to do about that unless we incorporate some "robustness" measure into the costing. If every measure we have says one plan is better, don't we have to choose it?

> The original concern was limited to non-key columns from INCLUDE
> indexes. If you only apply the optimization there then you don't run
> the risk of generating a path that "out competes" a more robust path
> in the sense that I've described. This is obviously true because there
> can't possibly be index quals/scan keys for non-key columns within the
> index AM.

If I understand correctly, you mean we couldn't use an Index Filter on a key column? That seems overly restrictive; there are plenty of clauses that might be useful as an Index Filter but cannot be an Index Cond for one reason or another.

Regards,
Jeff Davis
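[One hypothetical instance of the kind of clause Jeff mentions, using table t from upthread: an expression over a key column is not of the indexable "column operator constant" form, so it can never be an Index Cond, yet it references only indexed columns and could be checked as an Index Filter without visiting the heap:]

```sql
-- "(b % 2) = 0" cannot be matched as an index qual (the left-hand side
-- is an expression, not the bare column b), but every value it needs is
-- present in each index tuple of an index on t(a, b).
CREATE INDEX t_a_b_idx ON t (a, b);

SELECT * FROM t
WHERE a > 1000000 AND b % 2 = 0
ORDER BY a;
```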
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2023-07-24 at 13:58 -0400, Tom Lane wrote:
>> Please do not put in any code that assumes that restriction clause
>> order is preserved, or encourages users to think it is.

> Agreed. I didn't mean to add any extra guarantee of preserving clause
> order; just to follow the current way order_qual_clauses() works, which
> has a comment saying:

> "So we just order by security level then estimated per-tuple cost,
> being careful not to change the order when (as is often the case) the
> estimates are identical."

> I assumed that the reason for "being careful" above was to not
> unnecessarily override how the user writes the qual clauses, but
> perhaps there's another reason?

I think the point was just to not make any unnecessary behavioral changes from the way things were before we added that sorting logic. But there are other places that will result in clause ordering changes, plus there's the whole business of possibly-intermixed restriction and join clauses.

			regards, tom lane
On Mon, Jul 24, 2023 at 11:37 AM Jeff Davis <pgsql@j-davis.com> wrote:
> I see. You're concerned that lowering the cost of an index scan path
> too much, due to pushing down a clause as an Index Filter, could cause
> it to out-compete a more "robust" plan.

The optimizer correctly determines that 3 index scans (plus a bitmap OR node) are more expensive than 1 very similar index scan. It's hard to argue with that.

> That might be true but I'm not sure what to do about that unless we
> incorporate some "robustness" measure into the costing. If every
> measure we have says one plan is better, don't we have to choose it?

I'm mostly concerned about the possibility itself -- it's not a matter of tuning the cost. I agree that that approach would probably be hopeless.

There is a principled (albeit fairly involved) way of addressing this. The patch allows the optimizer to produce a plan that has 1 index scan, that treats the first column as an index qual, and the second column as a filter condition. There is no fundamental reason why we can't just have 1 index scan that makes both columns into index quals (instead of 3 highly duplicative variants of the same index scan). That's what I'm working towards right now.

> If I understand correctly, you mean we couldn't use an Index Filter on
> a key column? That seems overly restrictive, there are plenty of
> clauses that might be useful as an Index Filter but cannot be an Index
> Cond for one reason or another.

I think that you're probably right about it being overly restrictive -- that was just a starting point for discussion. Perhaps there is an identifiable class of clauses that can benefit, but don't have the downside that I'm concerned about.

--
Peter Geoghegan
On Mon, Jul 24, 2023 at 11:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > That might be true but I'm not sure what to do about that unless we
> > incorporate some "robustness" measure into the costing. If every
> > measure we have says one plan is better, don't we have to choose it?
>
> I'm mostly concerned about the possibility itself -- it's not a matter
> of tuning the cost. I agree that that approach would probably be
> hopeless.

This seems related to the fact that EXPLAIN doesn't expose the difference between what Markus Winand calls "Access Predicates" and "Index Filter Predicates", as explained here:

https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

That is, both "Access Predicates" and "Index Filter Predicates" are shown after an "Index Cond: " entry in Postgres EXPLAIN output, in general. Even though these are two very different things. I believe that the underlying problem for the implementation (the reason why we can't easily break this out further in EXPLAIN output) is that we don't actually know what kind of predicate it is ourselves -- at least not until execution time. We wait until then to do nbtree preprocessing/scan setup. Though perhaps we should do more of this during planning instead [1], for several reasons (fixing this is just one of those reasons).

The risk to "robustness" for cases like the one I drew attention to on this thread would probably have been obvious all along if EXPLAIN output were more like what Markus would have us do -- he certainly has a good point here, in general.

Breaking things out in EXPLAIN output along these lines might also give us a better general sense of when a similar plan shift like this was actually okay -- even according to something like my non-traditional "query robustness" criteria. It's much harder for me to argue that a shift in plans from what Markus calls an "Index Filter Predicate" to what the patch will show under "Index Filter:" is a novel new risk.
That would be a much less consequential difference, because those two things are fairly similar anyway.

Besides, such a shift in plan would have to "legitimately win" for things to work out like this. If we're essentially picking between two different subtypes of "Index Filter Predicate", then there can't be the same weird second-order effects that we see when an "Access Predicate" is out-competed by an "Index Filter Predicate". It's possible that expression evaluation of a small-ish disjunctive predicate like "Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))" will be faster than a natively executed SAOP. You can't do that kind of expression evaluation in the index AM itself (assuming that there is an opclass for nbtree to use in the first place, which there might not be in the case of any non-key INCLUDE columns). With the patch, you can do all this. And I think that you can derisk it without resorting to the overly conservative approach of limiting ourselves to non-key columns from INCLUDE indexes.

To summarize: as Markus says on the same page, "Index filter predicates give a false sense of safety; even though an index is used, the performance degrades rapidly on a growing data volume or system load". That's essentially what I want to avoid here. I'm much less concerned about competition between what are really "Index Filter Predicate" subtypes. Allowing that competition to take place is not entirely risk-free, of course, but it seems about as risky as anything else in this area.

[1] https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us

--
Peter Geoghegan
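[Maxim's own (a, b) example from the top of the thread already exhibits the EXPLAIN ambiguity Markus describes; the plan line in the comment is taken from the earlier message:]

```sql
-- With an index on (a, b) and a range qual on the leading column, only
-- "a > 1000000" positions the scan (an "Access Predicate" in Markus's
-- terminology); "b = 4" is merely checked against each visited index
-- tuple (an "Index Filter Predicate").  EXPLAIN nevertheless reports
-- both under a single entry:
--
--   Index Cond: ((t.a > 1000000) AND (t.b = 4))
CREATE UNIQUE INDEX t_a_b_idx ON t (a, b);

EXPLAIN
SELECT * FROM t
WHERE a > 1000000 AND b = 4
ORDER BY a ASC LIMIT 10;
```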
On 8/2/23 02:50, Peter Geoghegan wrote:
> On Mon, Jul 24, 2023 at 11:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
>>> That might be true but I'm not sure what to do about that unless we
>>> incorporate some "robustness" measure into the costing. If every
>>> measure we have says one plan is better, don't we have to choose it?
>>
>> I'm mostly concerned about the possibility itself -- it's not a matter
>> of tuning the cost. I agree that that approach would probably be
>> hopeless.
>
> This seems related to the fact that EXPLAIN doesn't expose the
> difference between what Markus Winand calls "Access Predicates" and
> "Index Filter Predicates", as explained here:
>
> https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates
>
> That is, both "Access Predicates" and "Index Filter Predicates" are
> shown after an "Index Cond: " entry in Postgres EXPLAIN output, in
> general. Even though these are two very different things. I believe
> that the underlying problem for the implementation (the reason why we
> can't easily break this out further in EXPLAIN output) is that we
> don't actually know what kind of predicate it is ourselves -- at least
> not until execution time. We wait until then to do nbtree
> preprocessing/scan setup. Though perhaps we should do more of this
> during planning instead [1], for several reasons (fixing this is just
> one of those reasons).

How come we don't know that until execution time? Surely when building the paths/plans, we match the clauses to the index keys, no? Or are you saying that just having a scan key is not enough for it to be an "access predicate"?

Anyway, that issue is mostly about "Index Cond" mixing two types of predicates. But the patch is really about "Filter" predicates -- moving some of them from table to index. So quite similar to the "index filter predicates", except that those are handled by the index AM.
> The risk to "robustness" for cases like the one I drew attention to on
> this thread would probably have been obvious all along if EXPLAIN
> output were more like what Markus would have us do -- he certainly has
> a good point here, in general.
>
> Breaking things out in EXPLAIN output along these lines might also
> give us a better general sense of when a similar plan shift like this
> was actually okay -- even according to something like my
> non-traditional "query robustness" criteria. It's much harder for me
> to argue that a shift in plans from what Markus calls an "Index Filter
> Predicate" to what the patch will show under "Index Filter:" is a
> novel new risk. That would be a much less consequential difference,
> because those two things are fairly similar anyway.

But differentiating between access and filter predicates (at the index AM level) seems rather independent of what this patch aims to do.

FWIW I agree we should make the differences visible in the explain. That seems fairly useful for non-trivial index access paths, and it does not change the execution at all. I think it'd be fine to do that only for VERBOSE mode, and only for EXPLAIN ANALYZE (if we only do this at execution time for now).

> Besides, such a shift in plan would have to "legitimately win" for
> things to work out like this. If we're essentially picking between two
> different subtypes of "Index Filter Predicate", then there can't be
> the same weird second-order effects that we see when an "Access
> Predicate" is out-competed by an "Index Filter Predicate". It's
> possible that expression evaluation of a small-ish disjunctive
> predicate like "Index Filter: ((tenthous = 1) OR (tenthous = 3) OR
> (tenthous = 42))" will be faster than a natively executed SAOP. You
> can't do that kind of expression evaluation in the index AM itself
> (assuming that there is an opclass for nbtree to use in the first
> place, which there might not be in the case of any non-key INCLUDE
> columns).
> With the patch, you can do all this. And I think that you
> can derisk it without resorting to the overly conservative approach of
> limiting ourselves to non-key columns from INCLUDE indexes.

I'm not following. Why couldn't there be some second-order effects? Maybe it's obvious / implied by something said earlier, but I think every time we decide between multiple choices, there's a risk of picking wrong.

Anyway, is this still about this patch or rather about your SAOP patch?

> To summarize: As Markus says on the same page. "Index filter
> predicates give a false sense of safety; even though an index is used,
> the performance degrades rapidly on a growing data volume or system
> load". That's essentially what I want to avoid here. I'm much less
> concerned about competition between what are really "Index Filter
> Predicate" subtypes. Allowing that competition to take place is not
> entirely risk-free, of course, but it seems about as risky as anything
> else in this area.

True. IMHO the danger of "index filter" predicates is that people assume index conditions eliminate large parts of the index -- which is not necessarily the case here. If we can improve this, cool.

But again, this is not what this patch does, right? It's about moving stuff from "table filter" to "index filter". And those clauses were not matched to the index AM at all, so it's not really relevant to the discussion about different subtypes of predicates.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 2, 2023 at 6:48 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> How come we don't know that until the execution time? Surely when
> building the paths/plans, we match the clauses to the index keys, no? Or
> are you saying that just having a scan key is not enough for it to be
> "access predicate"?

In principle we can and probably should recognize the difference between "Access Predicates" and "Index Filter Predicates" much earlier on, during planning. Probably when index paths are first built. It seems quite likely that there will turn out to be at least 2 or 3 reasons to do this. The EXPLAIN usability issue might be enough of a reason on its own, though.

> Anyway, this patch is mostly about "Index Cond" mixing two types of
> predicates. But the patch is really about "Filter" predicates - moving
> some of them from table to index. So quite similar to the "index filter
> predicates" except that those are handled by the index AM.

I understand that that's how the patch is structured. It is nevertheless possible (as things stand) that the patch will make the planner shift from a plan that uses "Access Predicates" to the maximum extent possible when scanning a composite index, to a similar plan that has a similar index scan, for the same index, but with fewer "Access Predicates" in total. In effect, the patched planner will swap one type of predicate for another type because doing so enables the executor to scan an index once instead of scanning it several times.

I don't dispute the fact that this can only happen when the planner believes (with good reason) that the expected cost will be lower. But I maintain that there is a novel risk to be concerned about, which is meaningfully distinct from the general risk of regressions that comes from making just about any change to the planner. The important principle here is that we should have a strong bias in the direction of making quals into true "Access Predicates" whenever practical.
Yeah, technically the patch doesn't directly disrupt how existing index paths get generated. But it does have the potential to disrupt it indirectly, by providing an alternative very-similar index path that's likely to outcompete the existing one in these cases. I think that we should have just one index path that does everything well instead.

> But differentiating between access and filter predicates (at the index
> AM level) seems rather independent of what this patch aims to do.

My concern is directly related to the question of "access predicates versus filter predicates", and the planner's current ignorance on which is which. The difference may not matter too much right now, but ISTM that your patch makes it matter a lot more. And so in that indirect sense it does seem relevant.

The planner has always had a strong bias in the direction of making clauses that can be index quals into index quals, rather than filter predicates. It makes sense to do that even when they aren't very selective. This is a similar though distinct principle.

It's awkward to discuss the issue, since we don't really have official names for these things just yet (I'm just going with what Markus calls them for now). And because there is more than one type of "index filter predicate" in play with the patch (namely those in the index AM, and those in the index scan executor node). But my concern boils down to access predicates being strictly better than equivalent index filter predicates.

> I'm not following. Why couldn't there be some second-order effects?
> Maybe it's obvious / implied by something said earlier, but I think
> every time we decide between multiple choices, there's a risk of picking
> wrong.

Naturally, I agree that some amount of risk is inherent. I believe that the risk I have identified is qualitatively different to the standard, inherent kind of risk.

> Anyway, is this still about this patch or rather about your SAOP patch?

It's mostly not about my SAOP patch.
It's just that my SAOP patch will do exactly the right thing in this particular case (at least once combined with Alena Rybakina's OR-to-SAOP transformation patch). It is therefore quite clear that a better plan is possible in principle. Clearly we *can* have the benefit of only one single index scan (i.e. no BitmapOr node combining multiple similar index scans), without accepting any novel new risk to get that benefit. We should just have one index path that does it all, while avoiding duplicative index paths that embody a distinction that really shouldn't exist -- it should be dealt with at runtime instead.

> True. IMHO the danger of "index filter" predicates is that people assume
> index conditions eliminate large parts of the index - which is not
> necessarily the case here. If we can improve this, cool.

I think that we'd find a way to use the same information in the planner, too. It's not just users that should care about the difference. And I don't think that it's particularly likely to be limited to SAOP/MDAM stuff. As Markus said, we're talking about an important general principle here.

> But again, this is not what this patch does, right? It's about moving
> stuff from "table filter" to "index filter". And those clauses were not
> matched to the index AM at all, so it's not really relevant to the
> discussion about different subtypes of predicates.

I understand that what I've said is not particularly helpful. At least not on its own. You quite naturally won't want to tie the fate of this patch to my SAOP patch, which is significantly more complicated.

I do think that my concern about this being a novel risk needs to be carefully considered. Maybe it's possible to address my concern outside of the context of my own SAOP patch. That would definitely be preferable all around. But "access predicates versus filter predicates" seems important and relevant, either way.

--
Peter Geoghegan
On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
>
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.

You can see this for yourself, quite easily. Start by running the relevant query from the regression tests, which is:

SELECT * FROM tenk1
  WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query slightly faster, as expected. I see 7 buffer hits for the bitmap index scan plan on master, versus only 4 buffer hits for the patch's index scan. Obviously, this is because we go from multiple index scans (multiple bitmap index scans) to only one.

But, if I run this insert statement and try the same thing again, things look very different:

insert into tenk1 (thousand, tenthous)
  select 42, i from generate_series(43, 1000) i;

(Bear in mind that we've inserted rows that don't actually need to be selected by the query in question.)

Now the master branch's plan works in just the same way as before -- it has exactly the same overhead (7 buffer hits). Whereas the patch still gets the same risky plan -- which now blows up.
The plan now loses by far more than it could ever really hope to win by: 336 buffer hits. (It could be a lot higher than this, even, but you get the picture.)

Sure, it's difficult to imagine a very general model that captures this sort of risk, in the general case. But you don't need a degree in actuarial science to understand that it's inherently a bad idea to juggle chainsaws -- no matter what your general risk tolerance happens to be. Some things you just don't do.

--
Peter Geoghegan
On 8/3/23 07:26, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I don't dispute the fact that this can only happen when the planner
>> believes (with good reason) that the expected cost will be lower. But
>> I maintain that there is a novel risk to be concerned about, which is
>> meaningfully distinct from the general risk of regressions that comes
>> from making just about any change to the planner. The important
>> principle here is that we should have a strong bias in the direction
>> of making quals into true "Access Predicates" whenever practical.

OK, preference for access predicates sounds like a reasonable principle.

>> Yeah, technically the patch doesn't directly disrupt how existing
>> index paths get generated. But it does have the potential to disrupt
>> it indirectly, by providing an alternative very-similar index path
>> that's likely to outcompete the existing one in these cases. I think
>> that we should have just one index path that does everything well
>> instead.
>
> You can see this for yourself, quite easily. Start by running the
> relevant query from the regression tests, which is:
>
> SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous
> = 3 OR tenthous = 42);
>
> EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
> slightly faster, as expected. I see 7 buffer hits for the bitmap index
> scan plan on master, versus only 4 buffer hits for the patch's index
> scan. Obviously, this is because we go from multiple index scans
> (multiple bitmap index scans) to only one.
>
> But, if I run this insert statement and try the same thing again,
> things look very different:
>
> insert into tenk1 (thousand, tenthous) select 42, i from
> generate_series(43, 1000) i;
>
> (Bear in mind that we've inserted rows that don't actually need to be
> selected by the query in question.)
> > Now the master branch's plan works in just the same way as before -- > it has exactly the same overhead (7 buffer hits). Whereas the patch > still gets the same risky plan -- which now blows up. The plan now > loses by far more than it could ever really hope to win by: 336 buffer > hits. (It could be a lot higher than this, even, but you get the > picture.) Are you sure? Because if I try this on master (62e9af4c without any patches), I get this: regression=# explain (verbose, analyze, buffers) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42); QUERY PLAN ------------------------------------------------------------------------ Index Scan using tenk1_thous_tenthous on public.tenk1 (cost=0.29..1416.32 rows=1 width=244) (actual time=0.078..1.361 rows=1 loops=1) Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 Index Cond: (tenk1.thousand = 42) Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42)) Rows Removed by Filter: 967 Buffers: shared hit=335 Planning Time: 0.225 ms Execution Time: 1.430 ms (8 rows) So not sure about the claim that master works fine as before. 
OTOH, the patched branch (with "my" patch 2023/07/16, just to be clear) does this: QUERY PLAN ------------------------------------------------------------------------ Index Scan using tenk1_thous_tenthous on public.tenk1 (cost=0.29..23.57 rows=1 width=244) (actual time=0.077..0.669 rows=1 loops=1) Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 Index Cond: (tenk1.thousand = 42) Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42)) Rows Removed by Index Recheck: 967 Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42)) Buffers: shared hit=7 Planning Time: 0.211 ms Execution Time: 0.722 ms (9 rows) Which is just the 7 buffers ... Did I do something wrong? > > Sure, it's difficult to imagine a very general model that captures > this sort of risk, in the general case. But you don't need a degree in > actuarial science to understand that it's inherently a bad idea to > juggle chainsaws -- no matter what your general risk tolerance happens > to be. Some things you just don't do. > Oh come on. I've been juggling chainsaws (lit on fire!) since I was a little boy! There's nothing risky about it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
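[For anyone following along, the sequence being compared above can be
reproduced with the statements below -- a sketch, assuming a freshly
loaded regression database with the stock tenk1 table and its
tenk1_thous_tenthous index; the "Index Filter" plan line only appears
with the patch applied.]

```sql
-- Baseline: the regression-test query, on a freshly loaded tenk1.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1
WHERE thousand = 42
  AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

-- Add rows that match the access predicate (thousand = 42) but not the
-- OR conditions, so a filter has to discard them one by one.
INSERT INTO tenk1 (thousand, tenthous)
SELECT 42, i FROM generate_series(43, 1000) i;

-- Re-run and compare "Buffers: shared hit" before and after: the
-- BitmapOr plan's buffer count stays flat, while the single-scan
-- plan's count grows with the number of filtered-out rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1
WHERE thousand = 42
  AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
```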
On 8/3/23 03:32, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:48 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> How come we don't know that until the execution time? Surely when
>> building the paths/plans, we match the clauses to the index keys, no?
>> Or are you saying that just having a scan key is not enough for it to
>> be "access predicate"?
>
> In principle we can and probably should recognize the difference
> between "Access Predicates" and "Index Filter Predicates" much earlier
> on, during planning. Probably when index paths are first built.
>
> It seems quite likely that there will turn out to be at least 2 or 3
> reasons to do this. The EXPLAIN usability issue might be enough of a
> reason on its own, though.

OK

>> Anyway, this patch is mostly about "Index Cond" mixing two types of
>> predicates. But the patch is really about "Filter" predicates --
>> moving some of them from table to index. So quite similar to the
>> "index filter predicates" except that those are handled by the index
>> AM.
>
> I understand that that's how the patch is structured. It is
> nevertheless possible (as things stand) that the patch will make the
> planner shift from a plan that uses "Access Predicates" to the maximum
> extent possible when scanning a composite index, to a similar plan
> that has a similar index scan, for the same index, but with fewer
> "Access Predicates" in total. In effect, the patched planner will swap
> one type of predicate for another type because doing so enables the
> executor to scan an index once instead of scanning it several times.

That seems very much like something the costing is meant to handle, no?
I mean, surely "access predicate" and "filter" should affect the cost
differently, with "filter" being more expensive (and table filter being
more expensive than index filter).

I was not sure why the patch would affect the plan choice, as it does
not really affect the index predicates (passed to the AM) at all.

But I think I get the point now -- the thing is in having two different
index paths, like:

PATH #1: access predicates (A,B)
PATH #2: access predicate A, index filter B

AFAICS the assumption is that path #1 should be better, as it has two
proper access predicates. But maybe if we add another condition C, it
might end up like this:

PATH #1: access predicates (A,B), table filter C
PATH #2: access predicate A, index filter (B,C)

and #2 will end up winning.

I still think this seems more like a costing issue (and I'd guess we
may already have similar cases for index-only scans). IMO we can either
consider the different predicate types during costing -- sure, then we
have the usual costing risks, but that's expected. Or we could just
ignore this during costing entirely (and ditch the costing from the
patch). Then the cost doesn't change, and we don't have any new risks.

> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
>
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.

Yeah, I think that's the scenario I described above ...

>> But differentiating between access and filter predicates (at the
>> index AM level) seems rather independent of what this patch aims to
>> do.
> My concern is directly related to the question of "access predicates
> versus filter predicates", and the planner's current ignorance on
> which is which. The difference may not matter too much right now, but
> ISTM that your patch makes it matter a lot more. And so in that
> indirect sense it does seem relevant.

I'm not sure my patch makes it matter a lot more. Yes, moving a filter
from the table to the index may lower the scan cost, but that can
happen for a lot of other reasons ...

> The planner has always had a strong bias in the direction of making
> clauses that can be index quals into index quals, rather than filter
> predicates. It makes sense to do that even when they aren't very
> selective. This is a similar though distinct principle.
>
> It's awkward to discuss the issue, since we don't really have official
> names for these things just yet (I'm just going with what Markus calls
> them for now). And because there is more than one type of "index
> filter predicate" in play with the patch (namely those in the index
> AM, and those in the index scan executor node). But my concern boils
> down to access predicates being strictly better than equivalent index
> filter predicates.

I like the discussion, but it feels a bit abstract (and distant from
what the patch aimed to do) and I have trouble turning it into
something actionable.

>> I'm not following. Why couldn't there be some second-order effects?
>> Maybe it's obvious / implied by something said earlier, but I think
>> every time we decide between multiple choices, there's a risk of
>> picking wrong.
>
> Naturally, I agree that some amount of risk is inherent. I believe
> that the risk I have identified is qualitatively different to the
> standard, inherent kind of risk.
>
>> Anyway, is this still about this patch or rather about your SAOP
>> patch?
>
> It's mostly not about my SAOP patch. It's just that my SAOP patch will
> do exactly the right thing in this particular case (at least once
> combined with Alena Rybakina's OR-to-SAOP transformation patch). It is
> therefore quite clear that a better plan is possible in principle.
> Clearly we *can* have the benefit of only one single index scan (i.e.
> no BitmapOr node combining multiple similar index scans), without
> accepting any novel new risk to get that benefit. We should just have
> one index path that does it all, while avoiding duplicative index
> paths that embody a distinction that really shouldn't exist -- it
> should be dealt with at runtime instead.

Does this apply to the index scan vs. index-only scans too? That is, do
you think we should have just one index-scan path, doing index-only
stuff when possible?

>> True. IMHO the danger of "index filter" predicates is that people
>> assume index conditions eliminate large parts of the index -- which
>> is not necessarily the case here. If we can improve this, cool.
>
> I think that we'd find a way to use the same information in the
> planner, too. It's not just users that should care about the
> difference. And I don't think that it's particularly likely to be
> limited to SAOP/MDAM stuff. As Markus said, we're talking about an
> important general principle here.

If we want/need to consider this during costing, this seems necessary.

>> But again, this is not what this patch does, right? It's about moving
>> stuff from "table filter" to "index filter". And those clauses were
>> not matched to the index AM at all, so it's not really relevant to
>> the discussion about different subtypes of predicates.
>
> I understand that what I've said is not particularly helpful. At least
> not on its own. You quite naturally won't want to tie the fate of this
> patch to my SAOP patch, which is significantly more complicated. I do
> think that my concern about this being a novel risk needs to be
> carefully considered.
>
> Maybe it's possible to address my concern outside of the context of my
> own SAOP patch. That would definitely be preferable all around. But
> "access predicates versus filter predicates" seems important and
> relevant, either way.

If we can form some sort of plan for what needs to be done (both for my
patch and for the SAOP patch), I'm willing to work on it ... But it's
not quite clear to me what the requirements are.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 3, 2023 at 4:20 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Which is just the 7 buffers ...
>
> Did I do something wrong?

I think that it might have something to do with your autovacuum
settings. Note that the plan that you've shown for the master branch
isn't the same one that appears in
src/test/regress/expected/create_index.out for the master branch --
that plan (the BitmapOr plan) was my baseline case for master.

That said, I am a little surprised that you could ever get the plan
that you showed for master (without somehow unnaturally forcing it).
It's almost the same plan that your patch gets, but much worse. Your
patch can use an index filter, but master uses a table filter instead.

While the plan used by the patch is risky in the way that I described,
the plan you saw for master is just horrible. I mean, it's not even
risky -- it seems almost certain to lose. Whereas at least the plan
from the patch really is cheaper than the BitmapOr plan (the master
branch plan from create_index.out) on average.

--
Peter Geoghegan
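[One way to check whether autovacuum explains such a plan discrepancy
-- a sketch using the standard statistics and catalog views; run it on
both setups before comparing EXPLAIN output.]

```sql
-- Has (auto)vacuum or (auto)analyze touched tenk1 since it was loaded?
SELECT relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'tenk1';

-- How much of the table is marked all-visible? Heap-fetch counts (and
-- hence the buffer numbers being compared) depend on this.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'tenk1';

-- Normalize the visibility map and statistics before comparing plans.
VACUUM (ANALYZE) tenk1;
```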
On 8/3/23 18:47, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 4:20 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Which is just the 7 buffers ...
>>
>> Did I do something wrong?
>
> I think that it might have something to do with your autovacuum
> settings. Note that the plan that you've shown for the master branch
> isn't the same one that appears in
> src/test/regress/expected/create_index.out for the master branch --
> that plan (the BitmapOr plan) was my baseline case for master.
>
> That said, I am a little surprised that you could ever get the plan
> that you showed for master (without somehow unnaturally forcing it).
> It's almost the same plan that your patch gets, but much worse. Your
> patch can use an index filter, but master uses a table filter instead.

Well I did force it -- I thought we're talking about regular index
scans, so I disabled bitmap scans. Without doing that I get the
BitmapOr plan like you.

However, with the patch I get this behavior (starting from a "fresh"
state right after "make installcheck"):

                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1
     (cost=0.29..8.38 rows=1 width=244)
     (actual time=0.033..0.036 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 9
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared read=4
 Planning:
   Buffers: shared hit=119 read=32
 Planning Time: 0.673 ms
 Execution Time: 0.116 ms
(10 rows)

insert into tenk1 (thousand, tenthous) select 42, i from
generate_series(43, 1000) i;

                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1
     (cost=0.29..8.38 rows=1 width=244)
     (actual time=0.038..0.605 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared hit=336
 Planning Time: 0.114 ms
 Execution Time: 0.632 ms
(8 rows)

analyze tenk1;

                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1 (cost=12.89..16.91 rows=1 width=244)
                           (actual time=0.016..0.019 rows=1 loops=1)
   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR
                 ((thousand = 42) AND (tenthous = 3)) OR
                 ((thousand = 42) AND (tenthous = 42)))
   Heap Blocks: exact=1
   Buffers: shared hit=7
   ->  BitmapOr ...
         Buffers: shared hit=6
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 1))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 3))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 42))
               Buffers: shared hit=2
 Planning Time: 0.344 ms
 Execution Time: 0.044 ms
(19 rows)

vacuum analyze tenk1;

                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1 (cost=12.89..16.91 rows=1 width=244)
                           (actual time=0.017..0.019 rows=1 loops=1)
   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR
                 ((thousand = 42) AND (tenthous = 3)) OR
                 ((thousand = 42) AND (tenthous = 42)))
   Heap Blocks: exact=1
   Buffers: shared hit=7
   ->  BitmapOr ...
         Buffers: shared hit=6
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 1))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 3))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 42))
               Buffers: shared hit=2
 Planning Time: 0.277 ms
 Execution Time: 0.046 ms
(19 rows)

set enable_bitmapscan = off;

                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1
     (cost=0.29..23.57 rows=1 width=244)
     (actual time=0.042..0.235 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared hit=7
 Planning Time: 0.119 ms
 Execution Time: 0.261 ms
(8 rows)

So yeah, it gets "Buffers: shared hit=336" right after the insert, but
it seems to be mostly about the visibility map (and having to fetch
heap tuples), as it disappears after vacuum.

There seems to be some increase in cost, so we switch back to the
bitmap plan. I haven't looked into that, but I guess there's either
some thinko in the costing change, or maybe it's due to correlation.

> While the plan used by the patch is risky in the way that I described,
> the plan you saw for master is just horrible. I mean, it's not even
> risky -- it seems almost certain to lose. Whereas at least the plan
> from the patch really is cheaper than the BitmapOr plan (the master
> branch plan from create_index.out) on average.

Not sure. I'm a bit confused about what exactly is so risky on the plan
produced with the patch.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>> I understand that that's how the patch is structured. It is
>> nevertheless possible (as things stand) that the patch will make the
>> planner shift from a plan that uses "Access Predicates" to the
>> maximum extent possible when scanning a composite index, to a similar
>> plan that has a similar index scan, for the same index, but with
>> fewer "Access Predicates" in total. In effect, the patched planner
>> will swap one type of predicate for another type because doing so
>> enables the executor to scan an index once instead of scanning it
>> several times.
>
> That seems very much like something the costing is meant to handle,
> no? I mean, surely "access predicate" and "filter" should affect the
> cost differently, with "filter" being more expensive (and table filter
> being more expensive than index filter).

I'm not 100% sure that it's not a costing issue, but intuitively it
doesn't seem like one.

As Goetz Graefe once said, "choice is confusion". It seems desirable
to have fewer, better index paths. This is possible whenever there is
a way to avoid the index paths that couldn't possibly be better in the
first place. Though we must also make sure that there is no real
downside -- possibly by teaching the executor to behave adaptively
instead of needlessly trusting what the planner says. Turning a plan
time decision into a runtime decision seems strictly better.

Obviously the planner will always need to be trusted to a significant
degree (especially for basic things like join order), but why not
avoid it when we can avoid it without any real downsides? Having lots
of slightly different index paths with slightly different types of
logically equivalent predicates seems highly undesirable, and quite
avoidable.

ISTM that it should be possible to avoid generating some of these
index paths based on static rules that assume that:

1. An "access predicate" is always strictly better than an equivalent
"index filter predicate" (for any definition of "index filter
predicate" you can think of).

2. An "Index Filter: " is always strictly better than an equivalent
"Filter: " (i.e. table filter).

The first item is what I've been going on about, of course. The second
item is the important principle behind your patch -- and one that I
also agree with. I don't see any contradictions here -- these two
principles are compatible. I think that we can have it both ways.

> AFAICS the assumption is that path #1 should be better, as it has two
> proper access predicates. But maybe if we add another condition C, it
> might end up like this:
>
> PATH #1: access predicates (A,B), table filter C
> PATH #2: access predicate A, index filter (B,C)
>
> and #2 will end up winning.

Why wouldn't we expect there to also be this path:

PATH #3: access predicates (A,B), index filter C

And why wouldn't we also expect this other path to always be better?
So much better that we don't even need to bother generating PATH #1
and PATH #2 in the first place, even?

Right now there are weird reasons why it might not be so -- strange
interactions with things like BitmapOr nodes that could make either
PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
particularly fundamental to me. We should probably just avoid those
plan shapes that have the potential to make PATH #1 and PATH #2
slightly cheaper, due only to perverse interactions.

> I like the discussion, but it feels a bit abstract (and distant from
> what the patch aimed to do) and I have trouble turning it into
> something actionable.

I think that I have gotten a lot out of this discussion -- it has made
my thinking about this stuff more rigorous. I really appreciate that.

> Does this apply to the index scan vs. index-only scans too? That is,
> do you think we should have just one index-scan path, doing index-only
> stuff when possible?

I think so, yes. But index-only scans don't appear under BitmapOr
nodes, so right now I can't think of an obvious way of demonstrating
that this is true. Maybe it accidentally doesn't come up with
index-only scans in practice, but the same underlying principles
should be just as true.

> If we can form some sort of plan for what needs to be done (both for
> my patch and for the SAOP patch), I'm willing to work on it ... But
> it's not quite clear to me what the requirements are.

I do hope to have more concrete proposals soon. Thanks for being
patient.

For what it's worth, I actually think that there is a good chance that
I'll end up relying on what you've done here to make certain things I
want to do with the SAOP patch okay. It would be rather convenient to
be able to handle some of the SAOP safety issues without needing any
table filters (just index filters), in some corner cases. I think that
what you're doing here makes a lot of sense. FWIW, I am already
personally invested in the success of your patch.

--
Peter Geoghegan
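[Incidentally, the index-only-scan side of this question is visible in
the example that started this thread: an `(a) INCLUDE (b)` index can
apply the `b = 4` filter from the index during an index-only scan, but
only for tuples on all-visible pages. A sketch of that reproduction:]

```sql
CREATE TABLE t (
    a integer NOT NULL,
    b integer NOT NULL,
    d integer NOT NULL
);
CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);

INSERT INTO t
SELECT random() * 100000000, random() * 3, g
FROM generate_series(1, 200000) g
ON CONFLICT DO NOTHING;

-- Mark pages all-visible; with this, the index-only scan reports
-- "Heap Fetches: 0" while still filtering on b from the index.
-- Without it, candidate tuples are fetched from the heap only to be
-- thrown away, just like a plain index scan.
VACUUM FREEZE t;

EXPLAIN (ANALYZE, BUFFERS)
SELECT a, b FROM t
WHERE a > 1000000 AND b = 4
ORDER BY a ASC
LIMIT 10;
```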
On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Not sure. I'm a bit confused about what exactly is so risky on the
> plan produced with the patch.

It's all about the worst case. In the scenarios that I'm concerned
about, we can be quite sure that the saving from not using a BitmapOr
will be fairly low -- the cost of not having to repeat the same index
page accesses across several similar index scans is, at best, some
small multiple of the would-be number of index scans that the BitmapOr
plan gets. We can be certain that the possible benefits are fixed and
low. This is always true; presumably the would-be BitmapOr plan can
never have all that many index scans. And we always know how many
index scans a BitmapOr plan would use up front.

On the other hand, the possible downsides have no obvious limit. So
even if we're almost certain to win on average, we only have to be
unlucky once to lose all we gained before that point. As a general
rule, we want the index AM to have all the context required to
terminate its scan at the earliest possible opportunity. This is
enormously important in the worst case.

It's easier for me to make this argument because I know that we don't
really need to make any trade-off here. But even if that wasn't the
case, I'd probably arrive at the same general conclusion.

Importantly, it isn't possible to make a similar argument that works
in the opposite direction -- IMV that's the difference between this
flavor of riskiness, and the inevitable riskiness that comes with any
planner change. In other words, your patch isn't going to win by an
unpredictably high amount. Not in the specific scenarios that I'm
focussed on here, with a BitmapOr + multiple index scans getting
displaced.

The certainty about the upside is just as important as the uncertainty
about the downside. The huge asymmetry matters, and is fairly
atypical. If, somehow, there was less certainty about the possible
upside, then my argument wouldn't really work.

--
Peter Geoghegan
On 8/3/23 20:50, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>> I understand that that's how the patch is structured. It is
>>> nevertheless possible (as things stand) that the patch will make the
>>> planner shift from a plan that uses "Access Predicates" to the
>>> maximum extent possible when scanning a composite index, to a
>>> similar plan that has a similar index scan, for the same index, but
>>> with fewer "Access Predicates" in total. In effect, the patched
>>> planner will swap one type of predicate for another type because
>>> doing so enables the executor to scan an index once instead of
>>> scanning it several times.
>>
>> That seems very much like something the costing is meant to handle,
>> no? I mean, surely "access predicate" and "filter" should affect the
>> cost differently, with "filter" being more expensive (and table
>> filter being more expensive than index filter).
>
> I'm not 100% sure that it's not a costing issue, but intuitively it
> doesn't seem like one.
>
> As Goetz Graefe once said, "choice is confusion". It seems desirable
> to have fewer, better index paths. This is possible whenever there is
> a way to avoid the index paths that couldn't possibly be better in the
> first place. Though we must also make sure that there is no real
> downside -- possibly by teaching the executor to behave adaptively
> instead of needlessly trusting what the planner says. Turning a plan
> time decision into a runtime decision seems strictly better.

Sure, having more choices means a risk of making mistakes. But does
simply postponing the choices to runtime actually solve this? Even if
you're able to make a perfect choice at that point, it only works for
that particular path type (e.g. index scan). You still need to cost it
somehow, to decide which path type to pick ...

Perhaps my mental model of what you intend to do is wrong?

> Obviously the planner will always need to be trusted to a significant
> degree (especially for basic things like join order), but why not
> avoid it when we can avoid it without any real downsides? Having lots
> of slightly different index paths with slightly different types of
> logically equivalent predicates seems highly undesirable, and quite
> avoidable.
>
> ISTM that it should be possible to avoid generating some of these
> index paths based on static rules that assume that:
>
> 1. An "access predicate" is always strictly better than an equivalent
> "index filter predicate" (for any definition of "index filter
> predicate" you can think of).

Yes, probably.

> 2. An "Index Filter: " is always strictly better than an equivalent
> "Filter: " (i.e. table filter).

Not sure about this. As I explained earlier, I think it needs to
consider the cost/selectivity of the predicate, and fraction of
allvisible pages. But yes, it's a static decision.

> The first item is what I've been going on about, of course. The
> second item is the important principle behind your patch -- and one
> that I also agree with. I don't see any contradictions here -- these
> two principles are compatible. I think that we can have it both ways.
>
>> AFAICS the assumption is that path #1 should be better, as it has two
>> proper access predicates. But maybe if we add another condition C, it
>> might end up like this:
>>
>> PATH #1: access predicates (A,B), table filter C
>> PATH #2: access predicate A, index filter (B,C)
>>
>> and #2 will end up winning.
>
> Why wouldn't we expect there to also be this path:
>
> PATH #3: access predicates (A,B), index filter C

Why? Maybe the index doesn't have all the columns needed for condition
C, in which case it has to be evaluated as a table filter.

(I didn't say it explicitly, but this assumes those paths are not for
the same index. If they were, then PATH #3 would have to exist too.)
> And why wouldn't we also expect this other path to always be better?
> So much better that we don't even need to bother generating PATH #1
> and PATH #2 in the first place, even?

Yes, I agree that path would likely be superior to the other paths.

> Right now there are weird reasons why it might not be so -- strange
> interactions with things like BitmapOr nodes that could make either
> PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
> particularly fundamental to me. We should probably just avoid those
> plan shapes that have the potential to make PATH #1 and PATH #2
> slightly cheaper, due only to perverse interactions.
>
>> I like the discussion, but it feels a bit abstract (and distant from
>> what the patch aimed to do) and I have trouble turning it into
>> something actionable.
>
> I think that I have gotten a lot out of this discussion -- it has made
> my thinking about this stuff more rigorous. I really appreciate that.

I feel a bit like the rubber duck from [1], but I'm OK with that ;-)

>> Does this apply to the index scan vs. index-only scans too? That is,
>> do you think we should have just one index-scan path, doing
>> index-only stuff when possible?
>
> I think so, yes. But index-only scans don't appear under BitmapOr
> nodes, so right now I can't think of an obvious way of demonstrating
> that this is true. Maybe it accidentally doesn't come up with
> index-only scans in practice, but the same underlying principles
> should be just as true.
>
>> If we can form some sort of plan for what needs to be done (both for
>> my patch and for the SAOP patch), I'm willing to work on it ... But
>> it's not quite clear to me what the requirements are.
>
> I do hope to have more concrete proposals soon. Thanks for being
> patient.
>
> For what it's worth, I actually think that there is a good chance that
> I'll end up relying on what you've done here to make certain things I
> want to do with the SAOP patch okay. It would be rather convenient to
> be able to handle some of the SAOP safety issues without needing any
> table filters (just index filters), in some corner cases. I think that
> what you're doing here makes a lot of sense. FWIW, I am already
> personally invested in the success of your patch.

Great! I'm happy those bits are likely useful for what you're doing.

regards

[1] https://en.wikipedia.org/wiki/Rubber_duck_debugging

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 8/3/23 21:21, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Not sure. I'm a bit confused about what exactly is so risky on the
>> plan produced with the patch.
>
> It's all about the worst case. In the scenarios that I'm concerned
> about, we can be quite sure that the saving from not using a BitmapOr
> will be fairly low -- the cost of not having to repeat the same index
> page accesses across several similar index scans is, at best, some
> small multiple of the would-be number of index scans that the BitmapOr
> plan gets. We can be certain that the possible benefits are fixed and
> low. This is always true; presumably the would-be BitmapOr plan can
> never have all that many index scans. And we always know how many
> index scans a BitmapOr plan would use up front.

When you say "index page accesses" do you mean accesses to index
pages, or accesses to heap pages from the index scan? Because my patch
is all about reducing the heap pages, which are usually the expensive
part of the index scan.

But you're right that the "index scan" with index filter may access
more index pages, because it has fewer "access predicates". I don't
quite see that with the tenk1 query we've been discussing (the extra
buffers were due to non-allvisible heap pages), but I guess that's
possible.

> On the other hand, the possible downsides have no obvious limit. So
> even if we're almost certain to win on average, we only have to be
> unlucky once to lose all we gained before that point. As a general
> rule, we want the index AM to have all the context required to
> terminate its scan at the earliest possible opportunity. This is
> enormously important in the worst case.

Yeah, I agree there's some asymmetry in the risk/benefit. It's not
unlike e.g. seqscan vs. index scan, where the index scan can't really
save more than what the seqscan costs, but it can get (almost)
arbitrarily expensive.
> It's easier for me to make this argument because I know that we don't > really need to make any trade-off here. But even if that wasn't the > case, I'd probably arrive at the same general conclusion. > > Importantly, it isn't possible to make a similar argument that works > in the opposite direction -- IMV that's the difference between this > flavor of riskiness, and the inevitable riskiness that comes with any > planner change. In other words, your patch isn't going to win by an > unpredictably high amount. Not in the specific scenarios that I'm > focussed on here, with a BitmapOr + multiple index scans getting > displaced. > True. It probably can't beat BitmapOr plan if it means moving access predicate into index filter (or even worse a table filter). > The certainty about the upside is just as important as the uncertainty > about the downside. The huge asymmetry matters, and is fairly > atypical. If, somehow, there was less certainty about the possible > upside, then my argument wouldn't really work. > regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 3, 2023 at 2:46 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Sure, having more choices means a risk of making mistakes. But does > simply postponing the choices to runtime actually solves this? It solves that one problem, yes. This is particularly important in cases where we would otherwise get truly pathological performance -- not just mediocre or bad performance. Most of the time, mediocre performance isn't such a big deal. Having a uniform execution strategy for certain kinds of index scans is literally guaranteed to beat a static strategy in some cases. For example, with some SAOP scans (with my patch), we'll have to skip lots of the index, and then scan lots of the index -- just because of a bunch of idiosyncratic details that are almost impossible to predict using statistics. Such an index scan really shouldn't be considered "moderately skippy". It is not the average of two opposite things -- it is more like two separate things that are opposites. It's true that even this "moderately skippy" case needs to be costed. But if we can entirely eliminate variation that really looks like noise, it should be more likely that we'll get the cheapest plan. Costing may not be any easier, but getting the cheapest plan might be. > > 1. An "access predicate" is always strictly better than an equivalent > > "index filter predicate" (for any definition of "index filter > > predicate" you can think of). > > Yes, probably. > > > 2. An "Index Filter: " is always strictly better than an equivalent > > "Filter: " (i.e. table filter). > > Not sure about this. As I explained earlier, I think it needs to > consider the cost/selectivity of the predicate, and fraction of > allvisible pages. But yes, it's a static decision. What I said is limited to "equivalent" predicates. If it's just not possible to get an "access predicate" at all, then my point 1 doesn't apply. 
Similarly, if it just isn't possible to get an "Index Filter" (only a table filter), then my point #2 doesn't apply. This does mean that there could still be competition between multiple index paths for the same composite index, but I have no objections to that -- it makes sense to me because it isn't duplicative in the way that I'm concerned about. It just isn't possible to delay anything until run time in this scenario, so nothing that I've said should apply. > (I didn't say it explicitly, but this assumes those paths are not for > the same index. If they were, then PATH #3 would have to exist too.) That changes everything, then. If they're completely different indexes then nothing I've said should apply. I can't think of a way to avoid making an up-front commitment to that in the planner (I'm thinking of far more basic things than that). > I feel a bit like the rubber duck from [1], but I'm OK with that ;-) Not from my point of view. Besides, even when somebody says that they just don't understand what I'm saying at all (which wasn't ever fully the case here), that is often useful feedback in itself. -- Peter Geoghegan
On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > When you say "index page accesses" do you mean accesses to index pages, > or accesses to heap pages from the index scan? Yes, that's exactly what I mean. Note that that's the dominant cost for the original BitmapOr plan. As I said upthread, the original BitmapOr plan has 7 buffer hits. The breakdown is 1 single heap page access, 3 root page accesses, and 3 leaf page accesses. There is only 1 heap page access because only 1 out of the 3 index scans that feed into the BitmapOr actually end up finding any matching rows in the index. In short, the dominant cost here is index page accesses. It's a particularly good case for my SAOP patch! > Because my patch is all about reducing the heap pages, which are usually > the expensive part of the index scan. But you're right the "index scan" > with index filter may access more index pages, because it has fewer > "access predicates". The fact is that your patch correctly picks the cheapest plan, which is kinda like a risky version of the plan that my SAOP patch would pick -- it is cheaper for the very same reason. I understand that that's not your intention at all, but this is clearly what happened. That's what I meant by "weird second order effects". To me, it really does kinda look like your patch accidentally discovered a plan that's fairly similar to the plan that my SAOP patch would have found by design! Perhaps I should have been clearer on this point earlier. (If you're only now seeing this for yourself for the first time, then...oops. No wonder you were confused about which patch it was I was going on about!) > I don't quite see that with the tenk1 query we've been discussing (the > extra buffers were due to non-allvisible heap pages), but I guess that's > possible. The extra buffer hits occur because I made them occur by inserting new tuples where thousand = 42. Obviously, I did it that way because I had a point that I wanted to make. 
Obviously, there wouldn't have been any notable regression from your patch at all if I had (say) inserted tuples where thousand = 43 instead. (Not for the original "42" query, at least.) That's part of the problem, as I see it. Local changes like that can have outsized impact on individual queries, even though there is no inherent reason to expect it. How can statistics reliably guide the planner here? Statistics are supposed to be a summary of the whole attribute, that allow us to make various generalizations during planning. But this plan leaves us sensitive to relatively small changes in one particular "thousand" grouping, with potentially outsized impact. And, this can happen very suddenly, because it's so "local". Making this plan perform robustly just doesn't seem to be one of the things that statistics can be expected to help us with very much. -- Peter Geoghegan
On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Because my patch is all about reducing the heap pages, which are usually > the expensive part of the index scan. But you're right the "index scan" > with index filter may access more index pages, because it has fewer > "access predicates". It's not so much the unnecessary index page accesses that bother me. At least I didn't push that aspect very far when I constructed my adversarial test case -- index pages were only a small part of the overall problem. (I mean the problem at runtime, in the executor. The planner expected to save a small number of leaf page accesses, which was kinda, sorta the problem there -- though the planner might have technically still been correct about that, and can't have been too far wrong in any case.) The real problem that my adversarial case seemed to highlight was a problem of extra heap page accesses. The tenk1 table's VM is less than one page in size, so how could it have been VM buffer hits? Sure, there were no "table filters" involved -- only "index filters". But even "index filters" require heap access when the page isn't marked all-visible in the VM. That problem just cannot happen with a similar plan that eliminates the same index tuples within the index AM proper (the index quals don't even have to be "access predicates" for this to apply, either). Of course, we never need to check the visibility of index tuples just to be able to consider eliminating them via nbtree search scan keys/index quals -- and so there is never any question of heap/VM access for tuples that don't pass index quals. Not so for "index filters", where there is at least some chance of accessing the heap proper just to be able to eliminate non-matches. While I think that it makes sense to assume that "index filters" are strictly better than "table filters" (assuming they're directly equivalent in that they contain the same clause), they're not *reliably* any better. 
So "index filters" are far from being anywhere near as good as an equivalent index qual (AFAICT we should always assume that that's true). This is true of index quals generally -- this advantage is *not* limited to "access predicate index quals". (It is most definitely not okay for "index filters" to displace equivalent "access predicate index quals", but it's also not really okay to allow them to displace equivalent "index filter predicate index quals" -- the latter case is less bad, but AFAICT they both basically aren't acceptable "substitutions".) -- Peter Geoghegan
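To make this hierarchy concrete, here is a toy Python simulation (not PostgreSQL code; the tuple counts, visibility pattern, and function names are all invented for illustration) of how many heap fetches the same "b = 4" clause implies when evaluated as an index qual, an "index filter", or a "table filter":

```python
# Toy model: count heap fetches needed to evaluate "b = 4" depending on
# where the clause runs. Purely illustrative -- not PostgreSQL internals.

def scan(tuples, all_visible, strategy):
    """tuples: list of (a, b); all_visible: per-tuple flag for its heap page."""
    heap_fetches = 0
    results = []
    for (a, b), visible in zip(tuples, all_visible):
        if strategy == "index qual":
            # Eliminated inside the index AM -- visibility never considered,
            # so non-matches cost no heap access at all.
            if b != 4:
                continue
            heap_fetches += 1          # fetch only the surviving match
        elif strategy == "index filter":
            if visible:
                # Page is all-visible: the filter may run on the index tuple.
                if b != 4:
                    continue
                heap_fetches += 1      # fetch the match itself
            else:
                heap_fetches += 1      # must fetch before we can even filter
                if b != 4:
                    continue
        else:                          # "table filter"
            heap_fetches += 1          # always fetch the heap tuple first
            if b != 4:
                continue
        results.append((a, b))
    return heap_fetches, results

tuples = [(i, i % 10) for i in range(100)]       # 10 rows match b = 4
half_visible = [i % 2 == 0 for i in range(100)]  # half the pages all-visible

for strategy in ("index qual", "index filter", "table filter"):
    fetches, rows = scan(tuples, half_visible, strategy)
    print(strategy, fetches, len(rows))
```

In the toy, the index qual never touches the heap for non-matches, the index filter additionally touches the heap for every tuple on a not-all-visible page, and the table filter touches the heap for every tuple -- matching the ordering argued for above.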
On 8/4/23 02:08, Peter Geoghegan wrote: > On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> When you say "index page accesses" do you mean accesses to index pages, >> or accesses to heap pages from the index scan? > > Yes, that's exactly what I mean. Note that that's the dominant cost > for the original BitmapOr plan. > Well, I presented multiple options, so "yes" doesn't really clarify which of them applies. But my understanding is you meant the index pages accesses. > As I said upthread, the original BitmapOr plan has 7 buffer hits. The > breakdown is 1 single heap page access, 3 root page accesses, and 3 > leaf page accesses. There is only 1 heap page access because only 1 > out of the 3 index scans that feed into the BitmapOr actually end up > finding any matching rows in the index. > > In short, the dominant cost here is index page accesses. It's a > particularly good case for my SAOP patch! > Understood. It makes sense considering the SAOP patch is all about optimizing the index walk / processing fewer pages. >> Because my patch is all about reducing the heap pages, which are usually >> the expensive part of the index scan. But you're right the "index scan" >> with index filter may access more index pages, because it has fewer >> "access predicates". > > The fact is that your patch correctly picks the cheapest plan, which > is kinda like a risky version of the plan that my SAOP patch would > pick -- it is cheaper for the very same reason. I understand that > that's not your intention at all, but this is clearly what happened. > That's what I meant by "weird second order effects". > > To me, it really does kinda look like your patch accidentally > discovered a plan that's fairly similar to the plan that my SAOP patch > would have found by design! Perhaps I should have been clearer on this > point earlier. (If you're only now seeing this for yourself for the > first time, then...oops. 
No wonder you were confused about which > patch it was I was going on about!) > Thanks. I think I now see the relationship between the plan with my patch and your SAOP patch. It's effectively very similar, except that the responsibilities are split a bit differently. With my patch the OR clause happens outside AM, while the SAOP patch would do that in the AM and also use that to walk the index more efficiently. >> I don't quite see that with the tenk1 query we've been discussing (the >> extra buffers were due to non-allvisible heap pages), but I guess that's >> possible. > > The extra buffer hits occur because I made them occur by inserting new > tuples where thousand = 42. Obviously, I did it that way because I had > a point that I wanted to make. Obviously, there wouldn't have been any > notable regression from your patch at all if I had (say) inserted > tuples where thousand = 43 instead. (Not for the original "42" query, > at least.) > Right. I know where the heap accesses come from, but clearly we're able to skip those when allvisible=true, and we don't need to scan more index pages either. But I guess we could make the data more complex to make this part worse (for my patch, while the SAOP would not be affected). > That's part of the problem, as I see it. Local changes like that can > have outsized impact on individual queries, even though there is no > inherent reason to expect it. How can statistics reliably guide the > planner here? Statistics are supposed to be a summary of the whole > attribute, that allow us to make various generalizations during > planning. But this plan leaves us sensitive to relatively small > changes in one particular "thousand" grouping, with potentially > outsized impact. And, this can happen very suddenly, because it's so > "local". > > Making this plan perform robustly just doesn't seem to be one of the > things that statistics can be expected to help us with very much. 
> I agree there certainly are cases where the estimates will be off. This is not that different from correlated columns, in fact it's exactly the same issue, I think. But it's also not a novel/unique issue - we should probably do the "usual" thing, i.e. plan based on estimates (maybe with some safety margin), and have some fallback strategy at runtime. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Aug 4, 2023 at 4:47 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Well, I presented multiple options, so "yes" doesn't really clarify > which of them applies. But my understanding is you meant the index pages > accesses. Sorry. Your understanding of what I must have meant before was correct -- your patch picked that plan because it reduced the number of index page accesses significantly. Just like my SAOP patch would have. > > In short, the dominant cost here is index page accesses. It's a > > particularly good case for my SAOP patch! > > > > Understood. It makes sense considering the SAOP patch is all about > optimizing the index walk / processing fewer pages. Actually, some of the most compelling cases for my SAOP patch are those involving heap page access savings, which come from the planner changes. Basically, the nbtree/executor changes make certain index access patterns much more efficient. Which is useful in itself, but often much more useful as an indirect enabler of avoiding heap page accesses by altering other related aspects of a plan in the planner. Sometimes by replacing table filter quals with index quals (the nbtree changes make nbtree a strictly better place for the quals). You know, somewhat like your patch. That's really why I'm so interested in your patch, and its relationship with my own patch, and the BitmapOr issue. If your patch enables some tricks that are really quite similar to the tricks that my own patch enables, then delineating which patch does which exact trick when is surely important for both patches. I actually started out just thinking about index page accesses, before eventually coming to understand that heap page accesses were also very relevant. Whereas you started out just thinking about heap page accesses, and now see some impact from saving on index page accesses. > Thanks. I think I now see the relationship between the plan with my > patch and your SAOP patch. 
It's effectively very similar, except that > the responsibilities are split a bit differently. With my patch the OR > clause happens outside AM, while the SAOP patch would do that in the AM > and also use that to walk the index more efficiently. Right. That's where my idea of structuring things so that there is only one best choice really comes from. > I agree there certainly are cases where the estimates will be off. This > is not that different from correlated columns, in fact it's exactly the > same issue, I think. In one way I think you're right that it's the same issue -- if you just focus on that one executor node, then it's the same issue. But I don't think it's the same issue in a deeper sense, since this is one case where you simply don't have to accept any risk. We really should be able to just not ever do this, for a limited though important subset of cases involving ORs + indexable operators. -- Peter Geoghegan
On 8/4/23 04:07, Peter Geoghegan wrote: > On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> Because my patch is all about reducing the heap pages, which are usually >> the expensive part of the index scan. But you're right the "index scan" >> with index filter may access more index pages, because it has fewer >> "access predicates". > > It's not so much the unnecessary index page accesses that bother me. > At least I didn't push that aspect very far when I constructed my > adversarial test case -- index pages were only a small part of the > overall problem. (I mean the problem at runtime, in the executor. The > planner expected to save a small number of leaf page accesses, which > was kinda, sorta the problem there -- though the planner might have > technically still been correct about that, and can't have been too far > wrong in any case.) > Thanks for the clarification, I think I understand better now. Let me briefly sum my understanding for the two patches: - The SAOP patch eliminates those heap accesses because it manages to evaluate all clauses in the AM, including clauses that would previously be treated as "table filters" and evaluated on the heap row. - My patch achieves a similar result by evaluating the clauses as index filters (i.e. on the index tuple). That's probably not as good as proper access predicates, so it can't help with the index page accesses, but better than what we had before. There's a couple more related thoughts later in my reply. > The real problem that my adversarial case seemed to highlight was a > problem of extra heap page accesses. The tenk1 table's VM is less than > one page in size, so how could it have been VM buffer hits? Sure, > there were no "table filters" involved -- only "index filters". But > even "index filters" require heap access when the page isn't marked > all-visible in the VM. 
> No, the extra accesses were not because of VM buffer hits - it was because of having to actually fetch the heap tuple for pages that are not fully visible, which is what happens right after the insert. The patch does what we index-only scans do - before evaluating the filters on an index tuple, we check if the page is fully visible. If not, we fetch the heap tuple and evaluate the filters on it. This means even an index-only scan would behave like this too. And it goes away as the table gets vacuumed, at which point we can eliminate the rows using only the index tuple again. > That problem just cannot happen with a similar plan that eliminates > the same index tuples within the index AM proper (the index quals > don't even have to be "access predicates" for this to apply, either). > Of course, we never need to check the visibility of index tuples just > to be able to consider eliminating them via nbtree search scan > keys/index quals -- and so there is never any question of heap/VM > access for tuples that don't pass index quals. Not so for "index > filters", where there is at least some chance of accessing the heap > proper just to be able to eliminate non-matches. > Right. This however begs a question - why would we actually need to check the visibility map before evaluating the index filter, when the index tuple alone is clearly good enough for the bitmapOr plan? Because if we didn't need to do that VM check, this issue with extra heap accesses would disappear. I copied this from the IOS somewhat blindly, but now I'm starting to think it was misguided. I thought it's a protection against processing "invalid" tuples - not tuples broken after a crash (as that would be fixed by crash recovery), but e.g. tuples with schema changes from an aborted transaction. But can this actually happen for indexes? For heap it's certainly possible (BEGIN - ALTER - INSERT - ROLLBACK will leave behind tuples like that). But we don't support changing indexes like this, right? 
And if we had this issue, how come the bitmapOr plan (which ultimately does the same thing, although entirely in the AM) does not need to do these VM checks / heap accesses too? It's evaluating essentially the same conditions on the index tuple ... So I'm starting to think this is just my misunderstanding of why IOS does this VM check - it's purely to determine visibility of the result. When it sees a pointer to a page that is not all-visible, it decides it'll need to check visibility on a heap tuple anyway, and just fetches the heap tuple right away. Which however ignores that the filters may eliminate many of those tuples, so IOS could also make such unnecessary heap accesses. It might be better to check the filters first, and only then maybe fetch the heap tuple ... > While I think that it makes sense to assume that "index filters" are > strictly better than "table filters" (assuming they're directly > equivalent in that they contain the same clause), they're not > *reliably* any better. So "index filters" are far from being anywhere > near as good as an equivalent index qual (AFAICT we should always > assume that that's true). This is true of index quals generally -- > this advantage is *not* limited to "access predicate index quals". (It > is most definitely not okay for "index filters" to displace equivalent > "access predicate index quals", but it's also not really okay to allow > them to displace equivalent "index filter predicate index quals" -- > the latter case is less bad, but AFAICT they both basically aren't > acceptable "substitutions".) > I'm not quite sure what the differences are between "index qual" vs. "access predicate index qual" vs. "index filter predicate index quals", or what "displacing" would mean exactly. But I agree there's a hierarchy of qual types, and some "promotions" are likely guaranteed to be good. 
FWIW this also reminds me that this whole discussion mostly focused on SAOP clauses (and how they may be translated into access predicates etc.). My patch is however way more general - it applies to all clauses, not just SAOP ones, including clauses with no chance of evaluating at the AM level. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
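The reordering floated above -- evaluating a filter that is known safe on index tuples before deciding whether to fetch the heap tuple -- can be sketched with another toy Python model (illustrative only; the executor obviously does not work at this level of abstraction). With freshly inserted, not-yet-all-visible pages and a filter that matches nothing (the adversarial case from earlier in the thread), the order of operations is the difference between fetching every heap tuple and fetching none:

```python
# Toy comparison of the two evaluation orders discussed above.
# "check VM first": a page that is not all-visible forces a heap fetch
#                   before the filter runs (IOS-style behaviour).
# "filter first":   a filter safe to run on index tuples runs first, so
#                   only surviving tuples can trigger a heap fetch.

def heap_fetches(index_tuples, page_all_visible, order):
    fetches = 0
    for (_key, b), visible in zip(index_tuples, page_all_visible):
        matches = (b == 4)             # the "b = 4" filter
        if order == "check VM first":
            if not visible:
                fetches += 1           # fetch heap tuple, then filter it
        else:                          # "filter first"
            if matches and not visible:
                fetches += 1           # fetch only potential matches
        # (visibility checks for rows actually returned are ignored here)
    return fetches

# 200 index tuples, none matching b = 4, all on pages that are not yet
# all-visible -- i.e. right after a bulk insert, before vacuum.
tuples = [(i, i % 3) for i in range(200)]   # b is only ever 0, 1 or 2
visibility = [False] * 200

print(heap_fetches(tuples, visibility, "check VM first"))  # every tuple fetched
print(heap_fetches(tuples, visibility, "filter first"))    # no tuple fetched
```

Of course, as the follow-up messages note, the catch is deciding when a filter really is safe to evaluate on an index tuple whose heap tuple may be invisible.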
On Fri, Aug 4, 2023 at 4:34 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Thanks for the clarification, I think I understand better now. Honestly, it's gratifying to be understood at all in a discussion like this one. Figuring out how to articulate some of my more subtle ideas (without leaving out important nuances) can be a real struggle for me. > Let me briefly sum my understanding for the two patches: > > - The SAOP patch eliminates those heap accesses because it manages to > evaluate all clauses in the AM, including clauses that would previously > be treated as "table filters" and evaluated on the heap row. Yes, exactly. > - My patch achieves a similar result by evaluating the clauses as index > filters (i.e. on the index tuple). That's probably not as good as proper > access predicates, so it can't help with the index page accesses, but > better than what we had before. Yes, exactly. > No, the extra accesses were not because of VM buffer hits - it was > because of having to actually fetch the heap tuple for pages that are > not fully visible, which is what happens right after the insert. Yes, exactly. > The patch does what we index-only scans do - before evaluating the > filters on an index tuple, we check if the page is fully visible. If > not, we fetch the heap tuple and evaluate the filters on it. > > This means even an index-only scan would behave like this too. And it > goes away as the table gets vacuumed, at which point we can eliminate > the rows using only the index tuple again. Yes, exactly. > Right. This however begs a question - why would we actually need to > check the visibility map before evaluating the index filter, when the > index tuple alone is clearly good enough for the bitmapOr plan? > > Because if we didn't need to do that VM check, this issue with extra > heap accesses would disappear. The index AM is entitled to make certain assumptions of opclass members -- assumptions that cannot be made during expression evaluation. 
The classic example is division-by-zero during evaluation of a qual, for a tuple that wasn't visible anyway. Our assumption is that stuff like that just cannot happen with index quals -- users shouldn't ever encounter sanity check errors caused by invisible-to-their-MVCC-snapshot tuples. I think that that's the main difficulty, as far as avoiding heap access for index filters is concerned. Of course, you could just limit yourself to those cases where the index AM assumptions were safe. But at that point, why not simply make sure to generate true index quals, and be done with it? > I copied this from the IOS somewhat blindly, but now I'm starting to > think it was misguided. I thought it's a protection against processing > "invalid" tuples - not tuples broken after a crash (as that would be > fixed by crash recovery), but e.g. tuples with schema changes from an > aborted transaction. I agree that schema changes for indexes shouldn't be an issue, though. > I'm not quite sure what the differences are between "index qual" vs. > "access predicate index qual" vs. > "index filter predicate index quals", > or what "displacing" would mean exactly. For the purposes of this point about "a hierarchy of quals", it doesn't really matter -- that was the point I was trying to make. In other words, "index quals" are strictly better than equivalent "index filters", which are themselves strictly better than equivalent "table filters". While it is also true that you can meaningfully classify "index quals" into their own hierarchy (namely access predicates vs index filter predicates), that doesn't necessarily need to be discussed when discussing the hierarchy from a planner point of view, since it is (at least for now) internal to the nbtree index AM. On second thought, I tend to doubt that your patch needs to worry about each type of index qual directly. It probably needs to worry about index quals in general. 
It is always better to make what could be an "index filter" into an index qual. Of course, the current practical problem for you is figuring out how to deal with that in cases like the BitmapOr case. Since it's not as if the planner is wrong, exactly...it really is the cheapest plan, so the planner is at least right on its own terms. I am interested in finding a solution to that problem. > FWIW this also reminds me that this whole discussion mostly focused on > SAOP clauses (and how they may be translated into access predicates > etc.). My patch is however way more general - it applies to all clauses, > not just SAOP ones, including clauses with no chance of evaluating at > the AM level. I agree that your patch is way more general, in one way. But the SAOP patch is also much more general than you might think. For one thing the whole BitmapOr plan issue actually required that you compared your patch to a combination of my SAOP patch and Alena Rybakina's OR-to-SAOP transformation patch -- you needed both patches. Her patch effectively made my own patch much more general. But there are all kinds of other transformations that might further extend the applicability of nbtree executor changes from my patch -- the MDAM techniques are certainly not limited to ORs/SAOPs. For example, it's easy to imagine inventing a new type of SAOP-style clause that represented "BETWEEN x AND y" expressions, that would make range predicates into "first class indexable operators" -- ScalarRangeOprExpr, or something. With multi-column indexes, these ScalarRangeOprExpr clauses could be composed beside ScalarArrayOpExpr clauses, as well as simpler clauses -- all while preserving index order on output. So there are quite a few plan shapes that aren't possible at all right now, that become possible, many of which don't even have SAOPs/ORs. 
Of course it won't ever be possible to create a transformation that doesn't ultimately flatten everything into MDAM style "single value" DNF predicates, which have to use simple B-Tree opclass operators -- obviously there are fundamental limits to it. So even in a perfect world, with every possible MDAM-ish transformation implemented, we'll still have a significant need for your patch. -- Peter Geoghegan
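As a hedged illustration of the MDAM "single value" idea described above (mdam_expand is a hypothetical helper invented for this sketch, not code from either patch), a discrete BETWEEN on the leading column of an (a, b) index composed with an IN list on the second column flattens into an ordered list of point probes, preserving index order with no extra sort:

```python
from itertools import product

# Toy MDAM-style expansion for a two-column index on (a, b):
#   WHERE a BETWEEN 2 AND 4 AND b IN (30, 10)
# becomes an ordered list of single-value (a, b) probes. Real MDAM works
# through B-Tree opclass operators on arbitrary types; small integers
# keep the sketch simple.

def mdam_expand(a_between, b_in):
    lo, hi = a_between
    a_values = range(lo, hi + 1)   # discrete range -> point values
    b_values = sorted(b_in)        # SAOP array, sorted into index order
    # The cartesian product, iterated in (a, b) order, preserves full
    # index order, so the scan's output needs no additional sort step.
    return list(product(a_values, b_values))

probes = mdam_expand((2, 4), {30, 10})
print(probes)
# Each probe is a "single value" predicate that the index AM can satisfy
# with simple b-tree opclass comparisons.
assert probes == sorted(probes)
```

The limits Peter mentions apply directly: the expansion only works when the range can be enumerated into point values via opclass operators, which is exactly why such transformations can never fully replace generic filter evaluation.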
On 8/5/23 02:53, Peter Geoghegan wrote: > ... > >> Right. This however begs a question - why would we actually need to >> check the visibility map before evaluating the index filter, when the >> index tuple alone is clearly good enough for the bitmapOr plan? >> >> Because if we didn't need to do that VM check, this issue with extra >> heap accesses would disappear. > > The index AM is entitled to make certain assumptions of opclass > members -- assumptions that cannot be made during expression > evaluation. The classic example is division-by-zero during evaluation > of a qual, for a tuple that wasn't visible anyway. Our assumption is > that stuff like that just cannot happen with index quals -- users > shouldn't ever encounter sanity check errors caused by > invisible-to-their-MVCC-snapshot tuples. > Thanks for reminding me, I keep forgetting about this. > I think that that's the main difficulty, as far as avoiding heap > access for index filters is concerned. Of course, you could just limit > yourself to those cases where the index AM assumptions were safe. But > at that point, why not simply make sure to generate true index quals, > and be done with it? > Yeah, if it's possible to generate true index quals, it'd be stupid not to do that. But clearly there are cases where that's not possible (we may not have the code doing that, or maybe it's just not possible in principle). Would it be possible to inspect the expression and determine if it's "safe" to be treated almost like an index qual? Say, given a complex expression, we might check if it consists only of expressions that could be treated as an index qual. So a bit like leak-proof, which may actually be relevant here too I guess (e.g. int4div is not leak-proof, for example, so e.g. the division-by-zero would not allow index-qual treatment). I now recall there probably was a past discussion about how leak-proof relates to this, but IIRC the conclusion was it's not quite the same thing. 
But maybe I just remember things wrong. Anyway, I think there are always going to be clauses that would be safe to evaluate on the index, but the index AM does not know how to handle them for some reason. For example it might require extending the AM to handle generic expressions, which doesn't seem quite desirable. So I think I see three points where we could evaluate expressions:

1) in the AM, as access predicates / index quals (doing this more often is kinda what the SAOP patches aim to do)

2) in the index scan, before checking VM / visibility (if the qual is safe to be evaluated early)

3) in the index scan, after checking VM / visibility (if the expression does unsafe things)

>> I copied this from the IOS somewhat blindly, but now I'm starting to >> think it was misguided. I thought it's a protection against processing >> "invalid" tuples - not tuples broken after a crash (as that would be >> fixed by crash recovery), but e.g. tuples with schema changes from an >> aborted transaction. > > I agree that schema changes for indexes shouldn't be an issue, though. > >> I'm not quite sure what the differences are between "index qual" vs. >> "access predicate index qual" vs. "index filter predicate index quals", >> or what "displacing" would mean exactly. > > For the purposes of this point about "a hierarchy of quals", it > doesn't really matter -- that was the point I was trying to make. > > In other words, "index quals" are strictly better than equivalent > "index filters", which are themselves strictly better than equivalent > "table filters". While it is also true that you can meaningfully > classify "index quals" into their own hierarchy (namely access > predicates vs index filter predicates), that doesn't necessarily need > to be discussed when discussing the hierarchy from a planner point of > view, since it is (at least for now) internal to the nbtree index AM. 
> > On second thought, I tend to doubt that your patch needs to worry > about each type of index qual directly. It probably needs to worry > about index quals in general. > I agree. That seems like a discussion relevant to the general topic of "upgrading" clauses. If anything, the patch may need to worry about moving table filters to index filters, that's the only thing it does. > It is always better to make what could be an "index filter" into an > index qual. Of course, the current practical problem for you is > figuring out how to deal with that in cases like the BitmapOr case. > Since it's not as if the planner is wrong, exactly...it really is the > cheapest plan, so the planner is at least right on its own terms. I am > interested in finding a solution to that problem. > Well, if the planner is not wrong, what solution are we looking for? ;-) FWIW if the problem is the patch may make the new plan look cheaper than some "actually better" plan (e.g. the BitmapOr one). In that case, we could just keep the old costing (kinda assuming the worst case, but as you said, the benefits are limited, while the risks are arbitrary). That's the only idea I have. >> FWIW this also reminds me that this whole discussion mostly focused on >> SAOP clauses (and how they may be translated into access predicates >> etc.). My patch is however way more general - it applies to all clauses, >> not just SAOP ones, including clauses with no chance of evaluating at >> the AM level. > > I agree that your patch is way more general, in one way. But the SAOP > patch is also much more general than you might think. > > For one thing the whole BitmapOr plan issue actually required that you > compared your patch to a combination of my SAOP patch and Alena > Rybakina's OR-to-SAOP transformation patch -- you needed both patches. > Her patch effectively made my own patch much more general. 
But there > are all kinds of other transformations that might further extend the > applicability of nbtree executor changes from my patch -- the MDAM > techniques are certainly not limited to ORs/SAOPs. For example, it's > easy to imagine inventing a new type of SAOP-style clause that > represented "BETWEEN x AND y" expressions, that would make range > predicates into "first class indexable operators" -- > ScalarRangeOprExpr, or something. With multi-column indexes, these > ScalarRangeOprExpr clauses could be composed beside ScalarArrayOpExpr > clauses, as well as simpler clauses -- all while preserving index > order on output. So there are quite a few plan shapes that aren't > possible at all right now, that become possible, many of which don't > even have SAOPs/ORs. > > Of course it won't ever be possible to create a transformation that > doesn't ultimately flatten everything into MDAM style "single value" > DNF predicates, which have to use simple B-Tree opclass operators -- > obviously there are fundamental limits to it. So even in a perfect > world, with every possible MDAM-ish transformation implemented, we'll > still have a significant need for your patch. > Yup, that's exactly my point. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Aug 6, 2023 at 10:23 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > The index AM is entitled to make certain assumptions of opclass > > members -- assumptions that cannot be made during expression > > evaluation. > Thanks for reminding me, I keep forgetting about this. I was almost certain that you already knew that, actually. It's easy to forget such details in a discussion like this one, where the focus zooms out and then zooms back in, again and again. > Would it be possible to inspect the expression and determine if it's > "safe" to be treated almost like an index qual? Say, given a complex > expression, we might check if it consists only of expressions that could > be treated as an index qual. So a bit like leak-proof, which may > actually be relevant here too, I guess (int4div is not leak-proof, > for example, so the division-by-zero case would not allow index-qual > treatment). Clearly you're talking about a distinct set of guarantees from the ones that B-Tree opclasses make about not throwing errors when scanning maybe-not-visible index tuples. The B-Tree opclass guarantees might not even be written down anywhere -- they seem like common sense, almost. What you've described definitely seems like it could be very useful, but I don't think that it solves the fundamental problem with cases like the BitmapOr plan. Even if you do things in a way that precludes the possibility of extra heap page accesses (when the VM bit isn't set), you still have the problem of "access predicates vs index filter predicates". Which is a big problem, in and of itself. > Anyway, I think there are always going to be clauses that would be safe > to evaluate on the index, but the index AM does not know how to handle them > for some reason. For example it might require extending the AM to handle > generic expressions, which doesn't seem quite desirable. Actually, I mostly don't think we'd need to teach nbtree or other index AMs anything about simplifying expressions.
Structurally, we should try to make things like ScalarArrayOpExpr into "just another indexable operator", which has little to no difference from any other indexable operator at runtime. There probably are several good reasons why "normalizing to CNF" in the planner is a good idea (to some extent we do this already). Alena Rybakina's OR-to-SAOP transformation patch was written well before anybody knew about the MDAM/SAOP patch I'm working on. The original motivation was to lower expression evaluation overhead. You could probably find a third or even a fourth reason to do that specific transformation, if you thought about it for a while. Top-down planner designs such as Cascades really have to spend a lot of time on this kind of normalization process. For very general reasons -- many of which are bound to apply in our own bottom-up planner design. > So I think I see three points where we could evaluate expressions: > > 1) in the AM, as access predicates / index quals (doing this more often > is kinda what the SAOP patches aim to do) > > 2) in the index scan, before checking VM / visibility (if the qual is > safe to be evaluated early) > > 3) in the index scan, after checking VM / visibility (if the expression > does unsafe things) Agreed. Presumably there would also be a class of expressions that the patch should make into index filters rather than table filters, despite being unable to determine whether they're safe to evaluate early. Even if there is only a small chance of it helping at runtime, there is no reason (or only an infinitesimally small reason) not to just prefer index filters where possible -- so it's desirable to always prefer index filters, regardless of opclass/type restrictions on early evaluation. Right? Assuming the answer is yes, then I think that you still need all of the index-only scan stuff that can "fallback to heap access", just to be able to cover this other class of expression.
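For readers following along, the OR-to-SAOP transformation being referenced rewrites a disjunction over one column into a single ScalarArrayOpExpr clause. A minimal sketch using the tenk1 query discussed in this thread (the equivalence itself is standard; whether the planner performs the rewrite depends on the patches under discussion):

```sql
-- Original OR spelling:
SELECT * FROM tenk1
WHERE thousand = 42
  AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

-- Equivalent SAOP spelling, which nbtree can execute as one indexable clause:
SELECT * FROM tenk1
WHERE thousand = 42
  AND tenthous = ANY (ARRAY[1, 3, 42]);
```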
I don't think that this class that I've described will be rarely encountered, or anything. > I agree. That seems like a discussion relevant to the general topic of > "upgrading" clauses. If anything, the patch may need to worry about > moving table filters to index filters, that's the only thing it does. Obviously that will have indirect consequences due to the changes in the costing. > > It is always better to make what could be an "index filter" into an > > index qual. Of course, the current practical problem for you is > > figuring out how to deal with that in cases like the BitmapOr case. > > Since it's not as if the planner is wrong, exactly...it really is the > > cheapest plan, so the planner is at least right on its own terms. I am > > interested in finding a solution to that problem. > > > > Well, if the planner is not wrong, what solution are we looking for? ;-) I imagine that you really don't want to have to rely on some wishy-washy philosophical argument about the planner's expectation being the only reasonable basis for choosing a plan. Just as I don't want to have to rely on some similarly hand-wavy argument about risk. The ideal outcome is one that doesn't require any of that, from either of us. I believe that the patch that I'm working on can allow us to totally avoid it. I hesitate to say this, since it might sound like I'm imposing conditions in a self-interested way. AFAICT it really does provide us with a practical way of just not having to go down the road that nobody wants to go down. So I am just about ready to say that I believe that that will end up being the solution we use. It just seems to make sense. By normalizing to CNF, the planner is given the ability to work with higher-level index paths, that abstract-away inessential "physical plan" differences. Suppose, for example, we're building index paths for a scan that comes from an SAOP that was generated through OR transformation. 
But, it's an index AM that lacks native support for SAOPs -- not nbtree. That index path will still end up using a BitmapOr, in the end, since it'll ultimately have to compensate for the lack of index AM infrastructure. So the final "physical plan" will be exactly the same as today -- the OR transformation will actually have changed nothing about the physical plan in these sorts of cases. The CNF transformation process just puts what was already true ("these two plans are logically equivalent") on a more formal footing -- and so implicitly avoids "risky plans" like the one we've discussed. We'll only be relying on the nbtree work to get those small efficiencies that come from avoiding duplicative primitive index scans. Since that was never actually a goal of your patch to begin with, limiting that benefit to nbtree scans (where we can do it without novel risks) seems more than acceptable. Since you're not relying on the nbtree work at all here, really (just on the transformation process itself), the strategic risk that this adds to your project isn't too great. It's not like this ties the success of your patch to the success of my own patch. At most it ties the success of your patch to something like Alena Rybakina's OR-to-SAOP transformation patch, which seems manageable to me. (To be clear, I'm not relying on that work in the same way myself -- for my patch the transformation process is just a nice-to-have.) > FWIW if the problem is the patch may make the new plan look cheaper than > some "actually better" plan (e.g. the BitmapOr one). In that case, we > could just keep the old costing (kinda assuming the worst case, but as > you said, the benefits are limited, while the risks are arbitrary). > That's the only idea I have. That's almost ideal for my patch. I should be able to pursue my patch without concern about your patch interfering too much -- presumably my patch will be able to generate the cheapest plan in cases like the BitmapOr case. 
It'll at least be slightly cheaper in physical reality, and now you'll artificially penalize the paths that your patch generates -- so cases like the BitmapOr case should do the right thing by seeing the SAOP path as cheaper during planning. But is that approach really ideal for your patch? I doubt it. Why wouldn't we want to give the index paths with index filters credit for being cheaper in almost all cases? That's why this doesn't feel like a costing issue to me (it's more of a "just don't do that" issue, I think). Your patch seems too important to nerf like this, even if it's convenient in some ways. -- Peter Geoghegan
On Sun, Aug 6, 2023 at 1:13 PM Peter Geoghegan <pg@bowt.ie> wrote: > Since you're not relying on the nbtree work at all here, really (just > on the transformation process itself), the strategic risk that this > adds to your project isn't too great. It's not like this ties the > success of your patch to the success of my own patch. At most it ties > the success of your patch to something like Alena Rybakina's > OR-to-SAOP transformation patch, which seems manageable to me. (To be > clear, I'm not relying on that work in the same way myself -- for my > patch the transformation process is just a nice-to-have.) I decided to verify my understanding by checking what would happen when I ran the OR-heavy tenk1 regression test query against a combination of your patch and v7 of the OR-to-SAOP transformation patch. (To be clear, this is without my patch.) I found that the problem that I saw with the OR-heavy tenk1 regression test goes away (though only when I "set or_transform_limit=0"). That is, we'll get an index scan plan that uses a SAOP. This index scan plan is comparable to the master branch's BitmapOr scan. In particular, both plans get 7 buffer hits. More importantly, the new plan is (like the master branch plan) not risky in the way I've been going on about. This does mean that your patch gets a *slightly* slower plan, due to the issue of added index page accesses. Improving that should be a job for my patch -- it's not your problem, since there is no regression. I'm not sure if it's somehow still possible that SAOP expression evaluation is able to "do the risky thing" in the same way that your patch's "Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))" plan does. But it certainly looks like it can't. Increasingly, the problem here appears to me to be a problem of lacking useful CNF transformations/normalization -- nothing more.
Structuring things so that we reliably use "the native representation of ORs" via normalization seems likely to be all you really need. We may currently be over-relying on a similar process that happens indirectly, via BitmapOr paths. I suspect that you're right to be concerned about how this might already be affecting index-only scans. Once we have CNF normalization/transformation in place, we will of course continue to use some BitmapOr plans that may look a little like the ones I'm focussed on, and some plans that use "index filters" (due to your patch) that are also a little like that. But there's nothing objectionable about those cases IMV (quite the opposite), since there is no question of displacing/out-competing very similar plans that can use index quals. (You might also find a way to avoid ever requiring heap access/visibility checks for a subset of the "index filter" cases where it is determined to be safe up front, but that's just a bonus.) -- Peter Geoghegan
On Sun, Aug 6, 2023 at 3:28 PM Peter Geoghegan <pg@bowt.ie> wrote: > I decided to verify my understanding by checking what would happen > when I ran the OR-heavy tenk1 regression test query against a > combination of your patch, and v7 of the OR-to-SAOP transformation > patch. (To be clear, this is without my patch.) I also spotted what looks like it might be a problem with your patch when looking at this query (hard to be sure if it's truly a bug, though). I manually SAOP-ify the OR-heavy tenk1 regression test query like so: select * from tenk1 where thousand = 42 and tenthous in (1, 3, 42); Sure enough, I continue to get 7 buffer hits with this query. Just like with the BitmapOr plan (and exactly like the original query with the OR-to-SAOP transformation patch in place). As I continue to add SAOP constants to the original "tenthous" IN(), eventually the planner switches over to not using index quals on the "tenthous" low order index column (they're only used on the high order "thousand" index column). Here's where the switch to only using the leading column from the index happens for me: select * from tenk1 where thousand = 42 and tenthous in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50); This plan switchover isn't surprising in itself -- it's one of the most important issues addressed by my SAOP patch. However, it *is* a little surprising that your patch doesn't even manage to use "Index Filter" quals. It appears that it is only capable of using table filter quals. Obviously, the index has all the information that expression evaluation needs, and yet I see "Filter: (tenk1.tenthous = ANY ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))". So no improvement over master here. Interestingly enough, your patch only has this problem with SAOPs, at least that I know of -- the spelling/style matters. 
If I add many additional "tenthous" constants to the original version of the query from the regression tests in the same way, but using the "longform" (tenthous = 1 or tenthous = 3 ...) spelling, then your patch does indeed use index filters/expression evaluation. Just like the original "risky" plan (it's just a much bigger expression, with many more ORs). -- Peter Geoghegan
On 8/7/23 02:38, Peter Geoghegan wrote: > On Sun, Aug 6, 2023 at 3:28 PM Peter Geoghegan <pg@bowt.ie> wrote: >> I decided to verify my understanding by checking what would happen >> when I ran the OR-heavy tenk1 regression test query against a >> combination of your patch, and v7 of the OR-to-SAOP transformation >> patch. (To be clear, this is without my patch.) > > I also spotted what looks like it might be a problem with your patch > when looking at this query (hard to be sure if it's truly a bug, > though). > > I manually SAOP-ify the OR-heavy tenk1 regression test query like so: > > select > * > from > tenk1 > where > thousand = 42 > and tenthous in (1, 3, 42); > > Sure enough, I continue to get 7 buffer hits with this query. Just > like with the BitmapOr plan (and exactly like the original query with > the OR-to-SAOP transformation patch in place). > > As I continue to add SAOP constants to the original "tenthous" IN(), > eventually the planner switches over to not using index quals on the > "tenthous" low order index column (they're only used on the high order > "thousand" index column). Here's where the switch to only using the > leading column from the index happens for me: > > select > * > from > tenk1 > where > thousand = 42 > and > tenthous in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50); > > This plan switchover isn't surprising in itself -- it's one of the most > important issues addressed by my SAOP patch. However, it *is* a little > surprising that your patch doesn't even manage to use "Index Filter" > quals. It appears that it is only capable of using table filter quals. > Obviously, the index has all the information that expression > evaluation needs, and yet I see "Filter: (tenk1.tenthous = ANY > ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))". So no improvement > over master here. > > Interestingly enough, your patch only has this problem with SAOPs, at > least that I know of -- the spelling/style matters. 
If I add many > additional "tenthous" constants to the original version of the query > from the regression tests in the same way, but using the "longform" > (tenthous = 1 or tenthous = 3 ...) spelling, then your patch does > indeed use index filters/expression evaluation. Just like the original > "risky" plan (it's just a much bigger expression, with many more ORs). > Right. This happens because the matching of SAOP to indexes happens in multiple places. Firstly, create_index_paths() matches the clauses to the index by calling match_restriction_clauses_to_index -> match_clauses_to_index -> match_clause_to_index, which is where we also decide which *unmatched* clauses can be filters. And this *does* match the SAOP to the index key, hence no index filter. But then we call get_index_paths/build_index_path a little bit later, and that decides to skip "lower SAOP" (which seems a bit strange, because the column is "after" the equality, but meh). Anyway, at this point we already decided what's a filter, ignoring the index clauses, and not expecting any backsies. The simplest fix seems to be to add these skipped SAOP clauses as filters. We know they can be evaluated on the index ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Aug 7, 2023 at 12:34 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > But then we call get_index_paths/build_index_path a little bit later, > and that decides to skip "lower SAOP" (which seems a bit strange, > because the column is "after" the equality, but meh). Anyway, at this > point we already decided what's a filter, ignoring the index clauses, > and not expecting any backsies. I'm not surprised that it's due to the issue around "lower SAOP" clauses within get_index_paths/build_index_path. That whole approach seems rather ad-hoc to me. As you probably realize already, my own patch has to deal with lots of issues in the same area. > The simplest fix seems to be to add these skipped SAOP clauses as > filters. We know they can be evaluated on the index ... Right. Obviously, my preferred solution to the problem at hand is to make everything into index quals via an approach like the one from my patch -- that works sensibly, no matter the length of the SAOP arrays. But even if you're willing to assume that that work will be in place for 17, there are still certain remaining gaps, that also seem important. Even my patch cannot always make SAOP clauses into index quals. There are specific remaining gaps that I hope that your patch will still cover. The simplest example is a similar NOT IN() inequality, like this: select ctid, * from tenk1 where thousand = 42 and tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50); There is no way that my patch can handle this case. Where your patch seems to be unable to do better than master here, either -- just like with the "tenthous in ( )" variant. Once again, the inequality SAOP also ends up as table filter quals, not index filter quals. It would also be nice if we found a way of doing this, while still reliably avoiding all visibility checks (just like "real index quals" will) -- since that should be safe in this specific case.
The MDAM paper describes a scheme for converting NOT IN() clauses into DNF single value predicates. But that's not going to happen for 17, and doesn't seem all that helpful with a query like this in any case. But it does suggest an argument in favor of visibility checks not being truly required for SAOP inequalities like this one (when they appear in index filters). I'm not sure if that idea is too particular to SAOP inequalities to be interesting -- just a suggestion. -- Peter Geoghegan
On Mon, Aug 7, 2023 at 3:18 PM Peter Geoghegan <pg@bowt.ie> wrote: > Even my patch cannot always make SAOP clauses into index quals. There > are specific remaining gaps that I hope that your patch will still > cover. The simplest example is a similar NOT IN() inequality, like > this: > > select > ctid, * > from > tenk1 > where > thousand = 42 > and > tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50); > > There is no way that my patch can handle this case. Where your patch > seems to be unable to do better than master here, either -- just like > with the "tenthous in ( )" variant. Once again, the inequality SAOP > also ends up as table filter quals, not index filter quals. > > It would also be nice if we found a way of doing this, while still > reliably avoiding all visibility checks (just like "real index quals" > will) -- since that should be safe in this specific case. Actually, this isn't limited to SAOP inequalities. It appears as if *any* simple inequality has the same limitation. So, for example, the following query can only use table filters with the patch (never index filters): select ctid, * from tenk1 where thousand = 42 and tenthous != 1; This variant will use index filters, as expected (though with some risk of heap accesses when VM bits aren't set): select ctid, * from tenk1 where thousand = 42 and tenthous is distinct from 1; Offhand I suspect that it's a similar issue to the one you described for SAOPs. I see that get_op_btree_interpretation() will treat != as a kind of honorary member of an opfamily whose = operator has our != operator as its negator. Perhaps we should be finding a way to pass != quals into the index AM so that they become true index quals (obviously they would only be index filter predicates, never access predicates). That has the advantage of working in a way that's analogous to the way that index quals already avoid visibility checks. -- Peter Geoghegan
On 8/8/23 00:18, Peter Geoghegan wrote: > On Mon, Aug 7, 2023 at 12:34 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> But then we call get_index_paths/build_index_path a little bit later, >> and that decides to skip "lower SAOP" (which seems a bit strange, >> because the column is "after" the equality, but meh). Anyway, at this >> point we already decided what's a filter, ignoring the index clauses, >> and not expecting any backsies. > > I'm not surprised that it's due to the issue around "lower SAOP" > clauses within get_index_paths/build_index_path. That whole approach > seems rather ad-hoc to me. As you probably realize already, my own > patch has to deal with lots of issues in the same area. > Yeah. It'd be much easier if the decision were made in one place, without then changing it later. >> The simplest fix seems to be to add these skipped SAOP clauses as >> filters. We know they can be evaluated on the index ... > > Right. Obviously, my preferred solution to the problem at hand is to > make everything into index quals via an approach like the one from my > patch -- that works sensibly, no matter the length of the SAOP arrays. > But even if you're willing to assume that that work will be in place > for 17, there are still certain remaining gaps, that also seem > important. > Agreed. > Even my patch cannot always make SAOP clauses into index quals. There > are specific remaining gaps that I hope that your patch will still > cover. The simplest example is a similar NOT IN() inequality, like > this: > > select > ctid, * > from > tenk1 > where > thousand = 42 > and > tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50); > > There is no way that my patch can handle this case. Where your patch > seems to be unable to do better than master here, either -- just like > with the "tenthous in ( )" variant. Once again, the inequality SAOP > also ends up as table filter quals, not index filter quals. > Are you sure?
Because if I try with the 20230716 patch, I get this plan (after disabling bitmapscan):

                              QUERY PLAN
-------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1  (cost=0.31..44.54 rows=10 width=250)
   Index Cond: (thousand = 42)
   Index Filter: (tenthous <> ALL ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))
   Filter: (tenthous <> ALL ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))
(4 rows)

So the condition is recognized as an index filter. Or did you mean something different? > It would also be nice if we found a way of doing this, while still > reliably avoiding all visibility checks (just like "real index quals" > will) -- since that should be safe in this specific case. > > The MDAM paper describes a scheme for converting NOT IN() clauses into > DNF single value predicates. But that's not going to happen for 17, > and doesn't seem all that helpful with a query like this in any case. > But it does suggest an argument in favor of visibility checks not > being truly required for SAOP inequalities like this one (when they > appear in index filters). I'm not sure if that idea is too particular > to SAOP inequalities to be interesting -- just a suggestion. > Not sure. A couple messages back I suggested that maybe there is a way to check which expressions would be safe to evaluate before checking visibility. This seems similar, although what you're suggesting really applies to the "transformed" SAOP, and I'm not sure it can be extended to the original SAOP. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8/8/23 06:21, Peter Geoghegan wrote: > On Mon, Aug 7, 2023 at 3:18 PM Peter Geoghegan <pg@bowt.ie> wrote: >> Even my patch cannot always make SAOP clauses into index quals. There >> are specific remaining gaps that I hope that your patch will still >> cover. The simplest example is a similar NOT IN() inequality, like >> this: >> >> select >> ctid, * >> from >> tenk1 >> where >> thousand = 42 >> and >> tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50); >> >> There is no way that my patch can handle this case. Where your patch >> seems to be unable to do better than master here, either -- just like >> with the "tenthous in ( )" variant. Once again, the inequality SAOP >> also ends up as table filter quals, not index filter quals. >> >> It would also be nice if we found a way of doing this, while still >> reliably avoiding all visibility checks (just like "real index quals" >> will) -- since that should be safe in this specific case. > > Actually, this isn't limited to SAOP inequalities. It appears as if > *any* simple inequality has the same limitation. So, for example, the > following query can only use table filters with the patch (never index > filters): > > select > ctid, * > from > tenk1 > where > thousand = 42 and tenthous != 1; > > This variant will use index filters, as expected (though with some > risk of heap accesses when VM bits aren't set): > > select > ctid, * > from > tenk1 > where > thousand = 42 and tenthous is distinct from 1; > > Offhand I suspect that it's a similar issue to the one you described for SAOPs. > > I see that get_op_btree_interpretation() will treat != as a kind of > honorary member of an opfamily whose = operator has our != operator as > its negator. Perhaps we should be finding a way to pass != quals into > the index AM so that they become true index quals (obviously they > would only be index filter predicates, never access predicates). 
That > has the advantage of working in a way that's analogous to the way that > index quals already avoid visibility checks. > Are you sure you're using the right build? Because I get this plan:

                              QUERY PLAN
-------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1  (cost=0.29..44.48 rows=10 width=250)
   Index Cond: (thousand = 42)
   Index Filter: (tenthous <> 1)
   Filter: (tenthous <> 1)
(4 rows)

Again, the inequality is clearly recognized as an index filter. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 8, 2023 at 7:28 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Are you sure? Because if I try with the 20230716 patch, I get this plan > (after disabling bitmapscan): > So the condition is recognized as index filter. Or did you mean > something different? No, I was just wrong about this (and about inequalities in general). I now see why the planner preferred a bitmap scan, which makes sense. Apologies. > Not sure. A couple messages back I suggested that maybe there is a way > to check which expression would be safe to evaluate before checking the > visibility. This seems similar, although what you're suggesting really > applies to the "transformed" SAOP, and I'm not sure it can be extended > to the original SAOP. The transformation doesn't necessarily have to happen in order for it to be possible in principle (and correct). My point was that there are a handful of important types of expressions (SAOPs among them, but possibly also RowCompareExpr and IS NULL tests) that are "index quals in spirit". These expressions therefore don't seem to need visibility checks at all -- the index qual guarantees "apply transitively". It's possible that an approach that focuses on leakproof quals won't have any problems, and will be strictly better than "extending the index qual guarantees to index-qual-like expressions". Really not sure about that. In any case I see a huge amount of value in differentiating between cases that need visibility checks (if only via the VM) and those that do not, ever. I'm speaking very generally here -- nothing to do with my adversarial tenk1 test case. It's weird that index quals have such a massive advantage over even simple index filters -- that feels artificial. I suggest that you focus on that aspect, since it has the potential to make what is already a compelling patch into a much more compelling patch. -- Peter Geoghegan
On Mon, Aug 7, 2023 at 9:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I see that get_op_btree_interpretation() will treat != as a kind of
> honorary member of an opfamily whose = operator has our != operator as
> its negator. Perhaps we should be finding a way to pass != quals into
> the index AM so that they become true index quals (obviously they
> would only be index filter predicates, never access predicates). That
> has the advantage of working in a way that's analogous to the way that
> index quals already avoid visibility checks.

The approach in your patch can only really work with index scans (and index-only scans). So while it is more general than true index quals in some ways, it's also less general in other ways: it cannot help bitmap index scans.

While I accept that the inability of bitmap index scans to use index filters in this way is, to some degree, a natural and inevitable downside of bitmap index scans, that isn't always true. For example, it doesn't seem to be the case with simple inequalities. Bitmap index scans argue for making cases involving quals that are "index quals in spirit" into actual index quals. Even if you can reliably avoid extra heap accesses for plain index scans using expression evaluation, I can't see that working for bitmap index scans.

More concretely, if we have an index on "tenk1 (four, two)", then we miss out on the opportunity to eliminate heap accesses for a query like this one:

select
  ctid, *
from
  tenk1
where
  four = 1 and two != 1;

This will get a bitmap index scan plan (that uses our composite index), which makes sense overall. But the details beyond that make no sense -- since we're using table filter quals for "two". It turns out that the bitmap heap scan will access every single heap page in the tenk1 table as a result, even though we could have literally avoided all heap accesses had we been able to push down the != as an index qual. This is a difference in "buffers hit" that is close to 2 orders of magnitude.
I'd be okay with treating these cases as out of scope for this patch, but we should probably agree on the parameters. The patch certainly shouldn't make it any harder to fix cases such as this. -- Peter Geoghegan
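The comparison described above can be sketched as follows (assuming the standard regression-test tenk1 table; the index name is made up for the example, and the index-filter behavior on the plain index scan of course requires the patch under discussion):

   -- An index matching the example's "tenk1 (four, two)":
   CREATE INDEX tenk1_four_two ON tenk1 (four, two);

   -- Default plan: a bitmap index scan, where "two != 1" can only become
   -- a table filter, so the bitmap heap scan visits heap pages for every
   -- "four = 1" match:
   EXPLAIN (ANALYZE, BUFFERS)
   SELECT ctid, * FROM tenk1 WHERE four = 1 AND two != 1;

   -- Force the plain index scan to compare the "buffers hit" numbers:
   SET enable_bitmapscan = off;
   EXPLAIN (ANALYZE, BUFFERS)
   SELECT ctid, * FROM tenk1 WHERE four = 1 AND two != 1;
   RESET enable_bitmapscan;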
On 8/8/23 19:43, Peter Geoghegan wrote: > On Mon, Aug 7, 2023 at 9:21 PM Peter Geoghegan <pg@bowt.ie> wrote: >> I see that get_op_btree_interpretation() will treat != as a kind of >> honorary member of an opfamily whose = operator has our != operator as >> its negator. Perhaps we should be finding a way to pass != quals into >> the index AM so that they become true index quals (obviously they >> would only be index filter predicates, never access predicates). That >> has the advantage of working in a way that's analogous to the way that >> index quals already avoid visibility checks. > > The approach in your patch can only really work with index scans (and > index-only scans). So while it is more general than true index quals > in some ways, it's also less general in other ways: it cannot help > bitmap index scans. > > While I accept that the inability of bitmap index scans to use index > filters in this way is, to some degree, a natural and inevitable > downside of bitmap index scans, that isn't always true. For example, > it doesn't seem to be the case with simple inequalities. Bitmap index > scans argue for making cases involving quals that are "index quals in > spirit" into actual index quals. Even if you can reliably avoid extra > heap accesses for plain index scans using expression evaluation, I > can't see that working for bitmap index scans. > > More concretely, if we have an index on "tenk1 (four, two)", then we > miss out on the opportunity to eliminate heap accesses for a query > like this one: > > select > ctid, * > from > tenk1 > where > four = 1 and two != 1; > > This will get a bitmap index scan plan (that uses our composite > index), which makes sense overall. But the details beyond that make no > sense -- since we're using table filter quals for "two". 
> It turns out > that the bitmap heap scan will access every single heap page in the > tenk1 table as a result, even though we could have literally avoided > all heap accesses had we been able to push down the != as an index > qual. This is a difference in "buffers hit" that is close to 2 orders > of magnitude. > > I'd be okay with treating these cases as out of scope for this patch, > but we should probably agree on the parameters. The patch certainly > shouldn't make it any harder to fix cases such as this. > I agree this patch shouldn't make it harder to improve these cases, but TBH I don't quite see which part of the patch would do that? Which bit are you objecting to? If we decide to change how match_clause_to_index() deals with these cases, to recognize them as index quals, the patch will be working just fine. The only thing the patch does is look at clauses we decided not to treat as index quals, and maybe still evaluate them on the index. And I don't think I want to move these goalposts much further. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 8, 2023 at 11:14 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I agree this patch shouldn't make it harder to improve these cases, but > TBH I don't quite see which part of the patch would do that? Which bit > are you objecting to? If we decide to change how match_clause_to_index() > deals with these cases, to recognize them as index quals, the patch will > be working just fine. Well, I also recently said that I think that it's important that you figure out a way to reliably avoid visibility checks, in cases where it can be avoided entirely -- since that can lead to huge savings in heap accesses. You haven't done that yet, but you really should look into it IMV. Assuming that that happens, then it immediately gives index scans a huge advantage over bitmap index scans. At that point it seems important to describe (in high level terms) where it is that the advantage is innate, and where it's just because we haven't done the required work for bitmap index scans. I became confused on this point myself yesterday. Admittedly I should have been able to figure it out on my own -- but it is confusing. > The only thing the patch does is it looks at clauses we decided not to > treat as index quals, and do maybe still evaluate them on index. And I > don't think I want to move these goalposts much further. Avoiding the need for visibility checks completely (in at least a subset of cases) was originally your idea. I'm not going to insist on it, or anything like that. It just seems like something that'll add a great deal of value over what you have already. -- Peter Geoghegan
On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote: > Assuming that that happens, then it immediately gives index scans a > huge advantage over bitmap index scans. At that point it seems > important to describe (in high level terms) where it is that the > advantage is innate, and where it's just because we haven't done the > required work for bitmap index scans. I became confused on this point > myself yesterday. Admittedly I should have been able to figure it out > on my own -- but it is confusing. I also have some doubts about the costing. That contributed to my confusion. Take my " four = 1 and two != 1" example query, from earlier today. As I said, that gets a bitmap index scan, which does a hugely excessive amount of heap access. But once I force the planner to use an index scan, then (as predicted) there are useful index filters -- filters that can eliminate 100% of all heap accesses. Yet the planner still thinks that the total cost of the bitmap scan plan is only 415.28, versus 714.89 for the index scan plan. Perhaps that's just because this is a tricky case, for whatever reason...but it's not obvious what that reason really is. You keep pointing out that your patch only makes isolated, local changes to certain specific plans. While that is true, it's also true that there will be fairly far removed consequences. Why shouldn't I treat those things as in scope? -- Peter Geoghegan
On 8/8/23 20:36, Peter Geoghegan wrote: > On Tue, Aug 8, 2023 at 11:14 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> I agree this patch shouldn't make it harder to improve these cases, but >> TBH I don't quite see which part of the patch would do that? Which bit >> are you objecting to? If we decide to change how match_clause_to_index() >> deals with these cases, to recognize them as index quals, the patch will >> be working just fine. > > Well, I also recently said that I think that it's important that you > figure out a way to reliably avoid visibility checks, in cases where > it can be avoided entirely -- since that can lead to huge savings in > heap accesses. You haven't done that yet, but you really should look > into it IMV. > > Assuming that that happens, then it immediately gives index scans a > huge advantage over bitmap index scans. At that point it seems > important to describe (in high level terms) where it is that the > advantage is innate, and where it's just because we haven't done the > required work for bitmap index scans. I became confused on this point > myself yesterday. Admittedly I should have been able to figure it out > on my own -- but it is confusing. > Yeah, I agree that might help a lot, particularly for tables that have a significant fraction of not-all-visible pages. >> The only thing the patch does is it looks at clauses we decided not to >> treat as index quals, and do maybe still evaluate them on index. And I >> don't think I want to move these goalposts much further. > > Avoiding the need for visibility checks completely (in at least a > subset of cases) was originally your idea. I'm not going to insist on > it, or anything like that. It just seems like something that'll add a > great deal of value over what you have already. > Right, and I'm not against improving that, but I see it more like an independent task. 
I don't think it needs to (or should) be part of this patch - skipping visibility checks would apply to IOS, while this is aimed only at plain index scans. Furthermore, I don't have a very good idea how to do that (except maybe for relying on the leakproof flag). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
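For reference, the leakproof flag mentioned above lives in the system catalogs; a quick sketch of how to inspect it for the operators involved (integer equality and inequality are marked leakproof, while integer division is not, since it can raise a division-by-zero error that depends on its input):

   SELECT o.oprname, p.proname, p.proleakproof
   FROM pg_operator o
   JOIN pg_proc p ON p.oid = o.oprcode
   WHERE o.oprleft = 'integer'::regtype
     AND o.oprright = 'integer'::regtype
     AND o.oprname IN ('=', '<>', '/');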
On 8/8/23 21:15, Peter Geoghegan wrote: > On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote: >> Assuming that that happens, then it immediately gives index scans a >> huge advantage over bitmap index scans. At that point it seems >> important to describe (in high level terms) where it is that the >> advantage is innate, and where it's just because we haven't done the >> required work for bitmap index scans. I became confused on this point >> myself yesterday. Admittedly I should have been able to figure it out >> on my own -- but it is confusing. > > I also have some doubts about the costing. That contributed to my confusion. > > Take my " four = 1 and two != 1" example query, from earlier today. As > I said, that gets a bitmap index scan, which does a hugely excessive > amount of heap access. But once I force the planner to use an index > scan, then (as predicted) there are useful index filters -- filters > that can eliminate 100% of all heap accesses. Yet the planner still > thinks that the total cost of the bitmap scan plan is only 415.28, > versus 714.89 for the index scan plan. Perhaps that's just because > this is a tricky case, for whatever reason...but it's not obvious what > that reason really is. > Right. I haven't checked how the costs are calculated in these cases, but I'd bet it's a combination of having correlated conditions, and the bitmap costing being fairly rough (with plenty of constants etc). The correlation seems like an obvious culprit, considering the explain says:

  Bitmap Heap Scan on public.tenk1  (cost=31.35..413.85 rows=1250 width=250) (actual time=2.698..2.703 rows=0 loops=1)

So we expect 1250 rows. If that was accurate, the index scan would have to do 1250 heap fetches. It's just luck the index scan doesn't need to do that. I don't think there's a chance to improve this costing - if the inputs are this off, it can't do anything.
Also, I think this is related to the earlier discussion about maybe costing it according to the worst case - i.e. as if we still needed to fetch the same number of heap tuples as before, which will inevitably lead to similar issues, with worse plans looking cheaper. > You keep pointing out that your patch only makes isolated, local > changes to certain specific plans. While that is true, it's also true > that there will be fairly far removed consequences. Why shouldn't I > treat those things as in scope? > That is certainly true - I'm trying to keep the scope somewhat close to the original goal. Obviously, there may be additional things the patch really needs to consider, but I'm not sure this is one of those cases (perhaps I just don't understand what the issue is - the example seems like a run-of-the-mill case of poor estimate / costing). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 8, 2023 at 1:24 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Assuming that that happens, then it immediately gives index scans a > > huge advantage over bitmap index scans. At that point it seems > > important to describe (in high level terms) where it is that the > > advantage is innate, and where it's just because we haven't done the > > required work for bitmap index scans. I became confused on this point > > myself yesterday. Admittedly I should have been able to figure it out > > on my own -- but it is confusing. > > > > Yeah, I agree that might help a lot, particularly for tables that have a > significant fraction of not-all-visible pages. It also has the potential to make the costing a lot easier in certain important cases. Accurately deriving just how many heap accesses can be avoided via the VM from the statistics that are available to the planner is likely always going to be very difficult. Finding a way to make that just not matter at all (in these important cases) can also make it safe to bias the costing, such that the planner tends to favor index scans (and index-only scans) over bitmap index scans that cannot possibly eliminate any heap page accesses via an index filter qual. > Right, and I'm not against improving that, but I see it more like an > independent task. I don't think it needs (or should) to be part of this > patch - skipping visibility checks would apply to IOS, while this is > aimed only at plain index scans. I'm certainly not going to insist on it. Worth considering if putting it in scope could make certain aspects of this patch (like the costing) easier, though. I think that it wouldn't be terribly difficult to make simple inequalities into true index quals. I think I'd like to have a go at it myself. To some degree I'm trying to get a sense of how much that'd help you. -- Peter Geoghegan
On Tue, Aug 8, 2023 at 1:49 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > So we expect 1250 rows. If that was accurate, the index scan would have > to do 1250 heap fetches. It's just luck the index scan doesn't need to > do that. I don't this there's a chance to improve this costing - if the > inputs are this off, it can't do anything. Well, that depends. If we can find a way to make the bitmap index scan capable of doing something like the same trick through other means, in some other patch, then this particular problem (involving a simple inequality) just goes away. There may be other cases that look a little similar, with a more complicated expression, where it just isn't reasonable to expect a bitmap index scan to compete. Ideally, bitmap index scans will only be at a huge disadvantage when it just makes sense, due to the particulars of the expression. I'm not trying to make this your problem. I'm just trying to establish the general nature of the problem. > Also, I think this is related to the earlier discussion about maybe > costing it according to the worst case - i.e. as if we still needed > fetch the same number of heap tuples as before. Which will inevitably > lead to similar issues, with worse plans looking cheaper. Not in those cases where it just doesn't come up, because we can totally avoid visibility checks. As I said, securing that guarantee has the potential to make the costing a lot more reliable/easier to implement. > That is certainly true - I'm trying to keep the scope somewhat close to > the original goal. Obviously, there may be additional things the patch > really needs to consider, but I'm not sure this is one of those cases > (perhaps I just don't understand what the issue is - the example seems > like a run-of-the-mill case of poor estimate / costing). I'm not trying to impose any particular interpretation here. It's early in the cycle, and my questions are mostly exploratory. 
I'm still trying to develop my own understanding of the trade-offs in this area. -- Peter Geoghegan
On 8/8/23 23:03, Peter Geoghegan wrote: > On Tue, Aug 8, 2023 at 1:49 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> So we expect 1250 rows. If that was accurate, the index scan would have >> to do 1250 heap fetches. It's just luck the index scan doesn't need to >> do that. I don't this there's a chance to improve this costing - if the >> inputs are this off, it can't do anything. > > Well, that depends. If we can find a way to make the bitmap index scan > capable of doing something like the same trick through other means, in > some other patch, then this particular problem (involving a simple > inequality) just goes away. There may be other cases that look a > little similar, with a more complicated expression, where it just > isn't reasonable to expect a bitmap index scan to compete. Ideally, > bitmap index scans will only be at a huge disadvantage when it just > makes sense, due to the particulars of the expression. > > I'm not trying to make this your problem. I'm just trying to establish > the general nature of the problem. > >> Also, I think this is related to the earlier discussion about maybe >> costing it according to the worst case - i.e. as if we still needed >> fetch the same number of heap tuples as before. Which will inevitably >> lead to similar issues, with worse plans looking cheaper. > > Not in those cases where it just doesn't come up, because we can > totally avoid visibility checks. As I said, securing that guarantee > has the potential to make the costing a lot more reliable/easier to > implement. > But in the example you shared yesterday, the problem is not really about visibility checks. In fact, the index scan costing completely ignores the VM checks - it didn't matter before, and the patch did not change this. It's about the number of rows the index scan is expected to produce - and those will always do a random I/O, we can't skip those. 
>> That is certainly true - I'm trying to keep the scope somewhat close to >> the original goal. Obviously, there may be additional things the patch >> really needs to consider, but I'm not sure this is one of those cases >> (perhaps I just don't understand what the issue is - the example seems >> like a run-of-the-mill case of poor estimate / costing). > > I'm not trying to impose any particular interpretation here. It's > early in the cycle, and my questions are mostly exploratory. I'm still > trying to develop my own understanding of the trade-offs in this area. > Understood. I think this whole discussion is about figuring out these trade-offs and also how to divide the various improvements into "minimum viable" changes. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 9, 2023 at 9:05 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > But in the example you shared yesterday, the problem is not really about > visibility checks. In fact, the index scan costing completely ignores > the VM checks - it didn't matter before, and the patch did not change > this. It's about the number of rows the index scan is expected to > produce - and those will always do a random I/O, we can't skip those. I wasn't really talking about that example here. As I see it, the problem from my example is that plain index scans had an "unnatural" advantage over bitmap index scans. There was no actual reason why the system couldn't just deal with the inequality on the second column uniformly, so that index scans and bitmap index scans both filtered out all non-matches inexpensively, without heap access. Then the costing could have been quite off, and it really wouldn't have mattered at runtime, because the index scan and bitmap index scan would do approximately the same thing in any case. As I've said, it's obviously also true that there are many other cases where there really will be a "natural" advantage for index scans, that bitmap index scans just cannot hope to offer. These are the cases where the mechanism from your patch is best placed to be the thing that avoids heap accesses, or maybe even avoid all visibility checks (despite not using true index quals). > Understood. I think this whole discussion is about figuring out these > trade offs and also how to divide the various improvements into "minimum > viable" changes. That's exactly how I see it myself. Obviously, there is still plenty of gray area here -- cases where it's not at all clear whether or not we should rely on the mechanism from your patch, or whether we should provide some alternative, more specialized mechanism. For example, I've made a lot out of simple != inequalities recently, but it's natural to wonder what that might mean for NOT IN ( ... ) SAOP inequalities. 
Am I also going to add specialized code that passes those down to the index AM? Where do you draw the line? I totally accept that there is a significant amount of gray area, and that that's likely to remain true for the foreseeable future. But I also believe that there is a small but important number of things that are either exactly black or exactly white. If we can actually firmly agree on what these other things are in days or weeks (which seems quite doable), then we'll have the right framework for figuring everything else out over time (possibly over multiple releases). We'll at least have the right shared vocabulary for discussing the problems, which is a very good start. I want to have a general structure that has the right general concepts in place from the start -- that's all. I also suspect that we'll discover that the large amount of gray area clauses/items are those that tend to be far less important than "exactly black" and "exactly white" items. So even if we can only agree that a small handful of things are in either category, that small handful will likely be very overrepresented in real world queries. For example, simple inequalities are very common -- it's surprising that nbtree can't already handle them directly. I should have thought of this myself, long ago, but it took your patch to force me to think about it. The problem with simple inequalities was "hiding in plain sight" for a very long time. Could there be anything else like that? -- Peter Geoghegan
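For reference, a NOT IN value list is already a ScalarArrayOpExpr after parsing -- the parser turns it into an ALL-style array comparison over the <> operator -- so any scheme for passing simple inequalities into the index AM naturally raises the question of the array form too:

   -- These two queries carry the same predicate internally:
   SELECT * FROM tenk1 WHERE four = 1 AND two NOT IN (1, 3);
   SELECT * FROM tenk1 WHERE four = 1 AND two <> ALL (ARRAY[1, 3]);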
On 8/8/23 22:54, Peter Geoghegan wrote: > On Tue, Aug 8, 2023 at 1:24 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >>> Assuming that that happens, then it immediately gives index scans a >>> huge advantage over bitmap index scans. At that point it seems >>> important to describe (in high level terms) where it is that the >>> advantage is innate, and where it's just because we haven't done the >>> required work for bitmap index scans. I became confused on this point >>> myself yesterday. Admittedly I should have been able to figure it out >>> on my own -- but it is confusing. >>> >> >> Yeah, I agree that might help a lot, particularly for tables that have a >> significant fraction of not-all-visible pages. > > It also has the potential to make the costing a lot easier in certain > important cases. Accurately deriving just how many heap accesses can > be avoided via the VM from the statistics that are available to the > planner is likely always going to be very difficult. Finding a way to > make that just not matter at all (in these important cases) can also > make it safe to bias the costing, such that the planner tends to favor > index scans (and index-only scans) over bitmap index scans that cannot > possibly eliminate any heap page accesses via an index filter qual. > Yes, if there's a way to safely skip the visibility check for some conditions, that would probably make the costing simpler. Anyway, I find this discussion rather abstract and I'll probably forget half the important cases by next week. Maybe it'd be good to build a set of examples demonstrating the interesting cases? We've already used a couple tenk1 queries for that purpose ... >> Right, and I'm not against improving that, but I see it more like an >> independent task. I don't think it needs (or should) to be part of this >> patch - skipping visibility checks would apply to IOS, while this is >> aimed only at plain index scans. > > I'm certainly not going to insist on it. 
> Worth considering if putting > it in scope could make certain aspects of this patch (like the > costing) easier, though. > > I think that it wouldn't be terribly difficult to make simple > inequalities into true index quals. I think I'd like to have a go at > it myself. To some degree I'm trying to get a sense of how much that'd > help you. > I'm trying to make the patch not dependent on such a change. In a way, once a clause gets recognized as an index qual, it becomes irrelevant for my patch. But the patch also doesn't get any simpler, because it still needs to do the same thing for the remaining quals. OTOH if there was some facility to decide if a qual is "safe" to be executed on the index tuple, that'd be nice. But as I already said, I see it more as an additional optimization, as it only applies to a subset of cases. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 9, 2023 at 10:00 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Anyway, I find this discussion rather abstract and I'll probably forget > half the important cases by next week. Maybe it'd be good to build a set > of examples demonstrating the interesting cases? We've already used a > couple tenk1 queries for that purpose ... That seems wise. I'll try to come up with a list of general principles with specific and easy to run examples later on today. > I'm trying to make the patch to not dependent on such change. In a way, > once a clause gets recognized as index qual, it becomes irrelevant for > my patch. But the patch also doesn't get any simpler, because it still > needs to do the same thing for the remaining quals. Practically speaking, I don't see any reason why you won't be able to sign off on the set of principles that I'll lay out for work in this area, while at the same time continuing with this patch more or less as originally planned. At one point I thought that your patch might be obligated to compensate for its tendency to push down OR-heavy clauses as expressions, leading to "risky" plans. While I still have a concern about that now, I'm not going to try to make it your problem. I'm now inclined to think of this as an existing problem, that your patch will increase the prevalence of -- but not to the extent that it makes sense to hold it up. To some extent it is up to me to put my money where my mouth is. I'm optimistic about the prospect of significantly ameliorating (if not fixing) the "risky OR expression plan" problem in the scope of my work on 17. But even if that doesn't quite happen (say we don't end up normalizing to CNF in the way that I've suggested for 17), we should at least reach agreement on the best way forward. 
If we could just agree that evaluating complicated OR expressions in index filters is much worse than finding a way to pass them down as an equivalent index qual (an SAOP), then I could live with it for another release or two. As I said, I mostly just care about having the right general principles in place at this point. > OTOH if there was some facility to decide if a qual is "safe" to be > executed on the index tuple, that'd be nice. But as I already said, I > see it more as an additional optimization, as it only applies to a > subset of cases. I'm happy to go with that. -- Peter Geoghegan
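The OR-to-SAOP equivalence under discussion, spelled out (only the second form is a ScalarArrayOpExpr that can be passed to the index AM as a true index qual; the OR chain could at best become an index filter):

   SELECT * FROM tenk1 WHERE four = 1 OR four = 2 OR four = 3;
   SELECT * FROM tenk1 WHERE four = ANY (ARRAY[1, 2, 3]);   -- i.e. four IN (1, 2, 3)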
On 8/9/23 19:53, Peter Geoghegan wrote: > On Wed, Aug 9, 2023 at 10:00 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> Anyway, I find this discussion rather abstract and I'll probably forget >> half the important cases by next week. Maybe it'd be good to build a set >> of examples demonstrating the interesting cases? We've already used a >> couple tenk1 queries for that purpose ... > > That seems wise. I'll try to come up with a list of general principles > with specific and easy to run examples later on today. > Cool. I'll try to build my own set of examples that I find interesting either because it's what the patch aims to help with, or because I expect it to be problematic for some reason. And then we can compare. >> I'm trying to make the patch to not dependent on such change. In a way, >> once a clause gets recognized as index qual, it becomes irrelevant for >> my patch. But the patch also doesn't get any simpler, because it still >> needs to do the same thing for the remaining quals. > > Practically speaking, I don't see any reason why you won't be able to > sign off on the set of principles that I'll lay out for work in this > area, while at the same time continuing with this patch more or less > as originally planned. > > At one point I thought that your patch might be obligated to > compensate for its tendency to push down OR-heavy clauses as > expressions, leading to "risky" plans. While I still have a concern > about that now, I'm not going to try to make it your problem. I'm now > inclined to think of this as an existing problem, that your patch will > increase the prevalence of -- but not to the extent that it makes > sense to hold it up. To some extent it is up to me to put my money > where my mouth is. > > I'm optimistic about the prospect of significantly ameliorating (if > not fixing) the "risky OR expression plan" problem in the scope of my > work on 17. 
> But even if that doesn't quite happen > (say we don't end up > normalizing to CNF in the way that I've suggested for 17), we should > at least reach agreement on the best way forward. If we could just > agree that evaluating complicated OR expressions in index filters is > much worse than finding a way to pass them down as an equivalent index > qual (an SAOP), then I could live with it for another release or two. > Yup, I agree with that principle. The AM can evaluate the expression in a smarter way, without the visibility checks. > As I said, I mostly just care about having the right general > principles in place at this point. > >> OTOH if there was some facility to decide if a qual is "safe" to be >> executed on the index tuple, that'd be nice. But as I already said, I >> see it more as an additional optimization, as it only applies to a >> subset of cases. > > I'm happy to go with that. > regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 9, 2023 at 11:15 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Cool. I'll try to build my own set of examples that I find interesting > either because it's what the patch aims to help with, or because I > expect it to be problematic for some reason. And then we can compare. That would be great. I definitely want to make this a collaborative thing. > Yup, I agree with that principle. The AM can evaluate the expression in > a smarter way, without the visibility checks. Attached text document (which I guess might be my first draft) is an attempt to put the discussion up until this point on a more formal footing. The format here tries to reduce the principles to a handful of bullet points. For example, one line reads: + Index quals are better than equivalent index filters because bitmap index scans can only use index quals I'm pretty sure that these are all points that you and I both agree on already. But you should confirm that. And make your own revisions, as you see fit. It's definitely possible that I overlooked an interesting and durable advantage that index filters have. If there is some other way in which index filters are likely to remain the best and only viable approach, then we should note that. I just went with the really obvious case of an expression that definitely needs visibility checks to avoid ever throwing a division-by-zero error, related to some invisible tuple. It's an extreme case in that it focuses on requirements that seem just about unavoidable in any future world. (Come to think of it, the biggest and most durable advantage for index filters is probably just how general they are, which I do mention.) -- Peter Geoghegan
Attachments
On Wed, 2023-08-09 at 17:14 -0700, Peter Geoghegan wrote: > + Index quals are better than equivalent index filters because bitmap > index scans can only use index quals It seems there's consensus that: * Index Filters (Tomas's patch and the topic of this thread) are more general, because they can work on any expression, e.g. 1/x, which can throw an error if x=0. * Index quals are more optimizable, because such operators are not supposed to throw errors or have side effects, so we can evaluate them before determining visibility. I wouldn't describe one as "better" than the other, but I assume you meant "more optimizable". It's interesting that there's overlap in utility between Tomas's current patch and Peter's work on optimizing SAOPs. But I don't see a lot of tension there -- it seems like Tomas's patch will always be useful for filters that might throw an error (or have other side effects). Is there any part of the design here that's preventing this patch from moving forward? If not, what are the TODOs for the current patch, or is it just waiting on more review? Regards, Jeff Davis
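[Editor's note: the safety distinction Jeff summarizes - why an expression like 1/x must wait for the visibility check, while an error-free operator may run on the index tuple first - can be modeled with a toy scan. Illustrative Python, not PostgreSQL code.]

```python
# Toy model (not PostgreSQL code): why arbitrary expressions must not be
# evaluated before the visibility check, while error-free index-qual
# operators safely can be.

def scan(tuples, qual, check_visibility_first):
    """Each tuple is (visible, x). Yield x for visible tuples passing qual."""
    for visible, x in tuples:
        if check_visibility_first:
            if visible and qual(x):
                yield x
        else:
            # Evaluate the qual on every index tuple, dead or alive.
            if qual(x) and visible:
                yield x

# A dead (invisible) tuple with x = 0 still sits in the index.
tuples = [(True, 2), (False, 0), (True, 4)]

# Safe either way for an error-free operator such as "= 4":
assert list(scan(tuples, lambda x: x == 4, True)) == [4]
assert list(scan(tuples, lambda x: x == 4, False)) == [4]

# "1/x > 0" raises on the invisible x = 0 tuple if evaluated first,
# even though that tuple could never appear in the result:
assert list(scan(tuples, lambda x: 1 / x > 0, True)) == [2, 4]
try:
    list(scan(tuples, lambda x: 1 / x > 0, False))
    raised = False
except ZeroDivisionError:
    raised = True
assert raised
```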
On Thu, Aug 17, 2023 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote: > On Wed, 2023-08-09 at 17:14 -0700, Peter Geoghegan wrote: > > + Index quals are better than equivalent index filters because bitmap > > index scans can only use index quals > > It seems there's consensus that: > > * Index Filters (Tomas's patch and the topic of this thread) are more > general, because they can work on any expression, e.g. 1/x, which can > throw an error if x=0. Agreed, but with one small caveat: they're not more general when it comes to bitmap index scans, which seem like they'll only ever be able to support index quals. But AFAICT that's the one and only exception. > * Index quals are more optimizable, because such operators are not > supposed to throw errors or have side effects, so we can evaluate them > before determining visibility. Right. Though there is a second reason: the index AM can use them to navigate the index more efficiently with true index quals. At least in the case of SAOPs, and perhaps in other cases in the future. > I wouldn't describe one as "better" than the other, but I assume you > meant "more optimizable". The use of the term "better" is actually descriptive of Tomas' patch. It is structured in a way that conditions the use of index filters on the absence of equivalent index quals for eligible/indexed clauses. So it really does make sense to think of it as a "qual hierarchy" (to use Tomas' term). It's also true that it will always be fundamentally impossible to use index quals for a significant proportion of all clause types, so "better when feasible at all" is slightly more accurate. > Is there any part of the design here that's preventing this patch from > moving forward? If not, what are the TODOs for the current patch, or is > it just waiting on more review? Practically speaking, I have no objections to moving ahead with this. 
I believe that the general structure of the patch makes sense, and I don't expect Tomas to do anything that wasn't already expected before I chimed in. -- Peter Geoghegan
On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote: > > The only thing the patch does is it looks at clauses we decided not to > > treat as index quals, and do maybe still evaluate them on index. And I > > don't think I want to move these goalposts much further. > > Avoiding the need for visibility checks completely (in at least a > subset of cases) was originally your idea. I'm not going to insist on > it, or anything like that. It just seems like something that'll add a > great deal of value over what you have already. Another idea in this area occurred to me today: it would be cool if non-key columns from INCLUDE indexes could completely avoid visibility checks (not just avoid heap accesses using the visibility map) in roughly the same way that we'd expect with an equivalent key column already, today (if it was an index filter index qual). Offhand I think that it would make sense to do that outside of index AMs, by extending the mechanism from Tomas' patch to this special class of expression. We'd invent some other class of index filter that "outranks" conventional index filters when the optimizer can safely determine that they're "index filters with the visibility characteristics of true index quals". I am mostly thinking of simple, common cases here (e.g., an equality operator + constant). This might require the involvement of the relevant btree opclass, since that's where the no-visibility-check-safety property actually comes from. However, it wouldn't need to be limited to INCLUDE B-Tree indexes. You could for example do this with a GiST INCLUDE index that had no opclass information about the datatype/operator. That'd be a natural advantage of an approach based on index filters. -- Peter Geoghegan
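[Editor's note: Peter's proposal amounts to adding one more tier to the "qual hierarchy". A minimal sketch of the ranking follows; illustrative Python, with invented class names, not actual PostgreSQL structures.]

```python
# Illustrative sketch of the proposed extended "qual hierarchy".
# Earlier entries are preferable; the names are invented for illustration.

HIERARCHY = [
    "index qual",                    # navigates the scan; no visibility check
    "visibility-safe index filter",  # proposed: e.g. INCLUDE column + equality
    "index filter",                  # evaluated on the index tuple, visibility-gated
    "table filter",                  # needs the heap tuple
]

def best_class(eligible):
    """Return the highest-ranked class a clause is eligible for."""
    for cls in HIERARCHY:
        if cls in eligible:
            return cls
    return "table filter"

# An equality on an INCLUDE column can never be a true index qual (non-key
# columns don't navigate the scan), but under the proposal it would still
# outrank an ordinary index filter:
assert best_class({"visibility-safe index filter", "index filter"}) \
    == "visibility-safe index filter"
```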
On Mon, Aug 7, 2023 at 12:34 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > On 8/7/23 02:38, Peter Geoghegan wrote: > > This plan switchover isn't surprising in itself -- it's one of the most > > important issues addressed by my SAOP patch. However, it *is* a little > > surprising that your patch doesn't even manage to use "Index Filter" > > quals. It appears that it is only capable of using table filter quals. > > Obviously, the index has all the information that expression > > evaluation needs, and yet I see "Filter: (tenk1.tenthous = ANY > > ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))". So no improvement > > over master here. > Right. This happens because the matching of SAOP to indexes happens in > multiple places. Firstly, create_index_paths() matches the clauses to > the index by calling > > match_restriction_clauses_to_index > -> match_clauses_to_index > -> match_clause_to_index > > Which is where we also decide which *unmatched* clauses can be filters. > And this *does* match the SAOP to the index key, hence no index filter. > > But then we call get_index_paths/build_index_path a little bit later, > and that decides to skip "lower SAOP" (which seems a bit strange, > because the column is "after" the equality, but meh). Anyway, at this > point we already decided what's a filter, ignoring the index clauses, > and not expecting any backsies. > > The simplest fix seems to be to add these skipped SAOP clauses as > filters. We know it can be evaluated on the index ...

Update on this: I recently posted v2 of my patch, which completely removes build_index_paths's "skip_lower_saop" mechanism. This became possible in v2 because it fully eliminated all of the advantages that SAOP-style filter quals might have had over true index quals, through further enhancements on the nbtree side. There is simply no reason to generate alternative index paths with filter quals in the first place. (As I seem to like to say, "choice is confusion".)
In short, v2 of my patch fully adheres to the principles set out in the "qual hierarchy" doc. The planner no longer needs to know anything about how nbtree executes SAOP index quals, except when costing them. To the planner, there is pretty much no difference between "=" and "= ANY()" (for index AMs that natively support SAOP execution). I imagine that this general planner structure will be ideal for your patch. If I'm not mistaken, it will allow you to completely avoid treating SAOPs as a special case. Although the build_index_paths "skip_lower_saop" thing might have created issues for the approach your patch takes in the planner, that seems to me to work best as an argument against the "skip_lower_saop" mechanism -- it was always a kludge IMV. -- Peter Geoghegan
Hi,

It took me a while but I finally got back to working on this patch. I reread the summary of principles Peter shared in August, and I do agree with all the main points - the overall qual hierarchy and the various examples and counter-examples. I'll respond to a couple things in this sub-thread, and then I plan to post an updated version of the patch with a discussion of a couple problems I still haven't managed to solve (not necessarily new ones).

On 8/19/23 00:19, Peter Geoghegan wrote: > On Thu, Aug 17, 2023 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote: >> On Wed, 2023-08-09 at 17:14 -0700, Peter Geoghegan wrote: >>> + Index quals are better than equivalent index filters because bitmap >>> index scans can only use index quals >> >> It seems there's consensus that: >> >> * Index Filters (Tomas's patch and the topic of this thread) are more >> general, because they can work on any expression, e.g. 1/x, which can >> throw an error if x=0. > > Agreed, but with one small caveat: they're not more general when it > comes to bitmap index scans, which seem like they'll only ever be able > to support index quals. But AFAICT that's the one and only exception. >

Yeah. Some conditions are never going to be translated into proper index quals, either because it's not really possible or because nobody has implemented it yet. FWIW it's not immediately obvious to me whether bitmap index scans are unable to support index filters because of some inherent limitation, or whether it's something we could implement. Some index AMs simply can't support index filters, because they don't know the "full" index tuple (e.g. GIN), but maybe other AMs could support that ...

>> * Index quals are more optimizable, because such operators are not >> supposed to throw errors or have side effects, so we can evaluate them >> before determining visibility. > > Right. Though there is a second reason: the index AM can use them to > navigate the index more efficiently with true index quals.
At least in > the case of SAOPs, and perhaps in other cases in the future. > >> I wouldn't describe one as "better" than the other, but I assume you >> meant "more optimizable". > > The use of the term "better" is actually descriptive of Tomas' patch. > It is structured in a way that conditions the use of index filters on > the absence of equivalent index quals for eligible/indexed clauses. So > it really does make sense to think of it as a "qual hierarchy" (to use > Tomas' term). > > It's also true that it will always be fundamentally impossible to use > index quals for a significant proportion of all clause types, so > "better when feasible at all" is slightly more accurate. > Yeah, I agree with all of this too. I think that in all practical cases an index qual is strictly better than an index filter, for exactly these reasons. >> Is there any part of the design here that's preventing this patch from >> moving forward? If not, what are the TODOs for the current patch, or is >> it just waiting on more review? > > Practically speaking, I have no objections to moving ahead with this. > I believe that the general structure of the patch makes sense, and I > don't expect Tomas to do anything that wasn't already expected before > I chimed in. > I agree the general idea is sound. The main "problem" in the original patch was about costing changes and the implied risk of picking the wrong plan (instead of e.g. a bitmap index scan). That's pretty much what the "risk" in example (4) in the principles.txt is about, AFAICS. I think the consensus is that if we don't change the cost, this issue mostly goes away - we won't pick index scan in cases where we'd pick a different plan without the patch, and for index scans it's (almost) always correct to replace "table filter" with "index filter". If that issue goes away, I think there are two smaller ones - picking which conditions to promote to index filters, and actually tweaking the index scan code to do that. 
The first issue seems similar to the costing issue - in fact, it really is a costing problem. The goal is to minimize the amount of work needed to evaluate the conditions on all rows, and if we knew the selectivity+cost for each condition, it'd be a fairly simple matter. But in practice we don't know this - the selectivity estimates are rather shaky (and doubly so for correlated conditions), and the procedure cost estimates are quite crude too ...

The risk is we might move "forward" a very expensive condition. Imagine a condition with procost=1000000, which would normally be evaluated last, after all other clauses. But if we promote it to an index filter, that'd no longer be true.

What Jeff proposed last week was roughly this:

- find min(procost) for clauses that can't be index filters
- consider all clauses with procost <= min(procost) as index filters

But after looking at this I realized order_qual_clauses() does a very similar thing for regular clauses, and the proposed logic contradicts the order_qual_clauses() heuristics a bit. In particular, order_qual_clauses() orders the clauses by procost, but then it's careful not to reorder clauses with the same procost. With the proposed heuristics, that'd change - the clauses might get reordered in various ways, possibly ending up with procost values [1, 10, 100, 1, 10, 100] or something like that. Not sure if we want to relax the heuristics like this.

There's also the practical issue that order_qual_clauses() gets called quite late, long after the index filters were built. And it also considers security levels, which I ignored until now. But now that I think about it, with the original patch it had to be decided when building the path, because that's when the costing happens. With the costing changes removed, we can probably do that much later, while creating the plan, after order_qual_clauses() does all the work. We could walk the qpqual list and stop at the first element that can't be treated as an index filter.
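[Editor's note: the two selection rules being compared - Jeff's min(procost) cutoff versus walking the ordered qpqual list - can be sketched as follows. Illustrative Python, not PostgreSQL code; clauses are modeled as (name, procost, can_be_index_filter) tuples.]

```python
# Illustrative comparison (not PostgreSQL code) of two ways to pick
# index filters from a qual list: Jeff's min(procost) cutoff vs.
# walking the cost-ordered list and stopping at the first non-filter.

quals = [
    ("A", 1,   True),
    ("B", 1,   False),
    ("C", 10,  True),
    ("D", 100, True),
]

# order_qual_clauses(): stable sort by procost (ties keep query order).
ordered = sorted(quals, key=lambda q: q[1])

def jeffs_rule(quals):
    """Clauses no costlier than the cheapest non-filter clause become filters."""
    min_other = min((c for _, c, f in quals if not f), default=float("inf"))
    return [n for n, c, f in quals if f and c <= min_other]

def walk_rule(ordered):
    """Walk the ordered list; stop at the first clause that can't be a filter."""
    filters = []
    for n, _, f in ordered:
        if not f:
            break
        filters.append(n)
    return filters

assert jeffs_rule(quals) == ["A"]   # A's procost <= min non-filter cost (1)
assert walk_rule(ordered) == ["A"]  # stops at B; same result here, but ...

# ... with equal procosts the walk depends on query order, while Jeff's
# rule promotes A regardless of its position relative to B:
quals2 = [("B", 1, False), ("A", 1, True)]
assert jeffs_rule(quals2) == ["A"]
assert walk_rule(sorted(quals2, key=lambda q: q[1])) == []
```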
That'd also handle the security levels, and it'd *almost* do what Jeff proposed, I think. The main difference is that it'd not reorder clauses with the same procost. With multiple clauses with the same procost, it'd depend on which one happens to be first in the list, which AFAIK depends on the order of conditions in the query. That may be a bit surprising, I guess, because it seems natural that in a declarative language this should not really matter. Yes, we already do that, but it probably does not have a huge impact. If it affects whether a condition will be treated as an index filter, the differences are likely to be much more significant.

(FWIW this reminds me of the issues we had with the GROUP BY optimization, which ultimately got reverted because the reordering was considered too risky. I'm afraid we might end up in the same situation ...)

As for the other issue, that's more about how to deal with the various quals in the generic scan code. Imagine we have multiple conditions, and some of them can be treated as index filters. So for example we may end up with this:

    quals = [A, B, C, D]
    index-filters = [A, B]

And now a tuple can fall into three categories:

a) all-visible=false, i.e. can't use index filters => evaluate quals
b) all-visible=true, and index-filters evaluates to false
c) all-visible=true, and index-filters evaluates to true

The first two cases are fine, but for (c) we need to also evaluate the remaining non-filter quals [C, D]. We may build such a list, and the 0002 patch does that, and shows the list in explain. But where/how would we evaluate it? The code in execScan.c uses ScanState->ps.qual, which is set to [A,B,C,D]. Which is correct, but it does more work than strictly necessary - the index filters are evaluated again. We can't easily change that to [C,D] -> that would break the (a) case. Which quals are evaluated on the heap tuple depends on the visibility map and on the index-filter result. I think there are two options.
First, we may accept that some of the index filters may get evaluated twice. That's not great, because it increases the risk of regressions. Second, we could disable ScanState->ps.qual if there are index filters, and just do all the work in nodeIndexscan. But that seems quite ugly - in a way, the code already does that, which is where the two nested loops

    while (true) { for (;;) { ... } }

come from. I was hoping to simplify / get rid of this, not make it do even more stuff ... :-(

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
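[Editor's note: the double-evaluation problem described in this email can be made concrete with a toy model that counts how often each qual runs. Illustrative Python, not PostgreSQL code; the qual names A-D match the email's example.]

```python
# Toy model (not PostgreSQL code): with quals [A, B, C, D] and index
# filters [A, B], reusing ps.qual = [A, B, C, D] on the heap tuple
# re-evaluates the index filters in case (c).

calls = {"A": 0, "B": 0, "C": 0, "D": 0}

def make_qual(name, fn):
    def qual(t):
        calls[name] += 1   # count every evaluation of this qual
        return fn(t)
    return qual

A = make_qual("A", lambda t: t["a"] > 0)
B = make_qual("B", lambda t: t["b"] == 1)
C = make_qual("C", lambda t: t["c"] < 10)
D = make_qual("D", lambda t: t["d"] != 0)

index_filters = [A, B]

def scan_one(tup, all_visible, ps_qual):
    if all_visible:
        # cases (b)/(c): evaluate the index filters on the index tuple first
        if not all(q(tup) for q in index_filters):
            return False          # case (b): heap fetch avoided entirely
    # case (a), or case (c) after the filters passed:
    # evaluate ps_qual against the heap tuple
    return all(q(tup) for q in ps_qual)

tup = {"a": 1, "b": 1, "c": 5, "d": 2}

# With ps_qual = [A, B, C, D] (the current code), case (c) runs A and B twice:
assert scan_one(tup, all_visible=True, ps_qual=[A, B, C, D])
assert calls == {"A": 2, "B": 2, "C": 1, "D": 1}

# With ps_qual = [C, D] the duplication disappears for case (c), but that
# same list would wrongly skip A and B in case (a) (all_visible=False).
```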
Attachments
On 8/19/23 02:49, Peter Geoghegan wrote: > On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote: >>> The only thing the patch does is it looks at clauses we decided not to >>> treat as index quals, and do maybe still evaluate them on index. And I >>> don't think I want to move these goalposts much further. >> >> Avoiding the need for visibility checks completely (in at least a >> subset of cases) was originally your idea. I'm not going to insist on >> it, or anything like that. It just seems like something that'll add a >> great deal of value over what you have already. > > Another idea in this area occurred to me today: it would be cool if > non-key columns from INCLUDE indexes could completely avoid visibility > checks (not just avoid heap accesses using the visibility map) in > roughly the same way that we'd expect with an equivalent key column > already, today (if it was an index filter index qual). Offhand I think > that it would make sense to do that outside of index AMs, by extending > the mechanism from Tomas' patch to this special class of expression. > We'd invent some other class of index filter that "outranks" > conventional index filters when the optimizer can safely determine > that they're "index filters with the visibility characteristics of > true index quals". I am mostly thinking of simple, common cases here > (e.g., an equality operator + constant). > > This might require the involvement of the relevant btree opclass, > since that's where the no-visibility-check-safety property actually > comes from. However, it wouldn't need to be limited to INCLUDE B-Tree > indexes. You could for example do this with a GiST INCLUDE index that > had no opclass information about the datatype/operator. That'd be a > natural advantage of an approach based on index filters. > I haven't really thought about INCLUDE columns at all. I agree it seems doable and possibly quite useful, and doing it outside the index AM seems about right. 
The index does not know about the opclass for these columns; it just stores them, so why/how should it be responsible for doing something with them? And as you said, it seems like a (fairly natural?) extension of the patch discussed in this thread.

That being said, I don't intend to make this work in v1. Once the other issues get solved in some way, I may try hacking up a WIP, but mostly to check whether there's some design issue that'd make it hard in the future.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company