Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
From | Peter Geoghegan
Subject | Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date |
Msg-id | CAH2-Wz=vA9F1yS2gsc==EbV7rgJ21YyV0MEtV-Q_ELN9QPOD6g@mail.gmail.com
In reply to | Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Peter Geoghegan <pg@bowt.ie>)
Responses | Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Alexander Lakhin <exclusion@gmail.com>)
List | pgsql-hackers
On Mon, Apr 1, 2024 at 6:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Note: v18 doesn't have any adjustments to the costing, as originally
> planned. I'll probably need to post a revised patch with improved (or
> at least polished) costing in the next few days, so that others will
> have the opportunity to comment before I commit the patch.

Attached is v19, which dealt with my remaining concerns about the costing in selfuncs.c. My current plan is to commit this on Saturday morning (US Eastern time).

I ultimately concluded that selfuncs.c shouldn't be doing anything to cap the estimated number of primitive index scans for an SAOP clause, unless doing so is all but certain to make the estimate more accurate. And so v19 makes the changes in selfuncs.c much closer to the approach used to cost SAOP clauses on master. In short, v19 is a lot more conservative in the changes made to selfuncs.c.

v19 does hold onto the idea of aggressively capping the estimate in extreme cases -- cases where the estimate begins to approach the total number of leaf pages in the index. This is clearly safe (it's literally impossible to have more index descents than there are leaf pages in the index), and probably necessary given that index paths with SAOP filter quals (on indexed attributes) will no longer be generated by the planner.

To recap: since nbtree more or less behaves as if scans with SAOPs are one continuous index scan under the new design from the patch, it was tempting to make code in genericcostestimate/btcostestimate treat it like that, too (the selfuncs.c changes from earlier versions tried to do that). As I said already, I've now given up on that approach in v19. This does mean that genericcostestimate continues to apply the Mackert-Lohman formula for btree SAOP scans, which is a bit odd in a world where nbtree specifically promises not to repeat any leaf page accesses.
I was a bit uneasy about that Mackert-Lohman mismatch for a while, but ultimately concluded that it isn't very important in the grand scheme of things. The problem with teaching code in genericcostestimate/btcostestimate to treat nbtree scans with SAOPs as one continuous index scan is that doing so hugely biases genericcostestimate's simplistic approach to estimating numIndexPages, which simply prorates numIndexTuples against index->pages and index->tuples. That naive approach probably works alright with simple scalar operators, but it's not going to work with SAOPs directly. I can't think of a good way of estimating numIndexPages with SAOPs, and suspect that the required information just isn't available. It seems best to treat this as a possible area for improvement in a later release. The really important thing for v17 is that we never wildly overestimate the number of descents when multiple SAOPs are used, each with a medium or large array.

v19 also makes sure that genericcostestimate doesn't allow a non-boundary SAOP clause/non-required array scan key to affect num_sa_scans -- it'll just use the num_sa_scans value calculated by btcostestimate in v19. This makes sense because nbtree is guaranteed not to start new primitive scans for these sorts of scan keys (plus there's no need to calculate num_sa_scans twice, in two places, using two slightly different approaches).

--
Peter Geoghegan
Attachments