Re: FETCH FIRST clause PERCENT option
От | Mark Dilger |
---|---|
Тема | Re: FETCH FIRST clause PERCENT option |
Дата | |
Msg-id | C33C2805-2664-4687-B554-3372157F0A59@gmail.com обсуждение исходный текст |
Ответ на | Re: FETCH FIRST clause PERCENT option (Andres Freund <andres@anarazel.de>) |
Список | pgsql-hackers |
> On Sep 20, 2018, at 5:29 PM, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2018-09-20 17:06:36 -0700, Mark Dilger wrote: >> I should think that spilling anything to a tuplestore would only be needed >> if the query contains an ORDER BY expression. If you query >> >> FETCH FIRST 50 PERCENT * FROM foo; >> >> you should just return every other row, discarding the rest, right? It's >> only when an explicit ordering is given that the need to store the results >> arises. Even with >> >> FETCH FIRST 50 PERCENT name FROM foo ORDER BY name; >> >> you can return one row for every two rows that you get back from the >> sort node, reducing the maximum number you need to store at any time to >> no more than 25% of all rows. > > I'm doubtful about the validity of these optimizations, particularly > around being surprising. But I think more importantly, we should focus > on the basic implementation that's needed anyway. You may be right that getting the basic implementation finished first is better than optimizing at this stage. So the rest of what I'm going to say is just in defense of the optimization, and not an argument for needing to optimize right away. As for reducing the surprise factor, I think that it would be surprising if I ask for a smallish percentage of rows and it takes significantly longer and significantly more memory or disk than asking for all the rows takes. If I'm including an explicit ORDER BY, then that explains it, but otherwise, I'd be surprised. Note that I'm not saying I'd be surprised by it taking roughly the same length of time / memory / disk. I'd only be surprised if it took a lot more. There are plenty of SQL generation engines that people put in their software. I'd expect something like sprintf("FETCH FIRST %d PERCENT %s FROM %s", percentage, columns, tablename) to show up in such engines, and percentage to sometimes be 100. At least in that case you should just return all rows rather than dumping them into a tuplestore. Likewise, if the percentage is 0, you'll want to finish quickly. Actually, I don't know if the SQL spec would require side effects to still happen, in which case you'd still have to generate all rows for their side effects to happen, and then just not return them. But still, no tuplestore. So the implementation of FETCH FIRST would at least need to think about what percentage is being requested, rather than just mindlessly adding a node to the tree for storing everything, then computing the LIMIT based on the number of rows stored, and then returning that number of rows. mark
В списке pgsql-hackers по дате отправления: