Re: pg16: XX000: could not find pathkey item to sort
От | David Rowley |
---|---|
Тема | Re: pg16: XX000: could not find pathkey item to sort |
Дата | |
Msg-id | CAApHDvrAHkghCDNFG5+K2adWCDCC9cdZT2KZamVRkqYy31kL9w@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: pg16: XX000: could not find pathkey item to sort (Richard Guo <guofenglinux@gmail.com>) |
Ответы |
Re: pg16: XX000: could not find pathkey item to sort
Re: pg16: XX000: could not find pathkey item to sort |
Список | pgsql-hackers |
On Sun, 8 Oct 2023 at 23:52, Richard Guo <guofenglinux@gmail.com> wrote: > On Thu, Oct 5, 2023 at 2:26 PM David Rowley <dgrowleyml@gmail.com> wrote: >> >> So in short, I propose the attached fix without any regression tests >> because I feel that any regression test would just mark that there was >> a big in create_agg_path() and not really help with ensuring we don't >> end up with some similar problem in the future. > > > If the pathkeys that were added by adjust_group_pathkeys_for_groupagg() > are computable from the targetlist, it seems that we do not need to trim > them off, because prepare_sort_from_pathkeys() will add resjunk target > entries for them. But it's also no harm if we trim them off. So I > think the patch is a pretty safe fix. +1 to it. hmm, I think one of us does not understand what is going on here. I tried to explain in [1] why we *need* to strip off the pathkeys added by adjust_group_pathkeys_for_groupagg(). Given the following example: create table ab (a int,b int); explain (costs off) select a,count(distinct b) from ab group by a; QUERY PLAN ---------------------------- GroupAggregate Group Key: a -> Sort Sort Key: a, b -> Seq Scan on ab (5 rows) adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b" column and that results in the Sort node sorting on {a,b}. It's simply not at all valid to have the GroupAggregate path claim that its pathkeys are also (effectively) {a,b}" as "b" does not and cannot legally exist after the aggregation takes place. We cannot put a resjunk "b" in the targetlist of the GroupAggregate either as there could be any number "b" values aggregated. Can you explain why you think we can put a resjunk "b" in the target list of the GroupAggregate in the above case? >> >> I have some concerns that the assert_pathkeys_in_target() function >> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm >> not proposing to commit that without further discussion. > > > Yeah, it looks like some heavy to call assert_pathkeys_in_target() for > each path node. Can we run some benchmarks to see how much overhead it > would bring to USE_ASSERT_CHECKING build? I think it'll be easy to show that there is an overhead to it. It'll be in the realm of the ~41ms patched vs ~24ms unpatched that I showed in [2]. That's quite an extreme case. Maybe it's worth checking the total planning time spent in a run of the regression tests with and without the patch to see how much overhead it adds to the "average case". David [1] https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=N-tB62jQ@mail.gmail.com [2] https://postgr.es/m/CAApHDvo7RzcQYw-gnkZr6QCijCqf8vJLkJ4XFk-KawvyAw109Q@mail.gmail.com
В списке pgsql-hackers по дате отправления: