Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.
От | James Inform |
---|---|
Тема | Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected. |
Дата | |
Msg-id | 48e87ec7-4371-3524-79d0-18439640ddc0@pharmapp.de обсуждение исходный текст |
Ответ на | Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected. (David Rowley <dgrowleyml@gmail.com>) |
Список | pgsql-bugs |
Thank you David for this detailled explanation. Makes things very clear. David Rowley schrieb am 20.08.23 um 23:04: > On Mon, 21 Aug 2023 at 06:26, James Inform <james.inform@pharmapp.de> wrote: >> Just looked at the results again. >> >> They look strange. When I execute the query multiple times I get three >> kinds of results: >> - NO match on all columns >> - THE SAME match on all columns >> - THREE DIFFERENT columns that are repeated for all the 1000 rows. >> >> With your explanation there should be randomly assigned rows appearing, >> but the seems to be a pattern. At least more than those three different >> ones. >> >> Have you executed my example and looked at the results? >> If not, please give it a try. Nothing looks random there. > You might expect that the "link_id = 1 + (random()*500)::int8" is > evaluated as a join qual because you put it in the ON clause, but > PostgreSQL will distribute these quals to the lowest location that > they can be evaluated. Since the only column that's mentioned in your > join expression belongs to the "link" table, then the qual is > evaluated at the scan level for that relation. You'll notice this if > you look at the EXPLAIN output. > > The reason you sometimes get no matches is simply that on that > execution of the query, the random number didn't happen to line up > with any of the particular link_ids on any of the scanned tuples. > > The reason you sometimes get the same match is that 1 tuple happened > to match the random number during the scan, and that tuple was joined > to 1000 times on your effective clauseless join. > > The reason you see three different columns being matches it is that 3 > tuples happened to match your random expression during the scan and > the clauseless join joined all three, resulting in 3000 rows rather > than 1000 rows in the final output. > > The link table is only scanned once due to the Material node in the > Nested Loop join. If you did: SET enable_material = off; then the > scan would be performed once per row in the "source" table. That > would mean the random() function would be executed 1 million times > instead of 1 thousand times. > > It might take you a while, but if you tried enough times, all the > planets would align and "link_id = 1 + (random()*500)::int8" would > happen to match all tuples during the scan. The query would then > return 1 million rows. > > PostgreSQL wouldn't have pushed your ON qual down to the scan level if > you'd included some column from the "source" table in the expression. > I'm not sure what good it'd do you, but you'd see different results > using something like "link_id = source_id * 0 + 1 + > (random()*5)::int8" (note the multiplication by 0) > > David
В списке pgsql-bugs по дате отправления: