Re: [PERFORM] Hash Anti Join performance degradation
From | Robert Haas |
---|---|
Subject | Re: [PERFORM] Hash Anti Join performance degradation |
Date | |
Msg-id | BANLkTim-DqDC2AbVJ_1t-XAS4NYq2tQYZg@mail.gmail.com |
In reply to | Re: [PERFORM] Hash Anti Join performance degradation (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: [PERFORM] Hash Anti Join performance degradation; Re: [PERFORM] Hash Anti Join performance degradation |
List | pgsql-hackers |
On Tue, May 31, 2011 at 11:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> With respect to the root of the issue (why does the anti-join take so
>> long?), my first thought was that perhaps the OP was very unlucky and
>> had a lot of values that hashed to the same bucket. But that doesn't
>> appear to be the case.
>
> Well, yes it is. Notice what the subquery is doing: for each row in
> "box", it's pulling all matching "box_id"s from message and running a
> self-join across those rows. The hash join condition is a complete
> no-op. And some of the box_ids have hundreds of thousands of rows.
>
> I'd just write it off as being a particularly stupid way to find the
> max(), except I'm not sure why deleting just a few thousand rows
> improves things so much. It looks like it ought to be an O(N^2)
> situation, so the improvement should be noticeable but not amazing.

Yeah, this is what I was getting at, though perhaps I didn't say it
well. If the last 78K rows were particularly pathological in some way,
that might explain something, but as far as one can see they are not a
whole heck of a lot different from the rest of the data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
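Tom's O(N^2) estimate can be illustrated with a toy model. This is a hypothetical sketch and not PostgreSQL's executor: it ignores batching, bucket sizing and work_mem. It builds a chained hash table on one input and counts how many bucket entries each probe row must examine. When every row carries the same key, as in a self-join restricted to a single box_id, every probe walks the entire chain. The function name and the row counts below are illustrative assumptions, not figures from the thread.

```python
from collections import defaultdict


def hash_join_comparisons(outer_keys, inner_keys):
    """Count bucket entries examined by a toy chained hash join.

    Hypothetical model: build a hash table on inner_keys, then, for each
    outer key, examine every entry stored in the bucket it hashes to.
    """
    buckets = defaultdict(list)
    for k in inner_keys:
        buckets[hash(k)].append(k)  # build phase: one chain per hash value

    comparisons = 0
    for k in outer_keys:
        # probe phase: each entry in the chain must be checked against k
        comparisons += len(buckets.get(hash(k), []))
    return comparisons


# All rows share one key (a single box_id): N * N entries examined.
same_key = hash_join_comparisons([7] * 1000, [7] * 1000)
# Distinct keys: each probe finds exactly one entry, so roughly N.
distinct = hash_join_comparisons(range(1000), range(1000))
# Dropping 10% of the same-key rows removes only about 19% of the work.
trimmed = hash_join_comparisons([7] * 900, [7] * 900)
print(same_key, distinct, trimmed)
```

Under this model, a modest deletion shrinks quadratic work only proportionally: (0.9N)^2 = 0.81 N^2. That matches Tom's expectation that the improvement should be "noticeable but not amazing". A much larger speedup would therefore point to something other than simple quadratic growth.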