Re: [SQL] JOIN index/sequential select problem
От | Tom Lane |
---|---|
Тема | Re: [SQL] JOIN index/sequential select problem |
Дата | |
Msg-id | 16163.926716133@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: [SQL] JOIN index/sequential select problem (Ole Gjerde <gjerde@icebox.org>) |
Список | pgsql-sql |
Ole Gjerde <gjerde@icebox.org> writes: >> How big did you say these tables were? > The av_parts table has about 4 million rows, while inventorysuppliers only > has ~200 rows. Ah! That's a critical bit of information! It sure looks to me like the optimizer did not know that when it was producing the plan you showed earlier; so it used a plan that looked OK but actually was pretty awful. > I ran vacuum again, and the query is done instantly, however the cost seem > a bit high, no? > Hash Join (cost=31373.53 rows=7218 width=100) > -> Index Scan using av_parts_rawpartnumber_index on av_parts (cost=31313.53 rows=1186 width=60) > -> Hash (cost=11.93 rows=210 width=40) > -> Seq Scan on inventorysuppliers (cost=11.93 rows=210 width=40) No, that's probably about right --- bear in mind that the optimizer is only making guesses about how many hits there will be for an index restriction. Here, we can see that it's guessing 1186 hits in the av_parts table, which is pretty darn selective for a 4-million-row table (about 0.03%), even though really there might be only a few matches. Note that this doesn't have anything to do with how many rows will actually match your given restrictionAV_Parts.RawPartNumber LIKE '6890040%' Instead, it's an estimate based on looking at the additional index-compatible restrictions that the parser manufactured from that clause, namelyAV_Parts.RawPartNumber >= '6890040'ANDAV_Parts.RawPartNumber <= '6890040\377' and trying to guess how many rows will be scanned between those lower and upper index limits. There's going to be an awful lot of fuzz in any such estimate. However, the optimizer drew the right conclusion, namely that this way is probably cheaper than any other way. You'll recall that before, it was estimating only two cost units for an index scan on av_parts, which is obviously silly for a 4-million-row table. (The cost units are intended to be roughly 1 unit = 1 disk block access.) So it thought it could get away with doing that as the inner loop of a nested-loop join. Now that it knows scanning av_parts is *not* cheap compared to scanning inventorysuppliers, it's making better decisions. You are probably used to looking at small cost estimates from your experience with small or unvacuumed tables. Now that the optimizer knows how big av_parts really is, all its estimates for queries involving that table will be large --- but they're all relative anyway. regards, tom lane
В списке pgsql-sql по дате отправления: