Re: extremly bad select performance on huge table
От | Björn Wittich |
---|---|
Тема | Re: extremly bad select performance on huge table |
Дата | |
Msg-id | 5446B4B3.4030802@gmx.de обсуждение исходный текст |
Ответ на | Re: extremly bad select performance on huge table (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: extremly bad select performance on huge table
|
Список | pgsql-performance |
Hi Tom and Igor, thanks for your help. With the reindex the select query running time was reduced from 5200 sec to 130 sec. Impressive! Even a join on this table is now fast. Unfortunately, there is now another problem: The table in my example has 500 columns which I want to retrieve with my join command. Example which is fast "select value from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn" Example which is slow "select value,c1,c2,c3,...,c10 from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn" Which is the number of columns to fetch so bad ? Which action is done in the db system when querying this via pgadmin? I think that there is no real retrieval included, why is the number of additional columns so bad for the join performance? > =?ISO-8859-15?Q?Bj=F6rn_Wittich?= <Bjoern_Wittich@gmx.de> writes: >> Here is the explain (analyze,buffers) select mycolumn from myhugetable >> "Index Only Scan using myprimkey on myhugetable (cost=0.00..8224444.82 >> rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999 >> loops=1)" >> " Heap Fetches: 356861" >> " Buffers: shared hit=71799472 read=613813" >> "Total runtime: 2503009.611 ms" > So that works out to about 4 msec per page fetched considering only I/O > costs, which is about as good as you're likely to get if the data is > sitting on spinning rust. > > You could potentially make it faster with a VACUUM (to mark all pages > all-visible and eliminate the "heap fetches" costs), or a REINDEX > (so that the index scan becomes more nearly sequential instead of random > access). However, unless the data is nearly static those will just be > temporary fixes: the time will degrade again as you update the table. > >> Note: This select is just for testing. My final statement will be a join >> on this table via the "mycolumn" column. > In that case it's probably a waste of time to worry about the performance > of this query as such. In the first place, a join is not likely to use > the index at all unless it's fetching a relatively small number of rows, > and in the second place it seems unlikely that the join query can use > an IndexOnlyScan on this index --- I imagine that the purpose of the join > will require fetching additional columns. > > regards, tom lane > >
В списке pgsql-performance по дате отправления: