Re: Hardware upgrade for a high-traffic database
От | Tom Lane |
---|---|
Тема | Re: Hardware upgrade for a high-traffic database |
Дата | |
Msg-id | 23019.1092276554@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Hardware upgrade for a high-traffic database ("Jason Coene" <jcoene@gotfrag.com>) |
Список | pgsql-performance |
"Jason Coene" <jcoene@gotfrag.com> writes: > We have a lot of: > SELECT whatever > FROM ourtable > WHERE field1 = X > AND field2 = Y > AND field3 = Z > ORDER BY id DESC > LIMIT 5 > With indexes: > ourtable(id) > ourtable(field1, field2, field3) > Is it standard procedure with postgres to include any fields listed in WHERE > in the ORDER BY, and create a single index for only the ORDER BY fields (in > order of appearance, of course)? It depends. If the X/Y/Z constraint is already pretty selective, then it seems sufficient to me to pick up the matching rows (using the 3-field index), sort them by id, and take the first 5. The case where the extra-index-column trick is useful is where the WHERE clause *isn't* real selective and so a lot of rows would have to be sorted. In your previous example, I imagine you have a lot of prolific posters and so "all posts by userid 42" can be a nontrivial set. The double-column index lets you skip the sort and just pull out the required rows by scanning from the end of the range of userid = 42 entries. > gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY > timestamp DESC LIMIT 5; > QUERY PLAN > ---------------------------------------------------------------------------- > ------------------------------------------------------------------- > Limit (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317 > rows=5 loops=1) > -> Sort (cost=1608.43..1609.45 rows=407 width=8) (actual > time=0.287..0.295 rows=5 loops=1) > Sort Key: "timestamp" > -> Index Scan using comments_ix_userid on comments > (cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35 > loops=1) > Index Cond: (userid = 51) > Total runtime: 0.375 ms > (6 rows) This example looks fine, but since userid 51 evidently only has 35 posts, there's not much time needed to read 'em all and sort 'em. The place where the double-column index will win big is on userids with hundreds of posts. You have to keep in mind that each index costs time to maintain during inserts/updates. So adding an index just because it makes a few queries a little faster probably isn't a win. You need to make tradeoffs. regards, tom lane
В списке pgsql-performance по дате отправления: