Re: Subselect query enhancement
От | Richard Huxton |
---|---|
Тема | Re: Subselect query enhancement |
Дата | |
Msg-id | 45C23950.4070803@archonet.com обсуждение исходный текст |
Ответ на | Re: Subselect query enhancement ("Michael Artz" <mlartz@gmail.com>) |
Список | pgsql-performance |
Michael Artz wrote: >> > I have primary table that holds ip information >> > and two other tables that hold event data for the specific IP in with >> > a one-to-many mapping between them, ie: >> [snip] >> > There is quite a bit of commonality between the network_events and >> > host_events schemas, but they do not currently share an ancestor. >> > ip_info has about 13 million rows, the network_events table has about >> > 30 million rows, and the host_events table has about 7 million rows. >> > There are indexes on all the rows. >> >> What indexes though. Do you have (name,ip) on the two event tables? > > All the columns are indexed individually. The tables are completely > static, as I reload the whole DB with new data every day. The point of a (name,ip) index would be to let you read off ip numbers in order easily. >> How selective is "name" - are there many different values or just a few? >> If lots, it might be worth increasing the statistics gathered on that >> column (ALTER COLUMN ... SET STATISTICS). >> http://www.postgresql.org/docs/8.2/static/sql-altertable.html > > I guess that is the heart of my question. "name" is not very > selective (there are only 20 or so choices) however other columns are > fairly selective for certain cases, such as 'port'. When querying on > and unusual port, the query is very fast, and the single UNIONed > subselect returns quickly. When 'port' is not very selective (like > port = '80', which is roughly 1/2 of the rows in the DB), the dual > subselect query wins, hands-down. > > And I have altered the statistics via the config file: > default_statistics_target = 100 > Perhaps this should be even higher for certain columns? You're probably better off leaving it at 10 and upping it for the vital columns. 25 for names should be a good choice. You could try partial indexes for those cases where you have particularly common values of name/port: CREATE INDEX idx1 ON host_events (ip) WHERE port=80; -- Richard Huxton Archonet Ltd
В списке pgsql-performance по дате отправления: