Re: Large table search question
От | Greg Stark |
---|---|
Тема | Re: Large table search question |
Дата | |
Msg-id | 874qpv8czt.fsf@stark.xeocode.com обсуждение исходный текст |
Ответ на | Re: Large table search question (Richard Huxton <dev@archonet.com>) |
Ответы |
Re: Large table search question
|
Список | pgsql-general |
Richard Huxton <dev@archonet.com> writes: > If you execute a lot of queries for last_name="Smith" AND first_name="John" > then it might well help, there are a lot of "Smith"s to choose from. I think this is the crux of the argument. Even for a common last name like Smith, you're going to be talking about what, a few thousand records? Probably selective enough that the extra column isn't really necessary and you pay the cost for that extra column on every single update and even on other lookups. On the other hand this logic applies best to DSS style databases where you're looking for the fastest average throughput. For OLTP databases it may not hold: if 'Smith' breaks your performance guarantee then the application could break. For systems like that it may be worth paying a 1% penalty everywhere that's within your time budget to avoid paying a 100% penalty on the rare query that would cause failures, even if on average that means a performance loss. In practice I find two column indexes are not uncommon, especially on many-to-many relationship tables. Three column indexes are rare but they do happen. Offhand I don't ever recall defining any indexes with four or more columns. > There's really no alternative to testing. The statistics tables are very useful > here. Unless you have good reason not to, always turn the statistics gathering > on, and take snapshot regularly to keep an eye on where PG is exerting the most > effort. IMHO many people rely too heavily on empirical data rather than having a good sense of what they want to be happening. Statistics can obscure situations like what I described above. I do have statistics on though, and have the same thinking about always leaving it on, but I'm unclear how to make use of these data. What tools should I be looking at to use them? -- greg
В списке pgsql-general по дате отправления: