Re: [HACKERS] Slow count(*) again...
From | david@lang.hm
---|---
Subject | Re: [HACKERS] Slow count(*) again...
Date |
Msg-id | alpine.DEB.2.00.1102031732470.30983@asgard.lang.hm
In reply to | Re: [HACKERS] Slow count(*) again... (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: [HACKERS] Slow count(*) again...
List | pgsql-performance
On Thu, 3 Feb 2011, Robert Haas wrote:

> On Thu, Feb 3, 2011 at 7:39 PM, <david@lang.hm> wrote:
>>> Yeah, but you'll be passing the entire table through this separate
>>> process that may only need to see 1% of it or less on a large table.
>>> If you want to write the code and prove it's better than what we have
>>> now, or some other approach that someone else may implement in the
>>> meantime, hey, this is an open source project, and I like improvements
>>> as much as the next guy. But my prediction for what it's worth is
>>> that the results will suck. :-)
>>
>> I will point out that 1% of a very large table can still be a lot of disk
>> I/O that is avoided (especially if it's random I/O that's avoided)
>
> Sure, but I think that trying to avoid it will be costly in other ways
> - you'll be streaming a huge volume of data through some auxiliary
> process, which will have to apply some algorithm that's very different
> from the one we use today. The reality is that I think there's little
> evidence that the way we do ANALYZE now is too expensive. It's
> typically very cheap and works very well. It's a bit annoying when it
> fires off in the middle of a giant data load, so we might need to
> change the time of it a little, but if there's a problem with the
> operation itself being too costly, this is the first I'm hearing of
> it. We've actually worked *really* hard to make it cheap.

I could be misunderstanding things here, but my understanding is that it's
'cheap' in that it has little impact on the database while it is running.
The issue here is that the workflow is:

1. load data
2. analyze
3. start work

So the cost of analyze in this workflow is not "a 1% impact on query speed
for the next X time"; it's "the database can't be used for the next X time
while we wait for analyze to finish running".

I don't understand why the algorithm would have to be so different from
what's done today; surely the analyze thread could easily be tweaked to
ignore the rest of the data (assuming we don't have the thread sending the
data to analyze do the filtering).

David Lang
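As a rough illustration of the idea being debated (a sampler that sees each incoming row once but keeps only a small uniform sample, ignoring the rest), here is a minimal reservoir-sampling sketch in Python. This is only the generic technique, not PostgreSQL's actual ANALYZE code, which samples at the block level rather than row-by-row over a stream; the function name and parameters are hypothetical:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown
    length, examining each item exactly once (classic "Algorithm R").
    A sketch of the technique under discussion, not PostgreSQL's
    implementation."""
    rng = rng or random.Random()
    sample = []
    for i, row in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a reservoir entry with probability k / (i + 1),
            # which keeps every row equally likely to end up sampled.
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = row
    return sample

# Example: keep 5 rows out of a simulated load of 100,000 rows.
rows = range(100_000)
print(reservoir_sample(rows, 5, random.Random(42)))
```

The point of the sketch is that the sampler's memory and per-row work are bounded by the sample size, so a thread fed the load stream could discard the vast majority of rows cheaply, which is roughly what "tweak the analyze thread to ignore the rest of the data" amounts to.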