Re: Performance

From: Tomas Vondra
Subject: Re: Performance
Date:
Msg-id: 4DA4D40A.4010200@fuzzy.cz
In reply to: Re: Performance  (Ogden <lists@darkstatic.com>)
Responses: Re: Performance  (Ogden <lists@darkstatic.com>)
List: pgsql-performance
Dne 12.4.2011 23:19, Ogden napsal(a):
>
> On Apr 12, 2011, at 4:09 PM, Tomas Vondra wrote:
>
>> Dne 12.4.2011 20:28, Ogden napsal(a):
>>>
>>> On Apr 12, 2011, at 1:16 PM, Tomas Vondra wrote:
>>>
>>>> Dne 12.4.2011 19:23, Ogden napsal(a):
>>>>>
>>>>> On Apr 12, 2011, at 12:18 PM, Andreas Kretschmer wrote:
>>>>>
>>>>>> Ogden <lists@darkstatic.com> wrote:
>>>>>>
>>>>>>> I have been wrestling with the configuration of the dedicated Postgres 9.0.3
>>>>>>> server at work and, granted, there's more activity on the production server, but
>>>>>>> the same queries take twice as long on the beefier server as on my Mac at home.
>>>>>>> I have pasted what I have changed in postgresql.conf - I am wondering if
>>>>>>> there's any way one can help me change things around to be more efficient.
>>>>>>>
>>>>>>> Dedicated PostgreSQL 9.0.3 Server with 16GB Ram
>>>>>>>
>>>>>>> Heavy write and read (for reporting and calculations) server.
>>>>>>>
>>>>>>> max_connections = 350
>>>>>>> shared_buffers = 4096MB
>>>>>>> work_mem = 32MB
>>>>>>> maintenance_work_mem = 512MB
>>>>>>
>>>>>> That's okay.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> seq_page_cost = 0.02                    # measured on an arbitrary scale
>>>>>>> random_page_cost = 0.03
>>>>>>
>>>>>> Do you have super, Super, SUPER fast disks? I think these (seq_page_cost
>>>>>> and random_page_cost) are completely wrong.
>>>>>>
>>>>>
>>>>> No, I don't have super fast disks, just 15K SCSI over RAID. I
>>>>> find that by raising them to:
>>>>>
>>>>> seq_page_cost = 1.0
>>>>> random_page_cost = 3.0
>>>>> cpu_tuple_cost = 0.3
>>>>> #cpu_index_tuple_cost = 0.005           # same scale as above - 0.005
>>>>> #cpu_operator_cost = 0.0025             # same scale as above
>>>>> effective_cache_size = 8192MB
>>>>>
>>>>> things are better - some queries run much faster. Is this better?
>>>>
>>>> I guess it is. What really matters with those cost variables is the
>>>> relative scale - the original values
>>>>
>>>> seq_page_cost = 0.02
>>>> random_page_cost = 0.03
>>>> cpu_tuple_cost = 0.02
>>>>
>>>> suggest that the random reads are almost as expensive as sequential
>>>> reads (which usually is not true - the random reads are significantly
>>>> more expensive), and that processing each row is about as expensive as
>>>> reading the page from disk (again, reading data from disk is much more
>>>> expensive than processing them).
>>>>
>>>> So yes, the current values are much more likely to give good results.
>>>>
>>>> You've mentioned those values were recommended on this list - can you
>>>> point out the actual discussion?
>>>>
>>>>
>>>
>>> Thank you for your reply.
>>>
>>> http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...
>>>
>>
>> OK, what JD said there generally makes sense, although those values are
>> a bit extreme - in most cases it's recommended to leave seq_page_cost=1
>> and decrease random_page_cost (to 2; the default value is 4). That
>> usually pushes the planner towards index scans.
>>
>> I'm not saying those small values (0.02 etc.) are bad, but I guess the
>> effect is about the same and it changes the impact of the other cost
>> variables (cpu_tuple_cost, etc.)
>>
>> I see there is 16GB of RAM but shared_buffers are just 4GB. So there's
>> nothing else running and the rest of the RAM is used for pagecache? I've
>> noticed the previous discussion mentions there are 8GB of RAM and the DB
>> size is 7GB (so it might fit into memory). Is this still the case?
>>
>> regards
>> Tomas
>
>
> Tomas,
>
> By decreasing random_page_cost to 2 (instead of 4), there is a slight performance decrease as opposed to leaving it
> just at 4. For example, if I set it to 3 (or 4), a query may take 0.057 seconds. The same query takes 0.144s when I
> set random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?
>
> Yes there is 16GB of RAM but the database is much bigger than that. Should I increase shared_buffers?

OK, that's very important information and it kind of explains all the
problems you had. When the planner decides which execution plan to use,
it computes a 'virtual cost' for each candidate plan and then chooses
the cheapest one.

Decreasing 'random_page_cost' lowers the expected cost of plans
involving index scans, so at a certain point such a plan seems cheaper
than one using sequential scans etc.

You can see this with EXPLAIN - run it with the original cost values,
then change the values (for that session only) and run the EXPLAIN
again. You'll see how the execution plan suddenly changes and starts to
use index scans.
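
For example, something along these lines (the table and query are just
placeholders, not from your workload):

  -- plan with the current cost settings
  EXPLAIN SELECT * FROM some_table WHERE some_column = 42;

  -- change the cost for this session only and compare the plan
  SET random_page_cost = 2;
  EXPLAIN SELECT * FROM some_table WHERE some_column = 42;

SET only affects the current session, so nothing else is disturbed.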

The problem with random I/O is that it's usually much more expensive
than sequential I/O as the drives need to seek etc. The only case when
random I/O is just as cheap as sequential I/O is when all the data is
cached in memory, because within RAM there's no difference between
random and sequential access (right, that's why it's called Random
Access Memory).

So the advice in the previous post - setting random_page_cost and
seq_page_cost to (nearly) the same value - makes sense when the whole
database fits into memory, because then there's no difference between
the two and index scans are favorable.
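
If the data really did fit into RAM, that idea would translate to
something like this (just a sketch, not a recommendation for your
server):

  SET seq_page_cost = 1.0;
  SET random_page_cost = 1.0;  -- no seek penalty when everything is cached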

In this case (the database is much bigger than the available RAM) this
no longer holds - index scans hit the drives, resulting in a lot of
seeks etc. So it's a serious performance killer ...

Not sure about increasing shared_buffers - if a block is not found in
shared buffers, it may still be found in the pagecache (without needing
a physical read). There are ways to check whether the current size of
shared_buffers is sufficient - I usually use the pg_stat views (bgwriter
and database).
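
For example, something along these lines (just a sketch; the exact
columns vary a bit between versions) shows how often blocks are found
in shared buffers versus requested from the kernel, and how buffers are
being written out:

  SELECT datname, blks_hit, blks_read,
         round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4)
           AS hit_ratio
  FROM pg_stat_database;

  SELECT * FROM pg_stat_bgwriter;

If the hit ratio stays low for the important databases, a larger
shared_buffers may help; if it's already high, increasing it probably
won't buy much.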

regards
Tomas
