Обсуждение: explanation for random_page_cost is outdated

Поиск
Список
Период
Сортировка

explanation for random_page_cost is outdated

От
PG Doc comments form
Дата:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/runtime-config-query.html
Description:

Explanation for random_page_cost is rather outdated, because it did only for
case of mechanical hdd. But all modern database servers, which I know, made
upon SSD. Do or not do default value for random_page_cost equal to 1 is the
question, but, IMHO, at list in the documentation  about random_page_cost
need to add in a speculation about SSD.

It's important because a business programming now is mostly web programming.
Most database is poorly designed by web programmer, tables looked like a
primary key and a huge json (containing all) with large gin index upon it.
Now I am seeing a table with a GIN index 50% of the table size. The database
is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
by planner, but with random_page_cost=1 the result may be not excellent, but
acceptable for web programmers.

Re: explanation for random_page_cost is outdated

От
Bruce Momjian
Дата:
On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
> Description:
> 
> Explanation for random_page_cost is rather outdated, because it did only for
> case of mechanical hdd. But all modern database servers, which I know, made
> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
> question, but, IMHO, at list in the documentation  about random_page_cost
> need to add in a speculation about SSD.
> 
> It's important because a business programming now is mostly web programming.
> Most database is poorly designed by web programmer, tables looked like a
> primary key and a huge json (containing all) with large gin index upon it.
> Now I am seeing a table with a GIN index 50% of the table size. The database
> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
> by planner, but with random_page_cost=1 the result may be not excellent, but
> acceptable for web programmers.

Does this sentence in the random_page_cost docs unclear or not have enough
visibility:

    https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
    
    Storage that has a low random read cost relative to sequential, e.g.
    solid-state drives, might also be better modeled with a lower value for
    random_page_cost.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: explanation for random_page_cost is outdated

От
Олег Самойлов
Дата:
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?

Much better will be write: if you use SSD set 1.

Олег

> 19 марта 2020 г., в 23:56, Bruce Momjian <bruce@momjian.us> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation  about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
>    https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>
>    Storage that has a low random read cost relative to sequential, e.g.
>    solid-state drives, might also be better modeled with a lower value for
>    random_page_cost.
>
> --
>  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +




Re: explanation for random_page_cost is outdated

От
yigong hu
Дата:
Sorry to hijack the thread, I also recently have similar observation that the statement about random_page_cost on SSD is ambiguous. The current document says that

> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.

However, this statement does not clarify what values might be good. For some workload, the default value 4.0 would cause bad performance and lowering random_page_cost to a value 3.0 or 2.0 does not solve the performance problem. Only when the random_page_cost is lowered to below 1.2 will the bad performance be mitigated. Thus, I would suggest elaborating on this description further as:

 >  Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a value that is close to 1 for random_page_cost.

Detail:

I run the PostgreSQL 11 on an SSD hardware. The database has two small tables with 6MB and 16MB separately. The pgbench runs a select join query in 1 min. The result shows that when the random_page_cost is 1, the average latency is 14ms. When the random_page_cost is 1.5, 2, 3 or 4, the average latency is 26ms. This result suggests that setting random_page_cost to a value larger than 1.5 would cause almost 2x latency. If I increase the 6MB table to 60MB and rerun the sysbench, the result shows that when the random_page_cost is 1, the average latency is 13ms. When the random_page_cost is 1.5,2,3 or 4, the average latency is 17ms.

I attached my testing script, the postgresql configuration file, and planner output.

On Mon, Apr 27, 2020 at 3:19 AM Олег Самойлов <splarv@ya.ru> wrote:
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?

Much better will be write: if you use SSD set 1.

Олег

> 19 марта 2020 г., в 23:56, Bruce Momjian <bruce@momjian.us> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation  about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
>    https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>   
>    Storage that has a low random read cost relative to sequential, e.g.
>    solid-state drives, might also be better modeled with a lower value for
>    random_page_cost.
>
> --
>  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +





Вложения

Re: explanation for random_page_cost is outdated

От
Pavel Stehule
Дата:


ne 26. 4. 2020 v 21:25 odesílatel yigong hu <yigongh@gmail.com> napsal:
Sorry to hijack the thread, I also recently have similar observation that the statement about random_page_cost on SSD is ambiguous. The current document says that

> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.

However, this statement does not clarify what values might be good. For some workload, the default value 4.0 would cause bad performance and lowering random_page_cost to a value 3.0 or 2.0 does not solve the performance problem. Only when the random_page_cost is lowered to below 1.2 will the bad performance be mitigated. Thus, I would suggest elaborating on this description further as:

 >  Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a value that is close to 1 for random_page_cost.

I depends on estimation. Lot of people use random_page_cost as fix of broken estimation. Then configures this value to some strange values. Lot of other queries with good estimation can be worse then.



Detail:

I run the PostgreSQL 11 on an SSD hardware. The database has two small tables with 6MB and 16MB separately. The pgbench runs a select join query in 1 min. The result shows that when the random_page_cost is 1, the average latency is 14ms. When the random_page_cost is 1.5, 2, 3 or 4, the average latency is 26ms. This result suggests that setting random_page_cost to a value larger than 1.5 would cause almost 2x latency. If I increase the 6MB table to 60MB and rerun the sysbench, the result shows that when the random_page_cost is 1, the average latency is 13ms. When the random_page_cost is 1.5,2,3 or 4, the average latency is 17ms.

I attached my testing script, the postgresql configuration file, and planner output.

On Mon, Apr 27, 2020 at 3:19 AM Олег Самойлов <splarv@ya.ru> wrote:
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?

Much better will be write: if you use SSD set 1.

Олег

> 19 марта 2020 г., в 23:56, Bruce Momjian <bruce@momjian.us> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation  about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
>    https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>   
>    Storage that has a low random read cost relative to sequential, e.g.
>    solid-state drives, might also be better modeled with a lower value for
>    random_page_cost.
>
> --
>  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +





Re: explanation for random_page_cost is outdated

От
Bruce Momjian
Дата:
On Mon, Apr 27, 2020 at 06:02:41AM +0200, Pavel Stehule wrote:
> 
> 
> ne 26. 4. 2020 v 21:25 odesílatel yigong hu <yigongh@gmail.com> napsal:
> 
>     Sorry to hijack the thread, I also recently have similar observation that
>     the statement about random_page_cost on SSD is ambiguous. The current
>     document says that
> 
>     > Storage that has a low random read cost relative to sequential, e.g.
>     solid-state drives, might also be better modeled with a lower value for
>     random_page_cost.
> 
>     However, this statement does not clarify what values might be good. For
>     some workload, the default value 4.0 would cause bad performance and
>     lowering random_page_cost to a value 3.0 or 2.0 does not solve the
>     performance problem. Only when the random_page_cost is lowered to below 1.2
>     will the bad performance be mitigated. Thus, I would suggest elaborating on
>     this description further as:
> 
>      >  Storage that has a low random read cost relative to sequential, e.g.
>     solid-state drives, might also be better modeled with a value that is close
>     to 1 for random_page_cost.
> 
> 
> I depends on estimation. Lot of people use random_page_cost as fix of broken
> estimation. Then configures this value to some strange values. Lot of other
> queries with good estimation can be worse then.

I have been recommending 1.1 as a value for random_page_cost for SSDs
for years, and I think it would be helpful to suggest that value, so doc
patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Вложения

Re: explanation for random_page_cost is outdated

От
Олег Самойлов
Дата:
Yes, I saw recommendations for 1.1 early, but why? Why such exactly precision number, why 1.1? Is here ever a
theoreticalor experimental prof? 

As for me, random_page_cost depended not only not characteristic of a storage device (hdd or ssd), but also on
assumptionsabout how much of the database is in memory cache (90% by default). And this is a very rough assumption (of
causein ideal whole database must fit in the memory cache). 

And so I don't see any reason to recommend exactly value 1.1, simple 1 is good too, especially for an ideal server with
hugememory cache. 


> 27 апр. 2020 г., в 19:16, Bruce Momjian <bruce@momjian.us> написал(а):
>
> I have been recommending 1.1 as a value for random_page_cost for SSDs
> for years, and I think it would be helpful to suggest that value, so doc
> patch attached.
>
> --
>  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +
> <random.diff>




Re: explanation for random_page_cost is outdated

От
Bruce Momjian
Дата:
On Mon, Apr 27, 2020 at 07:24:45PM +0300, Олег Самойлов
wrote:
> Yes, I saw recommendations for 1.1 early, but why? Why such exactly
> precision number, why 1.1? Is here ever a theoretical or experimental
> prof?

Well, SSD random performance is slightly slower than sequential, so the
value should be slighly larger than 1.1.  Clearly 2.0 or higher is not
recommended.  Could it be 1.05 or 1.01 or 1.15?  Yeah, but probably
giving 1.1 is at least a good suggestion to start with.

> As for me, random_page_cost depended not only not characteristic of a
> storage device (hdd or ssd), but also on assumptions about how much of
> the database is in memory cache (90% by default). And this is a very
> rough assumption (of cause in ideal whole database must fit in the
> memory cache).

True.

> And so I don't see any reason to recommend exactly value 1.1, simple 1
> is good too, especially for an ideal server with huge memory cache.

Uh, well, 1.0 is fine for non-SSDs too if all the data is in cache, and
there are no database writes.

---------------------------------------------------------------------------

>
>
> > 27 апр. 2020 г., в 19:16, Bruce Momjian <bruce@momjian.us>
> > написал(а):
> >
> > I have been recommending 1.1 as a value for random_page_cost for
> > SSDs for years, and I think it would be helpful to suggest that
> > value, so doc patch attached.
> >
> > -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB
> > https://enterprisedb.com
> >
> > + As you are, so once was I.  As I am, so you will be. + + Ancient
> > Roman grave inscription + <random.diff>
>

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: explanation for random_page_cost is outdated

От
Bruce Momjian
Дата:
On Mon, Apr 27, 2020 at 12:16:30PM -0400, Bruce Momjian wrote:
> On Mon, Apr 27, 2020 at 06:02:41AM +0200, Pavel Stehule wrote:
> >      >  Storage that has a low random read cost relative to sequential, e.g.
> >     solid-state drives, might also be better modeled with a value that is close
> >     to 1 for random_page_cost.
> > 
> > 
> > I depends on estimation. Lot of people use random_page_cost as fix of broken
> > estimation. Then configures this value to some strange values. Lot of other
> > queries with good estimation can be worse then.
> 
> I have been recommending 1.1 as a value for random_page_cost for SSDs
> for years, and I think it would be helpful to suggest that value, so doc
> patch attached.

Patch applied back through 9.5.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +