Discussion: Correct the documentation for work_mem

Correct the documentation for work_mem

From
"Imseih (AWS), Sami"
Date:

Hi,

I recently noticed the following in the work_mem [1] documentation:

“Note that for a complex query, several sort or hash operations might be running in parallel;”

The use of “parallel” here is misleading, as this has nothing to do with parallel query, but
rather with several operations in a plan running simultaneously.

The use of “parallel” in this doc predates parallel query support, which explains the usage.

I suggest a small doc fix:

“Note that for a complex query, several sort or hash operations might be running simultaneously;”

This should also be backpatched to the docs of all supported versions.

Thoughts?

Regards,

Sami Imseih
Amazon Web Services (AWS)

1. https://www.postgresql.org/docs/current/runtime-config-resource.html

Re: Correct the documentation for work_mem

From
Peter Eisentraut
Date:
On 21.04.23 16:28, Imseih (AWS), Sami wrote:
> I recently noticed the following in the work_mem [1] documentation:
> 
> “Note that for a complex query, several sort or hash operations might be 
> running in parallel;”
> 
> The use of “parallel” here is misleading as this has nothing to do with 
> parallel query, but
> 
> rather several operations in a plan running simultaneously.
> 
> The use of parallel in this doc predates parallel query support, which 
> explains the usage.
> 
> I suggest a small doc fix:
> 
> “Note that for a complex query, several sort or hash operations might be 
> running simultaneously;”

Here is a discussion of these terms: 
https://takuti.me/note/parallel-vs-concurrent/

I think "concurrently" is the correct word here.




Re: Correct the documentation for work_mem

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 21.04.23 16:28, Imseih (AWS), Sami wrote:
>> I suggest a small doc fix:
>> “Note that for a complex query, several sort or hash operations might be
>> running simultaneously;”

> Here is a discussion of these terms:
> https://takuti.me/note/parallel-vs-concurrent/

> I think "concurrently" is the correct word here.

Probably, but it'd do little to remove the confusion Sami is on about,
especially since the next sentence uses "concurrently" to describe the
other case.  I think we need a more thorough rewording, perhaps like

-       Note that for a complex query, several sort or hash operations might be
-       running in parallel; each operation will generally be allowed
+       Note that a complex query may include several sort or hash
+       operations; each such operation will generally be allowed
        to use as much memory as this value specifies before it starts
        to write data into temporary files.  Also, several running
        sessions could be doing such operations concurrently.

I also find this wording a bit further down to be poor:

        Hash-based operations are generally more sensitive to memory
        availability than equivalent sort-based operations.  The
        memory available for hash tables is computed by multiplying
        <varname>work_mem</varname> by
        <varname>hash_mem_multiplier</varname>.  This makes it

I think "available" is not le mot juste, and it's also unclear from
this whether we're speaking of the per-hash-table limit or some
(nonexistent) overall limit.  How about

-       memory available for hash tables is computed by multiplying
+       memory limit for a hash table is computed by multiplying

            regards, tom lane



Re: Correct the documentation for work_mem

From
Gurjeet Singh
Date:
On Fri, Apr 21, 2023 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > On 21.04.23 16:28, Imseih (AWS), Sami wrote:
> >> I suggest a small doc fix:
> >> “Note that for a complex query, several sort or hash operations might be
> >> running simultaneously;”
>
> > Here is a discussion of these terms:
> > https://takuti.me/note/parallel-vs-concurrent/
>
> > I think "concurrently" is the correct word here.
>
> Probably, but it'd do little to remove the confusion Sami is on about,

+1.

When discussing this internally, Sami's proposal was in fact to use
the word 'concurrently'. But given that when it comes to computers and
programming, it's common for someone to not understand the intricate
difference between the two terms, we thought it's best to not use any
of those, and instead use a word not usually associated with
programming and algorithms.

Aside: Another pair of words I see regularly used interchangeably,
when in fact they mean different things: precise vs. accurate.

> especially since the next sentence uses "concurrently" to describe the
> other case.  I think we need a more thorough rewording, perhaps like
>
> -       Note that for a complex query, several sort or hash operations might be
> -       running in parallel; each operation will generally be allowed
> +       Note that a complex query may include several sort or hash
> +       operations; each such operation will generally be allowed

This wording doesn't seem to bring out the fact that there could be
more than one work_mem consumer running (in-progress) at the same
time. The reader could mistake it to mean that hashes and sorts in a
complex query may happen one after the other.

+ Note that a complex query may include several sort and hash operations, and
+ more than one of these operations may be in progress simultaneously at any
+ given time;  each such operation will generally be allowed

I believe the phrase "several sort _and_ hash" better describes the
possible composition of a complex query, than does "several sort _or_
hash".

> I also find this wording a bit further down to be poor:
>
>         Hash-based operations are generally more sensitive to memory
>         availability than equivalent sort-based operations.  The
>         memory available for hash tables is computed by multiplying
>         <varname>work_mem</varname> by
>         <varname>hash_mem_multiplier</varname>.  This makes it
>
> I think "available" is not le mot juste, and it's also unclear from
> this whether we're speaking of the per-hash-table limit or some
> (nonexistent) overall limit.  How about
>
> -       memory available for hash tables is computed by multiplying
> +       memory limit for a hash table is computed by multiplying

+1

Best regards,
Gurjeet https://Gurje.et
Postgres Contributors Team, http://aws.amazon.com



Re: Correct the documentation for work_mem

From
"Imseih (AWS), Sami"
Date:
> > especially since the next sentence uses "concurrently" to describe the
> > other case.  I think we need a more thorough rewording, perhaps like
> >
> > -       Note that for a complex query, several sort or hash operations might be
> > -       running in parallel; each operation will generally be allowed
> > +       Note that a complex query may include several sort or hash
> > +       operations; each such operation will generally be allowed

> This wording doesn't seem to bring out the fact that there could be
> more than one work_mem consumer running (in-progress) at the same
> time. 

Do you mean, more than one work_mem consumer running at the same
time for a given query? If so, that is precisely the point we need to convey
in the docs.

i.e. if I have 2 sorts in a query that can use up to 4MB each, at some point
in the query execution, I can have 8MB of memory allocated.


Regards,

Sami Imseih
Amazon Web Services (AWS)
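
The arithmetic in the message above can be written out as a minimal Python sketch. It assumes that every sort is in progress at the same moment and that each one grows all the way to the work_mem limit before spilling to temporary files; the function name and the values are hypothetical, and actual usage is often lower.

```python
def peak_sort_memory_mb(num_sorts: int, work_mem_mb: int) -> int:
    """Upper estimate, in MB, for one query execution.

    Assumption: all `num_sorts` sorts run at the same time and each
    uses its full work_mem allowance.
    """
    return num_sorts * work_mem_mb


# Two sorts that can each use up to 4MB, as in the message above.
print(peak_sort_memory_mb(2, 4))  # 8
```

The point of the sketch is only that work_mem bounds each operation separately, so the bound for one query execution scales with the number of operations in progress.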


Re: Correct the documentation for work_mem

From
"Imseih (AWS), Sami"
Date:
Based on the feedback, here is a v1 of the suggested doc changes.

I modified Gurjeet's suggestion slightly to make it clear that a specific
query execution could have operations simultaneously using up to 
work_mem.

I also added the small hash table memory limit clarification.


Regards,

Sami Imseih
Amazon Web Services (AWS)






Attachments

Re: Correct the documentation for work_mem

From
David Rowley
Date:
On Tue, 25 Apr 2023 at 04:20, Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Based on the feedback, here is a v1 of the suggested doc changes.
>
> I modified Gurjeets suggestion slightly to make it clear that a specific
> query execution could have operations simultaneously using up to
> work_mem.

> -        Note that for a complex query, several sort or hash operations might be
> -        running in parallel; each operation will generally be allowed
> +        Note that a complex query may include several sort and hash operations,
> +        and more than one of these operations may be in progress simultaneously
> +        for a given query execution; each such operation will generally be allowed
>         to use as much memory as this value specifies before it starts
>         to write data into temporary files.  Also, several running
>         sessions could be doing such operations concurrently.

I'm wondering about adding "and more than one of these operations may
be in progress simultaneously".  Are you talking about concurrent
sessions running other queries which are using work_mem too?  If so,
isn't that already covered by the final sentence in the quoted text
above? If not, what is running simultaneously?

I think Tom's suggestion looks fine. I'd maybe change "sort or hash"
to "sort and hash" per the suggestion from Gurjeet above.

David



Re: Correct the documentation for work_mem

From
Tristen Raab
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Hello,

I've reviewed and built the documentation for the updated patch. As it stands right now I think the documentation for
this section is quite clear.

> I'm wondering about adding "and more than one of these operations may
> be in progress simultaneously".  Are you talking about concurrent
> sessions running other queries which are using work_mem too?

This appears to be referring to the "sort and hash" operations mentioned prior.

> If so,
> isn't that already covered by the final sentence in the quoted text
> above? if not, what is running simultaneously?

I believe the last sentence is referring to another session that is running its own sort and hash operations. So the
first section you mention is describing how sort and hash operations can be in execution at the same time for a query,
while the second refers to how sessions may overlap in their execution of sort and hash operations if I am understanding
this correctly.

I also agree that changing "sort or hash" to "sort and hash" is a better description.

Tristen
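
The distinction drawn in this review, between operations within one query and operations spread across sessions, can be illustrated with a minimal Python sketch. It assumes each operation is bounded by plain work_mem (it ignores hash_mem_multiplier) and that all operations reach that bound at once; the function name and the session counts are hypothetical.

```python
from typing import Sequence


def sessions_upper_bound_mb(ops_per_session: Sequence[int], work_mem_mb: int) -> int:
    """Upper estimate, in MB, across several sessions.

    Each entry of `ops_per_session` is the number of sort and hash
    operations in progress in that session's current query.
    """
    return sum(n * work_mem_mb for n in ops_per_session)


# Three hypothetical sessions with 2, 3, and 1 operations in progress,
# each operation limited to 4MB.
print(sessions_upper_bound_mb([2, 3, 1], 4))  # 24
```

The inner count corresponds to the first sentence of the documentation paragraph (one query, several operations), and the outer sum corresponds to its last sentence (several sessions doing such operations concurrently).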

Re: Correct the documentation for work_mem

From
"Imseih (AWS), Sami"
Date:
Hi,

Sorry for the delay in response and thanks for the feedback!

> I've reviewed and built the documentation for the updated patch. As it stands right now I think the documentation for
> this section is quite clear.

Sorry, I am not understanding. What is clear? The current documentation -or- the proposed documentation in the patch?

>> I'm wondering about adding "and more than one of these operations may
>> be in progress simultaneously".  Are you talking about concurrent
>> sessions running other queries which are using work_mem too?

> This appears to be referring to the "sort and hash" operations mentioned prior.

Correct, this is not referring to multiple sessions, but a given execution could 
have multiple operations that are each using up to work_mem simultaneously.

> I also agree that changing "sort or hash" to "sort and hash" is a better description.

That is addressed in the last revision of the patch.

-        Note that for a complex query, several sort or hash operations might be
-        running in parallel; each operation will generally be allowed
+        Note that a complex query may include several sort and hash operations,

Regards,

Sami 



Re: Correct the documentation for work_mem

From
Bruce Momjian
Date:
On Fri, Apr 21, 2023 at 01:15:01PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > On 21.04.23 16:28, Imseih (AWS), Sami wrote:
> >> I suggest a small doc fix:
> >> “Note that for a complex query, several sort or hash operations might be 
> >> running simultaneously;”
> 
> > Here is a discussion of these terms: 
> > https://takuti.me/note/parallel-vs-concurrent/
> 
> > I think "concurrently" is the correct word here.
> 
> Probably, but it'd do little to remove the confusion Sami is on about,
> especially since the next sentence uses "concurrently" to describe the
> other case.  I think we need a more thorough rewording, perhaps like
> 
> -       Note that for a complex query, several sort or hash operations might be
> -       running in parallel; each operation will generally be allowed
> +       Note that a complex query may include several sort or hash
> +       operations; each such operation will generally be allowed
>         to use as much memory as this value specifies before it starts
>         to write data into temporary files.  Also, several running
>         sessions could be doing such operations concurrently.
> 
> I also find this wording a bit further down to be poor:
> 
>         Hash-based operations are generally more sensitive to memory
>         availability than equivalent sort-based operations.  The
>         memory available for hash tables is computed by multiplying
>         <varname>work_mem</varname> by
>         <varname>hash_mem_multiplier</varname>.  This makes it
> 
> I think "available" is not le mot juste, and it's also unclear from
> this whether we're speaking of the per-hash-table limit or some
> (nonexistent) overall limit.  How about
> 
> -       memory available for hash tables is computed by multiplying
> +       memory limit for a hash table is computed by multiplying

Adjusted patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachments

Re: Correct the documentation for work_mem

From
David Rowley
Date:
On Fri, 8 Sept 2023 at 15:24, Bruce Momjian <bruce@momjian.us> wrote:
> Adjusted patch attached.

This looks mostly fine to me modulo "sort or hash".  I do see many
instances of "and/or" in the docs. Maybe that would work better.

David



Re: Correct the documentation for work_mem

From
"Imseih (AWS), Sami"
Date:
> This looks mostly fine to me modulo "sort or hash". I do see many
> instances of "and/or" in the docs. Maybe that would work better.

"sort or hash operations at the same time" is a clear explanation IMO.

This latest version of the patch looks good to me.

Regards,

Sami







Re: Correct the documentation for work_mem

From
David Rowley
Date:
On Sat, 9 Sept 2023 at 14:25, Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > This looks mostly fine to me modulo "sort or hash". I do see many
> > instances of "and/or" in the docs. Maybe that would work better.
>
> "sort or hash operations at the same time" is clear explanation IMO.

Just for anyone else following along who hasn't seen the patch, the
full text in question is:

+        Note that a complex query might perform several sort or hash
+        operations at the same time, with each operation generally being

It's certainly not a show-stopper. I do believe the patch makes some
improvements.  The reason I'd prefer to see either "and" or "and/or"
in place of "or" is because the text is trying to imply that many of
these operations can run at the same time. I'm struggling to
understand why, given that there could be many sorts and many hashes
going on at once that we'd claim it could only be one *or* the other.
If we have 12 sorts and 4 hashes then that's not "several sort or hash
operations", it's "several sort and hash operations".  Of course, it
could just be sorts or just hashes, so "and/or" works fine for that.

David
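
The "12 sorts and 4 hashes" case above can be worked through with a minimal Python sketch. It uses the per-hash-table limit discussed earlier in this thread (work_mem multiplied by hash_mem_multiplier) and assumes every operation is in progress at once and reaches its limit; the function name and the work_mem and hash_mem_multiplier values are hypothetical, chosen only for illustration.

```python
def plan_upper_bound_kb(
    num_sorts: int,
    num_hashes: int,
    work_mem_kb: int,
    hash_mem_multiplier: float,
) -> int:
    """Upper estimate, in kB, for one query execution.

    Sorts are each bounded by work_mem; hash tables are each bounded
    by work_mem * hash_mem_multiplier.
    """
    sort_kb = num_sorts * work_mem_kb
    hash_kb = num_hashes * int(work_mem_kb * hash_mem_multiplier)
    return sort_kb + hash_kb


# 12 sorts and 4 hashes, with assumed settings of work_mem = 4096kB
# and hash_mem_multiplier = 2.0.
print(plan_upper_bound_kb(12, 4, 4096, 2.0))  # 81920
```

Both kinds of operation contribute to the same total, which is why "and" (or "and/or") describes the situation better than "or".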



Re: Correct the documentation for work_mem

From
Bruce Momjian
Date:
On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
> On Sat, 9 Sept 2023 at 14:25, Imseih (AWS), Sami <simseih@amazon.com> wrote:
> >
> > > This looks mostly fine to me modulo "sort or hash". I do see many
> > > instances of "and/or" in the docs. Maybe that would work better.
> >
> > "sort or hash operations at the same time" is clear explanation IMO.
> 
> Just for anyone else following along that haven't seen the patch. The
> full text in question is:
> 
> +        Note that a complex query might perform several sort or hash
> +        operations at the same time, with each operation generally being
> 
> It's certainly not a show-stopper. I do believe the patch makes some
> improvements.  The reason I'd prefer to see either "and" or "and/or"
> in place of "or" is because the text is trying to imply that many of
> these operations can run at the same time. I'm struggling to
> understand why, given that there could be many sorts and many hashes
> going on at once that we'd claim it could only be one *or* the other.
> If we have 12 sorts and 4 hashes then that's not "several sort or hash
> operations", it's "several sort and hash operations".  Of course, it
> could just be sorts or just hashes, so "and/or" works fine for that.

Yes, I see your point and went with "and",   updated patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachments

Re: Correct the documentation for work_mem

From
David Rowley
Date:
On Tue, 12 Sept 2023 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
> > It's certainly not a show-stopper. I do believe the patch makes some
> > improvements.  The reason I'd prefer to see either "and" or "and/or"
> > in place of "or" is because the text is trying to imply that many of
> > these operations can run at the same time. I'm struggling to
> > understand why, given that there could be many sorts and many hashes
> > going on at once that we'd claim it could only be one *or* the other.
> > If we have 12 sorts and 4 hashes then that's not "several sort or hash
> > operations", it's "several sort and hash operations".  Of course, it
> > could just be sorts or just hashes, so "and/or" works fine for that.
>
> Yes, I see your point and went with "and",   updated patch attached.

Looks good to me.

David



Re: Correct the documentation for work_mem

From
Bruce Momjian
Date:
On Wed, Sep 27, 2023 at 02:05:44AM +1300, David Rowley wrote:
> On Tue, 12 Sept 2023 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
> > > It's certainly not a show-stopper. I do believe the patch makes some
> > > improvements.  The reason I'd prefer to see either "and" or "and/or"
> > > in place of "or" is because the text is trying to imply that many of
> > > these operations can run at the same time. I'm struggling to
> > > understand why, given that there could be many sorts and many hashes
> > > going on at once that we'd claim it could only be one *or* the other.
> > > If we have 12 sorts and 4 hashes then that's not "several sort or hash
> > > operations", it's "several sort and hash operations".  Of course, it
> > > could just be sorts or just hashes, so "and/or" works fine for that.
> >
> > Yes, I see your point and went with "and",   updated patch attached.
> 
> Looks good to me.

Patch applied back to Postgres 11.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.