Discussion: Correct the documentation for work_mem
Hi,
I recently noticed the following in the work_mem [1] documentation:
“Note that for a complex query, several sort or hash operations might be running in parallel;”
The use of “parallel” here is misleading as this has nothing to do with parallel query, but
rather several operations in a plan running simultaneously.
The use of parallel in this doc predates parallel query support, which explains the usage.
I suggest a small doc fix:
“Note that for a complex query, several sort or hash operations might be running simultaneously;”
This should also be backpatched to the docs of all supported versions.
Thoughts?
Regards,
Sami Imseih
Amazon Web Services (AWS)
1. https://www.postgresql.org/docs/current/runtime-config-resource.html
On 21.04.23 16:28, Imseih (AWS), Sami wrote:
> I recently noticed the following in the work_mem [1] documentation:
>
> “Note that for a complex query, several sort or hash operations might be
> running in parallel;”
>
> The use of “parallel” here is misleading as this has nothing to do with
> parallel query, but
>
> rather several operations in a plan running simultaneously.
>
> The use of parallel in this doc predates parallel query support, which
> explains the usage.
>
> I suggest a small doc fix:
>
> “Note that for a complex query, several sort or hash operations might be
> running simultaneously;”

Here is a discussion of these terms:
https://takuti.me/note/parallel-vs-concurrent/

I think "concurrently" is the correct word here.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 21.04.23 16:28, Imseih (AWS), Sami wrote:
>> I suggest a small doc fix:
>> “Note that for a complex query, several sort or hash operations might be
>> running simultaneously;”

> Here is a discussion of these terms:
> https://takuti.me/note/parallel-vs-concurrent/

> I think "concurrently" is the correct word here.

Probably, but it'd do little to remove the confusion Sami is on about,
especially since the next sentence uses "concurrently" to describe the
other case. I think we need a more thorough rewording, perhaps like

-        Note that for a complex query, several sort or hash operations might be
-        running in parallel; each operation will generally be allowed
+        Note that a complex query may include several sort or hash
+        operations; each such operation will generally be allowed
         to use as much memory as this value specifies before it starts
         to write data into temporary files. Also, several running
         sessions could be doing such operations concurrently.

I also find this wording a bit further down to be poor:

         Hash-based operations are generally more sensitive to memory
         availability than equivalent sort-based operations. The
         memory available for hash tables is computed by multiplying
         <varname>work_mem</varname> by
         <varname>hash_mem_multiplier</varname>. This makes it

I think "available" is not le mot juste, and it's also unclear from
this whether we're speaking of the per-hash-table limit or some
(nonexistent) overall limit. How about

-        memory available for hash tables is computed by multiplying
+        memory limit for a hash table is computed by multiplying

regards, tom lane
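
For readers following along, the per-hash-table limit described in the quoted doc text can be sketched as a simple multiplication. This is only an illustration of the arithmetic, not PostgreSQL's actual code; the function name and the example settings (work_mem of 4MB, hash_mem_multiplier of 2.0) are assumptions chosen for the example.

```python
def hash_mem_limit_kb(work_mem_kb: int, hash_mem_multiplier: float) -> int:
    """Memory limit for ONE hash table, in kB.

    Hypothetical helper: mirrors the doc sentence "work_mem multiplied by
    hash_mem_multiplier". It is a per-hash-table limit, not an overall cap.
    """
    return int(work_mem_kb * hash_mem_multiplier)


if __name__ == "__main__":
    # Illustrative settings only: work_mem = 4096 kB, multiplier = 2.0.
    print(hash_mem_limit_kb(4096, 2.0))  # 8192
```

Each hash table in a plan gets its own limit of this size, which is the point of saying "a hash table" rather than "hash tables".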
On Fri, Apr 21, 2023 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > On 21.04.23 16:28, Imseih (AWS), Sami wrote:
> >> I suggest a small doc fix:
> >> “Note that for a complex query, several sort or hash operations might be
> >> running simultaneously;”
>
> > Here is a discussion of these terms:
> > https://takuti.me/note/parallel-vs-concurrent/
>
> > I think "concurrently" is the correct word here.
>
> Probably, but it'd do little to remove the confusion Sami is on about,

+1.

When discussing this internally, Sami's proposal was in fact to use
the word 'concurrently'. But given that when it comes to computers and
programming, it's common for someone to not understand the intricate
difference between the two terms, we thought it's best to not use any
of those, and instead use a word not usually associated with
programming and algorithms.

Aside: Another pair of words I see regularly used interchangeably,
when in fact they mean different things: precise vs. accurate.

> especially since the next sentence uses "concurrently" to describe the
> other case. I think we need a more thorough rewording, perhaps like
>
> -        Note that for a complex query, several sort or hash operations might be
> -        running in parallel; each operation will generally be allowed
> +        Note that a complex query may include several sort or hash
> +        operations; each such operation will generally be allowed

This wording doesn't seem to bring out the fact that there could be
more than one work_mem consumer running (in-progress) at the same
time. The reader could mistake it to mean hashes and sorts in a
complex query may happen one after the other.

+        Note that a complex query may include several sort and hash operations, and
+        more than one of these operations may be in progress simultaneously at any
+        given time; each such operation will generally be allowed

I believe the phrase "several sort _and_ hash" better describes the
possible composition of a complex query, than does "several sort _or_
hash".

> I also find this wording a bit further down to be poor:
>
>          Hash-based operations are generally more sensitive to memory
>          availability than equivalent sort-based operations. The
>          memory available for hash tables is computed by multiplying
>          <varname>work_mem</varname> by
>          <varname>hash_mem_multiplier</varname>. This makes it
>
> I think "available" is not le mot juste, and it's also unclear from
> this whether we're speaking of the per-hash-table limit or some
> (nonexistent) overall limit. How about
>
> -        memory available for hash tables is computed by multiplying
> +        memory limit for a hash table is computed by multiplying

+1

Best regards,
Gurjeet
https://Gurje.et
Postgres Contributors Team, http://aws.amazon.com
> > especially since the next sentence uses "concurrently" to describe the
> > other case. I think we need a more thorough rewording, perhaps like
> >
> > -        Note that for a complex query, several sort or hash operations might be
> > -        running in parallel; each operation will generally be allowed
> > +        Note that a complex query may include several sort or hash
> > +        operations; each such operation will generally be allowed

> This wording doesn't seem to bring out the fact that there could be
> more than one work_mem consumer running (in-progress) at the same
> time.

Do you mean, more than one work_mem consumer running at the same
time for a given query? If so, that is precisely the point we need to
convey in the docs.

i.e. if I have 2 sorts in a query that can use up to 4MB each, at some
point in the query execution, I can have 8MB of memory allocated.

Regards,

Sami Imseih
Amazon Web Services (AWS)
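
The arithmetic in the example above (2 sorts at up to 4MB each giving 8MB) can be written out as a small sketch. This is a rough estimate under stated assumptions, not how PostgreSQL accounts for memory: it assumes every sort and hash in the plan is in progress at the same moment and that each one reaches its limit. The function and parameter names are hypothetical.

```python
def worst_case_query_mem_kb(
    work_mem_kb: int,
    n_sorts: int,
    n_hashes: int = 0,
    hash_mem_multiplier: float = 1.0,
) -> int:
    """Rough estimate (kB) of memory one query execution might use at once.

    Assumption: all sort and hash operations run at the same time and each
    one uses its full limit. Real usage is often lower, and the docs only
    say the limit "generally" applies.
    """
    sort_total = n_sorts * work_mem_kb
    # Each hash table is limited to work_mem * hash_mem_multiplier.
    hash_total = int(n_hashes * work_mem_kb * hash_mem_multiplier)
    return sort_total + hash_total


if __name__ == "__main__":
    # Two sorts, work_mem = 4MB (4096 kB): 8192 kB, i.e. 8MB.
    print(worst_case_query_mem_kb(4096, n_sorts=2))
```

The same estimate would then apply to each session separately, since several sessions can be running such queries concurrently.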
Based on the feedback, here is a v1 of the suggested doc changes.

I modified Gurjeet's suggestion slightly to make it clear that a specific
query execution could have operations simultaneously using up to
work_mem.

I also added the small hash table memory limit clarification.

Regards,

Sami Imseih
Amazon Web Services (AWS)
Attachments
On Tue, 25 Apr 2023 at 04:20, Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Based on the feedback, here is a v1 of the suggested doc changes.
>
> I modified Gurjeets suggestion slightly to make it clear that a specific
> query execution could have operations simultaneously using up to
> work_mem.

> -        Note that for a complex query, several sort or hash operations might be
> -        running in parallel; each operation will generally be allowed
> +        Note that a complex query may include several sort and hash operations,
> +        and more than one of these operations may be in progress simultaneously
> +        for a given query execution; each such operation will generally be allowed
>          to use as much memory as this value specifies before it starts
>          to write data into temporary files. Also, several running
>          sessions could be doing such operations concurrently.

I'm wondering about adding "and more than one of these operations may
be in progress simultaneously". Are you talking about concurrent
sessions running other queries which are using work_mem too? If so,
isn't that already covered by the final sentence in the quoted text
above? If not, what is running simultaneously?

I think Tom's suggestion looks fine. I'd maybe change "sort or hash"
to "sort and hash" per the suggestion from Gurjeet above.

David
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Hello,

I've reviewed and built the documentation for the updated patch. As it stands right now I think the documentation for this section is quite clear.

> I'm wondering about adding "and more than one of these operations may
> be in progress simultaneously". Are you talking about concurrent
> sessions running other queries which are using work_mem too?

This appears to be referring to the "sort and hash" operations mentioned prior.

> If so,
> isn't that already covered by the final sentence in the quoted text
> above? if not, what is running simultaneously?

I believe the last sentence is referring to another session that is running its own sort and hash operations. So the first section you mention is describing how sort and hash operations can be in execution at the same time for a query, while the second refers to how sessions may overlap in their execution of sort and hash operations, if I am understanding this correctly.

I also agree that changing "sort or hash" to "sort and hash" is a better description.

Tristen
Hi,

Sorry for the delay in response and thanks for the feedback!

> I've reviewed and built the documentation for the updated patch. As it stands right now I think the documentation for this section is quite clear.

Sorry, I am not understanding. What is clear? The current documentation -or- the proposed documentation in the patch?

>> I'm wondering about adding "and more than one of these operations may
>> be in progress simultaneously". Are you talking about concurrent
>> sessions running other queries which are using work_mem too?

> This appears to be referring to the "sort and hash" operations mentioned prior.

Correct, this is not referring to multiple sessions, but a given execution could have multiple operations that are each using up to work_mem simultaneously.

> I also agree that changing "sort or hash" to "sort and hash" is a better description.

That is addressed in the last revision of the patch.

-        Note that for a complex query, several sort or hash operations might be
-        running in parallel; each operation will generally be allowed
+        Note that a complex query may include several sort and hash operations,

Regards,

Sami
On Fri, Apr 21, 2023 at 01:15:01PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > On 21.04.23 16:28, Imseih (AWS), Sami wrote:
> >> I suggest a small doc fix:
> >> “Note that for a complex query, several sort or hash operations might be
> >> running simultaneously;”
>
> > Here is a discussion of these terms:
> > https://takuti.me/note/parallel-vs-concurrent/
>
> > I think "concurrently" is the correct word here.
>
> Probably, but it'd do little to remove the confusion Sami is on about,
> especially since the next sentence uses "concurrently" to describe the
> other case. I think we need a more thorough rewording, perhaps like
>
> -        Note that for a complex query, several sort or hash operations might be
> -        running in parallel; each operation will generally be allowed
> +        Note that a complex query may include several sort or hash
> +        operations; each such operation will generally be allowed
>          to use as much memory as this value specifies before it starts
>          to write data into temporary files. Also, several running
>          sessions could be doing such operations concurrently.
>
> I also find this wording a bit further down to be poor:
>
>          Hash-based operations are generally more sensitive to memory
>          availability than equivalent sort-based operations. The
>          memory available for hash tables is computed by multiplying
>          <varname>work_mem</varname> by
>          <varname>hash_mem_multiplier</varname>. This makes it
>
> I think "available" is not le mot juste, and it's also unclear from
> this whether we're speaking of the per-hash-table limit or some
> (nonexistent) overall limit. How about
>
> -        memory available for hash tables is computed by multiplying
> +        memory limit for a hash table is computed by multiplying

Adjusted patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.
Attachments
On Fri, 8 Sept 2023 at 15:24, Bruce Momjian <bruce@momjian.us> wrote:
> Adjusted patch attached.

This looks mostly fine to me modulo "sort or hash". I do see many
instances of "and/or" in the docs. Maybe that would work better.

David
> This looks mostly fine to me modulo "sort or hash". I do see many
> instances of "and/or" in the docs. Maybe that would work better.

"sort or hash operations at the same time" is a clear explanation IMO.

This latest version of the patch looks good to me.

Regards,

Sami
On Sat, 9 Sept 2023 at 14:25, Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > This looks mostly fine to me modulo "sort or hash". I do see many
> > instances of "and/or" in the docs. Maybe that would work better.
>
> "sort or hash operations at the same time" is clear explanation IMO.

Just for anyone else following along that haven't seen the patch. The
full text in question is:

+        Note that a complex query might perform several sort or hash
+        operations at the same time, with each operation generally being

It's certainly not a show-stopper. I do believe the patch makes some
improvements. The reason I'd prefer to see either "and" or "and/or"
in place of "or" is because the text is trying to imply that many of
these operations can run at the same time. I'm struggling to
understand why, given that there could be many sorts and many hashes
going on at once that we'd claim it could only be one *or* the other.
If we have 12 sorts and 4 hashes then that's not "several sort or hash
operations", it's "several sort and hash operations". Of course, it
could just be sorts or just hashes, so "and/or" works fine for that.

David
On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
> On Sat, 9 Sept 2023 at 14:25, Imseih (AWS), Sami <simseih@amazon.com> wrote:
> >
> > > This looks mostly fine to me modulo "sort or hash". I do see many
> > > instances of "and/or" in the docs. Maybe that would work better.
> >
> > "sort or hash operations at the same time" is clear explanation IMO.
>
> Just for anyone else following along that haven't seen the patch. The
> full text in question is:
>
> +        Note that a complex query might perform several sort or hash
> +        operations at the same time, with each operation generally being
>
> It's certainly not a show-stopper. I do believe the patch makes some
> improvements. The reason I'd prefer to see either "and" or "and/or"
> in place of "or" is because the text is trying to imply that many of
> these operations can run at the same time. I'm struggling to
> understand why, given that there could be many sorts and many hashes
> going on at once that we'd claim it could only be one *or* the other.
> If we have 12 sorts and 4 hashes then that's not "several sort or hash
> operations", it's "several sort and hash operations". Of course, it
> could just be sorts or just hashes, so "and/or" works fine for that.

Yes, I see your point and went with "and", updated patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.
Attachments
On Tue, 12 Sept 2023 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
> > It's certainly not a show-stopper. I do believe the patch makes some
> > improvements. The reason I'd prefer to see either "and" or "and/or"
> > in place of "or" is because the text is trying to imply that many of
> > these operations can run at the same time. I'm struggling to
> > understand why, given that there could be many sorts and many hashes
> > going on at once that we'd claim it could only be one *or* the other.
> > If we have 12 sorts and 4 hashes then that's not "several sort or hash
> > operations", it's "several sort and hash operations". Of course, it
> > could just be sorts or just hashes, so "and/or" works fine for that.
>
> Yes, I see your point and went with "and", updated patch attached.

Looks good to me.

David
On Wed, Sep 27, 2023 at 02:05:44AM +1300, David Rowley wrote:
> On Tue, 12 Sept 2023 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
> > > It's certainly not a show-stopper. I do believe the patch makes some
> > > improvements. The reason I'd prefer to see either "and" or "and/or"
> > > in place of "or" is because the text is trying to imply that many of
> > > these operations can run at the same time. I'm struggling to
> > > understand why, given that there could be many sorts and many hashes
> > > going on at once that we'd claim it could only be one *or* the other.
> > > If we have 12 sorts and 4 hashes then that's not "several sort or hash
> > > operations", it's "several sort and hash operations". Of course, it
> > > could just be sorts or just hashes, so "and/or" works fine for that.
> >
> > Yes, I see your point and went with "and", updated patch attached.
>
> Looks good to me.

Patch applied back to Postgres 11.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.