Обсуждение: proposal: make NOTIFY list de-duplication optional
- new GUC in "Statement Behaviour" section, notify_duplicate_removal (default true)
Initial discussion in this thread: http://www.postgresql.org/message-id/CAP_rwwmpzk9=SbjRZTOd05bDctyC43wNKnu_m37dYGvL4SAeSw@mail.gmail.com
Rationale: for some legitimate use cases, duplicate removal is not required, and it gets O(N^2) cost on large COPY/ insert transactions.
Вложения
On Fri, Feb 5, 2016 at 10:17 AM, Filip Rembiałkowski <filip.rembialkowski@gmail.com> wrote: > - new GUC in "Statement Behaviour" section, notify_duplicate_removal > (default true) > > Initial discussion in this thread: > http://www.postgresql.org/message-id/CAP_rwwmpzk9=SbjRZTOd05bDctyC43wNKnu_m37dYGvL4SAeSw@mail.gmail.com > > Rationale: for some legitimate use cases, duplicate removal is not required, > and it gets O(N^2) cost on large COPY/ insert transactions. I agree with what Merlin said about this: http://www.postgresql.org/message-id/CAHyXU0yoHe8Qc=yC10AHU1nFiA1tbHsg+35Ds-oEueUapo7t4g@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Feb 5, 2016 at 10:17 AM, Filip Rembiałkowski > <filip.rembialkowski@gmail.com> wrote: >> - new GUC in "Statement Behaviour" section, notify_duplicate_removal > I agree with what Merlin said about this: > http://www.postgresql.org/message-id/CAHyXU0yoHe8Qc=yC10AHU1nFiA1tbHsg+35Ds-oEueUapo7t4g@mail.gmail.com Yeah, I agree that a GUC for this is quite unappetizing. One idea would be to build a hashtable to aid with duplicate detection (perhaps only once the pending-notify list gets long). Another thought is that it's already the case that duplicate detection is something of a "best effort" activity; note for example the comment in AsyncExistsPendingNotify pointing out that we don't collapse duplicates across subtransactions. Would it be acceptable to relax the standards a bit further? For example, if we only checked for duplicates among the last N notification list entries (for N say around 100), we'd probably cover just about all the useful cases, and the runtime would stay linear. The data structure isn't tremendously conducive to that, but it could be done. regards, tom lane
On Sat, 6 Feb 2016 at 12:50 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> I agree with what Merlin said about this:
> http://www.postgresql.org/message-id/CAHyXU0yoHe8Qc=yC10AHU1nFiA1tbHsg+35Ds-oEueUapo7t4g@mail.gmail.com
Yeah, I agree that a GUC for this is quite unappetizing.
How would you feel about a variant for calling NOTIFY?
The SQL syntax could be something like "NOTIFY [ALL] channel, payload" where the ALL means "just send the notification already, nobody cares whether there's an identical one in the queue".
Likewise we could introduce a three-argument form of pg_notify(text, text, bool) where the final argument is whether you are interested in removing duplicates.
Optimising the remove-duplicates path is still probably a worthwhile endeavour, but if the user really doesn't care at all about duplication, it seems silly to force them to pay any performance price for a behaviour they didn't want, no?
Cheers,
BJ
Brendan Jurd <direvus@gmail.com> writes: > On Sat, 6 Feb 2016 at 12:50 Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I agree that a GUC for this is quite unappetizing. > How would you feel about a variant for calling NOTIFY? If we decide that this ought to be user-visible, then an extra NOTIFY parameter would be the way to do it. I'd much rather it "just works" though. In particular, if we do start advertising user control of de-duplication, we are likely to start getting bug reports about every case where it's inexact, eg the no-checks-across-subxact-boundaries business. > Optimising the remove-duplicates path is still probably a worthwhile > endeavour, but if the user really doesn't care at all about duplication, it > seems silly to force them to pay any performance price for a behaviour they > didn't want, no? I would only be impressed with that argument if it could be shown that de-duplication was a significant fraction of the total cost of a typical NOTIFY cycle. Obviously, you can make the O(N^2) term dominate if you try, but I really doubt that it's significant for reasonable numbers of notify events per transaction. One should also keep in mind that duplicate events are going to cost extra processing on the client-application side, too. In my experience with using NOTIFY, that cost probably dwarfs the cost of emitting the messages. regards, tom lane
On 02/05/2016 08:49 PM, Tom Lane wrote: > Yeah, I agree that a GUC for this is quite unappetizing. Agreed. > > One idea would be to build a hashtable to aid with duplicate detection > (perhaps only once the pending-notify list gets long). > > Another thought is that it's already the case that duplicate detection is > something of a "best effort" activity; note for example the comment in > AsyncExistsPendingNotify pointing out that we don't collapse duplicates > across subtransactions. Would it be acceptable to relax the standards > a bit further? For example, if we only checked for duplicates among the > last N notification list entries (for N say around 100), we'd probably > cover just about all the useful cases, and the runtime would stay linear. > The data structure isn't tremendously conducive to that, but it could be > done. > > I like the hashtable idea if it can be made workable. cheers andrew
On Sat, Feb 6, 2016 at 5:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Brendan Jurd <direvus@gmail.com> writes: >> On Sat, 6 Feb 2016 at 12:50 Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, I agree that a GUC for this is quite unappetizing. > >> How would you feel about a variant for calling NOTIFY? > > If we decide that this ought to be user-visible, then an extra NOTIFY > parameter would be the way to do it. I'd much rather it "just works" > though. In particular, if we do start advertising user control of > de-duplication, we are likely to start getting bug reports about every > case where it's inexact, eg the no-checks-across-subxact-boundaries > business. It is not enough to say "database server can decide to deliver a single notification only." - which is already said in the docs? The ALL keyword would be a clearly separated "do-nothing" version. > >> Optimising the remove-duplicates path is still probably a worthwhile >> endeavour, but if the user really doesn't care at all about duplication, it >> seems silly to force them to pay any performance price for a behaviour they >> didn't want, no? > > I would only be impressed with that argument if it could be shown that > de-duplication was a significant fraction of the total cost of a typical > NOTIFY cycle. Even if a typical NOTIFY cycle excludes processing 10k or 100k messages, why penalize users who have bigger transactions? > Obviously, you can make the O(N^2) term dominate if you > try, but I really doubt that it's significant for reasonable numbers of > notify events per transaction. Yes, it is hard to observe for less than few thousands messages in one transaction. But big data happens. And then the numbers get really bad. In my test for 40k messages, it is 400 ms versus 9 seconds. 22 times slower. For 200k messages, it is 2 seconds versus 250 seconds. 125 times slower. And I tested with very short payload strings, so strcmp() had not much to do.
+1 ... and a patch (only adding ALL keyword, no hash table implemented yet). On Sat, Feb 6, 2016 at 2:35 PM, Brendan Jurd <direvus@gmail.com> wrote: > On Sat, 6 Feb 2016 at 12:50 Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Robert Haas <robertmhaas@gmail.com> writes: >> > I agree with what Merlin said about this: >> > >> > http://www.postgresql.org/message-id/CAHyXU0yoHe8Qc=yC10AHU1nFiA1tbHsg+35Ds-oEueUapo7t4g@mail.gmail.com >> >> Yeah, I agree that a GUC for this is quite unappetizing. > > > How would you feel about a variant for calling NOTIFY? > > The SQL syntax could be something like "NOTIFY [ALL] channel, payload" where > the ALL means "just send the notification already, nobody cares whether > there's an identical one in the queue". > > Likewise we could introduce a three-argument form of pg_notify(text, text, > bool) where the final argument is whether you are interested in removing > duplicates. > > Optimising the remove-duplicates path is still probably a worthwhile > endeavour, but if the user really doesn't care at all about duplication, it > seems silly to force them to pay any performance price for a behaviour they > didn't want, no? > > Cheers, > BJ
Вложения
On 02/07/2016 03:42 AM, Filip Rembiałkowski wrote: > +1 > > ... and a patch (only adding ALL keyword, no hash table implemented yet). Please stop top-posting, it's very disruptive. My comments are below, where they belong. > On Sat, Feb 6, 2016 at 2:35 PM, Brendan Jurd <direvus@gmail.com> wrote: >> On Sat, 6 Feb 2016 at 12:50 Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> I agree with what Merlin said about this: >>>> >>>> http://www.postgresql.org/message-id/CAHyXU0yoHe8Qc=yC10AHU1nFiA1tbHsg+35Ds-oEueUapo7t4g@mail.gmail.com >>> >>> Yeah, I agree that a GUC for this is quite unappetizing. >> >> >> How would you feel about a variant for calling NOTIFY? >> >> The SQL syntax could be something like "NOTIFY [ALL] channel, payload" where >> the ALL means "just send the notification already, nobody cares whether >> there's an identical one in the queue". >> >> Likewise we could introduce a three-argument form of pg_notify(text, text, >> bool) where the final argument is whether you are interested in removing >> duplicates. >> >> Optimising the remove-duplicates path is still probably a worthwhile >> endeavour, but if the user really doesn't care at all about duplication, it >> seems silly to force them to pay any performance price for a behaviour they >> didn't want, no? On 02/07/2016 03:42 AM, Filip Rembiałkowski wrote: > +1 > > ... and a patch (only adding ALL keyword, no hash table implemented yet). I only read through the patch, I didn't compile it or test it, but I have a few comments: You left the duplicate behavior with subtransactions, but didn't mention it in the documentation. If I do NOTIFY DISTINCT chan, 'msg'; then I expect only distinct notifications but I'll get duplicates if I'm in a subtransaction. Either the documentation or the code needs to be fixed. I seem to remember some discussion about not using DEFAULT parameters in system functions so you should leave the old function alone and create a new function with your use_all parameter. I don't recall the exact reason why so hopefully someone else will enlighten me. There is also no mention in the documentation about what happens if I do: NOTIFY ALL chan, 'msg'; NOTIFY ALL chan, 'msg'; NOTIFY DISTINCT chan, 'msg'; NOTIFY ALL chan, 'msg'; Without testing, I'd say I'd get two messages, but it should be explicitly mentioned somewhere. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Feb 7, 2016 at 11:54 AM, Vik Fearing <vik@2ndquadrant.fr> wrote: > On 02/07/2016 03:42 AM, Filip Rembiałkowski wrote: > You left the duplicate behavior with subtransactions, but didn't mention > it in the documentation. If I do NOTIFY DISTINCT chan, 'msg'; then I > expect only distinct notifications but I'll get duplicates if I'm in a > subtransaction. Either the documentation or the code needs to be fixed. agreed > > I seem to remember some discussion about not using DEFAULT parameters in > system functions so you should leave the old function alone and create a > new function with your use_all parameter. I don't recall the exact > reason why so hopefully someone else will enlighten me. I'm quite new to this; how do I pinpoint proper OID for a new catalog object (function, in this case)? Is there a better way than browsing fmgr files and guessing next available oid? > > There is also no mention in the documentation about what happens if I do: > > NOTIFY ALL chan, 'msg'; > NOTIFY ALL chan, 'msg'; > NOTIFY DISTINCT chan, 'msg'; > NOTIFY ALL chan, 'msg'; > > Without testing, I'd say I'd get two messages, but it should be > explicitly mentioned somewhere. If it's four separate transactions, LISTEN'er should get four. If it's in one transaction, LISTEN'er should get three. > -- > Vik Fearing +33 6 46 75 15 36 > http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 02/07/2016 04:00 PM, Filip Rembiałkowski wrote: > On Sun, Feb 7, 2016 at 11:54 AM, Vik Fearing <vik@2ndquadrant.fr> wrote: >> I seem to remember some discussion about not using DEFAULT parameters in >> system functions so you should leave the old function alone and create a >> new function with your use_all parameter. I don't recall the exact >> reason why so hopefully someone else will enlighten me. > > I'm quite new to this; how do I pinpoint proper OID for a new catalog > object (function, in this case)? > Is there a better way than browsing fmgr files and guessing next available oid? There is a shell script called `unused_oids` in src/include/catalog/. >> There is also no mention in the documentation about what happens if I do: >> >> NOTIFY ALL chan, 'msg'; >> NOTIFY ALL chan, 'msg'; >> NOTIFY DISTINCT chan, 'msg'; >> NOTIFY ALL chan, 'msg'; >> >> Without testing, I'd say I'd get two messages, but it should be >> explicitly mentioned somewhere. > > If it's four separate transactions, LISTEN'er should get four. The question was for one transaction, I should have been clearer about that. > If it's in one transaction, LISTEN'er should get three. This is surprising to me, I would think it would get only two. What is your rationale for three? Compare with the behavior of: select 1 union all select 1 union distinct select 1 union all select 1; -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Feb 7, 2016 at 4:37 PM, Vik Fearing <vik@2ndquadrant.fr> wrote: >>> There is also no mention in the documentation about what happens if I do: >>> >>> NOTIFY ALL chan, 'msg'; >>> NOTIFY ALL chan, 'msg'; >>> NOTIFY DISTINCT chan, 'msg'; >>> NOTIFY ALL chan, 'msg'; >>> >>> Without testing, I'd say I'd get two messages, but it should be >>> explicitly mentioned somewhere. >> >> If it's four separate transactions, LISTEN'er should get four. > > The question was for one transaction, I should have been clearer about that. > >> If it's in one transaction, LISTEN'er should get three. > > This is surprising to me, I would think it would get only two. What is > your rationale for three? > It is a single transaction, but four separate commands. >>> NOTIFY ALL chan, 'msg'; -- send the message, save in the list/hash >>> NOTIFY ALL chan, 'msg'; -- ALL was specified, send the message even if it is on the list/hash >>> NOTIFY DISTINCT chan, 'msg'; -- default mode, skip message because it's in the list/hash >>> NOTIFY ALL chan, 'msg'; -- ALL was specified, send the message even if it is hashed/saved
On 8 February 2016 at 09:37, Filip Rembiałkowski <filip.rembialkowski@gmail.com> wrote:
On Sun, Feb 7, 2016 at 4:37 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
>>> There is also no mention in the documentation about what happens if I do:
>>>
>>> NOTIFY ALL chan, 'msg';
>>> NOTIFY ALL chan, 'msg';
>>> NOTIFY DISTINCT chan, 'msg';
>>> NOTIFY ALL chan, 'msg';
>>>
>>> Without testing, I'd say I'd get two messages, but it should be
>>> explicitly mentioned somewhere.
>>
>> If it's four separate transactions, LISTEN'er should get four.
>
> The question was for one transaction, I should have been clearer about that.
>
>> If it's in one transaction, LISTEN'er should get three.
>
> This is surprising to me, I would think it would get only two. What is
> your rationale for three?
>
It is a single transaction, but four separate commands.
>>> NOTIFY ALL chan, 'msg';
-- send the message, save in the list/hash
>>> NOTIFY ALL chan, 'msg';
-- ALL was specified, send the message even if it is on the list/hash
>>> NOTIFY DISTINCT chan, 'msg';
-- default mode, skip message because it's in the list/hash
>>> NOTIFY ALL chan, 'msg';
-- ALL was specified, send the message even if it is hashed/saved
So in total three messages are sent?
Would it be correct to say that if ALL is specified then a message is queued no matter what. If DISTINCT is specified then it is only queued if no message with the same channel and argument is already queued for delivery. Using DISTINCT can never decrease the total number of messages to be sent.
Right?
If so, I think that's the right behaviour and the docs just need to be explicit - an example like the above would be good, translated to be friendlier to those who don't know the internal mechanics.
I've found the deduplication functionality of NOTIFY very frustrating in the past and I see this as a significant improvement. Sometimes the *number of times* something happened is significant too...
On Mon, Feb 8, 2016 at 1:52 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > Would it be correct to say that if ALL is specified then a message is queued > no matter what. If DISTINCT is specified then it is only queued if no > message with the same channel and argument is already queued for delivery. Yes, exactly. > Using DISTINCT can never decrease the total number of messages to be sent. This sentence does not sound true. DISTINCT is the default, old behaviour. It *can* decrease total number of messages (by deduplication) > I've found the deduplication functionality of NOTIFY very frustrating in the past > and I see this as a significant improvement. Sometimes the *number of times* > something happened is significant too... yep, same idea here. Here is my next try, after suggestions from -perf and -hackers list: * no GUC * small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT * corresponding, 3-argument version of pg_notify(text,text,bool) * updated the docs to include new syntax and clarify behavior * no hashtable in AsyncExistsPendingNotify (I don't see much sense in that part; and it can be well done separately from this)
Вложения
On 02/08/2016 09:33 PM, Filip Rembiałkowski wrote: > Here is my next try, after suggestions from -perf and -hackers list: > > * no GUC > > * small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT > > * corresponding, 3-argument version of pg_notify(text,text,bool) > > * updated the docs to include new syntax and clarify behavior > > * no hashtable in AsyncExistsPendingNotify > (I don't see much sense in that part; and it can be well done > separately from this) Please add this to the next commitfest: https://commitfest.postgresql.org/9/new/ -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Feb 8, 2016 at 2:33 PM, Filip Rembiałkowski <filip.rembialkowski@gmail.com> wrote: > Here is my next try, after suggestions from -perf and -hackers list: > > * no GUC > > * small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT > > * corresponding, 3-argument version of pg_notify(text,text,bool) This is all sounding pretty good. I wonder if the third argument should be a boolean however. If we make it 'text, 'send mode', instead, we could leave some room for more specialization of the queuing behavior. For example, we've had a couple of requests over the years to have an 'immediate' mode which dumps the notification immediately to the client without waiting for tx commit. This may or may not be a good idea, but if it was ultimately proved to be, it could be introduced as an alternate mode without adding an extra function. merlin
On Tue, Feb 9, 2016 at 12:15 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > I wonder if the third argument > should be a boolean however. If we make it 'text, 'send mode', > instead, we could leave some room for more specialization of the > queuing behavior. > > For example, we've had a couple of requests over the years to have an > 'immediate' mode which dumps the notification immediately to the > client without waiting for tx commit. This may or may not be a good > idea, but if it was ultimately proved to be, it could be introduced as > an alternate mode without adding an extra function. But then it becomes disputable if SQL syntax change makes sense. ---we had this,NOTIFY channel [ , payload ] ---and in this patch we have this NOTIFY [ ALL | DISTINCT ] channel [ , payload ]--- but maybe we should have this? NOTIFY channel [ , payload [ , mode ] ] I'm not sure which direction is better with non-standard SQL additions. Recycling keywords or adding more commas?
On Tue, Feb 9, 2016 at 2:16 PM, Filip Rembiałkowski <filip.rembialkowski@gmail.com> wrote: > But then it becomes disputable if SQL syntax change makes sense. > > ---we had this, > NOTIFY channel [ , payload ] > ---and in this patch we have this > NOTIFY [ ALL | DISTINCT ] channel [ , payload ] > --- but maybe we should have this? > NOTIFY channel [ , payload [ , mode ] ] I think using ALL to mean "don't worry about de-duplication" could be a bit confusing, especially as there was some interest recently in supporting wildcard notifications: http://www.postgresql.org/message-id/52693FC5.7070507@gmail.com and conceivably we might want to support a way to notify all listeners, i.e. NOTIFY * as proposed in that thread. If we ever supported wildcard notifies, ALL may be easily confused to mean "all channel names". What about adopting the options-inside-parentheses format, the way EXPLAIN does nowadays, something like: NOTIFY (DEDUPLICATE FALSE, MODE IMMEDIATE) mychannel; Josh
Josh Kupershmidt <schmiddy@gmail.com> writes: > On Tue, Feb 9, 2016 at 2:16 PM, Filip Rembiałkowski > <filip.rembialkowski@gmail.com> wrote: >> But then it becomes disputable if SQL syntax change makes sense. >> >> ---we had this, >> NOTIFY channel [ , payload ] >> ---and in this patch we have this >> NOTIFY [ ALL | DISTINCT ] channel [ , payload ] >> --- but maybe we should have this? >> NOTIFY channel [ , payload [ , mode ] ] > What about adopting the options-inside-parentheses format, the way > EXPLAIN does nowadays, something like: > NOTIFY (DEDUPLICATE FALSE, MODE IMMEDIATE) mychannel; FWIW, I think it would be a good thing if the NOTIFY statement syntax were not remarkably different from the syntax used in the pg_notify() function call. To do otherwise would certainly be confusing. So on the whole I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option. regards, tom lane
Another update - separated new internal function to satisfy opr_sanity.sql
Вложения
On 2/9/16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > FWIW, I think it would be a good thing if the NOTIFY statement syntax were > not remarkably different from the syntax used in the pg_notify() function > call. To do otherwise would certainly be confusing. So on the whole > I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option. I'm quite interested in getting this addressed in time for 9.6 as I'll be using NOTIFY extensively in a project and I agree with Craig that the deduplication is frustrating both because you sometimes want every event and because it can apparently cause O(n^2) behaviour (which I didn't know before this thread). If another use case for suppressing deduplication is needed, consider publishing events like "inserted tuple", "deleted tuple" from triggers and a transaction that does "insert, delete, insert" which the client then sees as "insert, delete, oops nothing else". Tom's proposal allows for more flexible modes than just the ALL and DISTINCT keywords and accommodates the concern that DISTINCT will lead to bug reports about not really being distinct due to savepoints. Android has a similar thing for push notifications to mobile devices which they call collapse: https://developers.google.com/cloud-messaging/concept-options, search for collapse_key. So I propose NOTIFY channel [ , payload [ , collapse_mode ] ] with collapse mode being: * 'never' ** Filip's proposed behaviour for the ALL option ** if specified, every notification is queued regardless what'sin the queue * 'maybe' ** vague word allowing for flexibility in what the server decides to do ** current behaviour ** improves performancefor big transactions if a row trigger creates the same payload over and over one after the other due to the current optimization of checking the tail of the list ** has performance problems O(n^2) for big transactions with different payloads *** the performance problems can be addressed by a different patch which uses a hash table, or decides to collapse less aggressively (Tom's check last 100 idea), or whatever else *** in the meantime the 'never' mode acts as a good workaround In the future we might support an 'always' collapse_mode which would really be always, including across savepoints. Or an 'only_inside_savepoints' which guarantees the current behaviour. Filip, do you agree with Tom's proposal? Do you plan to rework the patch on these lines? If you are I'll try to review it, if not I could give it a shot as I'm interested in having this in 9.6.
On Fri, Feb 19, 2016 at 10:09 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote:
On 2/9/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I think it would be a good thing if the NOTIFY statement syntax were
> not remarkably different from the syntax used in the pg_notify() function
> call. To do otherwise would certainly be confusing. So on the whole
> I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option.
I'm quite interested in getting this addressed in time for 9.6 as I'll
be using NOTIFY extensively in a project and I agree with Craig that
the deduplication is frustrating both because you sometimes want every
event and because it can apparently cause O(n^2) behaviour (which I
didn't know before this thread). If another use case for suppressing
deduplication is needed, consider publishing events like "inserted
tuple", "deleted tuple" from triggers and a transaction that does
"insert, delete, insert" which the client then sees as "insert,
delete, oops nothing else".
Tom's proposal allows for more flexible modes than just the ALL and
DISTINCT keywords and accommodates the concern that DISTINCT will lead
to bug reports about not really being distinct due to savepoints.
Android has a similar thing for push notifications to mobile devices
which they call collapse:
https://developers.google.com/cloud-messaging/concept-options, search
for collapse_key.
So I propose NOTIFY channel [ , payload [ , collapse_mode ] ] with
collapse mode being:
* 'never'
** Filip's proposed behaviour for the ALL option
** if specified, every notification is queued regardless what's in the queue
* 'maybe'
** vague word allowing for flexibility in what the server decides to do
** current behaviour
** improves performance for big transactions if a row trigger
creates the same payload over and over one after the other due to the
current optimization of checking the tail of the list
** has performance problems O(n^2) for big transactions with
different payloads
*** the performance problems can be addressed by a different
patch which uses a hash table, or decides to collapse less
aggressively (Tom's check last 100 idea), or whatever else
*** in the meantime the 'never' mode acts as a good workaround
In the future we might support an 'always' collapse_mode which would
really be always, including across savepoints. Or an
'only_inside_savepoints' which guarantees the current behaviour.
Filip, do you agree with Tom's proposal? Do you plan to rework the
patch on these lines? If you are I'll try to review it, if not I could
give it a shot as I'm interested in having this in 9.6.
I see that Tom's remarks give more flexibility, and your refinement makes sense.
I was stuck because both syntaxes have their ugliness. NOTIFY allows the payload to be NULL:
NOTIFY chan01;
How would this look like in "never" mode?
NOTIFY chan01, NULL, 'never'; -- seems very cryptic.
On Sat, Feb 20, 2016 at 2:00 PM, Filip Rembiałkowski <filip.rembialkowski@gmail.com> wrote: > I was stuck because both syntaxes have their ugliness. NOTIFY allows the > payload to be NULL: > NOTIFY chan01; > > How would this look like in "never" mode? > NOTIFY chan01, NULL, 'never'; -- seems very cryptic. The docs say: "The information passed to the client for a notification event includes the notification channel name, the notifying session's server process PID, and the payload string, which is an empty string if it has not been specified." So a missing payload is not a SQL NULL but an empty string. This means you would have: NOTIFY chan01; NOTIFY chan01, ''; -- same as above NOTIFY chan01, '', 'maybe'; -- same as above NOTIFY chan01, '', 'never'; -- send this all the time Seems ok to me.
Hi Filip, On 2/20/16 8:00 AM, Filip Rembiałkowski wrote: > On Fri, Feb 19, 2016 at 10:09 PM, Catalin Iacob <iacobcatalin@gmail.com > On 2/9/16, Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> > wrote: > > FWIW, I think it would be a good thing if the NOTIFY statement syntax were > > not remarkably different from the syntax used in the pg_notify() function > > call. To do otherwise would certainly be confusing. So on the whole > > I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option. > > Filip, do you agree with Tom's proposal? Do you plan to rework the > patch on these lines? If you are I'll try to review it, if not I could > give it a shot as I'm interested in having this in 9.6. > > I see that Tom's remarks give more flexibility, and your refinement > makes sense. It looks like we are waiting on a new patch from you before this can be reviewed. Are you close to having that done? Meanwhile, I have marked it "Waiting on Author". -- -David david@pgmasters.net
On 3/11/16 1:46 PM, David Steele wrote: > Hi Filip, > > On 2/20/16 8:00 AM, Filip Rembiałkowski wrote: >> On Fri, Feb 19, 2016 at 10:09 PM, Catalin Iacob <iacobcatalin@gmail.com >> On 2/9/16, Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> >> wrote: >> > FWIW, I think it would be a good thing if the NOTIFY statement syntax were >> > not remarkably different from the syntax used in the pg_notify() function >> > call. To do otherwise would certainly be confusing. So on the whole >> > I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option. >> >> Filip, do you agree with Tom's proposal? Do you plan to rework the >> patch on these lines? If you are I'll try to review it, if not I could >> give it a shot as I'm interested in having this in 9.6. >> >> I see that Tom's remarks give more flexibility, and your refinement >> makes sense. > > It looks like we are waiting on a new patch from you before this can be > reviewed. Are you close to having that done? > > Meanwhile, I have marked it "Waiting on Author". Since there has been no activity on this thread since before the CF and no response from the author I have marked this "returned with feedback". Please feel free to resubmit for 9.7! -- -David david@pgmasters.net
Thanks for all the input. Finally I found time and motivation to revive this. See attached patch... I'm ready to work on so it can get merged in the next CF. On Thu, Mar 17, 2016 at 12:44 AM David Steele <david@pgmasters.net> wrote: > > On 3/11/16 1:46 PM, David Steele wrote: > > Hi Filip, > > > > On 2/20/16 8:00 AM, Filip Rembiałkowski wrote: > >> On Fri, Feb 19, 2016 at 10:09 PM, Catalin Iacob <iacobcatalin@gmail.com > >> On 2/9/16, Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> > >> wrote: > >> > FWIW, I think it would be a good thing if the NOTIFY statement syntax were > >> > not remarkably different from the syntax used in the pg_notify() function > >> > call. To do otherwise would certainly be confusing. So on the whole > >> > I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option. > >> > >> Filip, do you agree with Tom's proposal? Do you plan to rework the > >> patch on these lines? If you are I'll try to review it, if not I could > >> give it a shot as I'm interested in having this in 9.6. > >> > >> I see that Tom's remarks give more flexibility, and your refinement > >> makes sense. > > > > It looks like we are waiting on a new patch from you before this can be > > reviewed. Are you close to having that done? > > > > Meanwhile, I have marked it "Waiting on Author". > > Since there has been no activity on this thread since before the CF and > no response from the author I have marked this "returned with feedback". > Please feel free to resubmit for 9.7! > > -- > -David > david@pgmasters.net
Вложения
On Fri, Mar 8, 2019 at 1:37 PM Filip Rembiałkowski <filip.rembialkowski@gmail.com> wrote: > See attached patch... I'm ready to work on so it can get merged in the next CF. Hi Filip, Seen on Travis: async ... FAILED 126 ms Looks like the new error isn't being raised for invalid send mode? (What kind of error message is "?" anyway? :-)) ERROR: channel name too long -- Should fail. Invalid 3rd parameter NOTIFY notify_async2, 'test', 'invalid'; -ERROR: ? NOTIFY notify_async2, 'test', true; -ERROR: ? --Should work. Valid NOTIFY/LISTEN/UNLISTEN commands NOTIFY notify_async2; NOTIFY notify_async2, ''; -- Thomas Munro https://enterprisedb.com
Thank you. Here is my latest attempt, with actual syntax error handling. Also, the syntax is updated to what Tom Lane suggested in other thread (with another variant of the same thing, from Julien Demoor) NOTIFY [ ( option [, ...] ) ] channel [ , payload ] Still no hash table fallback is implemented, so this is *not* a performance improvement. Only a little more flexibility. On Sat, Mar 9, 2019 at 3:31 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Mar 8, 2019 at 1:37 PM Filip Rembiałkowski > <filip.rembialkowski@gmail.com> wrote: > > See attached patch... I'm ready to work on so it can get merged in the next CF. > > Hi Filip, > > Seen on Travis: > > async ... FAILED 126 ms > > Looks like the new error isn't being raised for invalid send mode? > (What kind of error message is "?" anyway? :-)) > > ERROR: channel name too long > -- Should fail. Invalid 3rd parameter > NOTIFY notify_async2, 'test', 'invalid'; > -ERROR: ? > NOTIFY notify_async2, 'test', true; > -ERROR: ? > --Should work. Valid NOTIFY/LISTEN/UNLISTEN commands > NOTIFY notify_async2; > NOTIFY notify_async2, ''; > > -- > Thomas Munro > https://enterprisedb.com
Вложения
=?UTF-8?Q?Filip_Rembia=C5=82kowski?= <filip.rembialkowski@gmail.com> writes: > Still no hash table fallback is implemented, so this is *not* a > performance improvement. Only a little more flexibility. I think that we'd probably be better off fixing the root performance issue than adding semantic complexity to bypass it. Especially since, if you don't de-duplicate, that's going to cost you when it comes time to actually send the notifications, receive them, forward them to clients, and process them in the clients. Admittedly, if the app *knows* that it's generating non-duplicate events, maybe it'd be all right to skip checking that. But if we can make the check cheap, I'd just as soon keep it. Accordingly, I looked into making a hash table when there are more than a small number of notifications pending, and attached is a lightly-tested version of that. This seems to be more or less similar speed to the existing code for up to 100 or so distinct notifies, but it soon pulls away above that. A point that needs discussion is that this patch, unlike the existing code, *does* de-duplicate fully: any events generated by a subtransaction that duplicate events already emitted by a parent will get removed when the subxact is merged to its parent. I did this partly because we have to expend O(N) work to merge N subtransaction notifies in any case, now that we have to make new hashtable entries in the parent xact; so the old excuse that subxact-end processing is really cheap no longer applies. Also because the Assert(!found) assertions in the attached hash coding fall over if we cheat on this. If we really want to maintain exactly the old semantics here, we could relax the hashtable code to just ignore duplicate entries. But, per the argument above, de-duplication is a good thing so I'm inclined to keep it like this. I also noticed that as things stand, it costs us two or three pallocs to construct a Notification event. It wouldn't be terribly hard to reduce that to one palloc, and maybe it'd be worthwhile if we're thinking that transactions with many many notifies are a case worth optimizing. But I didn't do that here; it seems like a separable patch. I also thought for awhile about not having the hashtable as an auxiliary data structure, but making it the main data structure. We could preserve the required notification ordering information by threading the live hashtable entries into an slist, say. However, this would greatly increase the overhead for transactions with just one or a few distinct NOTIFY events, since we'd have to set up the hashtable at the first one. I think that's a common enough case that we shouldn't de-optimize it. A smaller objection is that such a data structure would absolutely commit us to de-duplication semantics, whereas the list plus separate hashtable can cope with not de-duping if someone persuades us that's sane. Thoughts? regards, tom lane diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 6e9c580..c21daa5 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -135,6 +135,7 @@ #include "storage/sinval.h" #include "tcop/tcopprot.h" #include "utils/builtins.h" +#include "utils/hashutils.h" #include "utils/memutils.h" #include "utils/ps_status.h" #include "utils/snapmgr.h" @@ -323,14 +324,25 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */ /* * State for outbound notifies consists of a list of all channels+payloads - * NOTIFYed in the current transaction. We do not actually perform a NOTIFY - * until and unless the transaction commits. pendingNotifies is NIL if no - * NOTIFYs have been done in the current transaction. + * NOTIFYed in the current transaction. We do not actually perform a NOTIFY + * until and unless the transaction commits. pendingNotifies is NULL if no + * NOTIFYs have been done in the current (sub) transaction. + * + * We discard duplicate notify events issued in the same transaction. + * Hence, in addition to the list proper (which we need to track the order + * of the events, since we guarantee to deliver them in order), we build a + * hash table which we can probe to detect duplicates. Since building the + * hash table is somewhat expensive, we do so only once we have at least + * MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction; + * before that we just scan the events linearly. * * The list is kept in CurTransactionContext. In subtransactions, each * subtransaction has its own list in its own CurTransactionContext, but - * successful subtransactions attach their lists to their parent's list. - * Failed subtransactions simply discard their lists. + * successful subtransactions add their entries to their parent's list. + * Failed subtransactions simply discard their lists. Since these lists + * are independent, there may be notify events in a subtransaction's list + * that duplicate events in some ancestor (sub) transaction; we get rid of + * the dups when merging the subtransaction's list into its parent's. * * Note: the action and notify lists do not interact within a transaction. * In particular, if a transaction does NOTIFY and then LISTEN on the same @@ -343,7 +355,20 @@ typedef struct Notification char *payload; /* payload string (can be empty) */ } Notification; -static List *pendingNotifies = NIL; /* list of Notifications */ +typedef struct NotificationList +{ + List *events; /* list of Notification structs */ + HTAB *hashtab; /* hash of NotificationHash structs, or NULL */ +} NotificationList; + +#define MIN_HASHABLE_NOTIFIES 16 /* threshold to build hashtab */ + +typedef struct NotificationHash +{ + Notification *event; /* => the actual Notification struct */ +} NotificationHash; + +static NotificationList *pendingNotifies = NULL; /* current list, if any */ static List *upperPendingNotifies = NIL; /* list of upper-xact lists */ @@ -393,6 +418,9 @@ static bool asyncQueueProcessPageEntries(volatile QueuePosition *current, static void asyncQueueAdvanceTail(void); static void ProcessIncomingNotify(void); static bool AsyncExistsPendingNotify(const char *channel, const char *payload); +static void AddEventToPendingNotifies(Notification *n); +static uint32 notification_hash(const void *key, Size keysize); +static int notification_match(const void *key1, const void *key2, Size keysize); static void ClearPendingActionsAndNotifies(void); /* @@ -586,11 +614,19 @@ Async_Notify(const char *channel, const char *payload) else n->payload = ""; - /* - * We want to preserve the order so we need to append every notification. - * See comments at AsyncExistsPendingNotify(). - */ - pendingNotifies = lappend(pendingNotifies, n); + if (pendingNotifies == NULL) + { + /* First notify event in current (sub)xact */ + pendingNotifies = (NotificationList *) palloc(sizeof(NotificationList)); + pendingNotifies->events = list_make1(n); + /* We certainly don't need a hashtable yet */ + pendingNotifies->hashtab = NULL; + } + else + { + /* Append more events to existing list */ + AddEventToPendingNotifies(n); + } MemoryContextSwitchTo(oldcontext); } @@ -761,7 +797,7 @@ PreCommit_Notify(void) { ListCell *p; - if (pendingActions == NIL && pendingNotifies == NIL) + if (!pendingActions && !pendingNotifies) return; /* no relevant statements in this xact */ if (Trace_notify) @@ -821,7 +857,7 @@ PreCommit_Notify(void) /* Now push the notifications into the queue */ backendHasSentNotifications = true; - nextNotify = list_head(pendingNotifies); + nextNotify = list_head(pendingNotifies->events); while (nextNotify != NULL) { /* @@ -1294,7 +1330,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe) * database OID in order to fill the page. So every page is always used up to * the last byte which simplifies reading the page later. * - * We are passed the list cell (in pendingNotifies) containing the next + * We are passed the list cell (in pendingNotifies->events) containing the next * notification to write and return the first still-unwritten cell back. * Eventually we will return NULL indicating all is done. * @@ -1345,7 +1381,7 @@ asyncQueueAddEntries(ListCell *nextNotify) if (offset + qe.length <= QUEUE_PAGESIZE) { /* OK, so advance nextNotify past this item */ - nextNotify = lnext(pendingNotifies, nextNotify); + nextNotify = lnext(pendingNotifies->events, nextNotify); } else { @@ -1607,7 +1643,7 @@ AtSubStart_Notify(void) Assert(list_length(upperPendingNotifies) == GetCurrentTransactionNestLevel() - 1); - pendingNotifies = NIL; + pendingNotifies = NULL; MemoryContextSwitchTo(old_cxt); } @@ -1621,7 +1657,7 @@ void AtSubCommit_Notify(void) { List *parentPendingActions; - List *parentPendingNotifies; + NotificationList *parentPendingNotifies; parentPendingActions = linitial_node(List, upperPendingActions); upperPendingActions = list_delete_first(upperPendingActions); @@ -1634,16 +1670,41 @@ AtSubCommit_Notify(void) */ pendingActions = list_concat(parentPendingActions, pendingActions); - parentPendingNotifies = linitial_node(List, upperPendingNotifies); + parentPendingNotifies = (NotificationList *) linitial(upperPendingNotifies); upperPendingNotifies = list_delete_first(upperPendingNotifies); Assert(list_length(upperPendingNotifies) == GetCurrentTransactionNestLevel() - 2); - /* - * We could try to eliminate duplicates here, but it seems not worthwhile. - */ - pendingNotifies = list_concat(parentPendingNotifies, pendingNotifies); + if (pendingNotifies == NULL) + { + /* easy, no notify events happened in current subxact */ + pendingNotifies = parentPendingNotifies; + } + else if (parentPendingNotifies == NULL) + { + /* easy, subxact's list becomes parent's */ + } + else + { + /* + * Formerly, we didn't need to eliminate duplicates here, but now we + * must, else we fall foul of "Assert(!found)", either here or during + * a later attempt to build the parent-level hashtable. + */ + NotificationList *childPendingNotifies = pendingNotifies; + ListCell *l; + + pendingNotifies = parentPendingNotifies; + /* Insert all the subxact's events into parent, except for dups */ + foreach(l, childPendingNotifies->events) + { + Notification *childn = (Notification *) lfirst(l); + + if (!AsyncExistsPendingNotify(childn->channel, childn->payload)) + AddEventToPendingNotifies(childn); + } + } } /* @@ -1672,7 +1733,7 @@ AtSubAbort_Notify(void) while (list_length(upperPendingNotifies) > my_level - 2) { - pendingNotifies = linitial_node(List, upperPendingNotifies); + pendingNotifies = (NotificationList *) linitial(upperPendingNotifies); upperPendingNotifies = list_delete_first(upperPendingNotifies); } } @@ -2102,50 +2163,149 @@ NotifyMyFrontEnd(const char *channel, const char *payload, int32 srcPid) static bool AsyncExistsPendingNotify(const char *channel, const char *payload) { - ListCell *p; - Notification *n; - - if (pendingNotifies == NIL) + if (pendingNotifies == NULL) return false; if (payload == NULL) payload = ""; - /*---------- - * We need to append new elements to the end of the list in order to keep - * the order. However, on the other hand we'd like to check the list - * backwards in order to make duplicate-elimination a tad faster when the - * same condition is signaled many times in a row. So as a compromise we - * check the tail element first which we can access directly. If this - * doesn't match, we check the whole list. - * - * As we are not checking our parents' lists, we can still get duplicates - * in combination with subtransactions, like in: - * - * begin; - * notify foo '1'; - * savepoint foo; - * notify foo '1'; - * commit; - *---------- - */ - n = (Notification *) llast(pendingNotifies); - if (strcmp(n->channel, channel) == 0 && - strcmp(n->payload, payload) == 0) - return true; - - foreach(p, pendingNotifies) + if (pendingNotifies->hashtab != NULL) { - n = (Notification *) lfirst(p); - - if (strcmp(n->channel, channel) == 0 && - strcmp(n->payload, payload) == 0) + /* Use the hash table to probe for a match */ + Notification n; + Notification *k; + + /* set up a dummy Notification struct */ + n.channel = unconstify(char *, channel); + n.payload = unconstify(char *, payload); + k = &n; + /* ... and probe */ + if (hash_search(pendingNotifies->hashtab, + &k, + HASH_FIND, + NULL)) return true; } + else + { + /* Must scan the event list */ + ListCell *l; + + foreach(l, pendingNotifies->events) + { + Notification *n = (Notification *) lfirst(l); + + if (strcmp(n->channel, channel) == 0 && + strcmp(n->payload, payload) == 0) + return true; + } + } return false; } +/* + * Add a notification event to a pre-existing pendingNotifies list. + * + * Because pendingNotifies->events is already nonempty, this works + * correctly no matter what CurrentMemoryContext is. + */ +static void +AddEventToPendingNotifies(Notification *n) +{ + Assert(pendingNotifies->events != NIL); + + /* Create the hash table if it's time to */ + if (list_length(pendingNotifies->events) >= MIN_HASHABLE_NOTIFIES && + pendingNotifies->hashtab == NULL) + { + HASHCTL hash_ctl; + ListCell *l; + + /* Create the hash table */ + MemSet(&hash_ctl, 0, sizeof(hash_ctl)); + hash_ctl.keysize = sizeof(Notification *); + hash_ctl.entrysize = sizeof(NotificationHash); + hash_ctl.hash = notification_hash; + hash_ctl.match = notification_match; + hash_ctl.hcxt = CurTransactionContext; + pendingNotifies->hashtab = + hash_create("Pending Notifies", + 256L, + &hash_ctl, + HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); + + /* Insert all the already-existing events */ + foreach(l, pendingNotifies->events) + { + Notification *oldn = (Notification *) lfirst(l); + NotificationHash *hentry; + bool found; + + hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab, + &oldn, + HASH_ENTER, + &found); + Assert(!found); + hentry->event = oldn; + } + } + + /* Add new event to the list, in order */ + pendingNotifies->events = lappend(pendingNotifies->events, n); + + /* Add event to the hash table if needed */ + if (pendingNotifies->hashtab != NULL) + { + NotificationHash *hentry; + bool found; + + hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab, + &n, + HASH_ENTER, + &found); + Assert(!found); + hentry->event = n; + } +} + +/* + * notification_hash: hash function for notification hash table + * + * The hash "keys" are pointers to Notification structs. + */ +static uint32 +notification_hash(const void *key, Size keysize) +{ + const Notification *k = *(const Notification *const *) key; + uint32 hashc; + uint32 hashp; + + Assert(keysize == sizeof(Notification *)); + /* We just XOR the hashes for the two strings */ + hashc = DatumGetUInt32(hash_any((const unsigned char *) k->channel, + (int) strlen((const char *) k->channel))); + hashp = DatumGetUInt32(hash_any((const unsigned char *) k->payload, + (int) strlen((const char *) k->payload))); + return hashc ^ hashp; +} + +/* + * notification_match: match function to use with notification_hash + */ +static int +notification_match(const void *key1, const void *key2, Size keysize) +{ + const Notification *k1 = *(const Notification *const *) key1; + const Notification *k2 = *(const Notification *const *) key2; + + Assert(keysize == sizeof(Notification *)); + if (strcmp(k1->channel, k2->channel) == 0 && + strcmp(k1->payload, k2->payload) == 0) + return 0; /* equal */ + return 1; /* not equal */ +} + /* Clear the pendingActions and pendingNotifies lists. */ static void ClearPendingActionsAndNotifies(void) @@ -2158,5 +2318,5 @@ ClearPendingActionsAndNotifies(void) * pointers. */ pendingActions = NIL; - pendingNotifies = NIL; + pendingNotifies = NULL; }
I wrote: > I think that we'd probably be better off fixing the root performance issue > than adding semantic complexity to bypass it. ... > Accordingly, I looked into making a hash table when there are more than > a small number of notifications pending, and attached is a lightly-tested > version of that. This seems to be more or less similar speed to the > existing code for up to 100 or so distinct notifies, but it soon pulls > away above that. I noticed that the cfbot was unhappy with this, because it (intentionally) changes the results of the async-notify isolation tests I added awhile ago. So here's an updated version that adjusts that test, and also changes the NOTIFY documentation to remove the old weasel wording about whether we de-dup or not. > I also noticed that as things stand, it costs us two or three pallocs to > construct a Notification event. It wouldn't be terribly hard to reduce > that to one palloc, and maybe it'd be worthwhile if we're thinking that > transactions with many many notifies are a case worth optimizing. > But I didn't do that here; it seems like a separable patch. I also did that, attached as the second patch below. This way ends up requiring us to palloc the Notification event and then pfree it again, if it turns out to be a dup. Despite that, it's faster than the first patch alone, and also faster than HEAD in every case I tried. Not much faster, if there's not a lot of dups, but as far as I can find there isn't any case where it loses compared to HEAD. Even with subtransactions, where in principle the time to merge subtransaction event lists into the parent transaction ought to cost us. You can't get that to matter unless the subtransaction had a lot of distinct events, and then HEAD hits its O(N^2) behavior inside the subxact. So I can't really see any reason not to commit these. That leaves the question of whether we want to continue pursuing the proposed feature for user control of de-duping. I'd tend to vote against, because it seems like semantic complexity we don't need. While the idea sounds straightforward, I think it isn't so much when you start to think hard about how notifies issued with and without "collapse" ought to interact. regards, tom lane diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml index e0e125a..d7dcbea 100644 --- a/doc/src/sgml/ref/notify.sgml +++ b/doc/src/sgml/ref/notify.sgml @@ -94,9 +94,9 @@ NOTIFY <replaceable class="parameter">channel</replaceable> [ , <replaceable cla </para> <para> - If the same channel name is signaled multiple times from the same - transaction with identical payload strings, the - database server can decide to deliver a single notification only. + If the same channel name is signaled multiple times with identical + payload strings within the same transaction, only one instance of the + notification event is delivered to listeners. On the other hand, notifications with distinct payload strings will always be delivered as distinct notifications. Similarly, notifications from different transactions will never get folded into one notification. diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 6e9c580..3f5f054 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -135,6 +135,7 @@ #include "storage/sinval.h" #include "tcop/tcopprot.h" #include "utils/builtins.h" +#include "utils/hashutils.h" #include "utils/memutils.h" #include "utils/ps_status.h" #include "utils/snapmgr.h" @@ -323,14 +324,25 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */ /* * State for outbound notifies consists of a list of all channels+payloads - * NOTIFYed in the current transaction. We do not actually perform a NOTIFY - * until and unless the transaction commits. pendingNotifies is NIL if no - * NOTIFYs have been done in the current transaction. + * NOTIFYed in the current transaction. We do not actually perform a NOTIFY + * until and unless the transaction commits. pendingNotifies is NULL if no + * NOTIFYs have been done in the current (sub) transaction. + * + * We discard duplicate notify events issued in the same transaction. + * Hence, in addition to the list proper (which we need to track the order + * of the events, since we guarantee to deliver them in order), we build a + * hash table which we can probe to detect duplicates. Since building the + * hash table is somewhat expensive, we do so only once we have at least + * MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction; + * before that we just scan the events linearly. * * The list is kept in CurTransactionContext. In subtransactions, each * subtransaction has its own list in its own CurTransactionContext, but - * successful subtransactions attach their lists to their parent's list. - * Failed subtransactions simply discard their lists. + * successful subtransactions add their entries to their parent's list. + * Failed subtransactions simply discard their lists. Since these lists + * are independent, there may be notify events in a subtransaction's list + * that duplicate events in some ancestor (sub) transaction; we get rid of + * the dups when merging the subtransaction's list into its parent's. * * Note: the action and notify lists do not interact within a transaction. * In particular, if a transaction does NOTIFY and then LISTEN on the same @@ -343,7 +355,20 @@ typedef struct Notification char *payload; /* payload string (can be empty) */ } Notification; -static List *pendingNotifies = NIL; /* list of Notifications */ +typedef struct NotificationList +{ + List *events; /* list of Notification structs */ + HTAB *hashtab; /* hash of NotificationHash structs, or NULL */ +} NotificationList; + +#define MIN_HASHABLE_NOTIFIES 16 /* threshold to build hashtab */ + +typedef struct NotificationHash +{ + Notification *event; /* => the actual Notification struct */ +} NotificationHash; + +static NotificationList *pendingNotifies = NULL; /* current list, if any */ static List *upperPendingNotifies = NIL; /* list of upper-xact lists */ @@ -393,6 +418,9 @@ static bool asyncQueueProcessPageEntries(volatile QueuePosition *current, static void asyncQueueAdvanceTail(void); static void ProcessIncomingNotify(void); static bool AsyncExistsPendingNotify(const char *channel, const char *payload); +static void AddEventToPendingNotifies(Notification *n); +static uint32 notification_hash(const void *key, Size keysize); +static int notification_match(const void *key1, const void *key2, Size keysize); static void ClearPendingActionsAndNotifies(void); /* @@ -586,11 +614,19 @@ Async_Notify(const char *channel, const char *payload) else n->payload = ""; - /* - * We want to preserve the order so we need to append every notification. - * See comments at AsyncExistsPendingNotify(). - */ - pendingNotifies = lappend(pendingNotifies, n); + if (pendingNotifies == NULL) + { + /* First notify event in current (sub)xact */ + pendingNotifies = (NotificationList *) palloc(sizeof(NotificationList)); + pendingNotifies->events = list_make1(n); + /* We certainly don't need a hashtable yet */ + pendingNotifies->hashtab = NULL; + } + else + { + /* Append more events to existing list */ + AddEventToPendingNotifies(n); + } MemoryContextSwitchTo(oldcontext); } @@ -761,7 +797,7 @@ PreCommit_Notify(void) { ListCell *p; - if (pendingActions == NIL && pendingNotifies == NIL) + if (!pendingActions && !pendingNotifies) return; /* no relevant statements in this xact */ if (Trace_notify) @@ -821,7 +857,7 @@ PreCommit_Notify(void) /* Now push the notifications into the queue */ backendHasSentNotifications = true; - nextNotify = list_head(pendingNotifies); + nextNotify = list_head(pendingNotifies->events); while (nextNotify != NULL) { /* @@ -1294,7 +1330,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe) * database OID in order to fill the page. So every page is always used up to * the last byte which simplifies reading the page later. * - * We are passed the list cell (in pendingNotifies) containing the next + * We are passed the list cell (in pendingNotifies->events) containing the next * notification to write and return the first still-unwritten cell back. * Eventually we will return NULL indicating all is done. * @@ -1345,7 +1381,7 @@ asyncQueueAddEntries(ListCell *nextNotify) if (offset + qe.length <= QUEUE_PAGESIZE) { /* OK, so advance nextNotify past this item */ - nextNotify = lnext(pendingNotifies, nextNotify); + nextNotify = lnext(pendingNotifies->events, nextNotify); } else { @@ -1607,7 +1643,7 @@ AtSubStart_Notify(void) Assert(list_length(upperPendingNotifies) == GetCurrentTransactionNestLevel() - 1); - pendingNotifies = NIL; + pendingNotifies = NULL; MemoryContextSwitchTo(old_cxt); } @@ -1621,7 +1657,7 @@ void AtSubCommit_Notify(void) { List *parentPendingActions; - List *parentPendingNotifies; + NotificationList *parentPendingNotifies; parentPendingActions = linitial_node(List, upperPendingActions); upperPendingActions = list_delete_first(upperPendingActions); @@ -1634,16 +1670,41 @@ AtSubCommit_Notify(void) */ pendingActions = list_concat(parentPendingActions, pendingActions); - parentPendingNotifies = linitial_node(List, upperPendingNotifies); + parentPendingNotifies = (NotificationList *) linitial(upperPendingNotifies); upperPendingNotifies = list_delete_first(upperPendingNotifies); Assert(list_length(upperPendingNotifies) == GetCurrentTransactionNestLevel() - 2); - /* - * We could try to eliminate duplicates here, but it seems not worthwhile. - */ - pendingNotifies = list_concat(parentPendingNotifies, pendingNotifies); + if (pendingNotifies == NULL) + { + /* easy, no notify events happened in current subxact */ + pendingNotifies = parentPendingNotifies; + } + else if (parentPendingNotifies == NULL) + { + /* easy, subxact's list becomes parent's */ + } + else + { + /* + * Formerly, we didn't bother to eliminate duplicates here, but now we + * must, else we fall foul of "Assert(!found)", either here or during + * a later attempt to build the parent-level hashtable. + */ + NotificationList *childPendingNotifies = pendingNotifies; + ListCell *l; + + pendingNotifies = parentPendingNotifies; + /* Insert all the subxact's events into parent, except for dups */ + foreach(l, childPendingNotifies->events) + { + Notification *childn = (Notification *) lfirst(l); + + if (!AsyncExistsPendingNotify(childn->channel, childn->payload)) + AddEventToPendingNotifies(childn); + } + } } /* @@ -1672,7 +1733,7 @@ AtSubAbort_Notify(void) while (list_length(upperPendingNotifies) > my_level - 2) { - pendingNotifies = linitial_node(List, upperPendingNotifies); + pendingNotifies = (NotificationList *) linitial(upperPendingNotifies); upperPendingNotifies = list_delete_first(upperPendingNotifies); } } @@ -2102,50 +2163,149 @@ NotifyMyFrontEnd(const char *channel, const char *payload, int32 srcPid) static bool AsyncExistsPendingNotify(const char *channel, const char *payload) { - ListCell *p; - Notification *n; - - if (pendingNotifies == NIL) + if (pendingNotifies == NULL) return false; if (payload == NULL) payload = ""; - /*---------- - * We need to append new elements to the end of the list in order to keep - * the order. However, on the other hand we'd like to check the list - * backwards in order to make duplicate-elimination a tad faster when the - * same condition is signaled many times in a row. So as a compromise we - * check the tail element first which we can access directly. If this - * doesn't match, we check the whole list. - * - * As we are not checking our parents' lists, we can still get duplicates - * in combination with subtransactions, like in: - * - * begin; - * notify foo '1'; - * savepoint foo; - * notify foo '1'; - * commit; - *---------- - */ - n = (Notification *) llast(pendingNotifies); - if (strcmp(n->channel, channel) == 0 && - strcmp(n->payload, payload) == 0) - return true; - - foreach(p, pendingNotifies) + if (pendingNotifies->hashtab != NULL) { - n = (Notification *) lfirst(p); - - if (strcmp(n->channel, channel) == 0 && - strcmp(n->payload, payload) == 0) + /* Use the hash table to probe for a match */ + Notification n; + Notification *k; + + /* set up a dummy Notification struct */ + n.channel = unconstify(char *, channel); + n.payload = unconstify(char *, payload); + k = &n; + /* ... and probe */ + if (hash_search(pendingNotifies->hashtab, + &k, + HASH_FIND, + NULL)) return true; } + else + { + /* Must scan the event list */ + ListCell *l; + + foreach(l, pendingNotifies->events) + { + Notification *n = (Notification *) lfirst(l); + + if (strcmp(n->channel, channel) == 0 && + strcmp(n->payload, payload) == 0) + return true; + } + } return false; } +/* + * Add a notification event to a pre-existing pendingNotifies list. + * + * Because pendingNotifies->events is already nonempty, this works + * correctly no matter what CurrentMemoryContext is. + */ +static void +AddEventToPendingNotifies(Notification *n) +{ + Assert(pendingNotifies->events != NIL); + + /* Create the hash table if it's time to */ + if (list_length(pendingNotifies->events) >= MIN_HASHABLE_NOTIFIES && + pendingNotifies->hashtab == NULL) + { + HASHCTL hash_ctl; + ListCell *l; + + /* Create the hash table */ + MemSet(&hash_ctl, 0, sizeof(hash_ctl)); + hash_ctl.keysize = sizeof(Notification *); + hash_ctl.entrysize = sizeof(NotificationHash); + hash_ctl.hash = notification_hash; + hash_ctl.match = notification_match; + hash_ctl.hcxt = CurTransactionContext; + pendingNotifies->hashtab = + hash_create("Pending Notifies", + 256L, + &hash_ctl, + HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); + + /* Insert all the already-existing events */ + foreach(l, pendingNotifies->events) + { + Notification *oldn = (Notification *) lfirst(l); + NotificationHash *hentry; + bool found; + + hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab, + &oldn, + HASH_ENTER, + &found); + Assert(!found); + hentry->event = oldn; + } + } + + /* Add new event to the list, in order */ + pendingNotifies->events = lappend(pendingNotifies->events, n); + + /* Add event to the hash table if needed */ + if (pendingNotifies->hashtab != NULL) + { + NotificationHash *hentry; + bool found; + + hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab, + &n, + HASH_ENTER, + &found); + Assert(!found); + hentry->event = n; + } +} + +/* + * notification_hash: hash function for notification hash table + * + * The hash "keys" are pointers to Notification structs. + */ +static uint32 +notification_hash(const void *key, Size keysize) +{ + const Notification *k = *(const Notification *const *) key; + uint32 hashc; + uint32 hashp; + + Assert(keysize == sizeof(Notification *)); + /* We just XOR the hashes for the two strings */ + hashc = DatumGetUInt32(hash_any((const unsigned char *) k->channel, + (int) strlen((const char *) k->channel))); + hashp = DatumGetUInt32(hash_any((const unsigned char *) k->payload, + (int) strlen((const char *) k->payload))); + return hashc ^ hashp; +} + +/* + * notification_match: match function to use with notification_hash + */ +static int +notification_match(const void *key1, const void *key2, Size keysize) +{ + const Notification *k1 = *(const Notification *const *) key1; + const Notification *k2 = *(const Notification *const *) key2; + + Assert(keysize == sizeof(Notification *)); + if (strcmp(k1->channel, k2->channel) == 0 && + strcmp(k1->payload, k2->payload) == 0) + return 0; /* equal */ + return 1; /* not equal */ +} + /* Clear the pendingActions and pendingNotifies lists. */ static void ClearPendingActionsAndNotifies(void) @@ -2158,5 +2318,5 @@ ClearPendingActionsAndNotifies(void) * pointers. */ pendingActions = NIL; - pendingNotifies = NIL; + pendingNotifies = NULL; } diff --git a/src/test/isolation/expected/async-notify.out b/src/test/isolation/expected/async-notify.out index 60ba506..7ad26b7 100644 --- a/src/test/isolation/expected/async-notify.out +++ b/src/test/isolation/expected/async-notify.out @@ -42,8 +42,6 @@ step notifys1: notifier: NOTIFY "c1" with payload "payload" from notifier notifier: NOTIFY "c2" with payload "payload" from notifier -notifier: NOTIFY "c1" with payload "payload" from notifier -notifier: NOTIFY "c2" with payload "payload" from notifier notifier: NOTIFY "c1" with payload "payloads" from notifier notifier: NOTIFY "c2" with payload "payloads" from notifier diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 3f5f054..6cb2d44 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -351,8 +351,10 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */ */ typedef struct Notification { - char *channel; /* channel name */ - char *payload; /* payload string (can be empty) */ + uint16 channel_len; /* length of channel-name string */ + uint16 payload_len; /* length of payload string */ + /* null-terminated channel name, then null-terminated payload follow */ + char data[FLEXIBLE_ARRAY_MEMBER]; } Notification; typedef struct NotificationList @@ -417,7 +419,7 @@ static bool asyncQueueProcessPageEntries(volatile QueuePosition *current, Snapshot snapshot); static void asyncQueueAdvanceTail(void); static void ProcessIncomingNotify(void); -static bool AsyncExistsPendingNotify(const char *channel, const char *payload); +static bool AsyncExistsPendingNotify(Notification *n); static void AddEventToPendingNotifies(Notification *n); static uint32 notification_hash(const void *key, Size keysize); static int notification_match(const void *key1, const void *key2, Size keysize); @@ -569,6 +571,8 @@ pg_notify(PG_FUNCTION_ARGS) void Async_Notify(const char *channel, const char *payload) { + size_t channel_len; + size_t payload_len; Notification *n; MemoryContext oldcontext; @@ -578,41 +582,53 @@ Async_Notify(const char *channel, const char *payload) if (Trace_notify) elog(DEBUG1, "Async_Notify(%s)", channel); + channel_len = channel ? strlen(channel) : 0; + payload_len = payload ? strlen(payload) : 0; + /* a channel name must be specified */ - if (!channel || !strlen(channel)) + if (channel_len == 0) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("channel name cannot be empty"))); - if (strlen(channel) >= NAMEDATALEN) + /* enforce length limits */ + if (channel_len >= NAMEDATALEN) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("channel name too long"))); - if (payload) - { - if (strlen(payload) >= NOTIFY_PAYLOAD_MAX_LENGTH) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("payload string too long"))); - } - - /* no point in making duplicate entries in the list ... */ - if (AsyncExistsPendingNotify(channel, payload)) - return; + if (payload_len >= NOTIFY_PAYLOAD_MAX_LENGTH) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("payload string too long"))); /* + * We must construct the Notification entry, even if we end up not using + * it, in order to compare it cheaply to existing list entries. + * * The notification list needs to live until end of transaction, so store * it in the transaction context. */ oldcontext = MemoryContextSwitchTo(CurTransactionContext); - n = (Notification *) palloc(sizeof(Notification)); - n->channel = pstrdup(channel); + n = (Notification *) palloc(offsetof(Notification, data) + + channel_len + payload_len + 2); + n->channel_len = channel_len; + n->payload_len = payload_len; + strcpy(n->data, channel); if (payload) - n->payload = pstrdup(payload); + strcpy(n->data + channel_len + 1, payload); else - n->payload = ""; + n->data[channel_len + 1] = '\0'; + + /* Now check for duplicates */ + if (AsyncExistsPendingNotify(n)) + { + /* It's a dup, so forget it */ + pfree(n); + MemoryContextSwitchTo(oldcontext); + return; + } if (pendingNotifies == NULL) { @@ -1303,8 +1319,8 @@ asyncQueueAdvance(volatile QueuePosition *position, int entryLength) static void asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe) { - size_t channellen = strlen(n->channel); - size_t payloadlen = strlen(n->payload); + size_t channellen = n->channel_len; + size_t payloadlen = n->payload_len; int entryLength; Assert(channellen < NAMEDATALEN); @@ -1317,8 +1333,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe) qe->dboid = MyDatabaseId; qe->xid = GetCurrentTransactionId(); qe->srcPid = MyProcPid; - memcpy(qe->data, n->channel, channellen + 1); - memcpy(qe->data + channellen + 1, n->payload, payloadlen + 1); + memcpy(qe->data, n->data, channellen + payloadlen + 2); } /* @@ -1701,7 +1716,7 @@ AtSubCommit_Notify(void) { Notification *childn = (Notification *) lfirst(l); - if (!AsyncExistsPendingNotify(childn->channel, childn->payload)) + if (!AsyncExistsPendingNotify(childn)) AddEventToPendingNotifies(childn); } } @@ -2159,29 +2174,18 @@ NotifyMyFrontEnd(const char *channel, const char *payload, int32 srcPid) elog(INFO, "NOTIFY for \"%s\" payload \"%s\"", channel, payload); } -/* Does pendingNotifies include the given channel/payload? */ +/* Does pendingNotifies include a match for the given event? */ static bool -AsyncExistsPendingNotify(const char *channel, const char *payload) +AsyncExistsPendingNotify(Notification *n) { if (pendingNotifies == NULL) return false; - if (payload == NULL) - payload = ""; - if (pendingNotifies->hashtab != NULL) { /* Use the hash table to probe for a match */ - Notification n; - Notification *k; - - /* set up a dummy Notification struct */ - n.channel = unconstify(char *, channel); - n.payload = unconstify(char *, payload); - k = &n; - /* ... and probe */ if (hash_search(pendingNotifies->hashtab, - &k, + &n, HASH_FIND, NULL)) return true; @@ -2193,10 +2197,12 @@ AsyncExistsPendingNotify(const char *channel, const char *payload) foreach(l, pendingNotifies->events) { - Notification *n = (Notification *) lfirst(l); + Notification *oldn = (Notification *) lfirst(l); - if (strcmp(n->channel, channel) == 0 && - strcmp(n->payload, payload) == 0) + if (n->channel_len == oldn->channel_len && + n->payload_len == oldn->payload_len && + memcmp(n->data, oldn->data, + n->channel_len + n->payload_len + 2) == 0) return true; } } @@ -2278,16 +2284,11 @@ static uint32 notification_hash(const void *key, Size keysize) { const Notification *k = *(const Notification *const *) key; - uint32 hashc; - uint32 hashp; Assert(keysize == sizeof(Notification *)); - /* We just XOR the hashes for the two strings */ - hashc = DatumGetUInt32(hash_any((const unsigned char *) k->channel, - (int) strlen((const char *) k->channel))); - hashp = DatumGetUInt32(hash_any((const unsigned char *) k->payload, - (int) strlen((const char *) k->payload))); - return hashc ^ hashp; + /* We don't bother to include the payload's trailing null in the hash */ + return DatumGetUInt32(hash_any((const unsigned char *) k->data, + k->channel_len + k->payload_len + 1)); } /* @@ -2300,8 +2301,10 @@ notification_match(const void *key1, const void *key2, Size keysize) const Notification *k2 = *(const Notification *const *) key2; Assert(keysize == sizeof(Notification *)); - if (strcmp(k1->channel, k2->channel) == 0 && - strcmp(k1->payload, k2->payload) == 0) + if (k1->channel_len == k2->channel_len && + k1->payload_len == k2->payload_len && + memcmp(k1->data, k2->data, + k1->channel_len + k1->payload_len + 2) == 0) return 0; /* equal */ return 1; /* not equal */ }