Discussion: Thread-safe queueing?
I need to create a cross-process producer/consumer data queue (e.g. a
singly-linked list). That is, processes A, B, and C add nodes to a
controlled list, and process D removes them. I'm not sure whether the
creation of the nodes would best be done by the producers or the
consumers, but destruction would have to be done by the consumer, as
the producers don't wait for processing. For optimal results, the
consumer process should sleep until item(s) are added to its queue.

Query: within the existing backend framework, what's the best way to
accomplish this?

Thanks,
Tim Holloway
Tim Holloway <mtsinc@southeast.net> writes:
> I need to create a cross-process producer/consumer data queue
> (e.g. singly-linked list). That is - Processes A, B, and C add nodes
> to a controlled list and process D removes them. Not sure if the
> creation of the nodes would be best done by the producers or
> consumers, but destruction would have to be done by the consumer, as
> the producers don't wait for processing. For optimal results, the
> consumer process should sleep until item(s) are added to its queue.

> Query: within the existing backend framework, what's the best way to
> accomplish this?

More context, please.  What are you trying to accomplish?  Is this
really a communication path between backends (and if so, what backend
code needs it?), or are you trying to set up a queue between SQL
clients?  How much data might need to be in the queue at one time?

			regards, tom lane
Tom Lane wrote:
>
> Tim Holloway <mtsinc@southeast.net> writes:
> > I need to create a cross-process producer/consumer data queue
> > (e.g. singly-linked list). That is - Processes A, B, and C add nodes
> > to a controlled list and process D removes them. Not sure if the
> > creation of the nodes would be best done by the producers or
> > consumers, but destruction would have to be done by the consumer, as
> > the producers don't wait for processing. For optimal results, the
> > consumer process should sleep until item(s) are added to its queue.
>
> > Query: within the existing backend framework, what's the best way to
> > accomplish this?
>
> More context, please.  What are you trying to accomplish?  Is this
> really a communication path between backends (and if so, what backend
> code needs it?), or are you trying to set up a queue between SQL
> clients?  How much data might need to be in the queue at one time?
>
> 			regards, tom lane

This is for the logging subsystem I'm developing. The backends call
pg_log(), which is like elog(), except that the message is a resource
ID + any parameters, in order to support locales and custom message
formatting. These ID+parameter packets are then pipelined down to the
logging channels via the log engine, to be formatted and output
according to rules in the configuration file.

I *think* that the log engine should be a distinct process. I'm not
sure I can trust the output not to come out sliced and diced if each
backend can run the engine directly -- and for that matter, I see
problems if the engine is reconfigured on the fly, owing to the need
for each backend to replicate the configuration process (among other
things).

The basic singly-linked list component is all I need to handle the
FIFO, but obviously I need guards to preserve its integrity. As to the
amount of data involved, I sincerely hope the queue would stay pretty
shallow!
I have the configuration parser and logging engine operational, so the
last significant hurdle is making sure that A) the data to be logged
is accessible/addressable by the engine, and B) that the process runs
in the proper sequence.

A description of what it all will look like is now online at
http://postgres.mousetech.com/index.html (with apologies for the ugly
formatting).

Thanks,
Tim Holloway
Tim Holloway <mtsinc@southeast.net> writes:
> Tom Lane wrote:
>> More context, please.

> This is for the logging subsystem I'm developing. The backends call
> pg_log(), which is like elog(), except that the message is a resource
> ID + any parameters in order to support locales and custom message
> formatting. These ID+parameter packets are then pipelined down to the
> logging channels via the log engine to be formatted and output
> according to rules in the configuration file.

OK.  You probably want something roughly comparable to the shared-inval
message queue --- see include/storage/sinvaladt.h and
backend/storage/ipc/sinvaladt.c.  That's more complex than your problem
in one way (the sinval queue must be read by all N backends, not just
one process) but simpler in another (we can shoehorn all SI messages
into a fixed-size structure; is that practical for log data?).  Anyway,
a queue structure in shared memory protected by spinlocks is what you
want, and sinval is about the closest thing we have to that at the
moment.

> I *think* that the log engine should be a distinct process.

Probably so, if you use a shared-memory queue.  Shared memory is a
finite resource; AFAIK it's not practical to enlarge it on-the-fly.
So you will have to set a maximum queue size --- either a compile-time
constant, or at best a value chosen at postmaster start time.  This
implies that there will be scenarios where backends are waiting for
room to be made in the log queue.  If the queue emptier is a separate
process then those waits can't turn into deadlocks.  (sinval works
around the memory-overflow problem with a special "reset" mechanism,
but that doesn't seem appropriate for logging.)

Alternatively, you could forget about a queue per se, and just allow
each backend to execute the sending of its own log messages, using a
spinlock in shared memory to prevent concurrent issuance of log
messages on channels where that's a problem.  That might be the
simplest and most robust approach.
regards, tom lane
> Alternatively, you could forget about a queue per se, and just allow
> each backend to execute the sending of its own log messages, using
> a spinlock in shared memory to prevent concurrent issuance of log
> messages on channels where that's a problem.  That might be the
> simplest and most robust approach.

Hold on.  Unix guarantees all write() calls are atomic, so no one gets
in between that write.  Why not just collect the output into one buffer
in the backend, and blast the entire buffer in one write() to the log
file?  I don't think there is any way another backend could mess that
up.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> Alternatively, you could forget about a queue per se, and just allow
>> each backend to execute the sending of its own log messages, using
>> a spinlock in shared memory to prevent concurrent issuance of log
>> messages on channels where that's a problem.  That might be the
>> simplest and most robust approach.

> Hold on.  Unix guarantees all write() calls are atomic, so no one gets
> in between that write.

Actually, I didn't say that I *believed* there were any channel types
where such an interlock is essential ;-).  I just said that spinlocking
is a solution if the problem comes up.

Tim mentioned on-the-fly reconfiguration of logging as an area that
might need interlocking, and I'm more prepared to believe that.  Still,
we seem to be getting on just fine with each backend responding
independently to reconfiguration of the pg_options values.  So I'd be
inclined to build it that way, and wait for evidence of a performance
problem before spending effort to make it smarter.  Which I guess is
Bruce's point also...

			regards, tom lane