Discussion: Timing events WIP v1
Attached is a first WIP saving Timing Events via a new hook, grabbed by a new pg_timing_events contrib module. This only implements a small portion of the RFC spec I presented earlier this month, and is by no means finished code looking for a regular review. This is just enough framework to do something useful while looking for the parts that I suspect have tricky details. Right now this saves just a single line of text and inserts the hook into the checkpoint code, so output looks like this:

$ psql -c "select * from pg_timing_events" -x
-[ RECORD 1 ]-
seq   | 0
event | check/restartpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s

Extending this to save the key/value set and most of the other data I mentioned before is pretty straightforward. There is a fair amount of debris in here from the pg_stat_statements code this started as. I've left a lot of that in, but commented out, because it gives examples of grabbing things like the user and database that I'll need soon.

There's a modest list of things that I've done, or am facing soon, that involve less obvious choices, and I would appreciate feedback on these in particular:

-This will eventually aggregate data from clients running queries, the checkpointer process, and who knows what else in the future. The most challenging part of this so far is deciding how to handle the memory management side. I've tried just dumping everything into TopMemoryContext, and that not working as hoped is the biggest known bug right now. I'm not decided on whether to continue doing that but resolve the bug; whether there is an alternate context that makes more sense for a contrib module like this; or whether I should fix the size of the data and allocate everything at load time. The size of the data from log_lock_waits in particular will be hard to estimate in advance.

-The event queue is a simple array accessed in circular fashion. After realizing that output needed to navigate in the opposite order of element addition, I ended up just putting all the queue navigation code directly into the add/output routines. I'd be happy to use a more formal Postgres type here instead--I looked at SHM_QUEUE for example--but I haven't found anything yet that makes this any simpler.

-I modeled the hook here on the logging one that went into 9.2. It's defined in its own include file and it gets initialized by the logging system. There's no strong justification for putting it there; it was just similar to the existing hook, and that seemed reasonable. This is, in some respects, an alternate form of logging after all. The data sent by the hook itself needs to be more verbose in order to handle the full spec; right now I'm just asking about placement.

-I'm growing increasingly worried about allowing concurrent reads of this data (which might be large and take a while to return to the client) without blocking insertions. The way pg_stat_statements was split up to improve its concurrency doesn't translate naturally to this data. The serial number field in here is part of one idea I had: I might grab the header with an exclusive lock for only a moment and look up the serial number of the last record I should display. Then I could drop to a share lock while looping over the elements, stopping if I find one with a serial number over that. (A rough sketch of that add/read pattern follows below.)

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Attachments
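[A minimal sketch of the circular event queue and the serial-number-bounded read described in the message above. Every name here (TimingEventEntry, teSharedState, te_store_event, te_scan_events) is hypothetical and invented for illustration; none of it is taken from the actual patch.]

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/timestamp.h"

typedef struct TimingEventEntry
{
    uint64      seq;            /* monotonically increasing serial number */
    TimestampTz event_time;     /* when the event was recorded */
    char        event[256];     /* event text; fixed size only for this sketch */
} TimingEventEntry;

typedef struct teSharedState
{
    LWLockId    lock;           /* protects everything below */
    uint64      next_seq;       /* serial number given to the next insertion */
    int         head;           /* next array slot to overwrite */
    int         max_events;     /* allocated length of events[] */
    TimingEventEntry events[1]; /* actually allocated with max_events entries */
} teSharedState;

/* Insert under an exclusive lock, overwriting the oldest slot. */
static void
te_store_event(teSharedState *state, const char *text)
{
    TimingEventEntry *e;

    LWLockAcquire(state->lock, LW_EXCLUSIVE);
    e = &state->events[state->head];
    e->seq = state->next_seq++;
    e->event_time = GetCurrentTimestamp();
    strlcpy(e->event, text, sizeof(e->event));
    state->head = (state->head + 1) % state->max_events;
    LWLockRelease(state->lock);
}

/*
 * Read for display: grab the exclusive lock only long enough to note the
 * newest serial number, then scan under a share lock, skipping anything
 * inserted after that bound (and slots that were never written).
 */
static void
te_scan_events(teSharedState *state, void (*emit) (TimingEventEntry *))
{
    uint64      last_seq;
    int         i;

    LWLockAcquire(state->lock, LW_EXCLUSIVE);
    last_seq = state->next_seq;
    LWLockRelease(state->lock);

    LWLockAcquire(state->lock, LW_SHARED);
    for (i = 0; i < state->max_events; i++)
    {
        TimingEventEntry *e = &state->events[i];

        if (e->event[0] == '\0' || e->seq >= last_seq)
            continue;           /* never written, or added after our bound */
        emit(e);
    }
    LWLockRelease(state->lock);
}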
> Extending this to save the key/value set and most of the other data I
> mentioned before is pretty straightforward.

Why not use Hstore? Seriously?

It would require merging Hstore into core, but I don't necessarily see that as a bad thing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Greg Smith wrote:

> -The event queue is a simple array accessed in circular fashion.
> After realizing that output needed to navigate in the opposite order
> of element addition, I ended up just putting all the queue
> navigation code directly into the add/output routines. I'd be happy
> to use a more formal Postgres type here instead--I looked at
> SHM_QUEUE for example--but I haven't found something yet that makes
> this any simpler.

SHM_QUEUE is on death row. Try a dlist from src/backend/lib/ilist.c

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
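[For reference, a minimal sketch of the lib/ilist.h dlist API in use, assuming a hypothetical backend-local TimingEventItem element; a shared-memory version would need its nodes allocated in shared memory. Newest-first output falls out of the reverse iteration, which was the awkward part of the hand-rolled circular array.]

#include "postgres.h"
#include "lib/ilist.h"

typedef struct TimingEventItem
{
    dlist_node  node;           /* embedded doubly-linked list links */
    char       *event;          /* event text */
} TimingEventItem;

static dlist_head timing_events;

static void
te_list_init(void)
{
    dlist_init(&timing_events);
}

/* New events go on the tail... */
static void
te_list_add(TimingEventItem *item)
{
    dlist_push_tail(&timing_events, &item->node);
}

/* ...and output walks the list newest-first. */
static void
te_list_report(void)
{
    dlist_reverse_iter iter;

    dlist_reverse_foreach(iter, &timing_events)
    {
        TimingEventItem *item = dlist_container(TimingEventItem, node, iter.cur);

        elog(LOG, "timing event: %s", item->event);
    }
}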
On 11/15/12 10:26 AM, Josh Berkus wrote:
>
>> Extending this to save the key/value set and most of the other data I
>> mentioned before is pretty straightforward.
>
> Why not use Hstore? Seriously?

I just haven't done that part yet. The fact that hstore is the likely way to do it is why I said it's straightforward, and why I haven't gotten stressed about not having that finished. I wanted to try to shake out some feedback on hook location and how best to deal with memory contexts, both of which might impact this.

Extending the current hook setup I've got here to pass data via hstore would mean it needs to be pulled into core. If there's a better approach for adding a hook that avoids that, I'd be happy to adopt it; I just haven't thought of one.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
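[A minimal sketch of what a hook along these lines might look like, modeled on the general shape of 9.2's emit_log_hook. The names timing_event_hook, timing_event_hook_type, TimingEventData, and pgte_store_timing_event are invented for illustration and are not from the patch; the pieces below would live in a core header, the contrib module, and the core call site respectively.]

#include "postgres.h"
#include "utils/timestamp.h"

/* In a hypothetical core header, with a matching definition in core: */
typedef struct TimingEventData
{
    TimestampTz event_time;     /* when the event happened */
    const char *event;          /* event text; would grow into key/value data */
} TimingEventData;

typedef void (*timing_event_hook_type) (TimingEventData *event);
extern PGDLLIMPORT timing_event_hook_type timing_event_hook;

/* In the contrib module: save any previous hook and chain onto it. */
static timing_event_hook_type prev_timing_event_hook = NULL;

static void
pgte_store_timing_event(TimingEventData *event)
{
    /* the real module would copy this into its shared-memory queue here */
    if (prev_timing_event_hook)
        (*prev_timing_event_hook) (event);
}

void
_PG_init(void)
{
    prev_timing_event_hook = timing_event_hook;
    timing_event_hook = pgte_store_timing_event;
}

/* In core, at the reporting site (e.g. where the checkpoint-complete
 * message is assembled): */
static void
report_timing_event(TimingEventData *event)
{
    if (timing_event_hook)
        (*timing_event_hook) (event);
}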
On 11/16/2012 02:26 AM, Josh Berkus wrote:
>> Extending this to save the key/value set and most of the other data I
>> mentioned before is pretty straightforward.
>
> Why not use Hstore? Seriously?

The only issue I see is that hstore's text format is non-standard, not widely understood, and a pain to work with in applications. You can always write queries against it in Pg and extract normal row sets of key/value pairs, so it's not a big problem - but I'd want to think about adding a built-in `hstore_to_json` if we were going to use hstore to manage the data, since json is so much more widely understood by client applications.

Not that implementing `hstore_to_json` is exactly hard, mind you, as a quick search shows: https://gist.github.com/2318757

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/16/12 12:20 AM, Craig Ringer wrote:
> Not that implementing `hstore_to_json` is exactly hard, mind you, as a
> quick search shows: https://gist.github.com/2318757

Both pulling hstore more firmly into core and adopting something like an hstore_to_json call as the preferred UI for timing event data are directions I'd be happy to go. That's why I stopped before trying to even implement that part. I think the general direction to go is clear, but how to present the resulting data is more of a tarp than even a bikeshed at this point. It's not a hard problem though.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
2012/11/20 Greg Smith <greg@2ndquadrant.com>:
> On 11/16/12 12:20 AM, Craig Ringer wrote:
>
>> Not that implementing `hstore_to_json` is exactly hard, mind you, as a
>> quick search shows: https://gist.github.com/2318757
>
> Both pulling hstore more firmly into core and adopting something like an
> hstore_to_json call as the preferred UI for timing event data are
> directions I'd be happy to go. That's why I stopped before trying to even
> implement that part. I think the general direction to go is clear, but how
> to present the resulting data is more of a tarp than even a bikeshed at
> this point. It's not a hard problem though.

I don't like to see the current hstore in core - it doesn't support nesting, it doesn't support different datatypes, and it is not well supported by plpgsql.

Regards

Pavel

> --
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
On 11/20/2012 04:48 PM, Pavel Stehule wrote:
> 2012/11/20 Greg Smith <greg@2ndquadrant.com>:
>> On 11/16/12 12:20 AM, Craig Ringer wrote:
>>
>>> Not that implementing `hstore_to_json` is exactly hard, mind you, as a
>>> quick search shows: https://gist.github.com/2318757
>>
>> Both pulling hstore more firmly into core and adopting something like an
>> hstore_to_json call as the preferred UI for timing event data are
>> directions I'd be happy to go. That's why I stopped before trying to even
>> implement that part. I think the general direction to go is clear, but how
>> to present the resulting data is more of a tarp than even a bikeshed at
>> this point. It's not a hard problem though.
>>
> I don't like to see the current hstore in core - it doesn't support
> nesting, it doesn't support different datatypes, and it is not well
> supported by plpgsql

... or by many client libraries, though converting hstore values to json notation for output usually takes care of that.

I can't help but suspect that hstore will be subsumed by native storage and indexing of json objects, since these are already widely supported in KV or document stores. Of course, that requires that we come to some agreement on how to represent literal scalars for interaction with json, which hasn't seen much progress.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/20/12 8:02 PM, Craig Ringer wrote:
> On 11/20/2012 04:48 PM, Pavel Stehule wrote:
>> I don't like to see the current hstore in core - it doesn't support
>> nesting, it doesn't support different datatypes, and it is not well
>> supported by plpgsql
>
> ... or by many client libraries, though converting hstore values to json
> notation for output usually takes care of that.

The more I look at this, the less comfortable I am moving forward by hand-waving this issue away. The obvious but seemingly all-troublesome options:

-Store things internally using hstore. This seems to require pulling hstore fully into core, which has this list of issues. Outputting through a JSON filter seems like a lot of overhead. And even if the output concerns are addressed, if there's a problem processing the raw data with PL/pgSQL, that would be bad. I didn't think about that at all until Pavel brought it up.

-Go back to a simpler composite type idea for storing this data. It feels wrong to create something that looks a lot like hstore data, but isn't. There are probably subtle display/client/processing issues in there I haven't thought of too.

-Try to store the data in a JSON-friendly way in the first place.

That last one seems to lead right to...

> I can't help but suspect that hstore will be subsumed by native storage
> and indexing of json objects, since these are already widely supported
> in KV or document stores. Of course, that requires that we come to some
> agreement on how to represent literal scalars for interaction with json,
> which hasn't seen much progress.

If this is the Right Way to solve this problem, that puts a serious snag in doing something useful with Timing Events in the near future. I know better than to try and fight against following the correct path once it's identified. I can't claim any real expertise on client encoding/transfer issues though; that's not a strong area for me.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
2012/12/12 Greg Smith <greg@2ndquadrant.com>:
> On 11/20/12 8:02 PM, Craig Ringer wrote:
>>
>> On 11/20/2012 04:48 PM, Pavel Stehule wrote:
>>> I don't like to see the current hstore in core - it doesn't support
>>> nesting, it doesn't support different datatypes, and it is not well
>>> supported by plpgsql
>>
>> ... or by many client libraries, though converting hstore values to json
>> notation for output usually takes care of that.
>
> The more I look at this, the less comfortable I am moving forward by
> hand-waving this issue away. The obvious but seemingly all-troublesome
> options:
>
> -Store things internally using hstore. This seems to require pulling hstore
> fully into core, which has this list of issues. Outputting through a JSON
> filter seems like a lot of overhead. And even if the output concerns are
> addressed, if there's a problem processing the raw data with PL/pgSQL, that
> would be bad. I didn't think about that at all until Pavel brought it up.
>
> -Go back to a simpler composite type idea for storing this data. It feels
> wrong to create something that looks a lot like hstore data, but isn't.
> There are probably subtle display/client/processing issues in there I
> haven't thought of too.
>
> -Try to store the data in a JSON-friendly way in the first place.
>
> That last one seems to lead right to...
>
>> I can't help but suspect that hstore will be subsumed by native storage
>> and indexing of json objects, since these are already widely supported
>> in KV or document stores. Of course, that requires that we come to some
>> agreement on how to represent literal scalars for interaction with json,
>> which hasn't seen much progress.
>
> If this is the Right Way to solve this problem, that puts a serious snag
> in doing something useful with Timing Events in the near future. I know
> better than to try and fight against following the correct path once it's
> identified. I can't claim any real expertise on client encoding/transfer
> issues though; that's not a strong area for me.

I haven't watched this discussion the whole time, but I don't understand why you need complex datatypes. Maybe you can do normalization, and hold the complex data types only in memory.

Regards

Pavel

> --
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
I decided to take a look at this.

On 15 November 2012 09:56, Greg Smith <greg@2ndquadrant.com> wrote:
> -I modeled the hook here on the logging one that went into 9.2. It's
> defined in its own include file and it gets initialized by the logging
> system. No strong justification for putting it there, it was just similar
> to the existing hook and that seemed reasonable. This is in some respects
> an alternate form of logging after all. The data sent by the hook itself
> needs to be more verbose in order to handle the full spec, right now I'm
> just asking about placement.

I noticed that when !log_checkpoints, control never reaches the site where the hook is called, and thus the checkpoint info is not stored. Is that the intended behaviour of the patch? I don't feel strongly either way, but it did give me pause for a second.

I don't like the design of this patch. It seems like we should be moving as far away from an (internal) textual representation as possible - that's basically the main reason why logging is bad. Textual representations make easy, correct querying hard, are bloated, and are really no better than manual logging.

The first question I'd ask is "what is true of all timing events?". You address this in your original RFC [1]. However, this isn't reflected in the data structures that the patch uses, and I suspect that it should be.

What I'm envisioning is new NodeTag infrastructure. You'd have an "abstract base class". This is sort of like the one used by plan nodes (i.e. the Plan struct). This would contain the information that is essential to all timing events (in the case of Plan, this includes startup and total costs for the node in question). You'd refactor structs like CheckpointStatsData to compose them such that their first field was this parent struct (i.e. logically, they'd inherit from, say, TimingEvent, which itself would inherit from Node, in the style of Plan's children). That'd entail normalising things like inconsistent use of datatypes to represent time intervals and whatnot among these sorts of structs in the extant code, which is painful, but long term it seems like the way forward.

With this commonality and variability analysis performed, and having refactored the code, you'd be able to use type punning to pass the essential information to your new hook. You'd have a way to usefully restrict statistics to only one class of timing event, because you could expose the NodeTag in your view (or perhaps a prettified version thereof) by putting something in a query predicate. You can perform aggregation, too.

Maybe this abstract base class would look something like:

typedef struct TimingEvent
{
    NodeTag     type;

    /*
     * Common structural data for all TimingEvent types.
     */
    double      duration;       /* Duration of event in milliseconds (per
                                 * project policy for views) */
    double      naffected;      /* Number of objects (of variable type)
                                 * affected */
    char       *string;         /* Non-standardised event string */
} TimingEvent;

However, my failure to come up with more fields here leads me to believe that we'd be better off with a slightly different approach. Let's forget about storing text altogether, and make the unstructured "union" field JSON (while making fields common to all timing events plain-old scalar datums, for ease of use). It would become the job of PGTE_memsize() to make each entry in the shared hashtable wide enough to accommodate the widest of all possible TimingEvent-inheriting structs. We'd just store heterogeneous TimingEvent-inheriting structs in the hash table.
It would be pg_timing_events' job to call a JSON conversion routine directly, with baked-in knowledge, within a NodeTag case statement. It actually wouldn't be hard to have the view spit out some JSON from the internal, compact struct representation. Currently, explain.c manages to generate JSON representations of plans with very little fuss, and without using any of the JSON datatype stuff. Doing this with hstore is just way too controversial, and not particularly worth it, IMHO.

With a "where event_type = x" in the query predicate, the JSON datums would have a predictable, consistent structure, facilitating machine reading and aggregation.

[1] Original RFC e-mail: http://www.postgresql.org/message-id/509300F7.5000803@2ndQuadrant.com

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
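[To make the shape of that concrete, a minimal sketch of the pattern described above, using a trimmed version of the TimingEvent struct from the previous message (dropping the text field, per the suggestion to skip text entirely). CheckpointTimingEvent, T_CheckpointTimingEvent, and timing_event_to_json are invented names for illustration; they exist neither in core nor in the patch.]

#include "postgres.h"
#include "lib/stringinfo.h"
#include "nodes/nodes.h"

/* The abstract base "class", first field always a NodeTag. */
typedef struct TimingEvent
{
    NodeTag     type;
    double      duration;       /* milliseconds */
    double      naffected;      /* number of objects affected */
} TimingEvent;

/* A concrete timing event: the parent struct comes first, so a
 * CheckpointTimingEvent * can be punned to a TimingEvent *. */
typedef struct CheckpointTimingEvent
{
    TimingEvent te;
    int         buffers_written;
    double      write_time;     /* milliseconds */
    double      sync_time;      /* milliseconds */
} CheckpointTimingEvent;

/* JSON output with baked-in knowledge of each node type, in the same
 * spirit as explain.c's plan output. */
static void
timing_event_to_json(TimingEvent *event, StringInfo buf)
{
    appendStringInfo(buf, "{\"duration\": %.3f, \"naffected\": %.0f",
                     event->duration, event->naffected);

    switch (nodeTag(event))
    {
        case T_CheckpointTimingEvent:   /* an invented NodeTag value */
            {
                CheckpointTimingEvent *ce = (CheckpointTimingEvent *) event;

                appendStringInfo(buf, ", \"buffers_written\": %d",
                                 ce->buffers_written);
                appendStringInfo(buf, ", \"write_time\": %.3f", ce->write_time);
                appendStringInfo(buf, ", \"sync_time\": %.3f", ce->sync_time);
                break;
            }
        default:
            break;
    }
    appendStringInfoChar(buf, '}');
}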
On 1/14/13 11:19 AM, Peter Geoghegan wrote:
> I noticed that when !log_checkpoints, control never reaches the site
> where the hook is called, and thus the checkpoint info is not stored.
> Is that the intended behaviour of the patch?

I was aware of that and considered it a defensible situation--that turning off checkpoint logging takes that data out everywhere. But it was not necessarily the right thing to do.

> Currently, explain.c
> manages to generate JSON representations of plans with very little
> fuss, and without using any of the JSON datatype stuff. Doing this
> with hstore is just way too controversial, and not particularly worth
> it, IMHO.

I wasn't optimistic after seeing the number of bugs that scurried out when the hstore rock was turned over for this job. On the implementation side, the next round of code I've been playing with here has struggled with the problem of rendering to strings earlier than I'd like. I'd like to delay that as long as possible; certainly not do it during storage, and preferably only when someone asks for the timing event.

> With a "where event_type = x" in the query predicate, the
> JSON datums would have a predictable, consistent structure, facilitating
> machine reading and aggregation.

Filtering on a range of timestamps or on the serial number field is the main thing that I imagined, as something that should limit results before even producing tuples. The expected and important case, where someone wants "all timing events after #123" after persisting #123 to disk, I'd like to be efficient. All of the fields I'd want to see filtering on are part of the fixed set of columns every entry will have.

To summarize, your suggestion is to build an in-memory structure capable of holding the timing event data. The Datum approach will be used to cope with different types of events having different underlying types. The output format for queries against this data set will be JSON, rendered directly from the structure, similarly to how EXPLAIN (FORMAT JSON) outputs query trees. The columns every row contains, like a serial number, timestamp, and pid, can be filtered on by something operating at the query executor level. Doing something useful with the more generic, "dynamic schema" parts will likely require parsing the JSON output.

Those are all great ideas I think people could live with. It looks to me like the hook definition itself would need the entire data structure defined, and known to work, before its API could be nailed down. I was hoping that we might get a hook for diverting this data committed into 9.3 even if the full extension to expose it wasn't nailed down. That was based on similarity to the generic logging hook that went into 9.2. This new implementation idea reminds me more of the query length decoration needed for normalized pg_stat_statements though--something that wasn't easy to just extract out from the consumer at all.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com