Обсуждение: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

Поиск
Список
Период
Сортировка

Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Jeff Holt
Дата:

TLDR; If I spend the time necessary to instrument the many functions that are the equivalent of the Oracle counterparts, would anyone pull those changes and use them? Specifically, for those who know Oracle, I'm talking about implementing:

  1. The portion of the ALTER SESSION that enables extended SQL trace
  2. Most of the DBMS_MONITOR and DBMS_APPLICATION_INFO packages
  3. Instrument the thousand or so functions that are the equivalent of those found in Oracle's V$EVENT_NAME
  4. Dynamic performance view V$DIAG_INFO

For the last 35 years, I've made my living helping people solve Oracle performance problems by looking at it, which means:

Trace a user experience and profile the trace file to (a) reveal where the time has gone and its algorithm and (b) make it easy to imagine the cost of possible solutions as well as the savings in response time or resources.

I've even submitted change requests to improve Oracle's tracing features while working for them and since those glorious five years.

Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.

I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their response time. The biggest roadblock is that without a lot of automation, a user of any kind must log into the server and attempt to get the data that are now traditionally child's play for Oracle. The second biggest roadblock I see is the recompilation that is required for the server components (i.e., postgreSQL, operating system). My initial attempts to get anything useful out of postgreSQL were dismal failures and I think it should be infinitely easier.

Running either dtrace and eBPF scripts on the server should not be required. The instrumentation and the code being instrumented should be tightly coupled. Doing so will allow anyone on any platform for any PostgreSQL version to get a trace file just as easily as people do for Oracle.

Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:
On 10/4/21 02:34, Laurenz Albe wrote:
> On Fri, 2021-10-01 at 15:06 -0500, Jeff Holt wrote:
>> TLDR; If I spend the time necessary to instrument the many functions that are the equivalent
>> of the Oracle counterparts, would anyone pull those changes and use them?
>> Specifically, for those who know Oracle, I'm talking about implementing:
>>     1. The portion of the ALTER SESSION that enables extended SQL trace
>>     2. Most of the DBMS_MONITOR and DBMS_APPLICATION_INFO packages
>>     3. Instrument the thousand or so functions that are the equivalent of those found in Oracle's V$EVENT_NAME
>>     4. Dynamic performance view V$DIAG_INFO
>> For the last 35 years, I've made my living helping people solve Oracle performance problems by looking at it
>>
> [...]
>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
> Anything that improves user experience in that respect is welcome, but consider
> that each database has different approaches to solve the same problems.
>
> Before you go to the length of implementing a lot of stuff, check in with
> the -hackers list and discuss your ideas.
>
> Please be a lot more specific than in this e-mail.  While it is certainly
> fine to sketch your ambitios vision, focus on one specific thing you can
> imagine implementing and come up with a design for that.
>
> Note that "Oracle has it" is not a good enough reason for a PostgreSQL
> feature.  We think we can do better than they do (at least in many respects).
> Also, don't assume that everyone on the -hackers list will be familiar with
> certain PostgreSQL features.
>
> One think that you should keep in mind is that Oracle has to provide different
> features in that area because they are not open source.  In PostgreSQL, I can
> simply read the code or attach a debugger to a backend, and when it comes to
> profiling, "perf" works pretty well.  So there is less need for these things.
>
> I don't want to discourage you, but contributing to PostgreSQL can be a lengthy
> and tedious process.  On the upside, things that make it into core are usually
> fairly mature.
>
> Yours,
> Laurenz Albe

Laurenz, you are obviously not aware who are you talking to. Let me 
introduce you: Cary Millsap and Jeff Holt are authors of the "Optimizing 
Oracle for Performance", one of the most influential books in the entire 
realm of  Oracle literature.  The book describes the method of tuning 
Oracle applications by examining where are they spending time and what 
are they waiting for. The book can be found on Amazon and I would 
seriously advise you to read it:


https://www.amazon.com/Optimizing-Oracle-Performance-Practitioners-Response-ebook/dp/B00BJ9A8SU/ref=sr_1_1?dchild=1&keywords=Optimizing+Oracle+for+Performance&qid=1633395886&s=books&sr=1-1

Haughty lectures about "Oracle has it" not being good enough could 
hardly be more out of place here. To put it as politely as is possible 
in this case, shut your pie hole. What Jeff is asking for is not 
something that "Oracle has", it's something that customers want. That 
was the case few years ago when I was asking for the optimizer hints. I 
was castigated by the former pastry baker turned Postgres guru and my 
reaction was simple: I threw Postgres out of the company that I was a 
working for as the lead DBA. You see, customer is always right, whether 
the database is open source or not. Needless to say, Postgres has 
optimizer hints these days. It still has them in "we do not want" part 
of the Wiki, which is hilarious.

You see, without proper event instrumentation, and knowing where the 
application spends time, it is not possible to exactly tune that 
application. Oracle used to have a witchcraft based lore like that, 
where the performance was estimated, based on buffer cache hit ratio, 
the famous "BCHR". That was known as "Method C". The name comes from 
Cary's and Jeff's book. Jeff and Cary are the ones who made the BCHR 
based black magic - obsolete.

In other words, Jeff is asking for a method to fine tune the 
applications with precision. Instead of being an a....rrogant person, 
you should have given him the answer:

https://github.com/postgrespro/pg_wait_sampling

Postgres already has an extension which implements around 60% of what 
Oracle has. Of course, Oracle's mechanism is somewhat more refined but 
it is also 20 years older. Cary Millsap, Anjo Kolk and Jeff Holt were 
implementing the instrumentation for Oracle 7. There was a huge pile of 
paper, printed off Metalink, a predecessor of "My Oracle Support", 
describing Oracle 7 events and explaining what Oracle was actually 
waiting for. At that time Cary Millsap was a VP in Oracle development. 
The book came out for Oracle8. You see, Jeff Holt really knows what he's 
asking for. You are the ignorant one, the one who engaged in talking at 
Jeff, not knowing that there already is an answer. There is no shame in 
not knowing something, people ask questions all the time. Arrogantly 
talking at someone and giving unsolicited lectures in what is 
appropriate and what is not is another thing altogether.

Finally, about the tone of this message: you really pissed me off. I had 
to restrain myself from using even stronger language, that was 
surprisingly hard to do. I wouldn't be surprised to see you giving 
haughty lectures about programming to Brian Kernighan or Dennis Ritchie. 
And yes, those two have allegedly also written a book.

Regards


-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
"Michaeldba@sqlexec.com"
Дата:
Mladen,

Shame on u lecturing a top notch guy in the PostgreSQL world, Laurenz Albe. I think Laurenz knows “a little bit” about
Oraclehaving written the popular extension, fdw_oracle, among his many other contributions to the PG world. So ironic
thatLaurenz was just named “PostgReSQL person of the week”, and then has to be subjected to this “tirade” of yours! 

Follow the PG protocol in submitting your change requests to core PG and stop your Bitchin!

Michael Vitale


Sent from my iPad

> On Oct 4, 2021, at 9:51 PM, Mladen Gogala <gogala.mladen@gmail.com> wrote:
>
> 
>> On 10/4/21 02:34, Laurenz Albe wrote:
>>> On Fri, 2021-10-01 at 15:06 -0500, Jeff Holt wrote:
>>> TLDR; If I spend the time necessary to instrument the many functions that are the equivalent
>>> of the Oracle counterparts, would anyone pull those changes and use them?
>>> Specifically, for those who know Oracle, I'm talking about implementing:
>>>    1. The portion of the ALTER SESSION that enables extended SQL trace
>>>    2. Most of the DBMS_MONITOR and DBMS_APPLICATION_INFO packages
>>>    3. Instrument the thousand or so functions that are the equivalent of those found in Oracle's V$EVENT_NAME
>>>    4. Dynamic performance view V$DIAG_INFO
>>> For the last 35 years, I've made my living helping people solve Oracle performance problems by looking at it
>>>
>> [...]
>>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>> Anything that improves user experience in that respect is welcome, but consider
>> that each database has different approaches to solve the same problems.
>>
>> Before you go to the length of implementing a lot of stuff, check in with
>> the -hackers list and discuss your ideas.
>>
>> Please be a lot more specific than in this e-mail.  While it is certainly
>> fine to sketch your ambitios vision, focus on one specific thing you can
>> imagine implementing and come up with a design for that.
>>
>> Note that "Oracle has it" is not a good enough reason for a PostgreSQL
>> feature.  We think we can do better than they do (at least in many respects).
>> Also, don't assume that everyone on the -hackers list will be familiar with
>> certain PostgreSQL features.
>>
>> One think that you should keep in mind is that Oracle has to provide different
>> features in that area because they are not open source.  In PostgreSQL, I can
>> simply read the code or attach a debugger to a backend, and when it comes to
>> profiling, "perf" works pretty well.  So there is less need for these things.
>>
>> I don't want to discourage you, but contributing to PostgreSQL can be a lengthy
>> and tedious process.  On the upside, things that make it into core are usually
>> fairly mature.
>>
>> Yours,
>> Laurenz Albe
>
> Laurenz, you are obviously not aware who are you talking to. Let me introduce you: Cary Millsap and Jeff Holt are
authorsof the "Optimizing Oracle for Performance", one of the most influential books in the entire realm of  Oracle
literature. The book describes the method of tuning Oracle applications by examining where are they spending time and
whatare they waiting for. The book can be found on Amazon and I would seriously advise you to read it: 
>
>
https://www.amazon.com/Optimizing-Oracle-Performance-Practitioners-Response-ebook/dp/B00BJ9A8SU/ref=sr_1_1?dchild=1&keywords=Optimizing+Oracle+for+Performance&qid=1633395886&s=books&sr=1-1
>
> Haughty lectures about "Oracle has it" not being good enough could hardly be more out of place here. To put it as
politelyas is possible in this case, shut your pie hole. What Jeff is asking for is not something that "Oracle has",
it'ssomething that customers want. That was the case few years ago when I was asking for the optimizer hints. I was
castigatedby the former pastry baker turned Postgres guru and my reaction was simple: I threw Postgres out of the
companythat I was a working for as the lead DBA. You see, customer is always right, whether the database is open source
ornot. Needless to say, Postgres has optimizer hints these days. It still has them in "we do not want" part of the
Wiki,which is hilarious. 
>
> You see, without proper event instrumentation, and knowing where the application spends time, it is not possible to
exactlytune that application. Oracle used to have a witchcraft based lore like that, where the performance was
estimated,based on buffer cache hit ratio, the famous "BCHR". That was known as "Method C". The name comes from Cary's
andJeff's book. Jeff and Cary are the ones who made the BCHR based black magic - obsolete. 
>
> In other words, Jeff is asking for a method to fine tune the applications with precision. Instead of being an
a....rrogantperson, you should have given him the answer: 
>
> https://github.com/postgrespro/pg_wait_sampling
>
> Postgres already has an extension which implements around 60% of what Oracle has. Of course, Oracle's mechanism is
somewhatmore refined but it is also 20 years older. Cary Millsap, Anjo Kolk and Jeff Holt were implementing the
instrumentationfor Oracle 7. There was a huge pile of paper, printed off Metalink, a predecessor of "My Oracle
Support",describing Oracle 7 events and explaining what Oracle was actually waiting for. At that time Cary Millsap was
aVP in Oracle development. The book came out for Oracle8. You see, Jeff Holt really knows what he's asking for. You are
theignorant one, the one who engaged in talking at Jeff, not knowing that there already is an answer. There is no shame
innot knowing something, people ask questions all the time. Arrogantly talking at someone and giving unsolicited
lecturesin what is appropriate and what is not is another thing altogether. 
>
> Finally, about the tone of this message: you really pissed me off. I had to restrain myself from using even stronger
language,that was surprisingly hard to do. I wouldn't be surprised to see you giving haughty lectures about programming
toBrian Kernighan or Dennis Ritchie. And yes, those two have allegedly also written a book. 
>
> Regards
>
>
> --
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217
> https://dbwhisperer.wordpress.com
>
>
>




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Peter Geoghegan
Дата:
On Mon, Oct 4, 2021 at 6:51 PM Mladen Gogala <gogala.mladen@gmail.com> wrote:
> Haughty lectures about "Oracle has it" not being good enough could
> hardly be more out of place here. To put it as politely as is possible
> in this case, shut your pie hole. What Jeff is asking for is not
> something that "Oracle has", it's something that customers want. That
> was the case few years ago when I was asking for the optimizer hints. I
> was castigated by the former pastry baker turned Postgres guru and my
> reaction was simple: I threw Postgres out of the company that I was a
> working for as the lead DBA. You see, customer is always right, whether
> the database is open source or not. Needless to say, Postgres has
> optimizer hints these days. It still has them in "we do not want" part
> of the Wiki, which is hilarious.

In all sincerity: Chill out. I don't think that this is worth getting
into an argument over. I think that there is a good chance that you'd
have had a much better experience if the conversation had been in
person. Text has a way of losing a lot of important nuance.

I have personally met and enjoyed talking to quite a few people that
personally worked on Oracle, in various capacities -- the world of
database internals experts is not huge. I find Tanel Poder very
interesting, too -- never met the man, but we follow each other on
Twitter. Oracle is a system that has some interesting properties in
general (not just as a counterpoint to PostgreSQL), and I definitely
respect it. Same with SQL Server.

There are lots of smart and accomplished people in the world,
including Jeff. I think that it's worth understanding these
differences in perspective. There is likely to be merit in all of the
specific points made by both Laurenze and Jeff. They may not be
irreconcilable, or anything like it.

-- 
Peter Geoghegan



Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:
On 10/4/21 22:25, Michaeldba@sqlexec.com wrote:
> Mladen,
>
> Shame on u lecturing a top notch guy in the PostgreSQL world, Laurenz Albe. I think Laurenz knows “a little bit”
aboutOracle having written the popular extension, fdw_oracle, among his many other contributions to the PG world. So
ironicthat Laurenz was just named “PostgReSQL person of the week”, and then has to be subjected to this “tirade” of
yours!
>
> Follow the PG protocol in submitting your change requests to core PG and stop your Bitchin!
>
> Michael Vitale

First, a matter of format: please don't top-post. Replies go under the 
original posts. That's an unwritten rule, but a very time honored one. 
Second, I know very well who Laurenz Albe is. We have met on the 
oracle-l few decades ago. Third, I think that my reproach to Laurenz's 
tone is very justified.  You don't say "the argument that Python has it 
is not good enough" to Dennis Ritchie. Hopefully, you get my analogy, 
but one cannot ever be sure.

Last, I didn't request any new features from the Postgres community. 
That's a mistake that I'll never commit again. Last time I tried, this 
has happened:

https://www.toolbox.com/tech/data-management/blogs/why-postgresql-doesnt-have-query-hints-020411/

I still keep it in my bookmark folder, under "Humor". I used that 
article several times on the oracle-l as an illustration some properties 
of Postgres community. That article was a gift and I am sincerely 
grateful. Of course, PostgreSQL now has query hints.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:
On 10/4/21 23:08, Peter Geoghegan wrote:
> n all sincerity: Chill out. I don't think that this is worth getting
> into an argument over. I think that there is a good chance that you'd
> have had a much better experience if the conversation had been in
> person. Text has a way of losing a lot of important nuance.
>
> I have personally met and enjoyed talking to quite a few people that
> personally worked on Oracle, in various capacities -- the world of
> database internals experts is not huge. I find Tanel Poder very
> interesting, too -- never met the man, but we follow each other on
> Twitter. Oracle is a system that has some interesting properties in
> general (not just as a counterpoint to PostgreSQL), and I definitely
> respect it. Same with SQL Server.
>
> There are lots of smart and accomplished people in the world,
> including Jeff. I think that it's worth understanding these
> differences in perspective. There is likely to be merit in all of the
> specific points made by both Laurenze and Jeff. They may not be
> irreconcilable, or anything like it.

What angered me was the presumptuous tone of voice directed to an Oracle 
legend. I have probably talked to many more Oracle people than you, 
including Tanel, whom I have met personally. I am not on Twitter, 
unfortunately I am older than 20. Before you ask, I am not on Instagram, 
Facebook or Tiktok. I am not on OnlyFans either. I have never understood 
the need to share one's every thought in real time. Being rather private 
person has probably stymied my career of an internet influencer. I'll 
never rival Kim Kardashian.

As for Jeff Holt, I believe that a person of his stature needs to be 
taken seriously and not lectured "how are things done in Postgres 
community". I  am rather confused by the thinly veiled hostility toward 
Oracle. In my opinion, Postgres community should be rather welcoming to 
Oracle people like Frits Hoogland, Frank Pachot or Jeff Holt. After all, 
we are using Postgres and telling us "you can't have what you used to 
get from Oracle" is not either encouraging or smart. If you want 
consulting jobs in big banks and for a decent money, you might just take 
Oracle people seriously. Have you ever wondered why Oracle has so many 
customers despite the fact that it's so freakishly expensive?

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Peter Geoghegan
Дата:
On Mon, Oct 4, 2021 at 9:04 PM Mladen Gogala <gogala.mladen@gmail.com> wrote:
> What angered me was the presumptuous tone of voice directed to an Oracle
> legend. I have probably talked to many more Oracle people than you,
> including Tanel, whom I have met personally. I am not on Twitter,
> unfortunately I am older than 20. Before you ask, I am not on Instagram,
> Facebook or Tiktok. I am not on OnlyFans either. I have never understood
> the need to share one's every thought in real time. Being rather private
> person has probably stymied my career of an internet influencer. I'll
> never rival Kim Kardashian.

You do seem shy.

> As for Jeff Holt, I believe that a person of his stature needs to be
> taken seriously and not lectured "how are things done in Postgres
> community".

I haven't met Jeff Holt either, but I believe that he is also older
than 20. I have to imagine that he doesn't particularly need you to
defend his honor.

> I  am rather confused by the thinly veiled hostility toward
> Oracle. In my opinion, Postgres community should be rather welcoming to
> Oracle people like Frits Hoogland, Frank Pachot or Jeff Holt. After all,
> we are using Postgres and telling us "you can't have what you used to
> get from Oracle" is not either encouraging or smart.

I agree with all that. I am also friendly with Frank, as it happens.

I think that Laurenze was just trying to establish common terms of
reference for discussion -- it's easy for two groups of people with
similar but different terminology to talk past each other. For
example, I think that there may be confusion about what is possible
with a tool like eBPF today, and what might be possible in an ideal
world.

-- 
Peter Geoghegan



Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Ranier Vilela
Дата:
Em ter., 5 de out. de 2021 às 01:04, Mladen Gogala <gogala.mladen@gmail.com> escreveu:
As for Jeff Holt, I believe that a person of his stature needs to be
taken seriously and not lectured "how are things done in Postgres
community". I  am rather confused by the thinly veiled hostility toward
Oracle. In my opinion, Postgres community should be rather welcoming to
Oracle people like Frits Hoogland, Frank Pachot or Jeff Holt.
I think that you're a little mistaken, the hostility of the "gurus" is not exactly against Oracle guys,
but rather towards anyone who is not a "committer".
Just follow the pgsql-hackers list, and you'll see that newbies are very unwelcome,
whether they're really newbies like me, or they're really teachers.

regards,
Ranier Vilela

Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:

Comments in-line:

On 10/5/21 04:26, Laurenz Albe wrote:
 have never heard of Jeff Holt, but then there are a lot of wonderful
and smart people I have never heard of.  I tend to be respectful in
my conversation, regardless if I know the other person or not.

That much is apparent. However, that's no excuse.



Haughty lectures about "Oracle has it" not being good enough could 
hardly be more out of place here.
I have no idea how you arrive at the conclusion that I was delivering
a haughty lecture.  Somebody asked if PostgreSQL would consider applying
patches he is ready to write, somebody who seems not to be familiar
with the way PostgreSQL development works, so I tried to give helpful
pointers.

Your tone of voice did. Plus, you took it on yourself to explain "how things are done in the Postgres community".  I always use hints and Josh Berkus as an example "how things are done in the Postgres community" and why is the Postgres progress so slow. You have just provided me another perfect example of the "community spirit".


To put it as politely as is possible in this case, shut your pie hole.
I think you have just disqualified yourself from taking part in this
conversation.  I recommend that you don't embarrass Jeff Holt by trying
to champion him.
If you are under impression that I want to take part in a conversation, then you're sorely mistaken. And I have to adjust my style of writing to make things even more clear. As for Jeff, I don't need to 'champion him'. He did that all by himself. In his place, I would simply ignore both this topic and you, Mr. Postgres Community.
-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Peter Geoghegan
Дата:
On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@method-r.com> wrote:
> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>
> I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their
responsetime.
 

I have heard of method R. Offhand it seems roughly comparable to
something like the Top-down Microarchitecture Analysis Method that low
level systems programmers sometimes use, along with Intel's pmu-tools
-- at least at a very high level. The point seems to be to provide a
workflow that can plausibly zero in on low-level bottlenecks, by
providing high level context. Many tricky real world problems are in
some sense a high level problem that is disguised as a low level
problem. And so all of the pieces need to be present on the board, so
to speak.

Does that sound accurate?

One obvious issue with much of the Postgres instrumentation is that it
makes it hard to see how things change over time. I think that that is
often *way* more informative than static snapshots.

I can see why you'd emphasize the need for PostgreSQL to more or less
own the end to end experience for something like this. It doesn't
necessarily follow that the underlying implementation cannot make use
of infrastructure like eBPF, though. Fast user space probes provably
have no overhead, and can be compiled-in by distros that can support
it. There hasn't been a consistent effort to make that stuff
available, but I doubt that that tells us much about what is possible.
The probes that we have today are somewhat of a grab-bag, that aren't
particularly useful -- so it's a chicken-and-egg thing.

It would probably be helpful if you could describe what you feel is
missing in more general terms -- while perhaps giving specific
practical examples of specific scenarios that give us some sense of
what the strengths of the model are. ISTM that it's not so much a lack
of automation in PostgreSQL. It's more like a lack of a generalized
model, which includes automation, but also some high level top-down
theory.

-- 
Peter Geoghegan



Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:
Comments in-line

On 10/5/21 16:24, Peter Geoghegan wrote:
> On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@method-r.com> wrote:
>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>>
>> I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their
responsetime.
 
> I have heard of method R. Offhand it seems roughly comparable to
> something like the Top-down Microarchitecture Analysis Method that low
> level systems programmers sometimes use, along with Intel's pmu-tools
> -- at least at a very high level. The point seems to be to provide a
> workflow that can plausibly zero in on low-level bottlenecks, by
> providing high level context. Many tricky real world problems are in
> some sense a high level problem that is disguised as a low level
> problem. And so all of the pieces need to be present on the board, so
> to speak.
>
> Does that sound accurate?
Yes, that is pretty accurate. It is essentially the same method 
described in the "High Performance Computing" books. The trick is to 
figure what the process is waiting for and then reduce the wait times. 
All computers wait at the same speed.
> One obvious issue with much of the Postgres instrumentation is that it
> makes it hard to see how things change over time. I think that that is
> often *way* more informative than static snapshots.
>
> I can see why you'd emphasize the need for PostgreSQL to more or less
> own the end to end experience for something like this. It doesn't
> necessarily follow that the underlying implementation cannot make use
> of infrastructure like eBPF, though. Fast user space probes provably
> have no overhead, and can be compiled-in by distros that can support
> it. There hasn't been a consistent effort to make that stuff
> available, but I doubt that that tells us much about what is possible.
> The probes that we have today are somewhat of a grab-bag, that aren't
> particularly useful -- so it's a chicken-and-egg thing.

Not exactly. There already is a very good extension for Postgres called 
pg_wait_sampling:

https://github.com/postgrespro/pg_wait_sampling

What is missing here is mostly the documentation. This extension should 
become a part of Postgres proper and the events should be documented as 
they are (mostly) documented for Oracle. Oracle uses trace files 
instead. However, with Postgres equivalence of files and tables, this is 
not a big difference.


>
> It would probably be helpful if you could describe what you feel is
> missing in more general terms -- while perhaps giving specific
> practical examples of specific scenarios that give us some sense of
> what the strengths of the model are. ISTM that it's not so much a lack
> of automation in PostgreSQL. It's more like a lack of a generalized
> model, which includes automation, but also some high level top-down
> theory.

I am not Jeff and my opinion is not as valuable and doesn't carry the 
same weight, by far. However, I do believe that we may not see Jeff Holt 
again on this group so I am providing my opinion instead. At least I 
would, in Jeff's place, be reluctant to return to this group.


-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Tim
Дата:
Jeff Holt is probably pretty embarrassed there's some blowhard making a scene using his name in a casual mailing list thread.

On Tue, Oct 5, 2021 at 5:28 PM Mladen Gogala <gogala.mladen@gmail.com> wrote:
Comments in-line

On 10/5/21 16:24, Peter Geoghegan wrote:
> On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@method-r.com> wrote:
>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>>
>> I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their response time.
> I have heard of method R. Offhand it seems roughly comparable to
> something like the Top-down Microarchitecture Analysis Method that low
> level systems programmers sometimes use, along with Intel's pmu-tools
> -- at least at a very high level. The point seems to be to provide a
> workflow that can plausibly zero in on low-level bottlenecks, by
> providing high level context. Many tricky real world problems are in
> some sense a high level problem that is disguised as a low level
> problem. And so all of the pieces need to be present on the board, so
> to speak.
>
> Does that sound accurate?
Yes, that is pretty accurate. It is essentially the same method
described in the "High Performance Computing" books. The trick is to
figure what the process is waiting for and then reduce the wait times.
All computers wait at the same speed.
> One obvious issue with much of the Postgres instrumentation is that it
> makes it hard to see how things change over time. I think that that is
> often *way* more informative than static snapshots.
>
> I can see why you'd emphasize the need for PostgreSQL to more or less
> own the end to end experience for something like this. It doesn't
> necessarily follow that the underlying implementation cannot make use
> of infrastructure like eBPF, though. Fast user space probes provably
> have no overhead, and can be compiled-in by distros that can support
> it. There hasn't been a consistent effort to make that stuff
> available, but I doubt that that tells us much about what is possible.
> The probes that we have today are somewhat of a grab-bag, that aren't
> particularly useful -- so it's a chicken-and-egg thing.

Not exactly. There already is a very good extension for Postgres called
pg_wait_sampling:

https://github.com/postgrespro/pg_wait_sampling

What is missing here is mostly the documentation. This extension should
become a part of Postgres proper and the events should be documented as
they are (mostly) documented for Oracle. Oracle uses trace files
instead. However, with Postgres equivalence of files and tables, this is
not a big difference.


>
> It would probably be helpful if you could describe what you feel is
> missing in more general terms -- while perhaps giving specific
> practical examples of specific scenarios that give us some sense of
> what the strengths of the model are. ISTM that it's not so much a lack
> of automation in PostgreSQL. It's more like a lack of a generalized
> model, which includes automation, but also some high level top-down
> theory.

I am not Jeff and my opinion is not as valuable and doesn't carry the
same weight, by far. However, I do believe that we may not see Jeff Holt
again on this group so I am providing my opinion instead. At least I
would, in Jeff's place, be reluctant to return to this group.


--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com



Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:
On 10/5/21 20:02, Tim wrote:

> Jeff Holt is probably pretty embarrassed there's some blowhard making 
> a scene using his name in a casual mailing list thread.

Wow! What a contribution to the discussion! Calling me a blowhard, all 
while top-posting at the same time. Your post will be remembered for 
generations to come.

Or not. Laurenz will probably tell you that we don't top-post in 
Postgres community. He's good with rules, regulations and the way things 
are done in Postgres community.


-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Jeremy Schneider
Дата:
On 10/5/21 13:24, Peter Geoghegan wrote:
> On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@method-r.com> wrote:
>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>>
>> I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their
responsetime.
 
> 
> It would probably be helpful if you could describe what you feel is
> missing in more general terms -- while perhaps giving specific
> practical examples of specific scenarios that give us some sense of
> what the strengths of the model are. ISTM that it's not so much a lack
> of automation in PostgreSQL. It's more like a lack of a generalized
> model, which includes automation, but also some high level top-down
> theory.

Back in my oracle days, I formally used method-R on a few consulting
gigs while working with Hotsos (RIP Gary). Method-R is brilliant, and I
referenced it in my PostgreSQL user group talk about wait events in PG.

https://www.slideshare.net/ardentperf/wait-whats-going-on-inside-my-database-173880246

I'm not the author of Method-R, but I myself would describe it as a
methodical approach to consistently solve business problems rooted in
database performance faster than any other methodical approach, built on
a foundation of wait events, queuing theory and tracing (aka logging).
But the most brilliant part is how Cary Millsap's tireless efforts to
simplify, automate and educate have made it accessible to ordinary data
analysts and project managers all over the world who speak SQL but not C.

PostgreSQL added wait events starting in 9.6 and the last thing that's
missing is an integrated way to trace or log them. A simple starting
point could be a session-level GUC that enables a hook in
pgstat_report_wait_start() and pgstat_report_wait_end() to just drop
messages in the log. These log messages could then easily be processed
to generate the similar profiles to the ones we used with other
databases. Basically I agree 100% with Jeff that while you can do these
things with perf probes or eBPF, there are massive advantages to having
it baked in the database. With the right tools, this makes session
profiling available to regular users (who do their day jobs with excel
rather than eBPF).

However, one problem to watch out for will be whether the existing
PostgreSQL logging infrastructure can handle this. Probably need higher
precision timestamps (I need to check what csvlog has), and it could
still be a lot of volume with some lightweight locks. Whereas Oracle had
each individual process write the wait event trace messages to its own
file, today PostgreSQL only supports either the single-system-wide-file
logging collector, or syslog which I think can only split to 8
destinations (and may be lossy).

There's another use case where high logging bandwidth could also be
useful - temporarily logging all SQL statements to capture workload.
Next time I see someone take down their production database because the
pgBadger doc said "log_min_duration_statement = 0" ... WHY PGBADGER WHY?

Anyway I do hope there will be some improvements in this area with
PostgreSQL. I'm not much of a C coder but maybe I'll take a swing at it
some day!

Anyway, Jeff, nice to see you here - and this is a topic I've thought
about a lot too. PostgreSQL is a pretty cool bit of software, and an
even cooler group of people around it. Hope to see you around some more.  :)

-Jeremy


PS. "tracing versus sampling" was the perpetual debate amongst
performance engineers... we could have some good fun debating along
those lines too. hold my beer


-- 
http://about.me/jeremy_schneider



Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:
On 10/7/21 22:15, Jeremy Schneider wrote:
> On 10/5/21 13:24, Peter Geoghegan wrote:
>> On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@method-r.com> wrote:
>>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>>>
>>> I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their
responsetime.
 
>> It would probably be helpful if you could describe what you feel is
>> missing in more general terms -- while perhaps giving specific
>> practical examples of specific scenarios that give us some sense of
>> what the strengths of the model are. ISTM that it's not so much a lack
>> of automation in PostgreSQL. It's more like a lack of a generalized
>> model, which includes automation, but also some high level top-down
>> theory.
> Back in my oracle days, I formally used method-R on a few consulting
> gigs while working with Hotsos (RIP Gary). Method-R is brilliant, and I
> referenced it in my PostgreSQL user group talk about wait events in PG.
>
> https://www.slideshare.net/ardentperf/wait-whats-going-on-inside-my-database-173880246
>
> I'm not the author of Method-R, but I myself would describe it as a
> methodical approach to consistently solve business problems rooted in
> database performance faster than any other methodical approach, built on
> a foundation of wait events, queuing theory and tracing (aka logging).
> But the most brilliant part is how Cary Millsap's tireless efforts to
> simplify, automate and educate have made it accessible to ordinary data
> analysts and project managers all over the world who speak SQL but not C.
>
> PostgreSQL added wait events starting in 9.6 and the last thing that's
> missing is an integrated way to trace or log them. A simple starting
> point could be a session-level GUC that enables a hook in
> pgstat_report_wait_start() and pgstat_report_wait_end() to just drop
> messages in the log. These log messages could then easily be processed
> to generate the similar profiles to the ones we used with other
> databases. Basically I agree 100% with Jeff that while you can do these
> things with perf probes or eBPF, there are massive advantages to having
> it baked in the database. With the right tools, this makes session
> profiling available to regular users (who do their day jobs with excel
> rather than eBPF).
>
> However, one problem to watch out for will be whether the existing
> PostgreSQL logging infrastructure can handle this. Probably need higher
> precision timestamps (I need to check what csvlog has), and it could
> still be a lot of volume with some lightweight locks. Whereas Oracle had
> each individual process write the wait event trace messages to its own
> file, today PostgreSQL only supports either the single-system-wide-file
> logging collector, or syslog which I think can only split to 8
> destinations (and may be lossy).
>
> There's another use case where high logging bandwidth could also be
> useful - temporarily logging all SQL statements to capture workload.
> Next time I see someone take down their production database because the
> pgBadger doc said "log_min_duration_statement = 0" ... WHY PGBADGER WHY?
>
> Anyway I do hope there will be some improvements in this area with
> PostgreSQL. I'm not much of a C coder but maybe I'll take a swing at it
> some day!
>
> Anyway, Jeff, nice to see you here - and this is a topic I've thought
> about a lot too. PostgreSQL is a pretty cool bit of software, and an
> even cooler group of people around it. Hope to see you around some more.  :)
>
> -Jeremy
>
>
> PS. "tracing versus sampling" was the perpetual debate amongst
> performance engineers... we could have some good fun debating along
> those lines too. hold my beer
>
>
Hi Jeremy,

There is an extension which does wait event sampling:

https://github.com/postgrespro/pg_wait_sampling

It's one of the Postgres Pro extensions, I like it a lot. Postgres Pro 
is getting very popular on the Azure cloud. It's essentially Microsoft 
response to Aurora. Also EnterpriseDB has the event interface and the 
views analogous to Oracle: edb$session_wait_history, edb$session_waits 
and edb$system_waits views are implementing the event interface in Edb. 
You can look them up in the documentation, the documentation is 
available on the web. The foundation is already laid, what is needed are 
the finishing touches, like the detailed event documentation. I am 
currently engaged in a pilot porting project, porting an application 
from Oracle to Postgres.  I was looking into the event interface in 
detail. And we are testing the EDB as well.  As an Oraclite to Oraclite, 
I have to commend EDB, it's an excellent piece of software, 75% cheaper 
than Oracle.

I agree with you about the logging capacity. Postgres is very loquacious 
when it comes to logging. I love that feature because pgBadger reports 
are even better than the AWR reports. Oracle is very loquacious and 
verbose too. $ORACLE_BASE/diag/rdbms/.../trace is chock full of trace 
files plus the alert log, of course. That is why the adrci utility has 
parameters for the automatic cleanup of the traceand core dump files. 
Sometimes they did fill the file system.

As for the "tracing vs. sampling" debate, Oracle has both. 
V$ACTIVE_SESSION_HISTORY is a sampling view. Sampling views are more 
practical, especially when there are pooled connections. Personally, I 
would prefer sampling.



-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Jeremy Schneider
Дата:
On Oct 7, 2021, at 19:38, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Oct  7, 2021 at 07:15:39PM -0700, Jeremy Schneider wrote:
>> PostgreSQL added wait events starting in 9.6 and the last thing that's
>> missing is an integrated way to trace or log them. A simple starting
>> point could be a session-level GUC that enables a hook in
>> pgstat_report_wait_start() and pgstat_report_wait_end() to just drop
>> messages in the log. These log messages could then easily be processed
>> to generate the similar profiles to the ones we used with other
>> databases. Basically I agree 100% with Jeff that while you can do these
>> things with perf probes or eBPF, there are massive advantages to having
>> it baked in the database. With the right tools, this makes session
>> profiling available to regular users (who do their day jobs with excel
>> rather than eBPF).
>
> Our wait events reported in pg_stat_activity are really only a first
> step --- I always felt it needed an external tool to efficiently collect
> and report those wait events.  I don't think the server log is the right
> place to collect them.

What would you think about adding hooks to the functions I mentioned, if someone wrote an open source extension that
coulddo things with the wait event start/stop times in a preload library? 

But we could use parameters too, that’s another gap. For example - which buffer, object, etc for buffer_content? Which
filenodeand block for an IO? Which relation OID for a SQL lock? Then you can find which table, whether the hot block is
aroot or leaf of a btree, etc. This can be done by extending the wait infra to accept two or three arbitrary
“informational”parameters, maybe just numeric for efficiency, or maybe string, and each individual wait event can
decidewhat to do with them. We’d want to pass that info out over the hooks too. This is another reason to support wait
eventtracing in the DB - sometimes it might be difficult to get all the relevant context with a kernel probe on an
externaltool. 

-Jeremy

Sent from my TI-83




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Bruce Momjian
Дата:
On Thu, Oct  7, 2021 at 11:35:16PM -0400, Mladen Gogala wrote:
> 
> On 10/7/21 22:15, Jeremy Schneider wrote:
> There is an extension which does wait event sampling:
> 
> https://github.com/postgrespro/pg_wait_sampling
> 
> It's one of the Postgres Pro extensions, I like it a lot. Postgres Pro is
> getting very popular on the Azure cloud. It's essentially Microsoft response
> to Aurora. Also EnterpriseDB has the event interface and the views analogous
> to Oracle: edb$session_wait_history, edb$session_waits and edb$system_waits
> views are implementing the event interface in Edb. You can look them up in
> the documentation, the documentation is available on the web. The foundation
> is already laid, what is needed are the finishing touches, like the detailed
> event documentation. I am currently engaged in a pilot porting project,

Ah, this is exactly what I wanted to know --- what people are using the
event waits for.  Can you tell if these are done all externally, or if
they need internal database changes?

> I agree with you about the logging capacity. Postgres is very loquacious
> when it comes to logging. I love that feature because pgBadger reports are
> even better than the AWR reports. Oracle is very loquacious and verbose too.

Nice, I had not heard that before.

> As for the "tracing vs. sampling" debate, Oracle has both.
> V$ACTIVE_SESSION_HISTORY is a sampling view. Sampling views are more
> practical, especially when there are pooled connections. Personally, I would
> prefer sampling.

Yes, slide 101 here:

    https://momjian.us/main/writings/pgsql/administration.pdf#page=101

shows the Postgres monitoring options for reporting and
alterting/aggegation.  Yes, both are needed for wait event, and right
now we really don't have either for wait events --- just the raw
information.

However, I also need to ask how the wait event information, whether
tracing or sampling, can be useful for Postgres because that will drive
the solution.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Thomas Kellerer
Дата:
Bruce Momjian schrieb am 08.10.2021 um 17:21:
> However, I also need to ask how the wait event information, whether
> tracing or sampling, can be useful for Postgres because that will drive
> the solution.

I guess everyone will use that information in a different way.

We typically use the AWR reports as a post-mortem analysis tool if
something goes wrong in our application (=customer specific projects)

E.g. if there was a slowdown "last monday" or "saving something took minutes yesterday morning",
then we usually request an AWR report from the time span in question. Quite frequently
this already reveals the culprit. If not, we ask them to poke in more detail into v$session_history.

So in our case it's not really used for active monitoring, but for
finding the root cause after the fact.

I don't know how representative this usage is though.

Thomas




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Bruce Momjian
Дата:
On Fri, Oct  8, 2021 at 05:28:37PM +0200, Thomas Kellerer wrote:
> Bruce Momjian schrieb am 08.10.2021 um 17:21:
> > However, I also need to ask how the wait event information, whether
> > tracing or sampling, can be useful for Postgres because that will drive
> > the solution.
> 
> I guess everyone will use that information in a different way.
> 
> We typically use the AWR reports as a post-mortem analysis tool if
> something goes wrong in our application (=customer specific projects)
> 
> E.g. if there was a slowdown "last monday" or "saving something took minutes yesterday morning",
> then we usually request an AWR report from the time span in question. Quite frequently
> this already reveals the culprit. If not, we ask them to poke in more detail into v$session_history.
> 
> So in our case it's not really used for active monitoring, but for
> finding the root cause after the fact.
> 
> I don't know how representative this usage is though.

OK, that's a good usecase, and something that certainly would apply to
Postgres.  Don't you often need more than just wait events to find the
cause, like system memory usage, total I/O, etc?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Julien Rouhaud
Дата:
On Fri, Oct 8, 2021 at 11:40 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Fri, Oct  8, 2021 at 05:28:37PM +0200, Thomas Kellerer wrote:
> >
> > We typically use the AWR reports as a post-mortem analysis tool if
> > something goes wrong in our application (=customer specific projects)
> >
> > E.g. if there was a slowdown "last monday" or "saving something took minutes yesterday morning",
> > then we usually request an AWR report from the time span in question. Quite frequently
> > this already reveals the culprit. If not, we ask them to poke in more detail into v$session_history.
> >
> > So in our case it's not really used for active monitoring, but for
> > finding the root cause after the fact.
> >
> > I don't know how representative this usage is though.
>
> OK, that's a good usecase, and something that certainly would apply to
> Postgres.  Don't you often need more than just wait events to find the
> cause, like system memory usage, total I/O, etc?

You usually need a variety of metrics to be able to find what is
actually causing $random_incident, so the more you can aggregate in
your performance tool the better.  Wait events are an important piece
of that puzzle.

As a quick example for wait events, I recently had to diagnose some
performance issue, which turned out to be some process reaching the 64
subtransactions with the well known consequences.  I had
pg_wait_sampling aggregated metrics available so it was really easy to
know that the slowdown was due to that.  Knowing what application
exactly reached those 64 subtransactions is another story.



Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Thomas Kellerer
Дата:
Bruce Momjian schrieb am 08.10.2021 um 17:40:
>> I guess everyone will use that information in a different way.
>>
>> We typically use the AWR reports as a post-mortem analysis tool if
>> something goes wrong in our application (=customer specific projects)
>>
>> E.g. if there was a slowdown "last monday" or "saving something took minutes yesterday morning",
>> then we usually request an AWR report from the time span in question. Quite frequently
>> this already reveals the culprit. If not, we ask them to poke in more detail into v$session_history.
>>
>> So in our case it's not really used for active monitoring, but for
>> finding the root cause after the fact.
>>
>> I don't know how representative this usage is though.
>
> OK, that's a good usecase, and something that certainly would apply to
> Postgres.  Don't you often need more than just wait events to find the
> cause, like system memory usage, total I/O, etc?

Yes, the AWR report contains that information as well. e.g. sorts that spilled
to disk, shared memory at the start and end, top 10 statements sorted by
total time, individual time, I/O, number of executions, segments (tables)
that received the highest I/O (read and write) and so on.
It's really huge.





Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Mladen Gogala
Дата:
On 10/8/21 11:21, Bruce Momjian wrote:
> Ah, this is exactly what I wanted to know --- what people are using the
> event waits for.  Can you tell if these are done all externally, or if
> they need internal database changes?
Well, the methodology goes like this: we get the slow queries from 
pgBadger report and then run explain (analyze, timing, buffers) on the 
query. If we still cannot figure out how to improve things, we check the 
events and see what the query is waiting for. After that we may add an 
index, partition the table, change index structure or do something like 
that. Unrelated to this discussion, I discovered Bloom extension. Bloom 
indexes are phenomenally useful. I apologize for the digression.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Franck Pachot
Дата:


On Sun, Oct 10, 2021 at 11:06 PM Jeff Holt <jeff.holt@method-r.com> wrote:

TLDR; If I spend the time necessary to instrument the many functions that are the equivalent of the Oracle counterparts, would anyone pull those changes and use them? Specifically, for those who know Oracle, I'm talking about implementing:

  1. The portion of the ALTER SESSION that enables extended SQL trace
  2. Most of the DBMS_MONITOR and DBMS_APPLICATION_INFO packages
  3. Instrument the thousand or so functions that are the equivalent of those found in Oracle's V$EVENT_NAME
  4. Dynamic performance view V$DIAG_INFO

For the last 35 years, I've made my living helping people solve Oracle performance problems by looking at it, which means:

Trace a user experience and profile the trace file to (a) reveal where the time has gone and its algorithm and (b) make it easy to imagine the cost of possible solutions as well as the savings in response time or resources.

I've even submitted change requests to improve Oracle's tracing features while working for them and since those glorious five years.

Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.

I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their response time. The biggest roadblock is that without a lot of automation, a user of any kind must log into the server and attempt to get the data that are now traditionally child's play for Oracle. The second biggest roadblock I see is the recompilation that is required for the server components (i.e., postgreSQL, operating system). My initial attempts to get anything useful out of postgreSQL were dismal failures and I think it should be infinitely easier.

Running either dtrace and eBPF scripts on the server should not be required. The instrumentation and the code being instrumented should be tightly coupled. Doing so will allow anyone on any platform for any PostgreSQL version to get a trace file just as easily as people do for Oracle.


I hope this kind of instrumentation will make its way to PostgreSQL one day. Knowing where the time is spent changes the performance troubleshooting approach from guess-and-try to a scientific method. This is what made Linux a valid OS for enterprises, when instrumentation reached the same level as we got on Unix. There's a demand for it in enterprises: for example, EDB Advanced Server implemented timed wait events. I'm sure having it in open source postgres will help to understand the performance issues encountered by users, then helping to improve the database. Profiling where the database time is spent should not be reserved to commercial databases. Having the source code visible is not sufficient to understand what happens in production. Observability should also be there.

There is a fear in the postgres community that features are implemented just because they exist in oracle, and mentioning oracle is often seen suspicious. Probably because of the risk of adding complexity for no user value. Here, about instrumentation, I think that looking at what Oracle did during 20 years is a good start. Because instrumentation is not an easy task. Some waits are too short to have meaningful timing (the timing itself may take more cpu cycles than the instrumentation itself). Some tasks are critical to be measured. Looking at what Oracle Support implemented in order to solve big customer problems can give a good basis. Of course, all this must be adapted for postgres. For example, a write system call may be a logical or physical write because there's no direct I/O. At least, a precise timing, aggregated to histograms, will help to distinguish which writes were filesystem hits, or storage cache hits, or went to disk. And on the most common platform, the overhead is minimal because getting the timestamp can be done in userspace.

Today, Linux has many tools that were not there when Oracle had to implement wait events. And people may think the Linux tools are sufficient today. However, getting system call time is not easy in production (strace must attach to the process) and other tools (perf) are only sampling: gives an idea but hides the details. Unfortunately, what we have from the OS gives interesting clues (for guess and try) but not enough facts (for scientific approach).

So the proposal is great, but there is also the risk of putting a large effort in describing the specification and maybe a patch, and that it is rejected. It should probably be discussed in the -hackers list (https://www.postgresql.org/list/pgsql-hackers/) first. And people will dislike it because it mentions Oracle. Or people will dislike it because they think this should be reserved to commercial forks. Or because it may introduce too much dependency on the OS. But some others will see the value of it. Discussions are good as long as they stay focused on the value of the community project. I don't have skills to contribute to the code, but will be happy to expose the need for this instrumentation (profiling time spent in database functions or system calls) as I have many examples for it.


Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

От
Laurenz Albe
Дата:
On Mon, 2021-10-11 at 00:09 +0200, Franck Pachot wrote:
> And people will dislike it because it mentions Oracle.

I don't think so.
While "Oracle has it" is not a good enough reason for a feature, it
is certainly no counter-indication.

> Or people will dislike it because they think this should be reserved to commercial forks.

That is conceivable, but I think most vendors would prefer to have
that in standard PostgreSQL rather than having to maintain it on
their own.

> Or because it may introduce too much dependency on the OS.

That is possible.  But I think gettimeofday(2) is portable enough.

Yours,
Laurenz Albe