Discussion: RE: [HACKERS] Hi

RE: [HACKERS] Hi

From: "Meskes, Michael"
Date:
Okay, here's my idea of how I will get into it:

1) Try to get my apps going. That is, I'll try to port my ORACLE
applications to PostgreSQL so I have some programs to test with.
Therefore I need to:
2) Add Linus' ecpg precompiler for embedded SQL to the distribution.
3) Try to implement some speed-ups I encounter while playing with my
apps.

I take it some of you have already run the backend against a profiler.
Could anyone please tell me which functions need work the most?

Michael

--
Dr. Michael Meskes, Project-Manager    | topsystem Systemhaus GmbH
meskes@topsystem.de                    | Europark A2, Adenauerstr. 20
meskes@debian.org                      | 52146 Wuerselen
Go SF49ers! Go Rhein Fire!             | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux!                  | Fax: (+49) 2405/4670-10

> -----Original Message-----
> From:    Bruce Momjian [SMTP:maillist@candle.pha.pa.us]
> Sent:    Wednesday, February 04, 1998 6:10 PM
> To:    meskes@topsystem.de
> Cc:    pgsql-hackers@postgreSQL.org
> Subject:    Re: [HACKERS] Hi
>
> >
> > Hi,
> >
> > despite being chronically short of time I'd like to be involved with
> some
> > database development in free software. It seems to me that
> postgresql is
> > the most complete package out there and it also has some nice
> features
> > like object identity. I consider myself fairly knowledgeable when it
> > comes to databases (at least it brought me my Ph.D.) so I think I
> could
> > do some. But before I do so I'd like to ask some questions about the
> > status of development:
> >
>
> > 3) How would you describe the general goal of this project: Offer as
> > good a database system as you can create as free software? That is
> > should it be able to handle huge amounts of data, or is it seen more
> as
> > a tool for a private person/small business?
>
> About a year ago, our goal was to make it more reliable.  At this
> point,
> our goal is SQL compliance, speed, and features.  See the TODO list in
> the distribution or on our web page for an idea of our open items.
>
> > 4) Are there areas which need help?
>
> What we really need are more people who can get a handle on the entire
> source tree, so we can implement some of the more complex SQL
> features,
> or improve some of the older code.
>
> --
> Bruce Momjian
> maillist@candle.pha.pa.us

Re: [HACKERS] Hi

From: Bruce Momjian
Date:
>
> Okay, here's my idea of how I will get into it:
>
> 1) Try to get my apps going. That is I'll try to port my ORACLE
> applications to postgresql to have some programs to do testing.
> Therefore I need:
> 2) Add Linus' ecpg precompiler for embedded SQL to the dist.
> 3) Try to implement some speed-ups I encounter while playing with my
> apps.
>
> I take it some of you have already run the backend against a profiler.
> Could anyone please tell me which functions need work the most?
>

I have already addressed many of the speedups people have reported from the
profiler, either by changing the logic or by inlining part or all of the
function.  I recommend running your own profile and seeing what it shows.

--
Bruce Momjian
maillist@candle.pha.pa.us

Profile of current backend

From: Mattias Kregert
Date:
Meskes, Michael wrote:
>
> I take it some of you have already run the backend against a profiler.
> Could anyone please tell me which functions need work the most?

SYSTEM: Linux 2.0.29, gcc 2.7.2
-------------------------------
[make distclean]
[CVSup'ed current source 1998-02-05]
[./configure]
[make]
  [CUSTOM_LDFLAGS not settable in Makefile.custom ...]
  [... setting LDFLAGS=-pg in Makefile.global]

  --> fmgr.c: In function `fmgr_pl':
  --> fmgr.c:46: `fmgr_pl_finfo' undeclared (first use this function)
  [fix] cp backend/utils/fmgr.h backend/utils/fmgr/fmgr.h

[make install]
[Run initdb]
[Regression tests]
  strings .. failed [ok]
  lseg .. failed [new operator <= on lseg]
  horology .. failed [daylight savings error?]
  triggers .. failed
  select_views .. failed


PROFILE OF REGRESSION TEST:
---------------------------
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 35.16      7.97     7.97                             mcount (profiler overhead)
  5.91      9.31     1.34     9924     0.14     0.20  heapgettup
  4.54     10.34     1.03   234597     0.00     0.01  hash_search
  2.12     10.82     0.48   151781     0.00     0.00  SpinAcquire
  2.03     11.28     0.46    46635     0.01     0.02  SearchSysCache
  1.63     11.65     0.37   171345     0.00     0.00  tag_hash
  1.46     11.98     0.33    20511     0.02     0.02  yylex
  1.41     12.30     0.32    31311     0.01     0.03  LockAcquire
  1.28     12.59     0.29     1522     0.19     0.61  yyparse
  1.24     12.87     0.28    41176     0.01     0.01  fmgr_isbuiltin
  1.01     13.10     0.23   201572     0.00     0.00  AllocSetAlloc
  0.84     13.29     0.19   186214     0.00     0.00  OrderedElemPop
  0.75     13.46     0.17    42896     0.00     0.00  nocachegetattr
  0.71     13.62     0.16    29869     0.01     0.04  LockRelease
  0.66     13.77     0.15   151781     0.00     0.00  SpinRelease
  0.66     13.92     0.15    88766     0.00     0.01  newNode
  0.66     14.07     0.15    52248     0.00     0.00  _bt_compare
... around 1850 more functions follow (0.66% - 0.0%)

heapgettup (called by, calling):
--------------------------------
index % time    self  children    called     name
                0.01    0.01      96/9924        heap_markpos [448]
                0.21    0.11    1566/9924        heap_restrpos [102]
                1.12    0.58    8262/9924        heap_getnext [20]
[14]    13.8    1.34    0.69    9924         heapgettup [14]
                0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
                0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
                0.10    0.00   26276/42896       nocachegetattr [158]
                0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
                0.04    0.00  117785/126582      char16eq [339]
                0.02    0.00  111941/111994      int4eq [427]
                0.02    0.00  109647/112329      nameeq [426]
                0.00    0.01    1770/31585       ReleaseBuffer [134]
                0.01    0.00   14000/50173       PageGetMaxOffsetNumber [341]
                0.01    0.00    5185/5433        chareq [537]
                0.01    0.00   15566/90147       BufferGetBlock [311]
                0.01    0.00   17336/240911      BufferIsValid [207]
                0.00    0.00       4/1973        int2eq [575]
                0.00    0.00    7412/7412        nextpage [1186]
                0.00    0.00     951/1358        SetBufferCommitInfoNeedsSave [1253]
--------------------------------

The *complete* gprof output is here:
  ftp://postgresql.org/pub/incoming/regression-profile-980205.gz 156k

/* m */

Re: [HACKERS] Profile of current backend

From: Bruce Momjian
Date:
Interesting.  Nothing is jumping out at me.  Looks like we could try to
clean up heapgettup() to see if there is anything in there that can be
sped up.

None of the calls looks like it should be inlined.  Do you see any that
look good for inlining?


>
> Meskes, Michael wrote:
> >
> > I take it some of you have already run the backend against a profiler.
> > Could anyone please tell me which functions need works most?
>
> SYSTEM: Linux 2.0.29, gcc 2.7.2
> -------------------------------
> [make distclean]
> [CVSup'ed current source 1998-02-05]
> [./configure]
> [make]
>   [CUSTOM_LDFLAGS not settable in Makefile.custom ...]
>   [... setting LDFLAGS=-pg ins Makefile.global]
>
>   --> fmgr.c: In function `fmgr_pl':
>   --> fmgr.c:46: `fmgr_pl_finfo' undeclared (first use this function)
>   [fix] cp backend/utils/fmgr.h backend/utils/fmgr/fmgr.h
>
> [make install]
> [Run initdb]
> [Regression tests]
>   strings .. failed [ok]
>   lseg .. failed [new operator <= on lseg]
>   horology .. failed [daylight savings error?]
>   triggers .. failed
>   select_views .. failed
>
>
> PROFILE OF REGRESSION TEST:
> ---------------------------
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  35.16      7.97     7.97                             mcount (profiler overhead)
>   5.91      9.31     1.34     9924     0.14     0.20  heapgettup
>   4.54     10.34     1.03   234597     0.00     0.01  hash_search
>   2.12     10.82     0.48   151781     0.00     0.00  SpinAcquire
>   2.03     11.28     0.46    46635     0.01     0.02  SearchSysCache
>   1.63     11.65     0.37   171345     0.00     0.00  tag_hash
>   1.46     11.98     0.33    20511     0.02     0.02  yylex
>   1.41     12.30     0.32    31311     0.01     0.03  LockAcquire
>   1.28     12.59     0.29     1522     0.19     0.61  yyparse
>   1.24     12.87     0.28    41176     0.01     0.01  fmgr_isbuiltin
>   1.01     13.10     0.23   201572     0.00     0.00  AllocSetAlloc
>   0.84     13.29     0.19   186214     0.00     0.00  OrderedElemPop
>   0.75     13.46     0.17    42896     0.00     0.00  nocachegetattr
>   0.71     13.62     0.16    29869     0.01     0.04  LockRelease
>   0.66     13.77     0.15   151781     0.00     0.00  SpinRelease
>   0.66     13.92     0.15    88766     0.00     0.01  newNode
>   0.66     14.07     0.15    52248     0.00     0.00  _bt_compare
> ... around 1850 functions follows (0.66 - 0.0%)
>
> heapgettup (called by, calling):
> --------------------------------
> index % time    self  children    called     name
>                 0.01    0.01      96/9924        heap_markpos [448]
>                 0.21    0.11    1566/9924        heap_restrpos [102]
>                 1.12    0.58    8262/9924        heap_getnext [20]
> [14]    13.8    1.34    0.69    9924         heapgettup [14]
>                 0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
>                 0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
>                 0.10    0.00   26276/42896       nocachegetattr [158]
>                 0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
>                 0.04    0.00  117785/126582      char16eq [339]
>                 0.02    0.00  111941/111994      int4eq [427]
>                 0.02    0.00  109647/112329      nameeq [426]
>                 0.00    0.01    1770/31585       ReleaseBuffer [134]
>                 0.01    0.00   14000/50173       PageGetMaxOffsetNumber [341]
>                 0.01    0.00    5185/5433        chareq [537]
>                 0.01    0.00   15566/90147       BufferGetBlock [311]
>                 0.01    0.00   17336/240911      BufferIsValid [207]
>                 0.00    0.00       4/1973        int2eq [575]
>                 0.00    0.00    7412/7412        nextpage [1186]
>                 0.00    0.00     951/1358        SetBufferCommitInfoNeedsSave [1253]
> --------------------------------
>
> The *complete* gprof output is here:
>   ftp://postgresql.org/pub/incoming/regression-profile-980205.gz 156k
>
> /* m */
>
>


--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] Profile of current backend

From: Michael Meskes
Date:
Bruce Momjian writes:
> Interesting.  Nothing is jumping out at me.  Looks like we could try to
> clean up heapgettup() to see if there is anything in there that can be
> speeded up.
>
> None of the calls looks like it should be inlined.  Do you see any that
> look good for inlining?

Not exactly. But to get my application running we have to do something to
speed it up. This morning I started my program on a database with 15165
records in one table, something like 100 in a second one, plus 2 in a third.
Then my software tries to add records to the first table, and for each record
added it updates a record in each of the other two. This doesn't count the
selects etc.

Anyway, the data isn't processed fast enough. I tried adding about 600
records, which should be done in 5 minutes (the time in which the data was
sent), but our system needed almost 8 minutes to insert the data. And this
doesn't cause Oracle much trouble.

So I guess there's a need for some speed-up. :-)

Michael

--
Dr. Michael Meskes, Project-Manager    | topsystem Systemhaus GmbH
meskes@topsystem.de                    | Europark A2, Adenauerstr. 20
meskes@debian.org                      | 52146 Wuerselen
Go SF49ers! Go Rhein Fire!             | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux!                  | Fax: (+49) 2405/4670-10

Re: [HACKERS] Profile of current backend

From: Mattias Kregert
Date:
Bruce Momjian wrote:
>
> Interesting.  Nothing is jumping out at me.  Looks like we could try to
> clean up heapgettup() to see if there is anything in there that can be
> speeded up.
>
> None of the calls looks like it should be inlined.  Do you see any that
> look good for inlining?

ExecScan() seems to be the only func which calls SeqNext(), which in
turn accounts for 60% of the calls to heap_getnext(), which does 80% of
the calls to heapgettup().

- Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]

- In heapgettup(), 50% is the func itself and 50% is called funcs.
  Top four CPU consumers:
    0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
    0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
    0.10    0.00   26276/42896       nocachegetattr [158]
    0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]

  RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.

- Looking at RelationGetBufferWithBuffer():
    0.00    0.10    4603/32354       ReadBuffer [55]
  ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)

  -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()

  -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
  -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
  -> Conclusion: INLINE BufferAlloc().

- Looking at BufferAlloc():
    0.04    0.25   37974/37974       BufTableLookup [114]
    0.10    0.00   32340/151781      SpinAcquire [81]
    0.10    0.00   37470/40585       PinBuffer [209]
    0.08    0.00   38478/43799       RelationGetLRelId [234]
    0.04    0.00   37974/151781      SpinRelease [175]

  -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
  -> BufferAlloc() is the only func calling BufTableLookup().
  -> Conclusion: INLINE BufTableLookup().

- Looking at BufTableLookup():
  86% of CPU time is in calling hash_search(). The rest is own time.

- Looking at hash_search():
    0.13    0.41  179189/179189      call_hash [69]
    0.00    0.00       6/6           bucket_alloc [1084]
  -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().

- Looking at call_hash():
    0.37    0.00  171345/171345      tag_hash [94]
    0.04    0.00    7844/7844        string_hash [348]
  -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
  -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
  -> Could we use a lookup table instead of doing hash calculations? Would not that
  ->  be much faster?


It looks to me as if there are too many levels of function calls.
Perhaps all functions which are called by only one other func should be inlined?
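
As an illustration of that idea, here is a minimal sketch with made-up
names and signatures (not the real call_hash()/tag_hash()): the helper
is static and marked inline, so a compiler that honors the hint can fold
it into its single call site.  Whether gcc 2.7.2 actually does so is an
assumption.

/* Sketch only: simplified stand-ins for the call_hash -> tag_hash
 * chain.  Names and signatures are made up, not the real PostgreSQL
 * ones. */
#include <stddef.h>

/* single-caller helper; static + inline lets the compiler fold it
 * into its one call site */
static inline unsigned int
my_tag_hash(const void *key, size_t keysize)
{
    const unsigned char *k = (const unsigned char *) key;
    unsigned int h = 0;
    size_t i;

    for (i = 0; i < keysize; i++)
        h = h * 31 + k[i];
    return h;
}

/* the only caller: after inlining there is no extra stack frame
 * per hash lookup */
static unsigned int
my_call_hash(const void *key, size_t keysize, unsigned int nbuckets)
{
    return my_tag_hash(key, keysize) % nbuckets;
}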


Guesstimate:
  This would speed up heapgettup() by 10% ???
  Other functions would speed up too.


/* m */

Re: [HACKERS] Profile of current backend

From: "Thomas G. Lockhart"
Date:
> Not exactly. But to get my application running we have to do something to
> speed it up. This morning I started my program on a database with 15165
> records in one table and something like 100 in a second plus 2 in a third.
> Then my software tries to add records to the first table and for each record
> added it updates a record in the other two. This doesn't count the select
> etc.
>
> Anyway, the data isn't processed fast enough. I tried adding about 600
> records which should be done (the time in which the data was send) in 5
> minutes, but our system needed almost 8 minutes to insert the data. And this
> doesn't cause Oracle too much trouble.
>
> So I guess there's need for some speed-up. :-)

I (and others) had done some benchmarking on simple inserts (6 months ago?) and
had concluded that the speed was similar to other commercial systems (I was
comparing against Ingres). I recall getting ~50 TPS.

This was all before Bruce did his work on startup and runtime speeds. You
really think your performance is that far off? You are doing selects on the big
table before inserting? Do you have indices set up?? Our results were for
inserts on a heap table, which has the least overhead...

                                                              - Tom


Re: [HACKERS] Profile of current backend

From: The Hermit Hacker
Date:
On Fri, 6 Feb 1998, Thomas G. Lockhart wrote:

> > Not exactly. But to get my application running we have to do something to
> > speed it up. This morning I started my program on a database with 15165
> > records in one table and something like 100 in a second plus 2 in a third.
> > Then my software tries to add records to the first table and for each record
> > added it updates a record in the other two. This doesn't count the select
> > etc.
> >
> > Anyway, the data isn't processed fast enough. I tried adding about 600
> > records which should be done (the time in which the data was send) in 5
> > minutes, but our system needed almost 8 minutes to insert the data. And this
> > doesn't cause Oracle too much trouble.
> >
> > So I guess there's need for some speed-up. :-)
>
> I (and others) had done some benchmarking on simple inserts (6 months ago?) and
> had concluded that the speed was similar to other commercial systems (I was
> comparing against Ingres). I recall getting ~50TPS.
>
> This was all before Bruce did his work on startup and runtime speeds. You
> really think your performance is that far off? You are doing selects on the big
> table before inserting? Do you have indices set up?? Our results were for
> inserts on a heap table, which has the least overhead...

    Just curious, but do you have -F set to disable fsync()?  We
really really should disable that by default :(
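
For a rough feel of why -F matters so much, here is a small stand-alone
sketch (not PostgreSQL source; file names are arbitrary) that times a
loop of small writes with an fsync() after each one, roughly the cost
pattern of a per-row commit without -F, against the same loop with no
per-write fsync():

/* Stand-alone sketch, illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    char    buf[64];
    time_t  t0;
    int     i, fd;

    /* fsync after every write: one disk wait per "commit" (no -F) */
    fd = open("sync_each.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    t0 = time(NULL);
    for (i = 0; i < 1000; i++)
    {
        snprintf(buf, sizeof(buf), "record %d\n", i);
        write(fd, buf, strlen(buf));
        fsync(fd);
    }
    printf("fsync per write: %ld s\n", (long) (time(NULL) - t0));
    close(fd);

    /* no per-write fsync: roughly what -F buys */
    fd = open("sync_skip.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    t0 = time(NULL);
    for (i = 0; i < 1000; i++)
    {
        snprintf(buf, sizeof(buf), "record %d\n", i);
        write(fd, buf, strlen(buf));
    }
    printf("no fsync:        %ld s\n", (long) (time(NULL) - t0));
    close(fd);
    return 0;
}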



Re: [HACKERS] Profile of current backend

From: "Vadim B. Mikheev"
Date:
The Hermit Hacker wrote:
>
> On Fri, 6 Feb 1998, Thomas G. Lockhart wrote:
>
> > > Not exactly. But to get my application running we have to do something to
> > > speed it up. This morning I started my program on a database with 15165
> > > records in one table and something like 100 in a second plus 2 in a third.
> > > Then my software tries to add records to the first table and for each record
> > > added it updates a record in the other two. This doesn't count the select
> > > etc.
> > >
> > > Anyway, the data isn't processed fast enough. I tried adding about 600
> > > records which should be done (the time in which the data was send) in 5
> > > minutes, but our system needed almost 8 minutes to insert the data. And this
> > > doesn't cause Oracle too much trouble.
> > >
> > > So I guess there's need for some speed-up. :-)
> >
> > I (and others) had done some benchmarking on simple inserts (6 months ago?) and
> > had concluded that the speed was similar to other commercial systems (I was
> > comparing against Ingres). I recall getting ~50TPS.
> >
> > This was all before Bruce did his work on startup and runtime speeds. You
> > really think your performance is that far off? You are doing selects on the big
> > table before inserting? Do you have indices set up?? Our results were for
> > inserts on a heap table, which has the least overhead...

And did you use BEGIN/END?
Auto-commit is ON in Postgres...
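
A minimal ECPG-style sketch of the point (hypothetical table and
variable names, error handling omitted; whether the ecpg of this era
passes BEGIN/COMMIT straight through to the backend is an assumption):

/* Sketch only: do all the inserts inside one explicit transaction
 * instead of letting auto-commit turn every row into its own
 * transaction (and its own fsync). */
exec sql include sqlca;

void
insert_batch(void)
{
exec sql begin declare section;
    long i;
exec sql end declare section;

    exec sql begin transaction;   /* one transaction for the whole batch */
    for (i = 0; i < 600; i++)
    {
        exec sql insert into perftest(number) values (:i);
    }
    exec sql commit;              /* one commit, hence one sync */
}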

>
>         Just curious, but do you have -F set to disable fsync()?  We
> really really should disable that by default :(
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oh no!

Vadim

Re: [HACKERS] Profile of current backend

From: The Hermit Hacker
Date:
On Fri, 6 Feb 1998, Vadim B. Mikheev wrote:

> >         Just curious, but do you have -F set to disable fsync()?  We
> > really really should disable that by default :(
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Oh no!

    Isn't that our standard recommended operating parameter anyway?
Oops, that's just our performance-improving recommendation... sorry... :)



Re: [HACKERS] Profile of current backend

From: Bruce Momjian
Date:
>
> Bruce Momjian wrote:
> >
> > Interesting.  Nothing is jumping out at me.  Looks like we could try to
> > clean up heapgettup() to see if there is anything in there that can be
> > speeded up.
> >
> > None of the calls looks like it should be inlined.  Do you see any that
> > look good for inlining?
>
> ExecScan() seems to be the only func which calls SeqNext(), which in
> turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> the calls to heapgettup().

This is certainly the type of analysis that will help us.  I disagree on
inlining everything, because the code maintenance becomes a nightmare,
but we certainly could improve things.

Currently, if a function is declared static, it is only called from
within that file.  If there is only one call to that function in the
file, it could be inlined.  I suggest we only attack cases like the ones
below, where we have a large amount of function call overhead.  Also,
there are cases where inlining the 'top' of the function can be really
good.  For heap_getattr(), many cases were handled in the top that just
did a quick return.  In those cases, I took the top and made a macro out
of it, and only called the function if I needed real processing.
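
A minimal sketch of that fast-path-macro pattern, with made-up names
(the real heap_getattr() has a different signature and more cases):

/* Sketch with hypothetical names: the macro answers the quick-return
 * cases inline and only calls the real function when real processing
 * is needed.  Note that the arguments may be evaluated more than once. */
typedef struct
{
    int  natts;     /* number of attributes in the tuple */
    int *cached;    /* cached values, or NULL if not computed */
} fake_tuple;

extern int fetch_attr_slow(fake_tuple *tup, int attnum);

#define fetch_attr(tup, attnum) \
    (((attnum) < 1 || (attnum) > (tup)->natts) ? -1 :          /* quick return */ \
     ((tup)->cached != NULL ? (tup)->cached[(attnum) - 1] :    /* quick return */ \
      fetch_attr_slow((tup), (attnum))))                       /* real work */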

This is a great help, and any patches would be good.  The other nice
thing about this is that it is pretty compartmentalized.  You just need
to check that the new code behaves like the old code, and you are safe.

This is great analysis of the problem, too.

>
> - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]

Sure.  Just comment it so it is clear for readers, so it is not pulled
out into a separate function some day.





>
> - In heapgettup(), 50% is the func itself and 50% is called funcs.
>   Top four CPU consumers:
>     0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
>     0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
>     0.10    0.00   26276/42896       nocachegetattr [158]
>     0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
>
>   RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.

OK.

>
> - Looking at RelationGetBufferWithBuffer():
>     0.00    0.10    4603/32354       ReadBuffer [55]
>   ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
>
>   -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
>
>   -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
>   -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
>   -> Conclusion: INLINE BufferAlloc().

My only warning is that I have never inlined a function, AND left
another copy of the function non-inlined.  The reason is that you are
left with two copies of the function, and that can get really confusing.
I don't know how others feel about that, but the code is already so
complicated, I hesitate to start duplicating blocks of code.  Of course,
if you make it a macro, you have it in one place, and the preprocessor
is duplicating it for you.

I have converted a function to a macro, and I have inlined
frequently-called functions.


>
> - Looking at BufferAlloc():
>     0.04    0.25   37974/37974       BufTableLookup [114]
>     0.10    0.00   32340/151781      SpinAcquire [81]
>     0.10    0.00   37470/40585       PinBuffer [209]
>     0.08    0.00   38478/43799       RelationGetLRelId [234]
>     0.04    0.00   37974/151781      SpinRelease [175]
>
>   -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
>   -> BufferAlloc() is the only func calling BufTableLookup().
>   -> Conclusion: INLINE BufTableLookup().
>
> - Looking at BufTableLookup():
>   86% of CPU time is in calling hash_search(). The rest is own time.
>
> - Looking at hash_search():
>     0.13    0.41  179189/179189      call_hash [69]
>     0.00    0.00       6/6           bucket_alloc [1084]
>   -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
>
> - Looking at call_hash():
>     0.37    0.00  171345/171345      tag_hash [94]
>     0.04    0.00    7844/7844        string_hash [348]
>   -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
>   -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
>   -> Could we use a lookup table instead of doing hash calculations? Would not that
>   ->  be much faster?
>
>
> It looks to me as if there are too many levels of function calls.
> Perhaps all functions which are called by only one other func should be inlined?
>
>
> Guesstimate:
>   This would speed up heapgettup() by 10% ???
>   Other functions would speed up too.

Makes sense.  If the functions are in the same source file and are
declared static, doesn't the compiler inline some of them?

See my heap_getattr() macro for one way to macro-ize some of these.
The macro looks much clearer than the old-style macros.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] Profile of current backend

From: Bruce Momjian
Date:
> > I (and others) had done some benchmarking on simple inserts (6 months ago?) and
> > had concluded that the speed was similar to other commercial systems (I was
> > comparing against Ingres). I recall getting ~50TPS.
> >
> > This was all before Bruce did his work on startup and runtime speeds. You
> > really think your performance is that far off? You are doing selects on the big
> > table before inserting? Do you have indices set up?? Our results were for
> > inserts on a heap table, which has the least overhead...
>
>     Just curious, but do you have -F set to disable fsync()?  We
> really really should disable that by default :(

Agreed.  But maybe we should wait until we get pg_log syncing so we have
30-second reliability.  Don't know where that fits on Vadim's list.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] Profile of current backend

From: Michael Meskes
Date:
The Hermit Hacker writes:
>     Just curious, but do you have -F set to disable fsync()?  We
> really really should disable that by default :(

I tried it with -F and it runs nicely. There is no noticeable difference
between PostgreSQL and Oracle. I just ran another test which includes table
creation, inserts and a drop (source follows). Here are the results:

Oracle 7.3.3.4.0:
I needed 21 seconds and -345682 microseconds for this test

PostgreSQL without -F:
I needed 152 seconds and -623545 microseconds for this test

PostgreSQL with -F:
I needed 5 seconds and 84411 microseconds for this test

Wow!

Here's the source (yes, our precompiler can handle this kind of program
already :-)):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

exec sql include sqlca;

#define       SQLCODE    sqlca.sqlcode

void
db_error (char *msg)
{
    sqlca.sqlerrm.sqlerrmc[sqlca.sqlerrm.sqlerrml] = '\0';
    printf ("%s: db error %s\n", msg, sqlca.sqlerrm.sqlerrmc);
    exit (1);
}

int
main ()
{
exec sql begin declare section;
    long i;
exec sql end declare section;
    struct timeval tvs, tve;

    gettimeofday(&tvs, NULL);

    exec sql connect 'mm';
    if (SQLCODE)
        db_error ("connect");

    exec sql create table perftest(number int4, ascii char16);
    if (SQLCODE)
                db_error ("create t");

    exec sql create unique index number on perftest(number);
    if (SQLCODE)
                db_error ("create i");

    for (i = 0; i < 1407; i++)
    {
        exec sql begin declare section;
            char text[16];
        exec sql end declare section;

        sprintf(text, "%ld", i);
        exec sql insert into perftest(number, ascii) values (:i, :text);
        if (SQLCODE)
            db_error ("insert");

        exec sql commit;
        if (SQLCODE)
            db_error ("commit");
    }

    exec sql drop index number;
    if (SQLCODE)
        db_error ("drop i");

    exec sql drop table perftest;
    if (SQLCODE)
        db_error ("drop t");

    exec sql commit;
    if (SQLCODE)
        db_error ("commit");

    gettimeofday(&tve, NULL);

    printf("I needed %ld seconds and %ld microseconds for this test\n", tve.tv_sec - tvs.tv_sec, tve.tv_usec -
tvs.tv_usec);

    return (0);
}
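
One side note on the timings above: the microseconds are printed from a
raw tv_usec subtraction, which is why some of the results show a
negative microsecond value.  A small helper that normalizes the
difference (a sketch, not part of the original program) would be:

/* Sketch: normalize a struct timeval difference so the microsecond
 * part always falls in [0, 999999]. */
#include <sys/time.h>

static void
tv_diff(const struct timeval *start, const struct timeval *end,
        long *sec, long *usec)
{
    *sec = (long) (end->tv_sec - start->tv_sec);
    *usec = (long) (end->tv_usec - start->tv_usec);
    if (*usec < 0)
    {
        *sec -= 1;
        *usec += 1000000;
    }
}

Read that way, the figures work out to roughly 1407 rows in about 5.1
seconds with -F (around 275 inserts per second), versus about 151
seconds without it (under 10 per second) and about 20.7 seconds for
Oracle.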

Michael

--
Dr. Michael Meskes, Project-Manager    | topsystem Systemhaus GmbH
meskes@topsystem.de                    | Europark A2, Adenauerstr. 20
meskes@debian.org                      | 52146 Wuerselen
Go SF49ers! Go Rhein Fire!             | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux!                  | Fax: (+49) 2405/4670-10

Re: [HACKERS] Profile of current backend

From: "Thomas G. Lockhart"
Date:
> Here's the source (yes, our precompiler can handle this kind of program
> already :-)):

Looks nice!

                                      - Tom


Re: [HACKERS] Profile of current backend

From: Bruce Momjian
Date:
>
> The Hermit Hacker writes:
> >     Just curious, but do you have -F set to disable fsync()?  We
> > really really should disable that by default :(
>
> I tried with -F and it runs nicely. No difference to see between PostgreSQL
> and Oracle. I just ran another test which includes table creation, inserts
> and drop (source follows). Here's the result:
>
> Oracle 7.3.3.4.0:
> I needed 21 seconds and -345682 microseconds for this test
>
> PostgreSQL without -F:
> I needed 152 seconds and -623545 microseconds for this test
>
> PostgreSQL with -F:
> I needed 5 seconds and 84411 microseconds for this test
>
> Whow!
>


This is good news.  Is this with 6.3 or 6.2.1?

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] Profile of current backend

From: Bruce Momjian
Date:
Can we go anywhere with this?

>
> Bruce Momjian wrote:
> >
> > Interesting.  Nothing is jumping out at me.  Looks like we could try to
> > clean up heapgettup() to see if there is anything in there that can be
> > speeded up.
> >
> > None of the calls looks like it should be inlined.  Do you see any that
> > look good for inlining?
>
> ExecScan() seems to be the only func which calls SeqNext(), which in
> turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> the calls to heapgettup().
>
> - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
>
> - In heapgettup(), 50% is the func itself and 50% is called funcs.
>   Top four CPU consumers:
>     0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
>     0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
>     0.10    0.00   26276/42896       nocachegetattr [158]
>     0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
>
>   RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
>
> - Looking at RelationGetBufferWithBuffer():
>     0.00    0.10    4603/32354       ReadBuffer [55]
>   ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
>
>   -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
>
>   -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
>   -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
>   -> Conclusion: INLINE BufferAlloc().
>
> - Looking at BufferAlloc():
>     0.04    0.25   37974/37974       BufTableLookup [114]
>     0.10    0.00   32340/151781      SpinAcquire [81]
>     0.10    0.00   37470/40585       PinBuffer [209]
>     0.08    0.00   38478/43799       RelationGetLRelId [234]
>     0.04    0.00   37974/151781      SpinRelease [175]
>
>   -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
>   -> BufferAlloc() is the only func calling BufTableLookup().
>   -> Conclusion: INLINE BufTableLookup().
>
> - Looking at BufTableLookup():
>   86% of CPU time is in calling hash_search(). The rest is own time.
>
> - Looking at hash_search():
>     0.13    0.41  179189/179189      call_hash [69]
>     0.00    0.00       6/6           bucket_alloc [1084]
>   -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
>
> - Looking at call_hash():
>     0.37    0.00  171345/171345      tag_hash [94]
>     0.04    0.00    7844/7844        string_hash [348]
>   -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
>   -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
>   -> Could we use a lookup table instead of doing hash calculations? Would not that
>   ->  be much faster?
>
>
> It looks to me as if there are too many levels of function calls.
> Perhaps all functions which are called by only one other func should be inlined?
>
>
> Guesstimate:
>   This would speed up heapgettup() by 10% ???
>   Other functions would speed up too.
>
>
> /* m */
>


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Regression test

From: Mattias Kregert
Date:
In src/test/regress:
  it is time to remove the "PST8PDT" stuff from the files
  README and regress.sh

"set timezone to 'PST8PDT'" should be added to sql/datetime.sql

/* m */

Re: [HACKERS] Regression test

From: "Thomas G. Lockhart"
Date:
>   it is time to remove the "PST8PDT" stuff from the files
>   README and regress.sh
>
> "set timezone to 'PST8PDT'" should be added to sql/datetime.sql

Hmm. It would have to be added in several regression tests, not just the
datetime one. Also, having it isolated to the client configuration in
regress.sh gives you the opportunity to fix or adjust it for your
platform.

I'm not sure what the README says about it; that may be obsolete. Is the
PGTZ setting in regress.sh causing you trouble??

                      - Tom

Re: [HACKERS] Profile of current backend

From: Michael Meskes
Date:
Bruce Momjian writes:
> Can we go anywhere with this?

Has anyone done anything with this already?

> > ExecScan() seems to be the only func which calls SeqNext(), which in
> > turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> > the calls to heapgettup().
> >
> > - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
> >
> > - In heapgettup(), 50% is the func itself and 50% is called funcs.
> >   Top four CPU consumers:
> >     0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
> >     0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
> >     0.10    0.00   26276/42896       nocachegetattr [158]
> >     0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
> >
> >   RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
> >
> > - Looking at RelationGetBufferWithBuffer():
> >     0.00    0.10    4603/32354       ReadBuffer [55]
> >   ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
> >
> >   -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
> >
> >   -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
> >   -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
> >   -> Conclusion: INLINE BufferAlloc().
> >
> > - Looking at BufferAlloc():
> >     0.04    0.25   37974/37974       BufTableLookup [114]
> >     0.10    0.00   32340/151781      SpinAcquire [81]
> >     0.10    0.00   37470/40585       PinBuffer [209]
> >     0.08    0.00   38478/43799       RelationGetLRelId [234]
> >     0.04    0.00   37974/151781      SpinRelease [175]
> >
> >   -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
> >   -> BufferAlloc() is the only func calling BufTableLookup().
> >   -> Conclusion: INLINE BufTableLookup().
> >
> > - Looking at BufTableLookup():
> >   86% of CPU time is in calling hash_search(). The rest is own time.
> >
> > - Looking at hash_search():
> >     0.13    0.41  179189/179189      call_hash [69]
> >     0.00    0.00       6/6           bucket_alloc [1084]
> >   -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
> >
> > - Looking at call_hash():
> >     0.37    0.00  171345/171345      tag_hash [94]
> >     0.04    0.00    7844/7844        string_hash [348]
> >   -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
> >   -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
> >   -> Could we use a lookup table instead of doing hash calculations? Would not that
> >   ->  be much faster?
> >
> >
> > It looks to me as if there are too many levels of function calls.
> > Perhaps all functions which are called by only one other func should be inlined?

Isn't this a good solution? A function that is called by only one other
function has a right to exist only for readability's sake. And this
optimization could be done automatically.

Michael

--
Dr. Michael Meskes, Project-Manager    | topsystem Systemhaus GmbH
meskes@topsystem.de                    | Europark A2, Adenauerstr. 20
meskes@debian.org                      | 52146 Wuerselen
Go SF49ers! Go Rhein Fire!             | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux!                  | Fax: (+49) 2405/4670-10

Re: [HACKERS] Profile of current backend

From: Bruce Momjian
Date:
>
> Bruce Momjian writes:
> > Can we go anywhere with this?
>
> Did anyone do anything with this already?

No one yet.

>
> > > ExecScan() seems to be the only func which calls SeqNext(), which in
> > > turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> > > the calls to heapgettup().
> > >
> > > - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
> > >
> > > - In heapgettup(), 50% is the func itself and 50% is called funcs.
> > >   Top four CPU consumers:
> > >     0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
> > >     0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
> > >     0.10    0.00   26276/42896       nocachegetattr [158]
> > >     0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
> > >
> > >   RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
> > >
> > > - Looking at RelationGetBufferWithBuffer():
> > >     0.00    0.10    4603/32354       ReadBuffer [55]
> > >   ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
> > >
> > >   -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
> > >
> > >   -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
> > >   -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
> > >   -> Conclusion: INLINE BufferAlloc().
> > >
> > > - Looking at BufferAlloc():
> > >     0.04    0.25   37974/37974       BufTableLookup [114]
> > >     0.10    0.00   32340/151781      SpinAcquire [81]
> > >     0.10    0.00   37470/40585       PinBuffer [209]
> > >     0.08    0.00   38478/43799       RelationGetLRelId [234]
> > >     0.04    0.00   37974/151781      SpinRelease [175]
> > >
> > >   -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
> > >   -> BufferAlloc() is the only func calling BufTableLookup().
> > >   -> Conclusion: INLINE BufTableLookup().
> > >
> > > - Looking at BufTableLookup():
> > >   86% of CPU time is in calling hash_search(). The rest is own time.
> > >
> > > - Looking at hash_search():
> > >     0.13    0.41  179189/179189      call_hash [69]
> > >     0.00    0.00       6/6           bucket_alloc [1084]
> > >   -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
> > >
> > > - Looking at call_hash():
> > >     0.37    0.00  171345/171345      tag_hash [94]
> > >     0.04    0.00    7844/7844        string_hash [348]
> > >   -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
> > >   -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
> > >   -> Could we use a lookup table instead of doing hash calculations? Would not that
> > >   ->  be much faster?
> > >
> > >
> > > It looks to me as if there are too many levels of function calls.
> > > Perhaps all functions which are called by only one other func should be inlined?
>
> Isn't this a good solution? A function only called by one other function has
> its right to exist only for readability. And this optimization could be done
> automatically.

It wouldn't be such a big deal if it were not called so many times.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] Profile of current backend

From: Bruce Momjian
Date:
>
> Bruce Momjian wrote:
> >
> > Interesting.  Nothing is jumping out at me.  Looks like we could try to
> > clean up heapgettup() to see if there is anything in there that can be
> > speeded up.
> >
> > None of the calls looks like it should be inlined.  Do you see any that
> > look good for inlining?
>
> ExecScan() seems to be the only func which calls SeqNext(), which in
> turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> the calls to heapgettup().
>
> - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
>
> - In heapgettup(), 50% is the func itself and 50% is called funcs.
>   Top four CPU consumers:
>     0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
>     0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
>     0.10    0.00   26276/42896       nocachegetattr [158]
>     0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
>
>   RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
>
> - Looking at RelationGetBufferWithBuffer():
>     0.00    0.10    4603/32354       ReadBuffer [55]
>   ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
>
>   -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
>
>   -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
>   -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
>   -> Conclusion: INLINE BufferAlloc().
>
> - Looking at BufferAlloc():
>     0.04    0.25   37974/37974       BufTableLookup [114]
>     0.10    0.00   32340/151781      SpinAcquire [81]
>     0.10    0.00   37470/40585       PinBuffer [209]
>     0.08    0.00   38478/43799       RelationGetLRelId [234]
>     0.04    0.00   37974/151781      SpinRelease [175]
>
>   -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
>   -> BufferAlloc() is the only func calling BufTableLookup().
>   -> Conclusion: INLINE BufTableLookup().
>
> - Looking at BufTableLookup():
>   86% of CPU time is in calling hash_search(). The rest is own time.
>
> - Looking at hash_search():
>     0.13    0.41  179189/179189      call_hash [69]
>     0.00    0.00       6/6           bucket_alloc [1084]
>   -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
>
> - Looking at call_hash():
>     0.37    0.00  171345/171345      tag_hash [94]
>     0.04    0.00    7844/7844        string_hash [348]
>   -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
>   -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
>   -> Could we use a lookup table instead of doing hash calculations? Would not that
>   ->  be much faster?
>
>
> It looks to me as if there are too many levels of function calls.
> Perhaps all functions which are called by only one other func should be inlined?
>
>
> Guesstimate:
>   This would speed up heapgettup() by 10% ???
>   Other functions would speed up too.

I have reviewed your findings, and have inlined 9 small functions that
are called for every row.  Unfortunately, several of the ones you
mention above are called via function pointers, so it was not possible
to inline them.

In my testing, I created a table with 160k rows, and did a SELECT that
would do a sequential scan of the table, and looked at the functions
that were called the most.

---------------------------------------------------------------------------

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 54.3       2.65     2.65                             mcount (472)
 16.2       3.44     0.79     2999     0.26     0.26  _read [18]
  3.5       3.61     0.17   160148     0.00     0.01  _heapgettup [15]
  2.5       3.73     0.12   174007     0.00     0.00  _SpinAcquire [36]
  2.3       3.84     0.11   981732     0.00     0.00  _BufferIsValid [37]
  2.3       3.95     0.11   160148     0.00     0.01  _heap_getnext [14]
  2.0       4.05     0.10   160096     0.00     0.00  _ExecEvalVar [38]
  2.0       4.15     0.10   160096     0.00     0.00  _ExecMakeFunctionResult <cycle 3> [26]
  1.4       4.22     0.07   174007     0.00     0.00  _SpinRelease [41]
  1.2       4.28     0.06    18176     0.00     0.00  _hash_search [40]
  1.0       4.33     0.05   480288     0.00     0.00  _ExecEvalExpr <cycle 3> [34]
  1.0       4.38     0.05   167313     0.00     0.00  _RelationGetLRelId [47]
  1.0       4.43     0.05   160096     0.00     0.00  _ExecEvalFuncArgs <cycle 3> [48]
  0.8       4.47     0.04   160097     0.00     0.01  _SeqNext [13]
  0.8       4.51     0.04   160096     0.00     0.00  _ExecQual [22]
  0.6       4.54     0.03   160096     0.00     0.00  _ExecEvalOper <cycle 3> [51]
  0.4       4.56     0.02   163015     0.00     0.00  _IncrBufferRefCount [61]
  0.4       4.58     0.02   161342     0.00     0.00  _fmgr_c <cycle 5> [29]
  0.4       4.60     0.02   160451     0.00     0.00  _TransactionIdEquals [62]
  0.4       4.62     0.02   160148     0.00     0.00  _RelationGetBufferWithBuffer [28]
  0.4       4.64     0.02   160098     0.00     0.00  _TupIsNull [63]
  0.4       4.66     0.02   160097     0.00     0.00  _ExecIncrSlotBufferRefcnt [45]
  0.4       4.68     0.02   160096     0.00     0.00  _ExecQualClause [23]
  0.4       4.70     0.02     6161     0.00     0.00  _strcpy [64]
  0.4       4.72     0.02       31     0.65     0.65  _open [65]
  0.2       4.73     0.01   164288     0.00     0.00  _ReleaseBuffer [81]
  0.2       4.74     0.01   164048     0.00     0.00  _PageGetMaxOffsetNumber [97]
  0.2       4.75     0.01   160450     0.00     0.00  _HeapTupleSatisfiesVisibility [50]
  0.2       4.76     0.01   160449     0.00     0.00  _HeapTupleSatisfiesNow [98]
  0.2       4.77     0.01   160100     0.00     0.00  _ExecClearTuple [49]
  0.2       4.78     0.01   160098     0.00     0.00  _ExecStoreTuple [46]
  0.2       4.79     0.01   160096     0.00     0.00  _bpchareq [102]
  0.2       4.80     0.01    15696     0.00     0.00  _call_hash [99]
  0.2       4.81     0.01    12187     0.00     0.00  _malloc [96]
  0.2       4.82     0.01     9182     0.00     0.00  _PortalHeapMemoryAlloc [73]
  0.2       4.83     0.01     7257     0.00     0.00  _newNode [54]
  0.2       4.84     0.01     2973     0.00     0.20  _ReleaseAndReadBuffer [25]
  0.2       4.85     0.01     1241     0.01     0.02  _LockAcquire [58]
  0.2       4.86     0.01      189     0.05     0.05  _IsBootstrapProcessingMode [103]
  0.2       4.87     0.01        3     3.33     3.33  _shmdt [101]
  0.2       4.88     0.01        1    10.00  2172.84  _ExecScan [11]
  0.0       4.88     0.00   166214     0.00     0.00  _BufferGetBlock [72]
  0.0       4.88     0.00   160449     0.00     0.00  _heapisoverride [477]
  0.0       4.88     0.00    15303     0.00     0.00  _tag_hash [478]
  0.0       4.88     0.00    11718     0.00     0.00  _AllocSetContains [479]
  0.0       4.88     0.00    11718     0.00     0.00  _OrderedSetContains [480]
  0.0       4.88     0.00    11366     0.00     0.00  _OrderedElemPush [481]
  0.0       4.88     0.00    11366     0.00     0.00  _OrderedElemPushHead [482]
  0.0       4.88     0.00    11366     0.00     0.00  _OrderedElemPushInto [483]
  0.0       4.88     0.00    11361     0.00     0.00  _AllocSetAlloc [104]
  0.0       4.88     0.00    11361     0.00     0.00  _MemoryContextAlloc [67]
  0.0       4.88     0.00    11339     0.00     0.00  _palloc [68]
  0.0       4.88     0.00    10515     0.00     0.00  _free [484]
  0.0       4.88     0.00    10488     0.00     0.00  _AllocSetFree [485]
  0.0       4.88     0.00    10488     0.00     0.00  _OrderedElemPop [486]
  0.0       4.88     0.00     8176     0.00     0.00  _get_expr [487]
  0.0       4.88     0.00     7953     0.00     0.00  _AllocSetGetFirst [488]
  0.0       4.88     0.00     7953     0.00     0.00  _OrderedSetGetHead [489]
  0.0       4.88     0.00     7952     0.00     0.00  _OrderedElemGetBase [490]
  0.0       4.88     0.00     7424     0.00     0.00  _memmove [491]


---------------------------------------------------------------------------

Of these, I inlined:

    #define TransactionIdEquals(id1, id2) \
    #define TupIsNull(slot) \
    #define ExecIncrSlotBufferRefcnt(slot) \
    #define BufferIsValid(bufnum) \
    #define IncrBufferRefCount(buffer) \
    #define BufferGetBlock(buffer) \
    #define PageGetMaxOffsetNumber(page) \
    #define HeapTupleSatisfiesVisibility(tuple, seeself) \
    #define    heapisoverride() \

When I say inlined, I mean I made macros of them.  I did not inline the
code into other functions, because you lose the modularity of the
function name, and many are called multiple times.

These were all small functions, only a few lines of actual code.
BufferIsValid was called several times per row.
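
A minimal sketch of that kind of conversion, with made-up names and a
simplified check (the real BufferIsValid() tests more than a simple
range):

/* Sketch with hypothetical names, not the real bufmgr code. */
#define MY_NBUFFERS 64

/* before: one real function call per check, several per row */
static int
my_buffer_is_valid_fn(int bufnum)
{
    return bufnum >= 1 && bufnum <= MY_NBUFFERS;
}

/* after: the same test as a macro; no call overhead, but the
 * argument is evaluated twice, so callers must not pass expressions
 * with side effects such as my_BufferIsValid(next_buf++). */
#define my_BufferIsValid(bufnum) \
    ((bufnum) >= 1 && (bufnum) <= MY_NBUFFERS)

Keeping the whole test in a single macro also avoids the two-copies
problem mentioned earlier: there is still only one definition to
maintain.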

With that said, the execution times did not change at all.  What I did
see is that the mcount row in gprof output:

 54.3       2.65     2.65                             mcount (472)

decreased from 2.65 to 1.79, which means that less time was spent in
function calls.  Not sure why that time did not translate to better
overall performance, but the inlining is a clear win for these small
functions.  mcount is artificially high anyway because it is doing the
profiling.  The resulting gprof output shows a much smaller list of
functions called 160k times.

A patch will be posted to the patches list, for any of you who subscribe
to that.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)