Обсуждение: bloated heapam.h

Поиск
Список
Период
Сортировка

bloated heapam.h

От
Alvaro Herrera
Дата:
Hi,

I noticed heapam.h is included in way too many places.  This is bad IMHO
because heapam.h itself includes a lot of other headers.

A lot of these are easy to fix; the source files just need to include
some other headers.  Standard cleanup, I don't think anybody would
object to that.  For example,

Index: src/backend/access/gin/ginvacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/gin/ginvacuum.c,v
retrieving revision 1.19
diff -c -p -r1.19 ginvacuum.c
*** src/backend/access/gin/ginvacuum.c  1 Jan 2008 19:45:46 -0000   1.19
--- src/backend/access/gin/ginvacuum.c  9 May 2008 18:44:31 -0000
***************
*** 15,24 **** #include "postgres.h" #include "access/genam.h" #include "access/gin.h"
- #include "access/heapam.h" #include "miscadmin.h" #include "storage/freespace.h"
! #include "storage/freespace.h" #include "commands/vacuum.h"  typedef struct
--- 15,23 ---- #include "postgres.h" #include "access/genam.h" #include "access/gin.h" #include "miscadmin.h" #include
"storage/freespace.h"
! #include "storage/lmgr.h" #include "commands/vacuum.h"  typedef struct


Others are more conflictive.  For example syncscan.c is keeping the
prototypes for its own functions on heapam.h.  Also pruneheap.c and
rewriteheap.c.  As a result, not only themselves need to include
heapam.h (without any other need for it), but they force some other
files into including heapam.h to get their prototypes.  I think this is
a mistake; I propose splitting those prototypes to their own files, and
#including those as appropriate.

Objections?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: bloated heapam.h

От
Zdenek Kotala
Дата:
Alvaro Herrera napsal(a):

> Others are more conflictive.  For example syncscan.c is keeping the
> prototypes for its own functions on heapam.h.  Also pruneheap.c and
> rewriteheap.c.  As a result, not only themselves need to include
> heapam.h (without any other need for it), but they force some other
> files into including heapam.h to get their prototypes.  I think this is
> a mistake; I propose splitting those prototypes to their own files, and
> #including those as appropriate.
> 
> Objections?

I have similar thing in my TODO list. See my patch from March commit fest and 
discussion. I need solve two main issues - remove postgres.h from binaries and 
keep history of structures (for pg_upgrade project).

My idea is split structures and functions in separate header files.
    Zdenek

http://archives.postgresql.org/pgsql-patches/2007-10/msg00197.php
http://archives.postgresql.org/pgsql-patches/2008-04/msg00149.php

    


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Zdenek Kotala wrote:
> Alvaro Herrera napsal(a):
>
>> Others are more conflictive.  For example syncscan.c is keeping the
>> prototypes for its own functions on heapam.h.  Also pruneheap.c and
>> rewriteheap.c.  As a result, not only themselves need to include
>> heapam.h (without any other need for it), but they force some other
>> files into including heapam.h to get their prototypes.  I think this is
>> a mistake; I propose splitting those prototypes to their own files, and
>> #including those as appropriate.
>>
>> Objections?
>
> I have similar thing in my TODO list. See my patch from March commit fest 
> and discussion. I need solve two main issues - remove postgres.h from 
> binaries and keep history of structures (for pg_upgrade project).

Yeah, I remember that.  Is there any progress on that front?

BTW I noticed that I was a bit careless in the description.  rewriteheap
already has its own rewriteheap.h file; and there's no point at all in
separating pruneheap.c declarations into another file.

The one that makes a bit more sense is a new syncscan.h.  And there are
a lot of things in heapam.h that actually correspond to tuple
manipulation (heap_form_tuple and so on), so perhaps a new header file
would be appropriate, but there's already htup.h which contains
tuple-related stuff.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: bloated heapam.h

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> The one that makes a bit more sense is a new syncscan.h.  And there are
> a lot of things in heapam.h that actually correspond to tuple
> manipulation (heap_form_tuple and so on), so perhaps a new header file
> would be appropriate, but there's already htup.h which contains
> tuple-related stuff.

After actually looking at the header a bit ...

+1 for moving fastgetattr, heap_getattr, and the heaptuple.c functions
to htup.h.  I don't see any big gain from relocating the other stuff;
it seems to largely all use about the same set of typedefs.

It looks to me actually that a large part of your complaint is that
heapam.h #includes more than it has to.  Have you tried just cutting its
#include list to the minimum needed to compile its function declarations?
Likely this would force more #includes in .c files but I don't object
to that.  (I might be wrong, but I believe that Bruce's script for
removing "unnecessary" #includes is not bright enough to make such
tradeoffs, so it'd let bloated #include lists in headers survive.)
        regards, tom lane


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> +1 for moving fastgetattr, heap_getattr, and the heaptuple.c functions
> to htup.h.  I don't see any big gain from relocating the other stuff;
> it seems to largely all use about the same set of typedefs.

Ultimately that was my conclusion too.  I tried moving the heaptuple.c
functions to htup.h, but hit the problem that pg_dump.c wants to include
that file.  It then fails to compile because it doesn't know Datum
(defined in postgres.h, so unavailable to frontend programs).  I worked
around that by enclosing the prototypes in #ifndef FRONTEND, but that
seems ugly.

Apparently the reason for pg_dump.c to need htup.h is getAttrName which
needs system columns' attribute numbers.  Of course, the first thing
that comes to mind is that we should fix pg_dump to not require that
header in the first place.  Perhaps we can get the names by querying the
server, at the same time we get the user column names.  (Apparently this
is only used to dump unique indexes on system columns.)

The easiest actual solution, however, seems to be to move those
attribute numbers to a separate header, say access/sysattrs.h.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: bloated heapam.h

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Apparently the reason for pg_dump.c to need htup.h is getAttrName which
> needs system columns' attribute numbers.  Of course, the first thing
> that comes to mind is that we should fix pg_dump to not require that
> header in the first place.  Perhaps we can get the names by querying the
> server, at the same time we get the user column names.  (Apparently this
> is only used to dump unique indexes on system columns.)

> The easiest actual solution, however, seems to be to move those
> attribute numbers to a separate header, say access/sysattrs.h.

Yeah, probably.  There is certainly no reason for a frontend program
to be dealing in Datum; but it isn't unreasonable for it to know
the system column numbers, since it can see those in the catalogs.

I'm sure we have work in front of us to get these things separated
out a bit better.
        regards, tom lane


Re: bloated heapam.h

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> So here's a patch (includes the #ifndef FRONTEND hack in htup.h.)

I like this except for the #ifndef FRONTEND hack --- you're going to
need to fix that before applying.  I'm good with doing that by pushing
the system attribute numbers into a separate header.

BTW, you didn't compress the patch, which likely explains why it didn't
show up on -hackers.  If you want to post a shorter form, I think that
the diffs in the header files are the only interesting part; the ensuing
additions in .c files are just mechanical.
        regards, tom lane


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > So here's a patch (includes the #ifndef FRONTEND hack in htup.h.)
> 
> I like this except for the #ifndef FRONTEND hack --- you're going to
> need to fix that before applying.  I'm good with doing that by pushing
> the system attribute numbers into a separate header.

Committed with the attribute numbers moved to access/sysattr.h, which is
what pg_dump now uses.

Thanks.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:
> > > So here's a patch (includes the #ifndef FRONTEND hack in htup.h.)
> > 
> > I like this except for the #ifndef FRONTEND hack --- you're going to
> > need to fix that before applying.  I'm good with doing that by pushing
> > the system attribute numbers into a separate header.
> 
> Committed with the attribute numbers moved to access/sysattr.h, which is
> what pg_dump now uses.

Oops :-(  I just noticed that I removed bufmgr.h from bufpage.h, which
is a change you had objected to previously :-(

http://archives.postgresql.org/pgsql-patches/2008-04/msg00149.php

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Alvaro Herrera wrote:

> Oops :-(  I just noticed that I removed bufmgr.h from bufpage.h, which
> is a change you had objected to previously :-(

However, it seems the right fix is to move BufferGetPageSize and
BufferGetPage from bufpage.h to bufmgr.h.

(Digging further, it seems like bufpage.h should also include transam.h
in order to get TransactionIdIsNormal ... I start to wonder how many
problems of this nature we have on our headers.  Without having a way to
detect whether the defined macros are valid, it seems hard to check
programatically, however.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: bloated heapam.h

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Oops :-(  I just noticed that I removed bufmgr.h from bufpage.h, which
> is a change you had objected to previously :-(
> http://archives.postgresql.org/pgsql-patches/2008-04/msg00149.php

Hmm.  I did notice that the patch seemed to need to add bufmgr.h
inclusions to an awful lot of files, but I'd forgotten the argument
about the bufpage.h macros needing it.  Do you want to undo that
aspect of it?
        regards, tom lane


Re: bloated heapam.h

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> However, it seems the right fix is to move BufferGetPageSize and
> BufferGetPage from bufpage.h to bufmgr.h.

That sounds sane if it fixes the problem.

> (Digging further, it seems like bufpage.h should also include transam.h
> in order to get TransactionIdIsNormal ... I start to wonder how many
> problems of this nature we have on our headers.  Without having a way to
> detect whether the defined macros are valid, it seems hard to check
> programatically, however.)

Yeah, I'm certain there's some other problems of the same kind, but I
don't see any easy way to find 'em.
        regards, tom lane


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > However, it seems the right fix is to move BufferGetPageSize and
> > BufferGetPage from bufpage.h to bufmgr.h.
> 
> That sounds sane if it fixes the problem.

Actually it's not, because bufmgr.h does not include bufpage.h, and it
knows nothing about "pages".

However, I think most additions of bufmgr.h to source files are not all
that bogus -- a lot of the warnings I got were about
BufferAccessStrategyType.  I'm still checking what's the best way to
deal with this, and if I can't find anything else I'll re-add bufmgr.h
to bufpage.h.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: bloated heapam.h

От
Zdenek Kotala
Дата:
Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
>> Oops :-(  I just noticed that I removed bufmgr.h from bufpage.h, which
>> is a change you had objected to previously :-(
>
> However, it seems the right fix is to move BufferGetPageSize and
> BufferGetPage from bufpage.h to bufmgr.h.

+1 It makes more sense.

> (Digging further, it seems like bufpage.h should also include transam.h
> in order to get TransactionIdIsNormal ... I start to wonder how many
> problems of this nature we have on our headers.  Without having a way to
> detect whether the defined macros are valid, it seems hard to check
> programatically, however.)
>

I attached script which should check it. In first step it runs C preprocessor on
each header (postgres.h is included as well). The output from first step is
processed again with preprocessor and define.h is included. Define.h contains
"all" used macros in following format:

#define SIGABRT "NOT_EXPANDED_SIGABRT"

Main problem is how to generate define.h. I used following magic:

grep "^#define" `find . -name "*.h"` | cut -d" "  -f 2 | cut -f 1 | cut -f 1 -d"("

but it generates some noise as well. Maybe some Perl or AWK magic should be better.

        Zdenek

Вложения

Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Oops :-(  I just noticed that I removed bufmgr.h from bufpage.h, which
> > is a change you had objected to previously :-(
> > http://archives.postgresql.org/pgsql-patches/2008-04/msg00149.php
> 
> Hmm.  I did notice that the patch seemed to need to add bufmgr.h
> inclusions to an awful lot of files, but I'd forgotten the argument
> about the bufpage.h macros needing it.  Do you want to undo that
> aspect of it?

Done -- I put back bufmgr.h into bufpage.h.  Surely there is a better
structure for this, perhaps involving more header files, but I don't
have time for that ATM.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Zdenek Kotala wrote:
> Alvaro Herrera wrote:

>> (Digging further, it seems like bufpage.h should also include transam.h
>> in order to get TransactionIdIsNormal ... I start to wonder how many
>> problems of this nature we have on our headers.  Without having a way to
>> detect whether the defined macros are valid, it seems hard to check
>> programatically, however.)
>
> I attached script which should check it. In first step it runs C 
> preprocessor on each header (postgres.h is included as well). The output 
> from first step is processed again with preprocessor and define.h is 
> included. Define.h contains "all" used macros in following format:
>
> #define SIGABRT "NOT_EXPANDED_SIGABRT"
>
> Main problem is how to generate define.h. I used following magic:
>
> grep "^#define" `find . -name "*.h"` | cut -d" "  -f 2 | cut -f 1 | cut -f 1 -d"("
>
> but it generates some noise as well. Maybe some Perl or AWK magic should be better.

So were you able to detect anything bogus with this technique?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: bloated heapam.h

От
Zdenek Kotala
Дата:
Alvaro Herrera napsal(a):
> Zdenek Kotala wrote:
>> Alvaro Herrera wrote:
> 
>>> (Digging further, it seems like bufpage.h should also include transam.h
>>> in order to get TransactionIdIsNormal ... I start to wonder how many
>>> problems of this nature we have on our headers.  Without having a way to
>>> detect whether the defined macros are valid, it seems hard to check
>>> programatically, however.)
>> I attached script which should check it. In first step it runs C 
>> preprocessor on each header (postgres.h is included as well). The output 
>> from first step is processed again with preprocessor and define.h is 
>> included. Define.h contains "all" used macros in following format:
>>
>> #define SIGABRT "NOT_EXPANDED_SIGABRT"
>>
>> Main problem is how to generate define.h. I used following magic:
>>
>> grep "^#define" `find . -name "*.h"` | cut -d" "  -f 2 | cut -f 1 | cut -f 1 -d"("
>>
>> but it generates some noise as well. Maybe some Perl or AWK magic should be better.
> 
> So were you able to detect anything bogus with this technique?
> 

No, everything looks OK.
    Zdenek


Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Zdenek Kotala wrote:
> Alvaro Herrera napsal(a):
>> Zdenek Kotala wrote:

>>> I attached script which should check it. In first step it runs C  
>>> preprocessor on each header (postgres.h is included as well). The 
>>> output from first step is processed again with preprocessor and 
>>> define.h is included.

>> So were you able to detect anything bogus with this technique?
>
> No, everything looks OK.

Well, then it is not working, because transam.h is missing from
bufpage.h ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: bloated heapam.h

От
Zdenek Kotala
Дата:
Alvaro Herrera napsal(a):
> Zdenek Kotala wrote:
>> Alvaro Herrera napsal(a):
>>> Zdenek Kotala wrote:
> 
>>>> I attached script which should check it. In first step it runs C  
>>>> preprocessor on each header (postgres.h is included as well). The 
>>>> output from first step is processed again with preprocessor and 
>>>> define.h is included.
> 
>>> So were you able to detect anything bogus with this technique?
>> No, everything looks OK.
> 
> Well, then it is not working, because transam.h is missing from
> bufpage.h ...
> 

It tried catch problems with defines not with datatypes. If you mean that 
TransactionID is not defined.

I try to play with lint now if it gets better results.
    Zdenek





Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Zdenek Kotala wrote:
> Alvaro Herrera napsal(a):

>> Well, then it is not working, because transam.h is missing from
>> bufpage.h ...
>
> It tried catch problems with defines not with datatypes. If you mean that 
> TransactionID is not defined.

No, I mean that TransactionIdIsNormal() is used in PageSetPrunable().

> I try to play with lint now if it gets better results.

Ok, good.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: bloated heapam.h

От
Zdenek Kotala
Дата:
Alvaro Herrera napsal(a):

>> I try to play with lint now if it gets better results.
>
> Ok, good.

Hmm, It generates a lot of unnecessary includes in *.c files. I have checked
only couple of them and it seems that they are really unnecessary. A attach
output. Unfortunately, it does not detect missing heapam.h from bufpage.h.
However, I have not tested all magic switches yet :-).  There are also several
reports about system headers file, but it could be platform specific warning.

    Zdenek

Вложения

Re: bloated heapam.h

От
Alvaro Herrera
Дата:
Zdenek Kotala wrote:
> Alvaro Herrera napsal(a):
>
>>> I try to play with lint now if it gets better results.
>>
>> Ok, good.
>
> Hmm, It generates a lot of unnecessary includes in *.c files. I have 
> checked only couple of them and it seems that they are really 
> unnecessary. A attach output. Unfortunately, it does not detect missing 
> heapam.h from bufpage.h. However, I have not tested all magic switches 
> yet :-).  There are also several reports about system headers file, but 
> it could be platform specific warning.

Interesting.  It seems that Bruce's script could be replaced by lint.
Please share the switches you used.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: bloated heapam.h

От
Zdenek Kotala
Дата:
Alvaro Herrera wrote:
> Zdenek Kotala wrote:
>> Alvaro Herrera napsal(a):
>>
>>>> I try to play with lint now if it gets better results.
>>> Ok, good.
>> Hmm, It generates a lot of unnecessary includes in *.c files. I have 
>> checked only couple of them and it seems that they are really 
>> unnecessary. A attach output. Unfortunately, it does not detect missing 
>> heapam.h from bufpage.h. However, I have not tested all magic switches 
>> yet :-).  There are also several reports about system headers file, but 
>> it could be platform specific warning.
> 
> Interesting.  It seems that Bruce's script could be replaced by lint.
> Please share the switches you used.
> 

I used following switches which only shows unused included file. I run command 
in backend directory.

lint -I ../include -errtags -errhdr=%all -Ncheck=%none -Nlevel=1 -erroff=%all 
-erroff=no%E_INCL_NUSD -c  `find . -name "*.c"` > include.out

Good to mention that I use lint from Sun Studio 12.

    Zdenek