Discussion: generating catcache control data

generating catcache control data

From
John Naylor
Date:
Hi,

While digging through the archives, I found a thread from a couple
years back about syscache performance. There was an idea [1] to
generate the cache control data at compile time. That would remove
the need to perform database access to complete cache initialization,
as well as the need to check in various places whether initialization
has happened.

If this were done, catcache.c:InitCatCachePhase2() and
catcache.c:CatalogCacheInitializeCache() would disappear, and
syscache.c:InitCatalogCachePhase2() could be replaced by code that
simply opens the relations when writing new init files. Another
possibility this opens up is making the SysCacheRelationOid and
SysCacheSupportingRelOid arrays constant data as well.
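To make the two costs concrete, here is a minimal C sketch contrasting lazy initialization with precomputed constant data. All names (`CacheDesc`, `LazyCache`, `load_descriptor_from_catalog`, etc.) are hypothetical stand-ins, not PostgreSQL's catcache structures. The lazy version must test whether setup has happened on every entry and do "catalog access" the first time. The constant version needs neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified cache descriptor. */
typedef struct CacheDesc
{
    const char *relname;      /* catalog the cache is built on */
    bool        relisshared;  /* shared across databases? */
    int         nkeys;        /* number of lookup key columns */
} CacheDesc;

/* Counts how often the "expensive" catalog access happens. */
static int catalog_reads = 0;

/* Hypothetical stand-in for the database access done at cache setup. */
static void
load_descriptor_from_catalog(CacheDesc *desc)
{
    catalog_reads++;
    desc->relname = "pg_amop";
    desc->relisshared = false;
    desc->nkeys = 3;
}

/* Lazy style: every entry point must check whether setup has happened. */
typedef struct LazyCache
{
    bool      initialized;
    CacheDesc desc;
} LazyCache;

static const CacheDesc *
lazy_get_desc(LazyCache *cache)
{
    if (!cache->initialized)
    {
        load_descriptor_from_catalog(&cache->desc);
        cache->initialized = true;
    }
    return &cache->desc;
}

/* Precomputed style: the same data emitted at build time as a constant. */
static const CacheDesc const_desc = {"pg_amop", false, 3};

static const CacheDesc *
const_get_desc(void)
{
    return &const_desc;       /* no check, no catalog access */
}
```

The lazy path pays one catalog read per backend plus a branch on every call. The constant path pays nothing at run time, which is the saving the proposal is after.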


Here's a basic design sketch:

1. Generate the current syscache cacheinfo[] array and cacheid enum by
adding a couple arguments to the declarations for system indexes, as
in:

#define DECLARE_UNIQUE_INDEX(name,oid,oid_macro,cacheid,num_buckets,decl) \
    extern int no_such_variable

DECLARE_UNIQUE_INDEX(pg_amop_opr_fam_index, 2654,
    AccessMethodOperatorIndexId, AMOPOPID, 64,
    on pg_amop using btree(amopopr oid_ops, amoppurpose char_ops, amopfamily oid_ops));

DECLARE_UNIQUE_INDEX(pg_amop_oid_index, 2756,
    AccessMethodOperatorOidIndexId, -, 0,
    on pg_amop using btree(oid oid_ops));

...and add in data we already know how to parse from the catalog
headers. Note that the last example has '-' and '0' to mean "no
cache". (The index oid macro is superfluous there, but kept for
consistency.)
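As an illustration of step 1, this is roughly the kind of C a generator script might emit for the two declarations above. It is a sketch only. The `cachedesc` layout here is a simplified assumption, and placing `SysCacheSize` as the last enum member is a choice made for this example. The index OID macro is defined locally so the fragment compiles on its own. Because pg_amop_oid_index is declared with '-' and '0', it contributes neither an enum member nor an array entry.

```c
#include <assert.h>

typedef unsigned int Oid;

/* Normally from the catalog headers; defined here so the sketch compiles. */
#define AccessMethodOperatorIndexId 2654

/* Generated: one enum member per index declared with a cache id. */
enum SysCacheIdentifier
{
    AMOPOPID = 0,
    /* pg_amop_oid_index was declared with "-", so nothing is emitted */
    SysCacheSize              /* number of caches; must stay last */
};

/* Hypothetical, simplified per-cache control data. */
struct cachedesc
{
    Oid indoid;               /* index the cache is built on */
    int nbuckets;             /* initial hash table size */
};

/* Generated: indexed by SysCacheIdentifier. */
static const struct cachedesc cacheinfo[SysCacheSize] = {
    [AMOPOPID] = {AccessMethodOperatorIndexId, 64},
};
```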

2. Expand the cacheinfo[] element structs with the rest of the constant data:

Relname and relisshared are straightforward. For eq/hash functions,
we could add metadata attributes to pg_type.dat for the relevant
types. Tuple descriptors would get their attrs from schemapg.h.
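For step 2, here is a minimal sketch of what an expanded, fully constant entry could look like. It assumes the generator emits the hash and equality function names directly, as it might if pg_type.dat carried that metadata. The struct, the function names and the hash itself are illustrative assumptions, not PostgreSQL's definitions. The keys follow the AMOPOPID declaration above (oid, char, oid), and pg_amop is not a shared catalog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t Datum;                    /* simplified stand-in */
typedef uint32_t (*CacheHashFn)(Datum key);
typedef bool (*CacheEqFn)(Datum a, Datum b);

/* Illustrative hash/eq functions for 4-byte (oid) and 1-byte (char) keys. */
static uint32_t hash_oid_key(Datum k)  { return (uint32_t) k * 2654435761u; }
static uint32_t hash_char_key(Datum k) { return (uint32_t) (unsigned char) k * 2654435761u; }
static bool eq_oid_key(Datum a, Datum b)  { return (uint32_t) a == (uint32_t) b; }
static bool eq_char_key(Datum a, Datum b) { return (char) a == (char) b; }

#define MAX_CACHE_KEYS 4

/* Hypothetical fully constant control data for one cache. */
typedef struct CacheControl
{
    const char *relname;
    bool        relisshared;
    int         nkeys;
    CacheHashFn hashfn[MAX_CACHE_KEYS];
    CacheEqFn   eqfn[MAX_CACHE_KEYS];
} CacheControl;

/* What a generator might emit for AMOPOPID (keys: oid, char, oid). */
static const CacheControl amopopid_control = {
    .relname = "pg_amop",
    .relisshared = false,
    .nkeys = 3,
    .hashfn = {hash_oid_key, hash_char_key, hash_oid_key},
    .eqfn = {eq_oid_key, eq_char_key, eq_oid_key},
};

/* Combine per-key hashes using only compile-time data (naive mixing). */
static uint32_t
compute_hash(const CacheControl *cc, const Datum *keys)
{
    uint32_t h = 0;

    for (int i = 0; i < cc->nkeys; i++)
        h ^= cc->hashfn[i](keys[i]) + (uint32_t) i;
    return h;
}
```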

3. Simplify catcache.c and syscache.c.


Is this something worth doing?

[1] https://www.postgresql.org/message-id/1295.1507918074%40sss.pgh.pa.us


-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: generating catcache control data

From
Tom Lane
Date:
John Naylor <john.naylor@2ndquadrant.com> writes:
> While digging through the archives, I found a thread from a couple
> years back about syscache performance. There was an idea [1] to
> generate the cache control data at compile time. That would remove
> the need to perform database access to complete cache initialization,
> as well as the need to check in various places whether initialization
> has happened.

Right.

> 1. Generate the current syscache cacheinfo[] array and cacheid enum by
> adding a couple arguments to the declarations for system indexes, as
> in:
> #define DECLARE_UNIQUE_INDEX(name,oid,oid_macro,cacheid,num_buckets,decl) \
>     extern int no_such_variable

I do not like attaching this data to the DECLARE_UNIQUE_INDEX macros.
It's really no business of the indexes' whether they are associated
with a syscache.  It's *certainly* no business of theirs how many
buckets such a cache should start off with.

I'd be inclined to make a separate file that's specifically concerned
with declaring syscaches, and put all the required data there.

> Relname and relisshared are straightforward. For eq/hash functions,
> we could add metadata attributes to pg_type.dat for the relevant
> types. Tuple descriptors would get their attrs from schemapg.h.

I don't see a need to hard-wire more information than we do today, and
I'd prefer not to because it adds to the burden of creating new syscaches.
Assuming that the plan is for genbki.pl or some similar script to generate
the constants, it could look up all the appropriate data from the initial
contents for pg_opclass and friends.  That is, basically what we want here
is for a constant-creation script to perform the same lookups that're now
done during backend startup.

> Is this something worth doing?

Hard to tell.  It'd take a few cycles out of backend startup, which
seems like a worthy goal; but I don't know if it'd save enough to be
worth the trouble.  Probably can't tell for sure without doing most
of the work :-(.

Perhaps you could break it up by building a hand-made copy of the
constants and then removing the runtime initialization code.  This'd
be enough to get data on the performance change.  Only if that looked
promising would you need to write the Perl script to compute the
constants.

            regards, tom lane



Re: generating catcache control data

From
Tom Lane
Date:
... BTW, one other issue with changing this, at least if we want to
precompute tupdescs for all system catalogs used in catcaches, is that
that would put a very big crimp in doing runtime changes to catalogs.
While we'll probably never support changes in the physical layouts
of catalog rows, there is interest in being able to change some
auxiliary pg_attribute fields, e.g. attstattarget [1].  So we'd need
to be sure that the compiled-in tupdescs are only used to disassemble
catalog tuples, and not for other purposes.
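To illustrate that distinction, here is a hypothetical sketch in which the compiled-in descriptor carries only what is needed to take a row apart: each fixed-width column's length and alignment. It deliberately omits auxiliary fields such as attstattarget, which would still come from pg_attribute at run time. `AttLayout` and the offset computation are simplified assumptions, not PostgreSQL's `TupleDesc` or tuple-deforming code. Nulls and variable-length columns are ignored, and alignment is stored as a byte count rather than PostgreSQL's 'c'/'s'/'i'/'d' codes.

```c
#include <assert.h>
#include <stddef.h>

/* Compiled-in, layout-only description of one column. */
typedef struct AttLayout
{
    short attlen;             /* fixed width in bytes */
    char  attalign;           /* required alignment in bytes: 1, 2, 4 or 8 */
} AttLayout;

/* Round off up to the next multiple of align (align is a power of 2). */
static size_t
align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Compute the byte offset of each column of a fixed-width, null-free row. */
static void
compute_offsets(const AttLayout *atts, int natts, size_t *offsets)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++)
    {
        off = align_up(off, (size_t) atts[i].attalign);
        offsets[i] = off;
        off += (size_t) atts[i].attlen;
    }
}

/* Hypothetical layout: oid (4), char (1), oid (4), int2 (2). */
static const AttLayout example_layout[] = {
    {4, 4}, {1, 1}, {4, 4}, {2, 2},
};
```

Because physical row layouts do not change at run time, data like this is safe to compile in. Anything that can be altered at run time is not.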

Of course this issue arises already for the bootstrap catalogs, so
maybe it's been dealt with sufficiently.  But it's something to keep
an eye on.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/8b00ea5e-28a7-88ba-e848-21528b632354%402ndquadrant.com



Re: generating catcache control data

From
John Naylor
Date:
On Fri, Oct 11, 2019 at 3:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do not like attaching this data to the DECLARE_UNIQUE_INDEX macros.
> It's really no business of the indexes' whether they are associated
> with a syscache.  It's *certainly* no business of theirs how many
> buckets such a cache should start off with.
>
> I'd be inclined to make a separate file that's specifically concerned
> with declaring syscaches, and put all the required data there.

That gave me another idea that would further reduce the bookkeeping
involved in creating new syscaches -- put declarations in the cache id
enum (syscache.h), like this:

#define DECLARE_SYSCACHE(cacheid,indexname,indexoid,numbuckets) cacheid

enum SysCacheIdentifier
{
    DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index,
                     AggregateFnoidIndexId, 16) = 0,
    ...
};
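With this macro, each entry expands to just its cache id. The compiler sees an ordinary enum, and a script would parse the remaining arguments. For comparison, here is a purely preprocessor-based alternative (an "X-macro", which is not what is proposed here). The list is written once and expanded twice with different macro definitions, producing both the enum and a parallel info array. `SYSCACHE_LIST`, `SysCacheInfo` and the two-entry list are hypothetical. The index OID argument is carried along but unused, so no OID values are needed for the sketch to compile.

```c
#include <assert.h>
#include <string.h>

/* Single list of caches: (cacheid, indexname, indexoid, numbuckets). */
#define SYSCACHE_LIST(X) \
    X(AGGFNOID, pg_aggregate_fnoid_index, AggregateFnoidIndexId, 16) \
    X(AMOPOPID, pg_amop_opr_fam_index, AccessMethodOperatorIndexId, 64)

/* Expansion 1: the cache id enum. */
#define AS_ENUM(cacheid, indexname, indexoid, numbuckets) cacheid,
enum SysCacheIdentifier
{
    SYSCACHE_LIST(AS_ENUM)
    SysCacheSize              /* number of caches; must stay last */
};
#undef AS_ENUM

/* Expansion 2: a parallel array of control data, indexed by cache id. */
typedef struct SysCacheInfo
{
    const char *indexname;
    int         numbuckets;
} SysCacheInfo;

#define AS_INFO(cacheid, indexname, indexoid, numbuckets) \
    [cacheid] = {#indexname, numbuckets},
static const SysCacheInfo syscache_info[SysCacheSize] = {
    SYSCACHE_LIST(AS_INFO)
};
#undef AS_INFO
```

The preprocessor alone cannot consult the catalog .dat files, so derived data such as relation OIDs or key column types would still need a generator script.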

> > Relname and relisshared are straightforward. For eq/hash functions,
> > we could add metadata attributes to pg_type.dat for the relevant
> > types. Tuple descriptors would get their attrs from schemapg.h.
>
> I don't see a need to hard-wire more information than we do today, and
> I'd prefer not to because it adds to the burden of creating new syscaches.
> Assuming that the plan is for genbki.pl or some similar script to generate
> the constants, it could look up all the appropriate data from the initial
> contents for pg_opclass and friends.  That is, basically what we want here
> is for a constant-creation script to perform the same lookups that're now
> done during backend startup.

Looking at it again, the eq/hash functions are local to catcache.c and
are looked up via GetCCHashEqFuncs(), but the runtime cost of that is
surely minuscule, so I left it alone.
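For comparison with the fully constant approach, the runtime lookup kept here boils down to a switch on the key column's type OID. The following is a simplified, hypothetical sketch of that style, not the body of GetCCHashEqFuncs(). The type OIDs are the standard built-in values (char = 18, name = 19, int4 = 23, oid = 26). The hash functions are invented for the example, and name keys are treated as plain C strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef unsigned int Oid;
typedef uint32_t (*KeyHashFn)(const void *key);

/* Built-in type OIDs. */
#define CHAROID 18
#define NAMEOID 19
#define INT4OID 23
#define OIDOID  26

/* Illustrative hash functions; not PostgreSQL's implementations. */
static uint32_t
hash_4byte(const void *key)
{
    uint32_t v;

    memcpy(&v, key, sizeof(v));
    return v * 2654435761u;
}

static uint32_t
hash_1byte(const void *key)
{
    return (uint32_t) *(const unsigned char *) key * 2654435761u;
}

static uint32_t
hash_cstring(const void *key)
{
    /* FNV-1a over a NUL-terminated string */
    uint32_t h = 2166136261u;

    for (const unsigned char *p = key; *p; p++)
        h = (h ^ *p) * 16777619u;
    return h;
}

/* Pick a hash function for a cache key column based on its type OID. */
static bool
get_key_hash_fn(Oid keytype, KeyHashFn *hashfn)
{
    switch (keytype)
    {
        case INT4OID:
        case OIDOID:          /* both are 4-byte integers */
            *hashfn = hash_4byte;
            return true;
        case CHAROID:
            *hashfn = hash_1byte;
            return true;
        case NAMEOID:
            *hashfn = hash_cstring;
            return true;
        default:
            return false;     /* unsupported key type */
    }
}
```

Each call is a few comparisons and no catalog access, which is consistent with leaving this part alone.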

> > Is this something worth doing?
>
> Hard to tell.  It'd take a few cycles out of backend startup, which
> seems like a worthy goal; but I don't know if it'd save enough to be
> worth the trouble.  Probably can't tell for sure without doing most
> of the work :-(.

I went ahead and did just enough to remove the relation-opening code.
Looking in the archives, I found this as a quick test:

echo '\set x 1' > x.txt
./inst/bin/pgbench -n -C -c 1 -f x.txt -T 10

Typical numbers:

master:
number of transactions actually processed: 4276
latency average = 2.339 ms
tps = 427.549137 (including connections establishing)
tps = 24082.726350 (excluding connections establishing)

patch:
number of transactions actually processed: 4436
latency average = 2.255 ms
tps = 443.492369 (including connections establishing)
tps = 21817.308410 (excluding connections establishing)

...which amounts to a nearly 4% improvement (443.5 vs. 427.5 tps, about
3.7%) in the tps number that includes connection establishment, which
isn't earth-shattering, but it's something. Opinions? It wouldn't be a
lot of additional work to put together a WIP patch.
