Обсуждение: inconsistency and inefficiency in setup_conversion()

Поиск

Список

Период

Сортировка

inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

28 апреля 2018 г., 21:51:03

Taking a close look at the result of setup_conversion(), wrong or at
least confusing comments are applied to the functions. Consider this
family of conversions:

select conproc, conname
from pg_conversion
where conproc = 'utf8_to_win'::regproc
order by oid;
   conproc   |       conname
-------------+----------------------
 utf8_to_win | utf8_to_windows_866
 utf8_to_win | utf8_to_windows_874
 utf8_to_win | utf8_to_windows_1250
 utf8_to_win | utf8_to_windows_1251
 utf8_to_win | utf8_to_windows_1252
 utf8_to_win | utf8_to_windows_1253
 utf8_to_win | utf8_to_windows_1254
 utf8_to_win | utf8_to_windows_1255
 utf8_to_win | utf8_to_windows_1256
 utf8_to_win | utf8_to_windows_1257
 utf8_to_win | utf8_to_windows_1258
(11 rows)

Then compare the comment on the function:

select proname, description
from pg_description d
join pg_proc p on d.objoid=p.oid
where classoid = 'pg_proc'::regclass
and description ~ 'for UTF8 to WIN';
   proname   |                   description
-------------+--------------------------------------------------
 utf8_to_win | internal conversion function for UTF8 to WIN1258
(1 row)

Notice how the comment refers to the last encoding created. This is
because setup_conversion.sql invokes CREATE OR REPLACE FUNCTION
utf8_to_win [...] multiple times, each with different comments
specific to the encoding. It'd be messy at best to try to construct
the right comment using the current Makefile script. It also can't be
good for initdb performance to create 44 functions just to immediately
drop them. Speaking of, from this thread about initdb performance [1],
setup_conversion() consumed the biggest share of time. I propose to
get rid of the ad hoc $(CONVERSIONS) format and solve the comment
issue, while hopefully shaving a bit more time off of initdb. It seems
our options are the following:

Solution #1 - As alluded to in [1], turn the conversions into
pg_proc.dat and pg_conversion.dat entries. Teach genbki.pl to parse
pg_wchar.h to map conversion names to numbers.
Pros:
-likely easy to do
-allows for the removal of an install target in the Makefile as well
as ad hoc logic in MSVC
-uses a format that developers need to use anyway
Cons:
-immediately burns up 88 hard-coded OIDs and one for each time a
conversion proc is created
-would require editing data in two catalogs every time a conversion
proc is created

Solution #2 - Write a new script that would read all the .c files in
the various directories and output two files. These would be COPY'd
into temp tables during initdb, and then inserted into pg_proc,
pg_conversion, and pg_description using SQL.
Pros:
-eliminates all(?) manual catalog maintenance when adding new conversion procs
Cons:
-likely complex and difficult to debug
-further complicates initdb.c
-requires MSVC development

If we do anything, I'd much rather do #1, but that way is not entirely
without downsides compared to doing nothing. Any thoughts?

[1] https://www.postgresql.org/message-id/b549c8ad-f12e-aad1-9a59-b24cb3e55a17@proxel.se

Re: inconsistency and inefficiency in setup_conversion()

От

Tom Lane

Дата:

28 апреля 2018 г., 22:44:46

John Naylor <jcnaylor@gmail.com> writes:
> Taking a close look at the result of setup_conversion(), wrong or at
> least confusing comments are applied to the functions.

Ugh.  Between that and the large chunk of initdb runtime eaten by
setup_conversion(), that seems like plenty of reason to redo it.

> Solution #1 - As alluded to in [1], turn the conversions into
> pg_proc.dat and pg_conversion.dat entries. Teach genbki.pl to parse
> pg_wchar.h to map conversion names to numbers.
> Pros:
> -likely easy to do
> -allows for the removal of an install target in the Makefile as well
> as ad hoc logic in MSVC
> -uses a format that developers need to use anyway
> Cons:
> -immediately burns up 88 hard-coded OIDs and one for each time a
> conversion proc is created
> -would require editing data in two catalogs every time a conversion
> proc is created

Given the rate at which new conversion procs have been created
historically (ie, next door to zero, after the initial feature addition),
I don't think that second "con" argument has any force.  Eating a batch
of manually-assigned OIDs seems risky mainly just in that it might force
adjustment of pending patches --- but we deal with that all the time.
So I like this answer, I think.

However, there is a "con" you didn't mention that perhaps ought to be
accounted for.  The way things are done now, neither these C functions
nor the pg_conversion entries are "pinned"; it's possible to drop and/or
recreate them.  That perhaps had significant value during development
of the conversions feature, but I'm doubtful that it's worth much
anymore.  Still, it's worth pointing out in case somebody disagrees.

            regards, tom lane

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

02 мая 2018 г., 18:19:51

On 4/28/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <jcnaylor@gmail.com> writes:
>> Solution #1 - As alluded to in [1], turn the conversions into
>> pg_proc.dat and pg_conversion.dat entries. Teach genbki.pl to parse
>> pg_wchar.h to map conversion names to numbers.
>> Pros:
>> -likely easy to do
>> -allows for the removal of an install target in the Makefile as well
>> as ad hoc logic in MSVC
>> -uses a format that developers need to use anyway
>> Cons:
>> -immediately burns up 88 hard-coded OIDs and one for each time a
>> conversion proc is created
>> -would require editing data in two catalogs every time a conversion
>> proc is created
>
> Given the rate at which new conversion procs have been created
> historically (ie, next door to zero, after the initial feature addition),
> I don't think that second "con" argument has any force.  Eating a batch
> of manually-assigned OIDs seems risky mainly just in that it might force
> adjustment of pending patches --- but we deal with that all the time.
> So I like this answer, I think.

Attached is a draft patch to do this, along with the conversion script
used to create the entries. In writing this, a few points came up that
are worth bringing up:

-In the original SQL file the functions were not declared with an
explicit volatility, so by default they are 'volatile'. That seems
wrong for this kind of function, so I changed it to 'immutable'. It
seems the CREATE CONVERSION facility was created shortly after the
volatility classes were created, and I couldn't find any discussion
about it.

-I have not done performance testing of initdb yet. I'll do so at a
later date unless someone is excited enough to beat me to it.

-I piggy-backed on the OID lookup machinery for the encoding lookup,
but haven't changed all the comments that refer only to catalogs and
OIDs.

-With the 88 pg_proc entries with prolang=13 along with the 50 or so
with prolang=14, it might be worth it to create a language lookup.
This patch does not do so, however.

-This actually uses up 220 OIDs (88 + 132), since the conversions need
them for their comments to be loaded.

> However, there is a "con" you didn't mention that perhaps ought to be
> accounted for.  The way things are done now, neither these C functions
> nor the pg_conversion entries are "pinned"; it's possible to drop and/or
> recreate them.  That perhaps had significant value during development
> of the conversions feature, but I'm doubtful that it's worth much
> anymore.  Still, it's worth pointing out in case somebody disagrees.

-For this draft, I let them get pinned, and changed the sanity test to
reflect that. It'd be easy enough to add exceptions to setup_depend(),
though. (one for pg_conversion, and one to change the pg_proc query to
exclude C language functions)

I'll create a commitfest entry soon.

-John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

Thomas Munro

Дата:

17 мая 2018 г., 14:39:50

On Thu, May 3, 2018 at 12:19 AM, John Naylor <jcnaylor@gmail.com> wrote:
> Attached is a draft patch to do this, along with the conversion script
> used to create the entries. In writing this, a few points came up that
> are worth bringing up:

Hi John,

This failed for my patch testing robot on Windows, with this message[1]:

Generating pg_config_paths.h...
No include path; you must specify -I.
Could not open src/backend/catalog/schemapg.h at
src/tools/msvc/Mkvcbuild.pm line 832.

I see that you changed src/backend/catalog/Makefile to pass the new -I
switch to genbki.pl.  I think for Windows you might need to add it to
the line in src/tools/msvc/Solution.pm that invokes genbki.pl via
system()?

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.48

-- 
Thomas Munro
http://www.enterprisedb.com

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

18 мая 2018 г., 16:53:34

On 5/17/18, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Hi John,
>
> This failed for my patch testing robot on Windows, with this message[1]:
...
> I see that you changed src/backend/catalog/Makefile to pass the new -I
> switch to genbki.pl.  I think for Windows you might need to add it to
> the line in src/tools/msvc/Solution.pm that invokes genbki.pl via
> system()?

Yes, you're quite right. Thanks for the report. I've attached an
updated v2 patchset, with some additional revisions:

I wrote:

> -I have not done performance testing of initdb yet. I'll do so at a
> later date unless someone is excited enough to beat me to it.

Tom Lane reported [1] that setup_conversion() took ~70ms out of
1300ms. With this patch, on hardware with a similar total runtime, the
total is ~65ms faster, as one would expect based on the number of new
entries.

> -I piggy-backed on the OID lookup machinery for the encoding lookup,
> but haven't changed all the comments that refer only to catalogs and
> OIDs.

I considered changing the SGML documentation referring to OID
references [2], since this patch invalidates some details, but in the
end I thought that would make the docs harder to follow for the sake
of a small corner case. Instead, I added some comments around the
encoding lookups to alert the reader we're repurposing the OID lookup
machinery. I can revisit this later if desired.

> -With the 88 pg_proc entries with prolang=13 along with the 50 or so
> with prolang=14, it might be worth it to create a language lookup.
> This patch does not do so, however.

In version 2, patch 0001 adds a pg_language lookup. It turns out doing
so has a side benefit of simplifying Gen_fmgr.pl and its Makefile,
too.

--
[1] https://www.postgresql.org/message-id/7408.1525812528%40sss.pgh.pa.us
[2] https://www.postgresql.org/docs/devel/static/system-catalog-initial-data.html#SYSTEM-CATALOG-OID-REFERENCES

-John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

Thomas Munro

Дата:

19 мая 2018 г., 04:34:03

On Fri, May 18, 2018 at 10:53 PM, John Naylor <jcnaylor@gmail.com> wrote:
> On 5/17/18, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> Hi John,
>>
>> This failed for my patch testing robot on Windows, with this message[1]:
> ...
>> I see that you changed src/backend/catalog/Makefile to pass the new -I
>> switch to genbki.pl.  I think for Windows you might need to add it to
>> the line in src/tools/msvc/Solution.pm that invokes genbki.pl via
>> system()?
>
> Yes, you're quite right. Thanks for the report. I've attached an
> updated v2 patchset, with some additional revisions:

Looks good on Windows[1] but now it's broken on Linux[2] (also
reproducible on my macOS laptop):

make[2]: *** No rule to make target
`../../../src/include/pg_proc.dat', needed by `fmgr-stamp'.  Stop.

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.75
[2] https://travis-ci.org/postgresql-cfbot/postgresql/builds/380898356

-- 
Thomas Munro
http://www.enterprisedb.com

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

19 мая 2018 г., 12:41:10

On 5/19/18, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Looks good on Windows[1] but now it's broken on Linux[2] (also
> reproducible on my macOS laptop):

That's what happens when I squeeze in a small Makefile change without
testing the build. Fix attached. Thanks again and sorry for the noise.

(In case I didn't mention previously, the Perl script is for
visibility; reviewers/committers don't need to run it.)

-John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

02 июля 2018 г., 12:59:03

I've attached v4, which is a rebase plus some comment revisions.

-John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

Michael Paquier

Дата:

02 октября 2018 г., 08:56:19

Hi John,

On Mon, Jul 02, 2018 at 01:59:03PM +0700, John Naylor wrote:
> I've attached v4, which is a rebase plus some comment revisions.

v4 does not apply anymore.  I am moving this patch to next commit fest,
waiting on author.
--
Michael

Вложения

signature.asc

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

02 октября 2018 г., 13:06:48

On 10/2/18, Michael Paquier <michael@paquier.xyz> wrote:
> v4 does not apply anymore.  I am moving this patch to next commit fest,
> waiting on author.

v5 attached.

-John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

13 октября 2018 г., 22:42:59

Attached is v6, a simple rebase.

-John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

26 ноября 2018 г., 11:31:08

v7 is a rebase over recent catalog changes.

-John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

Dmitry Dolgov

Дата:

01 декабря 2018 г., 15:26:14

> On Mon, Nov 26, 2018 at 9:31 AM John Naylor <jcnaylor@gmail.com> wrote:
>
> v7 is a rebase over recent catalog changes.

Thanks for working on this,

I see that the author keeps patch updated, but I'm a bit worried because of
lack of full review since probably May. I'm moving it to the next CF, let's see
if there would be more feedback.

P.S. adding Daniel, since he is assigned as a reviewer.

Re: inconsistency and inefficiency in setup_conversion()

От

John Naylor

Дата:

15 декабря 2018 г., 00:28:08

On 12/1/18, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I see that the author keeps patch updated, but I'm a bit worried because of
> lack of full review since probably May. I'm moving it to the next CF, let's
> see
> if there would be more feedback.
>
> P.S. adding Daniel, since he is assigned as a reviewer.

Having heard nothing in a while, I've removed Daniel as a reviewer to
make room for someone else. He is, of course free to re-add himself.
v8 is attached.

Since it's been a few months since last discussion, I'd like to
summarize the purpose of this patch and advocate for its inclusion in
v12:

1. Correctness
In the intro thread [1], I showed that object comments on some
conversions are wrong, and hard to fix given the current setup. This
is a documentation bug of sorts.

2. Clean-up
Currently, utils/mb/conversion_procs/Makefile has an ad-hoc script to
generate the SQL file, which has to be duplicated in the MSVC tooling,
and executed by initdb.c. Storing the conversions in .dat files
removes the need for any of that.

3. Performance
This patch shaves 5-6% off of initdb. Not as much as hoped, but still
a nice bonus.

[1] https://www.postgresql.org/message-id/CAJVSVGWtUqxpfAaxS88vEGvi%2BjKzWZb2EStu5io-UPc4p9rSJg%40mail.gmail.com

--John Naylor

Вложения

Re: inconsistency and inefficiency in setup_conversion()

От

Tom Lane

Дата:

04 января 2019 г., 03:49:17

John Naylor <jcnaylor@gmail.com> writes:
> Since it's been a few months since last discussion, I'd like to
> summarize the purpose of this patch and advocate for its inclusion in
> v12:

Pushed after some review and correction.  Notably, I didn't like
leaving the info about the encoding lookup out of bki.sgml, so
I changed that.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: inconsistency and inefficiency in setup_conversion()

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения