Обсуждение: CustomScan and readfuncs.c

Поиск
Список
Период
Сортировка

CustomScan and readfuncs.c

От
Kouhei Kaigai
Дата:
Hello,

Under the investigation of ParallelAppend, I noticed here is a few
problems in CustomScan, that prevents to reproduce an equivalent
plan node on the background worker from serialized string.

1. CustomScanMethods->TextOutCustomScan callback
------------------------------------------------
This callback allows to output custom information on nodeToString.
Originally, we intend to use this callback for debug only, because
CustomScan must be copyObject() safe, thus, all the private data
also must be stored in custom_exprs or custom_private.
However, it will lead another problem when we try to reproduce
CustomScan node from the string form generated by outfuncs.c.
If TextOutCustomScan prints something, upcoming _readCustomScan
has to deal with unexpected number of tokens in unexpected format.
I'd like to propose to omit this callback prior to v9.5 release,
for least compatibility issues.

2. Reproduce method table on background worker
----------------------------------------------
The method field of CustomPath/Scan/ScanState is expected to be
a reference to a static structure. Thus, copyObject() does not
copy the entire table, but only pointers.
However, we have no way to guarantee the callback functions have
same entrypoint addressed on background workers. So, we may need
an infrastructure to reproduce same CustomScan node with same
callback function tables, probably, identified by name.
We may have a few ways to solve the problem.

* Add system catalog, function returns pointer
The simplest way, like FDW. System catalog has a name and function
to return callback pointers. It also needs SQL statement support,
even a little down side.

* Registered by name, during shared_preload_libraries only
Like an early version of CustomScan interface, it requires custom-
scan providers to register a pair of name and callbacks, but only
when shared_preload_libraries is processed, to guarantee the callbacks
are also registered in the background workers also.
(Is this assumption right on windows?)

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>





Re: CustomScan and readfuncs.c

От
Tom Lane
Дата:
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> Under the investigation of ParallelAppend, I noticed here is a few
> problems in CustomScan, that prevents to reproduce an equivalent
> plan node on the background worker from serialized string.

> 1. CustomScanMethods->TextOutCustomScan callback
> ------------------------------------------------
> This callback allows to output custom information on nodeToString.
> Originally, we intend to use this callback for debug only, because
> CustomScan must be copyObject() safe, thus, all the private data
> also must be stored in custom_exprs or custom_private.
> However, it will lead another problem when we try to reproduce
> CustomScan node from the string form generated by outfuncs.c.
> If TextOutCustomScan prints something, upcoming _readCustomScan
> has to deal with unexpected number of tokens in unexpected format.

Um ... wait a second.  There is no support in readfuncs for any
plan node type, and never has been, and I seriously doubt that there
ever should be.  I do not think it makes sense to ship plans around
in the way you seem to have in mind.  (Also, I don't think the
problems you mention are exactly unique to CustomScan.  There's no
reason to assume that FDW plans could survive this treatment either,
since we do not know what's in the fdw_private stuff; certainly no
one has ever suggested that it should not contain pointers to static
data.)

> I'd like to propose to omit this callback prior to v9.5 release,
> for least compatibility issues.

I regard our commitment to cross-version compatibility for the
custom scan APIs as being essentially zero, for reasons previously
discussed.  So if this goes away in 9.6 it will not matter, but we
might as well leave it in for now for debug support.
        regards, tom lane



Re: CustomScan and readfuncs.c

От
Kouhei Kaigai
Дата:
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> > Under the investigation of ParallelAppend, I noticed here is a few
> > problems in CustomScan, that prevents to reproduce an equivalent
> > plan node on the background worker from serialized string.
>
> > 1. CustomScanMethods->TextOutCustomScan callback
> > ------------------------------------------------
> > This callback allows to output custom information on nodeToString.
> > Originally, we intend to use this callback for debug only, because
> > CustomScan must be copyObject() safe, thus, all the private data
> > also must be stored in custom_exprs or custom_private.
> > However, it will lead another problem when we try to reproduce
> > CustomScan node from the string form generated by outfuncs.c.
> > If TextOutCustomScan prints something, upcoming _readCustomScan
> > has to deal with unexpected number of tokens in unexpected format.
>
> Um ... wait a second.  There is no support in readfuncs for any
> plan node type, and never has been, and I seriously doubt that there
> ever should be.  I do not think it makes sense to ship plans around
> in the way you seem to have in mind.  (Also, I don't think the
> problems you mention are exactly unique to CustomScan.  There's no
> reason to assume that FDW plans could survive this treatment either,
> since we do not know what's in the fdw_private stuff; certainly no
> one has ever suggested that it should not contain pointers to static
> data.)
>
Yep, no Plan node types are supported at this moment, however, will
appear soon by the Funnel + PartialSeqScan nodes.
It serializes a partial plan subtree using nodeToString() then gives
the flatten PlannedStmt to background workers.
I'm now investigating to apply same structure to Append not to kick
child nodes in parallel.
Once various plan node types appear in readfuncs.c, we have to care
about this problem, don't it? I'm working for the patch submission
of ParallelAppend on the next commit-fest, so like to make a consensus
how to treat this matter.

> > I'd like to propose to omit this callback prior to v9.5 release,
> > for least compatibility issues.
>
> I regard our commitment to cross-version compatibility for the
> custom scan APIs as being essentially zero, for reasons previously
> discussed.  So if this goes away in 9.6 it will not matter, but we
> might as well leave it in for now for debug support.
>
I don't argue this point strongly. If TextOutCustomScan shall be
obsoleted on v9.6, it is just kindness for developers not to use
this callback.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: CustomScan and readfuncs.c

От
Kouhei Kaigai
Дата:
> 2. Reproduce method table on background worker
> ----------------------------------------------
> The method field of CustomPath/Scan/ScanState is expected to be
> a reference to a static structure. Thus, copyObject() does not
> copy the entire table, but only pointers.
> However, we have no way to guarantee the callback functions have
> same entrypoint addressed on background workers. So, we may need
> an infrastructure to reproduce same CustomScan node with same
> callback function tables, probably, identified by name.
> We may have a few ways to solve the problem.
>
> * Add system catalog, function returns pointer
> The simplest way, like FDW. System catalog has a name and function
> to return callback pointers. It also needs SQL statement support,
> even a little down side.
>
I tried to design a DDL statement and relevant system catalog as follows.
 #define CustomPlanRelationId    3999  CATALOG(pg_custom_plan,3999) {     NameData    custom_name;     regproc
custom_handler;} FormData_pg_custom_plan; 

This simple catalog saves a pair of name and handler function of custom
plan provider. Like FDW, this handler function returns pointers to the
entrypoint to be called by set_(rel|join)_pathlist_hook and relevant
CustomXXXMethods table.

User can register a custom plan provider using the following statement: CREATE CUSTOM PLAN <name> HANDLER
<function_name>;

And unregister: DROP CUSTOM PLAN <name>;

This enhancement allows background workers to reproduce CustomScan node
that was serialized by nodeToString(), as long as provider is specified
by the name.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Monday, July 27, 2015 8:42 AM
> To: 'Tom Lane'
> Cc: pgsql-hackers@postgresql.org
> Subject: RE: [HACKERS] CustomScan and readfuncs.c
>
> > Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> > > Under the investigation of ParallelAppend, I noticed here is a few
> > > problems in CustomScan, that prevents to reproduce an equivalent
> > > plan node on the background worker from serialized string.
> >
> > > 1. CustomScanMethods->TextOutCustomScan callback
> > > ------------------------------------------------
> > > This callback allows to output custom information on nodeToString.
> > > Originally, we intend to use this callback for debug only, because
> > > CustomScan must be copyObject() safe, thus, all the private data
> > > also must be stored in custom_exprs or custom_private.
> > > However, it will lead another problem when we try to reproduce
> > > CustomScan node from the string form generated by outfuncs.c.
> > > If TextOutCustomScan prints something, upcoming _readCustomScan
> > > has to deal with unexpected number of tokens in unexpected format.
> >
> > Um ... wait a second.  There is no support in readfuncs for any
> > plan node type, and never has been, and I seriously doubt that there
> > ever should be.  I do not think it makes sense to ship plans around
> > in the way you seem to have in mind.  (Also, I don't think the
> > problems you mention are exactly unique to CustomScan.  There's no
> > reason to assume that FDW plans could survive this treatment either,
> > since we do not know what's in the fdw_private stuff; certainly no
> > one has ever suggested that it should not contain pointers to static
> > data.)
> >
> Yep, no Plan node types are supported at this moment, however, will
> appear soon by the Funnel + PartialSeqScan nodes.
> It serializes a partial plan subtree using nodeToString() then gives
> the flatten PlannedStmt to background workers.
> I'm now investigating to apply same structure to Append not to kick
> child nodes in parallel.
> Once various plan node types appear in readfuncs.c, we have to care
> about this problem, don't it? I'm working for the patch submission
> of ParallelAppend on the next commit-fest, so like to make a consensus
> how to treat this matter.
>
> > > I'd like to propose to omit this callback prior to v9.5 release,
> > > for least compatibility issues.
> >
> > I regard our commitment to cross-version compatibility for the
> > custom scan APIs as being essentially zero, for reasons previously
> > discussed.  So if this goes away in 9.6 it will not matter, but we
> > might as well leave it in for now for debug support.
> >
> I don't argue this point strongly. If TextOutCustomScan shall be
> obsoleted on v9.6, it is just kindness for developers not to use
> this callback.
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: CustomScan and readfuncs.c

От
Robert Haas
Дата:
On Sun, Jul 26, 2015 at 11:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Um ... wait a second.  There is no support in readfuncs for any
> plan node type, and never has been, and I seriously doubt that there
> ever should be.

As KaiGai says, the parallel query stuff contemplates changing this;
how else will we get the plan from the master to each worker?  We
could invent a bespoke serialization format, but that seems to have
exactly nothing to recommend it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company