Обсуждение: patch: function xmltable

Поиск
Список
Период
Сортировка

patch: function xmltable

От
Pavel Stehule
Дата:
Hi

I am sending implementation of xmltable function. The code should to have near to final quality and it is available for testing.

I invite any help with documentation and testing.

Regards

Pavel
Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

2016-08-19 10:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

I am sending implementation of xmltable function. The code should to have near to final quality and it is available for testing.

I invite any help with documentation and testing.

new update - the work with nodes is much more correct now.

Regards

Pavel


Regards

Pavel

Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-08-23 21:00 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2016-08-19 10:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

I am sending implementation of xmltable function. The code should to have near to final quality and it is available for testing.

I invite any help with documentation and testing.

new update - the work with nodes is much more correct now.

next update

fix memory leak

Pavel
 

Regards

Pavel


Regards

Pavel


Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

minor update - using DefElem instead own private parser type

Regards

Pavel
Вложения

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 4 September 2016 at 16:06, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> minor update - using DefElem instead own private parser type

I'm really glad that you're doing this and I'll take a look at it for this CF.

It's quite a big patch so I expect this will take a few rounds of
review and updating.


Patch applies cleanly and builds cleanly on master both with and
without --with-xml .

Overall, I think this needs to be revised with appropriate comments.
Whitespace/formatting needs fixing since it's all over the place.
Documentation is insufficient (per notes below).

Re identifier naming, some of this code uses XmlTable naming patterns,
some uses TableExpr prefixes. Is that intended to indicate a bounary
between things re-usable for other structured data ingesting
functions? Do you expect a "JSONEXPR" or similar in future? That's
alluded to by

+/*----------
+ * TableExpr - used for XMLTABLE function
+ *
+ * This can be used for json_table, jsonb_table functions in future
+ *----------
+ */
+typedef struct TableExpr
+{
...

If so, should this really be two patches, one to add the table
expression infrastructure and another to add XMLTABLE that uses it?
Also, why in that case does so much of the TableExpr code call
directly into XmlTable code? It doesn't look very generic.

Overall I find identifier naming to be a bit inconsisent and think
it's necessary to make it clear that all the "TableExpr" stuff is for
XMLTABLE specifically, if that's the case, or make the delineation
clearer if not.

I'd also like to see tests that exercise the ruleutils get_rule_expr
parts of the code for the various XMLTABLE variants.

Similarly, since this seems to add a new xpath parser, that needs
comprehensive tests. Maybe re-purpose an existing xpath test data set?




More detailed comments:
====

Docs comments:
     The <function>xmltable</function> produces [a] table based on
[the] passed XML value.

The docs are pretty minimal and don't explain the various clauses of
XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is
there a good cross reference link available? The PASSING clause? etc.

How does XMLTABLE decide what to iterate over, and how to iterate over it?

Presumably the FOR ORDINALITY clause makes a column emit a numeric counter.

What standard, if any, does this conform to? Does it resemble
implementations elsewhere? What limitations or unsupported features
does it have relative to those standards?



execEvalTableExpr seems to be defined twice, with a difference in
case. This is probably not going to fly:


+static Datum
+execEvalTableExpr(TableExprState *tstate,
+                        ExprContext *econtext,
+                        bool *isNull, ExprDoneCond *isDone)
+{

+static Datum
+ExecEvalTableExpr(TableExprState *tstate,
+                        ExprContext *econtext,
+                        bool *isNull, ExprDoneCond *isDone)
+{


It looks like you've split the function into a "guts" and "wrapper"
part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
That seems reasonable for readability, but the naming isn't.

A comment is needed to explain what ExecEvalTableExpr is / does. If
it's XMLTABLE specific (which it looks like based on the code), its
name should reflect that. This pattern is repeated elsewhere; e.g.
TableExprState is really the state for an XMLTABLE expression. But
PostgreSQL actually has TABLE statements, and in future we might want
to support table-expressions, so I don't think this naming is
appropriate. This is made worse by the lack of comments on things like
the definition of TableExprState. Please use something that makes it
clear it's for XMLTABLE and add appropriate comments.

Formatting of variables, arguments, function signatures etc is
random/haphazard and doesn't follow project convention. It's neither
aligned or unaligned in the normal way, I don't understand the random
spacing at all. Maybe you should try to run pgindent and then extract
just the changes related to your patch? Or run your IDE/editor's
indent function on your changes? Right now it's actually kind of hard
to read. Do you edit with tabstop set to 1 normally or something like
that?

There's a general lack of comments throughout the added code.

In execEvalTableExpr, why are we looping over namespaces? What's that
for? Comment would be nice.

Typo: Path caclulation => Path calculation

What does XmlTableSetRowPath() do? It seems to copy its argument.
Nothing further is done with the row_path argument after it's called
by execEvalTableExpr, so what context is that memory in and do we have
to worry about it if it's large?

execEvalTableExpr says it's doing "path calculation". What it actually
appears to do is evaluate the path expressions, if provided, and
otherwise use the column name as the implied path expression. (The
docs should mention that).

It's wasn't immediately obvious to me what the branch around
tstate->for_ordinality_col   is for and what the alternate path's
purpose is in terms of XMLTABLE's behaviour, until I read the parser
definition. That's largely because the behaviour of XMLTABLE is
underspecified in the docs, since once you know ORDINALITY columns
exist it's pretty obvious what it's doing.

Similarly, for the alternate branch   tstate->ncols   , the
XmlTableGetRowValue call there is meant to do what exactly, and
why/under what conditions? Is it for situations where the field type
is a whole-row value? a composite type? (I'm deliberately not studying
this too deeply, these are points I'd like to see commented so it can
be understood to some reasonable degree at a skim-read).

               /* result is one more columns every time */
"one or more"


               /* when typmod is not valid, refresh it */               if (te->typmod == -1)


Is this a cache? How is it valid or not valid and when? The comment
(thanks!) on TableExprGetTupleDesc says:

/** When we skip transform stage (in view), then TableExpr's* TupleDesc should not be valid. Refresh is necessary.*/

but I'm not really grasping what you're trying to explain here. What
transform stage? What view? This could well be my ignorance of this
part of the code; if it should be understandable by a reader who is
appropriately familiar with the executor that's fine, but if it's
specific to how XMLTABLE works some more explanation would be good.

Good that you've got all the required node copy/in/out funcs in place.

Please don't use the name "used_dns". Anyone reading that will read it
as "domain name service" and that's actually confusing with XML
because of XML schema lookups. Maybe used_defnamespace ?   used
def_ns?

I haven't looked closely at keyword/parser changes yet, but it doesn't
look like you added any reserved keywords, which is good. It does add
unreserved keywords PATH and COLUMNS ; I'm not sure what policy for
unreserved keywords is or the significance of that.

New ereport() calls specify ERRCODEs, which is good.

PostgreSQL already has XPATH support in the form of xmlexists(...)
etc. Why is getXPathToken() etc needed? What re-use is possible here?
There's no explanation in the patch header or comments. Should the new
xpath parser be re-used by the existing xpath stuff? Why can't we use
libxml's facilities? etc. This at least needs explaining in the
submission, and some kind of hint as to why we have two different ways
to do it is needed in the code. If we do need a new XML parser, should
it be bundled in adt/xml.c along with a lot of user-facing
functionality, or a separate file?


How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
this? It doesn't look like they're intended to be called directly by
the user, and they're not documented (or commented).

I don't understand this at all:



+/*
+ * There are different requests from XMLTABLE, JSON_TABLE functions
+ * on passed data than has CREATE TABLE command. It is reason for
+ * introduction special structure instead using ColumnDef.
+ */
+typedef struct TableExprRawCol
+{
+    NodeTag     type;
+    char       *colname;
+    TypeName   *typeName;
+    bool        for_ordinality;
+    bool        is_not_null;
+    Node       *path_expr;
+    Node       *default_expr;
+    int         location;
+} TableExprRawCol;



That's my first-pass commentary. I'll return to this once you've had a
chance to take a look at these and tell me all the places I got it
wrong ;)



-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

2016-09-06 6:54 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 4 September 2016 at 16:06, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> minor update - using DefElem instead own private parser type

I'm really glad that you're doing this and I'll take a look at it for this CF.

It's quite a big patch so I expect this will take a few rounds of
review and updating.

Thank you for review
 


Patch applies cleanly and builds cleanly on master both with and
without --with-xml .

Overall, I think this needs to be revised with appropriate comments.
Whitespace/formatting needs fixing since it's all over the place.
Documentation is insufficient (per notes below).

I am not able to write documentation in English language :( - This function is pretty complex - so I hope so anybody with better language skills can help with this. It respects standard and it respects little bit different Oracle's behave too (different order of DEFAULT and PATH parts). 
 

Re identifier naming, some of this code uses XmlTable naming patterns,
some uses TableExpr prefixes. Is that intended to indicate a bounary
between things re-usable for other structured data ingesting
functions? Do you expect a "JSONEXPR" or similar in future? That's
alluded to by

This structure should be reused by JSON_TABLE function. Now, it is little bit strange, because there is only XMLTABLE implementation - and I have to choose between a) using two different names now, b) renaming some part in future.

And although XMLTABLE and JSON_TABLE functions are pretty similar - share 90% of data (input value, path, columns definitions), these functions has different syntax - so only middle level code should be shared.
 

+/*----------
+ * TableExpr - used for XMLTABLE function
+ *
+ * This can be used for json_table, jsonb_table functions in future
+ *----------
+ */
+typedef struct TableExpr
+{
...

If so, should this really be two patches, one to add the table
expression infrastructure and another to add XMLTABLE that uses it?
Also, why in that case does so much of the TableExpr code call
directly into XmlTable code? It doesn't look very generic.

Currently the common part is not too big - just the Node related part - I am not sure about necessity of two patches. I am agree, there is missing some TableExpBuilder, where can be better isolated the XML part.  
 

Overall I find identifier naming to be a bit inconsisent and think
it's necessary to make it clear that all the "TableExpr" stuff is for
XMLTABLE specifically, if that's the case, or make the delineation
clearer if not.

I'd also like to see tests that exercise the ruleutils get_rule_expr
parts of the code for the various XMLTABLE variants.

Similarly, since this seems to add a new xpath parser, that needs
comprehensive tests. Maybe re-purpose an existing xpath test data set?


sure
 



More detailed comments:
====

Docs comments:

      The <function>xmltable</function> produces [a] table based on
[the] passed XML value.

The docs are pretty minimal and don't explain the various clauses of
XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is
there a good cross reference link available? The PASSING clause? etc.

How does XMLTABLE decide what to iterate over, and how to iterate over it?

Presumably the FOR ORDINALITY clause makes a column emit a numeric counter.

What standard, if any, does this conform to? Does it resemble
implementations elsewhere? What limitations or unsupported features
does it have relative to those standards?



execEvalTableExpr seems to be defined twice, with a difference in
case. This is probably not going to fly:


+static Datum
+execEvalTableExpr(TableExprState *tstate,
+                        ExprContext *econtext,
+                        bool *isNull, ExprDoneCond *isDone)
+{

+static Datum
+ExecEvalTableExpr(TableExprState *tstate,
+                        ExprContext *econtext,
+                        bool *isNull, ExprDoneCond *isDone)
+{


It looks like you've split the function into a "guts" and "wrapper"
part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
That seems reasonable for readability, but the naming isn't.

I invite any idea how these functions should be named.
 

A comment is needed to explain what ExecEvalTableExpr is / does. If
it's XMLTABLE specific (which it looks like based on the code), its
name should reflect that. This pattern is repeated elsewhere; e.g.
TableExprState is really the state for an XMLTABLE expression. But
PostgreSQL actually has TABLE statements, and in future we might want
to support table-expressions, so I don't think this naming is
appropriate. This is made worse by the lack of comments on things like
the definition of TableExprState. Please use something that makes it
clear it's for XMLTABLE and add appropriate comments.

I understand, so using TableExpr can be strange (for XMLTABLE function). But when we will have JSON_TABLE function, then it will have a sense.

"TableExprState" is consistent with "TableExpr".

Any idea how it should be changed?

 

Formatting of variables, arguments, function signatures etc is
random/haphazard and doesn't follow project convention. It's neither
aligned or unaligned in the normal way, I don't understand the random
spacing at all. Maybe you should try to run pgindent and then extract
just the changes related to your patch? Or run your IDE/editor's
indent function on your changes? Right now it's actually kind of hard
to read. Do you edit with tabstop set to 1 normally or something like
that?

There's a general lack of comments throughout the added code.

In execEvalTableExpr, why are we looping over namespaces? What's that
for? Comment would be nice.

Typo: Path caclulation => Path calculation

What does XmlTableSetRowPath() do? It seems to copy its argument.
Nothing further is done with the row_path argument after it's called
by execEvalTableExpr, so what context is that memory in and do we have
to worry about it if it's large?

execEvalTableExpr says it's doing "path calculation". What it actually
appears to do is evaluate the path expressions, if provided, and
otherwise use the column name as the implied path expression. (The
docs should mention that).

It's wasn't immediately obvious to me what the branch around
tstate->for_ordinality_col   is for and what the alternate path's
purpose is in terms of XMLTABLE's behaviour, until I read the parser
definition. That's largely because the behaviour of XMLTABLE is
underspecified in the docs, since once you know ORDINALITY columns
exist it's pretty obvious what it's doing.

Similarly, for the alternate branch   tstate->ncols   , the
XmlTableGetRowValue call there is meant to do what exactly, and
why/under what conditions? Is it for situations where the field type
is a whole-row value? a composite type? (I'm deliberately not studying
this too deeply, these are points I'd like to see commented so it can
be understood to some reasonable degree at a skim-read).


                /* result is one more columns every time */
"one or more"



                /* when typmod is not valid, refresh it */
                if (te->typmod == -1)


Is this a cache? How is it valid or not valid and when? The comment
(thanks!) on TableExprGetTupleDesc says:

/*
 * When we skip transform stage (in view), then TableExpr's
 * TupleDesc should not be valid. Refresh is necessary.
 */

but I'm not really grasping what you're trying to explain here. What
transform stage? What view? This could well be my ignorance of this
part of the code; if it should be understandable by a reader who is
appropriately familiar with the executor that's fine, but if it's
specific to how XMLTABLE works some more explanation would be good.

This is most difficult part of this patch, and I am not sure it it is fully correctly implemented. I use TupleDesc cache. The TupleDesc is created in parser/transform stage. When the XMLTABLE is used in some view, then the transformed parser tree is materialized - and when the view is used in query, then this tree is loaded and the parser/transform stage is "skipped". I'll check this code against implementation of ROW constructor and I'll try to do more comments there.

 

Good that you've got all the required node copy/in/out funcs in place.

Please don't use the name "used_dns". Anyone reading that will read it
as "domain name service" and that's actually confusing with XML
because of XML schema lookups. Maybe used_defnamespace ?   used
def_ns?

good idea
 

I haven't looked closely at keyword/parser changes yet, but it doesn't
look like you added any reserved keywords, which is good. It does add
unreserved keywords PATH and COLUMNS ; I'm not sure what policy for
unreserved keywords is or the significance of that.

New ereport() calls specify ERRCODEs, which is good.

PostgreSQL already has XPATH support in the form of xmlexists(...)
etc. Why is getXPathToken() etc needed? What re-use is possible here?
There's no explanation in the patch header or comments. Should the new
xpath parser be re-used by the existing xpath stuff? Why can't we use
libxml's facilities? etc. This at least needs explaining in the
submission, and some kind of hint as to why we have two different ways
to do it is needed in the code. If we do need a new XML parser, should
it be bundled in adt/xml.c along with a lot of user-facing
functionality, or a separate file?


libxml2 and our XPATH function doesn't support default namespace ( http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I implemented. This is the mayor issue of libxml2 library. Another difference between XPATH function and XMLTABLE function is using two phase searching and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using two XPATH expressions - for row data cutting and next for column data cutting (from row data). The our XPATH functions is pretty simple mapped to libxml2 XPATH API. But it is not possible with XMLTABLE function - due design of this function in standard (it is more user friendly and doesn't require exactly correct xpath expressions).

I didn't find any API in libxml2 for a work with parsed xpath expressions - I need some info about the first and last token of xpath expression - it is base for decision about using prefix or suffix.

This functionality (xpath expression parser) cannot be used for our XPATH function now - maybe default namespace in future.
 

How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
this? It doesn't look like they're intended to be called directly by
the user, and they're not documented (or commented).

Probably I used wrong names. XMLTABLE function is running in two different modes - with explicitly defined columns (XmlTableGetValue is used), and without explicitly defined columns - so result is one XML column and only one one step searching is used (there are not column related xpath expressions) ( XmlTableGetRowValue is used). The function XmlTableGetValue is used for getting one column value, the function XmlTableGetRowValue is used for getting one value too, but in special case, when there are not any other value.

I don't understand this at all:



+/*
+ * There are different requests from XMLTABLE, JSON_TABLE functions
+ * on passed data than has CREATE TABLE command. It is reason for
+ * introduction special structure instead using ColumnDef.
+ */
+typedef struct TableExprRawCol
+{
+    NodeTag     type;
+    char       *colname;
+    TypeName   *typeName;
+    bool        for_ordinality;
+    bool        is_not_null;
+    Node       *path_expr;
+    Node       *default_expr;
+    int         location;
+} TableExprRawCol;



I am sorry. It is my fault. Now we have very similar node ColumnDef. This node is designed for usage in utility commands - and it is not designed for usage inside a query. I had to decide between enhancing ColumnDef node or introduction new special node. Because there are more special attributes and it is hard to serialize current ColumnDef, I decided to use new node.
 

That's my first-pass commentary. I'll return to this once you've had a
chance to take a look at these and tell me all the places I got it
wrong ;)


Thank for this

Regard

Pavel


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:






libxml2 and our XPATH function doesn't support default namespace ( http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I implemented. This is the mayor issue of libxml2 library. Another difference between XPATH function and XMLTABLE function is using two phase searching and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using two XPATH expressions - for row data cutting and next for column data cutting (from row data). The our XPATH functions is pretty simple mapped to libxml2 XPATH API. But it is not possible with XMLTABLE function - due design of this function in standard (it is more user friendly and doesn't require exactly correct xpath expressions).

libxm2 doesn't support xpath 2.0  where default namespace was introduced.

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 7 September 2016 at 04:13, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Overall, I think this needs to be revised with appropriate comments.
>> Whitespace/formatting needs fixing since it's all over the place.
>> Documentation is insufficient (per notes below).
>
>
> I am not able to write documentation in English language :( - This function
> is pretty complex - so I hope so anybody with better language skills can
> help with this. It respects standard and it respects little bit different
> Oracle's behave too (different order of DEFAULT and PATH parts).

OK, no problem. It can't be committed without more comprehensive docs
though, especially for new and nontrivial functionality.

Is there some reference material you can point to so someone else can
help with docs? And can you describe what differences there are
between your implementation and the reference?

Alternately, if you document it in Czech, do you know of anyone who
could assist in translating to English for the main documentation?

>> Re identifier naming, some of this code uses XmlTable naming patterns,
>> some uses TableExpr prefixes. Is that intended to indicate a bounary
>> between things re-usable for other structured data ingesting
>> functions? Do you expect a "JSONEXPR" or similar in future? That's
>> alluded to by
>
>
> This structure should be reused by JSON_TABLE function. Now, it is little
> bit strange, because there is only XMLTABLE implementation - and I have to
> choose between a) using two different names now, b) renaming some part in
> future.

OK. Are you planning on writing this JSON_TABLE or are you leaving
room for future growth? Either way is fine, just curious.

> And although XMLTABLE and JSON_TABLE functions are pretty similar - share
> 90% of data (input value, path, columns definitions), these functions has
> different syntax - so only middle level code should be shared.

That makes sense.

I think it would be best if you separated out the TableExpr
infrastructure from the XMLTABLE implementation though, so we can
review the first level infrastrcture separately and make this a
2-patch series. Most importantly, doing it that way will help you find
places where TableExpr code calls directly into XMLTABLE code. If
TableExpr is supposed to be reusable for json etc, it probably
shouldn't be calling XmlTable stuff directly.

That also means somewhat smaller simpler patches, which probably isn't bad.

I don't necessarily think this needs to be fully pluggable with
callbacks etc. It doesn't sound like you expect this to be used by
extensions or to have a lot of users, right? So it probably just needs
clearer separation of the infrastructure layer from the xmltable
layer. I think splitting the patch will make that easier to see and
make it easier to find problems.

My biggest complaint at the moment is that execEvalTableExpr calls
initXmlTableContext(...) directly, is aware of XML namespaces
directly, calls XmlTableSetRowPath() directly, calls
XmlTableFetchRow() directly, etc. It is in no way generic/reusable for
some later JSONTABLE feature. That needs to be fixed by:

* Renaming it so it's clearly only for XMLTABLE; or
* Abstracting the init context, set row path, fetch row etc operations
so json ones can be plugged in later



> Currently the common part is not too big - just the Node related part - I am
> not sure about necessity of two patches.

The problem is that the common part is all mixed in with the
XMLTABLE-specific part, so it's not at all clear it can be common with
something else.

> I am agree, there is missing some
> TableExpBuilder, where can be better isolated the XML part.

Yeah, that's sort of what I'm getting at.


>> execEvalTableExpr seems to be defined twice, with a difference in
>> case. This is probably not going to fly:
>>
>>
>> +static Datum
>> +execEvalTableExpr(TableExprState *tstate,
>> +                        ExprContext *econtext,
>> +                        bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>> +static Datum
>> +ExecEvalTableExpr(TableExprState *tstate,
>> +                        ExprContext *econtext,
>> +                        bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>>
>> It looks like you've split the function into a "guts" and "wrapper"
>> part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
>> That seems reasonable for readability, but the naming isn't.
>
>
> I invite any idea how these functions should be named.

Definitely not how they are ;) . They really can't differ in a single
character's case.

I'm not sure if PostgreSQL has any formal convention for this. Some
places use _impl e.g.  pg_read_barrier_impl() but that's in the
context of an interface-vs-implementation separation, which isn't the
case here.

Some places use _internal, like AlterObjectRename_internal(...), but
that's where there's an associated public/external part, which isn't
the case here.

Some places use _guts e.g. pg_logical_slot_get_changes_guts(...),
largely where there's common use by several callers.

This is a fairly arbitrary function split for readability/length. Is
it actually useful to split this function up at all?

Anyone else have an opinion?

>> A comment is needed to explain what ExecEvalTableExpr is / does. If
>> it's XMLTABLE specific (which it looks like based on the code), its
>> name should reflect that. This pattern is repeated elsewhere; e.g.
>> TableExprState is really the state for an XMLTABLE expression. But
>> PostgreSQL actually has TABLE statements, and in future we might want
>> to support table-expressions, so I don't think this naming is
>> appropriate. This is made worse by the lack of comments on things like
>> the definition of TableExprState. Please use something that makes it
>> clear it's for XMLTABLE and add appropriate comments.
>
>
> I understand, so using TableExpr can be strange (for XMLTABLE function). But
> when we will have JSON_TABLE function, then it will have a sense.

It's pretty hard to review that as shared infrastructure when it's
still tangled up in xmltable specifics, though.

> "TableExprState" is consistent with "TableExpr".
>
> Any idea how it should be changed?

I think if you want it to be shareable infrasructure, you need to
write it so it can be used as shared infrastructure. Not just name it
that way but then make it XMLTABLE specific in actual functionality.


>>                 /* when typmod is not valid, refresh it */
>>                 if (te->typmod == -1)
>>
>>
>> Is this a cache? How is it valid or not valid and when? The comment
>> (thanks!) on TableExprGetTupleDesc says:
>>
>> /*
>>  * When we skip transform stage (in view), then TableExpr's
>>  * TupleDesc should not be valid. Refresh is necessary.
>>  */
>>
>> but I'm not really grasping what you're trying to explain here. What
>> transform stage? What view? This could well be my ignorance of this
>> part of the code; if it should be understandable by a reader who is
>> appropriately familiar with the executor that's fine, but if it's
>> specific to how XMLTABLE works some more explanation would be good.
>
>
> This is most difficult part of this patch, and I am not sure it it is fully
> correctly implemented. I use TupleDesc cache. The TupleDesc is created in
> parser/transform stage. When the XMLTABLE is used in some view, then the
> transformed parser tree is materialized - and when the view is used in
> query, then this tree is loaded and the parser/transform stage is "skipped".
> I'll check this code against implementation of ROW constructor and I'll try
> to do more comments there.

Thanks. It would be good to highlight when this does and does not
happen and why. Why is it necessary at all?

What happens if XMLTABLE is used in a query directly, not part of a
view? if XMLTABLE is used in a view? If XMLTABLE is used in a prepared
statement / plpgsql statement / etc? What about a CTE term?

Not necessarily list all these cases one by one, just explain what
happens, when and why. Especially if it's complex, so other readers
can understand it and don't have to study it in detail to understand
what is going on. It does not need to be good public-facing
documentation, and details of wording, grammar etc can be fixed up
later, it's the ideas that matter.

Is this similar to other logic elsewhere? If so, reference that other
logic so readers know where to look. That way if they're
changing/bugfixing/etc one place they know there's another place that
might need changing.

I don't know this area of the code well enough to give a solid review
of the actual functionality, and I don't yet understand what it's
trying to do so it's hard to review it by studying what it actually
does vs what it claims to do. Maybe Peter E can help, he said he was
thinking of looking at this patch too. But more information on what
it's trying to do would be a big help.

>> PostgreSQL already has XPATH support in the form of xmlexists(...)
>> etc. Why is getXPathToken() etc needed? What re-use is possible here?
>> There's no explanation in the patch header or comments. Should the new
>> xpath parser be re-used by the existing xpath stuff? Why can't we use
>> libxml's facilities? etc. This at least needs explaining in the
>> submission, and some kind of hint as to why we have two different ways
>> to do it is needed in the code. If we do need a new XML parser, should
>> it be bundled in adt/xml.c along with a lot of user-facing
>> functionality, or a separate file?
>
> libxml2 and our XPATH function doesn't support default namespace (
> http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I
> implemented.

OK, that makes sense.

For the purpose of getting this patch in, is it a _necessary_ feature?
Can XMLTABLE be usefully implemented without it, and if so, can it be
added in a subsequent patch? It would be nice to simplify this by
using existing libxml2 functionality in the first version rather than
adding a whole new xpath as well!

> This is the mayor issue of libxml2 library. Another difference
> between XPATH function and XMLTABLE function is using two phase searching
> and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using
> two XPATH expressions - for row data cutting and next for column data
> cutting (from row data). The our XPATH functions is pretty simple mapped to
> libxml2 XPATH API. But it is not possible with XMLTABLE function - due
> design of this function in standard (it is more user friendly and doesn't
> require exactly correct xpath expressions).

So you can't use existing libxml2 xpath support to implement XMLTABLE,
even without default namespaces?

> I didn't find any API in libxml2 for a work with parsed xpath expressions -
> I need some info about the first and last token of xpath expression - it is
> base for decision about using prefix or suffix.
>
> This functionality (xpath expression parser) cannot be used for our XPATH
> function now - maybe default namespace in future.

So we'll have two different XPATH implementations for different
places, with different limitations, different possible bugs, etc?

What would be needed to make the new XPATH work for our built-in xpath
functions too?

>> How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
>> this? It doesn't look like they're intended to be called directly by
>> the user, and they're not documented (or commented).
>
>
> Probably I used wrong names. XMLTABLE function is running in two different
> modes - with explicitly defined columns (XmlTableGetValue is used), and
> without explicitly defined columns - so result is one XML column and only
> one one step searching is used (there are not column related xpath
> expressions) ( XmlTableGetRowValue is used). The function XmlTableGetValue
> is used for getting one column value, the function XmlTableGetRowValue is
> used for getting one value too, but in special case, when there are not any
> other value.

So both are internal implementation of the parser-level XMLTABLE(...)
construct and are not intended to be called directly by users - right?

Comments please! A short comment on the function saying this would be
a big help.

Regarding naming, do we already have a convention for functions that
are internal implementation of something the user "spells"
differently? Where it's transformed by the parser? I couldn't find
one. I don't much care about the names so long as there are comments
explaining what calls the functions and what the user-facing interface
that matches the function is.

Is it safe for users to call these directly? What happens if they do
so incorrectly?

Why are they not in pg_proc.h? Do they need to be?

>> +/*
>> + * There are different requests from XMLTABLE, JSON_TABLE functions
>> + * on passed data than has CREATE TABLE command. It is reason for
>> + * introduction special structure instead using ColumnDef.
>> + */
>> +typedef struct TableExprRawCol
>> +{
>> +    NodeTag     type;
>> +    char       *colname;
>> +    TypeName   *typeName;
>> +    bool        for_ordinality;
>> +    bool        is_not_null;
>> +    Node       *path_expr;
>> +    Node       *default_expr;
>> +    int         location;
>> +} TableExprRawCol;
>
> I am sorry. It is my fault. Now we have very similar node ColumnDef. This
> node is designed for usage in utility commands - and it is not designed for
> usage inside a query.

Makes sense.

> I had to decide between enhancing ColumnDef node or
> introduction new special node. Because there are more special attributes and
> it is hard to serialize current ColumnDef, I decided to use new node.

Seems reasonable. The summary is "this is the parse node for a column
of an XMLTABLE expression".

Suggested comment:

/** This is the parsenode for a column definition in a table-expression
like XMLTABLE.** We can't re-use ColumnDef here; the utility command column
definition has all the* wrong attributes for use in table-expressions and just doesn't make
sense here.*/
typedef struct TableExprColumn
{
...
};

?

Why "RawCol" ? What does it become when it's not "raw" anymore? Is
that a reference to ColumnDef's raw_default and cooked_default for
untransformed vs transformed parse-trees?




-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-07 5:03 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 7 September 2016 at 04:13, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Overall, I think this needs to be revised with appropriate comments.
>> Whitespace/formatting needs fixing since it's all over the place.
>> Documentation is insufficient (per notes below).
>
>
> I am not able to write documentation in English language :( - This function
> is pretty complex - so I hope so anybody with better language skills can
> help with this. It respects standard and it respects little bit different
> Oracle's behave too (different order of DEFAULT and PATH parts).

OK, no problem. It can't be committed without more comprehensive docs
though, especially for new and nontrivial functionality.

Is there some reference material you can point to so someone else can
help with docs? And can you describe what differences there are
between your implementation and the reference?

Alternately, if you document it in Czech, do you know of anyone who
could assist in translating to English for the main documentation?

>> Re identifier naming, some of this code uses XmlTable naming patterns,
>> some uses TableExpr prefixes. Is that intended to indicate a bounary
>> between things re-usable for other structured data ingesting
>> functions? Do you expect a "JSONEXPR" or similar in future? That's
>> alluded to by
>
>
> This structure should be reused by JSON_TABLE function. Now, it is little
> bit strange, because there is only XMLTABLE implementation - and I have to
> choose between a) using two different names now, b) renaming some part in
> future.

OK. Are you planning on writing this JSON_TABLE or are you leaving
room for future growth? Either way is fine, just curious.

> And although XMLTABLE and JSON_TABLE functions are pretty similar - share
> 90% of data (input value, path, columns definitions), these functions has
> different syntax - so only middle level code should be shared.

That makes sense.

I think it would be best if you separated out the TableExpr
infrastructure from the XMLTABLE implementation though, so we can
review the first level infrastrcture separately and make this a
2-patch series. Most importantly, doing it that way will help you find
places where TableExpr code calls directly into XMLTABLE code. If
TableExpr is supposed to be reusable for json etc, it probably
shouldn't be calling XmlTable stuff directly.

That also means somewhat smaller simpler patches, which probably isn't bad.

I don't necessarily think this needs to be fully pluggable with
callbacks etc. It doesn't sound like you expect this to be used by
extensions or to have a lot of users, right? So it probably just needs
clearer separation of the infrastructure layer from the xmltable
layer. I think splitting the patch will make that easier to see and
make it easier to find problems.

My biggest complaint at the moment is that execEvalTableExpr calls
initXmlTableContext(...) directly, is aware of XML namespaces
directly, calls XmlTableSetRowPath() directly, calls
XmlTableFetchRow() directly, etc. It is in no way generic/reusable for
some later JSONTABLE feature. That needs to be fixed by:

* Renaming it so it's clearly only for XMLTABLE; or
* Abstracting the init context, set row path, fetch row etc operations
so json ones can be plugged in later



> Currently the common part is not too big - just the Node related part - I am
> not sure about necessity of two patches.

The problem is that the common part is all mixed in with the
XMLTABLE-specific part, so it's not at all clear it can be common with
something else.

> I am agree, there is missing some
> TableExpBuilder, where can be better isolated the XML part.

Yeah, that's sort of what I'm getting at.


>> execEvalTableExpr seems to be defined twice, with a difference in
>> case. This is probably not going to fly:
>>
>>
>> +static Datum
>> +execEvalTableExpr(TableExprState *tstate,
>> +                        ExprContext *econtext,
>> +                        bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>> +static Datum
>> +ExecEvalTableExpr(TableExprState *tstate,
>> +                        ExprContext *econtext,
>> +                        bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>>
>> It looks like you've split the function into a "guts" and "wrapper"
>> part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
>> That seems reasonable for readability, but the naming isn't.
>
>
> I invite any idea how these functions should be named.

Definitely not how they are ;) . They really can't differ in a single
character's case.

I'm not sure if PostgreSQL has any formal convention for this. Some
places use _impl e.g.  pg_read_barrier_impl() but that's in the
context of an interface-vs-implementation separation, which isn't the
case here.

Some places use _internal, like AlterObjectRename_internal(...), but
that's where there's an associated public/external part, which isn't
the case here.

Some places use _guts e.g. pg_logical_slot_get_changes_guts(...),
largely where there's common use by several callers.

This is a fairly arbitrary function split for readability/length. Is
it actually useful to split this function up at all?

Anyone else have an opinion?

>> A comment is needed to explain what ExecEvalTableExpr is / does. If
>> it's XMLTABLE specific (which it looks like based on the code), its
>> name should reflect that. This pattern is repeated elsewhere; e.g.
>> TableExprState is really the state for an XMLTABLE expression. But
>> PostgreSQL actually has TABLE statements, and in future we might want
>> to support table-expressions, so I don't think this naming is
>> appropriate. This is made worse by the lack of comments on things like
>> the definition of TableExprState. Please use something that makes it
>> clear it's for XMLTABLE and add appropriate comments.
>
>
> I understand, so using TableExpr can be strange (for XMLTABLE function). But
> when we will have JSON_TABLE function, then it will have a sense.

It's pretty hard to review that as shared infrastructure when it's
still tangled up in xmltable specifics, though.

> "TableExprState" is consistent with "TableExpr".
>
> Any idea how it should be changed?

I think if you want it to be shareable infrasructure, you need to
write it so it can be used as shared infrastructure. Not just name it
that way but then make it XMLTABLE specific in actual functionality.


>>                 /* when typmod is not valid, refresh it */
>>                 if (te->typmod == -1)
>>
>>
>> Is this a cache? How is it valid or not valid and when? The comment
>> (thanks!) on TableExprGetTupleDesc says:
>>
>> /*
>>  * When we skip transform stage (in view), then TableExpr's
>>  * TupleDesc should not be valid. Refresh is necessary.
>>  */
>>
>> but I'm not really grasping what you're trying to explain here. What
>> transform stage? What view? This could well be my ignorance of this
>> part of the code; if it should be understandable by a reader who is
>> appropriately familiar with the executor that's fine, but if it's
>> specific to how XMLTABLE works some more explanation would be good.
>
>
> This is most difficult part of this patch, and I am not sure it it is fully
> correctly implemented. I use TupleDesc cache. The TupleDesc is created in
> parser/transform stage. When the XMLTABLE is used in some view, then the
> transformed parser tree is materialized - and when the view is used in
> query, then this tree is loaded and the parser/transform stage is "skipped".
> I'll check this code against implementation of ROW constructor and I'll try
> to do more comments there.

Thanks. It would be good to highlight when this does and does not
happen and why. Why is it necessary at all?

What happens if XMLTABLE is used in a query directly, not part of a
view? if XMLTABLE is used in a view? If XMLTABLE is used in a prepared
statement / plpgsql statement / etc? What about a CTE term?

Not necessarily list all these cases one by one, just explain what
happens, when and why. Especially if it's complex, so other readers
can understand it and don't have to study it in detail to understand
what is going on. It does not need to be good public-facing
documentation, and details of wording, grammar etc can be fixed up
later, it's the ideas that matter.

Is this similar to other logic elsewhere? If so, reference that other
logic so readers know where to look. That way if they're
changing/bugfixing/etc one place they know there's another place that
might need changing.

I don't know this area of the code well enough to give a solid review
of the actual functionality, and I don't yet understand what it's
trying to do so it's hard to review it by studying what it actually
does vs what it claims to do. Maybe Peter E can help, he said he was
thinking of looking at this patch too. But more information on what
it's trying to do would be a big help.

>> PostgreSQL already has XPATH support in the form of xmlexists(...)
>> etc. Why is getXPathToken() etc needed? What re-use is possible here?
>> There's no explanation in the patch header or comments. Should the new
>> xpath parser be re-used by the existing xpath stuff? Why can't we use
>> libxml's facilities? etc. This at least needs explaining in the
>> submission, and some kind of hint as to why we have two different ways
>> to do it is needed in the code. If we do need a new XML parser, should
>> it be bundled in adt/xml.c along with a lot of user-facing
>> functionality, or a separate file?
>
> libxml2 and our XPATH function doesn't support default namespace (
> http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I
> implemented.

OK, that makes sense.

For the purpose of getting this patch in, is it a _necessary_ feature?
Can XMLTABLE be usefully implemented without it, and if so, can it be
added in a subsequent patch? It would be nice to simplify this by
using existing libxml2 functionality in the first version rather than
adding a whole new xpath as well!

This is not a xpath implementation - it is preprocessing of xpath expression. without it, a users have to set explicitly PATH clause with explicit prefix "./" and explicit suffix "/text()". The usability will be significantly lower, and what is worst - the examples from internet should not work. Although is is lot of lines, this code is necessary.
 

> This is the mayor issue of libxml2 library. Another difference
> between XPATH function and XMLTABLE function is using two phase searching
> and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using
> two XPATH expressions - for row data cutting and next for column data
> cutting (from row data). The our XPATH functions is pretty simple mapped to
> libxml2 XPATH API. But it is not possible with XMLTABLE function - due
> design of this function in standard (it is more user friendly and doesn't
> require exactly correct xpath expressions).

So you can't use existing libxml2 xpath support to implement XMLTABLE,
even without default namespaces?

> I didn't find any API in libxml2 for a work with parsed xpath expressions -
> I need some info about the first and last token of xpath expression - it is
> base for decision about using prefix or suffix.
>
> This functionality (xpath expression parser) cannot be used for our XPATH
> function now - maybe default namespace in future.

So we'll have two different XPATH implementations for different
places, with different limitations, different possible bugs, etc?

It is just preprocessing. The evaluation of xpath expression is part of libxml2 and it is shared.

Our XPATH function is not short, but the reason is reading namespaces data from 2D array. The evaluation of xpath expression is on few lines.
 

What would be needed to make the new XPATH work for our built-in xpath
functions too? 

>> How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
>> this? It doesn't look like they're intended to be called directly by
>> the user, and they're not documented (or commented).
>
>
> Probably I used wrong names. XMLTABLE function is running in two different
> modes - with explicitly defined columns (XmlTableGetValue is used), and
> without explicitly defined columns - so result is one XML column and only
> one one step searching is used (there are not column related xpath
> expressions) ( XmlTableGetRowValue is used). The function XmlTableGetValue
> is used for getting one column value, the function XmlTableGetRowValue is
> used for getting one value too, but in special case, when there are not any
> other value.

So both are internal implementation of the parser-level XMLTABLE(...)
construct and are not intended to be called directly by users - right?

No - it is called from executor - and it should not be called differently. I have to do better separation from executor, and these functions will be private.
 

Comments please! A short comment on the function saying this would be
a big help.

Regarding naming, do we already have a convention for functions that
are internal implementation of something the user "spells"
differently? Where it's transformed by the parser? I couldn't find
one. I don't much care about the names so long as there are comments
explaining what calls the functions and what the user-facing interface
that matches the function is.

Is it safe for users to call these directly? What happens if they do
so incorrectly?

Why are they not in pg_proc.h? Do they need to be?

>> +/*
>> + * There are different requests from XMLTABLE, JSON_TABLE functions
>> + * on passed data than has CREATE TABLE command. It is reason for
>> + * introduction special structure instead using ColumnDef.
>> + */
>> +typedef struct TableExprRawCol
>> +{
>> +    NodeTag     type;
>> +    char       *colname;
>> +    TypeName   *typeName;
>> +    bool        for_ordinality;
>> +    bool        is_not_null;
>> +    Node       *path_expr;
>> +    Node       *default_expr;
>> +    int         location;
>> +} TableExprRawCol;
>
> I am sorry. It is my fault. Now we have very similar node ColumnDef. This
> node is designed for usage in utility commands - and it is not designed for
> usage inside a query.

Makes sense.

> I had to decide between enhancing ColumnDef node or
> introduction new special node. Because there are more special attributes and
> it is hard to serialize current ColumnDef, I decided to use new node.

Seems reasonable. The summary is "this is the parse node for a column
of an XMLTABLE expression".

Suggested comment:

/*
 * This is the parsenode for a column definition in a table-expression
like XMLTABLE.
 *
 * We can't re-use ColumnDef here; the utility command column
definition has all the
 * wrong attributes for use in table-expressions and just doesn't make
sense here.
 */
typedef struct TableExprColumn
{
...
};

?

Why "RawCol" ? What does it become when it's not "raw" anymore? Is
that a reference to ColumnDef's raw_default and cooked_default for
untransformed vs transformed parse-trees?

TableExprColumn is better

Regards

Pavel
 




--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:

Suggested comment:

/*
 * This is the parsenode for a column definition in a table-expression
like XMLTABLE.
 *
 * We can't re-use ColumnDef here; the utility command column
definition has all the
 * wrong attributes for use in table-expressions and just doesn't make
sense here.
 */
typedef struct TableExprColumn
{
...
};

?

Why "RawCol" ? What does it become when it's not "raw" anymore? Is
that a reference to ColumnDef's raw_default and cooked_default for
untransformed vs transformed parse-trees?

My previous reply was wrong - it is used by parser only and holds TypeName field. The analogy with ColumnDef raw_default is perfect.

Regards

Pavel
 




--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 7 September 2016 at 14:44, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> Suggested comment:
>>
>> /*
>>  * This is the parsenode for a column definition in a table-expression
>> like XMLTABLE.
>>  *
>>  * We can't re-use ColumnDef here; the utility command column
>> definition has all the
>>  * wrong attributes for use in table-expressions and just doesn't make
>> sense here.
>>  */
>> typedef struct TableExprColumn
>> {
>> ...
>> };
>>
>> ?
>>
>> Why "RawCol" ? What does it become when it's not "raw" anymore? Is
>> that a reference to ColumnDef's raw_default and cooked_default for
>> untransformed vs transformed parse-trees?
>
>
> My previous reply was wrong - it is used by parser only and holds TypeName
> field. The analogy with ColumnDef raw_default is perfect.

Cool, lets just comment that then.

I'll wait on an updated patch per discussion to date. Hopefully
somebody else with more of a clue than me can offer better review of
the executor/view/caching part you specifically called out as complex.
Otherwise maybe it'll be clearer in a revised version.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi,

I am sending new version of this patch

1. now generic TableExpr is better separated from a real content generation
2. I removed cached typmod - using row type cache everywhere - it is consistent with other few places in Pg where dynamic types are used - the result tupdesc is generated few times more - but it is not on critical path.
3. More comments, few more lines in doc.
4. Reformated by pgindent

Regards

Pavel


Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi,

I am sending new version of this patch

1. now generic TableExpr is better separated from a real content generation
2. I removed cached typmod - using row type cache everywhere - it is consistent with other few places in Pg where dynamic types are used - the result tupdesc is generated few times more - but it is not on critical path.
3. More comments, few more lines in doc.
4. Reformated by pgindent

new update

more regress tests

Regards

Pavel
 

Regards

Pavel



Вложения

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 9 September 2016 at 21:44, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
>>
>> Hi,
>>
>> I am sending new version of this patch
>>
>> 1. now generic TableExpr is better separated from a real content
>> generation
>> 2. I removed cached typmod - using row type cache everywhere - it is
>> consistent with other few places in Pg where dynamic types are used - the
>> result tupdesc is generated few times more - but it is not on critical path.
>> 3. More comments, few more lines in doc.
>> 4. Reformated by pgindent

Thanks.

I applied this on top of the same base as your prior patch so I could
compare changes.

The new docs look good. Thanks for that, I know it's a pain. It'll
need to cover ORDINAL too, but that's not hard. I'll try to find some
time to help with the docs per the references you sent offlist.

Out of interest, should the syntax allow room for future expansion to
permit reading from file rather than just string literal / column
reference? It'd be ideal to avoid reading big documents wholly into
memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
suggest adding that to this patch, just making sure adding it later
would not cause problems.

I see you added a builder context abstraction as discussed, so there's
no longer any direct reference to XMLTABLE specifics from TableExpr
code. Good, thanks for that. It'll make things much less messy when
adding other table expression types as you expressed the desire to do,
and means the TableExpr code now makes more sense as generic
infrastructure.

ExecEvalTableExprProtected and ExecEvalTableExpr are OK with me, or
better than execEvalTableExpr and ExecEvalTableExpr were anyway.
Eventual committer will probably have opinions here.

Mild nitpick: since you can have multiple namespaces, shouldn't
builder->SetNS be builder->AddNS ?

Added comments are helpful, thanks.

On first read-through this is a big improvement and addresses all the
concerns I raised. Documentation is much much better, thanks, I know
that's a pain.

I'll take a closer read-through shortly.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Craig Ringer
Дата:
> I'll take a closer read-through shortly.


Missing file. You omitted executor/tableexpr.h from the patch, so I
can't compile.

I've expanded and copy-edited the docs. Some of it is guesswork based
on the references you sent and a glance at the code. Please check my
changes carefully. I found a few surprises, like the fact that DEFAULT
isn't a normal literal, it's an xpath expression evaluated at the same
time as the rowexpression.

Updated patch attached as XMLTABLE-v3 includes the docs changes. Note
that it's missing tableexpr.h. For convenient review or to apply to
your working tree I also attach a diff of just my docs changes as
proposed-docs-changes.diff.


Docs:

- Can you send the sample data used to generate the example output?
I'd like to include at least a cut down part of it in the docs to make
it clear how the input correlates with output, and preferably put the
whole thing in an appendix.

- How does it decide what subset of the document to iterate over?
That's presumably rowexpr, which is xpath in postgresql? (I added this
to docs).

- xmlnamespaces clause in docs needs an example for a non-default namespace.

- What effect does xmlnamespaces clause have? Does supplying it allow
you to reference qualified names in xpath? What happens if you don't
specify it for a document that has namespaces or don't define all the
namespaces? What if you reference an undefined namespace in xpath?
What about if an undefined namespace isn't referenced by xpath, but is
inside a node selected by an xpath expression?

- What are the rules for converting the matched XML node into a value?
If the matched node is not a simple text node or lacks a text node as
its single child, what happens?

- What happens if the matched has multiple text node children? This
can happen if, for example, you have something like

    <matchedNode>
      some text <!-- comment splits up text node --> other text
    </matchedNode>

- Is there a way to get an attribute as a value? If so, an example
should show this because it's going to be a common need. Presumably
you want   node/@attrname ?

- What happens if COLUMNS is not specified at all? It looks like it
returns a single column result set with the matched entries as 'xml'
type, so added to docs, please verify.


- PASSING clause isn't really defined. You can specify one PASSING
entry as a literal/colref/expression, and it's the argument xml
document, right? The external docs you referred to say that PASSING
may have a BY VALUE keyword, alias its argument with AS, and may have
expressions, e.g.

    PASSING BY VALUE '<x/>' AS a, '<y/>' AS b

  Neither aliases nor multiple entries are supported by the code or
grammar. Should this be documented as a restriction? Do you know if
that's an extension by the other implementation or if it's SQL/XML
standard? (I've drafted a docs update to cover this in the updated
patch).


- What does BY REF mean? Should this just be mentioned with a "see
xmlexists(...)" since it seems to be compatibility noise? Is there a
corresponding BY VALUE or similar?


- The parser definitions re-use xmlexists_argument . I don't mind
that, but it's worth noting here in case others do.


- Why do the expression arguments take c_expr (anything allowed in
a_expr or b_expr), not b_expr (restricted expression) ?


- Column definitions are underdocumented. The grammar says they can be
NOT NULL, for example, but I don't see that in any of the references
you mailed me nor in the docs. What behaviour is expected for a NOT
NULL column? I've documented my best guess (not checked closely
against the code, please verify).

-




Test suggestions:

- Coverage of multiple text() node children of an element, where split
up by comment or similar

- Coverage of xpath that matches a node with child element nodes



More to come. Please review my docs changes in the mean time. I'm
spending a lot more time on this than I expected so I might have to
get onto other things for a while too.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 12 September 2016 at 12:28, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'll take a closer read-through shortly.

>DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.

Sorry for the spam, but turns out that's not the case as implemented
here. The docs you referenced say it should be an xpath expression,
but the implementation here is of a literal value, and examples
elsewhere on the Internet show a literal value. Unclear if the
referenced docs are wrong or what and I don't have anything to test
with.

Feel free to fix/trim the DEFAULT related changes in above docs patch as needed.

Also, tests/docs should probably cover what happens when PATH matches
more than one element, i.e. produces a list of more than one match.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-12 6:36 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 12 September 2016 at 12:28, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'll take a closer read-through shortly.

>DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.

Sorry for the spam, but turns out that's not the case as implemented
here. The docs you referenced say it should be an xpath expression,
but the implementation here is of a literal value, and examples
elsewhere on the Internet show a literal value. Unclear if the
referenced docs are wrong or what and I don't have anything to test
with.

It  is not spam. The previous comment was not correct. DEFAULT is a expression - result of this expression is used, when data is missing.

In standard, and some others implementation, this is literal only. It is similar to DEFAULT clause in CREATE STATEMENT. Postgres allows expression there. Usually Postgres allows expressions everywhere when it has sense, and when it is allowed by bizon parser.

 

Feel free to fix/trim the DEFAULT related changes in above docs patch as needed.

Also, tests/docs should probably cover what happens when PATH matches
more than one element, i.e. produces a list of more than one match.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-12 3:58 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 9 September 2016 at 21:44, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
>>
>> Hi,
>>
>> I am sending new version of this patch
>>
>> 1. now generic TableExpr is better separated from a real content
>> generation
>> 2. I removed cached typmod - using row type cache everywhere - it is
>> consistent with other few places in Pg where dynamic types are used - the
>> result tupdesc is generated few times more - but it is not on critical path.
>> 3. More comments, few more lines in doc.
>> 4. Reformated by pgindent

Thanks.

I applied this on top of the same base as your prior patch so I could
compare changes.

The new docs look good. Thanks for that, I know it's a pain. It'll
need to cover ORDINAL too, but that's not hard. I'll try to find some
time to help with the docs per the references you sent offlist.

Out of interest, should the syntax allow room for future expansion to
permit reading from file rather than just string literal / column
reference? It'd be ideal to avoid reading big documents wholly into
memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
suggest adding that to this patch, just making sure adding it later
would not cause problems.

this is little bit different question - it is server side function, so first question is - how to push usually client side content to server? Next question is how to get this content to a executor. Now only COPY statement is able to do.

I am thinking so this should not be problem, but it requires maybe some special keywords - fileref, local fileref, and some changes in protocol. Because this function has own implementation in parser/transform stage, then nothing will be lost in process, and we can implement lazy parameter evaluation. Another question if libxml2 has enough possibility to work with stream.

One idea - we can introduce "external (server side|client side) blobs" with special types and special streaming IO. With these types, there no changes are necessary on syntax level. With this, the syntax sugar flag "BY REF" can be useful.
 

I see you added a builder context abstraction as discussed, so there's
no longer any direct reference to XMLTABLE specifics from TableExpr
code. Good, thanks for that. It'll make things much less messy when
adding other table expression types as you expressed the desire to do,
and means the TableExpr code now makes more sense as generic
infrastructure.

ExecEvalTableExprProtected and ExecEvalTableExpr are OK with me, or
better than execEvalTableExpr and ExecEvalTableExpr were anyway.
Eventual committer will probably have opinions here.

Mild nitpick: since you can have multiple namespaces, shouldn't
builder->SetNS be builder->AddNS ?

good idea.
 

Added comments are helpful, thanks.

On first read-through this is a big improvement and addresses all the
concerns I raised. Documentation is much much better, thanks, I know
that's a pain.

I'll take a closer read-through shortly.


updated patch attached - with your documentation.

Regards

Pavel
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-12 6:28 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
> I'll take a closer read-through shortly.


Missing file. You omitted executor/tableexpr.h from the patch, so I
can't compile.

I've expanded and copy-edited the docs. Some of it is guesswork based
on the references you sent and a glance at the code. Please check my
changes carefully. I found a few surprises, like the fact that DEFAULT
isn't a normal literal, it's an xpath expression evaluated at the same
time as the rowexpression.

Updated patch attached as XMLTABLE-v3 includes the docs changes. Note
that it's missing tableexpr.h. For convenient review or to apply to
your working tree I also attach a diff of just my docs changes as
proposed-docs-changes.diff.


Docs:

- Can you send the sample data used to generate the example output?
I'd like to include at least a cut down part of it in the docs to make
it clear how the input correlates with output, and preferably put the
whole thing in an appendix.

it is in regress tests.
 

- How does it decide what subset of the document to iterate over?
That's presumably rowexpr, which is xpath in postgresql? (I added this
to docs).

- xmlnamespaces clause in docs needs an example for a non-default namespace.

- What effect does xmlnamespaces clause have? Does supplying it allow
you to reference qualified names in xpath? What happens if you don't
specify it for a document that has namespaces or don't define all the
namespaces? What if you reference an undefined namespace in xpath?
What about if an undefined namespace isn't referenced by xpath, but is
inside a node selected by an xpath expression?

All this is under libxml2 control - when you use undefined namespace, then libxml2 raises a error. The namespaces in document and in XPath queries are absolutely independent - the relation is a URI. When you use bad URI (referenced by name), then the result will be empty set. When you use undefined name, then you will get a error.
 

- What are the rules for converting the matched XML node into a value?
If the matched node is not a simple text node or lacks a text node as
its single child, what happens?

This process is described and controlled by "XML SQL mapping". The Postgres has minimalistic implementation without possibility of external control and without schema support. The my implementation is simple. When user doesn't specify result target like explicit using of text() function, then the text() function is used implicitly when target type is not XML. Then I dump result to string and I enforce related input functions for target types.

 

- What happens if the matched has multiple text node children? This
can happen if, for example, you have something like

    <matchedNode>
      some text <!-- comment splits up text node --> other text
    </matchedNode>

depends on target type - it is allowed in XML, and it is disallowed for other types. I though about support of a arrays - but the patch will be much more complex - there can be recursion - so I disallowed it. When the user have to solve this issue, then he can use nested XMLTABLE functions and nested function is working with XML type.

Just for record - This issue is solved in JSON_TABLE functions - it allows nested PATHS. But XMLTABLE doesn't allow it.
 

- Is there a way to get an attribute as a value? If so, an example
should show this because it's going to be a common need. Presumably
you want   node/@attrname ?

you can use reference to current node "." - so "./@attname" should to work - a example is in regress tests

 

- What happens if COLUMNS is not specified at all? It looks like it
returns a single column result set with the matched entries as 'xml'
type, so added to docs, please verify.

sure, that is it
 


- PASSING clause isn't really defined. You can specify one PASSING
entry as a literal/colref/expression, and it's the argument xml
document, right? The external docs you referred to say that PASSING
may have a BY VALUE keyword, alias its argument with AS, and may have
expressions, e.g.

    PASSING BY VALUE '<x/>' AS a, '<y/>' AS b

  Neither aliases nor multiple entries are supported by the code or
grammar. Should this be documented as a restriction? Do you know if
that's an extension by the other implementation or if it's SQL/XML
standard? (I've drafted a docs update to cover this in the updated
patch).

The ANSI allows to pass more documents - and then do complex queries with XQuery. Passing more than one document has not sense in libxml2 based implementation, so I didn't supported it. The referenced names can be implemented later - but it needs to changes in XPATH function too.
 


- What does BY REF mean? Should this just be mentioned with a "see
xmlexists(...)" since it seems to be compatibility noise? Is there a
corresponding BY VALUE or similar?

When the XML document is stored as serialized DOM, then by ref means link on this DOM. It has not sense in Postgres - because we store XML documents by value only.
 


- The parser definitions re-use xmlexists_argument . I don't mind
that, but it's worth noting here in case others do.

It is one clause - see SQL/XML doc PASSING <XML table argument passing mechanism>



- Why do the expression arguments take c_expr (anything allowed in
a_expr or b_expr), not b_expr (restricted expression) ?

I don't know - I expect the problems with parser - because PASSING is restricted keyword in ANSI/SQL and unreserved keyword in Postgres.
 


- Column definitions are underdocumented. The grammar says they can be
NOT NULL, for example, but I don't see that in any of the references
you mailed me nor in the docs. What behaviour is expected for a NOT
NULL column? I've documented my best guess (not checked closely
against the code, please verify).


yes - some other databases allows it - I am thinking so it is useful.
 
-




Test suggestions:

- Coverage of multiple text() node children of an element, where split
up by comment or similar

- Coverage of xpath that matches a node with child element nodes

I'll do it.
 



More to come. Please review my docs changes in the mean time. I'm
spending a lot more time on this than I expected so I might have to
get onto other things for a while too.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-12 6:36 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 12 September 2016 at 12:28, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'll take a closer read-through shortly.

>DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.

Sorry for the spam, but turns out that's not the case as implemented
here. The docs you referenced say it should be an xpath expression,
but the implementation here is of a literal value, and examples
elsewhere on the Internet show a literal value. Unclear if the
referenced docs are wrong or what and I don't have anything to test
with.

Feel free to fix/trim the DEFAULT related changes in above docs patch as needed.

Also, tests/docs should probably cover what happens when PATH matches
more than one element, i.e. produces a list of more than one match.

It is there for case, when this is allowed. When you change the  target type to any non XML type, then a error is raised.

I didn't write a negative test cases until the text of messages will be final (or checked by native speaker).

Regards

Pavel
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 12 September 2016 at 13:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Out of interest, should the syntax allow room for future expansion to
>> permit reading from file rather than just string literal / column
>> reference? It'd be ideal to avoid reading big documents wholly into
>> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
>> suggest adding that to this patch, just making sure adding it later
>> would not cause problems.
>
>
> this is little bit different question - it is server side function, so first
> question is - how to push usually client side content to server? Next
> question is how to get this content to a executor. Now only COPY statement
> is able to do.

Probably start with support for server-side files. When people are
dealing with really big files they'll be more willing to copy files to
the server or bind them into the server file system over the network.

The v3 protocol doesn't really allow any way for client-to-server
streaming during a query, I think that's hopeless until we have a
protocol bump.

> updated patch attached - with your documentation.

Will take a look and a better read of the code. Likely tomorrow, I've
got work to do as well.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

There is some opened questions - the standard (and some other databases) requires entering XPath expression as string literal.

I am thinking so it is too strong not necessary limit - (it enforces dynamic query in more cases), so I allowed the expressions there.

Another questions is when these expressions should be evaluated. There are two possibilities - once per query, once per input row. I selected "once per input row mode" - it is simpler to implement it, and it is consistent with other "similar" generators - see the behave and related discussion to "array_to_string" and evaluation of separator argument. The switch to "once per query" should not be hard - but it can be strange for users, because some his volatile expression should be stable.

Regards

Pavel

 

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 12 September 2016 at 14:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> There is some opened questions - the standard (and some other databases)
> requires entering XPath expression as string literal.
>
> I am thinking so it is too strong not necessary limit - (it enforces dynamic
> query in more cases), so I allowed the expressions there.

I agree. There's no reason not to permit expressions there, and there
are many other places where we have similar extensions.

> Another questions is when these expressions should be evaluated. There are
> two possibilities - once per query, once per input row. I selected "once per
> input row mode" - it is simpler to implement it, and it is consistent with
> other "similar" generators - see the behave and related discussion to
> "array_to_string" and evaluation of separator argument. The switch to "once
> per query" should not be hard - but it can be strange for users, because
> some his volatile expression should be stable.

I would've expected once per query. There's no way the expressions can
reference the row data, so there's no reason to evaluate them each
time.

The only use case I see for evaluating them each time is - maybe -
DEFAULT. Where maybe there's a use for nextval() or other volatile
functions. But honestly, I think that's better done explicitly in a
post-pass, i.e.

select uuid_generate_v4(), x.*
from ( xmltable(.....) x
);

in cases where that's what the user actually wants.

There's no other case I can think of where expressions as arguments to
set-returning functions are evaluated once per output row.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Craig Ringer
Дата:
On 12 September 2016 at 13:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Out of interest, should the syntax allow room for future expansion to
>> permit reading from file rather than just string literal / column
>> reference? It'd be ideal to avoid reading big documents wholly into
>> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
>> suggest adding that to this patch, just making sure adding it later
>> would not cause problems.
>
>
> this is little bit different question - it is server side function, so first
> question is - how to push usually client side content to server? Next
> question is how to get this content to a executor. Now only COPY statement
> is able to do.

Probably start with support for server-side files. When people are
dealing with really big files they'll be more willing to copy files to
the server or bind them into the server file system over the network.

The v3 protocol doesn't really allow any way for client-to-server
streaming during a query, I think that's hopeless until we have a
protocol bump.

> updated patch attached - with your documentation.


>> - Can you send the sample data used to generate the example output?
>> I'd like to include at least a cut down part of it in the docs to make
>> it clear how the input correlates with output, and preferably put the
>> whole thing in an appendix.
>
>
> it is in regress tests.

Makes sense.

It's not that verbose (for XML) and I wonder if it's just worth
including it in-line in the docs along with the XMLTABLE example. It'd
be much easier to understand how XMLTABLE works and what it does then.

>> - What effect does xmlnamespaces clause have? Does supplying it allow
>> you to reference qualified names in xpath? What happens if you don't
>> specify it for a document that has namespaces or don't define all the
>> namespaces? What if you reference an undefined namespace in xpath?
>> What about if an undefined namespace isn't referenced by xpath, but is
>> inside a node selected by an xpath expression?
>
>
> All this is under libxml2 control - when you use undefined namespace, then
> libxml2 raises a error. The namespaces in document and in XPath queries are
> absolutely independent - the relation is a URI. When you use bad URI
> (referenced by name), then the result will be empty set. When you use
> undefined name, then you will get a error.

OK, makes sense.

>> - What are the rules for converting the matched XML node into a value?
>> If the matched node is not a simple text node or lacks a text node as
>> its single child, what happens?
>
> This process is described and controlled by "XML SQL mapping". The Postgres
> has minimalistic implementation without possibility of external control and
> without schema support. The my implementation is simple. When user doesn't
> specify result target like explicit using of text() function, then the
> text() function is used implicitly when target type is not XML. Then I dump
> result to string and I enforce related input functions for target types.

OK, so a subset of the full spec functionality is provided because of
limitations in Pg and libxml2. Makes sense.

My only big concern here is that use of text() is a common mistake in
XSLT, and I think the same thing will happen here. Users expect
comments to be ignored, but in fact a comment inserts a comment node
into the XML DOM, so a comment between two pieces of text produces a
tree of

element text() comment() text()


If you match element/text(), you get a 2-node result and will presumably ERROR.

There is no good way to tell this from

element text()
element text()

when you use an xpath expression like element/text() . So you can't
safely solve it just by concatenating all resulting text() nodes
without surprising behaviour.

>> - What happens if the matched has multiple text node children? This
>> can happen if, for example, you have something like
>>
>>     <matchedNode>
>>       some text <!-- comment splits up text node --> other text
>>     </matchedNode>
>
>
> depends on target type - it is allowed in XML, and it is disallowed for
> other types. I though about support of a arrays - but the patch will be much
> more complex - there can be recursion - so I disallowed it. When the user
> have to solve this issue, then he can use nested XMLTABLE functions and
> nested function is working with XML type.

I don't really understand how that'd work.

Do you know how other implementations handle this?

I think users are going to be VERY surprised when comments in text
break their XML.

>> - Is there a way to get an attribute as a value? If so, an example
>> should show this because it's going to be a common need. Presumably
>> you want   node/@attrname ?
>
>
> you can use reference to current node "." - so "./@attname" should to work -
> a example is in regress tests

cool, just needs mention in docs then.

>> - PASSING clause isn't really defined. You can specify one PASSING
>> entry as a literal/colref/expression, and it's the argument xml
>> document, right? The external docs you referred to say that PASSING
>> may have a BY VALUE keyword, alias its argument with AS, and may have
>> expressions, e.g.
>>
>>     PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
>>
>>   Neither aliases nor multiple entries are supported by the code or
>> grammar. Should this be documented as a restriction?
>
>
> The ANSI allows to pass more documents - and then do complex queries with
> XQuery. Passing more than one document has not sense in libxml2 based
> implementation, so I didn't supported it. The referenced names can be
> implemented later - but it needs to changes in XPATH function too.

OK, so my docs addition that just says they're not supported should be fine.

>> - What does BY REF mean? Should this just be mentioned with a "see
>> xmlexists(...)" since it seems to be compatibility noise? Is there a
>> corresponding BY VALUE or similar?
>
> When the XML document is stored as serialized DOM, then by ref means link on
> this DOM. It has not sense in Postgres - because we store XML documents by
> value only.

Right. And since there's already precent for xmlexists there's no
point worrying about whether we lose opportunities to implement it
later.

>> - Why do the expression arguments take c_expr (anything allowed in
>> a_expr or b_expr), not b_expr (restricted expression) ?
>
>
> I don't know - I expect the problems with parser - because PASSING is
> restricted keyword in ANSI/SQL and unreserved keyword in Postgres.

I mean for the rowpath argument, not the parts within
xmlexists_argument. If the rowpath doesn't need to be c_expr
presumably it should be a b_expr or even, if it doesn't cause parsing
ambiguities, an a_expr ? There doesn't seem to be the same issue here
as we have with BETWEEN etc.

>> - Column definitions are underdocumented. The grammar says they can be
>> NOT NULL, for example, but I don't see that in any of the references
>> you mailed me nor in the docs. What behaviour is expected for a NOT
>> NULL column? I've documented my best guess (not checked closely
>> against the code, please verify).
>>
>
> yes - some other databases allows it - I am thinking so it is useful.

Sure. Sounds like my docs additions are probably right then, except
for incorrect description of DEFAULT.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-12 8:46 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 12 September 2016 at 13:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Out of interest, should the syntax allow room for future expansion to
>> permit reading from file rather than just string literal / column
>> reference? It'd be ideal to avoid reading big documents wholly into
>> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
>> suggest adding that to this patch, just making sure adding it later
>> would not cause problems.
>
>
> this is little bit different question - it is server side function, so first
> question is - how to push usually client side content to server? Next
> question is how to get this content to a executor. Now only COPY statement
> is able to do.

Probably start with support for server-side files. When people are
dealing with really big files they'll be more willing to copy files to
the server or bind them into the server file system over the network.

The v3 protocol doesn't really allow any way for client-to-server
streaming during a query, I think that's hopeless until we have a
protocol bump.

> updated patch attached - with your documentation.


>> - Can you send the sample data used to generate the example output?
>> I'd like to include at least a cut down part of it in the docs to make
>> it clear how the input correlates with output, and preferably put the
>> whole thing in an appendix.
>
>
> it is in regress tests.

Makes sense.

It's not that verbose (for XML) and I wonder if it's just worth
including it in-line in the docs along with the XMLTABLE example. It'd
be much easier to understand how XMLTABLE works and what it does then.

>> - What effect does xmlnamespaces clause have? Does supplying it allow
>> you to reference qualified names in xpath? What happens if you don't
>> specify it for a document that has namespaces or don't define all the
>> namespaces? What if you reference an undefined namespace in xpath?
>> What about if an undefined namespace isn't referenced by xpath, but is
>> inside a node selected by an xpath expression?
>
>
> All this is under libxml2 control - when you use undefined namespace, then
> libxml2 raises a error. The namespaces in document and in XPath queries are
> absolutely independent - the relation is a URI. When you use bad URI
> (referenced by name), then the result will be empty set. When you use
> undefined name, then you will get a error.

OK, makes sense.

>> - What are the rules for converting the matched XML node into a value?
>> If the matched node is not a simple text node or lacks a text node as
>> its single child, what happens?
>
> This process is described and controlled by "XML SQL mapping". The Postgres
> has minimalistic implementation without possibility of external control and
> without schema support. The my implementation is simple. When user doesn't
> specify result target like explicit using of text() function, then the
> text() function is used implicitly when target type is not XML. Then I dump
> result to string and I enforce related input functions for target types.

OK, so a subset of the full spec functionality is provided because of
limitations in Pg and libxml2. Makes sense.

My only big concern here is that use of text() is a common mistake in
XSLT, and I think the same thing will happen here. Users expect
comments to be ignored, but in fact a comment inserts a comment node
into the XML DOM, so a comment between two pieces of text produces a
tree of

element
  text()
  comment()
  text()


If you match element/text(), you get a 2-node result and will presumably ERROR.

There is no good way to tell this from

element
  text()
element
  text()

when you use an xpath expression like element/text() . So you can't
safely solve it just by concatenating all resulting text() nodes
without surprising behaviour.

>> - What happens if the matched has multiple text node children? This
>> can happen if, for example, you have something like
>>
>>     <matchedNode>
>>       some text <!-- comment splits up text node --> other text
>>     </matchedNode>
>

I fixed this case - new regress tests added
 
>
> depends on target type - it is allowed in XML, and it is disallowed for
> other types. I though about support of a arrays - but the patch will be much
> more complex - there can be recursion - so I disallowed it. When the user
> have to solve this issue, then he can use nested XMLTABLE functions and
> nested function is working with XML type.

I don't really understand how that'd work.

Do you know how other implementations handle this?

I think users are going to be VERY surprised when comments in text
break their XML.

>> - Is there a way to get an attribute as a value? If so, an example
>> should show this because it's going to be a common need. Presumably
>> you want   node/@attrname ?
>
>
> you can use reference to current node "." - so "./@attname" should to work -
> a example is in regress tests

cool, just needs mention in docs then.

>> - PASSING clause isn't really defined. You can specify one PASSING
>> entry as a literal/colref/expression, and it's the argument xml
>> document, right? The external docs you referred to say that PASSING
>> may have a BY VALUE keyword, alias its argument with AS, and may have
>> expressions, e.g.
>>
>>     PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
>>
>>   Neither aliases nor multiple entries are supported by the code or
>> grammar. Should this be documented as a restriction?
>
>
> The ANSI allows to pass more documents - and then do complex queries with
> XQuery. Passing more than one document has not sense in libxml2 based
> implementation, so I didn't supported it. The referenced names can be
> implemented later - but it needs to changes in XPATH function too.

OK, so my docs addition that just says they're not supported should be fine.

>> - What does BY REF mean? Should this just be mentioned with a "see
>> xmlexists(...)" since it seems to be compatibility noise? Is there a
>> corresponding BY VALUE or similar?
>
> When the XML document is stored as serialized DOM, then by ref means link on
> this DOM. It has not sense in Postgres - because we store XML documents by
> value only.

Right. And since there's already precent for xmlexists there's no
point worrying about whether we lose opportunities to implement it
later.

>> - Why do the expression arguments take c_expr (anything allowed in
>> a_expr or b_expr), not b_expr (restricted expression) ?
>
>
> I don't know - I expect the problems with parser - because PASSING is
> restricted keyword in ANSI/SQL and unreserved keyword in Postgres.

I mean for the rowpath argument, not the parts within
xmlexists_argument. If the rowpath doesn't need to be c_expr
presumably it should be a b_expr or even, if it doesn't cause parsing
ambiguities, an a_expr ? There doesn't seem to be the same issue here
as we have with BETWEEN etc.

b_expr enforces shift/reduce conflict :(
 

>> - Column definitions are underdocumented. The grammar says they can be
>> NOT NULL, for example, but I don't see that in any of the references
>> you mailed me nor in the docs. What behaviour is expected for a NOT
>> NULL column? I've documented my best guess (not checked closely
>> against the code, please verify).
>>
>
> yes - some other databases allows it - I am thinking so it is useful.

Sure. Sounds like my docs additions are probably right then, except
for incorrect description of DEFAULT.

I found other opened question - how we can translate empty tag to SQL value? The Oracle should not to solve this question, but PostgreSQL does. Some databases returns empty string.

I prefer return a empty string - not null in this case. The reason is simple - Empty string is some information - and NULL is less information. When it is necessary I can transform empty string to NULL - different direction is not unique.

Regards

Pavel
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Вложения

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 15 September 2016 at 19:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> b_expr enforces shift/reduce conflict :(

No problem then. I just thought it'd be worth allowing more if it
worked to do so.

> I found other opened question - how we can translate empty tag to SQL value?
> The Oracle should not to solve this question, but PostgreSQL does. Some
> databases returns empty string.

Oracle doesn't solve the problem? it ERRORs?

> I prefer return a empty string - not null in this case.

I agree, and that's consistent with how most XML is interpreted. XSLT
for example considers <x></x> and <x/> to be pretty much the same
thing.

>  The reason is simple
> - Empty string is some information - and NULL is less information. When it
> is necessary I can transform empty string to NULL - different direction is
> not unique.

Yep, I definitely agree. The only issue is if people want a DEFAULT to
be applied for empty tags. But that's something they can do in a
post-process pass easily enough, since XMLTABLE is callable as a
subquery / WITH expression / etc.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Peter Eisentraut
Дата:
On 9/12/16 1:14 AM, Craig Ringer wrote:
> I would've expected once per query. There's no way the expressions can
> reference the row data, so there's no reason to evaluate them each
> time.
> 
> The only use case I see for evaluating them each time is - maybe -
> DEFAULT. Where maybe there's a use for nextval() or other volatile
> functions. But honestly, I think that's better done explicitly in a
> post-pass, i.e.
> 
> select uuid_generate_v4(), x.*
> from (
>   xmltable(.....) x
> );
> 
> in cases where that's what the user actually wants.
> 
> There's no other case I can think of where expressions as arguments to
> set-returning functions are evaluated once per output row.

The SQL standard appears to show what the behavior ought to be:

<XML table> is equivalent to

LATERAL ( XNDC
SELECT SLI1 AS CN1, SLI2 AS CN2, ..., SLINC AS CNNC FROM XMLITERATE (
XMLQUERY ( XTRP XQAL                        RETURNING SEQUENCE BY REF EMPTY ON EMPTY ) )           AS I ( V, N )
) AS CORR DCLP

and SLIj is

CASE WHEN XEj
THEN XMLCAST( XQCj AS DTj CPMj )
ELSE DEFj END

where DEFj is the default expression.

So simplified it is

LATERAL ( SELECT CASE WHEN ... ELSE DEFj END, ... FROM something )

which indicates that the default expression is evaluated for every row.

If we're not sure about all this, it might be worth restricting the
default expressions to stable or immutable expressions for the time being.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-16 1:44 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 15 September 2016 at 19:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> b_expr enforces shift/reduce conflict :(

No problem then. I just thought it'd be worth allowing more if it
worked to do so.

> I found other opened question - how we can translate empty tag to SQL value?
> The Oracle should not to solve this question, but PostgreSQL does. Some
> databases returns empty string.

Oracle doesn't solve the problem? it ERRORs?

Oracle returns NULL. But there are not any difference between NULL and empty string

Regards

Pavel
 

> I prefer return a empty string - not null in this case.

I agree, and that's consistent with how most XML is interpreted. XSLT
for example considers <x></x> and <x/> to be pretty much the same
thing.

>  The reason is simple
> - Empty string is some information - and NULL is less information. When it
> is necessary I can transform empty string to NULL - different direction is
> not unique.

Yep, I definitely agree. The only issue is if people want a DEFAULT to
be applied for empty tags. But that's something they can do in a
post-process pass easily enough, since XMLTABLE is callable as a
subquery / WITH expression / etc.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

new update:

* doc is moved to better place - xml processing functions
* few more regress tests
* call forgotten check_srf_call_placement

Regards

Pavel
Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-18 11:53 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

new update:

* doc is moved to better place - xml processing functions
* few more regress tests
* call forgotten check_srf_call_placement

another small update - fix XMLPath parser - support multibytes characters

Regards

Pavel
 

Regards

Pavel

Вложения

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 22 September 2016 at 02:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> another small update - fix XMLPath parser - support multibytes characters

I'm returning for another round of review.

The code doesn't handle the 5 XML built-in entities correctly in
text-typed output. It processes ' and " but not &, < or
> . See added test. I have not fixed this, but I think it's clearly
broken:


+ -- XML builtin entities
+ SELECT * FROM xmltable('/x/a' PASSING

'<x><a><ent>'</ent></a><a><ent>"</ent></a><a><ent>&</ent></a><a><ent><</ent></a><a><ent>></ent></a></x>'
COLUMNS ent text);
+   ent
+ -------
+  '
+  "
+  &
+  <
+  >
+ (5 rows)

so I've adjusted the docs to claim that they're expanded. The code
needs fixing to avoid entity-escaping when the output column type is
not 'xml'.


' and " entities in xml-typed output are expanded, not
preserved. I don't know if this is intended but I suspect it is:

+ SELECT * FROM xmltable('/x/a' PASSING

'<x><a><ent>'</ent></a><a><ent>"</ent></a><a><ent>&</ent></a><a><ent><</ent></a><a><ent>></ent></a></x>'
COLUMNS ent xml);
+        ent
+ ------------------
+  <ent>'</ent>
+  <ent>"</ent>
+  <ent>&</ent>
+  <ent><</ent>
+  <ent>></ent>
+ (5 rows)


For the docs changes relevant to the above search for "The five
predefined XML entities". Adjust that bit of docs if I guessed wrong
about the intended behaviour.

The tests don't cover CDATA or PCDATA . I didn't try to add that, but
they should.


Did some docs copy-editing and integrated some examples. Explained how
nested elements work, that multiple top level elements is an error,
etc. Explained the time-of-evaluation stuff. Pointed out that you can
refer to prior output columns in PATH and DEFAULT, since that's weird
and unusual compared to normal SQL. Documented handling of multiple
node matches, including the surprising results of somepath/text() on
<somepath>x<!--blah-->y</somepath>. Documented handling of nested
elements. Documented that xmltable works only on XML documents, not
fragments/forests.

Regarding evaluation time, it struck me that evaluating path
expressions once per row means the xpath must be parsed and processed
once per row. Isn't it desirable to store and re-use the preparsed
xpath? I don't think this is a major problem, since we can later
detect stable/immutable expressions including constants, evaluate only
once in that case, and cache. It's just worth thinking about.

The docs and tests don't seem to cover XML entities. What's the
behaviour there? Core XML only defines one entity, but if a schema
defines more how are they processed? The tests need to cover the
predefined entities " & ' < and > at least.

I have no idea whether the current code can fetch a DTD and use any
<!ENTITY > declarations to expand entities, but I'm guessing not? If
not, external DTDs, and internal DTDs with external entities should be
documented as unsupported.

It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):

SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE foo [ <!ELEMENT foo (#PCDATA)> <!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
$XML$ COLUMNS foo text);

+ ERROR:  invalid XML content
+ LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
+                                            ^
+ DETAIL:  line 2: StartTag: invalid element name
+ <!DOCTYPE foo [
+  ^
+ line 3: StartTag: invalid element name
+   <!ELEMENT foo (#PCDATA)>
+    ^
+ line 4: StartTag: invalid element name
+   <!ENTITY pg "PostgreSQL">
+    ^
+ line 6: Entity 'pg' not defined
+ <foo>Hello &pg;.</foo>
+                ^


libxml seems to support documents with internal DTDs:

$ xmllint --valid /tmp/x
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>


so presumably the issue lies in the xpath stuff? Note that it's not
even ignoring the DTD and choking on the undefined entity, it's
choking on the DTD its self.


OK, code comments:


In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
instead of a PG_TRY() / PG_CATCH() block?


I think the new way you handle the type stuff is much, much better,
and with comments to explain too. Thanks very much.


There's an oversight in tableexpr vs xmltable separation here:

+        case T_TableExpr:
+            *name = "xmltable";
+            return 2;

presumably you need to look at the node and decide what kind of table
expression it is or just use a generic "tableexpr".

Same problem here:

+        case T_TableExpr:
+            {
+                TableExpr  *te = (TableExpr *) node;
+
+                /* c_expr shoud be closed in brackets */
+                appendStringInfoString(buf, "XMLTABLE(");



I don't have the libxml knowledge or remaining brain to usefully
evaluate the xpath and xml specifics in xpath.c today. It does strike
me that the new xpath parser should probably live in its own file,
though.

I think this is all a big improvement. Barring the notes above and my
lack of review of the guts of the xml.c parts of it, I'm pretty happy
with what I see now.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Craig Ringer
Дата:
> Did some docs copy-editing and integrated some examples.

Whoops, forgot to attach.

Rather than sending a whole new copy of the patch, here's a diff
against your patched tree of my changes so you can see what I've done
and apply the parts you want.

Note that I didn't updated the expected files.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-23 10:05 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 22 September 2016 at 02:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> another small update - fix XMLPath parser - support multibytes characters

I'm returning for another round of review.

The code doesn't handle the 5 XML built-in entities correctly in
text-typed output. It processes &apos; and &quot; but not &amp, &lt or
&gt; . See added test. I have not fixed this, but I think it's clearly
broken:


+ -- XML builtin entities
+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
COLUMNS ent text);
+   ent
+ -------
+  '
+  "
+  &amp;
+  &lt;
+  &gt;
+ (5 rows)

so I've adjusted the docs to claim that they're expanded. The code
needs fixing to avoid entity-escaping when the output column type is
not 'xml'.


&apos; and &quot; entities in xml-typed output are expanded, not
preserved. I don't know if this is intended but I suspect it is:

+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
COLUMNS ent xml);
+        ent
+ ------------------
+  <ent>'</ent>
+  <ent>"</ent>
+  <ent>&amp;</ent>
+  <ent>&lt;</ent>
+  <ent>&gt;</ent>
+ (5 rows)


For the docs changes relevant to the above search for "The five
predefined XML entities". Adjust that bit of docs if I guessed wrong
about the intended behaviour.

The tests don't cover CDATA or PCDATA . I didn't try to add that, but
they should.


Did some docs copy-editing and integrated some examples. Explained how
nested elements work, that multiple top level elements is an error,
etc. Explained the time-of-evaluation stuff. Pointed out that you can
refer to prior output columns in PATH and DEFAULT, since that's weird
and unusual compared to normal SQL. Documented handling of multiple
node matches, including the surprising results of somepath/text() on
<somepath>x<!--blah-->y</somepath>. Documented handling of nested
elements. Documented that xmltable works only on XML documents, not
fragments/forests.

Regarding evaluation time, it struck me that evaluating path
expressions once per row means the xpath must be parsed and processed
once per row. Isn't it desirable to store and re-use the preparsed
xpath? I don't think this is a major problem, since we can later
detect stable/immutable expressions including constants, evaluate only
once in that case, and cache. It's just worth thinking about.

The docs and tests don't seem to cover XML entities. What's the
behaviour there? Core XML only defines one entity, but if a schema
defines more how are they processed? The tests need to cover the
predefined entities &quot; &amp; &apos; &lt; and &gt; at least.

I have no idea whether the current code can fetch a DTD and use any
<!ENTITY > declarations to expand entities, but I'm guessing not? If
not, external DTDs, and internal DTDs with external entities should be
documented as unsupported.

It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):

SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE foo [
  <!ELEMENT foo (#PCDATA)>
  <!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
$XML$ COLUMNS foo text);

+ ERROR:  invalid XML content
+ LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
+                                            ^
+ DETAIL:  line 2: StartTag: invalid element name
+ <!DOCTYPE foo [
+  ^
+ line 3: StartTag: invalid element name
+   <!ELEMENT foo (#PCDATA)>
+    ^
+ line 4: StartTag: invalid element name
+   <!ENTITY pg "PostgreSQL">
+    ^
+ line 6: Entity 'pg' not defined
+ <foo>Hello &pg;.</foo>
+                ^


libxml seems to support documents with internal DTDs:

$ xmllint --valid /tmp/x
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>


so presumably the issue lies in the xpath stuff? Note that it's not
even ignoring the DTD and choking on the undefined entity, it's
choking on the DTD its self.


OK, code comments:


In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
instead of a PG_TRY() / PG_CATCH() block?


I think the new way you handle the type stuff is much, much better,
and with comments to explain too. Thanks very much.


There's an oversight in tableexpr vs xmltable separation here:

+        case T_TableExpr:
+            *name = "xmltable";
+            return 2;

presumably you need to look at the node and decide what kind of table
expression it is or just use a generic "tableexpr".

Same problem here:

+        case T_TableExpr:
+            {
+                TableExpr  *te = (TableExpr *) node;
+
+                /* c_expr shoud be closed in brackets */
+                appendStringInfoString(buf, "XMLTABLE(");


This is correct, but not well commented - looks on XMLEXPR node - TableExpr is a holder, but it is invisible for user. User running a XMLTABLE function and should to see XMLTABLE. It will be more clean when we will support JSON_TABLE function.
 


I don't have the libxml knowledge or remaining brain to usefully
evaluate the xpath and xml specifics in xpath.c today. It does strike
me that the new xpath parser should probably live in its own file,
though.

I'll try move it to separate file
 

I think this is all a big improvement. Barring the notes above and my
lack of review of the guts of the xml.c parts of it, I'm pretty happy
with what I see now.

Thank you. I hope so all major issues are solved. Probably some XML specific related issues are there - but I am happy, so you have well XML knowledge and you will test a corner cases.

Regards

Pavel

 


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

2016-09-23 10:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
> Did some docs copy-editing and integrated some examples.

Whoops, forgot to attach.

Rather than sending a whole new copy of the patch, here's a diff
against your patched tree of my changes so you can see what I've done
and apply the parts you want.

Note that I didn't updated the expected files.

I applied your patch - there is small misunderstanding. The PATH is evaluated once for input row already. It is not clean in code, because it is executor node started and running for all rows. I changed it in your part of doc.

   to a simple value before calling the function. <literal>PATH</>
+      expressions are normally evaluated <emphasis>exactly once per result row ## per input row
+      </emphasis>,

Regards

Pavel
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

2016-09-23 10:05 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 22 September 2016 at 02:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> another small update - fix XMLPath parser - support multibytes characters

I'm returning for another round of review.

The code doesn't handle the 5 XML built-in entities correctly in
text-typed output. It processes &apos; and &quot; but not &amp, &lt or
&gt; . See added test. I have not fixed this, but I think it's clearly
broken:


+ -- XML builtin entities
+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
COLUMNS ent text);
+   ent
+ -------
+  '
+  "
+  &amp;
+  &lt;
+  &gt;
+ (5 rows)

so I've adjusted the docs to claim that they're expanded. The code
needs fixing to avoid entity-escaping when the output column type is
not 'xml'.


fixed
 

&apos; and &quot; entities in xml-typed output are expanded, not
preserved. I don't know if this is intended but I suspect it is:

+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
COLUMNS ent xml);
+        ent
+ ------------------
+  <ent>'</ent>
+  <ent>"</ent>
+  <ent>&amp;</ent>
+  <ent>&lt;</ent>
+  <ent>&gt;</ent>
+ (5 rows)


For the docs changes relevant to the above search for "The five
predefined XML entities". Adjust that bit of docs if I guessed wrong
about the intended behaviour.

The tests don't cover CDATA or PCDATA . I didn't try to add that, but
they should.


appended 


Did some docs copy-editing and integrated some examples. Explained how
nested elements work, that multiple top level elements is an error,
etc. Explained the time-of-evaluation stuff. Pointed out that you can
refer to prior output columns in PATH and DEFAULT, since that's weird
and unusual compared to normal SQL. Documented handling of multiple
node matches, including the surprising results of somepath/text() on
<somepath>x<!--blah-->y</somepath>. Documented handling of nested
elements. Documented that xmltable works only on XML documents, not
fragments/forests.

I don't understand to this sentence: "It is possible for a PATH expression to reference output columns that appear before it in the column-list, so paths may be dynamically constructed based on other parts of the XML document:"


Regarding evaluation time, it struck me that evaluating path
expressions once per row means the xpath must be parsed and processed
once per row. Isn't it desirable to store and re-use the preparsed
xpath? I don't think this is a major problem, since we can later
detect stable/immutable expressions including constants, evaluate only
once in that case, and cache. It's just worth thinking about.

Probably it is possible - it is exactly how you wrote - it needs to check the change. We can try do some possible performance optimizations later - without compatibility issues. Now, I prefer the most simple code.

a note: PATH expression is evaluated for any **input** row. In same moment is processed row path expression and man XML document DOM parsing. So overhead of PATH expression and PATH parsing should not be dominant.
 

The docs and tests don't seem to cover XML entities. What's the
behaviour there? Core XML only defines one entity, but if a schema
defines more how are they processed? The tests need to cover the
predefined entities &quot; &amp; &apos; &lt; and &gt; at least.

I don't understand, what you are propose here. ?? Please, can you send some examples.

I have no idea whether the current code can fetch a DTD and use any
<!ENTITY > declarations to expand entities, but I'm guessing not? If
not, external DTDs, and internal DTDs with external entities should be
documented as unsupported.

It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):

SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE foo [
  <!ELEMENT foo (#PCDATA)>
  <!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
$XML$ COLUMNS foo text);

+ ERROR:  invalid XML content
+ LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
+                                            ^
+ DETAIL:  line 2: StartTag: invalid element name
+ <!DOCTYPE foo [
+  ^
+ line 3: StartTag: invalid element name
+   <!ELEMENT foo (#PCDATA)>
+    ^
+ line 4: StartTag: invalid element name
+   <!ENTITY pg "PostgreSQL">
+    ^
+ line 6: Entity 'pg' not defined
+ <foo>Hello &pg;.</foo>
+                ^


It is rejected before XMLTABLE function call

postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
postgres$# <!DOCTYPE foo [
postgres$#   <!ELEMENT foo (#PCDATA)>
postgres$#   <!ENTITY pg "PostgreSQL">
postgres$# ]>
postgres$# <foo>Hello &pg;.</foo>
postgres$# $XML$::xml;
ERROR:  invalid XML content
LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
               ^
DETAIL:  line 2: StartTag: invalid element name
<!DOCTYPE foo [
 ^
line 3: StartTag: invalid element name
  <!ELEMENT foo (#PCDATA)>
   ^
line 4: StartTag: invalid element name
  <!ENTITY pg "PostgreSQL">
   ^
line 6: Entity 'pg' not defined
<foo>Hello &pg;.</foo>
               ^
It is disabled by default in libxml2. I found a function xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault

The default behave should be common for all PostgreSQL's libxml2 based function - and then it is different topic - maybe part for PostgreSQL ToDo? But I don't remember any user requests related to this issue.

 

libxml seems to support documents with internal DTDs:

$ xmllint --valid /tmp/x
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>


I removed this tests - it is not related to XMLTABLE function, but to generic XML processing/validation.
 


so presumably the issue lies in the xpath stuff? Note that it's not
even ignoring the DTD and choking on the undefined entity, it's
choking on the DTD its self.


OK, code comments:


In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
instead of a PG_TRY() / PG_CATCH() block?

If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you want to catch FATAL errors (and when you want to clean shared memory).  XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.



I think the new way you handle the type stuff is much, much better,
and with comments to explain too. Thanks very much.


There's an oversight in tableexpr vs xmltable separation here:

+        case T_TableExpr:
+            *name = "xmltable";
+            return 2;

presumably you need to look at the node and decide what kind of table
expression it is or just use a generic "tableexpr".

Same problem here:

+        case T_TableExpr:
+            {
+                TableExpr  *te = (TableExpr *) node;
+
+                /* c_expr shoud be closed in brackets */
+                appendStringInfoString(buf, "XMLTABLE(");


commented
 


I don't have the libxml knowledge or remaining brain to usefully
evaluate the xpath and xml specifics in xpath.c today. It does strike
me that the new xpath parser should probably live in its own file,
though.

moved
 

I think this is all a big improvement. Barring the notes above and my
lack of review of the guts of the xml.c parts of it, I'm pretty happy
with what I see now.

new version is attached

Regards

Pavel
 


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"



>> The docs and tests don't seem to cover XML entities. What's the
>> behaviour there? Core XML only defines one entity, but if a schema
>> defines more how are they processed? The tests need to cover the
>> predefined entities " & ' < and > at least.
>
>
> I don't understand, what you are propose here. ?? Please, can you send some
> examples.

Per below - handling of DTD <!ENTITY> declarations, and the builtin
entity tests I already added tests for.


>> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>>
>> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
>> standalone="yes" ?>
>> <!DOCTYPE foo [
>>   <!ELEMENT foo (#PCDATA)>
>>   <!ENTITY pg "PostgreSQL">
>> ]>
>> <foo>Hello &pg;.</foo>
>> $XML$ COLUMNS foo text);
>>
>> + ERROR:  invalid XML content
>> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
>> +                                            ^
>> + DETAIL:  line 2: StartTag: invalid element name
>> + <!DOCTYPE foo [
>> +  ^
>> + line 3: StartTag: invalid element name
>> +   <!ELEMENT foo (#PCDATA)>
>> +    ^
>> + line 4: StartTag: invalid element name
>> +   <!ENTITY pg "PostgreSQL">
>> +    ^
>> + line 6: Entity 'pg' not defined
>> + <foo>Hello &pg;.</foo>
>> +                ^
>>
>
> It is rejected before XMLTABLE function call
>
> postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
> postgres$# <!DOCTYPE foo [
> postgres$#   <!ELEMENT foo (#PCDATA)>
> postgres$#   <!ENTITY pg "PostgreSQL">
> postgres$# ]>
> postgres$# <foo>Hello &pg;.</foo>
> postgres$# $XML$::xml;
> ERROR:  invalid XML content
> LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
>                ^
> DETAIL:  line 2: StartTag: invalid element name
> <!DOCTYPE foo [
[snip]

> It is disabled by default in libxml2. I found a function
> xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html
> http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault
>
> The default behave should be common for all PostgreSQL's libxml2 based
> function - and then it is different topic - maybe part for PostgreSQL ToDo?
> But I don't remember any user requests related to this issue.


OK, so it's not xmltable specific. Fine by me.

Somebody who cares can deal with it. There's clearly nobody breaking
down the walls wanting the feature.

> I removed this tests - it is not related to XMLTABLE function, but to
> generic XML processing/validation.


Good plan.

>> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
>> instead of a PG_TRY() / PG_CATCH() block?
>
>
> If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you
> want to catch FATAL errors (and when you want to clean shared memory).
> XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.

Ok, makes sense.


>> I don't have the libxml knowledge or remaining brain to usefully
>> evaluate the xpath and xml specifics in xpath.c today. It does strike
>> me that the new xpath parser should probably live in its own file,
>> though.
>
> moved

Thanks.


> new version is attached


Great.

I'm marking this ready for committer at this point.

I think the XML parser likely needs a more close reading, so I'll ping
Peter E to see if he'll have a chance to check that bit out. But by
and large I think the issues have been ironed out - in terms of
functionality, structure and clarity I think it's looking solid.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-27 3:34 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"



>> The docs and tests don't seem to cover XML entities. What's the
>> behaviour there? Core XML only defines one entity, but if a schema
>> defines more how are they processed? The tests need to cover the
>> predefined entities &quot; &amp; &apos; &lt; and &gt; at least.
>
>
> I don't understand, what you are propose here. ?? Please, can you send some
> examples.

Per below - handling of DTD <!ENTITY> declarations, and the builtin
entity tests I already added tests for.


>> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>>
>> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
>> standalone="yes" ?>
>> <!DOCTYPE foo [
>>   <!ELEMENT foo (#PCDATA)>
>>   <!ENTITY pg "PostgreSQL">
>> ]>
>> <foo>Hello &pg;.</foo>
>> $XML$ COLUMNS foo text);
>>
>> + ERROR:  invalid XML content
>> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
>> +                                            ^
>> + DETAIL:  line 2: StartTag: invalid element name
>> + <!DOCTYPE foo [
>> +  ^
>> + line 3: StartTag: invalid element name
>> +   <!ELEMENT foo (#PCDATA)>
>> +    ^
>> + line 4: StartTag: invalid element name
>> +   <!ENTITY pg "PostgreSQL">
>> +    ^
>> + line 6: Entity 'pg' not defined
>> + <foo>Hello &pg;.</foo>
>> +                ^
>>
>
> It is rejected before XMLTABLE function call
>
> postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
> postgres$# <!DOCTYPE foo [
> postgres$#   <!ELEMENT foo (#PCDATA)>
> postgres$#   <!ENTITY pg "PostgreSQL">
> postgres$# ]>
> postgres$# <foo>Hello &pg;.</foo>
> postgres$# $XML$::xml;
> ERROR:  invalid XML content
> LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
>                ^
> DETAIL:  line 2: StartTag: invalid element name
> <!DOCTYPE foo [
[snip]

> It is disabled by default in libxml2. I found a function
> xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html
> http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault
>
> The default behave should be common for all PostgreSQL's libxml2 based
> function - and then it is different topic - maybe part for PostgreSQL ToDo?
> But I don't remember any user requests related to this issue.


OK, so it's not xmltable specific. Fine by me.

Somebody who cares can deal with it. There's clearly nobody breaking
down the walls wanting the feature.

> I removed this tests - it is not related to XMLTABLE function, but to
> generic XML processing/validation.


Good plan.

>> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
>> instead of a PG_TRY() / PG_CATCH() block?
>
>
> If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you
> want to catch FATAL errors (and when you want to clean shared memory).
> XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.

Ok, makes sense.


>> I don't have the libxml knowledge or remaining brain to usefully
>> evaluate the xpath and xml specifics in xpath.c today. It does strike
>> me that the new xpath parser should probably live in its own file,
>> though.
>
> moved

Thanks.


> new version is attached


Great.

I'm marking this ready for committer at this point.

Thank you very much

Regards

Pavel
 

I think the XML parser likely needs a more close reading, so I'll ping
Peter E to see if he'll have a chance to check that bit out. But by
and large I think the issues have been ironed out - in terms of
functionality, structure and clarity I think it's looking solid.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"

This was based on a misunderstanding of something you said earlier. I
thought the idea was to allow this to work:

SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);

... but it doesn't:


SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);
ERROR:  column "elemname" does not exist
LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName);

... so please delete that text. I thought I'd tested it but the state
of my tests dir says I just got distracted by another task at the
wrong time.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-09-27 5:53 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"

This was based on a misunderstanding of something you said earlier. I
thought the idea was to allow this to work:

SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);

... but it doesn't:


SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);
ERROR:  column "elemname" does not exist
LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName);

... so please delete that text. I thought I'd tested it but the state
of my tests dir says I just got distracted by another task at the
wrong time.

deleted

Regards

Pavel

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: patch: function xmltable

От
Michael Paquier
Дата:
On Tue, Sep 27, 2016 at 2:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-09-27 5:53 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
>>
>> [...]
>> ... so please delete that text. I thought I'd tested it but the state
>> of my tests dir says I just got distracted by another task at the
>> wrong time.

Moved patch to next CF with same status: ready for committer.
-- 
Michael



Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

new update - only doc

+    <para>
+     Only XPath query language is supported. PostgreSQL doesn't support XQuery
+     language. Then the syntax of <function>xmltable</function> doesn't
+     allow to use XQuery related functionality - the name of xml expression
+     (clause <literal>AS</literal>) is not allowed, and only one xml expression
+     should be passed to <function>xmltable</function> function as parameter.
+    </para>

Regards

Pavel
Вложения

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
I've been going over this patch.  I think it'd be better to restructure
the <sect2> before adding the docs for this new function; I already
split it out, so don't do anything about this.

Next, looking at struct TableExprBuilder I noticed that the comments are
already obsolete, as they talk about function params that do not exist
(missing_columns) and they fail to mention the ones that do exist.
Also, function member SetContent is not documented at all.  Overall,
these comments do not convey a lot -- apparently, whoever reads them is
already supposed to know how it works: "xyz sets a row generating
filter" doesn't tell me anything.  Since this is API documentation, it
needs to be much clearer.

ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
whatsoever.  Needs fixed.

I wonder if it'd be a good idea to install TableExpr first without the
implementing XMLTABLE, so that it's clearer what is API and what is
implementation.

The number of new keywords in this patch is depressing.  I suppose
there's no way around that -- as I understand, this is caused by the SQL
standard's definition of the syntax for this feature.

Have to go now for a bit -- will continue looking afterwards.  Please
submit delta patches on top of your latest v12 to fix the comments I
mentioned.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

2016-11-17 19:22 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
I've been going over this patch.  I think it'd be better to restructure
the <sect2> before adding the docs for this new function; I already
split it out, so don't do anything about this.

Next, looking at struct TableExprBuilder I noticed that the comments are
already obsolete, as they talk about function params that do not exist
(missing_columns) and they fail to mention the ones that do exist.
Also, function member SetContent is not documented at all.  Overall,
these comments do not convey a lot -- apparently, whoever reads them is
already supposed to know how it works: "xyz sets a row generating
filter" doesn't tell me anything.  Since this is API documentation, it
needs to be much clearer.

ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
whatsoever.  Needs fixed.

I am sending the patch with more comments - but it needs a care someone with good English skills.
 

I wonder if it'd be a good idea to install TableExpr first without the
implementing XMLTABLE, so that it's clearer what is API and what is
implementation.

I am not sure about this step - the API is clean from name. In this moment, for this API is not any other tests than XMLTABLE implementation.
 

The number of new keywords in this patch is depressing.  I suppose
there's no way around that -- as I understand, this is caused by the SQL
standard's definition of the syntax for this feature.

Have to go now for a bit -- will continue looking afterwards.  Please
submit delta patches on top of your latest v12 to fix the comments I
mentioned.


Regards

Pavel
 
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:

> 2016-11-17 19:22 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> 
> > Next, looking at struct TableExprBuilder I noticed that the comments are
> > already obsolete, as they talk about function params that do not exist
> > (missing_columns) and they fail to mention the ones that do exist.
> > Also, function member SetContent is not documented at all.  Overall,
> > these comments do not convey a lot -- apparently, whoever reads them is
> > already supposed to know how it works: "xyz sets a row generating
> > filter" doesn't tell me anything.  Since this is API documentation, it
> > needs to be much clearer.
> >
> > ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
> > whatsoever.  Needs fixed.
> 
> I am sending the patch with more comments - but it needs a care someone
> with good English skills.

Thanks, I can help with that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Alvaro Herrera
Дата:
The SQL standard seems to require a comma after the XMLNAMESPACES
clause:

<XML table> ::= XMLTABLE <left paren> [ <XML namespace declaration> <comma> ]   <XML table row pattern> [ <XML table
argumentlist> ]   COLUMNS <XML table column definitions> <right paren>
 

I don't understand the reason for that, but I have added it:
        | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument ')'            {
   TableExpr *n = makeNode(TableExpr);                n->row_path = $8;                n->expr = $9;
n->cols= NIL;                n->namespaces = $5;                n->location = @1;                $$ = (Node *)n;
   }        | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument COLUMNS
TableExprColList')'            {                TableExpr *n = makeNode(TableExpr);                n->row_path = $8;
           n->expr = $9;                n->cols = $11;                n->namespaces = $5;                n->location =
@1;               $$ = (Node *)n;            }    ;
 

Another thing I did was remove the TableExprColOptionsOpt production; in
its place I added a third rule in TableExprCol for "ColId Typename
IsNotNull" (i.e. no options).  This seems to reduce the size of the
generated gram.c a good dozen kB.


I didn't like much the use of c_expr in all these productions.  As I
understand it, c_expr is mostly an implementation artifact and we should
be using a_expr or b_expr almost everywhere.  I see that XMLEXISTS
already expanded the very limited use of c_expr there was; I would
prefer to fix that one too rather than replicate it here.  TBH I'm not
sure I like that XMLTABLE is re-using xmlexists_argument.  

Actually, is the existing XMLEXISTS production correct?  What I see in
the standard is

<XML table row pattern> ::= <character string literal>

<XML table argument list> ::=  PASSING <XML table argument passing mechanism> <XML query argument>    [ { <comma> <XML
queryargument> }... ]
 

<XML table argument passing mechanism> ::= <XML passing mechanism>

<XML table column definitions> ::= <XML table column definition> [ { <comma> <XML table column definition> }... ]

<XML table column definition> ::=   <XML table ordinality column definition> | <XML table regular column definition>

<XML table ordinality column definition> ::= <column name> FOR ORDINALITY

<XML table regular column definition> ::=   <column name> <data type> [ <XML passing mechanism> ] [ <default clause> ]
[PATH <XML table column pattern> ]
 

<XML table column pattern> ::= <character string literal>

so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>", but what
we have in gram.y is:

/* We allow several variants for SQL and other compatibility. */
xmlexists_argument:        PASSING c_expr            {                $$ = $2;            }        | PASSING c_expr BY
REF           {                $$ = $2;            }        | PASSING BY REF c_expr            {                $$ =
$4;           }        | PASSING BY REF c_expr BY REF            {                $$ = $4;            }    ;
 

I'm not sure why we allow "PASSING c_expr" at all.  Maybe if BY VALUE/BY
REF is not specified, we should just not have PASSING at all?

If we extended this for XMLEXISTS for compatibility with some other
product, perhaps we should look into what that product supports for
XMLTABLE; maybe XMLTABLE does not need all the same options as
XMLEXISTS.

The fourth option seems very dubious to me.  This was added by commit
641459f26, submitted here:
https://www.postgresql.org/message-id/4C0F6DBF.9000001@mlfowler.com

... Hm, actually some perusal of the XMLEXISTS predicate in the standard
shows that it's quite a different thing from XMLTABLE.  Maybe we
shouldn't reuse xmlexists_argument here at all.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-19 0:42 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
The SQL standard seems to require a comma after the XMLNAMESPACES
clause:

<XML table> ::=
  XMLTABLE <left paren>
  [ <XML namespace declaration> <comma> ]
    <XML table row pattern>
  [ <XML table argument list> ]
    COLUMNS <XML table column definitions> <right paren>

I don't understand the reason for that, but I have added it:

                        | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument ')'
                                {
                                        TableExpr *n = makeNode(TableExpr);
                                        n->row_path = $8;
                                        n->expr = $9;
                                        n->cols = NIL;
                                        n->namespaces = $5;
                                        n->location = @1;
                                        $$ = (Node *)n;
                                }
                        | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
                                {
                                        TableExpr *n = makeNode(TableExpr);
                                        n->row_path = $8;
                                        n->expr = $9;
                                        n->cols = $11;
                                        n->namespaces = $5;
                                        n->location = @1;
                                        $$ = (Node *)n;
                                }
                ;


yes, looks my oversight - it is better

 
Another thing I did was remove the TableExprColOptionsOpt production; in
its place I added a third rule in TableExprCol for "ColId Typename
IsNotNull" (i.e. no options).  This seems to reduce the size of the
generated gram.c a good dozen kB.

If I remember well - this was required by better compatibility with Oracle

ANSI SQL: colname type DEFAULT PATH
Oracle: colname PATH DEFAULT

My implementation allows both combinations - there are two reasons: 1. one less issue when people does port from Oracle, 2. almost all examples of XMLTABLE on a net are from Oracle - it can be unfriendly, when these examples would not work on PG - there was discussion about this issue in this mailing list
 


I didn't like much the use of c_expr in all these productions.  As I
understand it, c_expr is mostly an implementation artifact and we should
be using a_expr or b_expr almost everywhere.  I see that XMLEXISTS
already expanded the very limited use of c_expr there was; I would
prefer to fix that one too rather than replicate it here.  TBH I'm not
sure I like that XMLTABLE is re-using xmlexists_argument.

There are two situations: c_expr as document content, and c_expr after DEFAULT and PATH keywords. First probably can be fixed, second not, because "PATH" is unreserved keyword only.
 

Actually, is the existing XMLEXISTS production correct?  What I see in
the standard is

<XML table row pattern> ::= <character string literal>

<XML table argument list> ::=
   PASSING <XML table argument passing mechanism> <XML query argument>
     [ { <comma> <XML query argument> }... ]

<XML table argument passing mechanism> ::= <XML passing mechanism>

<XML table column definitions> ::= <XML table column definition> [ { <comma> <XML table column definition> }... ]

<XML table column definition> ::=
    <XML table ordinality column definition>
  | <XML table regular column definition>

<XML table ordinality column definition> ::= <column name> FOR ORDINALITY

<XML table regular column definition> ::=
    <column name> <data type> [ <XML passing mechanism> ]
  [ <default clause> ]
  [ PATH <XML table column pattern> ]

<XML table column pattern> ::= <character string literal>

so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>", but what
we have in gram.y is:

/* We allow several variants for SQL and other compatibility. */
xmlexists_argument:
                        PASSING c_expr
                                {
                                        $$ = $2;
                                }
                        | PASSING c_expr BY REF
                                {
                                        $$ = $2;
                                }
                        | PASSING BY REF c_expr
                                {
                                        $$ = $4;
                                }
                        | PASSING BY REF c_expr BY REF
                                {
                                        $$ = $4;
                                }
                ;

I'm not sure why we allow "PASSING c_expr" at all.  Maybe if BY VALUE/BY
REF is not specified, we should just not have PASSING at all?
 
If we extended this for XMLEXISTS for compatibility with some other
product, perhaps we should look into what that product supports for
XMLTABLE; maybe XMLTABLE does not need all the same options as
XMLEXISTS.


The reason is a compatibility with other products - DB2. XMLTABLE uses same options like XMLEXISTS. These options has zero value for Postgres, but its are important - compatibility, and workable examples.
 
The fourth option seems very dubious to me.  This was added by commit
641459f26, submitted here:
https://www.postgresql.org/message-id/4C0F6DBF.9000001@mlfowler.com

... Hm, actually some perusal of the XMLEXISTS predicate in the standard
shows that it's quite a different thing from XMLTABLE.  Maybe we
shouldn't reuse xmlexists_argument here at all.

not sure If I understand

Regards

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-19 5:19 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2016-11-19 0:42 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
The SQL standard seems to require a comma after the XMLNAMESPACES
clause:

<XML table> ::=
  XMLTABLE <left paren>
  [ <XML namespace declaration> <comma> ]
    <XML table row pattern>
  [ <XML table argument list> ]
    COLUMNS <XML table column definitions> <right paren>

I don't understand the reason for that, but I have added it:

                        | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument ')'
                                {
                                        TableExpr *n = makeNode(TableExpr);
                                        n->row_path = $8;
                                        n->expr = $9;
                                        n->cols = NIL;
                                        n->namespaces = $5;
                                        n->location = @1;
                                        $$ = (Node *)n;
                                }
                        | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
                                {
                                        TableExpr *n = makeNode(TableExpr);
                                        n->row_path = $8;
                                        n->expr = $9;
                                        n->cols = $11;
                                        n->namespaces = $5;
                                        n->location = @1;
                                        $$ = (Node *)n;
                                }
                ;


yes, looks my oversight - it is better

 
Another thing I did was remove the TableExprColOptionsOpt production; in
its place I added a third rule in TableExprCol for "ColId Typename
IsNotNull" (i.e. no options).  This seems to reduce the size of the
generated gram.c a good dozen kB.

If I remember well - this was required by better compatibility with Oracle

ANSI SQL: colname type DEFAULT PATH
Oracle: colname PATH DEFAULT

My implementation allows both combinations - there are two reasons: 1. one less issue when people does port from Oracle, 2. almost all examples of XMLTABLE on a net are from Oracle - it can be unfriendly, when these examples would not work on PG - there was discussion about this issue in this mailing list
 


I didn't like much the use of c_expr in all these productions.  As I
understand it, c_expr is mostly an implementation artifact and we should
be using a_expr or b_expr almost everywhere.  I see that XMLEXISTS
already expanded the very limited use of c_expr there was; I would
prefer to fix that one too rather than replicate it here.  TBH I'm not
sure I like that XMLTABLE is re-using xmlexists_argument.

There are two situations: c_expr as document content, and c_expr after DEFAULT and PATH keywords. First probably can be fixed, second not, because "PATH" is unreserved keyword only.

It is not possible PASSING is unreserved keyword too.

Regards

Pavel
 

Actually, is the existing XMLEXISTS production correct?  What I see in
the standard is

<XML table row pattern> ::= <character string literal>

<XML table argument list> ::=
   PASSING <XML table argument passing mechanism> <XML query argument>
     [ { <comma> <XML query argument> }... ]

<XML table argument passing mechanism> ::= <XML passing mechanism>

<XML table column definitions> ::= <XML table column definition> [ { <comma> <XML table column definition> }... ]

<XML table column definition> ::=
    <XML table ordinality column definition>
  | <XML table regular column definition>

<XML table ordinality column definition> ::= <column name> FOR ORDINALITY

<XML table regular column definition> ::=
    <column name> <data type> [ <XML passing mechanism> ]
  [ <default clause> ]
  [ PATH <XML table column pattern> ]

<XML table column pattern> ::= <character string literal>

so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>", but what
we have in gram.y is:

/* We allow several variants for SQL and other compatibility. */
xmlexists_argument:
                        PASSING c_expr
                                {
                                        $$ = $2;
                                }
                        | PASSING c_expr BY REF
                                {
                                        $$ = $2;
                                }
                        | PASSING BY REF c_expr
                                {
                                        $$ = $4;
                                }
                        | PASSING BY REF c_expr BY REF
                                {
                                        $$ = $4;
                                }
                ;

I'm not sure why we allow "PASSING c_expr" at all.  Maybe if BY VALUE/BY
REF is not specified, we should just not have PASSING at all?
 
If we extended this for XMLEXISTS for compatibility with some other
product, perhaps we should look into what that product supports for
XMLTABLE; maybe XMLTABLE does not need all the same options as
XMLEXISTS.


The reason is a compatibility with other products - DB2. XMLTABLE uses same options like XMLEXISTS. These options has zero value for Postgres, but its are important - compatibility, and workable examples.
 
The fourth option seems very dubious to me.  This was added by commit
641459f26, submitted here:
https://www.postgresql.org/message-id/4C0F6DBF.9000001@mlfowler.com

... Hm, actually some perusal of the XMLEXISTS predicate in the standard
shows that it's quite a different thing from XMLTABLE.  Maybe we
shouldn't reuse xmlexists_argument here at all.

not sure If I understand

Regards

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Something I just noticed is that transformTableExpr takes a TableExpr
node and returns another TableExpr node.  That's unlike what we do in
other places, where the node returned is of a different type than the
input node.  I'm not real clear what happens if you try to re-transform
a node that was already transformed, but it seems worth thinking about.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Something I just noticed is that transformTableExpr takes a TableExpr
> node and returns another TableExpr node.  That's unlike what we do in
> other places, where the node returned is of a different type than the
> input node.  I'm not real clear what happens if you try to re-transform
> a node that was already transformed, but it seems worth thinking about.

We're not 100% consistent on that --- there are cases such as RowExpr
and CaseExpr where the same struct type is used for pre-parse-analysis
and post-parse-analysis nodes.  I think it's okay as long as the
information content isn't markedly different, ie the transformation
just consists of transforming all the sub-nodes.

Being able to behave sanely on a re-transformation used to be an
issue, but we no longer expect transformExpr to support that.
        regards, tom lane



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-21 21:16 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Something I just noticed is that transformTableExpr takes a TableExpr
> node and returns another TableExpr node.  That's unlike what we do in
> other places, where the node returned is of a different type than the
> input node.  I'm not real clear what happens if you try to re-transform
> a node that was already transformed, but it seems worth thinking about.

We're not 100% consistent on that --- there are cases such as RowExpr
and CaseExpr where the same struct type is used for pre-parse-analysis
and post-parse-analysis nodes.  I think it's okay as long as the
information content isn't markedly different, ie the transformation
just consists of transforming all the sub-nodes.

Being able to behave sanely on a re-transformation used to be an
issue, but we no longer expect transformExpr to support that.

I was not sure in this case - using new node was more clear for me - safeguard against some uninitialized or untransformed value. There in only few bytes memory more overhead.

regards

Pavel
 

                        regards, tom lane

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
I found the whole TableExprGetTupleDesc() function a bit odd in
nodeFuncs.c, so I renamed it to ExecTypeFromTableExpr() and moved it to
execTuples.c -- but only because that's where ExecTypeFromTL and others
already live.  I would have liked to move it to tupdesc.c instead, but
it requires knowledge of executor nodes, which is probably the reason
that ExecTypeFromTL is in execTuples.  I think we'd eat that bit of
ugliness only because we're not the first.  But anyway I quickly ran
into another problem.

I noticed that ExecTypeFromTableExpr is being called from the transform
phase, which is much earlier than the executor.  I noticed because of
the warning that the above movement added to nodeFuncs.c,
src/backend/nodes/nodeFuncs.c:509:5: warning: implicit declaration of function 'ExecTypeFromTableExpr'
[-Wimplicit-function-declaration]

so I thought, hm, is it okay to have parse analysis run an executor
function?  (I suppose this is the reason you put it in nodeFuncs in the
first place).  For fun, I tried this query under GDB, with a breakpoint
on exprTypmod():

SELECT X.* 
FROM emp, 
XMLTABLE ('//depts/dept/employee' passing doc   COLUMNS   empID    INTEGER     PATH '@id',  firstname    int     PATH
'name/first', lastname     VARCHAR(25)     PATH 'name/last') AS X;
 

and sure enough, the type is resolved during parse analysis:

Breakpoint 1, exprTypmod (expr=expr@entry=0x1d23ad8)   at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283             if (!expr)
(gdb) print *expr
$2 = {type = T_TableExpr}
(gdb) bt
#0  exprTypmod (expr=expr@entry=0x1d23ad8)   at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1  0x000000000080c500 in get_expr_result_type (expr=0x1d23ad8,    resultTypeId=0x7ffd482bfdb4,
resultTupleDesc=0x7ffd482bfdb8)  at /pgsql/source/master/src/backend/utils/fmgr/funcapi.c:247
 
#2  0x000000000056de1b in expandRTE (rte=rte@entry=0x1d6b800, rtindex=2,    sublevels_up=0, location=location@entry=7,
 include_dropped=include_dropped@entry=0 '\000',    colnames=colnames@entry=0x7ffd482bfe10, colvars=0x7ffd482bfe18)
at/pgsql/source/master/src/backend/parser/parse_relation.c:2052
 
#3  0x000000000056e131 in expandRelAttrs (pstate=pstate@entry=0x1d238a8,    rte=rte@entry=0x1d6b800, rtindex=<optimized
out>,   sublevels_up=<optimized out>, location=location@entry=7)   at
/pgsql/source/master/src/backend/parser/parse_relation.c:2435
#4  0x000000000056fa64 in ExpandSingleTable (pstate=pstate@entry=0x1d238a8,    rte=rte@entry=0x1d6b800, location=7,
make_target_entry=make_target_entry@entry=1'\001')   at /pgsql/source/master/src/backend/parser/parse_target.c:1266
 
#5  0x000000000057135b in ExpandColumnRefStar (pstate=pstate@entry=0x1d238a8,    cref=0x1d22720,
make_target_entry=make_target_entry@entry=1'\001')   at /pgsql/source/master/src/backend/parser/parse_target.c:1158
 
#6  0x00000000005716f9 in transformTargetList (pstate=0x1d238a8,    targetlist=<optimized out>,
exprKind=EXPR_KIND_SELECT_TARGET)

This seems fine I guess, and it seems to say that we ought to move the
code that generates the tupdesc to back parse analysis rather than
executor.  Okay, fine.  But let's find a better place than nodeFuncs.

But if I move the XMLTABLE() call to the target list instead, the type
is resolved at planner time:

SELECT  
XMLTABLE ('/dept/employee' passing $$<dept bldg="114">    <employee id="903">       <name>
<first>Mary</first>          <last>Jones</last>       </name>       <office>415</office>
<phone>905-403-6112</phone>      <phone>647-504-4546</phone>       <salary currency="USD">64000</salary>   </employee>
 
</dept>$$  COLUMNS   empID    INTEGER     PATH '@id',  firstname    varchar(4)     PATH 'name/first',  lastname
VARCHAR(25)    PATH 'name/last') AS X;
 

Breakpoint 1, exprTypmod (expr=expr@entry=0x1d6bed8)   at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283             if (!expr)
(gdb) bt
#0  exprTypmod (expr=expr@entry=0x1d6bed8)   at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1  0x0000000000654058 in set_pathtarget_cost_width (root=0x1d6bc68,    target=0x1d6c728)   at
/pgsql/source/master/src/backend/optimizer/path/costsize.c:4729
#2  0x000000000066c197 in grouping_planner (root=0x1d6bc68,    inheritance_update=40 '(', inheritance_update@entry=0
'\000',   tuple_fraction=0.01, tuple_fraction@entry=0)   at
/pgsql/source/master/src/backend/optimizer/plan/planner.c:1745
#3  0x000000000066ef64 in subquery_planner (glob=glob@entry=0x1d6bbd0,    parse=parse@entry=0x1d23818,
parent_root=parent_root@entry=0x0,   hasRecursion=hasRecursion@entry=0 '\000',
tuple_fraction=tuple_fraction@entry=0)  at /pgsql/source/master/src/backend/optimizer/plan/planner.c:795
 
#4  0x000000000066fe5e in standard_planner (parse=0x1d23818,    cursorOptions=256, boundParams=<optimized out>)   at
/pgsql/source/master/src/backend/optimizer/plan/planner.c:307

This is surprising, but I'm not sure it's wrong.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-22 21:47 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
I found the whole TableExprGetTupleDesc() function a bit odd in
nodeFuncs.c, so I renamed it to ExecTypeFromTableExpr() and moved it to
execTuples.c -- but only because that's where ExecTypeFromTL and others
already live.  I would have liked to move it to tupdesc.c instead, but
it requires knowledge of executor nodes, which is probably the reason
that ExecTypeFromTL is in execTuples.  I think we'd eat that bit of
ugliness only because we're not the first.  But anyway I quickly ran
into another problem.

I noticed that ExecTypeFromTableExpr is being called from the transform
phase, which is much earlier than the executor.  I noticed because of
the warning that the above movement added to nodeFuncs.c,
src/backend/nodes/nodeFuncs.c:509:5: warning: implicit declaration of function 'ExecTypeFromTableExpr' [-Wimplicit-function-declaration]

The tuple descriptor should not be serialized.

When xmltable is called directly, then living tuple descriptor is used - created in transform time. Another situation is when xmltable is used from view, where transform time is skipped.

Originally I serialized generated type - but I had the problems with record types - the current infrastructure expects serialization only real types.

My solution is a recheck of tuple descriptor in executor time. It is small overhead - once per query - but it allows use xmltable from views without  necessity to specify returned columns explicitly.
 

so I thought, hm, is it okay to have parse analysis run an executor
function?  (I suppose this is the reason you put it in nodeFuncs in the
first place).  For fun, I tried this query under GDB, with a breakpoint
on exprTypmod():

SELECT X.*
FROM emp,
XMLTABLE ('//depts/dept/employee' passing doc
   COLUMNS
   empID    INTEGER     PATH '@id',
   firstname    int     PATH 'name/first',
   lastname     VARCHAR(25)     PATH 'name/last') AS X;

and sure enough, the type is resolved during parse analysis:

Breakpoint 1, exprTypmod (expr=expr@entry=0x1d23ad8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283             if (!expr)
(gdb) print *expr
$2 = {type = T_TableExpr}
(gdb) bt
#0  exprTypmod (expr=expr@entry=0x1d23ad8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1  0x000000000080c500 in get_expr_result_type (expr=0x1d23ad8,
    resultTypeId=0x7ffd482bfdb4, resultTupleDesc=0x7ffd482bfdb8)
    at /pgsql/source/master/src/backend/utils/fmgr/funcapi.c:247
#2  0x000000000056de1b in expandRTE (rte=rte@entry=0x1d6b800, rtindex=2,
    sublevels_up=0, location=location@entry=7,
    include_dropped=include_dropped@entry=0 '\000',
    colnames=colnames@entry=0x7ffd482bfe10, colvars=0x7ffd482bfe18)
    at /pgsql/source/master/src/backend/parser/parse_relation.c:2052
#3  0x000000000056e131 in expandRelAttrs (pstate=pstate@entry=0x1d238a8,
    rte=rte@entry=0x1d6b800, rtindex=<optimized out>,
    sublevels_up=<optimized out>, location=location@entry=7)
    at /pgsql/source/master/src/backend/parser/parse_relation.c:2435
#4  0x000000000056fa64 in ExpandSingleTable (pstate=pstate@entry=0x1d238a8,
    rte=rte@entry=0x1d6b800, location=7,
    make_target_entry=make_target_entry@entry=1 '\001')
    at /pgsql/source/master/src/backend/parser/parse_target.c:1266
#5  0x000000000057135b in ExpandColumnRefStar (pstate=pstate@entry=0x1d238a8,
    cref=0x1d22720, make_target_entry=make_target_entry@entry=1 '\001')
    at /pgsql/source/master/src/backend/parser/parse_target.c:1158
#6  0x00000000005716f9 in transformTargetList (pstate=0x1d238a8,
    targetlist=<optimized out>, exprKind=EXPR_KIND_SELECT_TARGET)

This seems fine I guess, and it seems to say that we ought to move the
code that generates the tupdesc to back parse analysis rather than
executor.  Okay, fine.  But let's find a better place than nodeFuncs.

But if I move the XMLTABLE() call to the target list instead, the type
is resolved at planner time:

SELECT
XMLTABLE ('/dept/employee' passing $$<dept bldg="114">
    <employee id="903">
        <name>
            <first>Mary</first>
            <last>Jones</last>
        </name>
        <office>415</office>
        <phone>905-403-6112</phone>
        <phone>647-504-4546</phone>
        <salary currency="USD">64000</salary>
    </employee>
</dept>$$
   COLUMNS
   empID    INTEGER     PATH '@id',
   firstname    varchar(4)     PATH 'name/first',
   lastname     VARCHAR(25)     PATH 'name/last') AS X;

Breakpoint 1, exprTypmod (expr=expr@entry=0x1d6bed8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283             if (!expr)
(gdb) bt
#0  exprTypmod (expr=expr@entry=0x1d6bed8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1  0x0000000000654058 in set_pathtarget_cost_width (root=0x1d6bc68,
    target=0x1d6c728)
    at /pgsql/source/master/src/backend/optimizer/path/costsize.c:4729
#2  0x000000000066c197 in grouping_planner (root=0x1d6bc68,
    inheritance_update=40 '(', inheritance_update@entry=0 '\000',
    tuple_fraction=0.01, tuple_fraction@entry=0)
    at /pgsql/source/master/src/backend/optimizer/plan/planner.c:1745
#3  0x000000000066ef64 in subquery_planner (glob=glob@entry=0x1d6bbd0,
    parse=parse@entry=0x1d23818, parent_root=parent_root@entry=0x0,
    hasRecursion=hasRecursion@entry=0 '\000',
    tuple_fraction=tuple_fraction@entry=0)
    at /pgsql/source/master/src/backend/optimizer/plan/planner.c:795
#4  0x000000000066fe5e in standard_planner (parse=0x1d23818,
    cursorOptions=256, boundParams=<optimized out>)
    at /pgsql/source/master/src/backend/optimizer/plan/planner.c:307

This is surprising, but I'm not sure it's wrong.

There are different processing for Set Returning nodes called from paramlist and from tablelist.  In last case the invokes exprTypmod early.

There is a different case, that you didn't check

CREATE VIEW x AS SELECT xmltable(...)
CREATE VIEW x1 AS SELECT * FROM xmltable(...)

close session

and in new session
SELECT * FROM x;
SELECT * FROM x1;

Regards

Pavel


 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
I tried to see if a following RTE was able to "see" the entries created by
XMLTABLE, and sure it can:

SELECT X.*, generate_series 
FROM emp, 
XMLTABLE ('//depts/dept/employee' passing doc   COLUMNS   empID    INTEGER     PATH '@id',  firstname    varchar(25)
PATH 'name/first',  lastname     VARCHAR(25)     PATH 'name/last') AS X,
 
generate_series(900, empid);
empid │ firstname │ lastname │ generate_series 
───────┼───────────┼──────────┼─────────────────  901 │ John      │ Doe      │             900  901 │ John      │ Doe
  │             901  902 │ Peter     │ Pan      │             900  902 │ Peter     │ Pan      │             901  902 │
Peter    │ Pan      │             902  903 │ Mary      │ Jones    │             900  903 │ Mary      │ Jones    │
     901  903 │ Mary      │ Jones    │             902  903 │ Mary      │ Jones    │             903
 
(9 filas)

Cool.

I'm still wondering how this works.  I'll continue to explore the patch
in order to figure out.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Alvaro Herrera wrote:

> I'm still wondering how this works.  I'll continue to explore the patch
> in order to figure out.

Ah, so it's already parsed as a "range function".  That part looks good
to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Oh my, I just noticed we have a new xpath preprocessor in this patch
too.  Where did this code come from -- did you write it all from
scratch?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Here's another version.  Not there yet: need to move back the function
to create the tupdesc, as discussed.  Not clear what's the best place,
however.  I modified the grammar a bit (added the missing comma, removed
PATH as an unreserved keyword and just used IDENT, removed the "Opt"
version for column options), and reworked the comments in the transform
phase (I tweaked the code here and there mostly to move things to nicer
places, but it's pretty much the same code).

In the new xpath_parser.c file I think we should tidy things up a bit.
First, it needs more commentary on what the entry function actually
does, in detail.  Also, IMO that function should be at the top of the
file, not at the bottom, followed by all its helpers.  I would like some
more clarity on the provenance of all this code, just to assess the
probability of bugs; mostly as it's completely undocumented.

I don't like the docs either.  I think we should have a complete
reference to the syntax, followed by examples, rather than letting the
examples drive the whole thing.  I fixed the synopsis so that it's not
one very long line.

If you use "PATH '/'" for a column, you get the text for all the entries
in the whole XML, rather than the text for the particular row being
processed.  Isn't that rather weird, or to put it differently, completely
wrong?  I didn't find a way to obtain the whole XML row when you have
the COLUMNS option (which is what I was hoping for with the "PATH '/'").

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Sorry, here's the patch.  Power loss distracted me here.

By the way, the pgindent you did is slightly confused because you failed
to add the new struct types you define to typedefs.list.

I have not looked at the new xml.c code at all, yet.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Andrew Dunstan
Дата:

On 11/23/2016 06:31 PM, Alvaro Herrera wrote:
> Sorry, here's the patch.  Power loss distracted me here.
>
> By the way, the pgindent you did is slightly confused because you failed
> to add the new struct types you define to typedefs.list.


Tips on how to use pgindent for developers:

<http://adpgtech.blogspot.com/2015/05/running-pgindent-on-non-core-code-or.html>

cheers

andrew




Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Alvaro Herrera wrote:

> If you use "PATH '/'" for a column, you get the text for all the entries
> in the whole XML, rather than the text for the particular row being
> processed.  Isn't that rather weird, or to put it differently, completely
> wrong?  I didn't find a way to obtain the whole XML row when you have
> the COLUMNS option (which is what I was hoping for with the "PATH '/'").

Ah, apparently you need to use type XML for that column in order for
this to happen.  Example:

insert into emp values ($$
<depts ><dept bldg="102">    <employee id="905">        <name>            <first>John</first>
<last>Doew</last>       </name>                                              <office>344</office>
                                                                       <salary currency="USD">55000</salary>
</employee>
    <employee id="908">        <name>            <first>Peter</first>            <last>Panw</last>        </name>
<office>216</office>        <phone>905-416-5004</phone>    </employee></dept>
 
<dept bldg="115">    <employee id="909">        <name>            <first>Mary</first>            <last>Jonesw</last>
   </name>        <office>415</office>        <phone>905-403-6112</phone>        <phone>647-504-4546</phone>
<salarycurrency="USD">64000</salary>    </employee></dept>
 
</depts>
$$);

Note the weird salary_amount value here:

SELECT x.*
FROM emp, 
XMLTABLE ('//depts/dept/employee' passing doc     COLUMNS        i for ordinality,       empID    int     PATH '@id',
   firstname    varchar(25)     PATH 'name/first' default 'FOOBAR',       lastname     VARCHAR(25)     PATH
'name/last',      salary xml path 'concat(salary/text(), salary/@currency)' default 'DONT KNOW', salary_amount xml path
'/')  WITH ORDINALITY  AS X (i, a, b, c)  limit 1;i │  a  │  b   │  c   │  salary  │     salary_amount     │ ordinality

───┼─────┼──────┼──────┼──────────┼───────────────────────┼────────────1 │ 905 │ John │ Doew │ 55000USD │
      ↵│          1  │     │      │      │          │                      ↵│   │     │      │      │          │
             ↵│   │     │      │      │          │              John    ↵│   │     │      │      │          │
  Doew    ↵│   │     │      │      │          │                      ↵│   │     │      │      │          │          344
       ↵│   │     │      │      │          │          55000       ↵│   │     │      │      │          │
    ↵│   │     │      │      │          │                      ↵│   │     │      │      │          │
 ↵│   │     │      │      │          │                      ↵│   │     │      │      │          │              Peter
↵│  │     │      │      │          │              Panw    ↵│   │     │      │      │          │                      ↵│
 │     │      │      │          │          216         ↵│   │     │      │      │          │          905-416-5004↵│
│    │      │      │          │                      ↵│   │     │      │      │          │                      ↵│   │
  │      │      │          │                      ↵│   │     │      │      │          │                      ↵│   │
│     │      │          │                      ↵│   │     │      │      │          │                      ↵│   │     │
   │      │          │              Mary    ↵│   │     │      │      │          │              Jonesw  ↵│   │     │
│      │          │                      ↵│   │     │      │      │          │          415         ↵│   │     │      │
    │          │          905-403-6112↵│   │     │      │      │          │          647-504-4546↵│   │     │      │
 │          │          64000       ↵│   │     │      │      │          │                      ↵│   │     │      │
│         │                      ↵│   │     │      │      │          │                       │ 
 
(1 fila)


If you declare salary_amount to be text instead, it doesn't happen anymore.
Apparently if you put it in a namespace, it doesn't hapen either.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Oh my, I just noticed we have a new xpath preprocessor in this patch
too.  Where did this code come from -- did you write it all from
scratch?

I wrote it from scratch - libxml2 has not any API for iteration over XPath expression (different than iteration over XPath expression result), and what I have info, there will not be any new API in libxml2.

There are two purposes:

Safe manipulation with XPath expression prefixes - ANSI SQL design implicitly expects some prefix, but it can be used manually. The prefix should not be used twice and in some situations, when it can breaks the expression.

Second goal is support default namespaces - when we needed parser for first task, then the enhancing for this task was not too much lines more.

This parser can be used for enhancing current XPath function - default namespaces are pretty nice, when you have to use namespaces.
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-24 0:29 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Here's another version.  Not there yet: need to move back the function
to create the tupdesc, as discussed.  Not clear what's the best place,
however.  I modified the grammar a bit (added the missing comma, removed
PATH as an unreserved keyword and just used IDENT, removed the "Opt"
version for column options), and reworked the comments in the transform
phase (I tweaked the code here and there mostly to move things to nicer
places, but it's pretty much the same code).

In the new xpath_parser.c file I think we should tidy things up a bit.
First, it needs more commentary on what the entry function actually
does, in detail.  Also, IMO that function should be at the top of the
file, not at the bottom, followed by all its helpers.  I would like some
more clarity on the provenance of all this code, just to assess the
probability of bugs; mostly as it's completely undocumented.

I don't like the docs either.  I think we should have a complete
reference to the syntax, followed by examples, rather than letting the
examples drive the whole thing.  I fixed the synopsis so that it's not
one very long line.

If you use "PATH '/'" for a column, you get the text for all the entries
in the whole XML, rather than the text for the particular row being
processed.  Isn't that rather weird, or to put it differently, completely
wrong?  I didn't find a way to obtain the whole XML row when you have
the COLUMNS option (which is what I was hoping for with the "PATH '/'").

This is a libxml2 behave

Postprocessing only check result and try to push the result to expected types.

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-24 5:52 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Oh my, I just noticed we have a new xpath preprocessor in this patch
too.  Where did this code come from -- did you write it all from
scratch?

I wrote it from scratch - libxml2 has not any API for iteration over XPath expression (different than iteration over XPath expression result), and what I have info, there will not be any new API in libxml2.

There are two purposes:

Safe manipulation with XPath expression prefixes - ANSI SQL design implicitly expects some prefix, but it can be used manually. The prefix should not be used twice and in some situations, when it can breaks the expression.

Implicit prefix for column PATH expressions is "./". An user can use it explicitly.

In my initial patches, the manipulations with XPath expression was more complex - now, it can be reduced - but then we lost default namespaces support, what is nice feature, supported other providers.


 

Second goal is support default namespaces - when we needed parser for first task, then the enhancing for this task was not too much lines more.

This parser can be used for enhancing current XPath function - default namespaces are pretty nice, when you have to use namespaces.
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:
> Hi
> 
> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> 
> > Oh my, I just noticed we have a new xpath preprocessor in this patch
> > too.  Where did this code come from -- did you write it all from
> > scratch?
> 
> I wrote it from scratch - libxml2 has not any API for iteration over XPath
> expression (different than iteration over XPath expression result), and
> what I have info, there will not be any new API in libxml2.

Okay, I agree that the default namespace stuff looks worthwhile in the
long run.  But I don't have enough time to review the xpath parser stuff
in the current commitfest, and I think it needs at the very least a lot
of additional code commentary.

However I think the rest of it can reasonably go in -- I mean the SQL
parse of it, analysis, executor.  Let me propose this: you split the
patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT,
and we introduce just the TableExpr stuff plus the XMLTABLE function.  I
can commit that part in the current commitfest, and we leave the
xpath_parser plus associated features for the upcoming commitfest.

Deal?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-24 18:29 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>
> > Oh my, I just noticed we have a new xpath preprocessor in this patch
> > too.  Where did this code come from -- did you write it all from
> > scratch?
>
> I wrote it from scratch - libxml2 has not any API for iteration over XPath
> expression (different than iteration over XPath expression result), and
> what I have info, there will not be any new API in libxml2.

Okay, I agree that the default namespace stuff looks worthwhile in the
long run.  But I don't have enough time to review the xpath parser stuff
in the current commitfest, and I think it needs at the very least a lot
of additional code commentary.

However I think the rest of it can reasonably go in -- I mean the SQL
parse of it, analysis, executor.  Let me propose this: you split the
patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT,
and we introduce just the TableExpr stuff plus the XMLTABLE function.  I
can commit that part in the current commitfest, and we leave the
xpath_parser plus associated features for the upcoming commitfest. 

Deal?

ok

can me send your last work?

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:

> can me send your last work?

Sure, it's in the archives --
https://www.postgresql.org/message-id/20161123233130.oqf7jl6czehy5fiw@alvherre.pgsql

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Pavel Stehule wrote:
>> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>>> Oh my, I just noticed we have a new xpath preprocessor in this patch
>>> too.  Where did this code come from -- did you write it all from
>>> scratch?

>> I wrote it from scratch - libxml2 has not any API for iteration over XPath
>> expression (different than iteration over XPath expression result), and
>> what I have info, there will not be any new API in libxml2.

> Okay, I agree that the default namespace stuff looks worthwhile in the
> long run.  But I don't have enough time to review the xpath parser stuff
> in the current commitfest, and I think it needs at the very least a lot
> of additional code commentary.

contrib/xml2 has always relied on libxslt for xpath functionality.
Can we do that here instead of writing, debugging, and documenting
a pile of new code?
        regards, tom lane



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-24 18:51 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Pavel Stehule wrote:
>> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>>> Oh my, I just noticed we have a new xpath preprocessor in this patch
>>> too.  Where did this code come from -- did you write it all from
>>> scratch?

>> I wrote it from scratch - libxml2 has not any API for iteration over XPath
>> expression (different than iteration over XPath expression result), and
>> what I have info, there will not be any new API in libxml2.

> Okay, I agree that the default namespace stuff looks worthwhile in the
> long run.  But I don't have enough time to review the xpath parser stuff
> in the current commitfest, and I think it needs at the very least a lot
> of additional code commentary.

contrib/xml2 has always relied on libxslt for xpath functionality.
Can we do that here instead of writing, debugging, and documenting
a pile of new code?

I am sorry - I don't see it. There is nothing complex manipulation with XPath expressions.

Regards

Pavel
 

                        regards, tom lane

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-24 18:41 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> can me send your last work?

Sure, it's in the archives --
https://www.postgresql.org/message-id/20161123233130.oqf7jl6czehy5fiw@alvherre.pgsql

Here is updated patch without default namespace support (and without XPath expression transformation).

Due last changes in parser https://github.com/postgres/postgres/commit/906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd I had to use c_expr on other positions ( xmlnamespace definition).

I don't think it is limit - in 99% there will be const literal.

Regards

Pavel


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Michael Paquier
Дата:
On Fri, Nov 25, 2016 at 3:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-11-24 18:51 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> contrib/xml2 has always relied on libxslt for xpath functionality.
>> Can we do that here instead of writing, debugging, and documenting
>> a pile of new code?
>
> I am sorry - I don't see it. There is nothing complex manipulation with
> XPath expressions.

You are missing the point here, which is to make the implementation
footprint as light as possible, especially if the added functionality
is already present in a dependency that Postgres can be linked to. OK,
libxslt can only be linked with contrib/xml2/ now, but it would be at
least worth looking at how much the current patch can be simplified
for things like transformTableExpr or XmlTableGetValue by relying on
some existing routines. Nit: I did not look at the patch in details,
but I find the size of the latest version sent, 167kB, scary as it
complicates review and increases the likeliness of bugs.
-- 
Michael



Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Michael Paquier wrote:

> Nit: I did not look at the patch in details,
> but I find the size of the latest version sent, 167kB, scary as it
> complicates review and increases the likeliness of bugs.

Here's the stat.  Note that removing the functionality as discussed
would remove all of xpath_parser.c but I think the rest of it remains
pretty much unchanged.  So it's clearly a large patch, but there are
large docs and tests too, not just code.
doc/src/sgml/func.sgml               | 376 ++++++++++++++++++---src/backend/executor/execQual.c      | 335
+++++++++++++++++++src/backend/executor/execTuples.c   |  42 +++src/backend/nodes/copyfuncs.c        |  66
++++src/backend/nodes/equalfuncs.c      |  51 +++src/backend/nodes/nodeFuncs.c        | 100
++++++src/backend/nodes/outfuncs.c        |  51 +++src/backend/nodes/readfuncs.c        |  42
+++src/backend/optimizer/util/clauses.c|  33 ++src/backend/parser/gram.y            | 181
++++++++++-src/backend/parser/parse_coerce.c   |  33 +-src/backend/parser/parse_expr.c      | 182
+++++++++++src/backend/parser/parse_target.c   |   7 +src/backend/utils/adt/Makefile       |   2
+-src/backend/utils/adt/ruleutils.c   | 100 ++++++src/backend/utils/adt/xml.c          | 610
+++++++++++++++++++++++++++++++++++src/backend/utils/adt/xpath_parser.c| 337
+++++++++++++++++++src/backend/utils/fmgr/funcapi.c    |  13 +src/include/executor/executor.h      |   1
+src/include/executor/tableexpr.h    |  69 ++++src/include/funcapi.h                |   1
-src/include/nodes/execnodes.h       |  31 ++src/include/nodes/nodes.h            |   4 +src/include/nodes/parsenodes.h
     |  21 ++src/include/nodes/primnodes.h        |  40 +++src/include/parser/kwlist.h          |   3
+src/include/parser/parse_coerce.h   |   4 +src/include/utils/xml.h              |   2
+src/include/utils/xpath_parser.h    |  24 ++src/test/regress/expected/xml.out    | 415
++++++++++++++++++++++++src/test/regress/expected/xml_1.out | 323
+++++++++++++++++++src/test/regress/expected/xml_2.out | 414 ++++++++++++++++++++++++src/test/regress/sql/xml.sql
 | 170 ++++++++++33 files changed, 4019 insertions(+), 64 deletions(-)
 

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-25 3:31 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Michael Paquier wrote:

> Nit: I did not look at the patch in details,
> but I find the size of the latest version sent, 167kB, scary as it
> complicates review and increases the likeliness of bugs.

Here's the stat.  Note that removing the functionality as discussed
would remove all of xpath_parser.c but I think the rest of it remains
pretty much unchanged.  So it's clearly a large patch, but there are
large docs and tests too, not just code.

yes, lot of is regress tests (expected part is 3x) - and XMLTABLE function is not trivial.

lot of code is mechanical - nodes related. The really complex part is only in xml.c. There is one longer function only - the complexity is based on mapping libxml2 result to PostgreSQL result (with catching exceptions due releasing libxml2 sources).

The all changes are well isolated -  there is less risk to break some other.
 

 doc/src/sgml/func.sgml               | 376 ++++++++++++++++++---
 src/backend/executor/execQual.c      | 335 +++++++++++++++++++
 src/backend/executor/execTuples.c    |  42 +++
 src/backend/nodes/copyfuncs.c        |  66 ++++
 src/backend/nodes/equalfuncs.c       |  51 +++
 src/backend/nodes/nodeFuncs.c        | 100 ++++++
 src/backend/nodes/outfuncs.c         |  51 +++
 src/backend/nodes/readfuncs.c        |  42 +++
 src/backend/optimizer/util/clauses.c |  33 ++
 src/backend/parser/gram.y            | 181 ++++++++++-
 src/backend/parser/parse_coerce.c    |  33 +-
 src/backend/parser/parse_expr.c      | 182 +++++++++++
 src/backend/parser/parse_target.c    |   7 +
 src/backend/utils/adt/Makefile       |   2 +-
 src/backend/utils/adt/ruleutils.c    | 100 ++++++
 src/backend/utils/adt/xml.c          | 610 +++++++++++++++++++++++++++++++++++
 src/backend/utils/adt/xpath_parser.c | 337 +++++++++++++++++++
 src/backend/utils/fmgr/funcapi.c     |  13 +
 src/include/executor/executor.h      |   1 +
 src/include/executor/tableexpr.h     |  69 ++++
 src/include/funcapi.h                |   1 -
 src/include/nodes/execnodes.h        |  31 ++
 src/include/nodes/nodes.h            |   4 +
 src/include/nodes/parsenodes.h       |  21 ++
 src/include/nodes/primnodes.h        |  40 +++
 src/include/parser/kwlist.h          |   3 +
 src/include/parser/parse_coerce.h    |   4 +
 src/include/utils/xml.h              |   2 +
 src/include/utils/xpath_parser.h     |  24 ++
 src/test/regress/expected/xml.out    | 415 ++++++++++++++++++++++++
 src/test/regress/expected/xml_1.out  | 323 +++++++++++++++++++
 src/test/regress/expected/xml_2.out  | 414 ++++++++++++++++++++++++
 src/test/regress/sql/xml.sql         | 170 ++++++++++
 33 files changed, 4019 insertions(+), 64 deletions(-)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-25 7:44 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2016-11-25 3:31 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Michael Paquier wrote:

> Nit: I did not look at the patch in details,
> but I find the size of the latest version sent, 167kB, scary as it
> complicates review and increases the likeliness of bugs.

Here's the stat.  Note that removing the functionality as discussed
would remove all of xpath_parser.c but I think the rest of it remains
pretty much unchanged.  So it's clearly a large patch, but there are
large docs and tests too, not just code.

yes, lot of is regress tests (expected part is 3x) - and XMLTABLE function is not trivial.

regress tests are about 50%

 

lot of code is mechanical - nodes related. The really complex part is only in xml.c. There is one longer function only - the complexity is based on mapping libxml2 result to PostgreSQL result (with catching exceptions due releasing libxml2 sources).

The all changes are well isolated -  there is less risk to break some other.
 

 doc/src/sgml/func.sgml               | 376 ++++++++++++++++++---
 src/backend/executor/execQual.c      | 335 +++++++++++++++++++
 src/backend/executor/execTuples.c    |  42 +++
 src/backend/nodes/copyfuncs.c        |  66 ++++
 src/backend/nodes/equalfuncs.c       |  51 +++
 src/backend/nodes/nodeFuncs.c        | 100 ++++++
 src/backend/nodes/outfuncs.c         |  51 +++
 src/backend/nodes/readfuncs.c        |  42 +++
 src/backend/optimizer/util/clauses.c |  33 ++
 src/backend/parser/gram.y            | 181 ++++++++++-
 src/backend/parser/parse_coerce.c    |  33 +-
 src/backend/parser/parse_expr.c      | 182 +++++++++++
 src/backend/parser/parse_target.c    |   7 +
 src/backend/utils/adt/Makefile       |   2 +-
 src/backend/utils/adt/ruleutils.c    | 100 ++++++
 src/backend/utils/adt/xml.c          | 610 +++++++++++++++++++++++++++++++++++
 src/backend/utils/adt/xpath_parser.c | 337 +++++++++++++++++++
 src/backend/utils/fmgr/funcapi.c     |  13 +
 src/include/executor/executor.h      |   1 +
 src/include/executor/tableexpr.h     |  69 ++++
 src/include/funcapi.h                |   1 -
 src/include/nodes/execnodes.h        |  31 ++
 src/include/nodes/nodes.h            |   4 +
 src/include/nodes/parsenodes.h       |  21 ++
 src/include/nodes/primnodes.h        |  40 +++
 src/include/parser/kwlist.h          |   3 +
 src/include/parser/parse_coerce.h    |   4 +
 src/include/utils/xml.h              |   2 +
 src/include/utils/xpath_parser.h     |  24 ++
 src/test/regress/expected/xml.out    | 415 ++++++++++++++++++++++++
 src/test/regress/expected/xml_1.out  | 323 +++++++++++++++++++
 src/test/regress/expected/xml_2.out  | 414 ++++++++++++++++++++++++
 src/test/regress/sql/xml.sql         | 170 ++++++++++
 33 files changed, 4019 insertions(+), 64 deletions(-)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:

> Here is updated patch without default namespace support (and without XPath
> expression transformation).
> 
> Due last changes in parser
> https://github.com/postgres/postgres/commit/906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
> I had to use c_expr on other positions ( xmlnamespace definition).
> 
> I don't think it is limit - in 99% there will be const literal.

Argh.  I can't avoid the feeling that I'm missing some parser trickery
here.  We have the XMLNAMESPACES keyword and the clause-terminating
comma to protect these clauses, there must be a way to define this piece
of the grammar so that there's no conflict, without losing the freedom
in the expressions.  But I don't see how.  Now I agree that xml
namespace definitions are going to be string literals in almost all
cases (or in extra sophisticated cases, column names) ... it's probably
better to spend the bison-fu in the document expression or the column
options, or better yet the xmlexists_argument stuff.  But I don't see
possibility of improvements in any of those places, so let's put it
aside -- we can improve later, if need arises.

In any case, it looks like we can change c_expr to b_expr in a few
places, which is good because then operators work (in particular, unless
I misread the grammar, foo||bar doesn't work with c_expr and does work
with b_expr, which seems the most useful in this case).  Also, it makes
no sense to support (in the namespaces clause) DEFAULT a_expr if the
IDENT case uses only b_expr, so let's reduce both to just b_expr.

While I'm looking at node definitions, I see a few things that could use
some naming improvement.  For example, "expr" for TableExpr is a bit
unexpressive.  We could use "document_expr" there, perhaps.  "row_path"
seems fixated on the XML case and the expression be path; let's use
"row_expr" there.  And "cols" could be "column_exprs" perhaps.  (All
those renames cause fall-out in various node-related files, so let's
think carefully to avoid renaming them multiple times.)

In primnodes, you kept the comment that says "xpath".  Please update
that to not-just-XML reality.

Please fix the comment in XmlTableAddNs; NULL is no longer a valid value.

parse_expr.c has two unused variables; please remove them.

This test in ExecEvalTableExprProtected looks weird:if (i != tstate->for_ordinality_col - 1)
please change to comparing "i + 1" (convert array index into attribute
number), and invert the boolean expression, leaving the for_ordinality
case on top and the rest in the "else".  That seems easier to read.
Also, we customarily use post-increment (rownum++) instead of pre-incr.

In execQual.c I think it's neater to have ExecEvalTableExpr go before
its subroutine.  Actually, I wonder whether it is really necessary to
have a subroutine in the first place; you could just move the entire
contents of that subroutine to within the PG_TRY block instead.  The
only thing you lose is one indentation level.  I'm not sure about this
one, but it's worth considering.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-28 23:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> Here is updated patch without default namespace support (and without XPath
> expression transformation).
>
> Due last changes in parser
> https://github.com/postgres/postgres/commit/906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
> I had to use c_expr on other positions ( xmlnamespace definition).
>
> I don't think it is limit - in 99% there will be const literal.

Argh.  I can't avoid the feeling that I'm missing some parser trickery
here.  We have the XMLNAMESPACES keyword and the clause-terminating
comma to protect these clauses, there must be a way to define this piece
of the grammar so that there's no conflict, without losing the freedom
in the expressions.  But I don't see how.  Now I agree that xml
namespace definitions are going to be string literals in almost all
cases (or in extra sophisticated cases, column names) ... it's probably
better to spend the bison-fu in the document expression or the column
options, or better yet the xmlexists_argument stuff.  But I don't see
possibility of improvements in any of those places, so let's put it
aside -- we can improve later, if need arises.

The problem is in unreserved keyword "PASSING" probably.
 

In any case, it looks like we can change c_expr to b_expr in a few
places, which is good because then operators work (in particular, unless
I misread the grammar, foo||bar doesn't work with c_expr and does work
with b_expr, which seems the most useful in this case).  Also, it makes
no sense to support (in the namespaces clause) DEFAULT a_expr if the
IDENT case uses only b_expr, so let's reduce both to just b_expr.

I changed all what was possible to b_expr.
 

While I'm looking at node definitions, I see a few things that could use
some naming improvement.  For example, "expr" for TableExpr is a bit
unexpressive.  We could use "document_expr" there, perhaps.  "row_path"
seems fixated on the XML case and the expression be path; let's use
"row_expr" there.  And "cols" could be "column_exprs" perhaps.  (All
those renames cause fall-out in various node-related files, so let's
think carefully to avoid renaming them multiple times.)

Columns is not only expr - list - so I renamed it to "columns". Other renamed like you proposed
 

In primnodes, you kept the comment that says "xpath".  Please update
that to not-just-XML reality.

fixed
 

Please fix the comment in XmlTableAddNs; NULL is no longer a valid value.

fixed
 

parse_expr.c has two unused variables; please remove them.

fixed
 

This test in ExecEvalTableExprProtected looks weird:
        if (i != tstate->for_ordinality_col - 1)
please change to comparing "i + 1" (convert array index into attribute
number), and invert the boolean expression, leaving the for_ordinality
case on top and the rest in the "else".  That seems easier to read.
Also, we customarily use post-increment (rownum++) instead of pre-incr.


fiexed
 
In execQual.c I think it's neater to have ExecEvalTableExpr go before
its subroutine.  Actually, I wonder whether it is really necessary to
have a subroutine in the first place; you could just move the entire
contents of that subroutine to within the PG_TRY block instead.  The
only thing you lose is one indentation level.  I'm not sure about this
one, but it's worth considering.

done
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Regards

Pavel

Вложения

Re: patch: function xmltable

От
Craig Ringer
Дата:
On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> The problem is in unreserved keyword "PASSING" probably.

Yeah, I think that's what I hit when trying to change it.

Can't you just parenthesize the expression to use operators like ||
etc? If so, not a big deal.




-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-30 2:40 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:
On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> The problem is in unreserved keyword "PASSING" probably.

Yeah, I think that's what I hit when trying to change it.

Can't you just parenthesize the expression to use operators like ||
etc? If so, not a big deal.


???

 



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:
> 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:
> 
> > On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com>
> > wrote:
> >
> > > The problem is in unreserved keyword "PASSING" probably.
> >
> > Yeah, I think that's what I hit when trying to change it.
> >
> > Can't you just parenthesize the expression to use operators like ||
> > etc? If so, not a big deal.
> >
> ???

"'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
to manually add parens around any expressions that they want to use.
That's not necessary most of the time since we've been able to use
b_expr in most places.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-11-30 13:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:
>
> > On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com>
> > wrote:
> >
> > > The problem is in unreserved keyword "PASSING" probably.
> >
> > Yeah, I think that's what I hit when trying to change it.
> >
> > Can't you just parenthesize the expression to use operators like ||
> > etc? If so, not a big deal.
> >
> ???

"'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
to manually add parens around any expressions that they want to use.
That's not necessary most of the time since we've been able to use
b_expr in most places.

Now I understand

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Pavel Stehule
Дата:
<p dir="ltr"><p dir="ltr">Dne 30. 11. 2016 14:53 napsal uživatel "Pavel Stehule" <<a
href="mailto:pavel.stehule@gmail.com">pavel.stehule@gmail.com</a>>:<br/> ><br /> ><br /> ><br /> >
2016-11-3013:38 GMT+01:00 Alvaro Herrera <<a
href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>>:<br/> >><br /> >> Pavel Stehule
wrote:<br/> >> > 2016-11-30 2:40 GMT+01:00 Craig Ringer <<a
href="mailto:craig@2ndquadrant.com">craig@2ndquadrant.com</a>>:<br/> >> ><br /> >> > > On 30
November2016 at 05:36, Pavel Stehule <<a href="mailto:pavel.stehule@gmail.com">pavel.stehule@gmail.com</a>><br />
>>> > wrote:<br /> >> > ><br /> >> > > > The problem is in unreserved keyword
"PASSING"probably.<br /> >> > ><br /> >> > > Yeah, I think that's what I hit when trying to
changeit.<br /> >> > ><br /> >> > > Can't you just parenthesize the expression to use operators
like||<br /> >> > > etc? If so, not a big deal.<br /> >> > ><br /> >> > ???<br />
>><br/> >> "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users<br /> >> to
manuallyadd parens around any expressions that they want to use.<br /> >> That's not necessary most of the time
sincewe've been able to use<br /> >> b_expr in most places.<br /> ><p dir="ltr">still there are one c_expr,
butwithout new reserved word there are not change to reduce it.<br /><p dir="ltr">><br /> > Now I understand<br
/>><br /> > Regards<br /> ><br /> > Pavel<br /> >  <br /> >><br /> >><br /> >> --<br
/>>> Álvaro Herrera                <a href="https://www.2ndQuadrant.com/">https://www.2ndQuadrant.com/</a><br />
>>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services<br /> ><br /> > 

Re: patch: function xmltable

От
Haribabu Kommi
Дата:


On Thu, Dec 1, 2016 at 2:21 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Dne 30. 11. 2016 14:53 napsal uživatel "Pavel Stehule" <pavel.stehule@gmail.com>:
>
>
>
> 2016-11-30 13:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>>
>> Pavel Stehule wrote:
>> > 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:
>> >
>> > > On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com>
>> > > wrote:
>> > >
>> > > > The problem is in unreserved keyword "PASSING" probably.
>> > >
>> > > Yeah, I think that's what I hit when trying to change it.
>> > >
>> > > Can't you just parenthesize the expression to use operators like ||
>> > > etc? If so, not a big deal.
>> > >
>> > ???
>>
>> "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
>> to manually add parens around any expressions that they want to use.
>> That's not necessary most of the time since we've been able to use
>> b_expr in most places.
>

still there are one c_expr, but without new reserved word there are not change to reduce it.



Moved to next CF with the same status (ready for committer).

Regards,
Hari Babu
Fujitsu Australia

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Hm, you omitted tableexpr.h from the v15 patch ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-02 17:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Hm, you omitted tableexpr.h from the v15 patch ...

I am sorry

should be ok now

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Here's version 17.  I have made significant changes here.

1. Restructure the execQual code.  Instead of a PG_TRY wrapper, I have
split this code in three pieces; there's the main code with the PG_TRY
wrappers and is mainly in charge of the builderCxt pointer.  In the
previous coding there was a shim that examined builderCxt but was not
responsible for setting it up, which was ugly.  The second part is the
"initializer" which sets the row and column filters and does namespace
processing.  The third part is the "FetchRow" logic.  It seems to me
much cleaner this way.

2. rename the "builder" stuff to use the "routine" terminology.  This is
in line with what we do for other function-pointer-filled structs, such
as FdwRoutine, IndexAmRoutine etc.  I also cleaned up the names a bit
more.

3. Added a magic number to the table builder context struct, so that we
can barf appropriately.  This is in line with PgXmlErrorContext --
mostly for future-proofing.  I didn't test this too hard.  Also, moved
the XmlTableContext struct declaration nearer the top of the file, as is
customary.  (We don't really need it that way, since the functions are
all declared taking void *, but it seems cleaner to me anyway).

4. I added, edited, and fixed a large number of code comments.

This is looking much better now, but it still needs at least the
following changes.

First, we need to fix is the per_rowset_memcxt thingy.  I think the way
it's currently being used is rather ugly; it looks to me like the memory
context does not belong into the XmlTableContext struct at all.
Instead, the executor code should keep the memcxt pointer in a state
struct of its own, and it should be the executor's responsibility to
change to the appropriate context before calling the table builder
functions.  In particular, this means that the table context can no
longer be a void * pointer; it needs to be a struct that's defined by
the executor (probably a primnodes.h one).  The void * pointer is
stashed inside that struct.  Also, the "routine" pointer should not be
part of the void * struct, but of the executor's struct.  So the
execQual code can switch to the memory context, and destroy it
appropriately.

Second, we should make gram.y set a new "function type" value in the
TableExpr it creates, so that the downstream code (transformTableExpr,
ExecInitExpr, ruleutils.c) really knows that the given function is
XmlTableExpr, instead of guessing just because it's the only implemented
case.  Probably this "function type" is an enum (currently with a single
value TableExprTypeXml or something like that) in primnodes.

Finally, there's the pending task of renaming and moving
ExecTypeFromTableExpr to some better place.  Not sure that moving it
back to nodeFuncs is a nice idea.  Looks to me like calling it from
ExprTypmod is a rather ugly idea.

Hmm, ruleutils ... not sure what to think of this one.

The typedefs.list changes are just used to pgindent the affected code
correctly.  It's not for commit.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:
Hi

2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Here's version 17.  I have made significant changes here.

1. Restructure the execQual code.  Instead of a PG_TRY wrapper, I have
split this code in three pieces; there's the main code with the PG_TRY
wrappers and is mainly in charge of the builderCxt pointer.  In the
previous coding there was a shim that examined builderCxt but was not
responsible for setting it up, which was ugly.  The second part is the
"initializer" which sets the row and column filters and does namespace
processing.  The third part is the "FetchRow" logic.  It seems to me
much cleaner this way.

2. rename the "builder" stuff to use the "routine" terminology.  This is
in line with what we do for other function-pointer-filled structs, such
as FdwRoutine, IndexAmRoutine etc.  I also cleaned up the names a bit
more.

3. Added a magic number to the table builder context struct, so that we
can barf appropriately.  This is in line with PgXmlErrorContext --
mostly for future-proofing.  I didn't test this too hard.  Also, moved
the XmlTableContext struct declaration nearer the top of the file, as is
customary.  (We don't really need it that way, since the functions are
all declared taking void *, but it seems cleaner to me anyway).

4. I added, edited, and fixed a large number of code comments.

This is looking much better now, but it still needs at least the
following changes.

First, we need to fix is the per_rowset_memcxt thingy.  I think the way
it's currently being used is rather ugly; it looks to me like the memory
context does not belong into the XmlTableContext struct at all.
Instead, the executor code should keep the memcxt pointer in a state
struct of its own, and it should be the executor's responsibility to
change to the appropriate context before calling the table builder
functions.  In particular, this means that the table context can no
longer be a void * pointer; it needs to be a struct that's defined by
the executor (probably a primnodes.h one).  The void * pointer is
stashed inside that struct.  Also, the "routine" pointer should not be
part of the void * struct, but of the executor's struct.  So the
execQual code can switch to the memory context, and destroy it
appropriately.

Second, we should make gram.y set a new "function type" value in the
TableExpr it creates, so that the downstream code (transformTableExpr,
ExecInitExpr, ruleutils.c) really knows that the given function is
XmlTableExpr, instead of guessing just because it's the only implemented
case.  Probably this "function type" is an enum (currently with a single
value TableExprTypeXml or something like that) in primnodes.

It has sense - I was not sure about it - because currently it is only one value, you mentioned it.
 

Finally, there's the pending task of renaming and moving
ExecTypeFromTableExpr to some better place.  Not sure that moving it
back to nodeFuncs is a nice idea.  Looks to me like calling it from
ExprTypmod is a rather ugly idea.

The code is related to prim nodes - it is used more times than in executor.

Hmm, ruleutils ... not sure what to think of this one.

it is little bit more complex - but it is related to complexity of XMLTABLE
 

The typedefs.list changes are just used to pgindent the affected code
correctly.  It's not for commit.

The documentation is very precious. Nice

+    /* XXX OK to do this?  looks a bit out of place ... */
+    assign_record_type_typmod(typeInfo);

I am thinking it is ok. It is tupdesc without fixed typid, typname used in returned value - you should to register this tupdesc in typcache.

Regards

Pavel



--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:

> 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > This is looking much better now, but it still needs at least the
> > following changes.
> >
> > First, we need to fix is the per_rowset_memcxt thingy.  I think the way
> > it's currently being used is rather ugly; it looks to me like the memory
> > context does not belong into the XmlTableContext struct at all.
> > Instead, the executor code should keep the memcxt pointer in a state
> > struct of its own, and it should be the executor's responsibility to
> > change to the appropriate context before calling the table builder
> > functions.  In particular, this means that the table context can no
> > longer be a void * pointer; it needs to be a struct that's defined by
> > the executor (probably a primnodes.h one).  The void * pointer is
> > stashed inside that struct.  Also, the "routine" pointer should not be
> > part of the void * struct, but of the executor's struct.  So the
> > execQual code can switch to the memory context, and destroy it
> > appropriately.
> >
> > Second, we should make gram.y set a new "function type" value in the
> > TableExpr it creates, so that the downstream code (transformTableExpr,
> > ExecInitExpr, ruleutils.c) really knows that the given function is
> > XmlTableExpr, instead of guessing just because it's the only implemented
> > case.  Probably this "function type" is an enum (currently with a single
> > value TableExprTypeXml or something like that) in primnodes.
> 
> It has sense - I was not sure about it - because currently it is only one
> value, you mentioned it.

True.  This is a minor point.

Are you able to do the memory context change I describe?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-03 16:03 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > This is looking much better now, but it still needs at least the
> > following changes.
> >
> > First, we need to fix is the per_rowset_memcxt thingy.  I think the way
> > it's currently being used is rather ugly; it looks to me like the memory
> > context does not belong into the XmlTableContext struct at all.
> > Instead, the executor code should keep the memcxt pointer in a state
> > struct of its own, and it should be the executor's responsibility to
> > change to the appropriate context before calling the table builder
> > functions.  In particular, this means that the table context can no
> > longer be a void * pointer; it needs to be a struct that's defined by
> > the executor (probably a primnodes.h one).  The void * pointer is
> > stashed inside that struct.  Also, the "routine" pointer should not be
> > part of the void * struct, but of the executor's struct.  So the
> > execQual code can switch to the memory context, and destroy it
> > appropriately.
> >
> > Second, we should make gram.y set a new "function type" value in the
> > TableExpr it creates, so that the downstream code (transformTableExpr,
> > ExecInitExpr, ruleutils.c) really knows that the given function is
> > XmlTableExpr, instead of guessing just because it's the only implemented
> > case.  Probably this "function type" is an enum (currently with a single
> > value TableExprTypeXml or something like that) in primnodes.
>
> It has sense - I was not sure about it - because currently it is only one
> value, you mentioned it.

True.  This is a minor point.

Are you able to do the memory context change I describe?

I am not sure if I understand well to your ideas - please, check attached patch.

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2016-12-03 16:03 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > This is looking much better now, but it still needs at least the
> > following changes.
> >
> > First, we need to fix is the per_rowset_memcxt thingy.  I think the way
> > it's currently being used is rather ugly; it looks to me like the memory
> > context does not belong into the XmlTableContext struct at all.
> > Instead, the executor code should keep the memcxt pointer in a state
> > struct of its own, and it should be the executor's responsibility to
> > change to the appropriate context before calling the table builder
> > functions.  In particular, this means that the table context can no
> > longer be a void * pointer; it needs to be a struct that's defined by
> > the executor (probably a primnodes.h one).  The void * pointer is
> > stashed inside that struct.  Also, the "routine" pointer should not be
> > part of the void * struct, but of the executor's struct.  So the
> > execQual code can switch to the memory context, and destroy it
> > appropriately.
> >
> > Second, we should make gram.y set a new "function type" value in the
> > TableExpr it creates, so that the downstream code (transformTableExpr,
> > ExecInitExpr, ruleutils.c) really knows that the given function is
> > XmlTableExpr, instead of guessing just because it's the only implemented
> > case.  Probably this "function type" is an enum (currently with a single
> > value TableExprTypeXml or something like that) in primnodes.
>
> It has sense - I was not sure about it - because currently it is only one
> value, you mentioned it.

True.  This is a minor point.

Are you able to do the memory context change I describe?

I am not sure if I understand well to your ideas - please, check attached patch.

attached patch without your patch  0001

Regards

Pavel
 

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Вложения

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:
> 2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

> > I am not sure if I understand well to your ideas - please, check attached
> > patch.

Thanks, that's what I meant, but I think you went a bit overboard
creating new functions in execQual -- seems to me it would work just
fine to have the memory switches in the same function, rather than
having a number of separate functions just to change the context then
call the method.  Please remove these shim functions.

Also, you forgot to remove the now-unused per_rowset_memcxt struct member.

Also, please rename "rc" to something more meaningful -- maybe
"rowcount" is good enough.  And "doc" would perhaps be better as
"document".

I'm not completely sure the structs are really sensible yet.  I may do
some more changes tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-05 0:45 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

> > I am not sure if I understand well to your ideas - please, check attached
> > patch.

Thanks, that's what I meant, but I think you went a bit overboard
creating new functions in execQual -- seems to me it would work just
fine to have the memory switches in the same function, rather than
having a number of separate functions just to change the context then
call the method.  Please remove these shim functions.

done
 

Also, you forgot to remove the now-unused per_rowset_memcxt struct member.

done
 

Also, please rename "rc" to something more meaningful -- maybe
"rowcount" is good enough.  And "doc" would perhaps be better as
"document".

done

Regards

Pavel

I'm not completely sure the structs are really sensible yet.  I may do
some more changes tomorrow.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Here's v21.

* I changed the grammar by moving the NOT NULL to the column options,
and removing the IsNotNull production.  It wasn't nice that "NOT NULL
DEFAULT 0" was not accepted, which it is with the new representation.

* The tuple that's returned is natively a TupleTableSlot inside the
table builder, not directly a HeapTuple.  That stuff was ugly and wasn't
using the proper abstraction anyway.

* I changed the signatures of the methods so that they receive
TableExprState, and restructured the "opaque" data to be inside
TableExprState.  Now we don't need to have things such as the tupdesc or
the input functions be repeated in the opaque struct.  Instead they
belong to the TableExprState and the methods can read them from there.

I managed to break the case with no COLUMNS.  Probably related to the
tupdesc changes.  It now crashes the regression test.  Too tired to
debug now; care to take a look?  The other stuff seems to run fine,
though of course the regression test crashes in the middle, so perhaps
there are other problems.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-07 8:14 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Here's v21.

* I changed the grammar by moving the NOT NULL to the column options,
and removing the IsNotNull production.  It wasn't nice that "NOT NULL
DEFAULT 0" was not accepted, which it is with the new representation.

* The tuple that's returned is natively a TupleTableSlot inside the
table builder, not directly a HeapTuple.  That stuff was ugly and wasn't
using the proper abstraction anyway.

* I changed the signatures of the methods so that they receive
TableExprState, and restructured the "opaque" data to be inside
TableExprState.  Now we don't need to have things such as the tupdesc or
the input functions be repeated in the opaque struct.  Instead they
belong to the TableExprState and the methods can read them from there.

I managed to break the case with no COLUMNS.  Probably related to the
tupdesc changes.  It now crashes the regression test.  Too tired to
debug now; care to take a look?  The other stuff seems to run fine,
though of course the regression test crashes in the middle, so perhaps
there are other problems.

I fixed two issues.

1. there are not columns data when there are not any explicit column - fixed

2. there was reverse setting in NOT NULL flag

all tests passed now

Regards

Pavel


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:

> I fixed two issues.
> 
> 2. there was reverse setting in NOT NULL flag

Ah-hah, that was silly, thanks.

> 1. there are not columns data when there are not any explicit column - fixed

Hmm.  Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much.  It seems way too magical.  I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL.  This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.

Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> I fixed two issues.
>
> 2. there was reverse setting in NOT NULL flag

Ah-hah, that was silly, thanks.

> 1. there are not columns data when there are not any explicit column - fixed

Hmm.  Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much.  It seems way too magical.  I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL.  This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.

I prefer this way against second described. The implementation should be in table builder routines, not in executor.

sending new update

Regards

Pavel
 

Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).


 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:
> 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > Hmm.  Now that I see how this works, by having the GetValue "guess" what
> > is going on and have a special case for it, I actually don't like it
> > very much.  It seems way too magical.  I think we should do away with
> > the "if column is NULL" case in GetValue, and instead inject a column
> > during transformTableExpr if columns is NIL.  This has implications on
> > ExecInitExpr too, which currently checks for an empty column list -- it
> > would no longer have to do so.
> 
> I prefer this way against second described. The implementation should be in
> table builder routines, not in executor.

Well, given the way you have implemented it, I would prefer the original
too.  But your v23 is not what I meant.  Essentially what you do in v23
is to communicate the lack of COLUMNS clause in a different way --
previously it was "ncolumns = 0", now it's "is_auto_col=true".  It's
still "magic".  It's not an improvement.

What I want to happen is that there is no magic at all; it's up to
transformExpr to make sure that when COLUMNS is empty, one column
appears and it must not be a magic column that makes the xml.c code act
differently, but rather to xml.c it should appear that this is just a
normal column that happens to return the entire row.  If I say "COLUMNS
foo PATH '/'" I should be able to obtain a similar behavior (except that
in the current code, if I ask for "COLUMNS foo XML PATH '/'" I don't get
XML at all but rather weird text where all tags have been stripped out,
which is very strange.  I would expect the tags to be preserved if the
output type is XML.  Maybe the tag-stripping behavior should occur if
the output type is some type of text.)


I still have to figure out how to fix the tupledesc thing.  What we have
now is not good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-07 20:50 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > Hmm.  Now that I see how this works, by having the GetValue "guess" what
> > is going on and have a special case for it, I actually don't like it
> > very much.  It seems way too magical.  I think we should do away with
> > the "if column is NULL" case in GetValue, and instead inject a column
> > during transformTableExpr if columns is NIL.  This has implications on
> > ExecInitExpr too, which currently checks for an empty column list -- it
> > would no longer have to do so.
>
> I prefer this way against second described. The implementation should be in
> table builder routines, not in executor.

Well, given the way you have implemented it, I would prefer the original
too.  But your v23 is not what I meant.  Essentially what you do in v23
is to communicate the lack of COLUMNS clause in a different way --
previously it was "ncolumns = 0", now it's "is_auto_col=true".  It's
still "magic".  It's not an improvement.

is_auto_col is used primary for asserting. The table builder has information for decision in parameter path, when path is NULL.

Hard to say, if this info should be assigned to column or to table. In both locations has sense. But somewhere should be some flag.
 

What I want to happen is that there is no magic at all; it's up to
transformExpr to make sure that when COLUMNS is empty, one column
appears and it must not be a magic column that makes the xml.c code act
differently, but rather to xml.c it should appear that this is just a
normal column that happens to return the entire row.  If I say "COLUMNS
foo PATH '/'" I should be able to obtain a similar behavior (except that
in the current code, if I ask for "COLUMNS foo XML PATH '/'" I don't get
XML at all but rather weird text where all tags have been stripped out,
which is very strange.  I would expect the tags to be preserved if the
output type is XML.  Maybe the tag-stripping behavior should occur if
the output type is some type of text.)

I am doing this. Just I using NULL for PATH.
 


I still have to figure out how to fix the tupledesc thing.  What we have
now is not good.

cannot be moved to nodefunc?

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-07 20:37 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> I fixed two issues.
>
> 2. there was reverse setting in NOT NULL flag

Ah-hah, that was silly, thanks.

> 1. there are not columns data when there are not any explicit column - fixed

Hmm.  Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much.  It seems way too magical.  I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL.  This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.

I prefer this way against second described. The implementation should be in table builder routines, not in executor.

sending new update

new update - no functional changes, just unbreaking after last changes in master.

Regards

Pavel
 

Regards

Pavel
 

Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).


 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Вложения

Re: [HACKERS] patch: function xmltable

От
Pavel Stehule
Дата:


2016-12-18 16:27 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2016-12-07 20:37 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> I fixed two issues.
>
> 2. there was reverse setting in NOT NULL flag

Ah-hah, that was silly, thanks.

> 1. there are not columns data when there are not any explicit column - fixed

Hmm.  Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much.  It seems way too magical.  I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL.  This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.

I prefer this way against second described. The implementation should be in table builder routines, not in executor.

sending new update

new update - no functional changes, just unbreaking after last changes in master.

another update - lot of cleaning

Regards

Pavel
 

Regards

Pavel
 

Regards

Pavel
 

Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).


 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Вложения

Re: [HACKERS] patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:

> another update - lot of cleaning

Ah, the tupledesc stuff in this one appears much more reasonable to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

От
Alvaro Herrera
Дата:
Pavel Stehule wrote:

> another update - lot of cleaning

Thanks.

The more I look at this, the less I like using NameArgExpr for
namespaces.  It looks all wrong to me, and it causes ugly code all over.
Maybe I just need to look at it a bit longer.



-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

От
Alvaro Herrera
Дата:
Alvaro Herrera wrote:

> The more I look at this, the less I like using NameArgExpr for
> namespaces.  It looks all wrong to me, and it causes ugly code all over.
> Maybe I just need to look at it a bit longer.

I think it'd be cleaner to use ResTarget for the namespaces, like
xml_attribute_el does, and split the names from actual exprs in the same
way.  So things like ExecInitExpr become simpler because you just
recurse to initialize the list, without having to examine each element
individually.  tabexprInitialize can just do forboth().

The main reason I don't like abusing NamedArgExpr is that the whole
comment that explains it becomes a lie.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения