Re: Extract numeric filed in JSONB more effectively

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: Extract numeric filed in JSONB more effectively
Дата
Msg-id CAKU4AWrxBXomhGcubsLqBj4FRbOf1AEXqpwBsm-mGJEJ0G_ypA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Extract numeric filed in JSONB more effectively  (Chapman Flack <chap@anastigmatix.net>)
Ответы Re: Extract numeric filed in JSONB more effectively  (Chapman Flack <chap@anastigmatix.net>)
Список pgsql-hackers
(Sorry for leaving this discussion for such a long time,  how times fly!) 

On Sun, Aug 27, 2023 at 6:28 AM Chapman Flack <chap@anastigmatix.net> wrote:
On 2023-08-22 08:16, Chapman Flack wrote:
> On 2023-08-22 01:54, Andy Fan wrote:
>> After we label it, we will get error like this:
>>
>> select (a->'a')::int4 from m;
>> ERROR:  cannot display a value of type internal
>
> Without looking in depth right now, I would double-check
> what relabel node is being applied at the result. The idea,
> of course, was to relabel the result as the expected result

as I suspected, looking at v10-0003, there's this:

+   fexpr = (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNALOID,
+                                       0, InvalidOid,
COERCE_IMPLICIT_CAST);

compared to the example I had sent earlier:

On 2023-08-18 17:02, Chapman Flack wrote:
>     expr = (Expr *)makeRelabelType((Expr *)fexpr,
>       targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);

The key difference: this is the label going on the result type
of the function we are swapping in.

I'm feeling we have some understanding gap in this area, let's
see what it is.  Suppose the original query is:

numeric(jsonb_object_field(v_jsonb, text)) -> numeric.

without the patch 003,  the rewritten query is:
jsonb_object_field_type(NUMERICOID,  v_jsonb, text) -> NUMERIC. 

However the declared type of jsonb_object_field_type is:

jsonb_object_field_type(internal, jsonb, text) -> internal. 

So the situation is:  a).  We input a CONST(type=OIDOID, ..) for an
internal argument.  b).  We return a NUMERIC type which matches
the original query c).  result type NUMERIC doesn't match the declared
type  'internal'  d).  it doesn't match the  run-time type of internal 
argument which is OID. 

case a) is fixed by RelableType.  case b) shouldn't be treat as an
issue.  I thought you wanted to address the case c), and patch 
003 tries to fix it, then the ERROR above.  At last I realized case 
c) isn't the one you want to fix.  case d) shouldn't be requirement
at the first place IIUC. 

So your new method is:
1. jsonb_{op}_start() ->  internal  (internal actually JsonbValue). 
2. jsonb_finish_{type}(internal, ..) -> type.   (internal is JsonbValue ). 

This avoids the case a) at the very beginning.  I'd like to provides
patches for both solutions for comparison.  Any other benefits of
this method I am missing? 
 
Two more minor differences: (1) the node you get from
makeRelabelType is an Expr, but not really a FuncExpr. Casting
it to FuncExpr is a bit bogus. Also, the third argument to
makeRelabelType is a typmod, and I believe the "not-modified"
typmod is -1, not 0.

My fault, you are right. 
 

On 2023-08-21 06:50, Andy Fan wrote:
> I'm not very excited with this manner, reasons are: a).  It will have
> to emit more steps in ExprState->steps which will be harmful for
> execution. The overhead  is something I'm not willing to afford.

I would be open to a performance comparison, but offhand I am not
sure whether the overhead of another step or two in an ExprState
is appreciably more than some of the overhead in the present patch,
such as the every-time-through fcinfo initialization buried in
DirectFunctionCall1 where you don't necessarily see it. I bet
the fcinfo in an ExprState step gets set up once, and just has
new argument values slammed into it each time through.

fcinfo initialization in DirectFunctionCall1 is an interesting point!
so  I am persuaded the extra steps in  ExprState may not be 
worse than the current way due to the "every-time-through 
fcinfo initialization" (in which case the memory is allocated 
once in heap rather than every time in stack).   I can do a 
comparison at last to see if we can find some other interesting
findings.  

 
I would not underestimate the benefit of reducing the code
duplication and keeping the patch as clear as possible.
The key contributions of the patch are getting a numeric or
boolean efficiently out of the JSON operation. Getting from
numeric to int or float are things the system already does
well.

True, reusing the casting system should be better than hard-code
the casting function manually.  I'd apply this on both methods. 
 
A patch that focuses on what it contributes, and avoids
redoing things the system already can do--unless the duplication
can be shown to have a strong performance benefit--is easier to
review and probably to get integrated.

Agreed.  

At last, thanks for the great insights and patience!

--
Best Regards
Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: persist logical slots to disk during shutdown checkpoint
Следующее
От: Peter Smith
Дата:
Сообщение: Re: [PoC] pg_upgrade: allow to upgrade publisher node