Обсуждение: libpq: PQfnumber overload for not null-terminated strings

Поиск
Список
Период
Сортировка

libpq: PQfnumber overload for not null-terminated strings

От
Ivan Trofimov
Дата:
Hi!

This one comes from C++'s `std::string_view`: being just a `const char*` 
plus the `size`, it's a very convenient type to use in interfaces which 
don't need an ownership of the data passed in.

Unfortunately, `PQfnumber` expects a null-terminated string, which 
`std::string_view` can not guarantee, and this limitations affects the 
interfaces built on top of libpq.

Would you be willing to review a patch that adds an `PQfnumber` overload 
that takes a `field_name` size as well?



Re: libpq: PQfnumber overload for not null-terminated strings

От
Tom Lane
Дата:
Ivan Trofimov <i.trofimow@yandex.ru> writes:
> Would you be willing to review a patch that adds an `PQfnumber` overload 
> that takes a `field_name` size as well?

I'm a little skeptical of this idea.  If you need counted strings
for PQfnumber, wouldn't you need them for every single other
string-based API in libpq as well?  That's not a lift that I think
we'd want to undertake.

            regards, tom lane



Re: libpq: PQfnumber overload for not null-terminated strings

От
Ivan Trofimov
Дата:
Thanks for the quick reply.

> If you need counted strings
> for PQfnumber, wouldn't you need them for every single other
> string-based API in libpq as well?

No, not really.

Thing is, out of all the functions listed in "34.3.2. Retrieving Query 
Result Information" and "34.3.3. Retrieving Other Result Information" 
PQfnumber is the only (well, technically also PQprint) one that takes a 
string as an input.

As far as I know PQfnumber is the only portable way to implement "give 
me the the value at this row with this 'column_name'", which is an 
essential feature for any kind of client-side parsing.
Right now as a library writer in a higher-level language I'm forced to 
either
* Sacrifice performance to ensure 'column_name' is null-terminated 
(that's what some bindings in Rust do)
* Sacrifice interface quality by requiring a null-terminated string, 
which is not necessary idiomatic (that's what we do)
* Sacrifice usability by requiring a user to guarantee that the 
'string_view' provided is null-terminated (that's what libpqxx does, for 
example)

I don't think it's _that_ big of a deal, but could it be QoL improvement 
nice enough to be worth of a tiny addition into libpq interface?



Re: libpq: PQfnumber overload for not null-terminated strings

От
Tom Lane
Дата:
Ivan Trofimov <i.trofimow@yandex.ru> writes:
>> If you need counted strings
>> for PQfnumber, wouldn't you need them for every single other
>> string-based API in libpq as well?

> No, not really.

> Thing is, out of all the functions listed in "34.3.2. Retrieving Query 
> Result Information" and "34.3.3. Retrieving Other Result Information" 
> PQfnumber is the only (well, technically also PQprint) one that takes a 
> string as an input.

I think that's a mighty myopic definition of which APIs would need
counted-string support if we were to make that a thing in libpq.
Just for starters, why are you only concerned with processing a
query result, and not with the functions needed to send the query?

> Right now as a library writer in a higher-level language I'm forced to 
> either
> * Sacrifice performance to ensure 'column_name' is null-terminated 
> (that's what some bindings in Rust do)

I'd go with that.  You would have a very hard time convincing me that
the per-query overhead that building a null-terminated string adds
is even measurable compared to the time needed to send, process, and
receive the query.

            regards, tom lane



Re: libpq: PQfnumber overload for not null-terminated strings

От
Ivan Trofimov
Дата:
>> Right now as a library writer in a higher-level language I'm forced to
>> either
>> * Sacrifice performance to ensure 'column_name' is null-terminated
>> (that's what some bindings in Rust do)
> 
> I'd go with that.  You would have a very hard time convincing me that
> the per-query overhead

I see now that I failed to express myself clearly: it's not a per-query 
overhead, but rather a per-result-field one.


Given a code like this (in pseudo-code)

result = ExecuteQuery(some_query)
for (row in result):
     a = row["some_column_name"]
     b = row["some_other_column_name"]
     ...

a field-name string should be null-terminated for every field accessed.


There absolutely are ways to write the same in a more performant way and 
avoid repeatedly calling PQfnumber altogether, but that I as a library 
writer can't control.

In my quickly-hacked-together test just null-terminating a user-provided 
string takes ~14% of total CPU time (and PQfnumber itself takes ~30%, 
but oh well), please see the code and flamegraph attached.

Вложения

Re: libpq: PQfnumber overload for not null-terminated strings

От
Jelte Fennema-Nio
Дата:
On Tue, 27 Feb 2024 at 00:31, Ivan Trofimov <i.trofimow@yandex.ru> wrote:
> I see now that I failed to express myself clearly: it's not a per-query
> overhead, but rather a per-result-field one.

I'm fairly sympathetic to decreasing the overhead of any per-row
operation. And looking at the code, it doesn't surprise me that
PQfnumber shows up so big in your profile. I think it would probably
make sense to introduce a PQfnumber variant that does not do the
downcasing/quote handling (called e.g. PQfnumberRaw).

However, I do think you could convert this per-row overhead in your
case to per-query overhead by caching the result of PQfnumber for each
unique C++ string_view. Afaict that should solve your performance
problem.



Re: libpq: PQfnumber overload for not null-terminated strings

От
Ivan Trofimov
Дата:
> However, I do think you could convert this per-row overhead in your
> case to per-query overhead by caching the result of PQfnumber for each
> unique C++ string_view. Afaict that should solve your performance
> problem.

Absolutely, you're right.

The problem here is not that it's impossible to write it in a performant 
way, but rather that it's impossible to do so in a performant _and_ 
clean way given the convenient abstractions wrapper-libraries provide: 
here's a `Result`, which consists of `Row`s, which in turn consist of 
`Field`s.
The most natural and straightforward way to iterate over a `Result` 
would be in the lines of that loop, and people do write code like that 
because it's what they expect to just work given the abstractions (and 
it does, it's just slow).
Caching the result of PQfnumber could be done, but would result in 
somewhat of a mess on a call-site.


I like your idea of 'PQfnumberRaw': initially i was only concerned about 
a null-terminated string requirement affecting my interfaces (because 
users complained about that to me, 
https://github.com/userver-framework/userver/issues/494), but I think 
PQfnumberRaw could solve both problems (PQfnumber being a bottleneck 
when called repeatedly and a null-terminated string requirement) 
simultaneously.



Re: libpq: PQfnumber overload for not null-terminated strings

От
Jelte Fennema-Nio
Дата:
On Tue, 27 Feb 2024 at 15:49, Ivan Trofimov <i.trofimow@yandex.ru> wrote:
> Caching the result of PQfnumber could be done, but would result in
> somewhat of a mess on a call-site.

To be clear I meant your wrapper around libpq could internally cache
this, then the call sites of users of your wrapper would not need to
be changed. i.e. your Result could contain a cache of
columnname->columnumber mapping that you know because of previous
calls to PQfnumber on the same Result.

> I like your idea of 'PQfnumberRaw': initially i was only concerned about
> a null-terminated string requirement affecting my interfaces (because
> users complained about that to me,
> https://github.com/userver-framework/userver/issues/494), but I think
> PQfnumberRaw could solve both problems (PQfnumber being a bottleneck
> when called repeatedly and a null-terminated string requirement)
> simultaneously.

Feel free to send a patch for this.