Обсуждение: libpq: PQfnumber overload for not null-terminated strings
Hi! This one comes from C++'s `std::string_view`: being just a `const char*` plus the `size`, it's a very convenient type to use in interfaces which don't need an ownership of the data passed in. Unfortunately, `PQfnumber` expects a null-terminated string, which `std::string_view` can not guarantee, and this limitations affects the interfaces built on top of libpq. Would you be willing to review a patch that adds an `PQfnumber` overload that takes a `field_name` size as well?
Ivan Trofimov <i.trofimow@yandex.ru> writes: > Would you be willing to review a patch that adds an `PQfnumber` overload > that takes a `field_name` size as well? I'm a little skeptical of this idea. If you need counted strings for PQfnumber, wouldn't you need them for every single other string-based API in libpq as well? That's not a lift that I think we'd want to undertake. regards, tom lane
Thanks for the quick reply. > If you need counted strings > for PQfnumber, wouldn't you need them for every single other > string-based API in libpq as well? No, not really. Thing is, out of all the functions listed in "34.3.2. Retrieving Query Result Information" and "34.3.3. Retrieving Other Result Information" PQfnumber is the only (well, technically also PQprint) one that takes a string as an input. As far as I know PQfnumber is the only portable way to implement "give me the the value at this row with this 'column_name'", which is an essential feature for any kind of client-side parsing. Right now as a library writer in a higher-level language I'm forced to either * Sacrifice performance to ensure 'column_name' is null-terminated (that's what some bindings in Rust do) * Sacrifice interface quality by requiring a null-terminated string, which is not necessary idiomatic (that's what we do) * Sacrifice usability by requiring a user to guarantee that the 'string_view' provided is null-terminated (that's what libpqxx does, for example) I don't think it's _that_ big of a deal, but could it be QoL improvement nice enough to be worth of a tiny addition into libpq interface?
Ivan Trofimov <i.trofimow@yandex.ru> writes: >> If you need counted strings >> for PQfnumber, wouldn't you need them for every single other >> string-based API in libpq as well? > No, not really. > Thing is, out of all the functions listed in "34.3.2. Retrieving Query > Result Information" and "34.3.3. Retrieving Other Result Information" > PQfnumber is the only (well, technically also PQprint) one that takes a > string as an input. I think that's a mighty myopic definition of which APIs would need counted-string support if we were to make that a thing in libpq. Just for starters, why are you only concerned with processing a query result, and not with the functions needed to send the query? > Right now as a library writer in a higher-level language I'm forced to > either > * Sacrifice performance to ensure 'column_name' is null-terminated > (that's what some bindings in Rust do) I'd go with that. You would have a very hard time convincing me that the per-query overhead that building a null-terminated string adds is even measurable compared to the time needed to send, process, and receive the query. regards, tom lane
>> Right now as a library writer in a higher-level language I'm forced to >> either >> * Sacrifice performance to ensure 'column_name' is null-terminated >> (that's what some bindings in Rust do) > > I'd go with that. You would have a very hard time convincing me that > the per-query overhead I see now that I failed to express myself clearly: it's not a per-query overhead, but rather a per-result-field one. Given a code like this (in pseudo-code) result = ExecuteQuery(some_query) for (row in result): a = row["some_column_name"] b = row["some_other_column_name"] ... a field-name string should be null-terminated for every field accessed. There absolutely are ways to write the same in a more performant way and avoid repeatedly calling PQfnumber altogether, but that I as a library writer can't control. In my quickly-hacked-together test just null-terminating a user-provided string takes ~14% of total CPU time (and PQfnumber itself takes ~30%, but oh well), please see the code and flamegraph attached.
Вложения
On Tue, 27 Feb 2024 at 00:31, Ivan Trofimov <i.trofimow@yandex.ru> wrote: > I see now that I failed to express myself clearly: it's not a per-query > overhead, but rather a per-result-field one. I'm fairly sympathetic to decreasing the overhead of any per-row operation. And looking at the code, it doesn't surprise me that PQfnumber shows up so big in your profile. I think it would probably make sense to introduce a PQfnumber variant that does not do the downcasing/quote handling (called e.g. PQfnumberRaw). However, I do think you could convert this per-row overhead in your case to per-query overhead by caching the result of PQfnumber for each unique C++ string_view. Afaict that should solve your performance problem.
> However, I do think you could convert this per-row overhead in your > case to per-query overhead by caching the result of PQfnumber for each > unique C++ string_view. Afaict that should solve your performance > problem. Absolutely, you're right. The problem here is not that it's impossible to write it in a performant way, but rather that it's impossible to do so in a performant _and_ clean way given the convenient abstractions wrapper-libraries provide: here's a `Result`, which consists of `Row`s, which in turn consist of `Field`s. The most natural and straightforward way to iterate over a `Result` would be in the lines of that loop, and people do write code like that because it's what they expect to just work given the abstractions (and it does, it's just slow). Caching the result of PQfnumber could be done, but would result in somewhat of a mess on a call-site. I like your idea of 'PQfnumberRaw': initially i was only concerned about a null-terminated string requirement affecting my interfaces (because users complained about that to me, https://github.com/userver-framework/userver/issues/494), but I think PQfnumberRaw could solve both problems (PQfnumber being a bottleneck when called repeatedly and a null-terminated string requirement) simultaneously.
On Tue, 27 Feb 2024 at 15:49, Ivan Trofimov <i.trofimow@yandex.ru> wrote: > Caching the result of PQfnumber could be done, but would result in > somewhat of a mess on a call-site. To be clear I meant your wrapper around libpq could internally cache this, then the call sites of users of your wrapper would not need to be changed. i.e. your Result could contain a cache of columnname->columnumber mapping that you know because of previous calls to PQfnumber on the same Result. > I like your idea of 'PQfnumberRaw': initially i was only concerned about > a null-terminated string requirement affecting my interfaces (because > users complained about that to me, > https://github.com/userver-framework/userver/issues/494), but I think > PQfnumberRaw could solve both problems (PQfnumber being a bottleneck > when called repeatedly and a null-terminated string requirement) > simultaneously. Feel free to send a patch for this.