Thread: Bytea network traffic: binary vs text result format

Bytea network traffic: binary vs text result format

From
"Miha D. Puc"
Date:
Hi!

There was some debate recently about using text or binary format.
There are people who would like to use binary format but have trouble
converting binary encoded results into native types, and there are people
who say there's not much performance difference.

I'd like to stress that performance is very different over a slow
network. The biggest difference is for bytea, where the text format
is about 3.5 times slower for inserts and updates and about
2.9 times slower for selects. Here's the reasoning:

In text format, bytea values are escaped using PQescapeBytea. In an average
binary stream about 2/3 of the bytes would be escaped. Each escaped byte
takes the form \\ooo on upload and \ooo on download, so the escaped stream
is 1/3 + 2/3 * 5 = 11/3 = 3.67 times the original size on upload and
1/3 + 2/3 * 4 = 3 times the original size on download.
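
The estimate above can be reproduced in a few lines of Python. This is only a sketch of the model described in the post: expansion_factor is a hypothetical helper name, and the escaped fraction of 2/3 is the post's own assumption, not a measured value.

```python
from fractions import Fraction


def expansion_factor(escaped_fraction, escaped_len):
    """Average output characters per input byte.

    Unescaped bytes stay 1 character long; escaped bytes grow to
    escaped_len characters.
    """
    return (1 - escaped_fraction) * 1 + escaped_fraction * escaped_len


# Assumption taken from the post: about 2/3 of the bytes need escaping.
upload = expansion_factor(Fraction(2, 3), 5)    # \\ooo is 5 characters
download = expansion_factor(Fraction(2, 3), 4)  # \ooo is 4 characters

print(upload, round(float(upload), 2))
print(download)
```

Using Fraction keeps the arithmetic exact, so the upload factor comes out as 11/3 rather than a rounded float.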

Here are the results of my test. I inserted and selected an OpenOffice
document of size 2 MB over a 2 Mbit/512 kbit cable line.

                 insert    select
text format:     120.1s    24.9s
binary format:    33.5s     8.6s
factor:            3.6       2.9

The difference between the test and the above calculation comes from the
estimate that 2/3 of bytes are escaped, where in fact only about 95 of the
256 possible byte values pass through unescaped, so about 63% are escaped.
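
A rough check of the 63% figure and of the factors it implies, as a sketch under stated assumptions: bytes are uniformly distributed, and only the 95 printable ASCII characters are sent unescaped. The special handling of the quote and backslash characters is ignored, so the numbers are approximate.

```python
# Assumption: bytes are uniformly distributed, and only printable ASCII
# (0x20..0x7e, 95 values) is sent unescaped. Real escaping also treats
# the quote and backslash characters specially; that is ignored here.
unescaped = sum(1 for b in range(256) if 0x20 <= b <= 0x7E)
escaped = 256 - unescaped

upload_factor = (unescaped * 1 + escaped * 5) / 256    # \\ooo: 5 characters
download_factor = (unescaped * 1 + escaped * 4) / 256  # \ooo: 4 characters

print(escaped, round(escaped / 256, 2))
print(round(upload_factor, 2), round(download_factor, 2))
```

Under these assumptions the model gives roughly 3.5 for upload and 2.9 for download, which is close to the measured factors.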

So there is a need (people asking) and a reason (performance) to use
binary format. But there's a huge drawback: the conversions. They are easy
for varchar and not too bad for basic types (int, float, bool); effort is
needed for timestamp, date and time; and numeric is a pain.

So with all the above there should be a utility for conversion between
binary format and native types and/or string format in libpq.

Regards,
Miha Puc




Re: Bytea network traffic: binary vs text result format

From
Markus Schiltknecht
Date:
Hi,

Miha D. Puc wrote:
> So there is a need (people asking) and reason (performance) to use
> binary format.

You are aware that PostgreSQL itself *can* transfer values in binary 
format? Check the Documentation: "43.1.3. Formats and Format Codes":

http://www.postgresql.org/docs/8.1/interactive/protocol.html

This works since protocol version 3, AFAICT. The client needs to support 
that, though. But the php pgsql binding, just as an example, doesn't use 
that feature.

> So with all the above there should be a utility for conversion between
> binary format and native types and/or string format in libpq.

There already are. Not in libpq, though. Most (if not all) internal 
types have those functions. See for example:
  # SELECT typinput, typoutput, typreceive, typsend
  #   FROM pg_type WHERE typname='int4';
   typinput | typoutput | typreceive | typsend
  ----------+-----------+------------+----------
   int4in   | int4out   | int4recv   | int4send
  (1 row)
 


The input and output functions deal with the textual representation, 
while send and receive convert to a binary representation in network 
byte order.
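
As a client-side illustration of "binary representation in network byte order", here is a minimal Python sketch for int4. It assumes the binary form of int4 is a 4-byte signed big-endian integer; decode_int4 and encode_int4 are hypothetical helper names and are not part of libpq or the server.

```python
import struct


def decode_int4(data: bytes) -> int:
    """Decode a 4-byte signed integer in network (big-endian) byte order."""
    if len(data) != 4:
        raise ValueError("int4 binary value must be exactly 4 bytes")
    return struct.unpack("!i", data)[0]


def encode_int4(value: int) -> bytes:
    """Encode a signed 32-bit integer in network (big-endian) byte order."""
    return struct.pack("!i", value)


print(decode_int4(b"\x00\x00\x00\x2a"))
print(encode_int4(-1))
```

The "!" prefix in the struct format selects network byte order, so the result does not depend on the byte order of the client machine.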

Hope that helps.

Regards

Markus



Re: Bytea network traffic: binary vs text result format

From
"Wilhansen Li"
Date:


On 6/4/07, Markus Schiltknecht <markus@bluegap.ch> wrote:

> There already are. Not in libpq, though. Most (if not all) internal
> types have those functions.

Well, I don't think it is surprising that people ask for them even though those functions already exist in PostgreSQL because, as you stated, they are not in libpq. People have to dig through the source tree to find where those functions live and work out how to include them in their program, which, I assume, is less convenient than having them included in libpq in the first place. There was an earlier post from a user who was disappointed by the "crappy" support for binary formats in libpq (not PostgreSQL).

Along these lines, I'm aware of the issues this may pose; see the thread at http://archives.postgresql.org/pgsql-hackers/1999-08/msg00374.php which is already very old (8 years ago!). It discusses the problem that the binary representation might change from version to version, which is why this was not done. There were plans to incorporate CORBA, which IMHO is overkill, to solve this problem, but I don't think that has happened yet, probably because it's too complex (?). I'd rather recommend using ASN.1 (if that's feasible).


--
Life is too short for dial-up.

Re: Bytea network traffic: binary vs text result format

From
Andrew McNamara
Date:
>There was some debate recently about using text or binary format.
>There's people who would like to use it but have trouble converting
>binary encoded results into native types and there's people that say
>there's not much performance difference.
>
>I'd like to stress that performance is very different over slow
>network.  The biggest difference is for byte where the text format
>performance is about 3.5 times worse at inserts and updates and about
>2.9 times worse at selects . Here's the reasoning:

You're referring to the worst case of the text format - handling pure
binary data - and yes, in this case, up to 4 times as many bytes can
flow over the network. If the network is your constraining factor, then
this will be significant. But in many other cases (fast local network),
the disks are the limiting factor, or other data types dominate, and in
those cases, the text format can actually be smaller.
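
A small illustration of the last point, using int4 purely as an example (no specific type is named above). The sketch compares the length of the decimal text form with a fixed 4-byte binary form and ignores any per-field framing, which applies to both formats.

```python
import struct


def text_size(value: int) -> int:
    """Number of bytes in the decimal text form of an integer."""
    return len(str(value).encode("ascii"))


# Assumption: a binary int4 always occupies 4 bytes.
BINARY_INT4_SIZE = struct.calcsize("!i")

for v in (7, 1234, 1234567890):
    print(v, text_size(v), BINARY_INT4_SIZE)
```

Small values are shorter as text, while large values are shorter in binary, so which format wins depends on the data.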

>So there is a need (people asking) and reason (performance) to use
>binary format. But there's a huge drawback - the conversions. It's easy
>for varchar, not too bad for basic types (int, float, bool), effort is
>needed for timestamp, date, time and numeric is a pain.
>
>So with all the above there should be a utility for conversion between
>binary format and native types and/or string format in libpq.

And there's the rub... the "basic" types you mention all have standard C
representations, but there is no standard C type for timestamp, date, time
and numeric, so what would libpq convert to? Any chosen format will only
suit a subset of users, and will result in double conversions for others
who are constrained by the existing types used within their application.

I'm not sure what the answer is - certainly documenting the wire format
would be a good first step.
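
To show what a documented wire format would let a client do, here is a hedged Python sketch for timestamp. It assumes a server built with integer datetimes, where (to my understanding) a timestamp is sent as a signed 64-bit count of microseconds since 2000-01-01 in network byte order. Servers built with floating-point datetimes use a different encoding, and special values such as infinity are not handled. decode_timestamp_int64 is a hypothetical helper name.

```python
import struct
from datetime import datetime, timedelta

# Assumption: integer-datetime encoding, counted from 2000-01-01 00:00:00.
PG_EPOCH = datetime(2000, 1, 1)


def decode_timestamp_int64(data: bytes) -> datetime:
    """Decode an 8-byte big-endian microsecond count into a datetime."""
    if len(data) != 8:
        raise ValueError("timestamp binary value must be exactly 8 bytes")
    (micros,) = struct.unpack("!q", data)
    return PG_EPOCH + timedelta(microseconds=micros)


one_day = struct.pack("!q", 86_400_000_000)  # microseconds in one day
print(decode_timestamp_int64(one_day))
```

The target here is Python's datetime, which is exactly the kind of choice a C library cannot make for everyone: another application might want a struct tm, a string, or its own date type.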

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/