Re: libpq compression (part 3)

From	Jacob Burroughs
Subject	Re: libpq compression (part 3)
Date	20 December 2023, 21:48:13
Msg-id	CACzsqT5Y2cVES09bm4audEeh10bhsawSReeEgHwHA6NT2NV+BQ@mail.gmail.com
In reply to	Re: libpq compression (part 3) (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: libpq compression (part 3)
List	pgsql-hackers

> I'm having difficulty understanding the details of this handshaking
> algorithm from this description. It seems good that the handshake
> proceeds in each direction somewhat separately from the other, but I
> don't quite understand how the whole thing fits together. If the
> client tells the server that 'none,gzip' is supported, and I elect to
> start using gzip, how does the client know that I picked gzip rather
> than none? Are the compressed packets self-identifying?

I agree I could have spelled this out more clearly.  I forgot to
mention that I added a byte to the CompressedMessage message type that
specifies the chosen algorithm.  So if the server receives
'none,gzip', it can either keep sending uncompressed regular messages,
or it can compress them in CompressedMessage packets which now look
like "z{len}{format}{data}" (format just being a member of the
pg_compress_algorithm enum, so `1` in the case of gzip).  Overall the
intention is that both the client and the server can just start
sending CompressedMessages once they receive the list of algorithms
the other party supports, without any negotiation or agreement needed
and without an extra message type to first specify the compression
algorithm.  (One byte per message seemed to me like a reasonable
overhead for the simplicity, but it wouldn't be hard to bring back
SetCompressionMethod if we prefer.)
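
To make the "z{len}{format}{data}" layout concrete, here is a minimal
Python sketch of framing and unframing such a message.  It is not code
from the patch.  It assumes that {len} is a big-endian Int32 that
counts itself plus everything after it (the usual protocol
convention), that {format} is a single byte, and it uses Python's zlib
stream as a stand-in for the gzip algorithm.  The function names are
hypothetical.

```python
import struct
import zlib
from typing import Tuple

# Value taken from the message above: gzip is `1` in pg_compress_algorithm.
PG_COMPRESSION_GZIP = 1


def pack_compressed_message(payload: bytes,
                            algorithm: int = PG_COMPRESSION_GZIP) -> bytes:
    """Frame payload as 'z' + len + format + data (hypothetical helper)."""
    if algorithm != PG_COMPRESSION_GZIP:
        raise ValueError("this sketch only implements gzip")
    data = zlib.compress(payload)
    # Assumption: len counts its own 4 bytes, the format byte, and the data.
    length = 4 + 1 + len(data)
    return b"z" + struct.pack("!IB", length, algorithm) + data


def unpack_compressed_message(message: bytes) -> Tuple[int, bytes]:
    """Return (algorithm, decompressed payload) for one framed message."""
    if message[:1] != b"z":
        raise ValueError("not a CompressedMessage")
    length, algorithm = struct.unpack("!IB", message[1:6])
    if algorithm != PG_COMPRESSION_GZIP:
        raise ValueError("this sketch only implements gzip")
    # The type byte is not counted in len, so the message ends at 1 + length.
    data = message[6:1 + length]
    return algorithm, zlib.decompress(data)
```

Because the format byte travels with every message, the receiver can
decode a message without knowing in advance which algorithm the sender
picked, which is the point of dropping SetCompressionMethod.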

> It's also slightly odd to me that the same parameter seems to specify
> both what we want to send, and what we're able to receive. I'm not
> really sure we should have separate parameters for those things, but I
> don't quite understand how this works without it. The "none" thing
> seems like a bit of a hack. It lets you say "I'd like to receive
> compressed data but send uncompressed data" ... but what about the
> reverse? How do you say "don't bother compressing what you receive
> from the server, but please lz4 everything you send to the server"? Or
> how do you say "use zstd from server to client, but lz4 from client to
> server"? It seems like you can't really say that kind of thing.

When I came up with the protocol I was imagining that both server
admins and clients might want a good deal more control over the
compression they do than over the decompression they do, since
compression is generally much more computationally expensive than
decompression.  Now that you point it out, though, I don't think that
actually makes much sense.

> What if we had, on the server side, a GUC saying what compression to
> accept and a GUC saying what compression to be willing to do? And then
> let the client request whatever it wants for each direction.

Here are two proposals:
Option 1:
GUCs:
libpq_compression (default "off")
libpq_decompression (default "auto", which is defined to be equal to
libpq_compression)
Connection parameters:
compression (default "off")
decompression (default "auto", which is defined to be equal to compression)

I think we would only send the decompression fields over the wire to
the other side, which would use them to filter its own compression
list and pick the first entry that remains.  We would send the
`_pq_.libpq_decompression` protocol extension even if only compression
was enabled and not decompression, so that the server knows to enable
compression processing for the connection.  (I think this would be the
only place we would still use `none`, and not as part of a list in
this case.)  I think we would also want to add libpq functions that
let a client check the last-used compression algorithm in each
direction for monitoring/inspection purposes.  (Actually that's
probably a good idea regardless, so that a client application that
cares doesn't need to reimplement the intersection logic or assume
that the first algorithm in common was chosen.)  Also, I'm open to
better names than "auto"; I just would like to avoid unnecessary
verbosity for the common case of "I just want to enable bidirectional
compression with whatever algorithms are available with default
parameters".
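
One way to read the selection rule in Option 1 is sketched below in
Python.  This is an interpretation, not code from the patch: it
assumes each setting is either a keyword ("off" or "auto") or a
comma-separated list in preference order, and that a sender picks the
first entry of its own compression list that appears in the peer's
advertised decompression list.  All function names are hypothetical.

```python
from typing import List, Optional


def parse_list(value: str) -> List[str]:
    """Split a comma-separated algorithm list; "off" means no algorithms."""
    if value == "off":
        return []
    return [name.strip() for name in value.split(",") if name.strip()]


def effective_decompression(compression: str,
                            decompression: str = "auto") -> List[str]:
    """Resolve "auto" to the compression setting, as Option 1 defines it."""
    return parse_list(compression if decompression == "auto" else decompression)


def choose_send_algorithm(own_compression: str,
                          peer_decompression: List[str]) -> Optional[str]:
    """Pick the first locally preferred algorithm the peer can decompress."""
    for algorithm in parse_list(own_compression):
        if algorithm in peer_decompression:
            return algorithm
    return None  # nothing in common: keep sending uncompressed messages


# Example: the server sets only libpq_compression, so decompression is "auto".
server_accepts = effective_decompression("zstd,gzip")
client_sends = choose_send_algorithm("lz4,gzip", server_accepts)
```

In this example the client prefers lz4, but the server only accepts
zstd and gzip, so the client would send gzip.  Each direction is
resolved independently with the same rule.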

Option 2:
This one is even cleaner in the common case but a bit worse in the
uncommon case: just use one parameter and have
compression/decompression enabling be part of the compression detail
(e.g. "libpq_compression='gzip:no_decompress;lz4:level=2,no_compress;zstd'"
or something like that, in which case the "none,gzip" case would
become "libpq_compression='gzip:no_compress'").  See
https://www.postgresql.org/docs/current/app-pgbasebackup.html ,
specifically the `--compress` flag, for how specifying compression
algorithms and details works.
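
For illustration, the single-parameter syntax of Option 2 could be
parsed roughly as in the Python sketch below.  The grammar is assumed
from the example string alone (";" between algorithms, ":" before the
detail, "," between detail items); the patch may define it
differently, and AlgorithmSpec and parse_libpq_compression are
hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlgorithmSpec:
    name: str
    compress: bool = True      # cleared by the "no_compress" flag
    decompress: bool = True    # cleared by the "no_decompress" flag
    options: Dict[str, str] = field(default_factory=dict)


def parse_libpq_compression(value: str) -> List[AlgorithmSpec]:
    """Parse e.g. 'gzip:no_decompress;lz4:level=2,no_compress;zstd'."""
    specs = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, _, detail = entry.partition(":")
        spec = AlgorithmSpec(name=name.strip())
        for item in detail.split(","):
            item = item.strip()
            if not item:
                continue
            if item == "no_compress":
                spec.compress = False
            elif item == "no_decompress":
                spec.decompress = False
            else:
                # Anything else is treated as a key=value detail such as level=2.
                key, _, val = item.partition("=")
                spec.options[key.strip()] = val.strip()
        specs.append(spec)
    return specs


specs = parse_libpq_compression(
    "gzip:no_decompress;lz4:level=2,no_compress;zstd")
send_list = [s.name for s in specs if s.compress]        # gzip, zstd
receive_list = [s.name for s in specs if s.decompress]   # lz4, zstd
```

The two derived lists play the same roles as the separate compression
and decompression settings in Option 1, which is why the common case
("zstd" alone) stays short while the asymmetric case gets wordier.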

I'm actually not sure which of the two I prefer; opinions are welcome :)

Thanks,
Jacob
