Re: A varint implementation for PG?
От | Tomas Vondra |
---|---|
Тема | Re: A varint implementation for PG? |
Дата | |
Msg-id | cdfe5ab1-f3f0-7d1d-51f1-6493072df9ed@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: A varint implementation for PG? (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: A varint implementation for PG?
|
Список | pgsql-hackers |
On 8/5/21 1:05 AM, Andres Freund wrote: > Hi, > > On 2021-08-04 23:44:10 +0200, Tomas Vondra wrote: >> How is that better than the two varint flavors that are already out there, >> i.e. the bitcoin [1] and protocol buffers [2]? > > The protobuf one is *terrible* for CPU efficiency. You need to go through each > byte, do masking and shifting for each byte and then have a conditional > branch. That's bad from the the amount of instructions you need to execute, > and *really* bad for the branch predictor. > Yeah, probably true - particularly for longer values. No argument here. > >> The first one seems quite efficient in how it encodes the length into very >> few bits (which matters especially for small values). It's designed for >> integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary lengths >> fairly easily, I think: > >> Look at the first byte, and >> >> 0 - 243 - encoded as is >> 244 - 1 byte >> 245 - 2 bytes >> 246 - 3 bytes >> 247 - 4 bytes >> 248 - 5 bytes >> 249 - 6 bytes >> 250 - 7 bytes >> 251 - 8 bytes >> 252 - next 1 byte is length >> 253 - next 2 bytes are length >> 254 - next 3 bytes are length >> 255 - next 4 bytes are length > >> If we want to support longer lengths, we'd have to reserve an extra value >> (which reduces the number of values that require a single byte). > > I think that's not a bad scheme. I think it may end up being a bit more > expensive to decode because you need more branches instead of using > find-first-set type instructions. > I don't think it requires many branches, because you can essentially do if (byte[0] <= 243) length = 0 else if (byte[0] <= 251) length = byte[0] - 243 else { length_bytes = byte[0] - 251; ... read length_bytes, decode length } but I haven't tried implementing it and maybe my intuition is wrong. Or maybe it'd be a good scheme for on-disk format, but poor for memory. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: