Re: Improve compression speeds in pg_lzcompress.c
From | Benedikt Grundmann |
---|---|
Subject | Re: Improve compression speeds in pg_lzcompress.c |
Date | |
Msg-id | CADbMkNPrKe2P7Oku=2sNGyLrd8+wQad_YBpvJtmJBtV17Tmf4A@mail.gmail.com |
In response to | Re: Improve compression speeds in pg_lzcompress.c (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Personally, my biggest gripe about the way we do compression is that
it's easy to detoast the same object lots of times. More generally,
our in-memory representation of user data values is pretty much a
mirror of our on-disk representation, even when that leads to excess
conversions. Beyond what we do for TOAST, there's stuff like numeric
where we not only detoast but then post-process the results into yet
another internal form before performing any calculations - and then of
course we have to convert back before returning from the calculation
functions. And for things like XML, JSON, and hstore we have to
repeatedly parse the string, every time someone wants to do anything
with it. Of course, solving this is a very hard problem, and not
solving it isn't a reason not to have more compression options - but
more compression options will not solve the problems that I personally
have in this area, by and large.
At the risk of saying something totally obvious and stupid, as I haven't looked at the actual representation: this sounds like a memoisation problem. In OCaml terms:
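(* A value is either still in its on-disk byte form or already converted. *)
type 'a rep =
  | On_disk_rep of bytes
  | In_memory_rep of 'a

type 'a t = 'a rep ref

(* Convert on first access and cache the result in place. *)
let get_mem_rep t converter =
  match !t with
  | On_disk_rep seq ->
    let res = converter seq in
    t := In_memory_rep res;
    res
  | In_memory_rep x -> x
;;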
... (if you need the other direction, that's straightforward too)...
Translating this into C is relatively straightforward if you have the luxury of a fresh start
and don't have to be super efficient:
#include <stddef.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;    /* tags which union member is valid */
    union {
        char *on_disk;
        void *in_memory;
    } rep;
} t;

/* Convert on first access, then return the cached in-memory form. */
void *get_mem_rep(t *v, void *(*converter)(char *)) {
    void *res;
    switch (v->rep_kind) {
    case ON_DISK_REP:
        res = converter(v->rep.on_disk);
        v->rep.in_memory = res;
        v->rep_kind = IN_MEMORY_REP;
        return res;
    case IN_MEMORY_REP:
        return v->rep.in_memory;
    }
    return NULL;    /* not reached for a valid rep_kind */
}
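The caching behaviour of the C version can be checked with a small harness. This is only a sketch: the converter `parse_long`, the counter `parse_calls`, and the helper `conversions_for_two_reads` are hypothetical names made up for illustration, and the on-disk form is assumed to be a NUL-terminated decimal string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;    /* tags which union member is valid */
    union {
        char *on_disk;
        void *in_memory;
    } rep;
} t;

/* Same idea as above: convert on first access, then reuse the result. */
static void *get_mem_rep(t *v, void *(*converter)(char *)) {
    if (v->rep_kind == ON_DISK_REP) {
        v->rep.in_memory = converter(v->rep.on_disk);
        v->rep_kind = IN_MEMORY_REP;
    }
    return v->rep.in_memory;
}

/* Hypothetical converter: parses a decimal string into a heap-allocated
   long and counts how often it is invoked. */
static int parse_calls = 0;

static void *parse_long(char *text) {
    long *out = malloc(sizeof *out);
    parse_calls++;
    if (out != NULL)
        *out = strtol(text, NULL, 10);
    return out;
}

/* Reads the same value twice and returns how many conversions that took,
   or -1 if the allocation inside the converter failed. */
int conversions_for_two_reads(void) {
    char text[] = "12345";
    t v;
    v.rep_kind = ON_DISK_REP;
    v.rep.on_disk = text;

    parse_calls = 0;
    long *first = get_mem_rep(&v, parse_long);
    long *second = get_mem_rep(&v, parse_long);
    if (first == NULL)
        return -1;
    assert(first == second);    /* the second read hits the cache */
    assert(*first == 12345);
    free(first);
    return parse_calls;
}
```

The second call to `get_mem_rep` never reaches the converter, which is the whole point: repeated reads pay for the conversion once.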
Now, of course, fitting this into the existing types while ensuring that memory is neither freed too early nor leaked, and that no other bugs creep in, is probably a nightmare, which is presumably why you said that this is a hard problem.
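To make the ownership question concrete, here is one possible sketch in which the wrapper owns whichever representation is currently live. Everything in it is an assumption made for illustration: the names `owned_rep_t`, `owned_init`, `owned_get`, `owned_release`, and `length_converter` are hypothetical, plain `malloc`/`free` stand in for whatever allocator the real code would use, the on-disk bytes are assumed to be NUL-terminated, and the converter is assumed not to keep a pointer into the bytes it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { OWNED_ON_DISK, OWNED_IN_MEMORY } owned_kind_t;

typedef struct {
    owned_kind_t kind;
    union {
        char *on_disk;      /* private heap copy, owned by this struct */
        void *in_memory;    /* converter result, owned by this struct */
    } rep;
    void (*destroy)(void *);    /* frees the in-memory form */
} owned_rep_t;

/* Takes a private copy of the bytes, so the caller's buffer may go away.
   Returns 0 on success, -1 if the copy could not be allocated. */
int owned_init(owned_rep_t *v, const char *bytes, void (*destroy)(void *)) {
    size_t n = strlen(bytes) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, bytes, n);
    v->kind = OWNED_ON_DISK;
    v->rep.on_disk = copy;
    v->destroy = destroy;
    return 0;
}

/* Converts at most once. The on-disk copy is freed only after the
   converter succeeded, so a failed conversion neither leaks nor loses data. */
void *owned_get(owned_rep_t *v, void *(*converter)(char *)) {
    if (v->kind == OWNED_ON_DISK) {
        char *bytes = v->rep.on_disk;
        void *res = converter(bytes);
        if (res == NULL)
            return NULL;    /* still ON_DISK, still owned */
        free(bytes);
        v->rep.in_memory = res;
        v->kind = OWNED_IN_MEMORY;
    }
    return v->rep.in_memory;
}

/* Frees whichever representation is live, then resets the struct to an
   empty on-disk state so that a second release is harmless. */
void owned_release(owned_rep_t *v) {
    if (v->kind == OWNED_ON_DISK)
        free(v->rep.on_disk);
    else if (v->destroy != NULL)
        v->destroy(v->rep.in_memory);
    v->kind = OWNED_ON_DISK;
    v->rep.on_disk = NULL;    /* free(NULL) is a no-op */
}

/* Hypothetical converter: the "in-memory form" is just the string length. */
void *length_converter(char *text) {
    size_t *len = malloc(sizeof *len);
    if (len != NULL)
        *len = strlen(text);
    return len;
}
```

Even in this toy form the hard part shows through: any pointer previously handed out by `owned_get` dangles after `owned_release`, and nothing here addresses values shared between several holders.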
Cheers,
Bene