Re: EXTERNAL storage and substring on long strings
От | Scott Cain |
---|---|
Тема | Re: EXTERNAL storage and substring on long strings |
Дата | |
Msg-id | 1059682839.5415.53.camel@localhost.localdomain обсуждение исходный текст |
Ответ на | EXTERNAL storage and substring on long strings (Scott Cain <cain@cshl.org>) |
Ответы |
Re: EXTERNAL storage and substring on long strings
Re: EXTERNAL storage and substring on long strings |
Список | pgsql-performance |
On Thu, 2003-07-31 at 15:44, Tom Lane wrote: > Scott Cain <cain@cshl.org> writes: > > explain analyze select substring(residues from 1000000 for 20000) > > from feature where feature_id=1; > > > where feature is a table with ~3 million rows, and residues is a text > > column, where for the majority of the rows of feature, it is null, for a > > large minority, it is shortish strings (a few thousand characters), and > > for 6 rows, residues contains very long strings (~20 million characters > > (it's chromosome DNA sequence from fruit flies)). > > I think the reason uncompressed storage loses here is that the runtime > is dominated by the shortish strings, and you have to do more I/O to get > at those if they're uncompressed, negating any advantage from not having > to fetch all of the longish strings. I'm not sure I understand what that paragraph means, but it sounds like, if PG is working the way it is supposed to, tough for me, right? > > Or it could be that there's a bug preventing John Gray's substring-slice > optimization from getting used. The only good way to tell that I can > think of is to rebuild PG with profiling enabled and try to profile the > execution both ways. Are you up for that? I am not against recompiling. I am currently using an RPM version, but I could probably recompile; the compilation is probably straight forward (adding something like `--with_profiling` to ./configure), but how straight forward is actually doing the profiling? Is there a document somewhere that lays it out? > > (BTW, if you are using a multibyte database encoding, then that's your > problem right there --- the optimization is practically useless unless > character and byte indexes are the same.) I shouldn't be, but since it is an RPM, I can't be sure. It sure would be silly since the strings consist only of [ATGCN]. Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
В списке pgsql-performance по дате отправления: