Обсуждение: GROUP BY / ORDER BY string is very slow

Поиск
Список
Период
Сортировка

GROUP BY / ORDER BY string is very slow

От
Oleg Broytmann
Дата:
Hello!
  I use a modified version of contrib/apache_logginig. The table I am
using:

CREATE TABLE combinedlog (  host text,  accdate abstime,  request text,  authuser text,  cookie text,  referer text,
useragenttext,  stime int2,  status int2,  bytes int4
 
);
  Once a month I run a very simple script to put WWW logs into the table.
The very loading is not fast (I am running postmaster with -F, I use
BEGIN/END and I drop indicies before loading. I remember when I started
using BEGIN/END loading speed up a bit, but not significantly), but is not
my biggest concern. What is worse is spped of my queries.  After inserting into the table, I run this shell script:

sel_f() {  field=$1  psql -d ce_wwwlog -c "SELECT COUNT($field), $field FROM raw_combinedlog GROUP BY 2 ORDER BY 1
DESC;"
}

for i in host request referer useragent; do  sel_f $i > by-$i
done
  This works very, very slow. I tried to use indicies:

CREATE INDEX host ON combinedlog (host);
CREATE INDEX request ON combinedlog (request);
CREATE INDEX referer ON combinedlog (referer);
CREATE INDEX useragent ON combinedlog (useragent);
  but indicies do not help much, and indexing time is so big, that sum of
CREATE INDEX + SELECT is even bigger :(
  Why is it so slow? How can I speed it up?
  I am running postgres compiled with --enable-locale. that is, for every
string comparision there are 2 (two) malloc calls and one strcoll. Can I
increase speed turning strcoll off? If so, postgres need a SET command to
turn localization temporary off. I can hack in, as I already rewrote
localization stuff a year ago. The only thing I want to hear from postgres
community (hackers, actually :) is how it should be named:  SET STRCOLL=off
or such?  I remember when I submitted my locale patch there was a discussion on
how to do it the Right Way, but I didn't remember the conlusion. What
finally we decided? I want to add command (if I should to) that is
compliant with other stuff here.

Oleg.
----  Oleg Broytmann     http://members.tripod.com/~phd2/     phd2@earthling.net          Programmers don't die, they
justGOSUB without RETURN.
 



Re: [HACKERS] GROUP BY / ORDER BY string is very slow

От
"Thomas G. Lockhart"
Дата:
> Once a month I run a very simple script to put WWW logs into the 
> table.
> What is worse is spped of my queries.
> Why is it so slow? How can I speed it up?

Are you running vacuum after removing your indices? If you don't then
the table storage area does not actually shrink.

> I am running postgres compiled with --enable-locale. that is, for 
> every string comparision there are 2 (two) malloc calls and one 
> strcoll. Can I increase speed turning strcoll off? If so, postgres 
> need a SET command to turn localization temporary off.

How can you "turn localization off" if you have localized strings in
your database? If you build indices without localization, then turn
localization back on, the things are probably hopelessly out of order.

> I remember when I submitted my locale patch there was a discussion on
> how to do it the Right Way, but I didn't remember the conlusion. What
> finally we decided? I want to add command (if I should to) that is
> compliant with other stuff here.

Is the Right Way to implement the NATIONAL CHARACTER type rather than
having the CHAR type be localized? That way, you could have both types
in the same database. Or is that SQL92 feature not widely used or
useful?
                   - Tom


Re: [HACKERS] GROUP BY / ORDER BY string is very slow

От
Oleg Broytmann
Дата:
Hello!

On Fri, 22 Jan 1999, Thomas G. Lockhart wrote:

> > Once a month I run a very simple script to put WWW logs into the 
> > table.
> > What is worse is spped of my queries.
> > Why is it so slow? How can I speed it up?
> 
> Are you running vacuum after removing your indices? If you don't then
> the table storage area does not actually shrink.
  Yes, I did a dozen of experiments, running VACUUM with and without
indices. VACUUM helped a bit, but not much...

> > I am running postgres compiled with --enable-locale. that is, for 
> > every string comparision there are 2 (two) malloc calls and one 
> > strcoll. Can I increase speed turning strcoll off? If so, postgres 
> > need a SET command to turn localization temporary off.
> 
> How can you "turn localization off" if you have localized strings in
> your database? If you build indices without localization, then turn
> localization back on, the things are probably hopelessly out of order.
  What are "localized strings"?  In this particular database there are only strings from WWW-log. If I
could turn localization off, I would turn it off for this entire db
forever.

> Is the Right Way to implement the NATIONAL CHARACTER type rather than
> having the CHAR type be localized? That way, you could have both types
> in the same database. Or is that SQL92 feature not widely used or
> useful?
  In this particular case NATIONAL CHARACTER (actually, non-NATIONAL
CHARACTER) is a solution. Not sure about other cases.

Oleg.
----    Oleg Broytmann  National Research Surgery Centre  http://sun.med.ru/~phd/          Programmers don't die, they
justGOSUB without RETURN.