Performance for seq. scans
От | Jules Bean |
---|---|
Тема | Performance for seq. scans |
Дата | |
Msg-id | 20000726115132.C29809@grommit.office.vi.net обсуждение исходный текст |
Ответы |
Re: Performance for seq. scans
|
Список | pgsql-general |
Hi all, I've had a look over the docs and the FAQ and I can't see anything answering this, so here goes: I'm in the (slightly unusual, in a relational world) situation that the dominant query on my database is a wildcard search, so that no indexes can be used. (E.g. select * from table_a where foo like '%bar%'). Without some /very/ clever (and disk-space intensive) subword indexing, this query is doomed to be a sequential scan, which I'm resigned to. It's a question of making that as fast as possible. My dataset is around 500M as a text file on disk, and around 1500M as postgres data. The machine I'm working on at the moment does the search in around 90 seconds. (For comparision, MS SQL 7, the other solution being considered here, takes around 75 seconds on identical hardware). Interestingly, using 'vmstat' shows that the CPU is maxxed out at 50% (this being a dual CPU machine), while the disk access is a mere 4M/sec --- bonnie claims this machine is capable of around 25M/sec to this particular disk. So it would seem that the bottleneck is the CPU. [I understand why both CPUs aren't used] My previous feeling had been that the bottleneck was going to be the disk, in which case I was going to recommend installing enough memory in the machine that the kernel disk cache could cache the whole file, and thus speeding up the search. In the current situtation, it seems like the only improvement would be to install a faster CPU (and since we're currently using a PIII 600, I couldn't expect much more than a 60% improvement or so that way). It seems slightly surprising that postgres can only "service" a 4M/sec stream of data from the disk with a LIKE query -- not such a complex query. Is there some unnecessary data copying in the critical path for the search? I almost forgot -- this is debian package 7.0.2-2. Any pointers to whether or not this performance can be improved upon, welcomed. Currently I'm feeling like the right solution may be to dump the 500M text file periodically and run 'grep' on a machine with enough memory to cache the text file ;-) Jules Bean
В списке pgsql-general по дате отправления: