Full Text Index disk space requirements
От | |
---|---|
Тема | Full Text Index disk space requirements |
Дата | |
Msg-id | 61258.12.249.229.112.1038303419.squirrel@www.l-i-e.com обсуждение исходный текст |
Список | pgsql-performance |
So, I'm trying to create a full text index as described here: http://techdocs.postgresql.org/techdocs/fulltextindexing.php Everything was going mostly okay... I had to hack a quick PHP script instead of using the Perl once since I didn't have a working Pg.pm, but that was a minor speed bump. Then I hit a real road-block... \copy article_fti from fulltext.sorted \. ERROR: copy: line 34635390, cannot extend article_fti: No space left on device. Check free disk space. PQendcopy: resetting connection archive=> \q [root@rm-004-24 utilities]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 15G 15G 0 100% / /dev/sda1 48M 6.1M 39M 14% /boot none 439M 0 439M 0% /dev/shm Oh. Yeah. I guess that *IS* going to be kind of big... Any SWAGs how much disk space is required for a 500 M fulltext.sorted file? IE, the ASCII file of string/OID pairs, in tab-delimited form, is 500 M -- How much PostgreSQL space does that turn into with the tables/indices as described the URL above? When I deleted all the fti rows, and did a VACUUM, there was almost 2G available... ALSO: Wouldn't using f1.string = 'perth' be faster than f1.string ~ '^perth' and equally useful? Or is ~ with ^ somehow actually faster than the seemingly simple = comparison? AND: Would using OR for the individual word comparisons be a big drag on speed? I'd kind of like to give ranked results based on how many of the terms were present rather than a complete filter. I'd be happy to try the EXPLAIN queries, but I don't think they're going to be accurate without the data in the FTI table... I got some weird results when I did a test run with a very small dataset in the FTI table -- But I also think I was doing it in the middle of a train-wreck between daily VACUUM and pg_dump, which were thrashing each other with all the FTI data I had imported just for the small test... I've altered the cron jobs to have more time in between. THANKS IN ADVANCE!
В списке pgsql-performance по дате отправления: