Re: [HACKERS] Brain-Dead Sort Algorithm??
От | Thomas Lockhart |
---|---|
Тема | Re: [HACKERS] Brain-Dead Sort Algorithm?? |
Дата | |
Msg-id | 3847EF05.9A74386@alumni.caltech.edu обсуждение исходный текст |
Ответ на | Brain-Dead Sort Algorithm?? ("Tim Perdue" <archiver@db.geocrawler.com>) |
Ответы |
Re: [HACKERS] Brain-Dead Sort Algorithm??
|
Список | pgsql-hackers |
> serial_half is a 1-column list of 10-digit > numbers. I'm doing a select distinct because I > believe there may be duplicates in that column. > > The misunderstanding on my end came because > serial_half was a 60MB text file, but when it was > inserted into postgres, it became 345MB (6.8 > million rows has a lot of bloat apparently). > > So the temp-sort space for 345MB could easily > surpass the 1GB I had on my hard disk. Although > how anyone can take a 60MB text file and turn it > into > 1GB is beyond me. Sigh. Y'all like the sweeping statement, which got you in a bit of trouble the first time too :) Without knowing your schema, I can't say why you have *exactly* the storage requirement you see. But, you have chosen the absolute worst case for *any* relational database: a schema with only a single, very small column. For Postgres (and other DBs, but the details will vary) there is a 36 byte overhead per row to manage the tuple and the transaction behavior. So if you stored your data as int8 (int4 is too small for 10 digits, right?) I see an average usage of slightly over 44 bytes per row (36+8). So, for 6.8 million rows, you will require 300MB. I'm guessing that you are using char(10) fields, which gives 50 bytes/row or a total of 340MB, which matches your number to two digits. Note that the tuple header size will stay the same (with possibly some modest occasional bumps) for rows with more columns, so the overhead decreases as you increase the number of columns in your tables. By the way, I was going to say to RTFM, but I see a big blank spot on this topic (I could have sworn that some of the info posted to the mailing lists on this topic had made it into the manual, but maybe not). Does anyone see where this is in the docs, or have an interest in writing a bit? The place is doc/src/sgml/storage.sgml and page.sgml ... Good luck. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
В списке pgsql-hackers по дате отправления: