A speed comparison with sqlite
От | Shane Ambler |
---|---|
Тема | A speed comparison with sqlite |
Дата | |
Msg-id | 47903F61.8000904@Sheeky.Biz обсуждение исходный текст |
Ответы |
Re: A speed comparison with sqlite
|
Список | pgsql-advocacy |
Just thought I would share some rough numbers here. A bit of an unusual edge case but a big time difference... A guy using RealBasic (GUI development IDE) which uses sqlite as it's inbuilt db engine wanted to generate a series of unique codes and decided sql will help (wanting 30 million codes starts to rule out ram based solutions) This is a 7 character alphanumeric code. His program generated 30M codes in about 15 minutes into the sqlite db file without any duplicate checks. Select distinct(pincode) from codes; returned the results (that is to his client not out to file) after 22 hours and he was after a faster solution. Using a unique index to check as he went, inserts dropped from 18,000 per second to about 200 a second after 8 hours (without completing) (the following times are taken from psql's timing option) With a P4 3Ghz - 80GB IDE drive - running XP pro - pg 8.2.6 - Using postgresql (with psql not Realbasic) I created the table and inserted 30M codes in 5.9 mins select count(distinct(pincode)) from codes; took about 7.2 minutes (29,993,182 unique codes of the 30M created) setting output to a text file - select distinct(pincode) from codes took 18.45 minutes to send them all out (283MB file). The same hardware running FreeBSD 7.0RC1 GENERIC Kernel - create the table and insert 30M codes in 5.6 mins select count(distinct(pincode)) from codes; took 5.9 minutes (29,993,279 unique codes of the 30M created) setting output to a text file - select distinct(pincode) from codes took 9.7 minutes to send them all out to file. It would appear that the method of locating distinct results is the key here - postgresql sorts the distinct column allowing it to identify a duplicate or new value quicker - sqlite returns the distinct columns in the order they are inserted - so it must be scanning all previous entries in the output to ensure distinct. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
В списке pgsql-advocacy по дате отправления: