Re: Test database for new installs?
От | Greg Stark |
---|---|
Тема | Re: Test database for new installs? |
Дата | |
Msg-id | 87ekilzljr.fsf@stark.xeocode.com обсуждение исходный текст |
Ответ на | Re: Test database for new installs? (Josh Berkus <josh@agliodbs.com>) |
Список | pgsql-hackers |
Josh Berkus <josh@agliodbs.com> writes: > Also, for tables where the text key is required in the table, *adding* an > additional INT column as a key is no improvement in performance. Not true. Every table which references the varchar-keyed table needs to have a complete copy of the varchar key. Plus the indexes on the table (and often those referencing tables) are all bigger too. > > Additionally, I regretted *every* case where I decided to use some text > > data as key, sooner or later. > > Well, obviously you and I have had different workloads then. My experience agrees with his. A good example was using user provided text usernames as a primary key. The application guaranteed they would be unique, and they couldn't be changed. Had those requirements changed things would have gotten very nasty. No, ON UPDATE CASCADE doesn't solve things when you have a few hundred million records referencing the table on a 24x7 application. Especially when you have a few hundred million more archived records on tape and in your data warehouse for doing statistical analyses. In fact those requirements never changed. And yet we still ended up regretting that decision for multiple reasons: . The varchar field spread throughout the database like a slow rot to tables that referenced users. Some of the largest tablesin the database ended up 10-30% inflated in size due to that field alone. Their indexes were even more inflated. . Later we had to export data to a third party and receive data back from them. Their mainframe uppercased all the text weprovided for them in the key. It was also fixed position so any trailing spaces were effectively lost. Because of the latter problem we added an integer field. Afterwards we started using that to reference the users table on any new table. The speed difference on index scans was noticable. Actually this was Oracle. I don't have empirical tests for postgres to know if it would be the same. Actually I would expect the difference for Postgres would be even greater. Postgres stores integers directly in Datums but varchars require a palloc, and any comparison involves strcoll calls which can be quite slow compared to an integer == call. > > We will probably find that *everthing* each of us does will offend somebody > > else. I'm not too keen to get into "best practice" wars. That's what IRC is > > for ;-) Now can we discuss naming conventions for primary keys? :) -- greg
В списке pgsql-hackers по дате отправления: