Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
From | Joshua Tolley |
---|---|
Subject | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
Date | |
Msg-id | e7e0a2570811021641s560a7c27r6816946e766102f3@mail.gmail.com |
In response to | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets ("Lawrence, Ramon" <ramon.lawrence@ubc.ca>) |
List | pgsql-hackers |
On Sun, Nov 2, 2008 at 4:48 PM, Lawrence, Ramon <ramon.lawrence@ubc.ca> wrote:
> Joshua,
>
> Thank you for offering to review the patch.
>
> The easiest way to test would be to generate your own TPC-H data and
> load it into a database for testing. I have posted the TPC-H generator
> at:
>
> http://people.ok.ubc.ca/rlawrenc/TPCHSkew.zip
>
> The generator can produce skewed data sets. It was produced by
> Microsoft Research.
>
> After unzipping, on a Windows machine, you can just run the command:
>
> dbgen -s 1 -z 1
>
> This will produce a TPC-H database of scale 1 GB with a Zipfian skew of
> z=1. More information on the generator is in the document README-S.DOC.
> Source is provided for the generator, so you should be able to run it on
> other operating systems as well.
>
> The schema DDL is at:
>
> http://people.ok.ubc.ca/rlawrenc/tpch_pg_ddl.txt
>
> Note that the load time for 1G data is 1-2 hours and for 10G data is
> about 24 hours. I recommend you do not add the foreign keys until after
> the data is loaded.
>
> The other alternative is to do a pgdump on our data sets. However, the
> download size would be quite large, and it will take a couple of days
> for us to get you the data in that form.
>
> --
> Dr. Ramon Lawrence
> Assistant Professor, Department of Computer Science, University of
> British Columbia Okanagan
> E-mail: ramon.lawrence@ubc.ca

I'll try out the TPC-H generator first :) Thanks.

- Josh