Re: designing time dimension for star schema
| From | Mark Wong |
|---|---|
| Subject | Re: designing time dimension for star schema |
| Date | |
| Msg-id | CAE+TzGrsY7ai7VuGpx6RhSVU5QJX0n27iwz6eUPjnogMyqowhg@mail.gmail.com |
| In response to | designing time dimension for star schema (Mark Wong <markwkm@gmail.com>) |
| Responses | Re: designing time dimension for star schema |
| List | pgsql-general |
On Mon, Feb 10, 2014 at 9:20 AM, CS DBA <cs_dba@consistentstate.com> wrote:
> I've done a lot of DSS architecture. A couple of thoughts:
>
> - In most cases the ETL process figures out the time IDs as part of the
>   preparation and then does bulk loads into the fact tables. I would be
>   very concerned about the performance of a trigger that fired for every
>   row on the fact table.
>
> - You mention you want to do data streaming instead of bulk loads; can
>   you elaborate?

We have processes inserting data from log files as they are written.

> - When querying a star schema, one of the performance features is the
>   fact that all joins to the dimension tables are performed via a numeric
>   key, such as:
>
>       "select * from fact, time_dim, geo_dim
>        where fact.time_id = time_dim.time_id..."
>
>   In the case of this being a timestamp, I suspect the performance would
>   take a hit. Depending on the size of your fact table and the
>   scope/volume of your DSS queries, this could easily be a show stopper,
>   based on the assumption that the database can do a numeric binary
>   search much faster than a timestamp search.

I guess I was hoping that the extra 4 bytes of a timestamp (or bigint) key,
compared to a 4-byte integer, wouldn't be too significant even with 1
billion rows, but maybe that's wishful thinking. I hadn't considered that
Postgres might do a binary search faster on an integer type than on a
timestamp. Maybe a regular integer at 4 bytes would be good enough. I
would estimate a query would touch on the order of 1 million rows at a
time.

Regards,
Mark
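For concreteness, here is a minimal sketch of the design being discussed: a time dimension keyed by a plain 4-byte integer, with the key resolved by the loading statement itself (the "ETL figures out the time ID" approach) rather than by a per-row trigger on the fact table. All names (time_dim, fact, staging) and the minutes-since-2000 key scheme are hypothetical illustrations, not taken from the thread.

```sql
-- Hypothetical time dimension at one-minute grain, keyed by a plain
-- 4-byte integer (minutes since 2000-01-01), so the key can be derived
-- from a timestamp with arithmetic alone: no lookup, no trigger.
CREATE TABLE time_dim (
    time_id integer   PRIMARY KEY,   -- minutes since 2000-01-01 00:00
    ts      timestamp NOT NULL UNIQUE,
    year    smallint  NOT NULL,
    month   smallint  NOT NULL,
    day     smallint  NOT NULL,
    hour    smallint  NOT NULL,
    minute  smallint  NOT NULL
);

-- Pre-populate the dimension for the range the log data can cover
-- (one year here, roughly 526k rows).
INSERT INTO time_dim (time_id, ts, year, month, day, hour, minute)
SELECT ((extract(epoch FROM g)
         - extract(epoch FROM timestamp '2000-01-01')) / 60)::integer,
       g,
       extract(year FROM g), extract(month FROM g), extract(day FROM g),
       extract(hour FROM g), extract(minute FROM g)
FROM generate_series(timestamp '2014-01-01',
                     timestamp '2014-12-31 23:59',
                     interval '1 minute') AS g;

-- Toy fact table fed from a staging table of parsed log lines.
CREATE TABLE fact (
    time_id integer NOT NULL REFERENCES time_dim (time_id),
    bytes   bigint  NOT NULL
);

CREATE TABLE staging (
    logged_at timestamp NOT NULL,
    bytes     bigint    NOT NULL
);

-- Resolve the surrogate key in the loading statement itself instead of
-- firing a trigger for every fact row.
INSERT INTO fact (time_id, bytes)
SELECT ((extract(epoch FROM date_trunc('minute', s.logged_at))
         - extract(epoch FROM timestamp '2000-01-01')) / 60)::integer,
       s.bytes
FROM   staging s;
```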
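And a sketch of the two query shapes being weighed, continuing the hypothetical tables above: a star join through the 4-byte integer key versus a fact table that stores the 8-byte timestamp directly and filters on it. Which variant performs better at a billion rows is exactly the open question in the thread; this only shows what each would look like.

```sql
-- Variant 1: star join through the integer surrogate key.
SELECT d.year, d.month, d.day, sum(f.bytes) AS total_bytes
FROM   fact f
JOIN   time_dim d ON d.time_id = f.time_id
WHERE  d.ts >= timestamp '2014-02-01'
AND    d.ts <  timestamp '2014-03-01'
GROUP  BY d.year, d.month, d.day;

-- Variant 2: the fact table carries the timestamp itself
-- (8 bytes per row instead of 4, but no join needed to filter on time).
CREATE TABLE fact_ts (
    logged_at timestamp NOT NULL,
    bytes     bigint    NOT NULL
);

SELECT date_trunc('day', f.logged_at) AS log_day, sum(f.bytes) AS total_bytes
FROM   fact_ts f
WHERE  f.logged_at >= timestamp '2014-02-01'
AND    f.logged_at <  timestamp '2014-03-01'
GROUP  BY 1;
```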