Re: Performant queries on table with many boolean columns
From: Jeff Janes
Subject: Re: Performant queries on table with many boolean columns
Date:
Msg-id: CAMkU=1yiaPZAuePwbD_=GS-JZ7r5hzpc9FhZbMX2DEBqTU=BRw@mail.gmail.com
In reply to: Performant queries on table with many boolean columns (Rob Imig <rimig88@gmail.com>)
List: pgsql-performance
On Wed, Apr 20, 2016 at 11:41 AM, Rob Imig <rimig88@gmail.com> wrote:

> Hey all,
>
> New to the lists, so please let me know if this isn't the right place for
> this question.
>
> I am trying to understand how to structure a table to allow for optimal
> performance on retrieval. The data will not change frequently, so you can
> basically think of it as static, and I am only concerned about optimizing
> reads from basic SELECT...WHERE queries.
>
> The data:
>
> ~20 million records
> Each record has 1 id and ~100 boolean properties
> Each boolean property is true for ~85% of the records
>
> The retrieval will always be something like "SELECT id FROM <table> WHERE
> <conditions>".
>
> <conditions> will be some arbitrary set of the ~100 boolean columns, and
> you want the ids that match all of the conditions (true for each boolean
> column). Example:
> WHERE prop1 AND prop18 AND prop24

Is 3 a typical number of conditions to have? 85%^3 is 61.4%, so you are fetching most of the table. At that point, I think I would give up on indexes and just expect to do a full table scan each time.

That means a single-column bit-string data type might be the way to go, although the construction of the queries would then be more cumbersome, especially if you build them by hand.

I think the only way to know for sure is to write a few scripts to benchmark it.

Cheers,

Jeff
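For anyone following along, here is a minimal sketch of the two points above, modeled in Python rather than SQL: the selectivity arithmetic (three independent 85%-true conditions leave most rows matching) and the single-column bit-string idea, using plain integers as 100-bit masks. All names here are illustrative, not from the thread; in PostgreSQL itself this would map to a bit(100) column and the bitwise AND operator.

```python
# Model the ~100 boolean properties as bits of one integer.
# Bit i-1 represents property "propi".
NUM_PROPS = 100

def make_mask(*props):
    """Build a bitmask with the given 1-based property numbers set."""
    mask = 0
    for p in props:
        mask |= 1 << (p - 1)
    return mask

def matches(row_bits, condition_mask):
    """True when every bit in condition_mask is set in row_bits --
    the equivalent of 'WHERE prop1 AND prop18 AND prop24'."""
    return row_bits & condition_mask == condition_mask

# A row where prop1, prop5, prop18, prop24 and prop99 are true:
row = make_mask(1, 5, 18, 24, 99)
cond = make_mask(1, 18, 24)
print(matches(row, cond))           # True: all three required bits set
print(matches(row, make_mask(2)))   # False: prop2 is not set

# Jeff's selectivity estimate: 0.85^3 of the rows match,
# so roughly 61.4% of the table is fetched.
print(round(0.85 ** 3, 3))          # 0.614
```

A full scan of 20 million rows then becomes 20 million cheap integer AND-and-compare operations per query, which is the trade-off being suggested: no index maintenance or planning surprises, at the cost of building the mask for each query yourself.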