Re: WIP: cross column correlation ...
From | Greg Stark |
---|---|
Subject | Re: WIP: cross column correlation ... |
Date | |
Msg-id | AANLkTinD_vPt_d5tzGKXfJURYzq5=5mCa8K8_G1Sr8+O@mail.gmail.com |
In response to | Re: WIP: cross column correlation ... (PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>) |
Responses | Re: WIP: cross column correlation ... |
List | pgsql-hackers |
2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:

> what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on.

The problem is that we haven't figured out how to usefully store a histogram for <a,b>. Consider the oft-quoted example of a <city,postal-code> -- or <city,zip code> for Americans. A histogram of the tuple is just the same as a histogram on the city. It doesn't tell you how much extra selectivity the postal code or zip code gives you. And if you happen to store a histogram of <postal code, city> by mistake then it doesn't tell you anything at all.

We need a data structure that lets us answer the Bayesian question "given a city of New York how selective is zip-code = 02139". I don't know what that data structure would be.

Heikki and I had a wacky hand-crafted 2D histogram data structure that I suspect doesn't actually work. And someone else did some research on the list and came up with a fancy-sounding name of a statistics concept that might be what we want.

-- 
greg
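To make the "Bayesian question" above concrete, here is a minimal sketch in Python of what a planner would like to know: the selectivity of the zip-code predicate given the city predicate, compared with the estimate obtained by treating the two columns as independent. The sample rows and the function names (`conditional_selectivity`, `independent_estimate`) are invented for illustration and are not PostgreSQL code. The sketch counts exact pairs over toy data, so it only shows the quantity being asked for; it does not propose the compact data structure the message says is still missing.

```python
# Hypothetical toy table of (city, zip_code) rows; the data is made up.
ROWS = [
    ("New York", "10001"), ("New York", "10001"), ("New York", "10002"),
    ("Cambridge", "02139"), ("Cambridge", "02139"), ("Cambridge", "02138"),
    ("Boston", "02108"), ("Boston", "02109"),
]


def conditional_selectivity(rows, city, zip_code):
    """Fraction of rows with the given city that also have the given zip.

    This is P(zip = zip_code | city = city), computed by exact counting.
    """
    city_count = sum(1 for c, _ in rows if c == city)
    if city_count == 0:
        return 0.0
    pair_count = sum(1 for c, z in rows if c == city and z == zip_code)
    return pair_count / city_count


def independent_estimate(rows, city, zip_code):
    """Joint selectivity of both predicates if the columns were independent.

    This multiplies the two per-column selectivities, which is all that
    single-column statistics can offer.
    """
    total = len(rows)
    p_city = sum(1 for c, _ in rows if c == city) / total
    p_zip = sum(1 for _, z in rows if z == zip_code) / total
    return p_city * p_zip


if __name__ == "__main__":
    # No New York row has zip 02139, so the conditional selectivity is 0,
    # while the independence assumption still predicts a nonzero fraction.
    print(conditional_selectivity(ROWS, "New York", "02139"))
    print(independent_estimate(ROWS, "New York", "02139"))
```

In this toy data the independence assumption predicts matching rows for city = New York and zip = 02139, while the conditional count shows there are none. That gap is the information a histogram on <city, zip code> alone does not expose.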