Re: analyze.c
From | Tiago Antão |
---|---|
Subject | Re: analyze.c |
Date | |
Msg-id | Pine.LNX.4.21.0008231742420.5111-100000@eros.si.fct.unl.pt |
In response to | Re: analyze.c (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: analyze.c
Re: analyze.c |
List | pgsql-hackers |
On Wed, 23 Aug 2000, Tom Lane wrote:

> > What's the big reason not to do that? I know that
> > there is some code in analyze.c (like comparing) that uses other parts of
> > pg, but that seems to be easily fixed.
>
> Are you proposing not to do any comparisons? It will be interesting to
> see how you can compute a histogram without any idea of equality or
> ordering. But if you want that, then you still need the function-call
> manager as well as the type-specific comparison routines for every
> datatype that you might be asked to operate on (don't forget
> user-defined types here).

I forgot user defined data types :-(, but regarding histograms I think the code can be made external (at least for testing purposes):

1. I was not suggesting not to do any comparisons, but I think the only comparison I need is equality. I don't need order, as I don't need to calculate mins or maxes on the data itself to make a histogram (I just need mins and maxes on frequencies, NOT on the data).
2. The mapping to text guarantees that I have a way of knowing about equality regardless of type (PQgetvalue always returns char*, and pg_statistics keeps a "text" anyway).

But at least anything relating to order has to be in.

> > I'm leaning toward the implementation of end-biased histograms. There is
> > an introductory reference in the IEEE Data Engineering Bulletin, september
> > 1995 (available on microsoft research site).
>
> Sounds interesting. Can you give us an exact URL?

http://www.research.microsoft.com/research/db/debull/default.htm

BTW, you can get access to SIGMOD CDs with lots of goodies for a very low price (at least in 1999 it was a bargain); check out ACM membership for SIGMOD.

I've been reading something about the implementation of histograms, and, AFAIK, in practice "histogram" is just a cool name for no more than:

1. the top ten values, with the frequency of each
2. the same for the ten worst (least frequent) values
3. the average frequency for the rest

I'm writing code to get this info (outside pg for now - for testing purposes).
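A minimal sketch of the three-part summary listed above, assuming the column values have already been fetched as text strings. It is not the code mentioned in this message: the function name `end_biased_summary`, the parameter `k` and the result keys are made up for illustration. It uses only equality on the values (through hashing) and applies ordering to the frequencies alone.

```python
from collections import Counter


def end_biased_summary(values, k=10):
    """Summarize text values as top-k, bottom-k and an average for the rest.

    Hypothetical helper: the values are only ever compared for equality
    (via hashing); sorting is applied to the counts, never to the data.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    counts = Counter(values)
    # Sorted by descending frequency; ties keep first-seen order.
    ranked = counts.most_common()

    top = ranked[:k]
    remaining = ranked[k:]
    # Take the least frequent values from what is left, so that the
    # top and bottom groups never overlap when distinct values are few.
    bottom = remaining[-k:]
    middle = remaining[:len(remaining) - len(bottom)]

    if middle:
        rest_avg = sum(count for _, count in middle) / len(middle)
    else:
        rest_avg = 0.0

    return {
        "top": top,                    # (value, frequency) pairs
        "bottom": bottom,              # (value, frequency) pairs
        "rest_avg": rest_avg,          # mean frequency of the other values
        "rest_distinct": len(middle),  # how many values the mean covers
    }


if __name__ == "__main__":
    sample = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"] * 2 + ["e"]
    print(end_biased_summary(sample, k=1))
```

When there are no more than `k` distinct values everything lands in the top group, and the bottom group and the average are left empty.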
Best Regards,
Tiago

PS - again: I'm just starting, so some of my comments may be completely dumb.