statistics
От | Tiago Antão |
---|---|
Тема | statistics |
Дата | |
Msg-id | Pine.LNX.4.21.0008232034030.6540-100000@eros.si.fct.unl.pt обсуждение исходный текст |
Список | pgsql-hackers |
Hi! I've made a small (2 hours work) program to make "histograms" on data. Giving a table and column he tries to put in bucketsspecial values (special means the most/least used values). It tries to be a little smart, ie, if there are lots of guys with most/least values it will not put them in buckets. Example: bucket size=10 $ ./a.out "dbname=teste" d_pags uid Distinct values: 1028 Number of tuples: 6880 On a uniform distribution: 6.692607 tuples/value # of values with more references 1 - 1 2 - 1 3 - 1 [...] This means that there is only one value as the most referenced (110 times) # of values with less references 1 - 253 2 - 153 3 - 109 [..] This means that there ara 253 values that have the least references (once) Best case buckets 1 - u805156 (110) 2 - u1503927 (103) 3 - u110525 (82) 4 - u91106009 (78) 5 - u1106837 (60) 6 - u1714112 (55) 7 - u1414335 (53) 8 - u1105732 (50) 9 - u302719 (49) Worst case buckets [there are so many guys with only one ref... nothing can be put in buckets] Removed values: 9 Removed tuples: 640 Expected tuples for each of unclassified values: 6.123651 tuples/value If all values have equal prob of being chosen: Error in normal case: 5.464981 Error in hist case: 4.905326 If my calculations are right: For 9 of 1028 values (9.5% of relation) there is a very precise idea, but in general (for a random value being selected) that's not a great advance. At least for this case ... Tiago
В списке pgsql-hackers по дате отправления: