statistics

Поиск

Список

Период

Сортировка

От	Tiago Antão
Тема	statistics
Дата	23 августа 2000 г. 16:13:21
Msg-id	Pine.LNX.4.21.0008232034030.6540-100000@eros.si.fct.unl.pt обсуждение исходный текст
Список	pgsql-hackers

Дерево обсуждения

Hi!
 I've made a small (2 hours work) program to make "histograms" on data. Giving a table and column he tries to put in
bucketsspecial values
 
(special means the most/least used values). It tries to be a little smart,
ie, if there are lots of guys with most/least values it will not put them
in buckets.
 Example: bucket size=10
$ ./a.out "dbname=teste"  d_pags uid
Distinct values: 1028
Number of tuples: 6880
On a uniform distribution: 6.692607 tuples/value

# of values with more references
1 - 1
2 - 1
3 - 1
[...]
This means that there is only one value as the most referenced (110 times)
# of values with less references
1 - 253
2 - 153
3 - 109
[..]
This means that there ara 253 values that have the least references (once)
Best case buckets
1 - u805156 (110) 
2 - u1503927 (103) 
3 - u110525 (82) 
4 - u91106009 (78) 
5 - u1106837 (60) 
6 - u1714112 (55) 
7 - u1414335 (53) 
8 - u1105732 (50) 
9 - u302719 (49) 
Worst case buckets
[there are so many guys with only one ref... nothing can be put in
buckets]
Removed values: 9
Removed tuples: 640
Expected tuples for each of unclassified values: 6.123651 tuples/value

If all values have equal prob of being chosen:
Error in normal case: 5.464981
Error in hist case: 4.905326

If my calculations are right:
For 9 of 1028 values (9.5% of relation) there is a very precise idea,
but in general (for a random value being selected) that's not a great
advance. At least for this case ...


Tiago

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

statistics