Re: comment regarding double timestamps; and, infinite timestamps and NaN
From: Tom Lane
Subject: Re: comment regarding double timestamps; and, infinite timestamps and NaN
Date:
Msg-id: 7411.1577733497@sss.pgh.pa.us
In reply to: Re: comment regarding double timestamps; and, infinite timestamps and NaN (Justin Pryzby <pryzby@telsasoft.com>)
Responses: infinite histogram bounds and nan (Re: comment regarding double timestamps; and, infinite timestamps and NaN)
List: pgsql-hackers
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
>> Uh, what?  This seems completely wrong to me.  We could possibly
>> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
>> I don't really see the point.  They'll compare to other timestamp
>> values correctly without that, cf timestamp_cmp_internal().
>> The example you give seems to me to be working sanely, or at least
>> as sanely as it can given the number of histogram points available,
>> with the existing code.  In any case, shoving NaNs into the
>> computation is not going to make anything better.

> As I see it, the problem is that the existing code tests for isnan(), but
> infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there's
> absurdly large values being used as if they were isnormal().

I still say that (1) you're confusing NaN with Infinity, and (2) you
haven't actually shown that there's a problem to fix.  These endpoint
values are *not* NaNs.

> On v12, my test gives:
> |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> |INSERT INTO t VALUES('-infinity');
> |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> | Seq Scan on t  (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1)

This is what it should do.  There's only one histogram bucket, and it
extends down to -infinity, so the conclusion is going to be that the
WHERE clause excludes all but a small part of the bucket.  This is the
correct answer based on the available stats; the problem is not with
the calculation, but with the miserable granularity of the available
stats.

> vs patched master:
> |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> |INSERT INTO t VALUES('-infinity');
> |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> | Seq Scan on t  (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1)

This answer is simply broken.  You've caused it to estimate half of the
bucket, which is an insane estimate for the given bucket boundaries and
WHERE constraint.

> IMO 146 rows is a reasonable estimate given a single histogram bucket of
> infinite width,

No, it isn't.

			regards, tom lane
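
[The single bucket described above can be inspected directly; this is an
illustrative sketch against the test table t from the example, not part of
the original exchange:]

    -- With SET STATISTICS 1, ANALYZE stores at most two histogram bounds,
    -- i.e. one bucket.  Its lower bound is -infinity here, which is why a
    -- predicate like t >= '2010-12-29' is estimated to select only a small
    -- fraction of the bucket.
    SELECT histogram_bounds
      FROM pg_stats
     WHERE tablename = 't' AND attname = 't';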