Обсуждение: Re: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables
On Tue, 2022-07-12 at 14:25 -0400, MichaelDBA Vitale wrote: > On 07/12/2022 2:13 PM Pierson Patricia L (Contractor) <patricia.l.pierson@irs.gov> wrote: > > Do a count on the primary key. Will force index access and you don’t access the entire row which may be very long. > > LIKE : select count(ID) from my_table; > > That is not true: doing the select on the primary key will still result in a table scan, > not an index scan. The heap always gets accessed for select counts. I'd say that both statements are wrong: - count(id) is *slower* than count(*), because it has to check each "id" if it is NULL or not (NULL values are not counted). count(*) is just the SQL standard's weird way of writing a parameterless aggregate; it has nothing to do with the * in "SELECT * FROM ". - Both "SELECT count(id) FROM tab" and "SELECT count(*) FROM tab" can result in an index-only scan. You just need the table to be recently VACUUMed, you need a table that is wide enough that a sequential scan is actually slower than an index-only scan, and perhaps you need "random_page_cost" to be low enough. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Laurenz,
Oh really? Even if it did an INDEX ONLY SCAN, it would still be slower because it still has to access the heap in a random I/O kinda way (like I said-->"The heap always gets accessed for select counts") because visibility info is only found in the HEAP not the index.
Laurenz Albe wrote on 7/12/2022 3:13 PM:
Oh really? Even if it did an INDEX ONLY SCAN, it would still be slower because it still has to access the heap in a random I/O kinda way (like I said-->"The heap always gets accessed for select counts") because visibility info is only found in the HEAP not the index.
Laurenz Albe wrote on 7/12/2022 3:13 PM:
I'd say that both statements are wrong:
Regards,
Michael Vitale
703-600-9343
Вложения
> On Jul 12, 2022, at 8:12 PM, MichaelDBA <MichaelDBA@sqlexec.com> wrote: > > Oh really? Even if it did an INDEX ONLY SCAN, it would still be slower because it still has to access the heap in a randomI/O kinda way (like I said-->"The heap always gets accessed for select counts") because visibility info is only foundin the HEAP not the index. This changed in 9.something. There is now a visibility map, which can, for data that hasn't changed recently, greatly reducethe amount of access required to the heap to determine visibility.
On 7/12/22 21:28, Scott Ribe wrote: >> On Jul 12, 2022, at 8:12 PM, MichaelDBA <MichaelDBA@sqlexec.com> wrote: >> >> Oh really? Even if it did an INDEX ONLY SCAN, it would still be slower because it still has to access the heap in a randomI/O kinda way (like I said-->"The heap always gets accessed for select counts") because visibility info is only foundin the HEAP not the index. > This changed in 9.something. There is now a visibility map, which can, for data that hasn't changed recently, How recently is "recently"? And does VACUUM clean it up? -- Angular momentum makes the world go 'round.