Обсуждение: Record last SELECT on a row?

Поиск
Список
Период
Сортировка

Record last SELECT on a row?

От
Matthias Leisi
Дата:
An application (which we can’t change) is accessing some Postgres table, and we would like to record when the rows in
thattable were last read (meaning: appeared in a SELECT result). The ultimate goal would be that we can „age out“ rows
whichhave not been accessed in a certain period of time. 

The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers.
Almostall accesses will result in zero, one or very few records returned. Given the modest size of the table,
performanceconsiderations are not top priority. 

If we had full control over the application, we could eg use a function to select the records and then update some
„lastread“ column. But since we don’t control the application, that’s not an option. On the other hand, we have full
controlover the database, so we could put some other „object“ in lieu of the direct table. 

Any other ways this could be achieved?

Thanks,
Matthias




Re: Record last SELECT on a row?

От
Laurenz Albe
Дата:
On Wed, 2025-12-17 at 08:40 +0100, Matthias Leisi wrote:
> An application (which we can’t change) is accessing some Postgres table, and we would
> like to record when the rows in that table were last read (meaning: appeared in a
> SELECT result). The ultimate goal would be that we can „age out“ rows which have not
> been accessed in a certain period of time.
>
> If we had full control over the application, we could eg use a function to select the
> records and then update some „last read“ column. But since we don’t control the
> application, that’s not an option. On the other hand, we have full control over the
> database, so we could put some other „object“ in lieu of the direct table.
>
> Any other ways this could be achieved?

I don't think that is possible.  You could log all statements, but that won't show
which rows are accessed.

Yours,
Laurenz Albe



Re: Record last SELECT on a row?

От
Greg Sabino Mullane
Дата:
On Wed, Dec 17, 2025 at 2:41 AM Matthias Leisi <matthias@leisi.net> wrote:
The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers.
 ...
Any other ways this could be achieved?

It depends a lot on how the table is accessed, but you could use a function (or group of functions) that returns the information from the table, and along the way, updates the requested metadata.

 
Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: Record last SELECT on a row?

От
"David G. Johnston"
Дата:
On Wednesday, December 17, 2025, Matthias Leisi <matthias@leisi.net> wrote:
Any other ways this could be achieved?

Without the ability to modify the application which uses a direct database connection, you’d need to modify the server code or run some kind of proxy/man-in-the-middle server you can program.  I would avoid updating the table on write though - just keep a cache of sorts somewhere and periodically compare the cache to the live table.

David J.

Re: Record last SELECT on a row?

От
Ron Johnson
Дата:
On Wed, Dec 17, 2025 at 2:41 AM Matthias Leisi <matthias@leisi.net> wrote:
An application (which we can’t change) is accessing some Postgres table, and we would like to record when the rows in that table were last read (meaning: appeared in a SELECT result). The ultimate goal would be that we can „age out“ rows which have not been accessed in a certain period of time.

The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers. Almost all accesses will result in zero, one or very few records returned. Given the modest size of the table, performance considerations are not top priority.

If we had full control over the application, we could eg use a function to select the records and then update some „last read“ column. But since we don’t control the application, that’s not an option. On the other hand, we have full control over the database, so we could put some other „object“ in lieu of the direct table.

Any other ways this could be achieved?

pgaudit might satisfy your needs, since it would only log SELECT statements on that one table.  You'd still have to grep the log file, so the information wouldn't be real-time, but that's probably not important.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Record last SELECT on a row?

От
Matthias Leisi
Дата:

pgaudit might satisfy your needs, since it would only log SELECT statements on that one table.  You'd still have to grep the log file, so the information wouldn't be real-time, but that's probably not important.

That’s a viable suggestion, thanks a lot. Real-time is indeed not necessary, a daily (or even a weekly) cleaning of unused data is sufficient. pgaudit was anyway on the table for some other use cases, so that would fit in nicely.

Thanks,
— Matthias

Re: Record last SELECT on a row?

От
Adrian Klaver
Дата:
On 12/16/25 23:40, Matthias Leisi wrote:
> An application (which we can’t change) is accessing some Postgres table, and we would like to record when the rows in
thattable were last read (meaning: appeared in a SELECT result). The ultimate goal would be that we can „age out“ rows
whichhave not been accessed in a certain period of time.
 

Why?

Given the small size of the table, what is the gain expected?

Also is it assured that the reading of a row equals importance of a row?

I would expect any solution would impose more overhead then simply 
leaving the rows alone.

> 
> The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers.
Almostall accesses will result in zero, one or very few records returned. Given the modest size of the table,
performanceconsiderations are not top priority.
 
> 
> If we had full control over the application, we could eg use a function to select the records and then update some
„lastread“ column. But since we don’t control the application, that’s not an option. On the other hand, we have full
controlover the database, so we could put some other „object“ in lieu of the direct table.
 
> 
> Any other ways this could be achieved?
> 
> Thanks,
> Matthias
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Record last SELECT on a row?

От
Joe Conway
Дата:
On 12/17/25 11:25, Matthias Leisi wrote:
> 
>> pgaudit might satisfy your needs, since it would only log SELECT 
>> statements on that one table.  You'd still have to grep the log file, 
>> so the information wouldn't be real-time, but that's /probably/ not 
>> important.
> 
> That’s a viable suggestion, thanks a lot. Real-time is indeed not 
> necessary, a daily (or even a weekly) cleaning of unused data is 
> sufficient. pgaudit was anyway on the table for some other use cases, so 
> that would fit in nicely.

Possibly try using/abusing RLS?

8<-----------------
psql test
psql (19devel)
Type "help" for help.

create table t1(c1 int, c2 text);
insert into t1 values(1,'a'),(2,'b'),(3,'c'),(42,'zp');
grant select on table t1 to public;

create table a1(c1 int, t1 timestamptz);
create or replace function audit(int)
returns bool as
$$
   insert into a1 values($1, now()) returning true
$$ security definer language sql;
create policy audit_t1 ON t1 for select using (audit(c1));
alter table t1 enable row level security;

create user joe;
set session authorization joe;
select * from t1 where c1=42;

  c1 | c2
----+----
  42 | zp
(1 row)

reset session authorization;
select * from a1;

  c1 |              t1
----+-------------------------------
  42 | 2025-12-17 11:42:51.871843-05
(1 row)
8<-----------------

HTH,

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com



Re: Record last SELECT on a row?

От
Greg Sabino Mullane
Дата:
On Wed, Dec 17, 2025 at 10:24 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
pgaudit might satisfy your needs, since it would only log SELECT statements on that one table.  You'd still have to grep the log file, so the information wouldn't be real-time, but that's probably not important.

That would only give you table-level information, unless it was a dirt-simple query with no joins and an easy to parse where clause.
 
Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Record last SELECT on a row?

От
Thiemo Kellner
Дата:
Hi

I wonder whether a view calling a set returning function would be a viable option. Within the function you execute the actual query twice. Once to merge selected pk into a protocol table and once for the return of the data set.

Cheers

Thiemo

Re: Record last SELECT on a row?

От
Greg Sabino Mullane
Дата:
On Wed, Dec 17, 2025 at 12:14 PM Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:
I wonder whether a view calling a set returning function would be a viable option. Within the function you execute the actual query twice. Once to merge selected pk into a protocol table and once for the return of the data set.

The view would be sucking in the entire table every time . Maybe not a big deal as this is a very tiny table.

I think Joe wins this thread with the RLS idea however.


Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: Record last SELECT on a row?

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> Possibly try using/abusing RLS?

Cute idea, but I think it doesn't reliably address the problem of
wanting to identify the specific rows that were read.  In your toy
example it'd work, because the generated plan is

regression=> explain verbose select * from t1 where c1=42;
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on public.t1  (cost=0.00..343.38 rows=2 width=36)
   Output: c1, c2
   Filter: ((t1.c1 = 42) AND audit(t1.c1))
(3 rows)

so the WHERE clause is applied before the RLS filter.  But in any
slightly-more-complicated situation, like a non-leakproof WHERE
condition, the order would be reversed so the log would overstate
which rows were read.

If the application's behavior is simple and well-defined, this might
be good enough, of course.

I thought of a way that could possibly do this reliably, but it's
vastly more work than the use-case seems worth:

1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
without changing the application, by interposing a view).  SELECT
FOR SHARE might be good enough, not sure.

2. Write a logical replication output plugin that parses the WAL log
well enough to identify the tuple locks taken by FOR UPDATE.

This should work to log only the rows actually read, because FOR
UPDATE is postponed to the top of the query plan, unlike RLS.

            regards, tom lane



Re: Record last SELECT on a row?

От
Joe Conway
Дата:
On 12/17/25 13:37, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> Possibly try using/abusing RLS?
> 
> Cute idea, but I think it doesn't reliably address the problem of
> wanting to identify the specific rows that were read.  In your toy
> example it'd work, because the generated plan is
> 
> regression=> explain verbose select * from t1 where c1=42;
>                           QUERY PLAN
> ------------------------------------------------------------
>   Seq Scan on public.t1  (cost=0.00..343.38 rows=2 width=36)
>     Output: c1, c2
>     Filter: ((t1.c1 = 42) AND audit(t1.c1))
> (3 rows)
> 
> so the WHERE clause is applied before the RLS filter.  But in any
> slightly-more-complicated situation, like a non-leakproof WHERE
> condition, the order would be reversed so the log would overstate
> which rows were read.


Sure, of course we have had requests for a leakproofness check bypass[1] 
for some use cases, and this could be one more such case.


> If the application's behavior is simple and well-defined, this might
> be good enough, of course.


FWIW when I read the original email in the thread I got the impression 
that the application behavior was pretty simple WRT this table. But of 
course I could easily be wrong...

> I thought of a way that could possibly do this reliably, but it's
> vastly more work than the use-case seems worth:
> 
> 1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
> without changing the application, by interposing a view).  SELECT
> FOR SHARE might be good enough, not sure.
> 
> 2. Write a logical replication output plugin that parses the WAL log
> well enough to identify the tuple locks taken by FOR UPDATE.


Yeah this seems like a pretty heavy lift.


> This should work to log only the rows actually read, because FOR
> UPDATE is postponed to the top of the query plan, unlike RLS.



[1] 

https://www.postgresql.org/message-id/flat/CAMxA3rsGQh9waorObOZyqrFqZ5uQ0b5D7SL6X6nh2kLhX%3D90vg%40mail.gmail.com#4a03eafc8c9660177874e11811c8f410

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com



Re: Record last SELECT on a row?

От
Matthias Leisi
Дата:

If the application's behavior is simple and well-defined, this might
be good enough, of course.


FWIW when I read the original email in the thread I got the impression that the application behavior was pretty simple WRT this table. But of course I could easily be wrong...

You are not wrong. The use case is in fact the `userpref` table used by Spamassassin. Left unmaintained, and given a large-enough user base, this has the tendency to grow considerably over time, so we want to gradually remove entries not actually used any more. (And we don’t want to patch Spamassassin core code to do this by itself…)

I’ll give it a try with ab/using RLS, which we use for some other use cases anyway already.

Thanks a lot for the suggestions in the thread, much appreciated!

— Matthias


Re: Record last SELECT on a row?

От
Kris Deugau
Дата:
Matthias Leisi wrote:
> 
>>> If the application's behavior is simple and well-defined, this might
>>> be good enough, of course.
>>
>>
>> FWIW when I read the original email in the thread I got the impression 
>> that the application behavior was pretty simple WRT this table. But of 
>> course I could easily be wrong...
> 
> You are not wrong. The use case is in fact the `userpref` table used by 
> Spamassassin. Left unmaintained, and given a large-enough user base, 
> this has the tendency to grow considerably over time, so we want to 
> gradually remove entries not actually used any more. (And we don’t want 
> to patch Spamassassin core code to do this by itself…)

For this particular use case, would it be easier to periodically compare 
the list of usernames with userpref data against active accounts on the 
mail system?

-kgd