Discussion: Record last SELECT on a row?
An application (which we can’t change) is accessing some Postgres table, and we would like to record when the rows in that table were last read (meaning: appeared in a SELECT result). The ultimate goal would be that we can „age out“ rows which have not been accessed in a certain period of time.

The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers. Almost all accesses will result in zero, one or very few records returned. Given the modest size of the table, performance considerations are not top priority.

If we had full control over the application, we could eg use a function to select the records and then update some „last read“ column. But since we don’t control the application, that’s not an option. On the other hand, we have full control over the database, so we could put some other „object“ in lieu of the direct table.

Any other ways this could be achieved?

Thanks,
Matthias
On Wed, 2025-12-17 at 08:40 +0100, Matthias Leisi wrote:
> An application (which we can’t change) is accessing some Postgres table, and we would
> like to record when the rows in that table were last read (meaning: appeared in a
> SELECT result). The ultimate goal would be that we can „age out“ rows which have not
> been accessed in a certain period of time.
>
> If we had full control over the application, we could eg use a function to select the
> records and then update some „last read“ column. But since we don’t control the
> application, that’s not an option. On the other hand, we have full control over the
> database, so we could put some other „object“ in lieu of the direct table.
>
> Any other ways this could be achieved?

I don't think that is possible.

You could log all statements, but that won't show which rows are accessed.

Yours,
Laurenz Albe
On Wed, Dec 17, 2025 at 2:41 AM Matthias Leisi <matthias@leisi.net> wrote:
The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers.
...
Any other ways this could be achieved?
It depends a lot on how the table is accessed, but you could use a function (or group of functions) that returns the information from the table and, along the way, updates the requested metadata.
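A minimal sketch of that shape, with purely illustrative names (a `prefs` table with integer key `id` and a `last_read` column, plus a wrapper function the reads would have to go through):

8<-----------------
-- Illustrative schema; the real table and names will differ.
create table prefs (id int primary key, val text,
                    last_read timestamptz);

-- Wrapper that returns the requested row(s) and stamps last_read
-- as a side effect.  In a SQL-language function the result comes
-- from the final statement; earlier statements run first and their
-- effects are visible to later ones.
create or replace function get_prefs(p_id int)
returns setof prefs
language sql
as $$
  update prefs set last_read = now() where id = p_id;
  select * from prefs where id = p_id;
$$;

-- Usage:  select * from get_prefs(42);
8<-----------------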
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On Wednesday, December 17, 2025, Matthias Leisi <matthias@leisi.net> wrote:
Any other ways this could be achieved?
Without the ability to modify the application which uses a direct database connection, you’d need to modify the server code or run some kind of proxy/man-in-the-middle server you can program. I would avoid updating the table for each observed read, though; just keep a cache of sorts somewhere and periodically compare the cache to the live table.
David J.
On Wed, Dec 17, 2025 at 2:41 AM Matthias Leisi <matthias@leisi.net> wrote:
An application (which we can’t change) is accessing some Postgres table, and we would like to record when the rows in that table were last read (meaning: appeared in a SELECT result). The ultimate goal would be that we can „age out“ rows which have not been accessed in a certain period of time.
The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers. Almost all accesses will result in zero, one or very few records returned. Given the modest size of the table, performance considerations are not top priority.
If we had full control over the application, we could eg use a function to select the records and then update some „last read“ column. But since we don’t control the application, that’s not an option. On the other hand, we have full control over the database, so we could put some other „object“ in lieu of the direct table.
Any other ways this could be achieved?
pgaudit might satisfy your needs, since it would only log SELECT statements on that one table. You'd still have to grep the log file, so the information wouldn't be real-time, but that's probably not important.
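A sketch of how pgaudit's object-level auditing could be wired up for a single table (assumes the extension is installed and preloaded; the role `auditor` and table `prefs` are placeholder names):

8<-----------------
-- postgresql.conf (shared_preload_libraries needs a restart):
--   shared_preload_libraries = 'pgaudit'
--   pgaudit.role = 'auditor'

create extension pgaudit;
create role auditor nologin;

-- Object auditing: any SELECT privilege the audit role holds on a
-- relation causes matching statements to be written to the server log.
grant select on prefs to auditor;

-- A later "select * from prefs where id = 42" then produces a log
-- line of the form:
--   AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.prefs,...
8<-----------------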
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
> pgaudit might satisfy your needs, since it would only log SELECT
> statements on that one table. You'd still have to grep the log file,
> so the information wouldn't be real-time, but that's probably not
> important.

That’s a viable suggestion, thanks a lot. Real-time is indeed not necessary, a daily (or even a weekly) cleaning of unused data is sufficient. pgaudit was anyway on the table for some other use cases, so that would fit in nicely.

Thanks,
— Matthias
On 12/16/25 23:40, Matthias Leisi wrote:
> An application (which we can’t change) is accessing some Postgres table, and we would
> like to record when the rows in that table were last read (meaning: appeared in a
> SELECT result). The ultimate goal would be that we can „age out“ rows which have not
> been accessed in a certain period of time.

Why? Given the small size of the table, what is the gain expected? Also, is it assured that the reading of a row equals the importance of a row?

I would expect any solution would impose more overhead than simply leaving the rows alone.

> The table contains some ten thousand rows, five columns, and we already record
> created / last updated using triggers. Almost all accesses will result in zero, one
> or very few records returned. Given the modest size of the table, performance
> considerations are not top priority.
>
> If we had full control over the application, we could eg use a function to select the
> records and then update some „last read“ column. But since we don’t control the
> application, that’s not an option. On the other hand, we have full control over the
> database, so we could put some other „object“ in lieu of the direct table.
>
> Any other ways this could be achieved?
>
> Thanks,
> Matthias

--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/17/25 11:25, Matthias Leisi wrote:
>> pgaudit might satisfy your needs, since it would only log SELECT
>> statements on that one table. You'd still have to grep the log file,
>> so the information wouldn't be real-time, but that's /probably/ not
>> important.
>
> That’s a viable suggestion, thanks a lot. Real-time is indeed not
> necessary, a daily (or even a weekly) cleaning of unused data is
> sufficient. pgaudit was anyway on the table for some other use cases, so
> that would fit in nicely.

Possibly try using/abusing RLS?

8<-----------------
psql test
psql (19devel)
Type "help" for help.

create table t1(c1 int, c2 text);
insert into t1 values (1,'a'), (2,'b'), (3,'c'), (42,'zp');
grant select on table t1 to public;

create table a1(c1 int, t1 timestamptz);

create or replace function audit(int) returns bool as $$
  insert into a1 values ($1, now()) returning true
$$ security definer language sql;

create policy audit_t1 on t1 for select using (audit(c1));
alter table t1 enable row level security;

create user joe;
set session authorization joe;

select * from t1 where c1=42;
 c1 | c2
----+----
 42 | zp
(1 row)

reset session authorization;

select * from a1;
 c1 |              t1
----+-------------------------------
 42 | 2025-12-17 11:42:51.871843-05
(1 row)
8<-----------------

HTH,

--
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com
On Wed, Dec 17, 2025 at 10:24 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
pgaudit might satisfy your needs, since it would only log SELECT statements on that one table. You'd still have to grep the log file, so the information wouldn't be real-time, but that's probably not important.
That would only give you table-level information, unless it was a dirt-simple query with no joins and an easy-to-parse WHERE clause.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
Hi
I wonder whether a view calling a set-returning function would be a viable option. Within the function you would execute the actual query twice: once to merge the selected PKs into a protocol table, and once to return the data set.
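A minimal sketch of that construction (all names hypothetical; assumes the real table was renamed to `prefs_real` so the view can take its place under the original name; note the application's WHERE clause is not pushed down into the function, so every row gets marked as read on each call):

8<-----------------
create table prefs_real (id int primary key, val text);
create table prefs_protocol (id int primary key,
                             read_at timestamptz);

-- Set-returning function: first merge the served PKs into the
-- protocol table, then run the query again to return the rows.
create or replace function prefs_logged()
returns setof prefs_real
language plpgsql
as $$
begin
  insert into prefs_protocol (id, read_at)
  select id, now() from prefs_real
  on conflict (id) do update set read_at = excluded.read_at;

  return query select * from prefs_real;
end;
$$;

-- Stand-in for the original table name the application queries.
create view prefs as select * from prefs_logged();
8<-----------------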
Cheers
Thiemo
On Wed, Dec 17, 2025 at 12:14 PM Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:
I wonder whether a view calling a set returning function would be a viable option. Within the function you execute the actual query twice. Once to merge selected pk into a protocol table and once for the return of the data set.
The view would be sucking in the entire table every time. Maybe not a big deal, as this is a very tiny table.
I think Joe wins this thread with the RLS idea, however.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
Joe Conway <mail@joeconway.com> writes:
> Possibly try using/abusing RLS?
Cute idea, but I think it doesn't reliably address the problem of
wanting to identify the specific rows that were read. In your toy
example it'd work, because the generated plan is
regression=> explain verbose select * from t1 where c1=42;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on public.t1  (cost=0.00..343.38 rows=2 width=36)
   Output: c1, c2
   Filter: ((t1.c1 = 42) AND audit(t1.c1))
(3 rows)
so the WHERE clause is applied before the RLS filter. But in any
slightly-more-complicated situation, like a non-leakproof WHERE
condition, the order would be reversed so the log would overstate
which rows were read.
If the application's behavior is simple and well-defined, this might
be good enough, of course.
I thought of a way that could possibly do this reliably, but it's
vastly more work than the use-case seems worth:
1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
without changing the application, by interposing a view). SELECT
FOR SHARE might be good enough, not sure.
2. Write a logical replication output plugin that parses the WAL log
well enough to identify the tuple locks taken by FOR UPDATE.
This should work to log only the rows actually read, because FOR
UPDATE is postponed to the top of the query plan, unlike RLS.
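For illustration, the plan shape this relies on, sketched against Joe's toy table (approximate output, not from a live session): the locking step is a LockRows node above the scan, so only rows that survive the filter get locked.

8<-----------------
explain (costs off) select * from t1 where c1 = 42 for update;
         QUERY PLAN
-----------------------------
 LockRows
   ->  Seq Scan on t1
         Filter: (c1 = 42)
8<-----------------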
regards, tom lane
On 12/17/25 13:37, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> Possibly try using/abusing RLS?
>
> Cute idea, but I think it doesn't reliably address the problem of
> wanting to identify the specific rows that were read. In your toy
> example it'd work, because the generated plan is
>
> regression=> explain verbose select * from t1 where c1=42;
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on public.t1  (cost=0.00..343.38 rows=2 width=36)
>    Output: c1, c2
>    Filter: ((t1.c1 = 42) AND audit(t1.c1))
> (3 rows)
>
> so the WHERE clause is applied before the RLS filter. But in any
> slightly-more-complicated situation, like a non-leakproof WHERE
> condition, the order would be reversed so the log would overstate
> which rows were read.

Sure. Of course, we have had requests for a leakproofness check bypass [1] for some use cases, and this could be one more such case.

> If the application's behavior is simple and well-defined, this might
> be good enough, of course.

FWIW, when I read the original email in the thread I got the impression that the application behavior was pretty simple WRT this table. But of course I could easily be wrong...

> I thought of a way that could possibly do this reliably, but it's
> vastly more work than the use-case seems worth:
>
> 1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
> without changing the application, by interposing a view). SELECT
> FOR SHARE might be good enough, not sure.
>
> 2. Write a logical replication output plugin that parses the WAL log
> well enough to identify the tuple locks taken by FOR UPDATE.

Yeah, this seems like a pretty heavy lift.

> This should work to log only the rows actually read, because FOR
> UPDATE is postponed to the top of the query plan, unlike RLS.

[1] https://www.postgresql.org/message-id/flat/CAMxA3rsGQh9waorObOZyqrFqZ5uQ0b5D7SL6X6nh2kLhX%3D90vg%40mail.gmail.com#4a03eafc8c9660177874e11811c8f410

--
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com
>> If the application's behavior is simple and well-defined, this might
>> be good enough, of course.
>
> FWIW when I read the original email in the thread I got the impression
> that the application behavior was pretty simple WRT this table. But of
> course I could easily be wrong...
I’ll give it a try with ab/using RLS, which we already use for some other use cases anyway.
Thanks a lot for the suggestions in the thread, much appreciated!
— Matthias
Matthias Leisi wrote:
>>> If the application's behavior is simple and well-defined, this might
>>> be good enough, of course.
>>
>> FWIW when I read the original email in the thread I got the impression
>> that the application behavior was pretty simple WRT this table. But of
>> course I could easily be wrong...
>
> You are not wrong. The use case is in fact the `userpref` table used by
> Spamassassin. Left unmaintained, and given a large-enough user base,
> this has the tendency to grow considerably over time, so we want to
> gradually remove entries not actually used any more. (And we don’t want
> to patch Spamassassin core code to do this by itself…)

For this particular use case, would it be easier to periodically compare the list of usernames with userpref data against active accounts on the mail system?

-kgd
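A sketch of that periodic comparison (the `userpref` and `username` names match SpamAssassin's stock SQL schema; the `mail_accounts` table standing in for the mail system's account list is purely hypothetical):

8<-----------------
-- Hypothetical cleanup job: drop preferences for accounts that no
-- longer exist on the mail system.
delete from userpref u
where u.username <> '$GLOBAL'          -- keep SpamAssassin's global defaults
  and not exists (
        select 1
        from mail_accounts a           -- assumed source of active accounts
        where a.username = u.username
      );
8<-----------------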