Thread: Read table rows in chunks


Read table rows in chunks

From:
Sushrut Shivaswamy
Date:
Hey,

I'm trying to read the rows of a table in chunks to process them in a background worker.
I want to ensure that each row is processed only once.

I was thinking of using the `SELECT * ... OFFSET {offset_size} LIMIT {limit_size}` functionality for this, but I'm running into issues.

Some approaches I had in mind that aren't working out:
 - Try to use the transaction id to query rows created since the last processed transaction id 
      - It seems Postgres does not expose row transaction ids so this approach is not feasible
 - Rely on OFFSET / LIMIT combination to query the next chunk of data 
      - SELECT * does not guarantee ordering of rows so it's possible older rows repeat or newer rows are missed in a chunk

Can you please suggest an alternative way to periodically read rows from a table in chunks while processing each row exactly once?

Thanks,
Sushrut



Re: Read table rows in chunks

From:
Kashif Zeeshan
Date:
Hi

You can also use the following approaches:

1. Cursors
2. FETCH with OFFSET clause
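
The cursor approach above can be sketched as follows; the cursor name, table, and chunk size are illustrative, not from the thread. Because the cursor holds its position inside the transaction, successive FETCHes return consecutive, non-overlapping chunks:

```sql
-- Sketch: stream a table in chunks through a cursor (names are illustrative).
BEGIN;
DECLARE row_chunks NO SCROLL CURSOR FOR
    SELECT * FROM my_table ORDER BY id;
FETCH FORWARD 1000 FROM row_chunks;  -- first chunk of 1000 rows
FETCH FORWARD 1000 FROM row_chunks;  -- next chunk; no rows repeated or skipped
COMMIT;                              -- closes the cursor
```

Note that a plain (non-holdable) cursor only lives for the duration of the transaction, so this suits a single pass rather than periodic polling.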

Regards
Kashif Zeeshan
Bitnine Global

On Sat, Apr 27, 2024 at 12:47 PM Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:
Hey,

I'm trying to read the rows of a table in chunks to process them in a background worker.
I want to ensure that each row is processed only once.

I was thinking of using the `SELECT * ... OFFSET {offset_size} LIMIT {limit_size}` functionality for this, but I'm running into issues.

Some approaches I had in mind that aren't working out:
 - Try to use the transaction id to query rows created since the last processed transaction id 
      - It seems Postgres does not expose row transaction ids so this approach is not feasible
 - Rely on OFFSET / LIMIT combination to query the next chunk of data 
      - SELECT * does not guarantee ordering of rows so it's possible older rows repeat or newer rows are missed in a chunk

Can you please suggest an alternative way to periodically read rows from a table in chunks while processing each row exactly once?

Thanks,
Sushrut



Re: Read table rows in chunks

From:
"David G. Johnston"
Date:
On Sat, Apr 27, 2024 at 12:47 AM Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:

I'm trying to read the rows of a table in chunks to process them in a background worker.

This list really isn't the place for this kind of discussion.  You are doing application-level stuff, not working on patches for core.  General discussions and questions like this should be directed to the -general mailing list.

I want to ensure that each row is processed only once.

Is failure during processing possible?


I was thinking of using the `SELECT * ... OFFSET {offset_size} LIMIT {limit_size}` functionality for this, but I'm running into issues.

The FOR UPDATE and SKIP LOCKED clauses usually come into play for this use case.
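
A minimal sketch of that pattern, assuming a hypothetical `tasks` table with `id` and a boolean `processed` column (not named in the thread). Each worker locks the rows it selects; SKIP LOCKED makes concurrent workers pass over rows another worker has already claimed instead of blocking on them:

```sql
-- Sketch: a worker claims a chunk of unclaimed rows without blocking peers.
BEGIN;
SELECT *
FROM tasks
WHERE NOT processed
ORDER BY id
LIMIT 1000
FOR UPDATE SKIP LOCKED;
-- ... process the returned rows, mark them processed ...
COMMIT;  -- releases the row locks
```

The locks only prevent double-claiming while the transaction is open; marking rows processed (as discussed below in the thread) is still needed so they are not picked up again later.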
 
Can you please suggest an alternative way to periodically read rows from a table in chunks while processing each row exactly once?


I think you are fooling yourself if you think you can do this without writing back to the row the fact that it has been processed. At that point, ensuring that you only retrieve and process unprocessed rows is trivial: just don't select rows with a status of processed.

If adding a column to the data is not possible, record processed row identifiers into a separate table and inspect that.
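
The write-back approach can be combined with the locking above into a single claim-and-mark statement; this is a sketch, again assuming the hypothetical `tasks` table with an `id` and a `processed` flag:

```sql
-- Sketch: atomically claim a chunk AND record that it was processed.
-- The subquery locks up to 1000 unprocessed rows (skipping ones held by
-- other workers); the UPDATE flips their status and hands them back.
UPDATE tasks
SET processed = true
WHERE id IN (
    SELECT id
    FROM tasks
    WHERE NOT processed
    ORDER BY id
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
```

If processing happens outside the database and can fail, it is safer to mark rows only after the worker finishes, as in the two-step example earlier in the thread, rather than in the same statement that reads them.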

David J.