Discussion: Read data from Postgres table pages
Hey,
I'm trying to build a postgres export tool that reads data from table pages and exports it to an S3 bucket. I'd like to avoid manual commands like pg_dump, I need access to the raw data.
Can you please point me to the postgres source header / cc files that encapsulate this functionality?
- List all pages for a table
- Read a given page for a table
Any pointers to the relevant source code would be appreciated.
Thanks,
Sushrut
Hi
On Tue, Mar 19, 2024 at 4:23 PM Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:
> I'm trying to build a postgres export tool that reads data from table pages and exports it to an S3 bucket. I'd like to avoid manual commands like pg_dump, I need access to the raw data.
>
> Can you please point me to the postgres source header / cc files that encapsulate this functionality?
> - List all pages for a table
> - Read a given page for a table
>
> Any pointers to the relevant source code would be appreciated.
Why do you need to work on the source code level?
Please, check this about having a binary copy of the database on the filesystem level.
https://www.postgresql.org/docs/current/backup-file.html
------
Regards,
Alexander Korotkov
I'd like to read individual rows from the pages as they are updated and stream them to a server to create a copy of the data.
The data will be rewritten to columnar format for analytics queries.
The binary I'm trying to create should automatically be able to read data from a postgres instance without users having to
run commands for backup / pg_dump etc.
Having access to the appropriate source headers would allow me to read the data.
On Tue, Mar 19, 2024 at 4:35 PM Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:
> The binary I'm trying to create should automatically be able to read data from a postgres instance without users having to
> run commands for backup / pg_dump etc.
> Having access to the appropriate source headers would allow me to read the data.
Please avoid top-posting.
https://en.wikipedia.org/wiki/Posting_style#Top-posting
If you're looking to have a separate binary, why can't your binary just *connect* to the postgres database and query the data? This is what pg_dump does, and you can do the same directly. pg_dump doesn't access the raw data.
Trying to read raw postgres data from a separate binary looks flat wrong for your purposes. First, you would have to replicate pretty much all of the postgres internals inside it. Second, you can read consistent data only when postgres is stopped or hasn't made any modifications since the last checkpoint.
------
Regards,
Alexander Korotkov
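To give a sense of what replicating the postgres internals involves, here is a minimal Python sketch (not from the thread) that decodes only the fixed header and the line pointer array of a single heap page. It assumes the default 8 kB block size, a little-endian machine, the field order of PageHeaderData in src/include/storage/bufpage.h, and the ItemIdData bit layout from src/include/storage/itemid.h as compilers usually lay it out on x86. The names read_page, parse_page_header and line_pointers are hypothetical. It does nothing about tuple visibility, TOAST, or pages that were modified in shared buffers but not yet written to disk, which is where the real difficulty lies.

```python
import struct
from typing import List, NamedTuple, Tuple

BLCKSZ = 8192  # assumption: default block size; a custom build may differ

# Fixed part of PageHeaderData (24 bytes), little-endian assumed:
# pd_lsn (two uint32), pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (uint16 each), pd_prune_xid (uint32)
HEADER = struct.Struct("<IIHHHHHHI")


class PageHeader(NamedTuple):
    lsn: int
    checksum: int
    flags: int
    lower: int    # end of the line pointer array
    upper: int    # start of tuple data
    special: int  # start of the special space
    page_size: int
    layout_version: int
    prune_xid: int


def read_page(path: str, block_no: int) -> bytes:
    """Read one block from a relation segment file (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(block_no * BLCKSZ)
        page = f.read(BLCKSZ)
    if len(page) != BLCKSZ:
        raise ValueError(f"block {block_no} is past the end of {path}")
    return page


def parse_page_header(page: bytes) -> PageHeader:
    """Decode the fixed-size page header at the start of a block."""
    (xlogid, xrecoff, checksum, flags, lower, upper,
     special, size_version, prune_xid) = HEADER.unpack_from(page, 0)
    return PageHeader(
        lsn=(xlogid << 32) | xrecoff,
        checksum=checksum,
        flags=flags,
        lower=lower,
        upper=upper,
        special=special,
        page_size=size_version & 0xFF00,      # high byte holds the size
        layout_version=size_version & 0x00FF,  # low byte holds the version
        prune_xid=prune_xid,
    )


def line_pointers(page: bytes, header: PageHeader) -> List[Tuple[int, int, int]]:
    """Return (offset, flags, length) for every line pointer on the page."""
    count = (header.lower - HEADER.size) // 4
    pointers = []
    for i in range(count):
        (raw,) = struct.unpack_from("<I", page, HEADER.size + 4 * i)
        # Assumed bit layout: lp_off in the low 15 bits, then 2 flag bits,
        # then 15 bits of length.
        pointers.append((raw & 0x7FFF, (raw >> 15) & 0x3, (raw >> 17) & 0x7FFF))
    return pointers
```

Under the same assumptions, the number of blocks in one segment file would be its size divided by BLCKSZ, and a relation larger than the segment size (1 GB by default) is split across several files. Even with all of this in place, the bytes on disk are only guaranteed to be consistent under the conditions described above.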
If we query the DB directly, is it possible to know which new rows have been added since the last query?
Is there a change pump that can be latched onto?
I’m assuming the page data structs are encapsulated in specific headers which can be used to list / read pages.
Why would Postgres need to be stopped to read the data? The read / query path in Postgres would also be reading these pages when the instance is running?
On Tue, Mar 19, 2024 at 4:48 PM Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:
>
> If we query the DB directly, is it possible to know which new rows have been added since the last query?
> Is there a change pump that can be latched onto?
Please, check this.
https://www.postgresql.org/docs/current/logicaldecoding.html
> I’m assuming the page data structs are encapsulated in specific headers which can be used to list / read pages.
> Why would Postgres need to be stopped to read the data? The read / query path in Postgres would also be reading these pages when the instance is running?
I think this would be a good point to start studying.
https://www.interdb.jp/
The information there should be more than enough to forget this idea forever :)
------
Regards,
Alexander Korotkov
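As an illustration of the logical decoding route linked above, here is a minimal Python sketch, not taken from the thread. It assumes wal_level is set to logical, that a logical replication slot was already created with the test_decoding example plugin (for instance via pg_create_logical_replication_slot), and that cursor is any DB-API cursor using the %s parameter style, such as one from psycopg2. The names Change, parse_change and fetch_changes are hypothetical, and the parser only handles simple unquoted schema and table names.

```python
import re
from typing import List, NamedTuple, Optional


class Change(NamedTuple):
    lsn: str
    xid: int
    schema: str
    table: str
    op: str       # INSERT, UPDATE, DELETE, ...
    columns: str  # raw column text from test_decoding, left unparsed


# test_decoding emits row changes as lines such as:
#   table public.data: INSERT: id[integer]:1 data[text]:'1'
_ROW_RE = re.compile(
    r"^table (?P<schema>[^.]+)\.(?P<table>[^:]+): (?P<op>[A-Z]+): (?P<cols>.*)$"
)


def parse_change(lsn: str, xid, data: str) -> Optional[Change]:
    """Turn one decoded line into a Change; return None for BEGIN/COMMIT."""
    m = _ROW_RE.match(data)
    if m is None:
        return None
    return Change(lsn, int(xid), m["schema"], m["table"], m["op"], m["cols"])


def fetch_changes(cursor, slot_name: str) -> List[Change]:
    """Consume and return the row changes accumulated in the slot so far."""
    cursor.execute(
        "SELECT lsn, xid, data FROM pg_logical_slot_get_changes(%s, NULL, NULL)",
        (slot_name,),
    )
    changes = []
    for lsn, xid, data in cursor.fetchall():
        change = parse_change(lsn, xid, data)
        if change is not None:
            changes.append(change)
    return changes
```

Calling pg_logical_slot_get_changes consumes the changes it returns, so each call yields only what arrived since the previous one, which is the "change pump" behaviour asked about. A long-running tool would more likely use the streaming replication protocol with an output plugin than poll a SQL function like this.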
lol, thanks for the inputs Alexander :)!