Re: Columnar format export in Postgres
From | Sutou Kouhei |
---|---|
Subject | Re: Columnar format export in Postgres |
Date | |
Msg-id | 20240616.063220.999225191405879719.kou@clear-code.com |
In reply to | Re: Columnar format export in Postgres (Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com>) |
List | pgsql-hackers |
Hi,

In <CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg@mail.gmail.com>
  "Re: Columnar format export in Postgres" on Thu, 13 Jun 2024 22:30:24 +0530,
  Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:

> - To facilitate efficient querying it would help to export multiple
> parquet files for the table instead of a single file.
> Having multiple files allows queries to skip chunks if the key range in
> the chunk does not match query filter criteria.
> Even within a chunk it would help to be able to configure the size of a
> row group.
> - I'm not sure how these parameters will be exposed within `COPY TO`.
> Or maybe the extension implementing the `COPY TO` handler will
> allow this configuration?

Yes. But adding support for custom COPY TO options is
out-of-scope in the first version. We will focus on only the
minimal features in the first version. We can improve it
later based on use-cases.

See also:
https://www.postgresql.org/message-id/20240131.141122.279551156957581322.kou%40clear-code.com

> - Regarding using file_fdw to read Apache Arrow and Apache Parquet file
> because file_fdw is based on COPY FROM:
> - I'm not too clear on this. file_fdw seems to allow creating a table
> from data on disk exported using COPY TO.

Correct.

> But is the newly created table still using the data on disk(maybe in
> columnar format or csv) or is it just reading that data to create a row
> based table.

The former.

> I'm not aware of any capability in the postgres planner to read
> columnar files currently without using an extension like parquet_fdw.

Correct. We still need another approach such as parquet_fdw
with the COPY format extensible feature to optimize query
against Apache Parquet data. file_fdw can just read Apache
Parquet data by SELECT. Sorry for confusing you.

Thanks,
-- 
kou
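
The chunk-skipping idea quoted above (skip a file or row group when its key range cannot match the query filter) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Chunk` and `chunks_to_scan` are hypothetical names, the key is assumed to be an integer, and each chunk is assumed to record the minimum and maximum key it contains. It does not use any Parquet or PostgreSQL API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical descriptor for one exported file (or row group)."""
    name: str
    min_key: int  # smallest key stored in the chunk (assumed to be recorded)
    max_key: int  # largest key stored in the chunk (assumed to be recorded)


def chunks_to_scan(chunks: List[Chunk], lo: int, hi: int) -> List[Chunk]:
    """Keep only chunks whose [min_key, max_key] overlaps the filter [lo, hi]."""
    return [c for c in chunks if c.max_key >= lo and c.min_key <= hi]


# Three chunks with disjoint key ranges, as if one table were exported
# to several files.
chunks = [
    Chunk("part-0", 0, 999),
    Chunk("part-1", 1000, 1999),
    Chunk("part-2", 2000, 2999),
]

# A filter such as "key BETWEEN 1500 AND 2100" only needs two of the three.
selected = chunks_to_scan(chunks, 1500, 2100)
print([c.name for c in selected])  # ['part-1', 'part-2']
```

With a single large file and no per-chunk key ranges, the same filter would have to read every row, which is why the number of files and the row group size matter for query performance.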