Re: COPY FROM WHEN condition
От | Tomas Vondra |
---|---|
Тема | Re: COPY FROM WHEN condition |
Дата | |
Msg-id | 0623219e-c034-bc10-c546-a01da35e6349@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: COPY FROM WHEN condition (Corey Huinker <corey.huinker@gmail.com>) |
Список | pgsql-hackers |
On 11/02/2018 03:57 AM, Corey Huinker wrote: > > Are you thinking something like having a COPY command that provides > > results in such a way that they could be referenced in a FROM clause > > (perhaps a COPY that defines a cursor…)? > > That would also be nice, but what I was thinking of was that some > highly restricted subset of cases of SQL in general could lend > themselves to levels of optimization that would be impractical in > other contexts. > > > If COPY (or a syntactical equivalent) can return a result set, then the > whole of SQL is available to filter and aggregate the results and we > don't have to invent new syntax, or endure confusion whenCOPY-WHEN > syntax behaves subtly different from a similar FROM-WHERE. > > Also, what would we be saving computationally? The whole file (or > program output) has to be consumed no matter what, the columns have to > be parsed no matter what. At least some of the columns have to be > converted to their assigned datatypes enough to know whether or not to > filter the row, but we might be able push that logic inside a copy. I'm > thinking of something like this: > > SELECT x.a, sum(x.b) > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, > b numeric, c text, d date, e json) ) > WHERE x.d >= '2018-11-01' > > > In this case, there is the /opportunity/ to see the following optimizations: > - columns c and e are never referenced, and need never be turned into a > datum (though we might do so just to confirm that they conform to the > data type) > - if column d is converted first, we can filter on it and avoid > converting columns a,b > - whatever optimizations we can infer from knowing that the two > surviving columns will go directly into an aggregate > > If we go this route, we can train the planner to notice other > optimizations and add those mechanisms at that time, and then existing > code gets faster. > > If we go the COPY-WHEN route, then we have to make up new syntax for > every possible future optimization. IMHO those two things address vastly different use-cases. The COPY WHEN case deals with filtering data while importing them into a database, while what you're describing seems to be more about querying data stored in a CSV file. But we already do have a solution for that - FDW, and I'd say it's working pretty well. And AFAIK it does give you tools to implement most of what you're asking for. I don't see why should we bolt this on top of COPY, or how is it an alternative to COPY WHEN. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: