Обсуждение: Materialized views

Поиск
Список
Период
Сортировка

Materialized views

От
"Kevin Grittner"
Дата:
This is the time of year when the Wisconsin Courts formalize their
annual plan for where people will be spending the bulk of their time
in the coming year.  Two years ago at this time, managers decided
that serializable transactions were a big enough issue to justify
assigning about half of my 2011 time to working on PostgreSQL
enhancements for that.  This year our big database issue is
materialized views.
As we strive to create our next generation of software we find
ourselves wanting to provide "dashboard" type windows with graphs of
statistics which are insanely expensive to calculate on the fly. 
We've been creating ad hoc materialized views to deal with the
performance issues, but that is labor intensive.  I'm considering
submitting a proposal to management that I be assigned to work on
a declarative implementation in PostgreSQL to allow speedier
application development of software needing materialized views.
I'm posting to make sure that nobody else is already in the midst of
working on this, and to check regarding something on the Wiki page
for this topic:
http://wiki.postgresql.org/wiki/Materialized_Views
That page describes three components: creating MVs, updating MVs, and
having the planner automatically detect when an MV matches some
portion of a regular query and using the MV instead of the specified
tables in such cases.  I have high confidence that if time is
approved I could do the first two for the 9.3, but that last one
seems insanely complicated and not necessarily a good idea.  (That's
particularly true with some of the lazier strategies for maintaining
the data in the materialized view.)  I don't think we want to use
that 3rd component in our shop, anyway.  So the question is, would a
patch which does the first two without the third be accepted by the
community?
I'm not at the point of proposing specifics yet; the first phase
would be a close review of prior threads and work on the topic
(including the GSoC work).  Then I would discuss implementation
details here before coding.
The hope on our end, of course, is that the time spent on
implementing this would be more than compensated by application
programmer time savings as we work on our next generation of
application software, which seems like a pretty safe bet to me.
-Kevin


Re: Materialized views

От
Thom Brown
Дата:
On 8 November 2011 21:23, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> This is the time of year when the Wisconsin Courts formalize their
> annual plan for where people will be spending the bulk of their time
> in the coming year.  Two years ago at this time, managers decided
> that serializable transactions were a big enough issue to justify
> assigning about half of my 2011 time to working on PostgreSQL
> enhancements for that.  This year our big database issue is
> materialized views.
>
> As we strive to create our next generation of software we find
> ourselves wanting to provide "dashboard" type windows with graphs of
> statistics which are insanely expensive to calculate on the fly.
> We've been creating ad hoc materialized views to deal with the
> performance issues, but that is labor intensive.  I'm considering
> submitting a proposal to management that I be assigned to work on
> a declarative implementation in PostgreSQL to allow speedier
> application development of software needing materialized views.
>
> I'm posting to make sure that nobody else is already in the midst of
> working on this, and to check regarding something on the Wiki page
> for this topic:
>
> http://wiki.postgresql.org/wiki/Materialized_Views
>
> That page describes three components: creating MVs, updating MVs, and
> having the planner automatically detect when an MV matches some
> portion of a regular query and using the MV instead of the specified
> tables in such cases.  I have high confidence that if time is
> approved I could do the first two for the 9.3, but that last one
> seems insanely complicated and not necessarily a good idea.  (That's
> particularly true with some of the lazier strategies for maintaining
> the data in the materialized view.)  I don't think we want to use
> that 3rd component in our shop, anyway.  So the question is, would a
> patch which does the first two without the third be accepted by the
> community?
>
> I'm not at the point of proposing specifics yet; the first phase
> would be a close review of prior threads and work on the topic
> (including the GSoC work).  Then I would discuss implementation
> details here before coding.
>
> The hope on our end, of course, is that the time spent on
> implementing this would be more than compensated by application
> programmer time savings as we work on our next generation of
> application software, which seems like a pretty safe bet to me.

+1

I was pleased to see the subject of this thread.  I definitely think
it's worth it, especially if you're able to make it also work for
foreign tables (saving expense of seeking external data so can also
act as a local cache, but that's me getting carried away).  And I
agree regarding the planner detection.  If that ever were desired, it
certainly would't need implementing in the first phase.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Materialized views

От
Josh Berkus
Дата:
On 11/8/11 1:23 PM, Kevin Grittner wrote:
> So the question is, would a
> patch which does the first two without the third be accepted by the
> community?

AFAIC, yes.

For that matter, Part 3 is useful without parts 1 and 2, if someone
wanted to work on that.  I recall some academic work on automated
materialized view matching back in the 7.2 days.

And I can help test whatever you come up with ... I do a lot of matviews.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Materialized views

От
Greg Jaskiewicz
Дата:
On 8 Nov 2011, at 21:23, Kevin Grittner wrote:

> This is the time of year when the Wisconsin Courts formalize their
> annual plan for where people will be spending the bulk of their time
> in the coming year.  Two years ago at this time, managers decided
> that serializable transactions were a big enough issue to justify
> assigning about half of my 2011 time to working on PostgreSQL
> enhancements for that.  This year our big database issue is
> materialized views.


+1 for that too. Too many hacked together MVs here, and more coming. 



Re: Materialized views

От
Stephen Frost
Дата:
* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:
> So the question is, would a
> patch which does the first two without the third be accepted by the
> community?

I'm about 99% sure the answer to that is 'yes'.  Are you thinking of
having a background scheduler which handles the updating of
schedule-driven (instead of trigger-driven) MVs..?  Not to try to
feature-creep this on you, but you might consider how a new backend
process which handles scheduled tasks could be generalized to go beyond
handling just MV updates.. :)
Thanks!
    Stephen

Re: Materialized views

От
"Kevin Grittner"
Дата:
Stephen Frost <sfrost@snowman.net> wrote:
> Are you thinking of having a background scheduler which handles
> the updating of schedule-driven (instead of trigger-driven) MVs..?
I'm trying not to get to far into discussing design in advance of
reviewing previous work and any papers on the topic.  That said, it
seems clear that the above is required but not sufficient.
> Not to try to feature-creep this on you, but you might consider
> how a new backend process which handles scheduled tasks could be
> generalized to go beyond handling just MV updates.. :)
I'll keep that in mind.  :-)
-Kevin


Re: Materialized views

От
"David E. Wheeler"
Дата:
On Nov 8, 2011, at 2:54 PM, Stephen Frost wrote:

>> So the question is, would a
>> patch which does the first two without the third be accepted by the
>> community?

+1 Definitely.

> I'm about 99% sure the answer to that is 'yes'.  Are you thinking of
> having a background scheduler which handles the updating of
> schedule-driven (instead of trigger-driven) MVs..?  Not to try to
> feature-creep this on you, but you might consider how a new backend
> process which handles scheduled tasks could be generalized to go beyond
> handling just MV updates.. :)

+1 That sure would be nice. Might be some useful stuff in pgAgent to pull into this (license permitting).

Best,

David



Re: Materialized views

От
Josh Berkus
Дата:
> I'm about 99% sure the answer to that is 'yes'.  Are you thinking of
> having a background scheduler which handles the updating of
> schedule-driven (instead of trigger-driven) MVs..?  Not to try to
> feature-creep this on you, but you might consider how a new backend
> process which handles scheduled tasks could be generalized to go beyond
> handling just MV updates.. :)

No feature-creeping; the two features described will be hard enough.
Besides, we have pg_agent for that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Materialized views

От
"David E. Wheeler"
Дата:
On Nov 8, 2011, at 3:25 PM, Josh Berkus wrote:

>> I'm about 99% sure the answer to that is 'yes'.  Are you thinking of
>> having a background scheduler which handles the updating of
>> schedule-driven (instead of trigger-driven) MVs..?  Not to try to
>> feature-creep this on you, but you might consider how a new backend
>> process which handles scheduled tasks could be generalized to go beyond
>> handling just MV updates.. :)
> 
> No feature-creeping; the two features described will be hard enough.
> Besides, we have pg_agent for that.

Well, there has to be some way to refresh an MV at regular intervals, no?

Best,

David



Re: Materialized views

От
"David E. Wheeler"
Дата:
On Nov 8, 2011, at 4:23 PM, Josh Berkus wrote:

>> Well, there has to be some way to refresh an MV at regular intervals, no?
>
> For periodic update matviews, yes.  However, Kevin needs only produce
> the command/function call for updating a generic periodic matview.  He
> doesn't need to provide a scheduling utility.  For simple cases, cron works.

Oh, I see. I was assuming it would be automatic-ish, with the declaration of the matview including update interval
information.Could certainly come later, though. 

Best,

David



Re: Materialized views

От
Josh Berkus
Дата:
>> No feature-creeping; the two features described will be hard enough.
>> Besides, we have pg_agent for that.
> 
> Well, there has to be some way to refresh an MV at regular intervals, no?

For periodic update matviews, yes.  However, Kevin needs only produce
the command/function call for updating a generic periodic matview.  He
doesn't need to provide a scheduling utility.  For simple cases, cron works.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Materialized views

От
Andrew Dunstan
Дата:

On 11/08/2011 06:25 PM, Josh Berkus wrote:
>> I'm about 99% sure the answer to that is 'yes'.  Are you thinking of
>> having a background scheduler which handles the updating of
>> schedule-driven (instead of trigger-driven) MVs..?  Not to try to
>> feature-creep this on you, but you might consider how a new backend
>> process which handles scheduled tasks could be generalized to go beyond
>> handling just MV updates.. :)
> No feature-creeping; the two features described will be hard enough.
> Besides, we have pg_agent for that.
>

Yeah. I'd be plenty happy with greedy MVs plus lazy MVs that need a 
manual refresh. These are both patterns I use a lot (the buildfarm 
dashboard is in effect a greedy MV although it's not updated by a trigger).

cheers

andrew




Re: Materialized views

От
Stephen Frost
Дата:
* Josh Berkus (josh@agliodbs.com) wrote:
> > Well, there has to be some way to refresh an MV at regular intervals, no?
>
> For periodic update matviews, yes.  However, Kevin needs only produce
> the command/function call for updating a generic periodic matview.  He
> doesn't need to provide a scheduling utility.  For simple cases, cron works.

Perhaps that would be an acceptable initial version, but I'd be pretty
disappointed and I certainly don't think we should stop there.

I'm quite aware of cron and as aware of the difficulties that many DBAs
are going to have getting cronjobs implemented.  There's permission
issues (we see this already with the requests to provide an in-PG DBA
interface for pg_hba.conf..), locking issues (writing decent scripts
that don't destroy the box if they take a bit too long, like >5m on a
5m scheduled job), authentication issues (we don't really want these
running as superuser if we can avoid it..), and probably other things
I'm not thinking of.
Thanks,
    Stephen

Re: Materialized views

От
Robert Haas
Дата:
On Tue, Nov 8, 2011 at 8:31 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Perhaps that would be an acceptable initial version, but I'd be pretty
> disappointed and I certainly don't think we should stop there.

I agree that a built-in job scheduler would be pretty awesome, but I
think it's a completely separate project from materialized views.
Each of the two is a major project all by itself; making one into a
dependency of the other is a recipe for failure.

In view of Kevin's success with SSI, I'm very pleased to see him
picking this out as his next target.  If and when it gets done, this
will be a great, great feature.  And we shouldn't be afraid to start
small and move incrementally toward what we really want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Materialized views

От
Greg Smith
Дата:
On 11/08/2011 04:23 PM, Kevin Grittner wrote:
> http://wiki.postgresql.org/wiki/Materialized_Views
> That page describes three components: creating MVs, updating MVs, and
> having the planner automatically detect when an MV matches some
> portion of a regular query and using the MV instead of the specified
> tables in such cases.

So far that page is just my initial notes on this subject based on some 
research I did, don't read too much into it.  Ignoring MV substitution 
is certainly the right plan for an initial development target.  An 
implementation that didn't update automatically at all is also still a 
useful step.  It's very easy to pick too big of a chunk of this project 
to chew on at once.

When I wrote that, it wasn't completely clear to me yet that doing the 
updates would involve so many of the same tricky bits that stalled 
progress on the MERGE command.  Nowadays I think of working out the 
concurrency issues in both this and MERGE, in the wake of true 
serialization, as another step in this implementation plan.  It's not 
strictly necessary, but there's some shared issues that might be tackled 
easier in the MERGE context instead.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: Materialized views

От
Simon Riggs
Дата:
On Tue, Nov 8, 2011 at 9:23 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> That page describes three components: creating MVs, updating MVs, and
> having the planner automatically detect when an MV matches some
> portion of a regular query and using the MV instead of the specified
> tables in such cases.  I have high confidence that if time is
> approved I could do the first two for the 9.3, but that last one
> seems insanely complicated and not necessarily a good idea.  (That's
> particularly true with some of the lazier strategies for maintaining
> the data in the materialized view.)  I don't think we want to use
> that 3rd component in our shop, anyway.  So the question is, would a
> patch which does the first two without the third be accepted by the
> community?

For me, yes. I support and encourage your work. It's a big topic and
we must approach it incrementally.

Having said that, we should assume that #3 will be implemented and
that we need to collect appropriate metadata and anything else
required. So the design should foresee #3 and not in any way optimise
for the case where #3 doesn't happen. It may occur that #3 is added
during next cycle concurrently with this development.

I would also caution that all other databases currently provide #3 as
a matter of course. That is the "sauce" as far as many people are
concerned. Everything else is already achievable using external
application code. So I would not want people to start saying "we have
MVs" when in fact all we did was add declarative syntax to support
what was already possible - we could easily publicise that incorrectly
at release time.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Materialized views

От
"Kevin Grittner"
Дата:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> I'm considering submitting a proposal to management that I be
> assigned to work on a declarative implementation in PostgreSQL to
> allow speedier application development of software needing
> materialized views.
Thanks to all who provided feedback and support in response to my
post.
Based on the feedback here and off-list, I did submit a proposal. 
It was just approved by the appropriate steering committee
(consisting of our CIO, the Director of State Courts, District Court
Administrators, Judges, Clerks of Court, and other stake-holders) as
a low-priority project.  That means that I expect I'll have the time
to get a patch together in time for 9.3, but the times at which the
decks will be clear of other assignments to allow work on this will
not be very predictable.  I'll probably be on-again, off-again
throughout the year.  I apologize in advance for the fact that the
times when I will be able to work on it might not fit well with the
release cycle or CFs, but I kinda have to take what I can get in
that regard.
-Kevin