Обсуждение: Use of partial index

Поиск
Список
Период
Сортировка

Use of partial index

От
"Leif B. Kristensen"
Дата:
I'm a little confused about partial indexes. I have a couple of tables, 
like this:

CREATE TABLE events (   event_id    INTEGER PRIMARY KEY,   tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
place_fk   INTEGER REFERENCES places (place_id),   event_date  CHAR(18) NOT NULL DEFAULT '000000003000000001',
sort_date  DATE NOT NULL DEFAULT '40041024BC',   event_text  TEXT NOT NULL DEFAULT '',   sentence    TEXT NOT NULL
DEFAULT''
 
);

To this table I have created a partial index:

CREATE INDEX events_born   ON events (tag_type_fk)   WHERE tag_type_fk = 2;

Another table:

CREATE TABLE participants ( -- the TMG 'E' file   participant_id      INTEGER PRIMARY KEY,   person_fk
INTEGERREFERENCES persons (person_id),   event_fk            INTEGER REFERENCES events (event_id),   role_type_fk
INTEGER REFERENCES role_types (role_type_id),   is_principal        BOOLEAN NOT NULL DEFAULT 'f',   is_primary_event
BOOLEANNOT NULL DEFAULT 'f',   participant_note    TEXT NOT NULL DEFAULT '',   participant_name    TEXT NOT NULL
DEFAULT'',   age_mean            INTEGER NOT NULL DEFAULT 0,   age_devi            INTEGER NOT NULL DEFAULT 0,
CONSTRAINTperson_event UNIQUE (person_id, event_id)
 
);

And a view:

CREATE OR REPLACE VIEW principals AS
SELECT   participants.person_fk AS person,   events.event_id AS event,   events.place_fk AS place,   events.event_date
ASevent_date,   events.sort_date AS sort_date,   events.tag_type_fk AS tag_type
 
FROM   events, participants
WHERE   events.event_id = participants.event_fk
AND   participants.is_principal IS TRUE;

Now, here's an "explain select":

pgslekt=> explain select event_date, place from principals where 
person=2 and tag_type=2;                                      QUERY PLAN
-----------------------------------------------------------------------Nested Loop  (cost=0.00..23.15 rows=2 width=26)
-> Index Scan using person_event on participants                        (cost=0.00..13.63 rows=3 width=4)        Index
Cond:(person_fk = 2)        Filter: (is_principal IS TRUE)  ->  Index Scan using events_pkey on events
     (cost=0.00..3.16 rows=1 width=30)        Index Cond: (events.event_id = "outer".event_fk)        Filter:
(tag_type_fk= 2)
 
(7 rader)

Why doesn't this SELECT use the partial index "events_born" above? Is 
there any way to make this happen?
-- 
Leif Biberg Kristensen
http://solumslekt.org/


Re: Use of partial index

От
"Dmitri Bichko"
Дата:
As I understand it, partial indices are generally useful when you only
want to index a range of values, or if the select condition is on a
different field from the one being indexed (eg: ON foo (a) WHERE b IS
NOT NULL).

I am just guessing here, but it sounds like 'person_fk = 2' is going to
be a lot more selective (ie return less rows) than 'tag_type_fk = 2', so
it's quicker to use the pkey and then filter the results.

Depending on how many 'tag_type' values you have, indexing on it will
not help at all.  In other words, if more than a few percent of the rows
have the value '2' for 'tag_type_fg', postgres will tend to favour more
selective indices if you are doing a join, or a seqscan if you are doing
a straight select on that value.

Does that help?
Dmitri
PS Your query seems to be quite quick already, why don't you like this
plan?

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Leif B.
> Kristensen
> Sent: Wednesday, October 05, 2005 11:17 AM
> To: Pgsql-sql@postgresql.org
> Subject: [SQL] Use of partial index
>
>
> I'm a little confused about partial indexes. I have a couple
> of tables,
> like this:
>
> CREATE TABLE events (
>     event_id    INTEGER PRIMARY KEY,
>     tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
>     place_fk    INTEGER REFERENCES places (place_id),
>     event_date  CHAR(18) NOT NULL DEFAULT '000000003000000001',
>     sort_date   DATE NOT NULL DEFAULT '40041024BC',
>     event_text  TEXT NOT NULL DEFAULT '',
>     sentence    TEXT NOT NULL DEFAULT ''
> );
>
> To this table I have created a partial index:
>
> CREATE INDEX events_born
>     ON events (tag_type_fk)
>     WHERE tag_type_fk = 2;
>
> Another table:
>
> CREATE TABLE participants ( -- the TMG 'E' file
>     participant_id      INTEGER PRIMARY KEY,
>     person_fk           INTEGER REFERENCES persons (person_id),
>     event_fk            INTEGER REFERENCES events (event_id),
>     role_type_fk        INTEGER REFERENCES role_types (role_type_id),
>     is_principal        BOOLEAN NOT NULL DEFAULT 'f',
>     is_primary_event    BOOLEAN NOT NULL DEFAULT 'f',
>     participant_note    TEXT NOT NULL DEFAULT '',
>     participant_name    TEXT NOT NULL DEFAULT '',
>     age_mean            INTEGER NOT NULL DEFAULT 0,
>     age_devi            INTEGER NOT NULL DEFAULT 0,
>     CONSTRAINT person_event UNIQUE (person_id, event_id)
> );
>
> And a view:
>
> CREATE OR REPLACE VIEW principals AS
> SELECT
>     participants.person_fk AS person,
>     events.event_id AS event,
>     events.place_fk AS place,
>     events.event_date AS event_date,
>     events.sort_date AS sort_date,
>     events.tag_type_fk AS tag_type
> FROM
>     events, participants
> WHERE
>     events.event_id = participants.event_fk
> AND
>     participants.is_principal IS TRUE;
>
> Now, here's an "explain select":
>
> pgslekt=> explain select event_date, place from principals where
> person=2 and tag_type=2;
>                                        QUERY PLAN
> --------------------------------------------------------------
> ---------
>  Nested Loop  (cost=0.00..23.15 rows=2 width=26)
>    ->  Index Scan using person_event on participants
>                        (cost=0.00..13.63 rows=3 width=4)
>          Index Cond: (person_fk = 2)
>          Filter: (is_principal IS TRUE)
>    ->  Index Scan using events_pkey on events
>                        (cost=0.00..3.16 rows=1 width=30)
>          Index Cond: (events.event_id = "outer".event_fk)
>          Filter: (tag_type_fk = 2)
> (7 rader)
>
> Why doesn't this SELECT use the partial index "events_born" above? Is
> there any way to make this happen?
> --
> Leif Biberg Kristensen
> http://solumslekt.org/
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
The information transmitted is intended only for the person or entity to which it is addressed and may contain
confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any
actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you
receivedthis in error, please contact the sender and delete the material from any computer 


Re: Use of partial index

От
Tom Lane
Дата:
"Leif B. Kristensen" <leif@solumslekt.org> writes:
> I'm a little confused about partial indexes. I have a couple of tables, 
> like this:

> CREATE TABLE events (
>     event_id    INTEGER PRIMARY KEY,
>     tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
>     place_fk    INTEGER REFERENCES places (place_id),
>     event_date  CHAR(18) NOT NULL DEFAULT '000000003000000001',
>     sort_date   DATE NOT NULL DEFAULT '40041024BC',
>     event_text  TEXT NOT NULL DEFAULT '',
>     sentence    TEXT NOT NULL DEFAULT ''
> );

> To this table I have created a partial index:

> CREATE INDEX events_born
>     ON events (tag_type_fk)
>     WHERE tag_type_fk = 2;

> ...

> Now, here's an "explain select":

> pgslekt=> explain select event_date, place from principals where 
> person=2 and tag_type=2;
>                                        QUERY PLAN
> -----------------------------------------------------------------------
>  Nested Loop  (cost=0.00..23.15 rows=2 width=26)
>    ->  Index Scan using person_event on participants  
>                        (cost=0.00..13.63 rows=3 width=4)
>          Index Cond: (person_fk = 2)
>          Filter: (is_principal IS TRUE)
>    ->  Index Scan using events_pkey on events  
>                        (cost=0.00..3.16 rows=1 width=30)
>          Index Cond: (events.event_id = "outer".event_fk)
>          Filter: (tag_type_fk = 2)
> (7 rader)

> Why doesn't this SELECT use the partial index "events_born" above?

Because the plan it did choose is better.  events_born could only serve
to select the rows with tag_type_fk = 2; assuming there's more than one
of those rows, there would be multiple fetches needed to see if any of
them have the desired event_id.  With this plan it's getting at most one
row, by definition (since event_id is the primary key).

Had you created the partial index as

CREATE INDEX events_born   ON events (event_id)   WHERE tag_type_fk = 2;

then it would be competitive for this query, since the index could
effectively handle both constraints not just one.  (THe way you did
define it, the actual content of the index keys is just dead weight,
since they obviously must all be "2".  It's often better to define
the index column(s) of a partial index as some other column than the
one involved in the index predicate...)
        regards, tom lane


Re: Use of partial index

От
Stephan Szabo
Дата:
On Wed, 5 Oct 2005, Leif B. Kristensen wrote:

> I'm a little confused about partial indexes. I have a couple of tables,
> like this:
>
> CREATE TABLE events (
>     event_id    INTEGER PRIMARY KEY,
>     tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
>     place_fk    INTEGER REFERENCES places (place_id),
>     event_date  CHAR(18) NOT NULL DEFAULT '000000003000000001',
>     sort_date   DATE NOT NULL DEFAULT '40041024BC',
>     event_text  TEXT NOT NULL DEFAULT '',
>     sentence    TEXT NOT NULL DEFAULT ''
> );
>
> To this table I have created a partial index:
>
> CREATE INDEX events_born
>     ON events (tag_type_fk)
>     WHERE tag_type_fk = 2;
>
> Another table:
>
> CREATE TABLE participants ( -- the TMG 'E' file
>     participant_id      INTEGER PRIMARY KEY,
>     person_fk           INTEGER REFERENCES persons (person_id),
>     event_fk            INTEGER REFERENCES events (event_id),
>     role_type_fk        INTEGER REFERENCES role_types (role_type_id),
>     is_principal        BOOLEAN NOT NULL DEFAULT 'f',
>     is_primary_event    BOOLEAN NOT NULL DEFAULT 'f',
>     participant_note    TEXT NOT NULL DEFAULT '',
>     participant_name    TEXT NOT NULL DEFAULT '',
>     age_mean            INTEGER NOT NULL DEFAULT 0,
>     age_devi            INTEGER NOT NULL DEFAULT 0,
>     CONSTRAINT person_event UNIQUE (person_id, event_id)
> );
>
> And a view:
>
> CREATE OR REPLACE VIEW principals AS
> SELECT
>     participants.person_fk AS person,
>     events.event_id AS event,
>     events.place_fk AS place,
>     events.event_date AS event_date,
>     events.sort_date AS sort_date,
>     events.tag_type_fk AS tag_type
> FROM
>     events, participants
> WHERE
>     events.event_id = participants.event_fk
> AND
>     participants.is_principal IS TRUE;
>
> Now, here's an "explain select":
>
> pgslekt=> explain select event_date, place from principals where
> person=2 and tag_type=2;
>                                        QUERY PLAN
> -----------------------------------------------------------------------
>  Nested Loop  (cost=0.00..23.15 rows=2 width=26)
>    ->  Index Scan using person_event on participants
>                        (cost=0.00..13.63 rows=3 width=4)
>          Index Cond: (person_fk = 2)
>          Filter: (is_principal IS TRUE)
>    ->  Index Scan using events_pkey on events
>                        (cost=0.00..3.16 rows=1 width=30)
>          Index Cond: (events.event_id = "outer".event_fk)
>          Filter: (tag_type_fk = 2)
> (7 rader)
>
> Why doesn't this SELECT use the partial index "events_born" above? Is
> there any way to make this happen?

I would think you'd want an index ON events(event_id) WHERE tag_type_fk=2
for the query given.



Re: Use of partial index

От
"Leif B. Kristensen"
Дата:
On Wednesday 05 October 2005 18:44, you wrote:
> As I understand it, partial indices are generally useful when you
> only want to index a range of values, or if the select condition is
> on a different field from the one being indexed (eg: ON foo (a) WHERE
> b IS NOT NULL).
>
> I am just guessing here, but it sounds like 'person_fk = 2' is going
> to be a lot more selective (ie return less rows) than 'tag_type_fk =
> 2', so it's quicker to use the pkey and then filter the results.

That makes a lot of sense to me. As any person will usually participate 
in several events, I'll estimate that the ratio between person_fk=x and 
tag_type_fk=y is about 1:4.

> Depending on how many 'tag_type' values you have, indexing on it will
> not help at all.  In other words, if more than a few percent of the
> rows have the value '2' for 'tag_type_fg', postgres will tend to
> favour more selective indices if you are doing a join, or a seqscan
> if you are doing a straight select on that value.

I have only 53 different tag types, but most of them are rather rare. 
The most heavilyly used are birth=2, death=3, and marriage=4, and I've 
created similar partial indexes for the other two. In a previous MySQL 
project that's roughly sharing the same data model, I created a 
redundant table called "marriages" that speeded up the generation of a 
family view by a factor of four. That's why I thought that a partial 
index would have a similar effect here.

> Does that help?

Yessir, thank you very much!

> Dmitri
> PS Your query seems to be quite quick already, why don't you like
> this plan?

I didn't say that I didn't like it, - I just was a little perplexed :-)
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo-Linux/KDE


Re: Use of partial index

От
"Leif B. Kristensen"
Дата:
On Wednesday 05 October 2005 18:49, you wrote:
[Leif]
> > Now, here's an "explain select":
> >
> > pgslekt=> explain select event_date, place from principals where
> > person=2 and tag_type=2;
> >                                        QUERY PLAN
> >
-----------------------------------------------------------------------
> >  Nested Loop  (cost=0.00..23.15 rows=2 width=26)
> >    ->  Index Scan using person_event on participants  
> >                        (cost=0.00..13.63 rows=3 width=4)
> >          Index Cond: (person_fk = 2)
> >          Filter: (is_principal IS TRUE)
> >    ->  Index Scan using events_pkey on events  
> >                        (cost=0.00..3.16 rows=1 width=30)
> >          Index Cond: (events.event_id = "outer".event_fk)
> >          Filter: (tag_type_fk = 2)
> > (7 rader)
>
> Because the plan it did choose is better.  events_born could only
> serve to select the rows with tag_type_fk = 2; assuming there's more
> than one of those rows, there would be multiple fetches needed to see
> if any of them have the desired event_id.  With this plan it's getting
> at most one row, by definition (since event_id is the primary key).
>
> Had you created the partial index as
>
> CREATE INDEX events_born
>     ON events (event_id)
>     WHERE tag_type_fk = 2;
>
> then it would be competitive for this query, since the index could
> effectively handle both constraints not just one.  (THe way you did
> define it, the actual content of the index keys is just dead weight,
> since they obviously must all be "2".  It's often better to define
> the index column(s) of a partial index as some other column than the
> one involved in the index predicate...)
>
pgslekt=> create index events_born2 on events (event_id) where
tag_type_fk=2;
CREATE INDEX
pgslekt=> explain select event_date, place from principals where
person=2 and tag_type=2;                                      QUERY PLAN
----------------------------------------------------------------------------------------Nested Loop  (cost=0.00..22.88
rows=2width=26)  ->  Index Scan using person_event on participants  (cost=0.00..13.63  
rows=3 width=4)        Index Cond: (person_fk = 2)        Filter: (is_principal IS TRUE)  ->  Index Scan using
events_born2on events  (cost=0.00..3.07 rows=1  
width=30)        Index Cond: (events.event_id = "outer".event_fk)        Filter: (tag_type_fk = 2)
(7 rader)

From 23.15 to 22.88 ... but now at least it used my partial index, as it
does a slightly better job. I'm starting to get it - I think.

Thank you for your explanation. With regards to optimization, it seems
that I'm still too hung up in MySQL issues. PostgreSQL seems to behave
a lot more intelligently with queries.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo-Linux/KDE