Decision support query inefficiencies ...
От | Gunther Schadow |
---|---|
Тема | Decision support query inefficiencies ... |
Дата | |
Msg-id | 3D459ED0.3030000@aurora.regenstrief.org обсуждение исходный текст |
Ответы |
Re: Decision support query inefficiencies ...
|
Список | pgsql-sql |
Hi, this is a more general question than about specific PostgreSQL implementation, but may be PostgreSQL has some potential specific answer, and hopefully you can give me some general hints. We have a large data base of, let's call them "Acts" (in our case it is clinical observations, but could just as well be purchase orders or account transactions, etc.) CREATE TABLE Act ( id OID NOT NULL PRIMARY KEY, subject_id OID NOT NULL FOREIGN KEY REFERENCES Subject(id), type_code VARCHAR NOT NULL FOREIGN KEY REFERENCES Act_type(code), time TIMESTAMP NOT NULL, /* detail... */ ); CREATE INDEX ON Act(subject_id, type_code, time); Now let's have 200 million of these acts on 2 million subjects in our test database. The production version will have about 100 times that amount. A very frequent decision support query we have to make is to get the last act of a certain type that occurred before a certain cut-off date. Worse, yet, the cutoff date is quite likely even a correlated subquery, but that's for later. For now, I'm always struck by the difficulty of asking the simple quesion of the most recent act of a certain type before a cutoff time: SELECT Act.* FROM (SELECT subject_id, type_code, MAX(time) AS time FROM Act WHERE Act.type_code = $ACT_TYPE AND Act.time < $CUTOFF_TIME GROUP BY subject_id, type_code) last INNER JOIN Act ON(Act.subject_id, Act.type_code, Act.time) =(last.subject_id, last.type_code, last.time); I'm not so worried about the complexity of writing this query but how inefficient it appears to be. 1. select the Acts of the specific type code 2. sort by subject_id, type_code and time 3. join the result again with the Act table, selecting the items again via random access Wouldn't it be nice to somehow retain the sort that happened for the GROUP BY operation, and just read the last item of each group with same group-by key? Can PostgreSQL do something like that? Is my desire for some optimization flawed? Am I overlooking an existing, well-known optimization one can do with plain SQL? I would greatly appreciate your advice or ideas -Gunther PS: The a materialized view or summary table would not help in this case because we're asking for the last value before a certain cutoff time. -- Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org Medical Information Scientist Regenstrief Institute for Health Care Adjunct Assistant Professor Indiana University School of Medicine tel:1(317)630-7960 http://aurora.regenstrief.org
В списке pgsql-sql по дате отправления: