custom crosstab question
От | Michael Swierczek |
---|---|
Тема | custom crosstab question |
Дата | |
Msg-id | 68b5b5880704250736v41261ffg5b506ecb7978af0d@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: custom crosstab question
|
Список | pgsql-novice |
Ladies and Gentlemen, This will be long, I apologize. I'm mostly looking for someone to tell me there's an obvious solution I'm missing. I have a table 'answer' with medical survey answers. Periodically the data must be exported to CSV files in a quasi-crosstab format so they can be examined by statisticians. The complication is that many of the questions in the medical surveys can be skipped entirely (if you report not using drugs at all, we don't ask how recently you used marijuana, and so forth...) For analysis, the exported data needs a 'category' column for each survey question even if it has no corresponding answer and not just the survey questions with answers that actually exist in the database. Here is a simplified example of the tables, showing sample data for one survey: survey_question: (id | survey_id | question_code | question_order) 1 | 1 | 'drug' | 1 2 | 1 | 'marijuana' | 2 3 | 1 | 'sick' | 3 answer: (id | survey_event_id | question_code | answer_order | answer_value) 1 | 1 | 'drug' | 1 | 1 2 | 1 | 'sick' | 2 | 1 Using tablefunc/crosstab, the resulting export would be (survey_event_id | 'drug' | 'sick' .... ) But I need (survey_event_id | 'drug' | 'marijuana' | 'sick' .... ) I wrote a program that takes a 'survey_id' input, queries PostgreSQL for the set of question_codes, and then generates a massive query in the form: --- begin huge query select an0.survey_event_id, an0.answer_value, an1.answer_value, an2.answer_value, an3.answer_value ..... from answer an0 left join answer an1 where an0.survey_event_id = an1.survey_event_id and an1.question_code = 'drug' left join answer an2 where an0.survey_event_id = an2.survey_event_id and an2.question_code = 'marijuana' left join.... --- end huge query Depending upon the survey, the resulting queries have between 10 and 250 joins on the answer_table. I run them through psql and pipe the results to a csv file. They work, but they're pretty slow. There are 14,300 total survey_event entries from 50 different surveys in the test database I'm using, and exporting all of the data in this way takes 3 hours. I'm using PostgreSQL 8.1.9 and 8.2.4 (the latter is faster, although I don't have exact figures how much). Thanks, Mike
В списке pgsql-novice по дате отправления: