Re: database design question, new to postgresql
От | Fei Liu |
---|---|
Тема | Re: database design question, new to postgresql |
Дата | |
Msg-id | 46326234.7000709@aepnetworks.com обсуждение исходный текст |
Ответ на | Re: database design question, new to postgresql (Fei Liu <fei.liu@aepnetworks.com>) |
Список | pgsql-novice |
Fei Liu wrote: > Fei Liu wrote: >> Hello group, I need to design and develop a web reporting system to >> let users query/view syslog files on a unix host. For now, I am >> concentrating on the authentication file that has user logon >> (success/failure) and logoff records. The log file is logrotated >> every week or so. My reporting system parses the log entries and put >> the result into a postgresql database (I am proposing to use >> postgresql as the backend). Since this deals with multi-year archive >> and I believe 'partitioing' is an ideal feature to handle this >> problem. So here is the design scheme: >> >> CREATE TABLE logon_success( >> name varchar(32) not null, >> srcip inet not null, >> date date not null, >> time time not null, >> ... >> ); >> >> >> CREATE TABLE logon_success_yy${year}mm${month}( >> CHECK (date >= DATE '$year-$month-01' AND date < DATE >> '$next_year-$next_month-1') >> ) >> INHERITS ($tname) >> ; >> >> As you can see from the sample code, I am using perl to dynamically >> generate children tables as I parse log files in a daily cron job >> script. Once the log file is analyzed and archived in the database, I >> have a simple web UI that sysadmin can select and view user logon >> events. I have built a sample framework and it works so far. Keep in >> mind, this reporting system is not limited to just user logon, it >> should also work with system events such as services >> failures/startup, hardware failures, etc >> >> Now here are my questions: >> 1) Should I use database to implement such a reporting system? Are >> there any alternatives, architects, designs? >> 2) Is partitioning a good approach to speed up log query/view? The >> user comment in partitioning in pgsql manual seems to indicate >> partitioning may be slower than non-partitioned table under certain >> circumstances. >> 3) How to avoid repetitive log entry scanning since my cron job >> script is run daily but logrotate runs weekly? This means everytime >> my script will be parsing duplicate entries. >> 4) When parsing log files, it's quite possible that there are >> identical entries, for example a user logins really fast, resulting 2 >> or more identical entries..In this case can I still use primary >> key/index at all? If I can, how do I design primary key or index to >> speed up query? >> 3) What are the most glaring limitations and flaws in my design? >> >> Thank you for taking time to review and answer my questions! Let me >> know if I am not clear on any specific detail.. >> >> Fei >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 6: explain analyze is your friend > Let me add one more question, what are the best approaches to analyze > postgresql query performance and how to improve postgresql query > performance? > Fei My initial testing has not shown any significant difference between a partitioning approach and a plain (all entries in master) database approach... 2005-01-01 | 00:27:55 | firewood | ssh | Login Successful | None | local | user9819 | 192.168.1.31 My test was based on two artificial tables that has 1700 records per day from 2004-02-01 to 2007-04-27, around 2 million entries that are identical in both tables. My test script: echo Testing database $t1 time based time psql -p 5583 netilla postgres << EOF select count(date) from $t1 where date > '2005-03-01' and date < '2006-12-11'; \q EOF echo Testing database $t2 time based time psql -p 5583 netilla postgres << EOF select count(date) from $t2 where date > '2005-03-01' and date < '2006-12-11'; \q EOF Result: ./timing_test.sh Testing database logon_test time based count --------- 1121472 (1 row) 0.00user 0.00system 0:02.92elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+456minor)pagefaults 0swaps Testing database logon_test2 time based count --------- 1121472 (1 row) 0.00user 0.00system 0:02.52elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+456minor)pagefaults 0swaps But the numbers are really not static and logon_test2 (with partitioning) sometimes behave worse than logon_test... Fei
В списке pgsql-novice по дате отправления: