less than 2 sec for response - possible?
From:     trafdev
Subject:  less than 2 sec for response - possible?
Date:
Msg-id:   38b8542e-e011-370d-3863-207126ac9b33@mail.ru
Replies:  Re: less than 2 sec for response - possible?  (Tom Lane <tgl@sss.pgh.pa.us>)
          Re: less than 2 sec for response - possible?  (Torsten Zuehlsdorff <mailinglists@toco-domains.de>)
List:     pgsql-performance
Hi. I'm trying to build an OLAP-oriented DB based on PostgreSQL. A user works with a paginated report in the web browser. The interface allows fetching data for a custom date-range selection, displaying individual rows (20-50 per page) and totals (for the entire selection, including rows not visible on the current page), and sorting by any column.

The main goal is to deliver the results of the basic SELECT queries to the end user in less than 2 seconds. I was able to achieve that for every table except one (the biggest). It consists of multiple dimensions (date, gran, aid, pid, sid, fid, subid) and metrics (see below). The user can filter by any dimension and sort by any metric.

Here is the CREATE script for this table:

CREATE TABLE stats.feed_sub
(
  date date NOT NULL,
  gran interval NOT NULL,
  aid smallint NOT NULL,
  pid smallint NOT NULL,
  sid smallint NOT NULL,
  fid smallint NOT NULL,
  subid text NOT NULL,
  rev_est_pub real NOT NULL,
  rev_est_feed real NOT NULL,
  rev_raw real NOT NULL,
  c_total bigint NOT NULL,
  c_passed bigint NOT NULL,
  q_total bigint NOT NULL,
  q_passed bigint NOT NULL,
  q_filt_geo bigint NOT NULL,
  q_filt_browser bigint NOT NULL,
  q_filt_os bigint NOT NULL,
  q_filt_ip bigint NOT NULL,
  q_filt_subid bigint NOT NULL,
  q_filt_pause bigint NOT NULL,
  q_filt_click_cap_ip bigint NOT NULL,
  q_filt_query_cap bigint NOT NULL,
  q_filt_click_cap bigint NOT NULL,
  q_filt_rev_cap bigint NOT NULL,
  q_filt_erpm_floor bigint NOT NULL,
  c_filt_click_cap_ip bigint NOT NULL,
  c_filt_doubleclick bigint NOT NULL,
  c_filt_url_expired bigint NOT NULL,
  c_filt_fast_click bigint NOT NULL,
  c_filt_delay_clicks bigint NOT NULL,
  c_filt_ip_mismatch bigint NOT NULL,
  c_filt_ref_mismatch bigint NOT NULL,
  c_filt_lng_mismatch bigint NOT NULL,
  c_filt_ua_mismatch bigint NOT NULL,
  res_impr bigint NOT NULL,
  rev_ver_pub real,
  rev_ver_feed real,
  c_ver bigint,
  q_filt_ref bigint NOT NULL
)
WITH (OIDS=FALSE);

CREATE INDEX ix_feed_sub_date
  ON stats.feed_sub
  USING brin (date);

CREATE UNIQUE INDEX ixu_feed_sub
  ON stats.feed_sub
  USING btree (date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");

Here is some sizing info (https://wiki.postgresql.org/wiki/Disk_Usage):

relation            size
stats.feed_sub      5644 MB
stats.ixu_feed_sub  1594 MB

row_estimate: 15865627

Here is the typical query (for the totals page):

SELECT
  sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
  sum(stats.feed_sub.c_filt_doubleclick)  AS clicks_on_target,
  sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
  sum(stats.feed_sub.c_filt_fast_click)   AS fast_click,
  sum(stats.feed_sub.c_filt_ip_mismatch)  AS ip_mismatch,
  sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
  sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
  sum(stats.feed_sub.c_filt_ua_mismatch)  AS ua_mismatch,
  sum(stats.feed_sub.c_filt_url_expired)  AS url_expired,
  stats.feed_sub.subid AS stats_feed_sub_subid,
  stats.feed_sub.sid   AS stats_feed_sub_sid
FROM stats.feed_sub
WHERE stats.feed_sub.date >= '2016-06-01'::TIMESTAMP
  AND stats.feed_sub.date <= '2016-06-30'::TIMESTAMP
  AND stats.feed_sub.gran = '1 day'
  AND stats.feed_sub.aid = 3
GROUP BY stats.feed_sub.subid, stats.feed_sub.sid;

QUERY PLAN
HashAggregate  (cost=901171.72..912354.97 rows=344100 width=86) (actual time=7207.825..7335.473 rows=126044 loops=1)
  Group Key: subid, sid
  Buffers: shared hit=3635804
  ->  Index Scan using ixu_feed_sub on feed_sub  (cost=0.56..806544.38 rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344 loops=1)
        Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time zone) AND (gran = '1 day'::interval) AND (aid = 3))
        Buffers: shared hit=3635804
Planning time: 0.150 ms
Execution time: 7352.009 ms

As you can see, it takes 3.6 seconds just for the index scan (which sits entirely in RAM), plus about 3 seconds for grouping and 1-2 seconds for network transfer, so I'm completely out of my "sub 2 seconds" goal.

My questions are:

1. Am I using the right DB/architecture for achieving my goal?
Is there any better solution for this?

2. Have I reached some physical limit? Would installing faster RAM/CPU help?

Thanks in advance!

Server config:

OS:
> uname -a
FreeBSD sqldb 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016
root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

CPU:
> sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz
hw.ncpu: 8
hw.machine_arch: amd64

MEM: 64GB
> sysctl hw.physmem
hw.physmem: 68572983296

HDD: 2x480GB SSD (ZFS mirror)
> camcontrol devlist
<INTEL SSDSC2BB480H4 D2010380>  at scbus5 target 0 lun 0 (ada0,pass1)
<INTEL SSDSC2BB480H4 D2010380>  at scbus6 target 0 lun 0 (ada1,pass2)

FS:
> zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
zroot                  36.5G   390G    96K  /zroot
...
zroot/ara/sqldb/pgsql  33.7G   390G  33.7G  /ara/sqldb/pgsql

> zfs get primarycache,recordsize,logbias,compression zroot/ara/sqldb/pgsql
NAME                   PROPERTY      VALUE    SOURCE
zroot/ara/sqldb/pgsql  primarycache  all      local
zroot/ara/sqldb/pgsql  recordsize    8K       local
zroot/ara/sqldb/pgsql  logbias       latency  local
zroot/ara/sqldb/pgsql  compression   lz4      inherited from zroot

Misc:
> cat /etc/sysctl.conf
vfs.zfs.metaslab.lba_weighting_enabled=0

Postgres:
> /usr/local/bin/postgres --version
postgres (PostgreSQL) 9.5.3

> cat postgresql.conf
...
listen_addresses = '*'
max_connections = 100
shared_buffers = 16GB
effective_cache_size = 48GB
work_mem = 500MB
maintenance_work_mem = 2GB
min_wal_size = 4GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1
log_lock_waits = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_destination = 'csvlog'
logging_collector = on
log_min_duration_statement = 10000
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 10000
track_io_timing = on
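[Editor's note: since the query aggregates ~3.5M rows at request time, one common approach for this kind of OLAP workload is to pre-aggregate into a materialized view and refresh it after each data load. The sketch below is illustrative, not from the original post: the view name, the column subset, and the index choice are assumptions, and it collapses only the pid/fid dimensions while keeping the ones the report filters and groups on.]

```sql
-- Hypothetical rollup keyed by the reporting dimensions (date, gran, aid,
-- sid, subid); pid and fid are summed away, shrinking the row count the
-- totals query must scan.
CREATE MATERIALIZED VIEW stats.feed_sub_totals AS
SELECT date, gran, aid, sid, subid,
       sum(c_filt_click_cap_ip) AS clicks_from_ip,
       sum(c_filt_doubleclick)  AS clicks_on_target,
       sum(c_filt_delay_clicks) AS ip_click_period
       -- ... sums for the remaining metric columns ...
FROM stats.feed_sub
GROUP BY date, gran, aid, sid, subid;

-- Index matching the typical predicate: equality on aid and gran first,
-- then the date range.
CREATE INDEX ix_feed_sub_totals
  ON stats.feed_sub_totals (aid, gran, date);

-- Refresh after each load (e.g. from cron). With a unique index on the
-- view, REFRESH ... CONCURRENTLY avoids blocking readers on 9.5.
REFRESH MATERIALIZED VIEW stats.feed_sub_totals;
```

The trade-off is staleness between refreshes and extra disk space, in exchange for the totals query touching far fewer rows than the base table scan in the plan above.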