Interesting question for LARGE (> 10 Million Rows).

Поиск
Список
Период
Сортировка
От Larry Rosenman
Тема Interesting question for LARGE (> 10 Million Rows).
Дата
Msg-id 20010530065059.A28696@lerami.lerctr.org
обсуждение исходный текст
Список pgsql-general
I am loading (for the first time..) a VERY LARGE (to me) data base of
traffic statistics for the ISP I work for.

The schema looks as follows:


--
-- Selected TOC Entries:
--
\connect - neteng
--
-- TOC Entry ID 3 (OID 108735)
--
-- Name: traffic Type: TABLE Owner: neteng
--

CREATE TABLE "traffic" (
    "asn" integer,
    "protocol" integer,
    "pkts_src" bigint,
    "pkts_dst" bigint,
    "bytes_src" bigint,
    "bytes_dst" bigint,
    "secs_src" bigint,
    "secs_dst" bigint,
    "early" timestamp with time zone,
    "late" timestamp with time zone
);

--
-- TOC Entry ID 2 (OID 108754)
--
-- Name: protocol_id_seq Type: SEQUENCE Owner: neteng
--

CREATE SEQUENCE "protocol_id_seq" start 1 increment 1 maxvalue 2147483647 minvalue 1  cache 1 ;

--
-- TOC Entry ID 4 (OID 108773)
--
-- Name: protocol Type: TABLE Owner: neteng
--

CREATE TABLE "protocol" (
    "id" integer DEFAULT nextval('"protocol_id_seq"'::text) NOT NULL,
    "protocol" character varying(256) NOT NULL,
    Constraint "protocol_pkey" Primary Key ("id")
);

--
-- TOC Entry ID 10 (OID 7150793)
--
-- Name: "get_protocol_id" (character varying) Type: FUNCTION Owner: neteng
--

CREATE FUNCTION "get_protocol_id" (character varying) RETURNS integer AS 'SELECT id FROM protocol
    WHERE protocol = $1;' LANGUAGE 'sql' WITH ( iscachable );

--
-- TOC Entry ID 11 (OID 7150794)
--
-- Name: "get_protocol" (integer) Type: FUNCTION Owner: neteng
--

CREATE FUNCTION "get_protocol" (integer) RETURNS character varying AS 'SELECT protocol FROM protocol
    WHERE id = $1;' LANGUAGE 'sql' WITH ( iscachable );

--
-- TOC Entry ID 5 (OID 108735)
--
-- Name: "early_index" Type: INDEX Owner: neteng
--

CREATE  INDEX "early_index" on "traffic" using btree ( "early" "timestamp_ops" );

--
-- TOC Entry ID 6 (OID 108735)
--
-- Name: "late_index" Type: INDEX Owner: neteng
--

CREATE  INDEX "late_index" on "traffic" using btree ( "late" "timestamp_ops" );

--
-- TOC Entry ID 7 (OID 108735)
--
-- Name: "asn_index" Type: INDEX Owner: neteng
--

CREATE  INDEX "asn_index" on "traffic" using btree ( "asn" "int4_ops" );

--
-- TOC Entry ID 8 (OID 108735)
--
-- Name: "protocol_index" Type: INDEX Owner: neteng
--

CREATE  INDEX "protocol_index" on "traffic" using btree ( "protocol" "int4_ops" );

--
-- TOC Entry ID 9 (OID 108735)
--
-- Name: "asn_protocol_index" Type: INDEX Owner: neteng
--

CREATE  INDEX "asn_protocol_index" on "traffic" using btree ( "asn" "int4_ops", "protocol" "int4_ops" );

I collect 5 minute data from my routers using netflow, and drop
summary records for nntp, smtp, pop3, web, web/ssl, and other into
this table.

My question is what can I do to make date and asn/protocol queries
relatively fast?

What improvements would the guru's out there recommend.  This is with
7.1.2 or beter PG.  The data is on a NetApp, and the server has 512Meg
real running on FreeBSD.  I have *NOT* changed the default sizes of
SHMEM yet.

I have only loaded 2-3 days worth of data into it, and already have
7million rows, and  I have data for this entire year.

The type of things I want to do is find who I'm sending lots of data
to (to make peering decisions), and graph what protocols are common
for any given random period of time.

Any suggestions would be appreciated.

Any questions I'll be happy to supply what data I can.

Thanks!

LER


--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

В списке pgsql-general по дате отправления:

Предыдущее
От: "Karen Ellrick"
Дата:
Сообщение: RE: Problems with new data location
Следующее
От: newsreader@mediaone.net
Дата:
Сообщение: DBD::Pg 1.00