Big wide datasets

From: Michael Lush
Subject: Big wide datasets
Date:
Msg-id: CACXX7MdoDdACfJMfhnugNoGxAhe-n5kxr716tGt6iUZ1n4ZKyQ@mail.gmail.com
Replies: Re: Big wide datasets  ("Jean-Yves F. Barbier" <12ukwn@gmail.com>)
Re: Big wide datasets  ("Robert D. Schnabel" <schnabelr@missouri.edu>)
Re: Big wide datasets  (Steve Crawford <scrawford@pinpointresearch.com>)
List: pgsql-novice
I have a dataset with ~10,000 columns and ~200,000 rows (GWAS data (1)) in the form

sample1, A T, A A, G C, ....
sample2, A C, C T, A A, ....

I'd like to take subsets of both columns and rows for analysis.

Two approaches spring to mind: either unpack it into something like an RDF triple,

ie
CREATE TABLE long_table (
                               sample_id  varchar(20),
                               column_number int,
                               snp_data  varchar(3));

for a table with ~2 billion rows,
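With that layout, a row/column subset becomes a plain WHERE clause. A minimal sketch (the sample IDs and column numbers here are hypothetical, and the index is an assumption about what would keep such queries selective):

```sql
-- A composite index makes subset lookups on the 2-billion-row
-- table feasible (assumed addition, not part of the original post).
CREATE INDEX ON long_table (sample_id, column_number);

-- Pull three SNP columns for two samples.
SELECT sample_id, column_number, snp_data
FROM   long_table
WHERE  sample_id     IN ('sample1', 'sample2')
AND    column_number IN (42, 1001, 9876);
```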

or use the array datatype

CREATE TABLE wide_table (
                                sample_id  varchar(20),
                                snp_data   varchar(3)[]);
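With the array layout, a column subset is an array slice or a set of element accesses; PostgreSQL arrays are 1-based. A sketch of both (the slice bounds and sample ID are illustrative only):

```sql
-- First 100 SNP columns for one sample, as a contiguous slice.
SELECT sample_id, snp_data[1:100]
FROM   wide_table
WHERE  sample_id = 'sample1';

-- Individual, non-contiguous SNP columns.
SELECT sample_id, snp_data[42], snp_data[1001], snp_data[9876]
FROM   wide_table
WHERE  sample_id = 'sample1';
```

A row subset is then just the WHERE clause, while a column subset never touches the other ~10,000 elements' storage as separate rows, which is the trade-off between the two designs.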

Does anyone have any experience of this sort of thing?

(1) http://en.wikipedia.org/wiki/Genome-wide_association_study

--
Michael Lush
