Big wide datasets
From | Michael Lush |
---|---|
Subject | Big wide datasets |
Date | |
Msg-id | CACXX7MdoDdACfJMfhnugNoGxAhe-n5kxr716tGt6iUZ1n4ZKyQ@mail.gmail.com |
Responses |
Re: Big wide datasets
Re: Big wide datasets Re: Big wide datasets |
List | pgsql-novice |
I have a dataset with ~10000 columns and ~200000 rows (GWAS data (1)) in the form
sample1, A T, A A, G C, ....
sample2, A C, C T, A A, ....
I'd like to take subsets of both columns and rows for analysis.
Two approaches spring to mind: either unpack it into something like an RDF triple,
i.e.
CREATE TABLE long_table (
sample_id varchar(20),
column_number int,
snp_data varchar(3));
for a table with about 2 billion rows (~10000 columns x ~200000 samples)
or use the array datatype
CREATE TABLE wide_table (
sample_id varchar(20),
snp_data varchar(3)[]);
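To make the comparison concrete, here is a rough sketch of how a subset of rows and columns might be pulled from each layout. It assumes PostgreSQL, assumes snp_data in wide_table is declared as varchar(3)[] with one element per SNP column, and uses made-up sample IDs and column numbers purely for illustration.

```sql
-- Hypothetical subset: samples 'sample1' and 'sample2', SNP columns 1 to 3.

-- Long layout: filter on both keys; one result row per (sample, column) pair.
SELECT sample_id, column_number, snp_data
FROM long_table
WHERE sample_id IN ('sample1', 'sample2')
  AND column_number BETWEEN 1 AND 3;

-- Wide layout: an array slice returns a contiguous range of columns.
SELECT sample_id, snp_data[1:3] AS snp_subset
FROM wide_table
WHERE sample_id IN ('sample1', 'sample2');

-- Wide layout, non-contiguous columns: pick elements individually
-- (PostgreSQL arrays are 1-based by default).
SELECT sample_id, snp_data[1] AS snp_1, snp_data[3] AS snp_3
FROM wide_table
WHERE sample_id IN ('sample1', 'sample2');
```

In the long layout the column subset is an ordinary WHERE condition, so it would probably want an index on (sample_id, column_number); in the wide layout each row's array is read and the requested elements are picked out of it.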
Does anyone have any experience of this sort of thing?
(1) http://en.wikipedia.org/wiki/Genome-wide_association_study
--
Michael Lush