Re: automated 'discovery' of a table : potential primary key, columns functional dependencies ...
From | Adrian Klaver |
---|---|
Subject | Re: automated 'discovery' of a table : potential primary key, columns functional dependencies ... |
Date | |
Msg-id | 80d60035-6c4a-4eac-df16-956fc49901e8@aklaver.com |
In reply to | automated 'discovery' of a table : potential primary key, columns functional dependencies ... (Rémi Cura <remi.cura@gmail.com>) |
List | pgsql-general |
On 11/22/19 2:05 PM, Rémi Cura wrote:
> Hello dear List,
> I'm currently wondering about how to streamline the normalization of a
> new table.
>
> I often have to import messy CSV files into the database, and making
> clean normalized version of these takes me a lot of time (think dozens
> of columns and millions of rows).

To me, messy means the information needed to do the below is not available. Personally, I think your best bet is to get the data into tables and then use visualization tools to help you determine the below. My guess is there will be a lot of data cleaning going on before you can get to a well-ordered table layout.

>
> I wrote some code to automatically import a CSV file and infer the type
> of each column.
> Now I'd like to quickly get an idea of
> - what would be the most likely primary key
> - what are the functional dependencies between the columns
>
> The goal is **not** to automate the modelling process,
> but rather to automate the tedious phase of information collection
> that is necessary for the DBA to make a good model.
>
> If this goes well, I'd like to automate further tedious stuff (like
> splitting a table into several ones with appropriate foreign keys /
> constraints)
>
> I'd be glad to have some feedback / pointers to tools in plpgsql or even
> plpython.
>
> Thank you very much
> Remi
>

--
Adrian Klaver
adrian.klaver@aklaver.com
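For readers wondering what the two checks in the quoted question could look like once the data is loaded, here is a minimal Python sketch. It is not from either poster and not an existing tool: the function names `candidate_keys` and `functional_dependencies`, the `max_size` parameter, and the sample rows are all hypothetical. It assumes the rows fit in memory as a list of dicts, which will not hold for millions of rows without sampling, and it only tests what the data happens to show, so messy input can hide or fake a dependency.

```python
from itertools import combinations


def candidate_keys(rows, columns, max_size=2):
    """Return minimal column combinations (up to max_size columns)
    whose values are non-null and unique across all rows."""
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            seen = set()
            unique = True
            for row in rows:
                value = tuple(row[c] for c in combo)
                if None in value or value in seen:
                    unique = False
                    break
                seen.add(value)
            if unique:
                keys.append(combo)
    return keys


def functional_dependencies(rows, columns):
    """Return (a, b) pairs where every value of column a maps to
    exactly one value of column b in the given rows (a -> b)."""
    deps = []
    for a in columns:
        for b in columns:
            if a == b:
                continue
            mapping = {}
            holds = True
            for row in rows:
                # setdefault stores the first b seen for this a;
                # any later, different b breaks the dependency.
                if mapping.setdefault(row[a], row[b]) != row[b]:
                    holds = False
                    break
            if holds:
                deps.append((a, b))
    return deps


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "city": "Paris", "country": "FR"},
    {"id": 2, "city": "Lyon", "country": "FR"},
    {"id": 3, "city": "Paris", "country": "FR"},
    {"id": 4, "city": "Berlin", "country": "DE"},
]
columns = ["id", "city", "country"]

keys = candidate_keys(rows, columns)
deps = functional_dependencies(rows, columns)
print(keys)
print(deps)
```

On this sample, `id` is the only candidate key, and besides the trivial dependencies from `id`, the data shows `city -> country`. Any unique column determines every other column, so the interesting results are the dependencies whose left-hand side is not a key. These are hints for the person doing the modelling, not proof of a constraint.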