Re: Date format for bulk copy
От | David Rysdam |
---|---|
Тема | Re: Date format for bulk copy |
Дата | |
Msg-id | 416D6924.8090509@ll.mit.edu обсуждение исходный текст |
Ответ на | Re: Date format for bulk copy (Greg Stark <gsstark@mit.edu>) |
Список | pgsql-general |
Greg Stark wrote: >David Rysdam <drysdam@ll.mit.edu> writes: > > > >>In my brute force port, I just bulk copied the date >>fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY >>HH:MI:SS:MSAM'). >> >> > > > >>Again, I created a temporary table and did a decode(field, 'hex') to the >>real table. >> >> > >This is the standard approach. You're rather lucky these are the only >data representation changes you've had to do so far. I fear you'll run into >more and more complex changes over time and trying to avoid the temporary >table will get harder and harder. > > > No, I think I'm OK there. These are programmatically-generated values and I've already been through them all once. Just the millisecond issue and the hex binary issue AFAIK. >If it were me I would consider processing the files in perl. It should be >pretty easy to do both of these modifications very quickly. > > > Very quick and easy to do one time. A little trickier to handle in an elegant, maintainable way for the dozens of data reloads I do every month for GBs of data onto two different server types. >If you really want to go with a custom C code then you might be able to just >grab the byteain/byteaout functions from src/backend/util/adt/varlena into a >separate module and create new functions with modified names. Load it with >CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain'; > >Or maybe create the function as my_byteain in postgres and then update the >catalog entries somehow. I'm not sure how to do that but it shouldn't be too >hard. And it might make it easier to do the substitution for the data load and >then undo the change afterwards. > > > Why not create a type and then define the load function to be the equivalent of "decode('hex')"? >Doing the same for timmestamp is a bit trickier but you could copy >ParseDateTime from datetime.c as a static function for your module. > >Be careful though, test this out thoroughly on a test database. I'm not sure >of all the impacts of altering the in/out functions for data types. I expect >it would break pg_dump, for example. And I would worry about the statistics >tables too. > > > This is kind of a hybrid of my suggestions and the problems are a hybrid as well. :) 1) Just change the timestamp type so that it allows a ':' delimiter for milliseconds. Potential problems: Other parts of the code won't expect it. People don't want that. 2) Create a new type. Potential problem: Things like date ranges probably wouldn't work anymore, since the server wouldn't know it's a date now.
В списке pgsql-general по дате отправления: