Re: html to postgres...
From | Richard Huxton |
---|---|
Subject | Re: html to postgres... |
Date | |
Msg-id | 001801c10e14$1df1b880$1001a8c0@archonet.com |
In reply to | html to postgres... (Tony Grant <tony@animaproductions.com>) |
List | pgsql-general |
From: "Tony Grant" <tony@animaproductions.com>

> On 16 Jul 2001 11:07:55 -0400, Mitch Vincent wrote:
> > You could put the entire HTML page directly into a text type field in
> > PG..... That would give you limited flexibility as far as searching and
> > indexing goes, but you didn't mention any specifics of what you were
> > attempting to do by having the pages in a database....
>
> Yes I was vague - the heat is coming back...
>
> These are film and director pages in a movie site. I am looking at
> HTML->XML tools; then with a parser I should be able to create a tab
> delimited text file.

Did something similar myself a while ago. Assuming the pages were all generated from a template originally, I found the following the simplest:

1. Construct your database structure and create some test data.
2. Create the output system (db => xml => html, whatever).
3. Build a (set of) Perl script(s) to parse the HTML and strip the data out (if your pages are anything like mine, they're not *identical* in format, so you'll end up needing something custom-built).
4. Push the data into PostgreSQL.
5. Publish the website from the database.
6. Run a "diff" of the old and new pages.
7. Tweak the system as required and repeat until satisfied everything works.

The key problem I found was that unless the pages were generated from a database to start with, they all seemed to have minor variations. The only way I could be satisfied I'd not missed any data was to publish and compare. The first couple of "diff"s were scary, and I ended up cutting and pasting a few pieces manually, but I got everything out.

HTH - Richard Huxton
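The extract-and-compare steps above might be sketched as follows. This is a minimal illustration in Python rather than the Perl the poster used, and the HTML structure it assumes (an `<h1>` title and a `<p class="director">` paragraph) is invented for the example; real template pages will need their own custom rules, as the post warns.

```python
# Sketch of the parse (step 3) and compare (step 6) stages.
# Field names and tag layout are assumptions for illustration only.
from html.parser import HTMLParser
import difflib

class FilmPageParser(HTMLParser):
    """Pull the film title and director out of one template-based page."""
    def __init__(self):
        super().__init__()
        self.fields = {}       # extracted field name -> text
        self._current = None   # field currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._current = "title"
        elif tag == "p" and attrs.get("class") == "director":
            self._current = "director"

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

def to_tab_delimited(html_text):
    """One tab-delimited line per page, suitable for loading with COPY."""
    p = FilmPageParser()
    p.feed(html_text)
    return "\t".join([p.fields.get("title", ""), p.fields.get("director", "")])

def page_diff(old_html, new_html):
    """Step 6: compare an original page with its republished version."""
    return list(difflib.unified_diff(old_html.splitlines(),
                                     new_html.splitlines(), lineterm=""))

page = '<html><h1>Alien</h1><p class="director">Ridley Scott</p></html>'
print(to_tab_delimited(page))   # title and director separated by a tab
```

For step 4, PostgreSQL's COPY command reads tab-delimited text directly, so a file of lines produced this way can be loaded in one statement.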