Re: PDF Parsing and Indexing
From | Mike Castle |
---|---|
Subject | Re: PDF Parsing and Indexing |
Date | |
Msg-id | 20010615170202.I26165@thune.mrc-home.com |
In reply to | Re: PDF Parsing and Indexing (Doug McNaught <doug@wireboard.com>) |
List | pgsql-general |
On Fri, Jun 15, 2001 at 07:33:42PM -0400, Doug McNaught wrote:
> "Raymond" <support@bigriverinfotech.com> writes:
> > Has anybody had experience in doing this? Wonder if Google's solution to this is available.
>
> provides for arbitrary placement of each glyph on the page. So the
> word "this" might be encoded in the file as something like:
>
> moveto(100, 200)
> draw("t")
> moveto(105, 200)
> draw("h")
> moveto(112, 200)
> draw("i")
> moveto(115, 200)
> draw("s")
>
> You can see that it would be hard to index something like this in any
> kind of useful way.

PDFs generated from MS utilities (Word, I think?) are notoriously bad for this.

Big surprise.

mrc
--
Mike Castle dalgoda@ix.netcom.com www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc
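To illustrate why glyph-by-glyph placement makes indexing awkward, here is a minimal sketch of reassembling words from individually positioned glyphs. It assumes the glyphs have already been pulled out of the PDF as hypothetical `(x, y, character)` tuples, that glyphs on one line share exactly the same `y`, and that a fixed left-edge-to-left-edge step (the hypothetical `gap` parameter) separates letters within a word from the space between words. Real extraction tools work from font metrics rather than a fixed threshold, so treat this as a heuristic, not a description of how any particular indexer does it.

```python
from typing import List, Tuple

# Hypothetical extracted form of one drawn glyph: (x, y, character).
Glyph = Tuple[float, float, str]


def glyphs_to_words(glyphs: List[Glyph], gap: float = 10.0) -> List[str]:
    """Group positioned glyphs into words (heuristic sketch).

    Glyphs are ordered top line first (PDF y grows upward), then left to
    right. A new word starts when the line changes or when the horizontal
    step from the previous glyph exceeds `gap`.
    """
    ordered = sorted(glyphs, key=lambda g: (-g[1], g[0]))
    words: List[str] = []
    current = ""
    prev = None  # (x, y) of the previous glyph
    for x, y, ch in ordered:
        if prev is not None:
            px, py = prev
            if y != py or x - px > gap:
                # Line break or wide horizontal jump: close the current word.
                if current:
                    words.append(current)
                current = ""
        current += ch
        prev = (x, y)
    if current:
        words.append(current)
    return words


# The "this" example from the thread, drawn out of order, plus a second word.
glyphs = [(112, 200, "i"), (100, 200, "t"), (133, 200, "s"),
          (115, 200, "s"), (105, 200, "h"), (130, 200, "i")]
print(glyphs_to_words(glyphs))
```

Note that the input order does not matter here because the glyphs are sorted by position first; a document that draws its glyphs in arbitrary order is exactly what defeats a naive reader of the content stream.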