Re: PDF Parsing and Indexing
| From | Mike Castle |
|---|---|
| Subject | Re: PDF Parsing and Indexing |
| Date | |
| Msg-id | 20010615170202.I26165@thune.mrc-home.com |
| In reply to | Re: PDF Parsing and Indexing (Doug McNaught <doug@wireboard.com>) |
| List | pgsql-general |
On Fri, Jun 15, 2001 at 07:33:42PM -0400, Doug McNaught wrote:
> "Raymond" <support@bigriverinfotech.com> writes:
> > Has anybody had experience in doing this?
Wonder if Google's solution to this is available.
> provides for arbitrary placement of each glyph on the page. So the
> word "this" might be encoded in the file as something like:
>
> moveto(100, 200)
> draw("t")
> moveto(105, 200)
> draw("h")
> moveto(112, 200)
> draw("i")
> moveto(115, 200)
> draw("s")
>
> You can see that it would be hard to index something like this in any
> kind of useful way.
PDFs generated by MS utilities (Word, I think?) are notoriously bad for
this. Big surprise.
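
For illustration, here is a rough sketch of how an indexer might regroup individually placed glyphs back into words. It assumes the moveto/draw pairs quoted above have already been extracted as (x, y, character) tuples. The function name `group_glyphs` and the `max_gap` and `line_tol` thresholds are hypothetical, not taken from any real PDF library, and a real tool would probably need font metrics rather than fixed thresholds.

```python
from typing import List, Tuple

# Hypothetical representation: one (x, y, character) tuple per draw() call.
Glyph = Tuple[float, float, str]


def group_glyphs(glyphs: List[Glyph],
                 max_gap: float = 10.0,
                 line_tol: float = 2.0) -> List[str]:
    """Join positioned glyphs into words.

    A glyph continues the current word when it sits on roughly the same
    baseline (y differs by at most line_tol) and starts no more than
    max_gap units to the right of the previous glyph. Both thresholds
    are assumed values chosen for this example.
    """
    words: List[str] = []
    current: List[str] = []
    prev = None  # (x, y) of the previous glyph

    # Sort by baseline first, then left to right along each line.
    for x, y, ch in sorted(glyphs, key=lambda g: (g[1], g[0])):
        if prev is not None:
            new_line = abs(y - prev[1]) > line_tol
            big_gap = x - prev[0] > max_gap
            if new_line or big_gap:
                words.append("".join(current))
                current = []
        current.append(ch)
        prev = (x, y)

    if current:
        words.append("".join(current))
    return words


# The glyph positions from the quoted example.
glyphs = [(100, 200, "t"), (105, 200, "h"), (112, 200, "i"), (115, 200, "s")]
words = group_glyphs(glyphs)
```

Once the glyphs are regrouped like this, the resulting words could be fed to an ordinary text index. The hard part in practice is picking the gap threshold, since it depends on font size and kerning.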
mrc
--
Mike Castle dalgoda@ix.netcom.com www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc