July 3, 2009

Getting data out of PDF files is hard

Yesterday I was helping a friend get data out of PDF files with limited tools. In that situation, your best bet is to hold down the ALT key while using the text select tool, which lets you select one column at a time. I use OmniPage, the OCR software that came with my scanner, and in general I find it pretty easy to get data out. When I worked in banking I didn't have access to such software, and I used to dread extracting this kind of data. My friend wrote me this morning to say he had discovered PDF to Excel, which he really liked. I haven't tried it yet, but I will give it a shot on my next data extraction project. Fingers crossed that it can do something intelligent with footnotes and Greek symbols.
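Once a column has been copied out of the PDF as text, it usually still needs cleaning before it is usable as numbers. Here is a minimal Python sketch of that step. The function name `parse_column` and the formatting rules it assumes (commas as thousands separators, dollar signs, parentheses for negatives) are my own illustrative assumptions, not something any of the tools above provide.

```python
import re


def parse_column(pasted: str) -> list[float]:
    """Turn one pasted column of figures into floats.

    Hypothetical helper: assumes US-style formatting, one value per line.
    Lines that do not parse as a number (headers, blanks) are skipped.
    """
    values = []
    for line in pasted.splitlines():
        cell = line.strip()
        # Accounting-style negatives are written in parentheses, e.g. "(200)".
        negative = cell.startswith("(") and cell.endswith(")")
        # Drop thousands separators, dollar signs, parentheses, and whitespace.
        cleaned = re.sub(r"[,$()\s]", "", cell)
        try:
            number = float(cleaned)
        except ValueError:
            continue  # header, blank line, or stray text
        values.append(-number if negative else number)
    return values


column = "Revenue\n1,234.50\n(200)\n\n$3,000"
print(parse_column(column))
```

This does nothing clever with footnote markers or Greek symbols. A footnote digit glued to the end of a figure looks just like part of the number, so those still have to be checked by eye.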

Posted by OneEyedMan at July 3, 2009 8:20 AM
