PDF to text in Python

PDFMiner is a suite of Python programs that help with extracting and analyzing text data from PDF documents. Unlike other PDF-related tools, it lets you obtain the exact location of text on a page, as well as extra information such as fonts and ruled lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML), and an extensible PDF parser that can be used for purposes other than text analysis.

I downloaded a PDF from work that has my call stats in tabular form and converted it. It worked great.

[25] 21:56:47–> -t html test.pdf > test.htm
/Library/Python/2.6/site-packages/pdfminer-20091004-py2.6.egg/pdfminer/ DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5, struct
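The warning comes from this PDFMiner release importing the old `md5` module, and the conversion still worked. If you need an MD5 digest in your own scripts, the `hashlib` replacement the warning points to looks like this (a minimal example, not PDFMiner's own code):

```python
import hashlib

# hashlib.md5 replaces the deprecated md5.new(); it takes bytes, not str.
digest = hashlib.md5(b"abc").hexdigest()
print(digest)  # 900150983cd24fb0d6963f7d28e17f72
```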

My project is to:
1. Email my weekly stats from work.
2. Write a Python program to read the data into a temporary text file.
3. Extract the data into a SQLite database.
4. Produce graphs of my average call times.
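Steps 2 to 4 could be sketched roughly as below. This assumes the extracted text has one row per day in a hypothetical `date calls HH:MM:SS` layout (for example `2009-10-05 14 00:04:35`); the real layout of the work PDF will differ, so the regular expression `ROW` and the table `calls` are illustrative names only. Only the standard library is used.

```python
import re
import sqlite3

# Hypothetical row layout: date, number of calls, average call time (HH:MM:SS).
ROW = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\d+)\s+(\d{2}):(\d{2}):(\d{2})\s*$")


def parse_rows(text):
    """Yield (day, calls, avg_secs) for lines matching ROW; skip headers and noise."""
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            day, calls, h, mi, s = m.groups()
            yield day, int(calls), int(h) * 3600 + int(mi) * 60 + int(s)


def load(conn, text):
    """Store the parsed rows in a SQLite table, replacing any existing row for the same day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls "
        "(day TEXT PRIMARY KEY, calls INTEGER, avg_secs INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO calls VALUES (?, ?, ?)", parse_rows(text))
    conn.commit()


def overall_average(conn):
    """Average call time in seconds across all stored days, weighted by number of calls."""
    total_secs, total_calls = conn.execute(
        "SELECT SUM(calls * avg_secs), SUM(calls) FROM calls"
    ).fetchone()
    return total_secs / total_calls if total_calls else 0.0


if __name__ == "__main__":
    sample = "Date Calls AvgTime\n2009-10-05 10 00:04:00\n2009-10-06 30 00:02:00\n"
    conn = sqlite3.connect(":memory:")
    load(conn, sample)
    print(overall_average(conn))  # 150.0
```

Weighting by the number of calls keeps a quiet day from counting as much as a busy one. For the graphs in step 4, a plotting library such as matplotlib could read the `calls` table directly.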


The Bike has arrived

A happy me after taking ownership of my new bike... but nervous as hell, because it has been 7 years since I last rode a motorbike.

Ready to go. You can find more pictures on Flickr.

[Photos: Suzuki V-Strom, first day, 10.10.2009]

The only accessories I have purchased so far are:

1. Boots

2. New jacket

3. Crash guard for the bike, which you can see in the photos

4. New gloves

The shop owner said it was an A0 model, i.e. not an A9 (2009) but an A0 (2010) … but I have no way to confirm this. My bike was manufactured in August 2009. I went back to the shop recently and all the 2010 models are in now; they were all manufactured in September 2009.