Multivalent - Multivalent works on numerous document types, including PDF, HTML, DVI, and UNIX man pages. It is especially useful for PDF because it tries to paste word fragments together into whole words. Multivalent also integrates with Lucene.
PDFtoHTML - PDFtoHTML is a utility that converts PDF files into HTML and XML formats. It is based on XPDF.