I have been using the PDFBox library to work with PDF files in Java, most commonly to extract text for Lucene indexing and to generate thumbnails. Recently I have started to see this error when generating thumbnails, which then come out as correctly sized but completely white pages:
java.io.IOException: Unknown stream filter:COSName{JBIG2Decode}
Some of the latest PDFs, in this case from a new Xerox copier that scans directly to PDF, store their black-and-white images in the JBIG2 format. Java has no built-in support for JBIG2 yet, and building a JBIG2 handler into the PDFBox library isn't an option for the development team at this time.
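Until that changes, one workaround is to spot these files before rendering them, so they can be skipped or flagged instead of silently producing white thumbnails. The sketch below is a heuristic of my own, not part of PDFBox, and the class name Jbig2Check and the method usesJbig2Filter are made up for illustration. It assumes the filter appears in the file as the literal name /JBIG2Decode in a stream dictionary (for example "/Filter /JBIG2Decode"). That usually holds because the PDF specification does not allow stream objects inside compressed object streams, but the check can still miss unusual encodings such as #-escaped names or a /Filter value stored as an indirect object.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: a byte-level heuristic, not a PDFBox API.
public class Jbig2Check {

    // The filter name as it normally appears in a stream dictionary.
    private static final byte[] MARKER = "/JBIG2Decode".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the raw PDF bytes contain the name /JBIG2Decode as a complete token.
    public static boolean usesJbig2Filter(byte[] pdf) {
        outer:
        for (int i = 0; i <= pdf.length - MARKER.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (pdf[i + j] != MARKER[j]) {
                    continue outer;
                }
            }
            // Reject prefixes of longer names, e.g. "/JBIG2DecodeX".
            int end = i + MARKER.length;
            if (end == pdf.length || endsName(pdf[end])) {
                return true;
            }
        }
        return false;
    }

    // PDF whitespace and delimiter characters terminate a name token.
    private static boolean endsName(byte b) {
        return "\0\t\n\f\r ()<>[]{}/%".indexOf((char) (b & 0xFF)) >= 0;
    }

    // Usage: java Jbig2Check file1.pdf file2.pdf ...
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(arg));
            String verdict = usesJbig2Filter(bytes) ? "uses JBIG2Decode" : "no JBIG2Decode found";
            System.out.println(arg + ": " + verdict);
        }
    }
}
```

Running it over a folder of scans before the thumbnail step prints one line per file, which makes it easy to route the JBIG2 files to a placeholder image (or to a different renderer) instead of indexing a blank thumbnail. Reading the whole file into memory keeps the sketch short; for very large PDFs a streaming scan would be kinder.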