DevBox 2.0
Environment
PDFBox 2.0.0 requires at least Java 6.
Packages
There are some significant changes to the package structure of PDFBox:
- Jempbox is no longer supported and was removed in favour of Xmpbox
- the package org.apache.pdfbox.pdmodel.edit was removed. The only class it contained, PDPageContentStream, was moved to the parent package.
- all examples were moved to the new package “pdfbox-examples”
- all commandline tools were moved to the new package “pdfbox-tools”
- all debugger related stuff was moved to the new package “pdfbox-debugger”
- the new package “debugger-app” provides a standalone pre built binary for the debugger
Dependency Updates
All libraries on which PDFBox depends are updated to their latest stable versions:
- Bouncy Castle 1.53
- Apache Commons Logging 1.2
For test support the libraries are updated to
- JUnit 4.12
- JAI Image Core 1.3.1
- JAI JPEG2000 1.3.0
- Levigo JBIG ImageIO Plugin 1.6.3
For PDFBox Preflight
- Apache Commons IO 2.4
Breaking Changes to the Library
Deprecated API calls
Most deprecated API calls in PDFBox 1.8.x have been removed in PDFBox 2.0.0.
API Changes
The API changes are reflected in the Javadoc for PDFBox 2.0.0. The most notable changes are:
- getCOSDictionary() is no longer used. Instead, getCOSObject now returns the matching COSBase subtype.
- PDXObjectForm was renamed to PDFormXObject to be more in line with the PDF specification.
- PDXObjectImage was renamed to PDImageXObject to be more in line with the PDF specification.
- PDPage.getContents().createInputStream() was simplified to PDPage.getContents().
- PDPageContentStream was moved to org.apache.pdfbox.pdmodel.
General Behaviour
PDFBox 2.0.0 now parses PDF files by following the Xref information in the PDF. This is similar to the functionality of PDDocument.loadNonSeq in PDFBox 1.8.x. Users still using PDDocument.load with PDFBox 1.8.x might experience different results when switching to PDFBox 2.0.0.
Font Handling
Font handling now has full Unicode support and supports font subsetting.
TrueType fonts shall now be loaded using
to leverage that.
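A minimal sketch of loading a TrueType font, assuming the PDFBox 2.0.x factory method `PDType0Font.load(PDDocument, File)` is the loader meant here. The class `FontLoadingSketch` and its helper are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Hypothetical helper class, not part of PDFBox.
final class FontLoadingSketch {
    // Loads the TrueType file and embeds it into the document as a Type 0 font.
    static PDFont loadTrueType(PDDocument document, File ttfFile) throws IOException {
        return PDType0Font.load(document, ttfFile);
    }
}
```

The returned font can then be passed to the content stream like any other PDFont.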
PDAfmPfbFont has been removed. To load such a font, pass the pfb file to PDType1Font. Loading the afm file is no longer required.
PDF Resources Handling
The individual calls to add resources such as PDResources.addFont(PDFont font) and PDResources.addXObject(PDXObject xobject, String prefix) have been replaced with PDResources.add(resource type), where resource type represents the different resource classes such as PDFont, PDAbstractPattern and so on. The add method now supports all the different types of resources available.
Instead of returning a Map, as PDResources.getFonts() or PDResources.getXObjects() did, in 2.0 an Iterable<COSName> of references shall be retrieved with PDResources.getFontNames() or PDResources.getXObjectNames(). The individual item can be retrieved with PDResources.getFont(COSName fontName) or PDResources.getXObject(COSName xObjectName).
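The add-then-look-up flow described above can be sketched as follows. This assumes PDFBox 2.0.x, where `PDResources.add(PDFont)` returns the `COSName` it assigned; `ResourcesSketch` and its methods are hypothetical helpers.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

// Hypothetical helper class, not part of PDFBox.
final class ResourcesSketch {
    // Registers a font and returns the resource name PDFBox assigned to it.
    static COSName register(PDResources resources, PDFont font) {
        return resources.add(font);
    }

    // Walks the font names and resolves each one to its PDFont.
    static List<PDFont> fonts(PDResources resources) throws IOException {
        List<PDFont> fonts = new ArrayList<PDFont>();
        for (COSName name : resources.getFontNames()) {
            fonts.add(resources.getFont(name));
        }
        return fonts;
    }
}
```

XObjects follow the same pattern with getXObjectNames() and getXObject(COSName).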
Working with Images
The individual classes PDJpeg(), PDPixelMap() and PDCCitt() to import images have been replaced with PDImageXObject.createFromFile, which works for JPG, TIFF (only G4 compression), PNG, BMP and GIF.
In addition there are some specialized classes:
- JPEGFactory.createFromStream, which preserves the JPEG data and embeds it in the PDF file without modification. (This is best if you have a JPEG file.)
- CCITTFactory.createFromFile (for bitonal TIFF images with G4 compression).
- LosslessFactory.createFromImage (this is best if you start with a BufferedImage).
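As an illustration of the two most common entry points, here is a minimal sketch assuming the PDFBox 2.0.x signatures `PDImageXObject.createFromFile(String, PDDocument)` and `LosslessFactory.createFromImage(PDDocument, BufferedImage)`. The class `ImageImportSketch` is a hypothetical helper.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper class, not part of PDFBox.
final class ImageImportSketch {
    // General-purpose import; the format is detected from the file itself.
    static PDImageXObject fromFile(PDDocument document, String imagePath) throws IOException {
        return PDImageXObject.createFromFile(imagePath, document);
    }

    // Lossless import for an image that already lives in memory.
    static PDImageXObject fromBufferedImage(PDDocument document, BufferedImage image) throws IOException {
        return LosslessFactory.createFromImage(document, image);
    }
}
```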
Parsing the Page Content
Getting the content for a page has been simplified.
Prior to PDFBox 2.0 parsing the page content was done using
With PDFBox 2.0 the code is reduced to
In addition this also works if the page content is defined as an array of content streams.
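A minimal sketch of the 2.0 form, assuming the `PDFStreamParser` constructor that accepts a `PDPage` directly (PDFBox 2.0.x). `PageTokensSketch` is a hypothetical helper name.

```java
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper class, not part of PDFBox.
final class PageTokensSketch {
    // Parses the page content (single stream or array of streams) into tokens.
    static List<Object> tokens(PDPage page) throws IOException {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        return parser.getTokens();
    }
}
```

The returned list mixes operands (COS objects) and operators in content-stream order.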
Iterating Pages
With PDFBox 2.0.0 the preferred way to iterate through the pages of a document is
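A minimal sketch of such a loop, assuming `PDDocument.getPages()` returns an iterable page tree as in PDFBox 2.0.x. `PageIterationSketch` and `pageWidths` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper class, not part of PDFBox.
final class PageIterationSketch {
    // Visits every page in document order and records its media box width.
    static List<Float> pageWidths(PDDocument document) {
        List<Float> widths = new ArrayList<Float>();
        for (PDPage page : document.getPages()) {
            widths.add(page.getMediaBox().getWidth());
        }
        return widths;
    }
}
```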
PDF Rendering
With PDFBox 2.0.0 PDPage.convertToImage and PDFImageWriter have been removed. Instead the new PDFRenderer class shall be used.
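A minimal rendering sketch, assuming `PDFRenderer.renderImageWithDPI(int, float)` from PDFBox 2.0.x. The class `RenderSketch` and the choice of DPI are illustrative only.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class, not part of PDFBox.
final class RenderSketch {
    // Renders one page (zero-based index) to an image at the given resolution.
    static BufferedImage renderPage(PDDocument document, int pageIndex, float dpi) throws IOException {
        PDFRenderer renderer = new PDFRenderer(document);
        return renderer.renderImageWithDPI(pageIndex, dpi);
    }
}
```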
ImageIOUtil has been moved into the org.apache.pdfbox.tools.imageio package. This is in the pdfbox-tools download. If you are using maven, the artifactId has the same name.
Important notice when using PDFBox with Java 8
Due to the change of the Java color management module towards “LittleCMS”, users can experience slow performance in color operations. Solution: disable LittleCMS in favour of the old KCMS (Kodak Color Management System):
- start with -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
- or call System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
Sources:
http://www.subshell.com/en/subshell/blog/Wrong-Colors-in-Images-with-Java8-100.html
https://bugs.openjdk.java.net/browse/JDK-8041125
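The programmatic variant of the workaround can be sketched as below. The property name and value are taken from the text above; `ColorManagementSketch` is a hypothetical helper, and the assumption that the call should happen early (before the JVM initialises its color management) is mine.

```java
// Hypothetical helper class, not part of PDFBox or the JDK.
final class ColorManagementSketch {
    static final String CMM_PROPERTY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    // Assumption: call this early in main(), before any color or image work.
    static void preferKcms() {
        System.setProperty(CMM_PROPERTY, KCMS_PROVIDER);
    }
}
```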
Since PDFBox 2.0.4
PDFBox 2.0.4 introduced a new command line setting, -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true, which may improve the performance of rendering PDFs on some systems, especially if there are a lot of images on a page.
PDF Printing
With PDFBox 2.0.0 PDFPrinter has been removed.
Users of PDFPrinter.silentPrint() should now use this code:
While users of PDFPrinter.print() should now use this code:
Advanced use case examples can be found in the examples package under org/apache/pdfbox/examples/printing/Printing.java
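Both replacements can be sketched with the standard `java.awt.print.PrinterJob` API, assuming `org.apache.pdfbox.printing.PDFPageable` from PDFBox 2.0.x is used to adapt the document. `PrintingSketch` and its method names are hypothetical.

```java
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.printing.PDFPageable;

// Hypothetical helper class, not part of PDFBox.
final class PrintingSketch {
    // Stand-in for PDFPrinter.silentPrint(): prints without user interaction.
    static void silentPrint(PDDocument document) throws PrinterException {
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPageable(new PDFPageable(document));
        job.print();
    }

    // Stand-in for PDFPrinter.print(): prints only if the dialog is confirmed.
    static void printWithDialog(PDDocument document) throws PrinterException {
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPageable(new PDFPageable(document));
        if (job.printDialog()) {
            job.print();
        }
    }
}
```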
Text Extraction
In 1.8, to get the text colors, one method was to pass an expanded .properties file to the PDFStripper constructor. To achieve the same in PDFBox 2.0 you can extend PDFTextStripper and add the following Operators to the constructor:
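A minimal sketch of such a subclass, assuming the color operator classes in `org.apache.pdfbox.contentstream.operator.color` (PDFBox 2.0.x) are the ones intended. The class name `ColorAwareTextStripper` is hypothetical, and the operator selection here is an assumption rather than the guide's own list.

```java
import java.io.IOException;

import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceRGBColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceRGBColor;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical subclass; registers color operators so the graphics state tracks colors.
final class ColorAwareTextStripper extends PDFTextStripper {
    ColorAwareTextStripper() throws IOException {
        addOperator(new SetStrokingColorSpace());
        addOperator(new SetNonStrokingColorSpace());
        addOperator(new SetStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceCMYKColor());
        addOperator(new SetStrokingDeviceRGBColor());
        addOperator(new SetNonStrokingDeviceRGBColor());
        addOperator(new SetStrokingDeviceGrayColor());
        addOperator(new SetNonStrokingDeviceGrayColor());
        addOperator(new SetStrokingColor());
        addOperator(new SetStrokingColorN());
        addOperator(new SetNonStrokingColor());
        addOperator(new SetNonStrokingColorN());
    }
}
```

With these registered, an overridden processTextPosition could presumably read the current fill color through getGraphicsState().getNonStrokingColor().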
Interactive Forms
Large parts of the support for interactive forms (AcroForms) have been rewritten. The most notable change from 1.8.x is that there is a clear distinction between fields and the annotations representing them visually. Intermediate nodes in a field tree are now represented by the PDNonTerminalField class.
With PDFBox 2.0.0 the preferred way to iterate through the fields is now
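A minimal sketch of such a loop, assuming `PDAcroForm.getFieldTree()` from PDFBox 2.0.x, which walks terminal and non-terminal fields alike. `FieldIterationSketch` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// Hypothetical helper class, not part of PDFBox.
final class FieldIterationSketch {
    // Collects the fully qualified name of every field in the form's field tree.
    static List<String> fieldNames(PDAcroForm form) {
        List<String> names = new ArrayList<String>();
        for (PDField field : form.getFieldTree()) {
            names.add(field.getFullyQualifiedName());
        }
        return names;
    }
}
```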
Most PDField subclasses now accept Java generic types such as String as parameters instead of the former COSBase subclasses.
PDField.getWidget() removed
As form fields support multiple annotations, PDField.getWidget() has been removed in favour of PDField.getWidgets(), which returns all annotations associated with a field.
PDUnknownField removed
The PDUnknownField class has been removed; such fields are treated as null (see PDFBOX-2885).
Document Outline
The method PDOutlineNode.appendChild() has been renamed to PDOutlineNode.addLast(). There is now also a complementary method PDOutlineNode.addFirst().
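A short sketch of both methods, assuming the PDFBox 2.0.x outline classes `PDDocumentOutline` and `PDOutlineItem`. The helper `OutlineSketch` and the titles passed to it are hypothetical.

```java
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;

// Hypothetical helper class, not part of PDFBox.
final class OutlineSketch {
    // Builds an outline with two top-level entries in the given order.
    static PDDocumentOutline build(String firstTitle, String lastTitle) {
        PDDocumentOutline outline = new PDDocumentOutline();

        PDOutlineItem last = new PDOutlineItem();
        last.setTitle(lastTitle);
        outline.addLast(last);   // formerly appendChild()

        PDOutlineItem first = new PDOutlineItem();
        first.setTitle(firstTitle);
        outline.addFirst(first); // inserted ahead of the existing entry

        return outline;
    }
}
```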
Why was the ReplaceText example removed?
The ReplaceText example has been removed as it gave the incorrect illusion that text can be replaced easily. Words are often split, as seen by this excerpt of a content stream:
Other problems will appear with font subsets: for example, if only the glyphs for a, b and c are used, these would be encoded as hex 0, 1 and 2, so you won’t find “abc”. Additionally, you can’t replace “c” with “d” because it isn’t part of the subset.
You could also have problems with ligatures, e.g. “ff”, “fl”, “fi”, “ffi”, “ffl”, which can be represented by a single code in many fonts. To understand this yourself, view any file with PDFDebugger and have a look at the “Contents” entry of a page.
See also https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text