module CombinePDF
This is a pure Ruby library to combine/merge, stamp/overlay and number PDF files, as well as to create tables (meant for indexing combined files).
You can also use this library for writing basic text content into new or existing PDF files (for authoring new PDF files, look at the Prawn Ruby library).
Here is the most basic application for the library, a one-liner that combines the PDF files and saves them:
(CombinePDF.new("file1.pdf") << CombinePDF.new("file2.pdf") << CombinePDF.new("file3.pdf")).save("combined.pdf")
Loading PDF data
Loading PDF data can be done from the file system or directly from memory.
Load data from a file:
pdf = CombinePDF.load("file.pdf")
Parse PDF data from memory:
pdf = CombinePDF.parse(pdf_data)
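Judging by the source listing of `load` further down this page, loading a file is essentially a binary read followed by the same parser `parse` uses. The read step alone can be reproduced with the standard library; `File.binread` is a shorter equivalent that also returns an ASCII-8BIT String. The helper names and the placeholder bytes below are hypothetical and only there so the sketch runs on its own:

```ruby
require 'tempfile'

# The read step that CombinePDF.load performs before parsing (per its source).
def read_like_load(path)
  IO.read(path, mode: 'rb').force_encoding(Encoding::ASCII_8BIT)
end

# Standard-library shorthand: File.binread returns an ASCII-8BIT String too.
def read_binary(path)
  File.binread(path)
end

# Placeholder bytes (a PDF-style header only, not a real PDF document).
sample = Tempfile.create(['sample', '.pdf']) do |file|
  file.binmode
  file.write("%PDF-1.4\n%\xE2\xE3\xCF\xD3\n".b)
  file.flush
  [read_like_load(file.path), read_binary(file.path)]
end

p sample[1].encoding # => #<Encoding:BINARY (ASCII-8BIT)> or #<Encoding:ASCII-8BIT>, depending on the Ruby version
p sample[0] == sample[1]
```

Since both `load` and `parse` end in `PDF.new(PDFParser.new(data, options))`, `CombinePDF.parse(File.binread("file.pdf"))` should behave like `CombinePDF.load("file.pdf")`.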
Combine/Merge PDF files or Pages
To combine PDF files (or data):
pdf = CombinePDF.new
pdf << CombinePDF.load("file1.pdf")
pdf << CombinePDF.load("file2.pdf")
pdf.save "combined.pdf"
It is possible to add only specific pages. In this example, only even pages will be added:
pdf = CombinePDF.new
i = 0
CombinePDF.load("file.pdf").pages.each do |page|
  i += 1
  pdf << page if i.even?
end
pdf.save "even_pages.pdf"
Notice that adding the whole file is faster than adding each page separately.
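If `pages` behaves like an ordinary Array (an assumption: the example above only relies on `each`), the manual counter can be replaced with `select.with_index`. This sketch uses a plain Array of placeholder strings in place of real pages so it runs without the gem:

```ruby
# Placeholder strings stand in for the page objects returned by pdf.pages.
pages = %w[page1 page2 page3 page4 page5]

# with_index(1) numbers the pages starting from 1, like the i += 1 counter,
# so "even" refers to the page numbers a reader sees.
even_pages = pages.select.with_index(1) { |_page, number| number.even? }

p even_pages # => ["page2", "page4"]
```

With CombinePDF this would presumably become `CombinePDF.load("file.pdf").pages.select.with_index(1) { |_, n| n.even? }.each { |page| pdf << page }`.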
Add content to existing pages (Stamp / Watermark)
It is possible to “stamp” one PDF page using another PDF page. In this example, a company logo will be stamped over each page:
company_logo = CombinePDF.load("company_logo.pdf").pages[0]
pdf = CombinePDF.load "content_file.pdf"
pdf.pages.each { |page| page << company_logo }
pdf.save "content_with_logo.pdf"
Notice that the << operator is called on a page and not on a PDF object. The << operator acts differently on PDF objects and on Pages.
Page Numbering
It is possible to number the pages. In this example we will add very simple numbering:
pdf = CombinePDF.load "file_to_number.pdf"
pdf.number_pages
pdf.save "file_with_numbering.pdf"
Numbering can be done with many different options: different formatting, with or without a box object, different locations on each page, and even with opacity values.
Writing Content
Page numbering actually adds content using the PDFWriter object (a very basic writer).
In this example, all the PDF pages will be stamped, along the top, with a red box and blue text stating “Draft, page #”. Here is the easy way (we can even use “number_pages” without page numbers, if we wish):
pdf = CombinePDF.load "file_to_stamp.pdf"
pdf.number_pages number_format: " - Draft, page %d - ",
                 number_location: [:top],
                 font_color: [0, 0, 1],
                 box_color: [0.4, 0, 0],
                 opacity: 0.75,
                 font_size: 16
pdf.save "draft.pdf"
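The option values above are plain Ruby data, so they can be prepared and checked on their own. Two assumptions drive this sketch: `number_format` is treated as a `Kernel#format` template in which `%d` becomes the page number, and colors are `[red, green, blue]` Arrays with components from 0 to 1 (consistent with `[0, 0, 1]` being the blue text and `[0.4, 0, 0]` the red box). The `rgb` helper is hypothetical, not part of CombinePDF:

```ruby
# Hypothetical helper (not part of CombinePDF): converts 0-255 RGB components,
# as used by most color pickers, into 0.0-1.0 floats for the color options.
def rgb(red, green, blue)
  [red, green, blue].map { |component| (component / 255.0).round(3) }
end

# Assumption: "%d" in number_format is replaced with the page number,
# the way Kernel#format fills in a template.
number_format = " - Draft, page %d - "
labels = (1..3).map { |page_number| format(number_format, page_number) }

options = {
  number_format: number_format,
  number_location: [:top],
  font_color: rgb(0, 0, 255), # => [0.0, 0.0, 1.0] (blue)
  box_color: rgb(102, 0, 0),  # => [0.4, 0.0, 0.0] (dark red)
  opacity: 0.75,
  font_size: 16
}

p labels.first # => " - Draft, page 1 - "
```

The resulting Hash can then be handed over as `pdf.number_pages(options)`.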
In this example we will add a first page with the word “Draft”, in red, over a colored background:
pdf = CombinePDF.load "file.pdf"
first_page = pdf.pages[0]
mediabox = first_page[:CropBox] || first_page[:MediaBox] # copy page size
title_page = CombinePDF.create_page mediabox # make title page same size as first page
title_page.textbox "DRAFT", font_color: [0.8, 0, 0], font_size: :fit_text, box_color: [1, 0.8, 0.8], opacity: 1
pdf >> title_page # the >> operator adds pages at the beginning
pdf.save "draft.pdf"
Font support for the writer is still in the works and is limited to extracting known fonts by location of the 14 standard fonts.
Resizing pages
Using the PDF standards for page sizes (www.prepressure.com/library/paper-size), it is now possible to resize existing PDF pages, as well as stretch and shrink their content to the new size.
pdf = CombinePDF.load "file.pdf"
a4_size = [0, 0, 595, 842]
# keep aspect ratio intact
pdf.pages.each { |p| p.resize a4_size }
pdf.save "a4.pdf"

pdf = CombinePDF.load "file.pdf"
a4_squared = [0, 0, 595, 595]
# stretch or shrink content to fit new size
pdf.pages.each { |p| p.resize a4_squared, false }
pdf.save "square.pdf"
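The two `resize` calls differ in how the old content is mapped onto the new box: the first keeps the aspect ratio, the second (with `false`) stretches or shrinks it to fill the box. The arithmetic behind the two modes can be sketched as follows, assuming the aspect-preserving mode scales uniformly by the smaller ratio and centers the result; CombinePDF's actual placement may differ, and `box_size`, `fit_keep_ratio` and `fit_stretch` are hypothetical helpers, not library methods:

```ruby
# Boxes are [left_x, bottom_y, right_x, top_y] in PDF points, like a mediabox.
def box_size(box)
  [box[2] - box[0], box[3] - box[1]]
end

# Keep aspect ratio: one uniform scale factor, small enough for both
# dimensions, plus x/y offsets that center the content in the target box.
def fit_keep_ratio(source, target)
  src_w, src_h = box_size(source)
  dst_w, dst_h = box_size(target)
  scale = [dst_w.to_f / src_w, dst_h.to_f / src_h].min
  x_offset = target[0] + (dst_w - src_w * scale) / 2
  y_offset = target[1] + (dst_h - src_h * scale) / 2
  [scale, x_offset, y_offset]
end

# Stretch/shrink: independent x and y factors, so the content fills the box.
def fit_stretch(source, target)
  src_w, src_h = box_size(source)
  dst_w, dst_h = box_size(target)
  [dst_w.to_f / src_w, dst_h.to_f / src_h]
end

letter = [0, 0, 612, 792]
p fit_keep_ratio(letter, [0, 0, 595, 842]) # scale ≈ 0.972, y offset ≈ 36
p fit_stretch(letter, [0, 0, 595, 595])    # different x and y factors
```

Going from US Letter to A4 this way, the width is the limiting dimension, so the content shrinks slightly and gains about 36 points of margin above and below.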
Decryption & Filters
Some PDF files are encrypted and some are compressed (through the use of filters)… not all files can be opened, merged, stamped or used as stamps.
Comments and file structure
If you want to help with the code, please be aware:
The code itself should be very straightforward, but feel free to ask whatever you want.
Credit
Caige Nichols wrote an amazing RC4 gem which I reference in my code. Credit for his wonderful work is given here. Please respect his license and copyright… and mine.
License
MIT
Thoughts from reading the ISO 32000-1:2008. This file is part of the CombinePDF library and the code is subject to the same license.
Constants
- ParsingError
- VERSION
Public Instance Methods
Calculates a CTM (current transformation matrix) value for a specific transformation.
This could be used to apply a transformation in textbox and to convert visual rotation values into an actual rotation transformation.
This method accepts a Hash containing any of the following parameters:
- deg: the clockwise rotation to be applied, in degrees.
- tx: the x translation to be applied.
- ty: the y translation to be applied.
- sx: the x scaling to be applied.
- sy: the y scaling to be applied.
Scaling is applied after the translation and rotation. The result is an Array of six numbers, rounded to 6 digits.
# File lib/combine_pdf/api.rb, line 119
def calc_ctm(parameters)
  p = { deg: 0, tx: 0, ty: 0, sx: 1, sy: 1 }.merge parameters
  r = p[:deg] * Math::PI / 180
  s = Math.sin(r)
  c = Math.cos(r)
  # start with translation matrix
  m = Matrix[[1, 0, 0], [0, 1, 0], [p[:tx], p[:ty], 1]]
  # then rotate
  m *= Matrix[[c, s, 0], [-s, c, 0], [0, 0, 1]] if parameters[:deg]
  # then scale
  m *= Matrix[[p[:sx], 0, 0], [0, p[:sy], 0], [0, 0, 1]] if parameters[:sx] || parameters[:sy]
  # flatten array and round to 6 digits
  m.to_a.flatten.values_at(0, 1, 3, 4, 6, 7).map! { |f| f.round 6 }
end
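To see which six numbers `calc_ctm` produces, here is a dependency-free re-statement of the same steps (translation, then rotation, then scaling), with nested Arrays standing in for the `Matrix` class. The names `multiply3` and `ctm_sketch` are hypothetical. Unlike the listing above, the sketch always multiplies in the rotation and scaling matrices; this should give the same result, because the defaults (0 degrees, scale 1) produce identity matrices. The six values are the `[a b c d e f]` entries of the 3x3 matrix `[[a, b, 0], [c, d, 0], [e, f, 1]]`, the order PDF uses for transformation matrices. For reference, ISO 32000-1 describes `[cos θ, sin θ, -sin θ, cos θ, 0, 0]` as rotating the coordinate axes counterclockwise by θ.

```ruby
# Multiplies two 3x3 matrices stored as row-major nested Arrays.
def multiply3(a, b)
  (0...3).map do |row|
    (0...3).map { |col| (0...3).sum { |k| a[row][k] * b[k][col] } }
  end
end

# Hypothetical re-statement of calc_ctm without the Matrix class.
def ctm_sketch(deg: 0, tx: 0, ty: 0, sx: 1, sy: 1)
  r = deg * Math::PI / 180
  s = Math.sin(r)
  c = Math.cos(r)
  m = [[1, 0, 0], [0, 1, 0], [tx, ty, 1]]              # translation
  m = multiply3(m, [[c, s, 0], [-s, c, 0], [0, 0, 1]])   # then rotation
  m = multiply3(m, [[sx, 0, 0], [0, sy, 0], [0, 0, 1]])  # then scaling
  # Keep a, b, c, d, e, f (the third column is always 0, 0, 1).
  m.flatten.values_at(0, 1, 3, 4, 6, 7).map { |f| f.round(6) }
end

p ctm_sketch(tx: 10, ty: 20) # pure translation: e and f carry the offsets
p ctm_sketch(sx: 2, sy: 3)   # pure scaling: a and d carry the factors
p ctm_sketch(deg: 90)        # pure rotation: a = d = cos, b = sin, c = -sin
```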
Makes a PDFWriter object.
PDFWriter objects represent an empty page and have the method “textbox” that adds content to that page.
PDFWriter objects are used internally for numbering pages (by creating a PDF page with the page number and “stamping” it over the existing page).
- mediabox: an Array representing the size of the PDF document. Defaults to [0.0, 0.0, 612.0, 792.0] (US Letter).
If the PDFWriter object is used as a stamp, the final size will be that of the original page.
# File lib/combine_pdf/api.rb, line 53
def create_page(mediabox = [0, 0, 612.0, 792.0])
  PDFWriter.new mediabox
end
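Mediabox values are in PDF points, and the default PDF user-space unit is 1/72 inch, so paper sizes given in inches or millimetres convert with simple arithmetic. This sketch uses hypothetical helper names (not CombinePDF methods) and reproduces both defaults mentioned on this page, US Letter and the A4 `page_size` of `create_table`:

```ruby
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

# Hypothetical helper: mediabox-style Array [0, 0, width, height] from inches.
def mediabox_from_inches(width_in, height_in)
  [0, 0, width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH]
end

# Hypothetical helper: the same from millimetres, rounded to `precision` digits.
def mediabox_from_mm(width_mm, height_mm, precision: 1)
  to_points = ->(mm) { (mm / MM_PER_INCH * POINTS_PER_INCH).round(precision) }
  [0, 0, to_points.call(width_mm), to_points.call(height_mm)]
end

letter = mediabox_from_inches(8.5, 11) # => [0, 0, 612.0, 792.0]
a4 = mediabox_from_mm(210, 297)        # => [0, 0, 595.3, 841.9]
p letter, a4
```

`CombinePDF.create_page(a4)` would then create an empty A4 page.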
Makes a PDF object containing a table.
All the pages in this PDF object are PDFWriter objects and are writable using the textbox function (should you wish to add a title, or more info).
The main intended use of this method is to create indexes (a table of contents) for merged data.
Example:
pdf = CombinePDF.create_table headers: ["header 1", "another header"],
                              table_data: [
                                ["this is one row", "with two columns"],
                                ["this is another row", "also two columns", "the third will be ignored"]
                              ]
pdf.save "table_file.pdf"
Accepts a Hash with any of the following keys, as well as any of the Page_Methods#textbox options:
- headers: an Array of strings with the headers (will be repeated on every page).
- table_data: an Array of Arrays, each containing a string for each column. The first row sets the number of columns; extra columns will be ignored.
- font: a registered or standard font name (see Page_Methods). Defaults to nil (:Helvetica).
- header_font: a registered or standard font name for the headers (see Page_Methods). Defaults to nil (the font used for all the table rows).
- max_font_size: the maximum font size. If the string doesn't fit, it will be resized. Defaults to 14.
- column_widths: an Array of relative column widths ([1,2] will display only the first two columns, the second twice as wide as the first). Defaults to nil (even widths).
- header_color: the header color. Defaults to [0.8, 0.8, 0.8] (light gray).
- main_color: the main row color. Defaults to nil (transparent / white).
- alternate_color: the alternate row color. Defaults to [0.95, 0.95, 0.95] (very light gray).
- font_color: the font color. Defaults to [0,0,0] (black).
- border_color: the border color. Defaults to [0,0,0] (black).
- border_width: the border width in PDF units. Defaults to 1.
- header_align: the header text alignment within each column (:right, :left, :center). Defaults to :center.
- row_align: the row text alignment within each column. Defaults to :left (:right for RTL tables).
- direction: the table's writing direction (:ltr or :rtl). This refers to the direction of the columns and doesn't affect the text (RTL text is automatically recognized). Defaults to :ltr.
- max_rows: the number of rows per page, INCLUDING the header row. Defaults to 25. The source also accepts :rows_per_page as an alias.
- page_size: the size of the page in PDF points. Defaults to [0, 0, 595.3, 841.9] (A4).
# File lib/combine_pdf/api.rb, line 86
def create_table(options = {})
  options[:max_rows] = options[:rows_per_page] if options[:rows_per_page]
  page_size = options[:page_size] || [0, 0, 595.3, 841.9]
  table = PDF.new
  page = nil
  until options[:table_data].empty?
    page = create_page page_size
    page.write_table options
    table << page
  end
  table
end
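Two of the options above involve a little arithmetic: `max_rows` counts the header row, so each page should hold `max_rows - 1` data rows trimmed to the column count set by the first row, and `column_widths` are relative weights that share out the table width. The sketch below is one reading of that description; the helper names are hypothetical and CombinePDF's own layout and rounding may differ:

```ruby
# Hypothetical sketch: split table_data into pages as the options describe.
# max_rows INCLUDES the header row, so each page carries max_rows - 1 data rows.
def paginate_rows(headers, table_data, max_rows: 25)
  column_count = table_data.first.size                    # first row sets the column count
  rows = table_data.map { |row| row.first(column_count) } # extra columns are ignored
  rows.each_slice(max_rows - 1).map { |page_rows| [headers] + page_rows }
end

# Hypothetical sketch: turn relative column_widths into absolute widths
# (in PDF points) for a table of the given total width.
def absolute_column_widths(relative_widths, table_width)
  total = relative_widths.sum.to_f
  relative_widths.map { |weight| table_width * weight / total }
end

pages = paginate_rows(
  ["header 1", "another header"],
  [["row 1", "a"], ["row 2", "b", "ignored"], ["row 3", "c"]],
  max_rows: 3
)
pages.each { |page| p page }
p absolute_column_widths([1, 2], 300) # => [100.0, 200.0]
```

Judging by the `until options[:table_data].empty?` loop in the source, `write_table` consumes rows from `options[:table_data]`, so it may be safer to pass a copy (for example `table_data.dup`) if you still need the data afterwards.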
Gets the equality depth limit. This is the point at which CombinePDF will stop testing for nested items being equal.
# File lib/combine_pdf/api.rb, line 171
def eq_depth_limit
  @eq_depth_limit
end
Sets the equality depth limit. This is the point at which CombinePDF will stop testing for nested items being equal.
# File lib/combine_pdf/api.rb, line 175
def eq_depth_limit=(value)
  @eq_depth_limit = value
end
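The idea of an equality depth limit can be illustrated in plain Ruby. This is not CombinePDF's implementation: what the library does once the limit is reached is not documented here, and the sketch simply assumes that anything nested deeper than the limit counts as equal. `nested_eq?` is a hypothetical name:

```ruby
# Hypothetical sketch: compares nested Hashes/Arrays, descending at most
# depth_limit levels. ASSUMPTION: anything deeper is treated as equal.
def nested_eq?(a, b, depth_limit, depth = 0)
  return true if depth >= depth_limit

  case a
  when Hash
    b.is_a?(Hash) && a.size == b.size &&
      a.all? { |key, value| b.key?(key) && nested_eq?(value, b[key], depth_limit, depth + 1) }
  when Array
    b.is_a?(Array) && a.size == b.size &&
      a.zip(b).all? { |x, y| nested_eq?(x, y, depth_limit, depth + 1) }
  else
    a == b
  end
end

# Two PDF-like dictionaries that differ only three levels down.
deep_a = { Type: :Page, Resources: { Font: { F1: 1 } } }
deep_b = { Type: :Page, Resources: { Font: { F1: 2 } } }

p nested_eq?(deep_a, deep_b, 10) # the difference is reached
p nested_eq?(deep_a, deep_b, 2)  # comparison stops before the difference
```

A low limit makes comparisons of deeply nested PDF objects cheaper at the cost of treating objects that differ deep down as equal.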
Creates an empty PDF object or creates a PDF object from a file (parsing the file).
- file_name: the name of a file to be parsed. If it is omitted (an empty String), an empty PDF object is returned.
# File lib/combine_pdf/api.rb, line 7
def load(file_name = '', options = {})
  raise TypeError, "couldn't parse data, expecting type String" unless file_name.is_a?(String) || file_name.is_a?(Pathname)
  return PDF.new if file_name == ''
  PDF.new(PDFParser.new(IO.read(file_name, mode: 'rb').force_encoding(Encoding::ASCII_8BIT), options))
end
Creates a new PDF object.
CombinePDF will check to see if `string` is a file name. If it is, it will attempt to load the PDF file using `CombinePDF.load`. Otherwise it will attempt parsing `string` using `CombinePDF.parse`.
If `string` is omitted (or is false or nil), it returns a new, empty PDF object.
For both performance and code readability reasons, `CombinePDF.load` and `CombinePDF.parse` should be preferred unless creating a new, empty PDF object.
# File lib/combine_pdf/api.rb, line 21
def new(string = false)
  return PDF.new unless string
  raise TypeError, "couldn't create PDF object, expecting type String" unless string.is_a?(String) || string.is_a?(Pathname)
  begin
    (begin
      File.file? string
    rescue
      false
    end) ? load(string) : parse(string)
  rescue => _e
    raise 'General PDF error - Use CombinePDF.load or CombinePDF.parse for a non-general error message (the requested file was not found OR the string received is not a valid PDF stream OR the file was found but not valid).'
  end
end
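The file-or-data decision that `new` makes can be reproduced with the standard library. The rescue around `File.file?` matters: Ruby raises `ArgumentError` for a path containing a null byte, which raw PDF data can easily contain. `choose_strategy` is a hypothetical name, and it returns a Symbol instead of calling `load` or `parse` so the sketch runs without the gem:

```ruby
require 'pathname'
require 'tempfile'

# Hypothetical sketch of the dispatch in CombinePDF.new: reports which
# action would be taken instead of performing it.
def choose_strategy(string = false)
  return :empty_pdf unless string
  unless string.is_a?(String) || string.is_a?(Pathname)
    raise TypeError, "expecting a String or Pathname"
  end

  looks_like_file =
    begin
      File.file?(string)
    rescue StandardError
      false # e.g. ArgumentError: binary PDF data containing "\0" is not a valid path
    end
  looks_like_file ? :load : :parse
end

p choose_strategy                             # => :empty_pdf
p choose_strategy("%PDF-1.4\n\0binary bytes") # => :parse
Tempfile.create(['existing', '.pdf']) do |file|
  p choose_strategy(file.path)                # => :load
end
```

Because a misspelled file name silently falls through to `:parse`, the error you get is the general one; calling `load` or `parse` directly avoids that ambiguity, which is one reason the documentation above prefers them.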
Same as create_table.
# File lib/combine_pdf/api.rb, line 100
def new_table(options = {})
  create_table options
end
Creates a PDF object from raw PDF data (parsing the data).
- data: a String that represents the content of a PDF file.
# File lib/combine_pdf/api.rb, line 37
def parse(data, options = {})
  raise TypeError, "couldn't parse and data, expecting type String" unless data.is_a? String
  PDF.new(PDFParser.new(data, options))
end
Adds an existing font (from any PDF object) to the font library.
Returns the font on success or false on failure.
Example:
fonts = CombinePDF.new("japanese_fonts.pdf").fonts(true)
CombinePDF.register_font_from_pdf_object :david, fonts[0]
VERY LIMITED SUPPORT:
- at the moment it only imports Type0 fonts.
- also, extracting the Hash of the actual font object you were looking for is not a trivial matter. I do it on the console.
Parameters:
- font_name: a Symbol with the name of the font registry. If the font exists in the library, it will be overwritten!
- font_object: a Hash in the internal format recognized by CombinePDF, that represents the font object.
# File lib/combine_pdf/api.rb, line 162
def register_existing_font(font_name, font_object)
  Fonts.register_font_from_pdf_object font_name, font_object
end
Adds a correctly formatted font object to the font library.
Registered fonts will remain in the library and will only be embedded in PDF objects when they are used by PDFWriter objects (for example, for numbering pages).
This function enables plug-ins to extend the font functionality of CombinePDF.
- font_name: a Symbol with the name of the font. If the font exists in the library, it will be overwritten!
- font_metrics: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x, bottom_y, right_x, top_y]} where char == the character itself (e.g. “ ” for space). The Hash should contain a special value :missing for the metrics of missing characters. An optional :wy might be supported in the future, for top-to-bottom fonts.
- font_pdf_object: a Hash in the internal format recognized by CombinePDF, that represents the font object.
- font_cmap: a CMap dictionary (Hash) which maps Unicode characters to the hex CID for the font (e.g. {“a” => “61”, “z” => “7a” }).
# File lib/combine_pdf/api.rb, line 145
def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
  Fonts.register_font font_name, font_metrics, font_pdf_object, font_cmap
end
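To make the expected shapes of `font_metrics` and `font_cmap` concrete, here is a sketch with placeholder values (not taken from any real font) plus two hypothetical helpers showing how such data is typically used: summing glyph widths to measure a string, and hex-encoding text through the CMap. The measurement assumes `wx` is given in 1/1000 of the font size, the usual AFM convention; that unit is an assumption, not something stated above:

```ruby
# Placeholder metrics in the documented shape
# char => { wx: width, boundingbox: [left_x, bottom_y, right_x, top_y] }.
FONT_METRICS = {
  "a" => { wx: 556, boundingbox: [36, -15, 530, 538] },
  " " => { wx: 278, boundingbox: [0, 0, 0, 0] },
  missing: { wx: 500, boundingbox: [0, 0, 500, 700] }
}.freeze

# CMap in the documented shape: character => hex CID.
FONT_CMAP = { "a" => "61", " " => "20" }.freeze

# Hypothetical helper: width of a string at a given font size, falling back
# to the :missing metrics. ASSUMPTION: wx is in 1/1000 of the font size.
def text_width(text, metrics, font_size)
  units = text.each_char.sum { |ch| (metrics[ch] || metrics[:missing])[:wx] }
  units * font_size / 1000.0
end

# Hypothetical helper: hex-encode a string through the CMap.
def encode_with_cmap(text, cmap)
  text.each_char.map { |ch| cmap.fetch(ch) }.join
end

p text_width("a a", FONT_METRICS, 10)
p encode_with_cmap("a a", FONT_CMAP) # => "612061"
```

A Hash like `FONT_METRICS` is what the `font_metrics` argument of `register_font` expects; the `font_pdf_object` argument is not sketched here.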
Same as register_existing_font.
# File lib/combine_pdf/api.rb, line 166
def register_font_from_pdf_object(font_name, font_object)
  register_existing_font font_name, font_object
end