class CombinePDF::PDF
The PDF class is the PDF object that can save itself to a file and that can be used as a container for a full PDF file's data, including version, information, etc.
PDF objects can be used to combine or to inject data.
Combine/Merge PDF files or Pages
To combine PDF files (or data):
    pdf = CombinePDF.new
    pdf << CombinePDF.load("file1.pdf") # one way to combine, very fast.
    pdf << CombinePDF.load("file2.pdf")
    pdf.save "combined.pdf"
Or even a one-liner:
(CombinePDF.load("file1.pdf") << CombinePDF.load("file2.pdf") << CombinePDF.load("file3.pdf")).save("combined.pdf")
You can also add just the odd or even pages:
    pdf = CombinePDF.new
    i = 0
    CombinePDF.load("file.pdf").pages.each do |page|
      i += 1
      pdf << page if i.even?
    end
    pdf.save "even_pages.pdf"
Notice that adding all the pages one by one is slower than adding the whole file.
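A hypothetical helper, `even_numbered` (not part of CombinePDF), sketches the same even-page selection without a manual counter. It relies only on `Enumerable#select` and `Enumerator#with_index`, and assumes the pages arrive as a plain Array, which is what `pdf.pages` returns.

```ruby
# Hypothetical helper: keep the items at even 1-based positions (2nd, 4th, ...).
# It works on any Array, for example the page list returned by pdf.pages.
def even_numbered(items)
  items.select.with_index(1) { |_item, number| number.even? }
end
```

With CombinePDF loaded, `even_numbered(CombinePDF.load("file.pdf").pages).each { |page| pdf << page }` should behave like the counter loop; it still adds pages one at a time, so the speed note applies equally.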
Add content to existing pages (Stamp / Watermark)
To add content to existing PDF pages, first import the new content from an existing PDF file. After that, add the content to each of the pages in your existing PDF.
In this example, we will add a company logo to each page:
    company_logo = CombinePDF.load("company_logo.pdf").pages[0]
    pdf = CombinePDF.load "content_file.pdf"
    pdf.pages.each { |page| page << company_logo } # notice the << operator is on a page and not a PDF object.
    pdf.save "content_with_logo.pdf"
Notice the << operator is on a page and not a PDF object. The << operator acts differently on PDF objects and on Pages.
The << operator defaults to secure injection by renaming references to avoid conflicts. For overlaying pages using compressed data that might not be editable (due to limited filter support), you can use:
pdf.pages(nil, false).each {|page| page << stamp_page}
Page Numbering
Adding page numbers to a PDF object or file is as simple as can be:
    pdf = CombinePDF.load "file_to_number.pdf"
    pdf.number_pages
    pdf.save "file_with_numbering.pdf"
Numbering can be done with many different options: different formatting, with or without a box object, and even with opacity values.
Loading PDF data
Loading PDF data can be done from the file system or directly from memory.
Loading data from a file is easy:
pdf = CombinePDF.load("file.pdf")
You can also parse PDF files from memory:
    pdf_data = IO.binread 'file.pdf' # for this demo, load a file to memory (binary mode, so no newline conversion)
    pdf = CombinePDF.parse(pdf_data)
Loading from memory is especially effective for importing PDF data received through the internet or from a different authoring library such as Prawn.
Constants
- HASH_MERGE_NEW_NO_PAGE
@private JRuby alternative: this method reviews a Hash and updates it by merging Hash data, preferring the new over the old.
- POSSIBLE_NAME_TREES
- PRIVATE_HASH_KEYS
Lists the Hash keys used for PDF objects.
The CombinePDF library doesn't use special classes for its objects (PDFPage class, PDFStream class or anything like that). There is only one PDF class, which represents the whole of the PDF file.
This Hash lists the private Hash keys that the CombinePDF library uses to differentiate between complex PDF objects.
Attributes
The forms_data attribute is a Hash that corresponds to the PDF form data (if any).
The info attribute is a Hash that sets the Info data for the PDF. Use, for example:
pdf.info[:Title] = "title"
Access the Names PDF object Hash (or reference). Use with care.
The objects attribute is an Array containing all the PDF sub-objects for the class.
Access the Outlines PDF object Hash (or reference). Use with care.
Sets/gets the PDF version of the file (1.1-1.7); should be of type Float.
The viewer_preferences attribute is a Hash that sets the ViewerPreferences data for the PDF. Use, for example:
pdf.viewer_preferences[:HideMenubar] = true
Public Class Methods
    # File lib/combine_pdf/pdf_public.rb, line 89
    def initialize(parser = nil)
      # default before setting
      @objects = []
      @version = 0
      @viewer_preferences = {}
      @info = {}
      parser ||= PDFParser.new('')
      raise TypeError, "initialization error, expecting CombinePDF::PDFParser or nil, but got #{parser.class.name}" unless parser.is_a? PDFParser
      @objects = parser.parse
      # remove any existing id's
      remove_old_ids
      # set data from parser
      @version = parser.version if parser.version.is_a? Float
      @info = parser.info_object || {}
      @names = parser.names_object || {}
      @forms_data = parser.forms_object || {}
      @outlines = parser.outlines_object || {}
      # rebuild the catalog, to fix wkhtmltopdf's use of static page numbers
      rebuild_catalog
      # general globals
      @set_start_id = 1
      @info[:Producer] = "Ruby CombinePDF #{CombinePDF::VERSION} Library"
      @info.delete :CreationDate
      @info.delete :ModDate
    end
Public Instance Methods
Adds the pages (or file) to the PDF (combine/merge) and RETURNS SELF, for nesting. For example:
    pdf = CombinePDF.load("first_file.pdf")
    pdf << CombinePDF.load("second_file.pdf")
    pdf.save "both_files_merged.pdf"
    # File lib/combine_pdf/pdf_public.rb, line 280
    def <<(data)
      insert(-1, data)
    end
Adds the pages (or file) to the BEGINNING of the PDF (combine/merge) and RETURNS SELF for nesting operators. For example:
    pdf = CombinePDF.load("second_file.pdf")
    pdf >> CombinePDF.load("first_file.pdf")
    pdf.save "both_files_merged.pdf"
    # File lib/combine_pdf/pdf_public.rb, line 293
    def >>(data)
      insert 0, data
    end
Clears any existing form data.
    # File lib/combine_pdf/pdf_public.rb, line 155
    def clear_forms_data
      @forms_data.nil? || @forms_data.clear
    end
Returns an array with the different fonts used in the file.
Type0 font objects (font[:Subtype] == :Type0) can be registered with the font library for use in PDFWriter objects (font numbering / table creation, etc.).
@param limit_to_type0 [true,false] limits the list to Type0 fonts.
    # File lib/combine_pdf/pdf_public.rb, line 256
    def fonts(limit_to_type0 = false)
      fonts_array = []
      pages.each do |pg|
        if pg[:Resources][:Font]
          pg[:Resources][:Font].values.each do |f|
            f = f[:referenced_object] if f[:referenced_object]
            if (limit_to_type0 || f[:Subtype] == :Type0) && f[:Type] == :Font && !fonts_array.include?(f)
              fonts_array << f
            end
          end
        end
      end
      fonts_array
    end
Adds PDF pages (or PDF files) into a specific location.
Returns self (like `#<<`), or false if the data is not a PDF object, a PDF page or an Array of PDF pages.
- location
  the location for the added page(s). Could be any number. Negative numbers represent a count backwards (-1 being the end of the page array and 0 being the beginning). If the location is beyond bounds, the pages will be added to the end of the PDF object (or at the beginning, if the out-of-bounds number was negative).
- data
  a PDF page, a PDF file (CombinePDF.load("filename.pdf")) or an array of pages (CombinePDF.load("filename.pdf").pages[0..3]).
    # File lib/combine_pdf/pdf_public.rb, line 303
    def insert(location, data)
      pages_to_add = nil
      if data.is_a? PDF
        @version = [@version, data.version].max
        pages_to_add = data.pages
        actual_value(@names ||= {}.dup).update data.names, &HASH_MERGE_NEW_NO_PAGE
        merge_outlines((@outlines ||= {}.dup), actual_value(data.outlines), location) unless actual_value(data.outlines).empty?
        if actual_value(@forms_data)
          actual_value(@forms_data).update actual_value(data.forms_data), &HASH_MERGE_NEW_NO_PAGE if data.forms_data
        else
          @forms_data = data.forms_data
        end
        warn 'Form data might be lost when combining PDF forms (possible conflicts).' unless data.forms_data.nil? || data.forms_data.empty?
      elsif data.is_a?(Array) && (data.select { |o| !(o.is_a?(Hash) && o[:Type] == :Page) }).empty?
        pages_to_add = data
      elsif data.is_a?(Hash) && data[:Type] == :Page
        pages_to_add = [data]
      else
        warn "Shouldn't add objects to the file unless they are PDF objects or PDF pages (an Array or a single PDF page)."
        return false # return false, which will also stop any chaining.
      end
      # pages_to_add.map! {|page| page.copy }
      catalog = rebuild_catalog
      pages_array = catalog[:Pages][:referenced_object][:Kids]
      page_count = pages_array.length
      if location < 0 && (page_count + location < 0)
        location = 0
      elsif location > 0 && (location > page_count)
        location = page_count
      end
      pages_array.insert location, pages_to_add
      pages_array.flatten!
      self
    end
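The bounds handling in `insert` can be tried out on a plain Array. The helper `insert_at` below is a hypothetical stand-in (not a CombinePDF method): it copies the clamping rules from the listing and then relies on Ruby's `Array#insert`, where -1 means "after the last element" and other negative indexes count back from the end.

```ruby
# Hypothetical stand-in for the location handling in PDF#insert.
# `kids` plays the role of the catalog's :Kids array, `new_items` the pages to add.
def insert_at(kids, location, new_items)
  count = kids.length
  if location < 0 && count + location < 0
    location = 0        # too far back: insert at the beginning
  elsif location > 0 && location > count
    location = count    # too far forward: append at the end
  end
  kids.insert(location, new_items) # inserts the whole Array as one element...
  kids.flatten!                    # ...which flatten! then splices in place
  kids
end
```

Because the new items are inserted as a single nested Array and then flattened, a whole file's pages keep their order wherever they land.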
Adds a new page to the end of the PDF object.
Returns the new page object.
Unless the media box is specified, it defaults to US Letter: [0, 0, 612.0, 792.0]
    # File lib/combine_pdf/pdf_public.rb, line 121
    def new_page(mediabox = [0, 0, 612.0, 792.0], _location = -1)
      p = PDFWriter.new(mediabox)
      insert(-1, p)
      p
    end
Adds page numbers to the PDF.
For Unicode text, one or more Unicode fonts must first be registered. The registered font(s) must supply the subset of characters used in the text. UNICODE IS AN ISSUE WITH THE PDF FORMAT; USE CAUTION.
- options
  a Hash of options setting the behavior and format of the page numbers:
  - :number_format a string representing the format for the page number. Defaults to ' - %s - ' (allows for letter numbering as well, such as "a", "b", etc.).
  - :location an Array containing the location for the page numbers; can be :top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right or :center (:center == full page). Defaults to [:top, :bottom].
  - :start_at an Integer that sets the number for the first page number. Also accepts a letter ("a") for letter numbering. Defaults to 1.
  - :margin_from_height a number (PDF points) for the top and bottom margins. Defaults to 45.
  - :margin_from_side a number (PDF points) for the left and right margins. Defaults to 15.
  - :page_range a range of pages to be numbered (e.g. (2..-1)). Defaults to all the pages (nil). Remember to set :start_at to the correct value.
The options Hash can also take all the options for {Page_Methods#textbox}. Defaults to font: :Helvetica, font_size: 12 and no box (:border_width => 0, :box_color => nil).
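A rough sketch of how each label is produced, based on the `number_pages` source listing: every '%' in :number_format receives the same page number, and the number advances with `#succ`, which is why both an Integer and a letter work for :start_at. `page_labels` is a hypothetical helper, not a CombinePDF method.

```ruby
# Hypothetical helper mirroring the label step of number_pages:
# one copy of the current number per '%' in the format, then #succ.
def page_labels(number_format, start_at, page_count)
  repeats = number_format.count('%')
  number = start_at
  Array.new(page_count) do
    label = number_format % Array.new(repeats) { number }
    number = number.succ # 1 -> 2, "a" -> "b", "z" -> "aa"
    label
  end
end
```

When :page_range skips the first pages, the counter still starts at :start_at, hence the reminder to set it to match the range.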
    # File lib/combine_pdf/pdf_public.rb, line 367
    def number_pages(options = {})
      opt = { number_format: ' - %s - ', start_at: 1, font: :Helvetica, margin_from_height: 45, margin_from_side: 15 }
      opt.update options
      opt[:location] ||= opt[:number_location] ||= opt[:stamp_location] ||= [:top, :bottom]
      opt[:location] = [opt[:location]] unless opt[:location].is_a? Array
      page_number = opt[:start_at]
      format_repeater = opt[:number_format].count('%')
      just_center = [:center]
      small_font_size = opt[:font_size] || 12
      # some common computations can be done only once.
      from_height = opt[:margin_from_height]
      from_side = opt[:margin_from_side]
      left_position = from_side
      (opt[:page_range] ? pages[opt[:page_range]] : pages).each do |page|
        # Get page dimensions
        mediabox = page[:CropBox] || page[:MediaBox] || [0, 0, 595.3, 841.9]
        # set stamp text
        text = opt[:number_format] % (Array.new(format_repeater) { page_number })
        if opt[:location].include? :center
          add_opt = {}
          if opt[:margin_from_height] && !opt[:height] && !opt[:y]
            add_opt[:height] = mediabox[3] - mediabox[1] - (2 * opt[:margin_from_height].to_f)
            add_opt[:y] = opt[:margin_from_height]
          end
          if opt[:margin_from_side] && !opt[:width] && !opt[:x]
            add_opt[:width] = mediabox[2] - mediabox[0] - (2 * opt[:margin_from_side].to_f)
            add_opt[:x] = opt[:margin_from_side]
          end
          page.textbox text, opt.merge(add_opt)
        end
        unless opt[:location] == just_center
          add_opt = { font_size: small_font_size }.merge(opt)
          # text = opt[:number_format] % page_number
          # compute locations for text boxes
          text_dimantions = Fonts.dimensions_of(text, opt[:font], small_font_size)
          box_width = text_dimantions[0] * 1.2
          box_height = text_dimantions[1] * 2
          page_width = mediabox[2]
          page_height = mediabox[3]
          add_opt[:width] ||= box_width
          add_opt[:height] ||= box_height
          center_position = (page_width - box_width) / 2
          right_position = page_width - from_side - box_width
          top_position = page_height - from_height
          bottom_position = from_height + box_height
          if opt[:location].include? :top
            page.textbox text, { x: center_position, y: top_position }.merge(add_opt)
          end
          if opt[:location].include? :bottom
            page.textbox text, { x: center_position, y: bottom_position }.merge(add_opt)
          end
          if opt[:location].include? :top_left
            page.textbox text, { x: left_position, y: top_position, font_size: small_font_size }.merge(add_opt)
          end
          if opt[:location].include? :bottom_left
            page.textbox text, { x: left_position, y: bottom_position, font_size: small_font_size }.merge(add_opt)
          end
          if opt[:location].include? :top_right
            page.textbox text, { x: right_position, y: top_position, font_size: small_font_size }.merge(add_opt)
          end
          if opt[:location].include? :bottom_right
            page.textbox text, { x: right_position, y: bottom_position, font_size: small_font_size }.merge(add_opt)
          end
        end
        page_number = page_number.succ
      end
    end
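The placement arithmetic for the edge and corner locations can be isolated as a pure function. In this sketch, `number_box_positions` is hypothetical, and the text width and height are passed in directly where the library calls `Fonts.dimensions_of`; the formulas are copied from the `number_pages` listing, and all values are PDF points.

```ruby
# Hypothetical pure function reproducing the box placement in number_pages
# for the non-:center locations. Returns [x, y] pairs plus the box size.
def number_box_positions(mediabox, text_width, text_height,
                         margin_from_height: 45, margin_from_side: 15)
  box_width = text_width * 1.2
  box_height = text_height * 2
  page_width = mediabox[2]  # the listing uses the upper-right corner directly
  page_height = mediabox[3]
  center_x = (page_width - box_width) / 2
  left_x = margin_from_side
  right_x = page_width - margin_from_side - box_width
  top_y = page_height - margin_from_height
  bottom_y = margin_from_height + box_height
  {
    top: [center_x, top_y], bottom: [center_x, bottom_y],
    top_left: [left_x, top_y], bottom_left: [left_x, bottom_y],
    top_right: [right_x, top_y], bottom_right: [right_x, bottom_y],
    size: [box_width, box_height]
  }
end
```

For a US Letter page and a 50 x 10 point label, the box is about 60 x 20 points, centered horizontally at x = 276.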
This method returns all the pages cataloged in the catalog.
If no catalog is passed, it seeks the existing catalog(s) and searches for any registered Page objects.
Page objects are Hash class objects; the page methods are added using a mixin or inheritance.
- catalogs
  a catalog, or an Array of catalog objects. Defaults to the existing catalog.
    # File lib/combine_pdf/pdf_public.rb, line 224
    def pages(catalogs = nil)
      page_list = []
      catalogs ||= get_existing_catalogs
      if catalogs.is_a?(Array)
        catalogs.each { |c| page_list.concat pages(c) unless c.nil? }
      elsif catalogs.is_a?(Hash)
        if catalogs[:is_reference_only]
          if catalogs[:referenced_object]
            page_list.concat pages(catalogs[:referenced_object])
          else
            warn "couldn't follow reference!!! #{catalogs} not found!"
          end
        else
          case catalogs[:Type]
          when :Page
            page_list << catalogs
          when :Pages
            page_list.concat pages(catalogs[:Kids]) unless catalogs[:Kids].nil?
          when :Catalog
            page_list.concat pages(catalogs[:Pages]) unless catalogs[:Pages].nil?
          end
        end
      end
      page_list
    end
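The recursive walk in `pages` can be reproduced on hand-built Hashes. This sketch assumes the reference convention used throughout these listings, `{ is_reference_only: true, referenced_object: ... }`; `collect_pages` is a hypothetical name.

```ruby
# Hypothetical re-implementation of the page-tree walk on plain Hashes:
# follow reference wrappers, descend Catalog -> Pages -> Kids, collect Page nodes in order.
def collect_pages(node, found = [])
  case node
  when Array
    node.each { |child| collect_pages(child, found) unless child.nil? }
  when Hash
    if node[:is_reference_only]
      target = node[:referenced_object]
      target ? collect_pages(target, found) : warn("couldn't follow reference")
    else
      case node[:Type]
      when :Page then found << node
      when :Pages then collect_pages(node[:Kids], found) if node[:Kids]
      when :Catalog then collect_pages(node[:Pages], found) if node[:Pages]
      end
    end
  end
  found
end
```

Nested :Pages nodes are flattened in document order, which is why `pdf.pages` can be indexed like a simple Array.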
Removes a PDF page from the file and the catalog.
Returns the removed page, or nil if it failed or the index is out of bounds.
- page_index
  the page's index in the zero (0) based page array. Negative numbers represent a count backwards (-1 being the end of the page array and 0 being the beginning).
    # File lib/combine_pdf/pdf_public.rb, line 345
    def remove(page_index)
      catalog = rebuild_catalog
      pages_array = catalog[:Pages][:referenced_object][:Kids]
      removed_page = pages_array.delete_at page_index
      catalog[:Pages][:referenced_object][:Count] = pages_array.length
      removed_page
    end
Saves the PDF to a file.
- file_name
  a string or path object for the output.
**Notice!** If the file exists, it WILL be overwritten.
    # File lib/combine_pdf/pdf_public.rb, line 164
    def save(file_name, options = {})
      IO.binwrite file_name, to_pdf(options)
    end
This method stamps all (or some) of the pages in the PDF with the requested stamp.
The method accepts:
- stamp
  either a String or a PDF page. If this is a String, you can add formatting to add page numbering (e.g. "page number %i"); otherwise remember to escape any percent ('%') sign by doubling it (e.g. "page %%number not shown%%").
- options
  an options Hash.
If the stamp is a PDF page, only :page_range and :underlay (to reverse-stamp) are valid options.
If the stamp is a String, then all the options used by {#number_pages} or {Page_Methods#textbox} can be used.
The default :location option is :center, meaning the stamp will be stamped all across the page unless the :x, :y, :width or :height options are specified.
    # File lib/combine_pdf/pdf_public.rb, line 458
    def stamp_pages(stamp, options = {})
      case stamp
      when String
        options[:location] ||= [:center]
        number_pages({ number_format: stamp }.merge(options))
      when Page_Methods
        # stamp = stamp.copy(true)
        if options[:underlay]
          (options[:page_range] ? pages[options[:page_range]] : pages).each { |p| p >> stamp }
        else
          (options[:page_range] ? pages[options[:page_range]] : pages).each { |p| p << stamp }
        end
      else
        raise TypeError, 'expecting a String or a PDF page as the stamp.'
      end
    end
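`stamp_pages` hands a String stamp to `number_pages` as :number_format, and `number_pages` formats it with `%`, passing one copy of the page number per '%' character. A hypothetical helper, `stamp_text`, reproduces that step to show why a literal percent sign has to be doubled.

```ruby
# Hypothetical helper reproducing how a String stamp becomes page text.
def stamp_text(stamp, page_number)
  # Surplus arguments (e.g. for '%%') are ignored; Ruby only raises for them when $DEBUG is set.
  stamp % Array.new(stamp.count('%')) { page_number }
end
```

A stamp such as "page %number not shown%" fails because `%n` and a trailing `%` are not valid format directives.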
Gets the title for the PDF. The title is stored in the information dictionary and isn't required.
    # File lib/combine_pdf/pdf_public.rb, line 129
    def title
      @info[:Title]
    end
Sets the title for the PDF. The title is stored in the information dictionary and isn't required.
- new_title
  a string that is the new title value.
    # File lib/combine_pdf/pdf_public.rb, line 136
    def title=(new_title = nil)
      @info[:Title] = new_title
    end
Formats the data to PDF format and returns a binary string that represents the PDF file content.
This method is used by the save(file_name) method to save the content to a file.
Use this to export the PDF file without saving to disk (such as sending it through HTTP, etc.).
    # File lib/combine_pdf/pdf_public.rb, line 173
    def to_pdf(options = {})
      # reset version if not specified
      @version = 1.5 if @version.to_f == 0.0
      # set info for merged file
      @info[:ModDate] = @info[:CreationDate] = Time.now.strftime "D:%Y%m%d%H%M%S%:::z'00"
      @info[:Subject] = options[:subject] if options[:subject]
      @info[:Producer] = options[:producer] if options[:producer]
      # rebuild_catalog
      catalog = rebuild_catalog_and_objects
      # add ID and generation numbers to objects
      renumber_object_ids
      out = []
      xref = []
      indirect_object_count = 1 # the first object is the null object
      # write head (version and binary-code)
      out << "%PDF-#{@version}\n%\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x00".force_encoding(Encoding::ASCII_8BIT)
      # collect objects and set xref table locations
      loc = 0
      out.each { |line| loc += line.bytesize + 1 }
      @objects.each do |o|
        indirect_object_count += 1
        xref << loc
        out << object_to_pdf(o)
        loc += out.last.bytesize + 1
      end
      xref_location = loc
      # xref_location = 0
      # out.each { |line| xref_location += line.bytesize + 1}
      out << "xref\n0 #{indirect_object_count}\n0000000000 65535 f \n"
      xref.each { |offset| out << (out.pop + ("%010d 00000 n \n" % offset)) }
      out << out.pop + 'trailer'
      out << "<<\n/Root #{false || "#{catalog[:indirect_reference_id]} #{catalog[:indirect_generation_number]} R"}"
      out << "/Size #{indirect_object_count}"
      out << "/Info #{@info[:indirect_reference_id]} #{@info[:indirect_generation_number]} R"
      out << ">>\nstartxref\n#{xref_location}\n%%EOF"
      # when finished, remove the numbering system and keep only pointers
      remove_old_ids
      # output the pdf stream
      out.join("\n".force_encoding(Encoding::ASCII_8BIT)).force_encoding(Encoding::ASCII_8BIT)
    end
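The cross-reference bookkeeping in `to_pdf` relies on every output chunk being joined with a single "\n", so each chunk moves the running byte offset forward by `bytesize + 1`. The sketch below uses a hypothetical `layout_with_xref` helper and made-up object bodies (the library's real bodies come from `object_to_pdf`, which is not shown). It repeats that arithmetic and builds xref lines in the same `%010d 00000 n ` format, so the recorded offsets can be checked against the joined output.

```ruby
# Hypothetical sketch of the offset arithmetic in to_pdf.
# Returns the joined document, the object offsets, the xref text and the xref start.
def layout_with_xref(header, bodies)
  out = [header]
  offsets = []
  loc = header.bytesize + 1 # +1 for the "\n" used by join
  bodies.each_with_index do |body, index|
    chunk = "#{index + 1} 0 obj\n#{body}\nendobj"
    offsets << loc # where "N 0 obj" starts once everything is joined
    out << chunk
    loc += chunk.bytesize + 1
  end
  xref = "xref\n0 #{bodies.length + 1}\n0000000000 65535 f \n"
  offsets.each { |offset| xref += format("%010d 00000 n \n", offset) }
  [out.join("\n"), offsets, xref, loc] # loc is where the xref section would begin
end
```

Computing offsets while building the chunks avoids a second pass over the output, at the cost of depending on the exact join separator.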
Protected Instance Methods
@private Some PDF objects contain references to other PDF objects.
This function adds the references contained in these objects.
It is used for internal operations, such as injecting data using the << operator.
    # File lib/combine_pdf/pdf_protected.rb, line 21
    def add_referenced()
      # an existing object map
      resolved = {}.dup
      existing = {}.dup
      should_resolve = [].dup
      #set all existing objects as resolved and register their children for future resolution
      @objects.each { |obj| existing[obj] = obj ; resolved[obj.object_id] = obj; should_resolve << obj.values}
      # loop until should_resolve is empty
      while should_resolve.any?
        obj = should_resolve.pop
        next if resolved[obj.object_id] # the object exists
        if obj.is_a?(Hash)
          referenced = obj[:referenced_object]
          if referenced && referenced.any?
            tmp = resolved[referenced.object_id]
            if !tmp && referenced[:raw_stream_content]
              tmp = existing[referenced[:raw_stream_content]]
              # Avoid endless recursion by limiting it to a number of layers (default == 2)
              tmp = nil unless equal_layers(tmp, referenced)
            end
            if tmp
              obj[:referenced_object] = tmp
            else
              resolved[obj.object_id] = referenced
              # existing[referenced] = referenced
              existing[referenced[:raw_stream_content]] = referenced
              should_resolve << referenced
              @objects << referenced
            end
          else
            resolved[obj.object_id] = obj
            obj.keys.each { |k| should_resolve << obj[k] unless !obj[k].is_a?(Enumerable) || resolved[obj[k].object_id] }
          end
        elsif obj.is_a?(Array)
          resolved[obj.object_id] = obj
          should_resolve.concat obj
        end
      end
      resolved.clear
      existing.clear
    end
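A simplified sketch of the idea behind `add_referenced`: walk everything reachable from the root objects with a stack, and append each object found behind a `:referenced_object` key to a flat list exactly once. Unlike the listing, it does not merge duplicate streams by `:raw_stream_content`, and it keeps a separate visited set so reference cycles (such as a page's `:Parent`) terminate. `flatten_references` is a hypothetical name.

```ruby
# Hypothetical sketch: collect every object reachable through :referenced_object
# into one flat list, each object once, using an explicit stack instead of recursion.
def flatten_references(roots)
  objects = roots.dup
  listed = {}  # object_id => true for objects already in the flat list
  visited = {} # object_id => true for containers already walked
  roots.each { |obj| listed[obj.object_id] = true }
  stack = roots.dup
  until stack.empty?
    item = stack.pop
    next unless item.is_a?(Hash) || item.is_a?(Array)
    next if visited[item.object_id]
    visited[item.object_id] = true
    if item.is_a?(Hash)
      target = item[:referenced_object]
      if target && !listed[target.object_id]
        listed[target.object_id] = true
        objects << target
      end
      stack.concat(item.values) # the target is among the values, so it gets walked too
    else
      stack.concat(item)
    end
  end
  objects
end
```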
    # File lib/combine_pdf/pdf_protected.rb, line 165
    def get_existing_catalogs
      (@objects.select { |obj| obj.is_a?(Hash) && obj[:Type] == :Catalog }) || (@objects.select { |obj| obj.is_a?(Hash) && obj[:Type] == :Page })
    end
Merges 2 outlines by appending one to the end or start of the other.
- old_data - the main outline, which is also the one that will be used in the resulting PDF.
- new_data - the outline to be appended.
- position - an integer representing the position where a PDF is being inserted.
This method only differentiates between inserting at the beginning or not. Not at the beginning means the new outline will be added to the end of the original outline.
An outline base node (tree base) has :Type, :Count, :First, :Last. Every node within the outline base node's :First or :Last can also have the following pointers to other nodes:
- :First or :Last (only if the node has a subtree / subsection)
- :Parent (the node's parent)
- :Prev, :Next (previous and next node)
Non-node-pointer data in these nodes:
- :Title - the node's title displayed in the PDF outline
- :Count - number of nodes in its subtree (0 if no subtree)
- :Dest - node link destination (if the node is linking to something)
    # File lib/combine_pdf/pdf_protected.rb, line 292
    def merge_outlines(old_data, new_data, position)
      old_data = actual_object(old_data)
      new_data = actual_object(new_data)
      if old_data.nil? || old_data.empty? || old_data[:First].nil?
        # old_data is a reference to the actual object,
        # so if we update old_data, we're done, no need to take any further action
        old_data.update new_data
      elsif new_data.nil? || new_data.empty? || new_data[:First].nil?
        return old_data
      else
        new_data = new_data.dup # avoid old data corruption
        # number of outline nodes, after the merge
        old_data[:Count] = old_data[:Count].to_i + new_data[:Count].to_i
        # walk the Hash here ...
        # I'm just using the start / end insert-position for now...
        # first - is going to be the start of the outline base node's :First, after the merge
        # last - is going to be the end of the outline base node's :Last, after the merge
        # median - the start of what will be appended to the end of the outline base node's :First
        # parent - the outline base node of the resulting merged outline
        # FIXME implement the possibility to insert somewhere in the middle of the outline
        prev = nil
        pos = first = actual_object((position.nonzero? ? old_data : new_data)[:First])
        last = actual_object((position.nonzero? ? new_data : old_data)[:Last])
        median = { is_reference_only: true, referenced_object: actual_object((position.nonzero? ? new_data : old_data)[:First]) }
        old_data[:First] = { is_reference_only: true, referenced_object: first }
        old_data[:Last] = { is_reference_only: true, referenced_object: last }
        parent = { is_reference_only: true, referenced_object: old_data }
        while pos
          # walking through old_data here and updating the :Parent as we go,
          # this updates the inserted new_data :Parent's as well once it is appended and the
          # loop keeps walking the appended data.
          pos[:Parent] = parent if pos[:Parent]
          # connect the two outlines
          # if there is no :Next, the end of the outline base node's :First is reached and this is
          # where the new data gets appended, the same way you would append to a two-way linked list.
          if pos[:Next].nil?
            median[:referenced_object][:Prev] = { is_reference_only: true, referenced_object: prev } if median
            pos[:Next] = median
            # median becomes 'nil' because this loop keeps going after the appending is done,
            # to update the parents of the appended tree and we wouldn't want to keep appending it infinitely.
            median = nil
          end
          # iterating over the outlines main nodes (this is not going into subtrees)
          # while keeping every rotations previous node saved
          prev = pos
          pos = actual_object(pos[:Next])
        end
        # make sure the last object doesn't have the :Next and the first no :Prev property
        prev.delete :Next
        actual_object(old_data[:First]).delete :Prev
      end
    end
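A simplified sketch of the splice performed by `merge_outlines`, under two assumptions: outline nodes link to each other directly rather than through reference Hashes, and a `position` of zero means "prepend" while anything else means "append", as described for the method. `build_outline`, `outline_titles` and `merge_outline_sketch` are hypothetical helpers.

```ruby
# Hypothetical helpers: outline nodes are plain Hashes linked through :Next/:Prev.
def build_outline(titles)
  nodes = titles.map { |title| { Title: title } }
  nodes.each_cons(2) { |a, b| a[:Next] = b; b[:Prev] = a }
  { Type: :Outlines, Count: nodes.length, First: nodes.first, Last: nodes.last }
end

def outline_titles(base)
  titles = []
  node = base[:First]
  while node
    titles << node[:Title]
    node = node[:Next]
  end
  titles
end

# Splices new_base before (position zero) or after (otherwise) old_base, in place.
def merge_outline_sketch(old_base, new_base, position)
  return old_base.update(new_base) if old_base[:First].nil? # nothing to merge into
  return old_base if new_base[:First].nil?                  # nothing to add
  head, tail = position.zero? ? [new_base, old_base] : [old_base, new_base]
  first = head[:First]
  last = tail[:Last]
  head[:Last][:Next] = tail[:First] # two-way linked-list append
  tail[:First][:Prev] = head[:Last]
  old_base[:Count] = old_base[:Count].to_i + new_base[:Count].to_i
  old_base[:First] = first
  old_base[:Last] = last
  node = first
  while node # every top-level node now belongs to old_base
    node[:Parent] = old_base
    node = node[:Next]
  end
  old_base
end
```

Like the listing, the sketch only walks top-level nodes; subtrees hang off their parents unchanged.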
Deprecation Notice
    # File lib/combine_pdf/pdf_protected.rb, line 140
    def names_object
      puts "CombinePDF Deprecation Notice: the protected method `names_object` will be deprecated in the upcoming version. Use `names` instead."
      @names
    end
    # File lib/combine_pdf/pdf_protected.rb, line 145
    def outlines_object
      puts "CombinePDF Deprecation Notice: the protected method `outlines_object` will be deprecated in the upcoming version. Use `outlines` instead."
      @outlines
    end
Prints the whole outline hash to a file, with basic indentation and replacing raw streams with "RAW STREAM" (the substitution doesn't always work well for big streams).
- outline - outline hash
- file - "filename.filetype" string
    # File lib/combine_pdf/pdf_protected.rb, line 350
    def print_outline_to_file(outline, file)
      outline_subbed_str = outline.to_s.gsub(/\:raw_stream_content=\>"(?:(?!"}).)*+"\}\}/, ':raw_stream_content=> RAW STREAM}}')
      brace_cnt = 0
      formatted_outline_str = ''
      outline_subbed_str.each_char do |c|
        if c == '{'
          formatted_outline_str << "\n" << "\t" * brace_cnt << c
          brace_cnt += 1
        elsif c == '}'
          brace_cnt -= 1
          brace_cnt = 0 if brace_cnt < 0
          formatted_outline_str << c << "\n" << "\t" * brace_cnt
        elsif c == '\n'
          formatted_outline_str << c << "\t" * brace_cnt
        else
          formatted_outline_str << c
        end
      end
      formatted_outline_str << "\n" * 10
      File.open(file, 'w') { |f| f.write(formatted_outline_str) }
    end
@private
    # File lib/combine_pdf/pdf_protected.rb, line 64
    def rebuild_catalog(*with_pages)
      # # build page list v.1 Slow but WORKS
      # # Benchmark testing value: 26.708394
      # old_catalogs = @objects.select {|obj| obj.is_a?(Hash) && obj[:Type] == :Catalog}
      # old_catalogs ||= []
      # page_list = []
      # PDFOperations._each_object(old_catalogs,false) { |p| page_list << p if p.is_a?(Hash) && p[:Type] == :Page }
      # build page list v.2 faster, better, and works
      # Benchmark testing value: 0.215114
      page_list = pages
      # add pages to catalog, if requested
      page_list.concat(with_pages) unless with_pages.empty?
      # duplicate any non-unique pages - This is a special case to resolve Adobe Acrobat Reader issues (see issues #19 and #81)
      uniqueness = {}.dup
      page_list.each { |page| page = page[:referenced_object] || page; page = page.dup if uniqueness[page.object_id]; uniqueness[page.object_id] = page }
      page_list.clear
      page_list = uniqueness.values
      uniqueness.clear
      # build new Pages object
      page_object_kids = [].dup
      pages_object = { Type: :Pages, Count: page_list.length, Kids: page_object_kids }
      pages_object_reference = { referenced_object: pages_object, is_reference_only: true }
      page_list.each { |pg| pg[:Parent] = pages_object_reference; page_object_kids << ({ referenced_object: pg, is_reference_only: true }) }
      # rebuild/rename the names dictionary
      rebuild_names
      # build new Catalog object
      catalog_object = { Type: :Catalog, Pages: { referenced_object: pages_object, is_reference_only: true } }
      # pages_object[:Parent] = { referenced_object: catalog_object, is_reference_only: true } # causes AcrobatReader to fail
      catalog_object[:ViewerPreferences] = @viewer_preferences unless @viewer_preferences.empty?
      # point old Pages pointers to new Pages object
      ## first point known pages objects - enough?
      pages.each { |p| p[:Parent] = { referenced_object: pages_object, is_reference_only: true } }
      ## or should we, go over structure? (fails)
      # each_object {|obj| obj[:Parent][:referenced_object] = pages_object if obj.is_a?(Hash) && obj[:Parent].is_a?(Hash) && obj[:Parent][:referenced_object] && obj[:Parent][:referenced_object][:Type] == :Pages}
      # # remove old catalog and pages objects
      # @objects.reject! { |obj| obj.is_a?(Hash) && (obj[:Type] == :Catalog || obj[:Type] == :Pages) }
      # remove old objects list and trees
      @objects.clear
      # inject new catalog and pages objects
      @objects << @info if @info
      @objects << catalog_object
      # @objects << pages_object
      # rebuild/rename the forms dictionary
      if @forms_data.nil? || @forms_data.empty?
        @forms_data = nil
      else
        @forms_data = { referenced_object: (@forms_data[:referenced_object] || @forms_data), is_reference_only: true }
        catalog_object[:AcroForm] = @forms_data
        @objects << @forms_data[:referenced_object]
      end
      # add the names dictionary
      if @names && @names.length > 1
        @objects << @names
        catalog_object[:Names] = { referenced_object: @names, is_reference_only: true }
      end
      # add the outlines dictionary
      if @outlines && @outlines.any?
        @objects << @outlines
        catalog_object[:Outlines] = { referenced_object: @outlines, is_reference_only: true }
      end
      catalog_object
    end
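`rebuild_catalog` makes sure no page Hash appears twice in :Kids (the code comment cites Adobe Acrobat Reader issues) by shallow-copying repeats, then wraps each page in a reference and points its :Parent back at the new Pages node. A hypothetical `build_pages_node` reproduces those steps on plain Hashes.

```ruby
# Hypothetical sketch of the page de-duplication and Pages-node construction.
def build_pages_node(page_list)
  unique = {} # object_id => page, in first-seen order
  page_list.each do |page|
    page = page[:referenced_object] || page            # unwrap reference Hashes
    page = page.dup if unique.key?(page.object_id)     # repeated page: use a shallow copy
    unique[page.object_id] = page
  end
  kids = []
  pages_node = { Type: :Pages, Count: unique.length, Kids: kids }
  unique.each_value do |page|
    page[:Parent] = { referenced_object: pages_node, is_reference_only: true }
    kids << { referenced_object: page, is_reference_only: true }
  end
  pages_node
end
```

Because `dup` is shallow, a repeated page still shares its content streams and resources with the original; only the page dictionary itself becomes distinct.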
@private This is an alternative to the rebuild_catalog method. It is used by the to_pdf method to streamline the PDF output; there is no point in calling it before preparing the output.
    # File lib/combine_pdf/pdf_protected.rb, line 157
    def rebuild_catalog_and_objects
      catalog = rebuild_catalog
      catalog[:Pages][:referenced_object][:Kids].each { |e| @objects << e[:referenced_object]; e[:referenced_object] }
      # adds every referenced object to the @objects (root), addition is performed as pointers rather than copies
      add_referenced()
      catalog
    end
    # File lib/combine_pdf/pdf_protected.rb, line 187
    def rebuild_names(name_tree = nil, base = 'CombinePDF_0000000')
      if name_tree
        return nil unless name_tree.is_a?(Hash)
        name_tree = name_tree[:referenced_object] || name_tree
        dic = []
        # map a names tree and return a valid name tree. Do not recurse.
        should_resolve = [name_tree[:Kids], name_tree[:Names]]
        resolved = [].to_set
        while should_resolve.any?
          pos = should_resolve.pop
          if pos.is_a? Array
            next if resolved.include?(pos.object_id)
            if pos[0].is_a? String
              (pos.length / 2).times do |i|
                dic << (pos[i * 2].clear << base.next!)
                pos[(i * 2) + 1][0] = {is_reference_only: true, referenced_object: pages[pos[(i * 2) + 1][0]]} if(pos[(i * 2) + 1].is_a?(Array) && pos[(i * 2) + 1][0].is_a?(Numeric))
                dic << (pos[(i * 2) + 1].is_a?(Array) ? { is_reference_only: true, referenced_object: { indirect_without_dictionary: pos[(i * 2) + 1] } } : pos[(i * 2) + 1])
                # dic << pos[(i * 2) + 1]
              end
            else
              should_resolve.concat pos
            end
          elsif pos.is_a? Hash
            pos = pos[:referenced_object] || pos
            next if resolved.include?(pos.object_id)
            should_resolve << pos[:Kids] if pos[:Kids]
            should_resolve << pos[:Names] if pos[:Names]
          end
          resolved << pos.object_id
        end
        return { referenced_object: { Names: dic }, is_reference_only: true }
      end
      @names ||= @names[:referenced_object]
      new_names = { Type: :Names }.dup
      POSSIBLE_NAME_TREES.each do |ntree|
        if @names[ntree]
          new_names[ntree] = rebuild_names(@names[ntree], base)
          @names[ntree].clear
        end
      end
      @names.clear
      @names = new_names
    end
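The renaming step in `rebuild_names` overwrites each name string in place with the next value of a counter string (`'CombinePDF_0000000'.next!` gives `'CombinePDF_0000001'`), presumably so that names coming from different merged files cannot collide. A hypothetical `rename_flat_names!` applies that step to a flat `[name, value, ...]` array.

```ruby
# Hypothetical sketch of the in-place renaming used by rebuild_names.
def rename_flat_names!(names_array, base = +"CombinePDF_0000000")
  names_array.each_slice(2) do |name, _value|
    name.clear << base.next! # mutates the existing String, so other holders of it see the new name
  end
  names_array
end
```

Mutating the String instead of replacing the array slot means any other structure holding the same String object picks up the new name as well.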
    # File lib/combine_pdf/pdf_protected.rb, line 181
    def remove_old_ids
      @objects.each { |obj| obj.delete(:indirect_reference_id); obj.delete(:indirect_generation_number) }
    end
@private
    # File lib/combine_pdf/pdf_protected.rb, line 171
    def renumber_object_ids(start = nil)
      @set_start_id = start || @set_start_id
      start = @set_start_id
      # history = {}
      @objects.each do |obj|
        obj[:indirect_reference_id] = start
        start += 1
      end
    end
Private Instance Methods
    # File lib/combine_pdf/pdf_protected.rb, line 374
    def equal_layers obj1, obj2, layer = CombinePDF.eq_depth_limit
      return true if obj1.object_id == obj2.object_id
      if obj1.is_a? Hash
        return false unless obj2.is_a? Hash
        return false unless obj1.length == obj2.length
        keys = obj1.keys; keys2 = obj2.keys;
        return false if (keys - keys2).any? || (keys2 - keys).any?
        return (warn("CombinePDF nesting limit reached") || true) if(layer == 0)
        keys.each {|k| return false unless equal_layers( obj1[k], obj2[k], layer-1) }
      elsif obj1.is_a? Array
        return false unless obj2.is_a? Array
        return false unless obj1.length == obj2.length
        (obj1-obj2).any? || (obj2-obj1).any?
      else
        obj1 == obj2
      end
    end
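A standalone sketch of the depth-limited comparison. It uses a default depth of 2, following the comment in `add_referenced`, and, like the listing, it treats two Hashes with matching keys as equal once the depth limit is reached. One deliberate difference: as listed, the Array branch evaluates `(obj1-obj2).any? || (obj2-obj1).any?`, which is true when the arrays differ, whereas this sketch returns true only when neither array has elements missing from the other. `equal_layers_sketch` is a hypothetical name.

```ruby
# Hypothetical depth-limited structural comparison (not the library method).
def equal_layers_sketch(a, b, layer = 2)
  return true if a.equal?(b)
  case a
  when Hash
    return false unless b.is_a?(Hash) && a.length == b.length
    return false unless (a.keys - b.keys).empty? && (b.keys - a.keys).empty?
    return true if layer.zero? # depth limit reached: assume equal, as the listing does
    a.all? { |key, value| equal_layers_sketch(value, b[key], layer - 1) }
  when Array
    return false unless b.is_a?(Array) && a.length == b.length
    (a - b).empty? && (b - a).empty? # order-insensitive, like the listing's set difference
  else
    a == b
  end
end
```

The depth limit trades accuracy for speed: differences nested deeper than `layer` levels go unnoticed.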
    # File lib/combine_pdf/pdf_protected.rb, line 403
    def rename_object(object, _dictionary)
      case object
      when Array
        object.length.times { |i| }
      when Hash
      end
    end
    # File lib/combine_pdf/pdf_protected.rb, line 393
    def renaming_dictionary(object = nil, dictionary = {})
      object ||= @names
      case object
      when Array
        object.length.times { |i| object[i].is_a?(String) ? (dictionary[object[i]] = (dictionary.last || 'Random_0001').next) : renaming_dictionary(object[i], dictionary) }
      when Hash
        object.values.each { |v| renaming_dictionary v, dictionary }
      end
    end