class Oxcelix::Workbook
A class that represents an Excel workbook. By default, it will open the excel file, and convert it to a collection of Matrix
objects @!attribute [rw] sheets
@return [Array] a collection of {Sheet} objects
Attributes
Public Class Methods
Create a new {Workbook} object.
filename is the name of the Excel 2007/2010 file (xlsx) to be opened (Optional)
options is a collection of options that can be passed to Workbook
. Options may include:
-
:copymerge (=> true/false) - Copy and repeat the content of the merged cells into the whole group, e.g.
the group of three merged cells | a |
will become: |a|a|a|
-
:include (Array) - an array of sheet names to be included
-
:exclude (Array) - an array of sheet names not to be processed
-
:paginate (Array) - an array that defines the number of lines to be included in the pagination and the page to be parsed
-
:cellrange (Range) - the range of cells to be included in parsing
If a filename gets passed, the excel file is first getting unzipped, then the workbook.xml file gets processed. This file stores sheet metadata, which will be filtered (by including and excluding sheets from further processing)
The next stage is building sheets. This includes:
-
Parsing the XML files representing the sheets
-
Interpolation of the shared strings
-
adding comments to the cells
-
Converting each sheet to a
Matrix
object -
Deleting the temporary directory that stores the XML files.
# File lib/oxcelix/workbook.rb, line 53 def initialize(filename=nil, options={}) @sheets = [] @sheetbase = {} @sharedstrings = [] unless filename.nil? unpack filename open options parse options FileUtils.remove_dir(@destination, true) end end
Public Instance Methods
Parses workbook metadata (sheet data, comments, shared strings) @param [Hash] options Options affecting file opening, metadata collection and processing.
# File lib/oxcelix/workbook.rb, line 85 def open(options={}) f = IO.read(@destination + '/xl/workbook.xml') a = Ox::load(f) sheetdata(a, options); commentsrel; shstrings; @styles = Styles.new() File.open(@destination + '/xl/styles.xml', 'r') do |f| Ox.sax_parse(@styles, f) end @styles.temparray.sort_by!{|st| st[:numFmtId].to_i} add_custom_formats @styles.temparray @styles.styleary.map!{|s| Numformats::Formatarray[s.to_i][:id].to_i} end
Parses sheet data by feeding the output of the Xlsheet
SAX parser into the arrays representing the sheets. @param [Hash] options Options that affect the parser.
# File lib/oxcelix/workbook.rb, line 103 def parse(options={}) @sheets.each do |x| if !options[:paginate].nil? lines = options[:paginate][0] page = options[:paginate][1] sheet = PagSheet.new(lines, page) elsif !options[:cellrange].nil? range = options[:cellrange] sheet = Cellrange.new(range) else sheet = Xlsheet.new() end File.open(@destination+"/xl/#{x[:filename]}", 'r') do |f| Ox.sax_parse(sheet, f) end comments = mkcomments(x[:comments]) sheet.cellarray.each do |sh| sh.numformat = @styles.styleary[sh.style.to_i] if sh.type == "s" sh.value = @sharedstrings[sh.value.to_i] end if !comments.nil? comm = comments.select{|c| c[:ref] == (sh.xlcoords) } if comm.size > 0 sh.comment = comm[0][:comment] end comments.delete_if{ |c| c[:ref] == (sh.xlcoords) } end end x[:cells] = sheet.cellarray x[:mergedcells] = sheet.mergedcells end matrixto options end
Unzips the excel file to a temporary directory. The directory will be removed at the end of the parsing stage when invoked by initialize, otherwise at exit. @param [String] filename the name of the Excel file to be unpacked
# File lib/oxcelix/workbook.rb, line 72 def unpack(filename) @destination = Dir.mktmpdir Zip::File.open(filename){ |zip_file| zip_file.each{ |f| f_path = File.join(@destination, f.name) FileUtils.mkdir_p(File.dirname(f_path)) zip_file.extract(f, f_path) unless File.exists?(f_path) } } end
Private Instance Methods
buildsheet creates a matrix of the needed size and fills it with the cells. Mainly for internal use only. When paginating or parsing only a range of cells, the size of the matrix will be adjusted (no nil values will be left at the beginning of the sheet), to preserve memory. @param [Sheet] sheet the actual sheetarray. @param [Hash] options :paginate or :cellrange will affect the size of the matrix @return [Sheet] a Sheet
object that stores the cell values.
# File lib/oxcelix/workbook.rb, line 292 def buildsheet(sheet, options) ydiff, xdiff = 0,0 if !options[:paginate].nil? ydiff = options[:paginate][0] * (options[:paginate][1]-1) elsif !options[:cellrange].nil? xdiff = x(options[:cellrange].begin) ydiff = y(options[:cellrange].begin) end m = Sheet.build(sheet[:cells].last.y+1-ydiff, sheet[:cells].last.x+1-xdiff) {nil} sheet[:cells].each do |c| m[c.y-ydiff, c.x-xdiff] = c end return m end
Build the relationship between sheets and the XML files storing the comments to the actual sheet.
# File lib/oxcelix/workbook.rb, line 193 def commentsrel unless Dir[@destination + '/xl/worksheets/_rels'].empty? Find.find(@destination + '/xl/worksheets/_rels') do |path| if File.basename(path).split(".").last == 'rels' a = IO.read(path) f = Ox::load(a) f.locate("Relationships/*").each do |x| if x[:Target].include?"comments" @sheets.each do |s| if "worksheets/" + File.basename(path,".rels") == s[:filename] s[:comments] = x[:Target] end end end end end end else @sheets.each do |s| s[:comments] = nil end end end
Returns an array of Matrix
objects. For each sheet, matrixto first checks the address (xlcoords) of the last cell in the cellarray, then builds a nil-filled Matrix
object of size *xlcoords.x, xlcoords.y*.
The matrix will then be filled with Cell
objects according to their coordinates.
If the copymerge parameter is true, it creates a submatrix (minor) of every mergegroup (based on the mergedcells array relative to the actual sheet), and after the only meaningful cell of the minor is found, it is copied back to the remaining cells of the group. The coordinates (xlcoords) of each copied cell is changed to reflect the actual Excel coordinate.
The matrix will replace the array of cells in the actual sheet. @param [Hash] options @return [Matrix] a Matrix
object that stores the cell values, and, depending on the copymerge parameter, will copy the merged value
into every merged cell
# File lib/oxcelix/workbook.rb, line 256 def matrixto(options) @sheets.each_with_index do |sheet, i| if sheet[:cells].empty? m = Sheet.build(0,0) else m = buildsheet(sheet, options) if options[:copymerge] == true sheet[:mergedcells].each do |mc| a = mc.split(':') x1 = x(a[0]) y1 = y(a[0]) x2 = x(a[1]) y2 = y(a[1]) mrange = m.minor(y1..y2, x1..x2) valuecell = mrange.to_a.flatten.compact[0] (x1..x2).each do |col| (y1..y2).each do |row| m, valuecell = mergevalues(m, col, row, valuecell) end end end end m.name = @sheets[i][:name]; m.sheetId = @sheets[i][:sheetId]; m.relationId = @sheets[i][:relationId] @sheets[i] = m end end end
Replace the empty values of the mergegroup with cell values or nil. @param [Matrix] m the Sheet
object @param [Integer] col Column of the actual cell @param [Integer] row Row of the actual cell @param [Cell] valuecell A Cell
containing the value to be copied over the mergegroup @return [Matrix, Cell] the sheet and the new (empty) cell or nil.
# File lib/oxcelix/workbook.rb, line 314 def mergevalues(m, col, row, valuecell) if valuecell != nil valuecell.xlcoords = (col.col_name)+(row+1).to_s m[row, col] = valuecell return m, valuecell else valuecell = Cell.new valuecell.xlcoords = (col.col_name)+(row+1).to_s m[row, col] = valuecell return m, valuecell end end
Parses the comments related to the actual sheet. @param [String] commentfile @return [Array] a collection of comments relative to the Excel sheet currently processed
# File lib/oxcelix/workbook.rb, line 229 def mkcomments(commentfile) unless commentfile.nil? comms = Comments.new() File.open(@destination + '/xl/'+commentfile.gsub('../', ''), 'r') do |f| Ox.sax_parse(comms, f) end return comms.commarray end end
Build the array of working sheets based on the :include and :exclude parameters. @param[sheetarr, options]
# File lib/oxcelix/workbook.rb, line 181 def sheet_collection(sheetarr, options) options[:include]||=[] if options[:include].to_a.size > 0 sheetarr.keep_if{|item| options[:include].to_a.detect{|d| d == item} } end sheetarr = (sheetarr - options[:exclude].to_a) @sheets.keep_if{|item| sheetarr.detect{|d| d == item[:name] } } @sheets.uniq! end
@private Given the data found in workbook.xml, create a hash and push it to the sheets array.
The hash will not be pushed into the array if the sheet name is blacklisted (it appears in the excluded_sheets array) or does not appear in the list of included sheets.
If included_sheets (the array of whitelisted sheets) is nil, the hash is added.
# File lib/oxcelix/workbook.rb, line 149 def sheetdata(wb_file, options) wb_file.locate("workbook/sheets/*").each do |x| @sheetbase[:name] = x[:name] @sheetbase[:sheetId] = x[:sheetId] @sheetbase[:relationId] = x[:"r:id"] relationshipfile = nil fname = nil unless Dir[@destination + '/xl/_rels'].empty? Find.find(@destination + '/xl/_rels') do |path| if File.basename(path).split(".").last=='rels' g = IO.read(path) relationshipfile=Ox::load(g) end end end relationshipfile.locate("Relationships/*").each do |rship| if rship[:Id] == x[:"r:id"] @sheetbase[:filename]=rship[:Target] end end @sheets << @sheetbase @sheetbase = Hash.new end sheetarr = @sheets.map{|i| i[:name]} sheet_collection(sheetarr, options) end
Invokes the Sharedstrings
helper class
# File lib/oxcelix/workbook.rb, line 218 def shstrings strings = Sharedstrings.new() File.open(@destination + '/xl/sharedStrings.xml', 'r') do |f| Ox.sax_parse(strings, f) end @sharedstrings = strings.stringarray end