class Oxcelix::Workbook

A class that represents an Excel workbook. By default, it will open the excel file, and convert it to a collection of Matrix objects @!attribute [rw] sheets

@return [Array] a collection of {Sheet} objects

Attributes

sheets[RW]

Public Class Methods

new(filename=nil, options={}) click to toggle source

Create a new {Workbook} object.

filename is the name of the Excel 2007/2010 file (xlsx) to be opened (Optional)

options is a collection of options that can be passed to Workbook. Options may include:

  • :copymerge (=> true/false) - Copy and repeat the content of the merged cells into the whole group, e.g.

the group of three merged cells | a | will become: |a|a|a|

  • :include (Array) - an array of sheet names to be included

  • :exclude (Array) - an array of sheet names not to be processed

  • :paginate (Array) - an array that defines the number of lines to be included in the pagination and the page to be parsed

  • :cellrange (Range) - the range of cells to be included in parsing

If a filename gets passed, the excel file is first getting unzipped, then the workbook.xml file gets processed. This file stores sheet metadata, which will be filtered (by including and excluding sheets from further processing)

The next stage is building sheets. This includes:

  • Parsing the XML files representing the sheets

  • Interpolation of the shared strings

  • adding comments to the cells

  • Converting each sheet to a Matrix object

  • Deleting the temporary directory that stores the XML files.

# File lib/oxcelix/workbook.rb, line 53
def initialize(filename=nil, options={})
  @sheets        = []
  @sheetbase     = {}
  @sharedstrings = []
  unless filename.nil?
    unpack filename
    open options
    parse options
    FileUtils.remove_dir(@destination, true)
  end
end

Public Instance Methods

open(options={}) click to toggle source

Parses workbook metadata (sheet data, comments, shared strings) @param [Hash] options Options affecting file opening, metadata collection and processing.

# File lib/oxcelix/workbook.rb, line 85
def open(options={})
  f = IO.read(@destination + '/xl/workbook.xml')
  a = Ox::load(f)

  sheetdata(a, options); commentsrel; shstrings;

  @styles = Styles.new()
  File.open(@destination + '/xl/styles.xml', 'r') do |f|
    Ox.sax_parse(@styles, f)
  end

  @styles.temparray.sort_by!{|st| st[:numFmtId].to_i}
  add_custom_formats @styles.temparray
  @styles.styleary.map!{|s| Numformats::Formatarray[s.to_i][:id].to_i}
end
parse(options={}) click to toggle source

Parses sheet data by feeding the output of the Xlsheet SAX parser into the arrays representing the sheets. @param [Hash] options Options that affect the parser.

# File lib/oxcelix/workbook.rb, line 103
def parse(options={})
  @sheets.each do |x|
    if !options[:paginate].nil?
      lines = options[:paginate][0]
      page  = options[:paginate][1]
      sheet = PagSheet.new(lines, page)
    elsif !options[:cellrange].nil?
      range = options[:cellrange]
      sheet = Cellrange.new(range)
    else
      sheet = Xlsheet.new()
    end

    File.open(@destination+"/xl/#{x[:filename]}", 'r') do |f|
      Ox.sax_parse(sheet, f)
    end
    comments = mkcomments(x[:comments])
    sheet.cellarray.each do |sh|
      sh.numformat = @styles.styleary[sh.style.to_i]
      if sh.type == "s"
        sh.value = @sharedstrings[sh.value.to_i]
      end
      if !comments.nil?
        comm = comments.select{|c| c[:ref] == (sh.xlcoords) }
        if comm.size > 0
          sh.comment = comm[0][:comment]
        end
        comments.delete_if{ |c| c[:ref] == (sh.xlcoords) }
      end
    end
    x[:cells]       = sheet.cellarray
    x[:mergedcells] = sheet.mergedcells
  end
  matrixto options
end
unpack(filename) click to toggle source

Unzips the excel file to a temporary directory. The directory will be removed at the end of the parsing stage when invoked by initialize, otherwise at exit. @param [String] filename the name of the Excel file to be unpacked

# File lib/oxcelix/workbook.rb, line 72
def unpack(filename)
  @destination = Dir.mktmpdir
  Zip::File.open(filename){ |zip_file|
    zip_file.each{ |f|
      f_path = File.join(@destination, f.name)
      FileUtils.mkdir_p(File.dirname(f_path))
      zip_file.extract(f, f_path) unless File.exists?(f_path)
    }
  }
end

Private Instance Methods

buildsheet(sheet, options) click to toggle source

buildsheet creates a matrix of the needed size and fills it with the cells. Mainly for internal use only. When paginating or parsing only a range of cells, the size of the matrix will be adjusted (no nil values will be left at the beginning of the sheet), to preserve memory. @param [Sheet] sheet the actual sheetarray. @param [Hash] options :paginate or :cellrange will affect the size of the matrix @return [Sheet] a Sheet object that stores the cell values.

# File lib/oxcelix/workbook.rb, line 292
def buildsheet(sheet, options)
  ydiff, xdiff = 0,0
  if !options[:paginate].nil?
    ydiff = options[:paginate][0] * (options[:paginate][1]-1)
  elsif !options[:cellrange].nil?
    xdiff = x(options[:cellrange].begin)
    ydiff = y(options[:cellrange].begin)
  end

  m = Sheet.build(sheet[:cells].last.y+1-ydiff, sheet[:cells].last.x+1-xdiff) {nil}
  sheet[:cells].each do |c|
    m[c.y-ydiff, c.x-xdiff] = c
  end
  return m
end
commentsrel() click to toggle source

Build the relationship between sheets and the XML files storing the comments to the actual sheet.

# File lib/oxcelix/workbook.rb, line 193
def commentsrel
 unless Dir[@destination + '/xl/worksheets/_rels'].empty?
  Find.find(@destination + '/xl/worksheets/_rels') do |path|
    if File.basename(path).split(".").last == 'rels'
      a = IO.read(path)
      f = Ox::load(a)
      f.locate("Relationships/*").each do |x|
        if x[:Target].include?"comments"
          @sheets.each do |s|
            if "worksheets/" + File.basename(path,".rels") == s[:filename]
              s[:comments] = x[:Target]
            end
          end
        end
      end
    end
  end
 else
   @sheets.each do |s|
     s[:comments] = nil
   end
 end
end
matrixto(options) click to toggle source

Returns an array of Matrix objects. For each sheet, matrixto first checks the address (xlcoords) of the last cell in the cellarray, then builds a nil-filled Matrix object of size *xlcoords.x, xlcoords.y*.

The matrix will then be filled with Cell objects according to their coordinates.

If the copymerge parameter is true, it creates a submatrix (minor) of every mergegroup (based on the mergedcells array relative to the actual sheet), and after the only meaningful cell of the minor is found, it is copied back to the remaining cells of the group. The coordinates (xlcoords) of each copied cell is changed to reflect the actual Excel coordinate.

The matrix will replace the array of cells in the actual sheet. @param [Hash] options @return [Matrix] a Matrix object that stores the cell values, and, depending on the copymerge parameter, will copy the merged value

into every merged cell
# File lib/oxcelix/workbook.rb, line 256
def matrixto(options)
  @sheets.each_with_index do |sheet, i|
    if sheet[:cells].empty?
      m = Sheet.build(0,0)
    else
      m = buildsheet(sheet, options)
      if options[:copymerge] == true
        sheet[:mergedcells].each do |mc|
          a  = mc.split(':')
          x1 = x(a[0])
          y1 = y(a[0])
          x2 = x(a[1])
          y2 = y(a[1])
          mrange    = m.minor(y1..y2, x1..x2)
          valuecell = mrange.to_a.flatten.compact[0]
          (x1..x2).each do |col|
            (y1..y2).each do |row|
              m, valuecell = mergevalues(m, col, row, valuecell)
            end
          end
        end
      end
      m.name       = @sheets[i][:name];
      m.sheetId    = @sheets[i][:sheetId];
      m.relationId = @sheets[i][:relationId]
      @sheets[i]   = m
    end
  end
end
mergevalues(m, col, row, valuecell) click to toggle source

Replace the empty values of the mergegroup with cell values or nil. @param [Matrix] m the Sheet object @param [Integer] col Column of the actual cell @param [Integer] row Row of the actual cell @param [Cell] valuecell A Cell containing the value to be copied over the mergegroup @return [Matrix, Cell] the sheet and the new (empty) cell or nil.

# File lib/oxcelix/workbook.rb, line 314
def mergevalues(m, col, row, valuecell)
  if valuecell != nil
    valuecell.xlcoords = (col.col_name)+(row+1).to_s
    m[row, col] = valuecell
    return m, valuecell
  else
    valuecell = Cell.new
    valuecell.xlcoords = (col.col_name)+(row+1).to_s
    m[row, col] = valuecell
    return m, valuecell
  end
end
mkcomments(commentfile) click to toggle source

Parses the comments related to the actual sheet. @param [String] commentfile @return [Array] a collection of comments relative to the Excel sheet currently processed

# File lib/oxcelix/workbook.rb, line 229
def mkcomments(commentfile)
  unless commentfile.nil?
    comms = Comments.new()
    File.open(@destination + '/xl/'+commentfile.gsub('../', ''), 'r') do |f|
      Ox.sax_parse(comms, f)
    end
    return comms.commarray
  end
end
sheet_collection(sheetarr, options) click to toggle source

Build the array of working sheets based on the :include and :exclude parameters. @param[sheetarr, options]

# File lib/oxcelix/workbook.rb, line 181
def sheet_collection(sheetarr, options)
  options[:include]||=[]
  if options[:include].to_a.size > 0
    sheetarr.keep_if{|item| options[:include].to_a.detect{|d| d == item} }
  end
  sheetarr = (sheetarr - options[:exclude].to_a)
  @sheets.keep_if{|item| sheetarr.detect{|d| d == item[:name] } }
  @sheets.uniq!
end
sheetdata(wb_file, options) click to toggle source

@private Given the data found in workbook.xml, create a hash and push it to the sheets array.

The hash will not be pushed into the array if the sheet name is blacklisted (it appears in the excluded_sheets array) or does not appear in the list of included sheets.

If included_sheets (the array of whitelisted sheets) is nil, the hash is added.

# File lib/oxcelix/workbook.rb, line 149
def sheetdata(wb_file, options)
  wb_file.locate("workbook/sheets/*").each do |x|
    @sheetbase[:name]       = x[:name]
    @sheetbase[:sheetId]    = x[:sheetId]
    @sheetbase[:relationId] = x[:"r:id"]

    relationshipfile = nil
    fname            = nil
    unless Dir[@destination + '/xl/_rels'].empty?
      Find.find(@destination + '/xl/_rels') do |path|
        if File.basename(path).split(".").last=='rels'
          g = IO.read(path)
          relationshipfile=Ox::load(g)
        end
      end
    end
    relationshipfile.locate("Relationships/*").each do |rship|
      if rship[:Id] == x[:"r:id"]
        @sheetbase[:filename]=rship[:Target]
      end
    end


    @sheets << @sheetbase
    @sheetbase = Hash.new
  end
  sheetarr = @sheets.map{|i| i[:name]}
  sheet_collection(sheetarr, options)
end
shstrings() click to toggle source

Invokes the Sharedstrings helper class

# File lib/oxcelix/workbook.rb, line 218
def shstrings
  strings = Sharedstrings.new()
  File.open(@destination + '/xl/sharedStrings.xml', 'r') do |f|
    Ox.sax_parse(strings, f)
  end
  @sharedstrings = strings.stringarray
end