class CTioga2::Data::Backends::TextBackend

Constants

InvalidLineRE

A line is invalid if it is blank or if it does not start with a digit or with one of the characters +, - or . (period).

This may be improved later.
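The actual value of InvalidLineRE is not shown on this page. As a rough sketch, a regular expression implementing the rule above could look like the one below. The name INVALID_LINE_SKETCH is hypothetical, and the sketch assumes that leading whitespace is skipped before the first character is examined.

```ruby
# Hypothetical stand-in for InvalidLineRE (the real constant may differ).
# Assumption: leading whitespace is skipped. A line is then invalid if
# nothing follows that whitespace, or if the first non-blank character is
# not a digit, '+', '-' or '.'.
INVALID_LINE_SKETCH = /\A\s*(?:\z|[^\d+\-.\s])/

"# comment\n" =~ INVALID_LINE_SKETCH   # matches: comment lines are invalid
"\n" =~ INVALID_LINE_SKETCH            # matches: blank lines are invalid
"  -3.5 4\n" =~ INVALID_LINE_SKETCH    # nil: a data line
```

With this rule, lines such as "1 2", ".5 1" or "  -3.5 4" are data, while blank lines, "# comment" or "x y" are treated as invalid.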

Public Class Methods

new()
Calls superclass method CTioga2::Data::Backends::Backend::new
# File lib/ctioga2/data/backends/backends/text.rb, line 87
def initialize
  @dummy = nil
  @current = nil   
  # Current is the name of the last file used. Necessary for '' specs.
  @current_data = nil       # The data of the last file used.
  @skip = 0
  @included_modules = [NaN]    # to make sure we give them to
  # Dvector.compute_formula
  @default_column_spec = "1:2"

  @separator = /\s+/

  # We don't split data by default.
  @split = false

  @param_regex = nil

  @header_line_regex = /^\#\#\s*/

  super()

  # Override Backend's cache - for now.
  @cache = {}               # A cache file_name -> data

  @param_cache = {}     # Same thing as cache, but for parameters

  @headers_cache = {}   # Same thing as cache, but for header
                        # lines.

end

Public Instance Methods

expand_sets(spec)

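Expands a specification into several sets. This function separates the set into a file spec and a column spec. Within the column spec, the 2##6 keyword expands to 2,3,4,5,6, and 2## followed by a non-digit expands to 2,…,last column in the file. If the specification contains no such keyword, it is treated as a file glob: the sorted list of matching file names is returned or, when nothing matches, the result of the superclass method. For now, the expansion stops at the first occurrence found, and the second form doesn't work yet. But soon…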

# File lib/ctioga2/data/backends/backends/text.rb, line 129
def expand_sets(spec)
  if m = /(\d+)##(\D|$)/.match(spec)
    a = m[1].to_i 
    trail = m[2]
    b = read_file(spec)
    b = (b.length - 1) 
    ret = []
    a.upto(b) do |i|
      ret << m.pre_match + i.to_s + trail + m.post_match
    end
    return ret
  else
    m = Dir::glob(spec)
    if m.size > 0
      m.sort!
      return m
    else
      return super
    end
  end
end
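A minimal sketch of the column-range expansion described above, covering both the a##b and the open-ended a## forms. expand_column_range is a hypothetical helper, and the number of the last column is passed in explicitly instead of being obtained through read_file as in the method above. Note that, as far as the listing shows, its regular expression /(\d+)##(\D|$)/ only matches ## when it is followed by a non-digit or by the end of the string, so as written it handles the open-ended form rather than 2##6.

```ruby
# Hypothetical helper: expands the first "a##b" or "a##" marker found in a
# set specification. last_column stands in for the number of the last data
# column, which the real backend obtains by reading the file.
def expand_column_range(spec, last_column)
  m = /(\d+)##(\d*)/.match(spec)
  return [spec] unless m                         # no marker: nothing to expand

  first = m[1].to_i
  last = m[2].empty? ? last_column : m[2].to_i   # "a##" runs to the last column
  (first..last).map { |i| m.pre_match + i.to_s + m.post_match }
end

expand_column_range("data.dat@1:2##4", 6)
# => ["data.dat@1:2", "data.dat@1:3", "data.dat@1:4"]
expand_column_range("data.dat@1:3##", 5)
# => ["data.dat@1:3", "data.dat@1:4", "data.dat@1:5"]
```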
extend(mod)
Calls superclass method
# File lib/ctioga2/data/backends/backends/text.rb, line 118
def extend(mod)
  super
  @included_modules << mod
end
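extend overrides Ruby's Object#extend so that, besides adding the module's methods to the backend object, it records the module in @included_modules, which get_data_column later hands to Ruby.compute_formula. The same pattern in isolation, with a hypothetical FormulaHost class and Helpers module:

```ruby
# Hypothetical class illustrating the extend-and-record pattern.
class FormulaHost
  attr_reader :included_modules

  def initialize
    @included_modules = []
  end

  def extend(mod)
    super                      # Object#extend: add mod's methods to self
    @included_modules << mod   # remember mod for later formula evaluation
    self                       # unlike the listing, return self as Object#extend does
  end
end

module Helpers
  def square(x)
    x * x
  end
end

host = FormulaHost.new.extend(Helpers)
host.square(3)          # => 9
host.included_modules   # => [Helpers]
```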

Protected Instance Methods

get_data_column(column, compute_formulas = false, parameters = nil, header = nil)

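Gets the data corresponding to the given column. If compute_formulas is true, the column specification is taken to be a formula (in the spirit of gnuplot's). Otherwise it is taken to be a column number, and a copy of that column is returned; an error is raised if the column does not exist.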

# File lib/ctioga2/data/backends/backends/text.rb, line 354
def get_data_column(column, compute_formulas = false, 
                    parameters = nil, header = nil)
  if compute_formulas
    formula = Utils::parse_formula(column, parameters, header)
    debug { "Using formula #{formula} for column spec: #{column}" }
    return Ruby.compute_formula(formula, 
                                @current_data,
                                @included_modules)
  else
    if @current_data[column.to_i]
      return @current_data[column.to_i].dup
    else
      raise "Cannot find column number #{column.to_i} -- maybe you got the column separator wrong ?"
    end
  end
end
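To give an idea of what gnuplot-style formulas mean here, the sketch below evaluates a spec such as $1 + $2 row by row over plain Ruby arrays. compute_column is hypothetical and much simpler than the Utils::parse_formula / Ruby.compute_formula pair used above. It assumes that column 0 holds the row index (as with fancy_read's index_col option) and that formulas come from a trusted source, since they are passed to eval.

```ruby
# Hypothetical sketch: evaluates a gnuplot-style column formula such as
# "$1 + 2 * $2" over columns stored as arrays (column 0 is the row index).
# The formula is passed to eval, so it must come from a trusted source.
def compute_column(formula, columns)
  # Rewrite every $n into row[n], giving a plain Ruby expression.
  expr = formula.gsub(/\$(\d+)/) { "row[#{Regexp.last_match(1)}]" }
  fn = eval("lambda { |row| #{expr} }")

  nrows = columns.first.length
  (0...nrows).map do |r|
    row = columns.map { |col| col[r] }   # values of every column in row r
    fn.call(row)
  end
end

columns = [[0, 1, 2], [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
compute_column("$1 + $2", columns)   # => [11.0, 22.0, 33.0]
compute_column("2 * $1", columns)    # => [2.0, 4.0, 6.0]
```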
get_io_object(file)

Returns an IO object from which data can be read for the given file, which can be one of the following:

  • a real file name

  • a compressed file name

  • a pipe command, written as the command followed by a | character

  • - (a single dash), meaning standard input.

# File lib/ctioga2/data/backends/backends/text.rb, line 160
def get_io_object(file)
  if file == "-"
    return $stdin
  elsif file =~ /(.*?)\|\s*$/ # A pipe
    return IO.popen($1)
  else
    return Utils::open(file)
  end
end
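The dispatch above can be sketched as follows. open_source is hypothetical, and the way compressed files are detected (a .gz extension, read with Zlib::GzipReader) is an assumption: this page does not say how Utils::open recognizes compressed files.

```ruby
require 'zlib'

# Hypothetical sketch of get_io_object's dispatch. Assumption: compressed
# files are recognized by a ".gz" extension (Utils::open may do otherwise).
def open_source(file)
  if file == "-"
    $stdin                               # "-" means standard input
  elsif (m = /\A(.*?)\|\s*\z/.match(file))
    IO.popen(m[1])                       # "command |" reads the command's output
  elsif file.end_with?(".gz")
    Zlib::GzipReader.open(file)          # decompress gzip data while reading
  else
    File.open(file)                      # plain file
  end
end
```

For instance, open_source("echo 1 2 |").read runs the shell command and returns its output. As in the original, the caller is responsible for closing the returned object.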
get_io_set(file)

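Returns an IO object corresponding to the given file. When data splitting is enabled (@split), the file name may end with #n to select the n-th set of the file (the first set if no suffix is given); only that set is returned, wrapped in a StringIO.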

# File lib/ctioga2/data/backends/backends/text.rb, line 209
def get_io_set(file)
  if not @split
    return get_io_object(file)
  else
    file =~ /(.*?)(?:#(\d+))?$/; # ; to make ruby-mode indent correctly.
    filename = $1
    if $2
      set = $2.to_i
    else
      set = 1
    end
    debug { "Trying to get set #{set} from file '#{filename}'" }
    str = get_set_string(get_io_object(filename), set)
    return StringIO.new(str)
  end
end
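When splitting is enabled, the file part of the spec is parsed as name#n. The hypothetical split_set_spec below isolates that step; like get_io_set, it defaults to the first set when no #n suffix is given.

```ruby
# Hypothetical helper mirroring the parsing step of get_io_set: splits
# "name#n" into the file name and the (1-based) set number.
def split_set_spec(spec)
  m = /\A(.*?)(?:#(\d+))?\z/.match(spec)   # always matches
  set = m[2] ? m[2].to_i : 1               # no "#n" suffix: first set
  [m[1], set]
end

split_set_spec("scan.dat#3")   # => ["scan.dat", 3]
split_set_spec("scan.dat")     # => ["scan.dat", 1]
```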
get_set_string(io, set)

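Returns a string containing the given set of the given IO object. Sets are runs of valid lines separated by one or more invalid lines (see InvalidLineRE); an empty string is returned if the set does not exist.

Sets are 1-based.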

# File lib/ctioga2/data/backends/backends/text.rb, line 180
def get_set_string(io, set)
  cur_set = 1
  last_line_is_invalid = true
  str = ""
  line_number = 0
  while line = io.gets
    line_number += 1
    if line =~ InvalidLineRE
      debug { "Found invalid line at #{line_number}" }
      if ! last_line_is_invalid
        # We begin a new set.
        cur_set += 1
        debug { "Found set #{cur_set} at line #{line_number}" }
        if(cur_set > set)
          return str
        end
      end
      last_line_is_invalid = true
    else
      last_line_is_invalid = false
      if cur_set == set
        str += line
      end
    end
  end
  return str
end
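The same splitting rule can be expressed with Enumerable#chunk, which drops the elements for which the block returns :_separator and groups consecutive valid lines together. Unlike the method above, this sketch reads the whole stream instead of stopping after the requested set. nth_set is hypothetical, and the invalid-line pattern is passed in as an assumed stand-in for InvalidLineRE.

```ruby
require 'stringio'

# Hypothetical sketch: returns set number +set+ (1-based) of +io+, where sets
# are runs of valid lines separated by lines matching +invalid_re+.
def nth_set(io, set, invalid_re)
  return "" if set < 1
  runs = io.each_line.chunk { |line| line =~ invalid_re ? :_separator : true }
  blocks = runs.map { |_key, lines| lines.join }
  blocks[set - 1] || ""     # "" when the set does not exist, as in the original
end

invalid = /\A\s*(?:\z|[^\d+\-.\s])/   # assumed rule, standing in for InvalidLineRE
text = "# header\n1 2\n3 4\n\n5 6\n# note\n7 8\n"
nth_set(StringIO.new(text), 2, invalid)   # => "5 6\n"
```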
param_regex=(val)

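A writer for @param_regex. val can be a Regexp, which is used as-is; a string containing capturing groups, which is turned into a Regexp; or any other string, which is treated as the separator between a parameter name and its value.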

# File lib/ctioga2/data/backends/backends/text.rb, line 228
def param_regex=(val)
  if val.is_a? Regexp
    @param_regex = val
  elsif val =~ /([^\\]|^)\(/     # Has capturing groups
    @param_regex = /#{val}/
  else                  # Treat as separator
    @param_regex = /(\S+)\s*#{val}\s*(\S+)/
  end
end
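The three cases accepted by the writer can be checked in isolation. build_param_regex is a hypothetical free-standing copy of the logic above. As in the original, a separator string is interpolated into the pattern unescaped, so a separator containing regular-expression metacharacters would need escaping.

```ruby
# Hypothetical copy of the param_regex= logic, returning the Regexp.
def build_param_regex(val)
  if val.is_a?(Regexp)
    val                          # used as-is
  elsif val =~ /([^\\]|^)\(/
    /#{val}/                     # unescaped "(": the string is a regexp source
  else
    /(\S+)\s*#{val}\s*(\S+)/     # anything else: name <separator> value
  end
end

build_param_regex("=").match("# T = 300").captures            # => ["T", "300"]
build_param_regex('(\w+):(\S+)').match("# P:1e5").captures    # => ["P", "1e5"]
```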
parse_header_line(comments)

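Turns an array of comments into a hash mapping column name to column number (1-based), using the first comment line that matches @header_line_regex. Returns an empty hash if there is no such line.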

# File lib/ctioga2/data/backends/backends/text.rb, line 251
def parse_header_line(comments)
  for line in comments
    if line =~ @header_line_regex
      colnames = line.gsub(@header_line_regex,'').split(@separator)
      i = 1
      ret = {}
      for n in colnames
        ret[n] = i
        i += 1
      end
      return ret
    end
  end
  return {}
end
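A compact sketch of the same mapping, written with Enumerable#find and each_with_index. header_columns is hypothetical; its default patterns mirror @header_line_regex and @separator as set in initialize. Numbering starts at 1, presumably because column 0 of the data holds the index added by fancy_read's index_col option.

```ruby
# Hypothetical sketch: builds { column name => 1-based column number } from
# the first comment line that looks like a "## name1 name2 ..." header.
def header_columns(comments, header_re = /^\#\#\s*/, separator = /\s+/)
  line = comments.find { |l| l =~ header_re }
  return {} unless line

  names = line.sub(header_re, '').split(separator)
  names.each_with_index.map { |name, i| [name, i + 1] }.to_h
end

header_columns(["# run 12", "## time volt current\n"])
# => {"time"=>1, "volt"=>2, "current"=>3}
```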
parse_parameters(comments)

Turns an array of comments into a hash parameter name -> value, using @param_regex. Values are converted to floats.

# File lib/ctioga2/data/backends/backends/text.rb, line 239
def parse_parameters(comments)
  ret = {}
  for line in comments
    if line =~ @param_regex
      ret[$1] = $2.to_f
    end
  end
  return ret
end
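Applied to a few comment lines, with the regular expression that param_regex= builds from the separator =, the extraction works as below. extract_parameters is a hypothetical free-standing version of the method above that takes the regular expression as an argument and uses Regexp#match instead of $1 and $2.

```ruby
# Hypothetical free-standing version of parse_parameters: collects
# name => float value pairs from the comment lines matching param_regex.
def extract_parameters(comments, param_regex)
  comments.each_with_object({}) do |line, params|
    m = param_regex.match(line)
    params[m[1]] = m[2].to_f if m   # lines that do not match are ignored
  end
end

sep_regex = /(\S+)\s*=\s*(\S+)/    # what param_regex= builds from "="
extract_parameters(["# T = 300", "# note: hi", "# P=1.5e5"], sep_regex)
# => {"T"=>300.0, "P"=>150000.0}
```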
query_dataset(set)

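This is called by the architecture to get the data. It splits the set name into filename@cols, reads the file if necessary, and calls get_data_column for each column. Without an @cols part, @default_column_spec is used; with an empty file name, the last file read is reused. A column spec containing a $ is treated as a formula.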

# File lib/ctioga2/data/backends/backends/text.rb, line 325
def query_dataset(set)
  if set =~ /(.*)@(.*)/
    col_spec = $2
    file = $1
  else
    col_spec = @default_column_spec
    file = set
  end
  if file.length > 0
    @current_data = read_file(file)
    @current = file
  end

  # Whether we need to compute formulas:
  if col_spec =~ /\$/
    compute_formulas = true
  else
    compute_formulas = false
  end
  
  return Dataset.dataset_from_spec(set, col_spec) do |col|
    get_data_column(col, compute_formulas, 
                    @current_parameters, @current_header)
  end
end
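The spec handling at the start of query_dataset can be summarized by the hypothetical split_dataset_spec below, which returns the file part, the column part, and whether formulas must be computed. An empty file part, as in @1:3, is what makes query_dataset reuse the last file read. Since the pattern is greedy, the split happens at the last @.

```ruby
# Hypothetical helper mirroring the first steps of query_dataset.
def split_dataset_spec(set, default_cols = "1:2")
  if (m = /(.*)@(.*)/.match(set))
    file, cols = m[1], m[2]
  else
    file, cols = set, default_cols   # no "@": use the default column spec
  end
  [file, cols, cols.include?("$")]   # a "$" means the columns are formulas
end

split_dataset_spec("run.dat@1:$2*2")   # => ["run.dat", "1:$2*2", true]
split_dataset_spec("run.dat")          # => ["run.dat", "1:2", false]
split_dataset_spec("@1:3")             # => ["", "1:3", false]
```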
read_file(file)

Reads data from a file. If needed, extracts the file name from the column specification.

todo The cache really should include things such as the time of last modification and the various parameters that influence how the file is read, as well as the parameters read from the file using parse_parameters.

todo There should be real global handling of the metadata extracted from files, so that it could be included, for instance, in the automatic labels (and we could have fun improving this one?).

todo There should be a way to read pure text columns and use them somehow, for instance to annotate the output. This should be implemented at the Tioga level, though (both for reading, in fancy_read, and for using hover stuff).

warning This needs Tioga r561

# File lib/ctioga2/data/backends/backends/text.rb, line 286
def read_file(file)
  if file =~ /(.*)@.*/
    file = $1
  end
  name = file               # As file will be modified.
  if ! @cache.key?(file)    # Read the file if it is not cached.
    comments = []
    fancy_read_options = {'index_col' => true,
      'skip_first' => @skip,
      'sep' => @separator,
      'comment_out' => comments
    }
    io_set = get_io_set(file)
    debug { "Fancy read '#{file}', options #{fancy_read_options.inspect}" }
    @cache[name] = Dvector.fancy_read(io_set, nil, fancy_read_options)
    if @param_regex
      # Now parsing params
      @param_cache[name] = parse_parameters(comments)
      info { "Read #{@param_cache[name].size} parameters from #{name}" }
      debug { "Parameters read: #{@param_cache[name].inspect}" }
    end
    if @header_line_regex
      @headers_cache[name] = parse_header_line(comments)
      info { "Read #{@headers_cache[name].size} column names from #{name}" }
      debug { "Got: #{@headers_cache[name].inspect}" }
    end
  end
  ## @todo These are not very satisfying; ideally, the data
  ## information should be embedded into @cache[name] rather
  ## than as external variables. Well...
  @current_parameters = @param_cache[name]
  @current_header = @headers_cache[name]
  return @cache[name]
end
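The caching behaviour of read_file boils down to the pattern below: the @columns part is stripped from the spec, and the loader runs only once per file name. CachedReader and its loader block are hypothetical. As the todo notes point out, such a cache ignores modification times and reading options, so a file changed on disk is not re-read.

```ruby
# Hypothetical sketch of read_file's caching: one load per file name.
class CachedReader
  attr_reader :reads

  def initialize(&loader)
    @loader = loader   # block that actually loads a file name
    @cache = {}        # file name -> loaded data
    @reads = 0         # how many times the loader really ran
  end

  def read_file(spec)
    name = spec.sub(/@[^@]*\z/, '')   # drop the part after the last "@"
    unless @cache.key?(name)
      @reads += 1
      @cache[name] = @loader.call(name)
    end
    @cache[name]
  end
end

reader = CachedReader.new { |name| "data of #{name}" }
reader.read_file("run.dat@1:2")   # => "data of run.dat"
reader.read_file("run.dat@1:3")   # served from the cache
reader.reads                      # => 1
```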