class Myasorubka::MSD

MSD is a morphosyntactic descriptor model.

This representation, with the concrete applications which display and exemplify the attributes and values and provide their internal constraints and relationships, makes the proposal self-explanatory. Other groups can easily test the specifications on their own language simply by following the method of the applications. The possibility of incorporating idiosyncratic classes and distinctions after the common core features makes the proposal relatively adaptable and flexible without compromising compatibility.

MSD implementation and documentation are based on MULTEXT-East Morphosyntactic Specifications, Version 4: nl.ijs.si/ME/V4/msd/html/msd.html

You may use Myasorubka::MSD either as a parser or as a generator.

```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian)
msd[:pos] = :noun
msd[:type] = :common
msd[:number] = :plural
msd[:case] = :locative
msd.to_s # => "Nc-pl"
```

```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian, 'Vmps-snpfel')
msd[:pos]     # => :verb
msd[:tense]   # => :past
msd[:person]  # => nil
msd.grammemes # => {:type=>:main, :vform=>:participle, …}
```

Constants

EMPTY_DESCRIPTOR

Empty descriptor character, used for attribute positions that have no value (the `-` in `Nc-pl`).

Attributes

grammemes[R]
language[R]
pos[RW]

Public Class Methods

new(language, msd = '')

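Creates a new morphosyntactic descriptor model instance. The `language` module must define a `CATEGORIES` constant; otherwise `ArgumentError` is raised.

Optionally, pass an MSD string as the `msd` argument to parse it into the new instance. A string that cannot be parsed raises `InvalidDescriptor`.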

@param language [Myasorubka::MSD::Language] the language to use.
@param msd [String] an MSD string to parse into the new instance.

# File lib/myasorubka/msd.rb, line 63
def initialize(language, msd = '')
  @language, @pos, @grammemes = language, nil, {}

  unless language.const_defined? 'CATEGORIES'
    raise ArgumentError,
      'given language has no morphosyntactic descriptions'
  end

  parse! msd if msd && !msd.empty?
end
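The constructor only checks that `language` defines `CATEGORIES`; the shape of that table is left implicit. The sketch below uses a hypothetical `ToyLanguage` module whose layout (a `:code` string plus an ordered `:attrs` list of `[attribute, { value => code }]` pairs) is inferred from how `to_s` and `parse!` read it; the real `Myasorubka::MSD::Russian` table is larger and may differ in detail. The `describable?` helper is likewise hypothetical and only mirrors the `const_defined?` guard.

```ruby
# Hypothetical language module. The layout is inferred from how MSD#to_s
# and MSD#parse! read CATEGORIES; real language modules may differ.
module ToyLanguage
  CATEGORIES = {
    noun: {
      code: 'N',
      attrs: [
        [:type,   { common: 'c', proper: 'p' }],
        [:gender, { masculine: 'm', feminine: 'f', neuter: 'n' }],
        [:number, { singular: 's', plural: 'p' }],
        [:case,   { nominative: 'n', locative: 'l' }]
      ]
    }
  }.freeze
end

# A module without CATEGORIES; MSD.new would reject it with ArgumentError.
module EmptyLanguage; end

# Mirrors the guard at the top of MSD#initialize.
def describable?(language)
  language.const_defined?('CATEGORIES')
end

p describable?(ToyLanguage)   # prints true
p describable?(EmptyLanguage) # prints false
```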

Public Instance Methods

<=>(other)

@private

# File lib/myasorubka/msd.rb, line 104
def <=> other
  to_s <=> other.to_s
end

==(other)

@private

# File lib/myasorubka/msd.rb, line 109
def == other
  to_s == other.to_s
end
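`<=>` and `==` compare descriptors through their string form, so MSDs sort lexicographically by tag and an MSD can be compared directly with a tag string. The hypothetical `Tag` class below keeps only a string and defines the two operators the same way, which is enough to show both effects.

```ruby
# Hypothetical stand-in for MSD that stores only the tag string; its
# <=> and == compare through to_s like MSD#<=> and MSD#==.
class Tag
  def initialize(code)
    @code = code
  end

  def to_s
    @code
  end

  def <=>(other)
    to_s <=> other.to_s
  end

  def ==(other)
    to_s == other.to_s
  end
end

tags = %w[Vmp Nc-pl Afp].map { |code| Tag.new(code) }
p tags.sort.map(&:to_s)       # prints ["Afp", "Nc-pl", "Vmp"]
p Tag.new('Nc-pl') == 'Nc-pl' # prints true
p 'Nc-pl' == Tag.new('Nc-pl') # prints false
```

In this sketch the comparison is one-sided: `String#==` does not call `to_s` on an object lacking `to_str`, so the tag object must be on the left.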

[](key)

Retrieves the value stored under `key`: the part of speech when `key` is `:pos`, otherwise the grammeme for that attribute. Returns `nil` if no value is set.

@param key [Symbol] the attribute to look up.
@return [Symbol, nil] the value of `key`, or `nil` if it is not set.

# File lib/myasorubka/msd.rb, line 80
def [] key
  return pos if :pos == key
  grammemes[key]
end

[]=(key, value)

Assigns `value` to the attribute given by `key`. Setting `:pos` changes the part of speech; any other key requires the part of speech to be set first, otherwise `InvalidDescriptor` is raised.

@param key [Symbol] the attribute to set.
@param value [Symbol] the value to assign.
@return [Symbol] the assigned value.

# File lib/myasorubka/msd.rb, line 92
def []= key, value
  return @pos = value if :pos == key
  raise InvalidDescriptor, 'category is not set yet' unless pos
  grammemes[key] = value
end
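`[]` and `[]=` route `:pos` separately from the other grammemes, and `[]=` refuses any grammeme until a part of speech exists. The self-contained sketch below reproduces that routing in a hypothetical `DescriptorSketch` class, with a local `InvalidDescriptor` stand-in, so the ordering constraint can be tried without the gem.

```ruby
# Local stand-in for Myasorubka's InvalidDescriptor.
class InvalidDescriptor < StandardError; end

# Hypothetical minimal class reproducing the key routing of MSD#[] and
# MSD#[]=: :pos is kept apart from the grammemes hash.
class DescriptorSketch
  attr_accessor :pos
  attr_reader :grammemes

  def initialize
    @pos = nil
    @grammemes = {}
  end

  def [](key)
    return pos if key == :pos
    grammemes[key]
  end

  def []=(key, value)
    return @pos = value if key == :pos
    raise InvalidDescriptor, 'category is not set yet' unless pos
    grammemes[key] = value
  end
end

d = DescriptorSketch.new
begin
  d[:number] = :plural # no part of speech yet
rescue InvalidDescriptor => e
  puts "rejected: #{e.message}" # prints rejected: category is not set yet
end

d[:pos] = :noun
d[:number] = :plural
p d[:pos], d[:number], d[:case] # prints :noun, :plural and nil, one per line
```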

inspect()

@private

# File lib/myasorubka/msd.rb, line 99
def inspect
  '#<%s msd=%s>' % [language.name, to_s.inspect]
end

merge!(hash)

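Merges the grammemes stored in `hash` into this MSD, converting keys and values to symbols. Pairs are assigned in order through `[]=`, so unless the part of speech is already set, a `:pos` entry must come before the other entries; otherwise `InvalidDescriptor` is raised.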

@param hash [Hash<Symbol, Symbol>] a hash to be processed.
@return [MSD] self.

# File lib/myasorubka/msd.rb, line 140
def merge! hash
  hash.each do |key, value|
    self[key.to_sym] = value.to_sym
  end

  self
end
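A caller holding string-keyed data, such as a decoded JSON row, may not control key order, and `merge!` assigns pairs in order. The hypothetical helper `pos_first` below, which is not part of Myasorubka, symbolizes such a hash the way `merge!` does and moves `:pos` to the front before merging.

```ruby
# Hypothetical helper: symbolizes keys and values as merge! does, then
# moves :pos to the front so the part of speech is assigned first.
def pos_first(hash)
  symbolized = hash.to_h { |key, value| [key.to_sym, value.to_sym] }
  pos, rest = symbolized.partition { |key, _| key == :pos } # keeps order
  (pos + rest).to_h
end

# Hypothetical string-keyed input with :pos in the middle.
row = { 'number' => 'plural', 'pos' => 'noun', 'type' => 'common' }
p pos_first(row).keys # prints [:pos, :number, :type]
```

With the real class, `msd.merge!(pos_first(row))` on a fresh MSD would then set the part of speech before `:number` and `:type`.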

prune!()

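Drops every grammeme whose attribute is not defined for the current category, or whose value has no code under that attribute. If the part of speech itself is unknown, clears both the part of speech and all grammemes.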

@return [MSD] self.

# File lib/myasorubka/msd.rb, line 192
def prune!
  unless category = language::CATEGORIES[pos]
    self.pos = nil
    grammemes.clear
    return self
  end

  attributes = category[:attrs]

  grammemes.reject! do |attribute, value|
    if index = attributes.index { |name, _| name == attribute }
      _, values = attributes[index]
      !values[value]
    else
      true
    end
  end

  self
end
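`prune!` keeps a grammeme only if its attribute is listed for the category and its value has a code under that attribute. The sketch below applies the same rule to a plain hash; `ATTRS` is a hypothetical two-attribute excerpt in the assumed `:attrs` layout, and `prune` is an illustrative helper, not part of Myasorubka.

```ruby
# Hypothetical excerpt of a category's :attrs list.
ATTRS = [
  [:type,   { common: 'c', proper: 'p' }],
  [:number, { singular: 's', plural: 'p' }]
].freeze

# Same filtering rule as MSD#prune!, returning a new hash instead of
# mutating: drop unknown attributes and values that have no code.
def prune(grammemes, attributes)
  grammemes.reject do |attribute, value|
    pair = attributes.find { |name, _| name == attribute }
    pair.nil? || !pair[1][value]
  end
end

# :dual has no code and :tense is not an attribute here, so only the
# :type grammeme survives.
p prune({ type: :common, number: :dual, tense: :past }, ATTRS)
```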

to_regexp()

Generates a Regexp from the MSD, which is useful for database queries: each empty descriptor position matches any character, and any trailing characters are accepted.

```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian, 'Vm')
r = msd.to_regexp # => /^Vm.*$/
'Vmp' =~ r        # => 0
'Nc-pl' =~ r      # => nil
```

@return [Regexp] the correspondent regular expression.

# File lib/myasorubka/msd.rb, line 125
def to_regexp
  Regexp.new([
    '^',
    self.to_s.gsub(EMPTY_DESCRIPTOR, '.'),
    '.*',
    '$'
  ].join)
end
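Because empty descriptors turn into `.`, a partially specified MSD acts as a pattern over full tags. The sketch below rebuilds the same expression from a plain string with a hypothetical `msd_regexp` helper and uses it to filter a hypothetical list of tags; the value `-` for `EMPTY_DESCRIPTOR` is assumed from examples such as `Nc-pl`.

```ruby
EMPTY = '-' # assumed value of EMPTY_DESCRIPTOR, as in "Nc-pl"

# Same construction as MSD#to_regexp, applied to a plain MSD string.
def msd_regexp(msd)
  Regexp.new(['^', msd.gsub(EMPTY, '.'), '.*', '$'].join)
end

# Hypothetical tag column, e.g. loaded from a lexicon table.
tags = %w[Ncmpn Ncfpl Ncmsn Vmp Afp]

r = msd_regexp('Nc-p') # common noun, any gender, plural
p r                    # prints /^Nc.p.*$/
p tags.grep(r)         # prints ["Ncmpn", "Ncfpl"]
```

In Ruby, `^` and `$` anchor at line boundaries, so a value containing newlines could match in the middle; `\A` and `\z` would anchor to the whole string.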

to_s()

@private

# File lib/myasorubka/msd.rb, line 149
def to_s
  return '' unless pos

  unless category = language::CATEGORIES[pos]
    raise InvalidDescriptor, "category is nil"
  end

  attributes = category[:attrs]
  msd = [category[:code]]

  grammemes.each do |attribute, value|
    next unless value

    unless index = attributes.index { |name, _| name == attribute }
      raise InvalidDescriptor, 'no such attribute "%s" of category "%s"' % [attribute, pos]
    end

    _, values = attributes[index]

    unless attribute_value = values[value]
      raise InvalidDescriptor, 'no such value "%s" for attribute "%s" of category "%s"' % [value, attribute, pos]
    end

    msd[index + 1] = attribute_value
  end

  msd.map { |e| e || EMPTY_DESCRIPTOR }.join
end
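`to_s` serializes positionally: slot 0 holds the category code and slot `i + 1` holds the code of the `i`-th attribute, with unset slots filled by the empty descriptor and nothing emitted after the last set attribute. The sketch below reproduces that scheme with a hypothetical `NOUN` excerpt of a noun category and raises `ArgumentError` where the real method raises `InvalidDescriptor`.

```ruby
# Hypothetical excerpt of a noun category in the layout MSD#to_s reads.
NOUN = {
  code: 'N',
  attrs: [
    [:type,   { common: 'c', proper: 'p' }],
    [:gender, { masculine: 'm', feminine: 'f', neuter: 'n' }],
    [:number, { singular: 's', plural: 'p' }],
    [:case,   { nominative: 'n', locative: 'l' }]
  ]
}.freeze

# Positional encoding as in MSD#to_s; ArgumentError stands in for
# Myasorubka's InvalidDescriptor.
def encode(category, grammemes)
  slots = [category[:code]]
  grammemes.each do |attribute, value|
    next unless value # nil values are skipped, as in MSD#to_s

    index = category[:attrs].index { |name, _| name == attribute }
    raise ArgumentError, "no such attribute #{attribute}" unless index

    code = category[:attrs][index][1][value]
    raise ArgumentError, "no such value #{value} for #{attribute}" unless code

    slots[index + 1] = code # slot 0 is the category code
  end
  # Gaps before the last set slot become '-'; nothing trails it.
  slots.map { |slot| slot || '-' }.join
end

puts encode(NOUN, { type: :common, number: :plural, case: :locative }) # prints Nc-pl
```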

valid?()

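Validates the MSD instance by serializing it with `to_s`; an `InvalidDescriptor` error yields `false`. An MSD without a part of speech serializes to an empty string and is therefore considered valid.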

@return [true, false] validation state of the MSD instance.

# File lib/myasorubka/msd.rb, line 182
def valid?
  !!to_s
rescue InvalidDescriptor
  false
end
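`valid?` uses serialization as its validity check and converts only `InvalidDescriptor` into `false`, so any other exception still propagates. The sketch below shows the same rescue-to-boolean pattern on a hypothetical single-attribute code table, with a local `InvalidDescriptor` stand-in.

```ruby
# Local stand-in for Myasorubka's InvalidDescriptor.
class InvalidDescriptor < StandardError; end

# Hypothetical code table for a single attribute.
TYPE_CODES = { common: 'c', proper: 'p' }.freeze

# Stand-in for MSD#to_s: raises when a value has no code.
def strict_code(value)
  TYPE_CODES.fetch(value) { raise InvalidDescriptor, "no code for #{value}" }
end

# Same pattern as MSD#valid?: try to serialize and map the
# domain-specific error to false; other exceptions propagate.
def valid_value?(value)
  !!strict_code(value)
rescue InvalidDescriptor
  false
end

p valid_value?(:common) # prints true
p valid_value?(:dual)   # prints false
```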

Protected Instance Methods

parse!(msd_line)

@private

# File lib/myasorubka/msd.rb, line 215
def parse! msd_line
  msd = msd_line.chars.to_a

  category_code = msd.shift

  @pos, category = language::CATEGORIES.find do |name, candidate|
    candidate[:code] == category_code
  end

  raise InvalidDescriptor, msd_line unless @pos

  attrs = category[:attrs]

  msd.each_with_index do |value_code, i|
    attr_name, values = attrs[i]
    raise InvalidDescriptor, msd_line unless attr_name

    next if :blank == attr_name
    next if EMPTY_DESCRIPTOR == value_code

    attribute = values.find { |name, code| code == value_code }
    raise InvalidDescriptor, msd_line unless attribute

    self[attr_name] = attribute.first
  end
end
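`parse!` is the inverse of `to_s`: the first character selects the category, each following character is looked up against the attribute at the same position, `-` leaves that attribute unset, and `:blank` positions are skipped. The sketch below is a standalone `decode` function over a hypothetical `TOY_CATEGORIES` table; it returns the part of speech and grammemes instead of setting them on an object, and raises `ArgumentError` in place of `InvalidDescriptor`.

```ruby
# Hypothetical table in the layout MSD#parse! reads.
TOY_CATEGORIES = {
  noun: {
    code: 'N',
    attrs: [
      [:type,   { common: 'c', proper: 'p' }],
      [:gender, { masculine: 'm', feminine: 'f', neuter: 'n' }],
      [:number, { singular: 's', plural: 'p' }],
      [:case,   { nominative: 'n', locative: 'l' }]
    ]
  }
}.freeze

# Standalone decoder following MSD#parse!; ArgumentError stands in
# for Myasorubka's InvalidDescriptor.
def decode(line, categories)
  code, *rest = line.chars
  pos, category = categories.find { |_, candidate| candidate[:code] == code }
  raise ArgumentError, line unless pos

  grammemes = {}
  rest.each_with_index do |value_code, i|
    name, values = category[:attrs][i]
    raise ArgumentError, line unless name # more characters than attributes
    next if name == :blank                # placeholder position
    next if value_code == '-'             # empty descriptor: leave unset

    value, _code = values.find { |_, c| c == value_code }
    raise ArgumentError, line unless value
    grammemes[name] = value
  end
  [pos, grammemes]
end

pos, grammemes = decode('Nc-pl', TOY_CATEGORIES)
p pos                # prints :noun
p grammemes[:number] # prints :plural
```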