class WordNet::Synset

Represents a synset (a group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!) relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a hyponym of y <=> x is a child of y).

Attributes

gloss[R]

Get a string representation of this synset's gloss. The “gloss” is a human-readable description of this concept, often with example usage, e.g.:

move upward; "The fog lifted"; "The smoke arose from the forest fire"; "The mist uprose from the meadows"

for the second sense of the verb “fall”

lex_filenum[R]

A two-digit decimal integer identifying the lexicographer file that contains the synset for the sense. Probably only of interest if you're using a WordNet database marked up with custom attributes and you want to ensure that you're using your own additions.

pos[R]

Get a shorthand representation of the part of speech this synset represents, e.g. “v” for verbs.

pos_offset[R]

Get the offset, in bytes, at which this synset's POS information is stored in WordNet's internal DB. You almost certainly don't care about this.

synset_offset[R]

Get the offset, in bytes, at which this synset's information is stored in WordNet's internal DB. You almost certainly don't care about this.

synset_type[R]

Get the part-of-speech type of this synset. One of 'n' (noun), 'v' (verb), 'a' (adjective), or 'r' (adverb).

word_counts[R]

Get the list of words (and their frequencies within the WordNet graph) contained in this Synset.

Public Class Methods

_apply_rules(forms, pos)
# File lib/rwordnet/synset.rb, line 109
def self._apply_rules(forms, pos)
    substitutions = MORPHOLOGICAL_SUBSTITUTIONS[pos]
    out = []
    forms.each do |form|
        substitutions.each do |old, new|
            if form.end_with? old
                out.push form[0...-old.length] + new
            end
        end
    end
    return out
end
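
The suffix-rewriting step can be tried in isolation. This is a minimal sketch, not the library's code: `sample_substitutions` is a hypothetical stand-in for MORPHOLOGICAL_SUBSTITUTIONS (whose real contents are not shown on this page), and `apply_rules` is an illustrative name.

```ruby
# Hypothetical rule table: each pair is [old_suffix, new_suffix].
# The library's real MORPHOLOGICAL_SUBSTITUTIONS may contain different rules.
def sample_substitutions
  {
    "noun" => [["s", ""], ["ses", "s"], ["ies", "y"]],
    "verb" => [["s", ""], ["ies", "y"], ["ed", "e"], ["ed", ""], ["ing", ""]]
  }
end

# For every form, apply every rule whose old suffix matches the end of the
# form, replacing that suffix with the new one. Nothing is validated here.
def apply_rules(forms, pos, table = sample_substitutions)
  rules = table.fetch(pos)
  forms.flat_map do |form|
    rules.select { |old, _new| form.end_with?(old) }
         .map { |old, new| form[0...-old.length] + new }
  end
end

p apply_rules(["flies"], "verb")
```

The result contains every candidate, including non-words; discarding candidates that are not in the database is a separate step (see _filter_forms).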
_filter_forms(forms, pos)
# File lib/rwordnet/synset.rb, line 122
def self._filter_forms(forms, pos)
    forms.reject{|form| Lemma.find(form, pos).nil?}.uniq
end
find(word, pos)

Ported from Python's NLTK. Load all synsets with a given lemma and part-of-speech tag. The word is downcased and reduced to its base forms with morphy before the lookup. This port requires pos and takes no lang argument; to load synsets for all parts of speech, use find_all.

# File lib/rwordnet/synset.rb, line 89
def self.find(word, pos)
    word = word.downcase
    lemmas = self.morphy(word, pos).map{|form| WordNet::Lemma.find(form, pos)}
    lemmas.map{|lemma| lemma.synsets}.flatten
end
find_all(word)
# File lib/rwordnet/synset.rb, line 95
def self.find_all(word)
    SYNSET_TYPES.values.map{|pos| self.find(word, pos)}.flatten
end
load_exception_map()
# File lib/rwordnet/synset.rb, line 99
def self.load_exception_map
    SYNSET_TYPES.each do |_, pos|
        @exception_map[pos] = {}
        File.open(File.join(@morphy_path, 'exceptions', "#{pos}.exc"), 'r').each_line do |line|
            line = line.split
            @exception_map[pos][line[0]] = line[1..-1]
        end
    end
end
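
The per-line parsing in the loop above can be sketched without touching the file system. This assumes, as the code suggests, that each line holds an inflected form followed by one or more base forms separated by whitespace; `parse_exceptions` and the sample text are illustrative, not part of the library.

```ruby
# Build a map from inflected form to its list of base forms.
# Unlike the listing above, blank lines are skipped here.
def parse_exceptions(text)
  text.each_line.with_object({}) do |line, map|
    inflected, *bases = line.split
    map[inflected] = bases unless inflected.nil?
  end
end

# Sample content written for illustration only.
sample = "geese goose\nmice mouse\n"
p parse_exceptions(sample)
```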
morphy(form, pos)

Ported from Python's NLTK (from jordanbg). The exception lists are checked first; otherwise, given an original string x:

  1. Apply rules once to the input to get y1, y2, y3, etc.

  2. Return all that are in the database

  3. If there are no matches, keep applying rules until you either find a match or you can't go any further

# File lib/rwordnet/synset.rb, line 133
def self.morphy(form, pos)
    if @exception_map == {}
        self.load_exception_map
    end
    exceptions = @exception_map[pos]

    # 0. Check the exception lists
    if exceptions.has_key? form
        return self._filter_forms([form] + exceptions[form], pos)
    end

    # 1. Apply rules once to the input to get y1, y2, y3, etc.
    forms = self._apply_rules([form], pos)

    # 2. Return all that are in the database (and check the original too)
    results = self._filter_forms([form] + forms, pos)
    if results != []
        return results
    end

    # 3. If there are no matches, keep applying rules until we find a match
    while forms.length > 0
        forms = self._apply_rules(forms, pos)
        results = self._filter_forms(forms, pos)
        if results != []
            return results
        end
    end

    # Return an empty list if we can't find anything
    return []
end
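
The control flow of morphy can be followed end to end on toy data. The sketch below is an assumption-laden stand-in: the lexicon, exception list, and rule table are invented, a single rule list replaces the per-POS tables, and all `toy_*` names are hypothetical.

```ruby
# Invented data standing in for the WordNet database and rule tables.
def toy_lexicon
  ["fly", "goose", "box"]
end

def toy_exceptions
  { "geese" => ["goose"] }
end

def toy_rules
  [["s", ""], ["es", ""], ["ies", "y"]]
end

# One pass of suffix rewriting over all candidate forms.
def toy_apply_rules(forms)
  forms.flat_map do |form|
    toy_rules.select { |old, _new| form.end_with?(old) }
             .map { |old, new| form[0...-old.length] + new }
  end
end

# Keep only forms that exist in the lexicon, without duplicates.
def toy_filter(forms)
  forms.select { |form| toy_lexicon.include?(form) }.uniq
end

def toy_morphy(form)
  # 0. A hit in the exception list short-circuits everything else.
  return toy_filter([form] + toy_exceptions[form]) if toy_exceptions.key?(form)

  # 1-2. Apply the rules once and keep known forms (original included).
  forms = toy_apply_rules([form])
  results = toy_filter([form] + forms)
  return results unless results.empty?

  # 3. Keep rewriting until something matches or no candidates remain.
  until forms.empty?
    forms = toy_apply_rules(forms)
    results = toy_filter(forms)
    return results unless results.empty?
  end
  []
end

p toy_morphy("flies")
```

The loop in step 3 ends because every toy rule shortens the form it rewrites, so the candidate list eventually becomes empty.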
morphy_all(form)
# File lib/rwordnet/synset.rb, line 166
def self.morphy_all(form)
    SYNSET_TYPES.values.map{|pos| self.morphy(form, pos)}.flatten
end
new(pos, offset)

Create a new synset by reading from the data file specified by pos, at offset bytes into the file. This is how the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.

# File lib/rwordnet/synset.rb, line 51
def initialize(pos, offset)
  data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
    f.seek(offset)
    f.readline.strip
  end

  info_line, @gloss = data_line.split(" | ", 2)
  line = info_line.split(" ")

  @pos = pos
  @pos_offset = offset
  @synset_offset = line.shift
  @lex_filenum = line.shift
  @synset_type = line.shift

  @word_counts = {}
  word_count = line.shift.to_i
  word_count.times do
    @word_counts[line.shift] = line.shift.to_i
  end

  pointer_count = line.shift.to_i
  @pointers = Array.new(pointer_count).map do
    Pointer.new(
      symbol: line.shift[0],
      offset: line.shift.to_i,
      pos: line.shift,
      source: line.shift
    )
  end
end
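
To see how the constructor consumes a data line token by token, here is a sketch that parses a string instead of reading the database. The sample line and all its values are invented for illustration, `parse_data_line` is a hypothetical helper, and pointers are returned as plain hashes rather than Pointer objects.

```ruby
# Parse one line laid out the way the constructor above reads it:
#   offset lex_filenum type word_count (word number)* pointer_count
#   (symbol offset pos source)* | gloss
def parse_data_line(data_line)
  info, gloss = data_line.split(" | ", 2)
  tokens = info.split(" ")

  parsed = {
    synset_offset: tokens.shift,
    lex_filenum: tokens.shift,
    synset_type: tokens.shift,
    gloss: gloss,
    word_counts: {}
  }

  # Each word is followed by one numeric field.
  tokens.shift.to_i.times do
    word = tokens.shift
    parsed[:word_counts][word] = tokens.shift.to_i
  end

  # Each pointer is four tokens; only the first character of the symbol is kept.
  parsed[:pointers] = Array.new(tokens.shift.to_i) do
    symbol = tokens.shift[0]
    offset = tokens.shift.to_i
    pos = tokens.shift
    source = tokens.shift
    { symbol: symbol, offset: offset, pos: pos, source: source }
  end
  parsed
end

# Made-up line; the offsets do not refer to real WordNet entries.
sample = "00000100 29 v 02 rise 0 lift 1 002 ! 00000200 v 0101 @ 00000300 v 0000 | move upward"
p parse_data_line(sample)
```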

Public Instance Methods

antonyms()

Get the Synsets that are antonyms of this sense.

# File lib/rwordnet/synset.rb, line 192
def antonyms
  relation(ANTONYM)
end
children()
Alias for: hyponyms
expanded_first_hypernyms()

Get the chain of first hypernyms (this synset's first hypernym, then that synset's first hypernym, and so on all the way up to entity) as an array.

# File lib/rwordnet/synset.rb, line 213
def expanded_first_hypernyms
  parent = hypernym
  list = []
  return list unless parent

  while parent
    break if list.include? parent.pos_offset
    list.push parent.pos_offset
    parent = parent.hypernym
  end

  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end
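
The walk above follows only the first parent at each level and stops when it runs out of parents or meets one it has already seen. A small sketch on an invented graph, using names instead of offsets (`first_hypernym_chain` and the graph contents are hypothetical):

```ruby
# Follow the first parent repeatedly, guarding against cycles.
def first_hypernym_chain(start, graph)
  chain = []
  current = graph.fetch(start, []).first
  while current && !chain.include?(current)
    chain << current
    current = graph.fetch(current, []).first
  end
  chain
end

# Invented parent links: each key maps to its list of hypernyms.
graph = {
  "apple" => ["edible_fruit", "pome"],
  "edible_fruit" => ["fruit"],
  "pome" => ["fruit"],
  "fruit" => ["entity"],
  "entity" => []
}
p first_hypernym_chain("apple", graph)
```

Note that "pome" never appears in the result, because only the first hypernym of each synset is followed.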
expanded_hypernyms()

Get the entire hypernym tree (every hypernym of this synset and of its ancestors, all the way up to entity) as an array.

# File lib/rwordnet/synset.rb, line 229
def expanded_hypernyms
  parents = hypernyms
  list = []
  return list unless parents

  while parents.length > 0
    parent = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    parents.push *parent.hypernyms
  end

  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end
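expanded_hypernyms_depth()

Get every synset in the hypernym tree paired with the depth at which it was first reached, together with the maximum depth seen. Returns a two-element array: [[[synset, depth], ...], max_depth].
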
# File lib/rwordnet/synset.rb, line 245
def expanded_hypernyms_depth
  parents = hypernyms.map{|hypernym| [hypernym, 1]}
  list = []
  out = []
  return list unless parents

  max_depth = 1
  while parents.length > 0
    parent, depth = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    out.push [Synset.new(@pos, parent.pos_offset), depth]
    parents.push *(parent.hypernyms.map{|hypernym| [hypernym, depth + 1]})
    max_depth = [max_depth, depth].max
  end
  return [out, max_depth]
end
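
The same stack-based traversal can be sketched on an invented graph to show what the depth values mean. `hypernyms_with_depth` and the graph are hypothetical, and names stand in for Synset objects.

```ruby
# Depth-first walk over all hypernyms, recording each ancestor once with
# the depth at which the walk first reached it.
def hypernyms_with_depth(start, graph)
  stack = graph.fetch(start, []).map { |name| [name, 1] }
  seen = []
  out = []
  max_depth = 1
  until stack.empty?
    name, depth = stack.pop
    next if seen.include?(name)
    seen << name
    out << [name, depth]
    stack.push(*graph.fetch(name, []).map { |parent| [parent, depth + 1] })
    max_depth = [max_depth, depth].max
  end
  [out, max_depth]
end

# Invented parent links: each key maps to its list of hypernyms.
graph = {
  "apple" => ["edible_fruit", "pome"],
  "edible_fruit" => ["fruit"],
  "pome" => ["fruit"],
  "fruit" => ["entity"],
  "entity" => []
}
pairs, max_depth = hypernyms_with_depth("apple", graph)
p pairs
p max_depth
```

Because an ancestor is recorded only on its first visit, its depth is the depth along whichever path the stack happened to explore first, which is not necessarily the shortest path. As in the listing above, max_depth starts at 1 even when there are no hypernyms.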
hypernym()

Get the first parent synset (a higher-level category, e.g. fruit -> reproductive_structure).

# File lib/rwordnet/synset.rb, line 197
def hypernym
  relation(HYPERNYM)[0]
end
Also aliased as: parent
hypernyms()

Get the parent synsets (higher-level categories, e.g. fruit -> reproductive_structure) as an array.

# File lib/rwordnet/synset.rb, line 203
def hypernyms
  relation(HYPERNYM)
end
Also aliased as: parents
hyponyms()

Get the child synsets (lower-level categories, e.g. fruit -> edible_fruit) as an array.

# File lib/rwordnet/synset.rb, line 208
def hyponyms
  relation(HYPONYM)
end
Also aliased as: children
parent()
Alias for: hypernym
parents()
Alias for: hypernyms
relation(pointer_symbol)

Get an array of Synsets with the relation `pointer_symbol` relative to this Synset. Mostly, this is an internal method used by convenience methods (e.g. Synset#antonyms), but it can take any valid pointer_symbol defined in pointers.rb.

Example (get the gloss of an antonym for 'fall'):

WordNet::Lemma.find("fall", :verb).synsets[1].relation("!")[0].gloss
# File lib/rwordnet/synset.rb, line 186
def relation(pointer_symbol)
  @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
    map! { |pointer| Synset.new(@synset_type, pointer.offset) }
end
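
The selection step can be sketched on its own. Here pointer records are modelled as plain hashes with made-up offsets, and `related_offsets` is a hypothetical helper; the library itself works with Pointer objects and builds a Synset from each matching offset.

```ruby
# Keep only the pointers carrying the requested symbol and return their offsets.
def related_offsets(pointers, pointer_symbol)
  pointers.select { |pointer| pointer[:symbol] == pointer_symbol }
          .map { |pointer| pointer[:offset] }
end

# Invented pointer records: "!" marks an antonym, "@" a hypernym.
pointers = [
  { symbol: "!", offset: 200 },
  { symbol: "@", offset: 300 },
  { symbol: "@", offset: 400 }
]
p related_offsets(pointers, "@")
```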
size()
Alias for: word_count
to_s()

Returns a compact, human-readable form of this synset, e.g.

(v) fall (descend in free fall under the influence of gravity; "The branch fell from the tree"; "The unfortunate hiker fell into a crevasse")

for the second meaning of the verb “fall.”

# File lib/rwordnet/synset.rb, line 268
def to_s
  "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
end
Also aliased as: to_str
to_str()
Alias for: to_s
word_count()

How many words does this Synset include?

# File lib/rwordnet/synset.rb, line 171
def word_count
  @word_counts.size
end
Also aliased as: size
words()

Get a list of the words included in this Synset.

# File lib/rwordnet/synset.rb, line 176
def words
  @word_counts.keys
end