class WordNet::Synset
Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!) relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y).
Attributes
Get a string representation of this synset's gloss. The “gloss” is a human-readable description of this concept, often with example usage, e.g.:
move upward; "The fog lifted"; "The smoke arose from the forest fire"; "The mist uprose from the meadows"
for the second sense of the verb “fall”
A two-digit decimal integer representing the name of the lexicographer file containing the synset for the sense. Probably only of interest if you're using a WordNet database marked up with custom attributes and you want to ensure that you're using your own additions.
Get a shorthand representation of the part of speech this synset represents, e.g. “v” for verbs.
Get the offset, in bytes, at which this synset's POS information is stored in WordNet's internal DB. You almost certainly don't care about this.
Get the offset, in bytes, at which this synset's information is stored in WordNet's internal DB. You almost certainly don't care about this.
Get the part of speech type of this synset. One of 'n' (noun), 'v' (verb), 'a' (adjective), or 'r' (adverb)
Public Class Methods
# File lib/rwordnet/synset.rb, line 109
def self._apply_rules(forms, pos)
  substitutions = MORPHOLOGICAL_SUBSTITUTIONS[pos]
  out = []
  forms.each do |form|
    substitutions.each do |old, new|
      if form.end_with? old
        out.push form[0...-old.length] + new
      end
    end
  end
  return out
end
# File lib/rwordnet/synset.rb, line 122
def self._filter_forms(forms, pos)
  forms.reject{|form| Lemma.find(form, pos).nil?}.uniq
end
Ported from Python NLTK. Load all synsets with a given lemma and part-of-speech tag. The word is downcased and reduced to its base forms with morphy before lookup. Unlike the NLTK original, this method requires pos and takes no lang argument; to load synsets for all parts of speech, use find_all.
# File lib/rwordnet/synset.rb, line 89
def self.find(word, pos)
  word = word.downcase
  lemmas = self.morphy(word, pos).map{|form| WordNet::Lemma.find(form, pos)}
  lemmas.map{|lemma| lemma.synsets}.flatten
end
# File lib/rwordnet/synset.rb, line 95
def self.find_all(word)
  SYNSET_TYPES.values.map{|pos| self.find(word, pos)}.flatten
end
# File lib/rwordnet/synset.rb, line 99
def self.load_exception_map
  SYNSET_TYPES.each do |_, pos|
    @exception_map[pos] = {}
    File.open(File.join(@morphy_path, 'exceptions', "#{pos}.exc"), 'r').each_line do |line|
      line = line.split
      @exception_map[pos][line[0]] = line[1..-1]
    end
  end
end
Ported from NLTK (Python), from jordanbg. Given an original string x:
- Apply rules once to the input to get y1, y2, y3, etc.
- Return all that are in the database
- If there are no matches, keep applying rules until you either find a match or you can't go any further
# File lib/rwordnet/synset.rb, line 133
def self.morphy(form, pos)
  if @exception_map == {}
    self.load_exception_map
  end
  exceptions = @exception_map[pos]
  # 0. Check the exception lists
  if exceptions.has_key? form
    return self._filter_forms([form] + exceptions[form], pos)
  end
  # 1. Apply rules once to the input to get y1, y2, y3, etc.
  forms = self._apply_rules([form], pos)
  # 2. Return all that are in the database (and check the original too)
  results = self._filter_forms([form] + forms, pos)
  if results != []
    return results
  end
  # 3. If there are no matches, keep applying rules until we find a match
  while forms.length > 0
    forms = self._apply_rules(forms, pos)
    results = self._filter_forms(forms, pos)
    if results != []
      return results
    end
  end
  # Return an empty list if we can't find anything
  return []
end
# File lib/rwordnet/synset.rb, line 166
def self.morphy_all(form)
  SYNSET_TYPES.values.map{|pos| self.morphy(form, pos)}.flatten
end
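The lookup steps described for morphy can be tried without a WordNet database. The following is a minimal sketch, not the library's implementation: KNOWN_LEMMAS, EXCEPTIONS, SUFFIX_RULES and the three method names are hypothetical stand-ins for the lemma index, the exception files and MORPHOLOGICAL_SUBSTITUTIONS.

```ruby
require "set"

# Hypothetical stand-ins for the library's data: a tiny lemma dictionary,
# an exception list, and a few noun suffix rules (not the real tables).
KNOWN_LEMMAS = Set["church", "goose", "dog"].freeze
EXCEPTIONS   = { "geese" => ["goose"] }.freeze
SUFFIX_RULES = [["s", ""], ["ches", "ch"], ["ies", "y"]].freeze

# Apply every matching suffix rule once to each form.
def apply_rules(forms)
  forms.flat_map do |form|
    SUFFIX_RULES.select { |old, _| form.end_with?(old) }
                .map { |old, new| form[0...-old.length] + new }
  end
end

# Keep only forms present in the dictionary, without duplicates.
def filter_forms(forms)
  forms.select { |form| KNOWN_LEMMAS.include?(form) }.uniq
end

def morphy_sketch(form)
  # 0. Irregular forms come straight from the exception list.
  return filter_forms([form] + EXCEPTIONS[form]) if EXCEPTIONS.key?(form)

  # 1-2. Apply the rules once, then check the original plus the candidates.
  forms = apply_rules([form])
  results = filter_forms([form] + forms)
  return results unless results.empty?

  # 3. Keep rewriting until something matches or no candidates remain.
  until forms.empty?
    forms = apply_rules(forms)
    results = filter_forms(forms)
    return results unless results.empty?
  end
  []
end

p morphy_sketch("churches") # => ["church"]
p morphy_sketch("geese")    # => ["goose"]
p morphy_sketch("xyzzy")    # => []
```

Every rule in this toy table shortens the form, so the loop in step 3 always terminates; that property is an assumption of the sketch rather than something the source states about the real rule table.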
Create a new synset by reading from the data file specified by pos, at offset bytes into the file. This is how the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
# File lib/rwordnet/synset.rb, line 51
def initialize(pos, offset)
  data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
    f.seek(offset)
    f.readline.strip
  end
  info_line, @gloss = data_line.split(" | ", 2)
  line = info_line.split(" ")
  @pos = pos
  @pos_offset = offset
  @synset_offset = line.shift
  @lex_filenum = line.shift
  @synset_type = line.shift
  @word_counts = {}
  word_count = line.shift.to_i
  word_count.times do
    @word_counts[line.shift] = line.shift.to_i
  end
  pointer_count = line.shift.to_i
  @pointers = Array.new(pointer_count).map do
    Pointer.new(
      symbol: line.shift[0],
      offset: line.shift.to_i,
      pos: line.shift,
      source: line.shift
    )
  end
end
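To see what the constructor extracts from a data line, the parsing can be reproduced on a string instead of a file. This is a rough sketch under stated assumptions: PointerSketch and parse_data_line are hypothetical names, and the sample line is made up in the data-file layout rather than copied from WordNet.

```ruby
# Hypothetical record mirroring the four fields the constructor reads per pointer.
PointerSketch = Struct.new(:symbol, :offset, :pos, :source, keyword_init: true)

# Parse one line of a data.<pos> file into a plain hash.
def parse_data_line(data_line)
  # Everything after " | " is the gloss; the rest is space-separated fields.
  info, gloss = data_line.strip.split(" | ", 2)
  fields = info.split(" ")

  parsed = {
    synset_offset: fields.shift,
    lex_filenum:   fields.shift,
    synset_type:   fields.shift,
    gloss:         gloss
  }

  # Each word is followed by one number, stored here under the word.
  word_count = fields.shift.to_i
  parsed[:word_counts] = Array.new(word_count) { [fields.shift, fields.shift.to_i] }.to_h

  # Each pointer takes four fields: symbol, target offset, target POS, source.
  pointer_count = fields.shift.to_i
  parsed[:pointers] = Array.new(pointer_count) do
    PointerSketch.new(symbol: fields.shift[0], offset: fields.shift.to_i,
                      pos: fields.shift, source: fields.shift)
  end
  parsed
end

# Made-up line in the data-file layout (offsets are not real WordNet values).
line = '00000100 29 v 02 fall 0 drop 1 002 ! 00000200 v 0101 @ 00000300 v 0000 | descend; "The branch fell"'
synset = parse_data_line(line)
p synset[:word_counts].keys       # => ["fall", "drop"]
p synset[:pointers].map(&:symbol) # => ["!", "@"]
puts synset[:gloss]
```

Like the constructor, the sketch keeps only the first character of each pointer symbol and converts pointer offsets to integers, while leaving synset_offset as a string.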
Public Instance Methods
Get the Synsets of this sense's antonym
# File lib/rwordnet/synset.rb, line 192
def antonyms
  relation(ANTONYM)
end
Get the chain of first hypernyms (following only the first hypernym at each level, from this synset all the way up to entity) as an array.
# File lib/rwordnet/synset.rb, line 213
def expanded_first_hypernyms
  parent = hypernym
  list = []
  return list unless parent
  while parent
    break if list.include? parent.pos_offset
    list.push parent.pos_offset
    parent = parent.hypernym
  end
  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end
Get the entire hypernym tree (from this synset all the way up to entity) as an array.
# File lib/rwordnet/synset.rb, line 229
def expanded_hypernyms
  parents = hypernyms
  list = []
  return list unless parents
  while parents.length > 0
    parent = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    parents.push *parent.hypernyms
  end
  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end
# File lib/rwordnet/synset.rb, line 245
def expanded_hypernyms_depth
  parents = hypernyms.map{|hypernym| [hypernym, 1]}
  list = []
  out = []
  return list unless parents
  max_depth = 1
  while parents.length > 0
    parent, depth = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    out.push [Synset.new(@pos, parent.pos_offset), depth]
    parents.push *(parent.hypernyms.map{|hypernym| [hypernym, depth + 1]})
    max_depth = [max_depth, depth].max
  end
  return [out, max_depth]
end
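The stack-based traversal used by the expanded hypernym methods can be illustrated on a small in-memory graph. This is only a sketch: HYPERNYM_GRAPH and ancestors_with_depth are hypothetical, and the graph is a toy example, not WordNet data.

```ruby
# Hypothetical hypernym graph: each key maps to its direct parents.
HYPERNYM_GRAPH = {
  "apple"                  => ["edible_fruit", "pome"],
  "edible_fruit"           => ["fruit"],
  "pome"                   => ["fruit"],
  "fruit"                  => ["reproductive_structure"],
  "reproductive_structure" => []
}.freeze

# Visit every ancestor once, recording the depth at which it was first reached.
def ancestors_with_depth(graph, start)
  pending = graph.fetch(start, []).map { |parent| [parent, 1] }
  seen = []
  out = []
  max_depth = 1
  until pending.empty?
    node, depth = pending.pop
    next if seen.include?(node) # skip nodes already reached by an earlier path
    seen.push(node)
    out.push([node, depth])
    pending.push(*graph.fetch(node, []).map { |parent| [parent, depth + 1] })
    max_depth = [max_depth, depth].max
  end
  [out, max_depth]
end

out, max_depth = ancestors_with_depth(HYPERNYM_GRAPH, "apple")
p out
# => [["pome", 1], ["fruit", 2], ["reproductive_structure", 3], ["edible_fruit", 1]]
p max_depth # => 3
```

Because pending nodes are popped from the end of the array, the walk is depth-first, and the depth stored for a node is the depth of the first path that reached it, which is not necessarily the shortest one.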
Get the parent synset (higher-level category, e.g. fruit -> reproductive_structure).
# File lib/rwordnet/synset.rb, line 197
def hypernym
  relation(HYPERNYM)[0]
end
Get the parent synsets (higher-level categories, e.g. fruit -> reproductive_structure) as an array.
# File lib/rwordnet/synset.rb, line 203
def hypernyms
  relation(HYPERNYM)
end
Get the child synset(s) (lower-level categories, e.g. fruit -> edible_fruit).
# File lib/rwordnet/synset.rb, line 208
def hyponyms
  relation(HYPONYM)
end
Get an array of Synsets with the relation `pointer_symbol` relative to this Synset. Mostly, this is an internal method used by convenience methods (e.g. Synset#antonyms), but it can take any valid pointer_symbol defined in pointers.rb.
Example (get the gloss of an antonym for 'fall'):
WordNet::Lemma.find("fall", :verb).synsets[1].relation("!")[0].gloss
# File lib/rwordnet/synset.rb, line 186
def relation(pointer_symbol)
  @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
    map! { |pointer| Synset.new(@synset_type, pointer.offset) }
end
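The filtering step inside relation can be shown in isolation. In this sketch PointerRecord, the two constants and related_offsets are hypothetical; "!" is the antonym symbol used in the example above, and "@" is assumed here to be the hypernym symbol from pointers.rb.

```ruby
# Hypothetical pointer record and symbol constants, for illustration only.
PointerRecord   = Struct.new(:symbol, :offset)
ANTONYM_SYMBOL  = "!"
HYPERNYM_SYMBOL = "@" # assumed value; check pointers.rb for the real constant

# Return the offsets of every pointer carrying the requested symbol.
def related_offsets(pointers, pointer_symbol)
  pointers.select { |pointer| pointer.symbol == pointer_symbol }
          .map(&:offset)
end

pointers = [
  PointerRecord.new("@", 300),
  PointerRecord.new("!", 200),
  PointerRecord.new("@", 400)
]
p related_offsets(pointers, HYPERNYM_SYMBOL) # => [300, 400]
p related_offsets(pointers, ANTONYM_SYMBOL)  # => [200]
```

The real method goes one step further and builds a Synset from each matching offset, which requires the WordNet data files.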
Returns a compact, human-readable form of this synset, e.g.
(v) fall (descend in free fall under the influence of gravity; "The branch fell from the tree"; "The unfortunate hiker fell into a crevasse")
for the second meaning of the verb “fall.”
# File lib/rwordnet/synset.rb, line 268
def to_s
  "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
end
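The same formatting can be tried on plain values. A small sketch, where format_synset and its arguments are hypothetical and the sample words and gloss are invented:

```ruby
# Build the compact "(type) word, word (gloss)" form from plain values.
def format_synset(synset_type, words, gloss)
  # Underscores in multi-word entries are shown as spaces.
  readable = words.map { |word| word.tr("_", " ") }.join(", ")
  "(#{synset_type}) #{readable} (#{gloss})"
end

puts format_synset("n", ["ice_cream", "icecream"], "frozen dessert")
# prints: (n) ice cream, icecream (frozen dessert)
```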
How many words does this Synset include?
# File lib/rwordnet/synset.rb, line 171
def word_count
  @word_counts.size
end
Get a list of words included in this Synset
# File lib/rwordnet/synset.rb, line 176
def words
  @word_counts.keys
end