module Emoninja::Stemmable
There are numerous strategies and algorithms for stemming. A widely used algorithm for English stemming is the Porter stemming algorithm,
published by Martin Porter in 1980. The Porter stemmer follows a strategy of suffix stripping: it applies a set of rules that remove suffixes from the end of a word.
For example, a word that ends in ‘-ed’ might have the ‘-ed’ stripped. The rules are grouped into a fixed sequence of steps, and most of them apply only when the remaining stem is long enough, as measured by its number of vowel-consonant sequences.
Reference: tartarus.org/~martin/PorterStemmer/ruby.txt
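To make the idea of suffix stripping concrete, here is a deliberately naive rule-based stripper. It is not the Porter algorithm: the rule list, the first-match policy and the MIN_STEM length guard are illustrative assumptions, and naive_strip is a hypothetical name. Porter's algorithm replaces the crude length guard with conditions on the stem's measure and applies its rules in several ordered steps.

```ruby
# A deliberately naive suffix stripper: NOT the Porter algorithm.
# Each rule is [suffix, replacement]. Rules are tried in order; a rule
# fires only if the remaining stem is at least MIN_STEM characters long.
NAIVE_RULES = [
  ['sses', 'ss'],
  ['ies', 'i'],
  ['ing', ''],
  ['ed', ''],
  ['s', '']
].freeze

MIN_STEM = 3 # hypothetical threshold, chosen only for illustration

def naive_strip(word)
  NAIVE_RULES.each do |suffix, replacement|
    next unless word.end_with?(suffix)

    stem = word[0...(word.length - suffix.length)]
    # A rule that matches but leaves too short a stem falls through.
    return stem + replacement if stem.length >= MIN_STEM
  end
  word
end

%w[jumped walking caresses ponies cats bed].each do |w|
  puts "#{w} -> #{naive_strip(w)}"
end
```

A rule whose suffix matches but fails the length guard (as with bed) simply falls through to the remaining rules, so short words tend to survive unchanged.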
Constants
- C
- CC
- MEQ1
- MGR0
- MGR1
- STEP_2_LIST
- STEP_3_LIST
- SUFFIX_1_REGEXP
- SUFFIX_2_REGEXP
- V
- VOWEL_IN_STEM
- VV
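The values of the constants listed above are not shown on this page. The sketch below assumes they follow the tartarus.org Ruby reference port: C and V match a single consonant or vowel, CC and VV match maximal runs of them, and MGR0, MEQ1 and MGR1 test Porter's measure m, the number of vowel-consonant sequences in a stem of the form [C](VC)^m[V]. The exact definitions in lib/emoninja/porter_stemmer.rb may differ, and measure_bucket is a hypothetical helper added only to exercise the patterns.

```ruby
# Assumed definitions, modeled on the tartarus.org Ruby reference port.
C  = '[^aeiou]'           # a single consonant
V  = '[aeiouy]'           # a single vowel (y allowed here)
CC = "#{C}(?>[^aeiouy]*)" # a consonant run; (?>...) is atomic, never given back
VV = "#{V}(?>[aeiou]*)"   # a vowel run

MGR0 = /^(#{CC})?#{VV}#{CC}/                # m > 0
MEQ1 = /^(#{CC})?#{VV}#{CC}(#{VV})?$/       # m == 1
MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/      # m > 1
VOWEL_IN_STEM = /^(#{CC})?#{V}/             # stem contains a vowel

# Hypothetical helper: classify a stem by its measure.
def measure_bucket(stem)
  return 'm > 1' if MGR1.match?(stem)
  return 'm = 1' if MEQ1.match?(stem)

  'm = 0'
end

%w[tree by trouble oats ivy troubles private orrery].each do |w|
  puts "#{w}: #{measure_bucket(w)}"
end
```

The sample words are the measure examples from Porter's paper: tree and by have m = 0, trouble, oats and ivy have m = 1, and troubles, private and orrery have m = 2.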
Public Instance Methods
stem()
Makes stem_porter the default stem method, in case multiple stemmers become available later.
Alias for: stem_porter
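The alias keeps stem as a stable public name while the concrete algorithm behind it can change. A minimal sketch of that pattern follows, using a hypothetical ToyStemmable module whose stem_porter is a trivial stand-in (it only strips a trailing s) rather than the real stemmer; stem_identity is likewise a hypothetical second stemmer.

```ruby
# Hypothetical module illustrating the "default stemmer via alias" pattern.
module ToyStemmable
  # Stand-in for a real stemmer: strips one trailing "s" (illustration only).
  def stem_porter
    to_str.sub(/s\z/, '')
  end

  # A second, hypothetical stemmer that could be added later.
  def stem_identity
    to_str.dup
  end

  # stem now refers to the body of stem_porter as defined above.
  alias stem stem_porter
end

word = String.new('cats').extend(ToyStemmable)
puts word.stem          # uses the default stemmer
puts word.stem_identity # an alternative stemmer, called explicitly
```

One design consequence: alias binds stem to the method body that exists when the alias is evaluated, so redefining stem_porter later does not change stem. Switching the default means evaluating a new alias, for example alias stem stem_identity.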
stem_porter()
```ruby
# File lib/emoninja/porter_stemmer.rb, line 97
def stem_porter
  # make a copy of the given object and convert it to a string.
  w = dup.to_str

  return w if w.length < 3

  # now map initial y to Y so that the patterns never treat it as vowel
  w[0] = 'Y' if w[0] == 'y'

  # Step 1a
  case w
  when /(ss|i)es$/ then w = $` + $1
  when /([^s])s$/ then w = $` + $1
  end

  # Step 1b
  if w =~ /eed$/
    w.chop! if $` =~ MGR0
  elsif w =~ /(ed|ing)$/
    stem = $`
    if stem =~ VOWEL_IN_STEM
      w = stem
      case w
      when /(at|bl|iz)$/ then w << 'e'
      when /([^aeiouylsz])\1$/ then w.chop!
      when /^#{CC}#{V}[^aeiouwxy]$/o then w << 'e'
      end
    end
  end

  if w =~ /y$/
    stem = $`
    w = stem + 'i' if stem =~ VOWEL_IN_STEM
  end

  # Step 2
  if w =~ SUFFIX_1_REGEXP
    stem = $`
    suffix = $1
    w = stem + STEP_2_LIST[suffix] if stem =~ MGR0
  end

  # Step 3
  if w =~ /(icate|ative|alize|iciti|ical|ful|ness)$/
    stem = $`
    suffix = $1
    w = stem + STEP_3_LIST[suffix] if stem =~ MGR0
  end

  # Step 4
  if w =~ SUFFIX_2_REGEXP
    stem = $`
    w = stem if stem =~ MGR1
  elsif w =~ /(s|t)(ion)$/
    stem = $` + $1
    w = stem if stem =~ MGR1
  end

  # Step 5
  if w =~ /e$/
    stem = $`
    w = stem if (stem =~ MGR1) ||
                (stem =~ MEQ1 && stem !~ /^#{CC}#{V}[^aeiouwxy]$/o)
  end

  w.chop! if w =~ /ll$/ && w =~ MGR1

  # and turn initial Y back to y
  w[0] = 'y' if w[0] == 'Y'

  w
end
```
Also aliased as: stem
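To see Step 1 in isolation, the sketch below lifts Steps 1a to 1c out of stem_porter into a standalone porter_step1 function (a hypothetical name) and runs it on Step 1 examples from Porter's 1980 paper. It assumes the same constant definitions as the tartarus.org reference port (SHORT_CVC is a hypothetical name for the inline cvc pattern), and it omits the length check and the initial y/Y mapping that stem_porter performs.

```ruby
# Assumed constants, modeled on the tartarus.org reference port.
C  = '[^aeiou]'
V  = '[aeiouy]'
CC = "#{C}(?>[^aeiouy]*)"
VV = "#{V}(?>[aeiou]*)"
MGR0 = /^(#{CC})?#{VV}#{CC}/             # measure m > 0
VOWEL_IN_STEM = /^(#{CC})?#{V}/          # stem contains a vowel
SHORT_CVC = /^#{CC}#{V}[^aeiouwxy]$/     # hypothetical name: short cvc stem

# Steps 1a to 1c of stem_porter as a standalone function.
def porter_step1(word)
  w = word.dup
  # Step 1a: plural endings (sses -> ss, ies -> i, s -> "")
  case w
  when /(ss|i)es$/ then w = $` + $1
  when /([^s])s$/  then w = $` + $1
  end
  # Step 1b: -eed (if m > 0), then -ed / -ing (if the stem has a vowel)
  if w =~ /eed$/
    w.chop! if $` =~ MGR0
  elsif w =~ /(ed|ing)$/
    stem = $`
    if stem =~ VOWEL_IN_STEM
      w = stem
      case w
      when /(at|bl|iz)$/       then w << 'e'  # conflat -> conflate
      when /([^aeiouylsz])\1$/ then w.chop!   # hopp -> hop
      when SHORT_CVC           then w << 'e'  # fil -> file
      end
    end
  end
  # Step 1c: final y -> i when the rest of the stem has a vowel
  if w =~ /y$/
    stem = $`
    w = stem + 'i' if stem =~ VOWEL_IN_STEM
  end
  w
end

{
  'caresses' => 'caress', 'ponies' => 'poni', 'cats' => 'cat',
  'agreed' => 'agree', 'plastered' => 'plaster', 'motoring' => 'motor',
  'hopping' => 'hop', 'filing' => 'file', 'happy' => 'happi', 'sky' => 'sky'
}.each do |word, expected|
  puts "#{word} -> #{porter_step1(word)} (expected #{expected})"
end
```

Note how Step 1b relies on the case/when branches setting the match variables: Regexp#=== updates $~, which is what lets $` and $1 refer to the branch that matched.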