module Splitta
Provides convenience methods for splitting text into sentences.
See the README.
A Document points to a collection of Frags
A fragment of text that ends with a possible sentence boundary
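The relationship between a document and its fragments can be pictured with a small sketch. The names `ToyFrag` and `ToyDoc` and the splitting rule are hypothetical stand-ins chosen for illustration; they are not Splitta's actual classes, and the real fragmenting logic may differ.

```ruby
# Hypothetical stand-ins for illustration only; not Splitta's real classes.
ToyFrag = Struct.new(:text)

class ToyDoc
  attr_reader :frags

  def initialize(text)
    # Assumption: a "possible sentence boundary" is whitespace that
    # follows '.', '?' or '!'. Each piece becomes one fragment.
    @frags = text.split(/(?<=[.?!])\s+/).map { |piece| ToyFrag.new(piece) }
  end
end

doc = ToyDoc.new("Dr. Smith arrived. He was late.")
fragment_texts = doc.frags.map(&:text)
# => ["Dr.", "Smith arrived.", "He was late."]
```

Note that "Dr." ends with a possible boundary that is not a real one; deciding such cases is the job of a classifier run over the fragments.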
Naive Bayes model, with a few tweaks:
- all feature types are pooled together for normalization (this might help because the independence assumption is so broken for our features)
- smoothing: add 0.1 to all counts
- priors are modified for better performance (this is mysterious but works much better)
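The first two tweaks can be illustrated with a minimal sketch. `ToyNaiveBayes`, its method names, and the feature strings are assumptions made for this example, not Splitta's API. The sketch uses plain empirical priors, because the documentation does not say how the priors are modified.

```ruby
# Hypothetical toy classifier; illustrative only, not Splitta's Model.
class ToyNaiveBayes
  SMOOTHING = 0.1 # added to every count, as described above

  def initialize
    @counts = Hash.new { |h, label| h[label] = Hash.new(0.0) }
    @label_counts = Hash.new(0.0)
  end

  def train(features, label)
    @label_counts[label] += 1
    features.each { |f| @counts[label][f] += 1 }
  end

  def classify(features)
    n = @label_counts.values.sum
    @label_counts.keys.max_by do |label|
      # Assumption: unmodified empirical prior.
      Math.log(@label_counts[label] / n) + log_likelihood(features, label)
    end
  end

  private

  def log_likelihood(features, label)
    vocab_size = @counts.values.flat_map(&:keys).uniq.size
    # Pooled normalization: one denominator over all feature types,
    # instead of a separate denominator per feature type.
    total = @counts[label].values.sum + SMOOTHING * vocab_size
    features.sum { |f| Math.log((@counts[label][f] + SMOOTHING) / total) }
  end
end

nb = ToyNaiveBayes.new
nb.train(["word1=Mr", "word2_cap=true"], :no_boundary)
nb.train(["word1=Dr", "word2_cap=true"], :no_boundary)
nb.train(["word1=home", "word2_cap=true"], :boundary)
nb.train(["word1=done", "word2_cap=true"], :boundary)

nb.classify(["word1=Mr", "word2_cap=true"])   # => :no_boundary
nb.classify(["word1=home", "word2_cap=true"]) # => :boundary
```

Because of the smoothing, a feature never seen with a label still gets a small nonzero probability, so a single unseen feature cannot zero out a label's score.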
A list of (regexp, repl) pairs applied in sequence. The resulting string is split on whitespace. (Adapted from the Punkt Word Tokenizer)
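The mechanism described here can be sketched as follows. The two rules and the names `TOY_RULES` and `toy_tokenize` are hypothetical, picked only to show the "apply each pair in order, then split" pattern; Splitta's actual rule list is not reproduced in this documentation.

```ruby
# Hypothetical rules for illustration; not Splitta's actual list.
TOY_RULES = [
  [/([,;:?!])/, ' \1 '], # pad selected punctuation with spaces
  [/n't\b/, " n't"]      # split the "n't" contraction off its verb
].freeze

def toy_tokenize(text, rules = TOY_RULES)
  # Apply every (regexp, repl) pair in sequence, then split on whitespace.
  rules.reduce(text) { |str, (regexp, repl)| str.gsub(regexp, repl) }.split
end

tokens = toy_tokenize("Wait, don't go!")
# => ["Wait", ",", "do", "n't", "go", "!"]
```

Order matters in this design: each rule sees the output of the previous one, so a later rule can rely on spacing that an earlier rule introduced.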
Constants
- VERSION
Current gem version
Public Class Methods
sentences(text)
    # File lib/splitta.rb, line 17
    def self.sentences(text)
      Doc.new(text, model: Model.instance).segments.map(&:strip)
    end