public final class WordSet
extends java.lang.Object
Performance of the set is comparable to that of TreeSet for Strings, i.e. 2-3x slower than HashSet when using pre-constructed Strings. This is generally a result of the algorithmic complexity of the structures: WordSet and TreeSet lookups are roughly logarithmic in the size of the whole data, whereas a HashSet lookup is linear in the length of the key.
However:
Although this is an efficient set for a specific set of usage patterns, one restriction is that the full set of words to include has to be known before constructing the set. Also, the size of the set is limited to a total word content of about 20k characters; the factory method verifies the limit and indicates if an instance cannot be created.
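The two restrictions above suggest collecting every word first and checking the total size before attempting construction. A minimal sketch of such a pre-check follows. It assumes a cutoff of 20,000 characters, since the documentation only says "about 20k". The class name `WordBudget` and its methods are hypothetical helpers, not part of WordSet.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class WordBudget {
    // Assumed cutoff; the documented limit is only "about 20k characters".
    static final int ASSUMED_MAX_CHARS = 20_000;

    // Sums the lengths of all words that would go into the set.
    static int totalChars(TreeSet<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // True if the words are likely to fit under the assumed cutoff.
    static boolean likelyFits(TreeSet<String> words) {
        return totalChars(words) <= ASSUMED_MAX_CHARS;
    }

    public static void main(String[] args) {
        // The full word list is gathered up front, as the restriction requires.
        TreeSet<String> words = new TreeSet<>(Arrays.asList("xml", "xmlns", "lang"));
        System.out.println(totalChars(words) + " chars, likely fits: " + likelyFits(words));
    }
}
```

Because the real limit is approximate, a check like this can only serve as an early warning; the factory method remains the authority on whether an instance can be created.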

Modifier and Type | Class and Description |
---|---|
private static class | WordSet.Builder |

Modifier and Type | Field and Description |
---|---|
(package private) static char | CHAR_NULL |
(package private) char[] | mData: Compressed representation of the word set. |
(package private) static int | MIN_BINARY_SEARCH: This is only a guess, but in general linear search should be faster for short sequences (definitely for 4 or fewer elements, possibly for up to 8). |
(package private) static int | NEGATIVE_OFFSET: Offset added to numbers to mark 'negative' numbers. |

Modifier | Constructor and Description |
---|---|
private | WordSet(char[] data) |

Modifier and Type | Method and Description |
---|---|
static char[] | constructRaw(java.util.TreeSet<java.lang.String> wordSet) |
static WordSet | constructSet(java.util.TreeSet<java.lang.String> wordSet) |
static boolean | contains(char[] data, char[] str, int start, int end) |
boolean | contains(char[] buf, int start, int end) |
static boolean | contains(char[] data, java.lang.String str) |
boolean | contains(java.lang.String str) |
static final char CHAR_NULL
static final int NEGATIVE_OFFSET
static final int MIN_BINARY_SEARCH
final char[] mData
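The note on MIN_BINARY_SEARCH describes a common hybrid: scan short ranges linearly and switch to binary search for longer ones. A minimal sketch over a sorted char array follows. It assumes a threshold of 8, since the actual value of MIN_BINARY_SEARCH is not shown here. The class `HybridSearch` is hypothetical and is not WordSet's own lookup code.

```java
import java.util.Arrays;

final class HybridSearch {
    // Assumed threshold; the real MIN_BINARY_SEARCH value is not documented here.
    static final int ASSUMED_MIN_BINARY_SEARCH = 8;

    // Returns the index of key in sorted[from..to), or -1 if it is absent.
    static int indexOf(char[] sorted, int from, int to, char key) {
        if (to - from < ASSUMED_MIN_BINARY_SEARCH) {
            // Short range: a plain scan avoids the overhead of binary search.
            for (int i = from; i < to; i++) {
                if (sorted[i] == key) return i;
                if (sorted[i] > key) break; // sorted input, key cannot appear later
            }
            return -1;
        }
        // Longer range: fall back to the standard library binary search.
        int idx = Arrays.binarySearch(sorted, from, to, key);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        char[] small = "aceg".toCharArray();
        char[] large = "abcdefghijklmnop".toCharArray();
        System.out.println(indexOf(small, 0, small.length, 'e')); // linear path
        System.out.println(indexOf(large, 0, large.length, 'k')); // binary path
    }
}
```

Whether 4, 8, or some other cutoff is best depends on the JVM and the data, which is why the field description calls the value a guess.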
public static WordSet constructSet(java.util.TreeSet<java.lang.String> wordSet)
public static char[] constructRaw(java.util.TreeSet<java.lang.String> wordSet)
public boolean contains(char[] buf, int start, int end)
public static boolean contains(char[] data, char[] str, int start, int end)
public boolean contains(java.lang.String str)
public static boolean contains(char[] data, java.lang.String str)
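The char[] overloads let a caller test a slice of an existing buffer without first allocating a String. The sketch below illustrates that calling pattern with a deliberately simplified stand-in: a sorted String array searched by binary search, comparing characters in place. The class `SliceLookup` and its layout are hypothetical. WordSet itself stores a compressed char[] (mData) whose format is not described here.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class SliceLookup {
    private final String[] sorted; // words in natural (TreeSet) order

    SliceLookup(TreeSet<String> words) {
        sorted = words.toArray(new String[0]);
    }

    // Compares word against buf[start..end) in the same order as String.compareTo.
    private static int compare(String word, char[] buf, int start, int end) {
        int len = end - start;
        int n = Math.min(word.length(), len);
        for (int i = 0; i < n; i++) {
            int diff = word.charAt(i) - buf[start + i];
            if (diff != 0) return diff;
        }
        return word.length() - len;
    }

    // Binary search over the sorted words; no String is created for the key.
    boolean contains(char[] buf, int start, int end) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(sorted[mid], buf, start, end);
            if (cmp == 0) return true;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return false;
    }

    // Convenience overload for callers that already hold a String.
    boolean contains(String str) {
        char[] chars = str.toCharArray();
        return contains(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        TreeSet<String> words =
                new TreeSet<>(Arrays.asList("version", "encoding", "standalone"));
        SliceLookup set = new SliceLookup(words);
        char[] buf = "<?xml version='1.0'?>".toCharArray();
        System.out.println(set.contains(buf, 6, 13)); // slice is "version"
        System.out.println(set.contains(buf, 2, 5));  // slice is "xml"
    }
}
```

The instance and static variants listed above follow the same idea: the static forms take the raw data array explicitly, which suggests they can be used with the result of constructRaw without wrapping it in a WordSet.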