Package edu.berkeley.nlp.lm
Class StringWordIndexer
- java.lang.Object
- edu.berkeley.nlp.lm.StringWordIndexer
- All Implemented Interfaces:
WordIndexer<java.lang.String>, java.io.Serializable
public class StringWordIndexer extends java.lang.Object implements WordIndexer<java.lang.String>
Implementation of a WordIndexer in which words are represented as strings.
- Author:
- adampauls
- See Also:
- Serialized Form
Nested Class Summary
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.WordIndexer
WordIndexer.StaticMethods
Constructor Summary
Constructors
- StringWordIndexer()
Method Summary
Modifier and Type, Method, Description
- java.lang.String getEndSymbol(): Returns the end symbol (usually something like </s>).
- int getIndexPossiblyUnk(java.lang.String word): Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
- int getOrAddIndex(java.lang.String word): Gets the index for a word, adding if necessary.
- int getOrAddIndexFromString(java.lang.String word)
- java.lang.String getStartSymbol(): Returns the start symbol (usually something like <s>).
- java.lang.String getUnkSymbol(): Returns the unk symbol (usually something like <unk>).
- java.lang.String getWord(int index): Gets the word object for an index.
- int numWords(): Number of words that have been added so far.
- void setEndSymbol(java.lang.String sym)
- void setStartSymbol(java.lang.String sym)
- void setUnkSymbol(java.lang.String sym)
- void trimAndLock(): Informs the implementation that no more words can be added to the vocabulary.
Method Detail
-
getOrAddIndex
public int getOrAddIndex(java.lang.String word)
Description copied from interface: WordIndexer
Gets the index for a word, adding if necessary.
- Specified by:
getOrAddIndex in interface WordIndexer<java.lang.String>
-
getWord
public java.lang.String getWord(int index)
Description copied from interface: WordIndexer
Gets the word object for an index.
- Specified by:
getWord in interface WordIndexer<java.lang.String>
-
numWords
public int numWords()
Description copied from interface: WordIndexer
Number of words that have been added so far.
- Specified by:
numWords in interface WordIndexer<java.lang.String>
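The three methods above describe a two-way mapping between words and integer indices. The following is a minimal, self-contained sketch of that contract, not the BerkeleyLM implementation: the class name RoundTripIndexerSketch is hypothetical, and the choice of a HashMap plus an ArrayList with sequentially assigned indices is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in that mirrors the documented contract; not the library class. */
final class RoundTripIndexerSketch {
    // word -> index, for getOrAddIndex
    private final Map<String, Integer> indexOf = new HashMap<>();
    // index -> word, for getWord
    private final List<String> words = new ArrayList<>();

    /** Returns the existing index for the word, or assigns the next free one (assumed sequential). */
    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    /** Maps an index back to the word that was registered under it. */
    String getWord(int index) {
        return words.get(index);
    }

    /** Number of distinct words added so far. */
    int numWords() {
        return words.size();
    }

    public static void main(String[] args) {
        RoundTripIndexerSketch indexer = new RoundTripIndexerSketch();
        int the = indexer.getOrAddIndex("the");
        int cat = indexer.getOrAddIndex("cat");
        // Asking for a known word again returns the same index and adds nothing.
        System.out.println(the == indexer.getOrAddIndex("the"));
        System.out.println(indexer.getWord(cat));
        System.out.println(indexer.numWords());
    }
}
```

Keeping both directions of the mapping makes getOrAddIndex and getWord constant-time lookups, at the cost of storing each word twice.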
-
getStartSymbol
public java.lang.String getStartSymbol()
Description copied from interface: WordIndexer
Returns the start symbol (usually something like <s>).
- Specified by:
getStartSymbol in interface WordIndexer<java.lang.String>
-
getEndSymbol
public java.lang.String getEndSymbol()
Description copied from interface: WordIndexer
Returns the end symbol (usually something like </s>).
- Specified by:
getEndSymbol in interface WordIndexer<java.lang.String>
-
getUnkSymbol
public java.lang.String getUnkSymbol()
Description copied from interface: WordIndexer
Returns the unk symbol (usually something like <unk>).
- Specified by:
getUnkSymbol in interface WordIndexer<java.lang.String>
-
getOrAddIndexFromString
public int getOrAddIndexFromString(java.lang.String word)
- Specified by:
getOrAddIndexFromString in interface WordIndexer<java.lang.String>
-
setStartSymbol
public void setStartSymbol(java.lang.String sym)
- Specified by:
setStartSymbol in interface WordIndexer<java.lang.String>
-
setEndSymbol
public void setEndSymbol(java.lang.String sym)
- Specified by:
setEndSymbol in interface WordIndexer<java.lang.String>
-
setUnkSymbol
public void setUnkSymbol(java.lang.String sym)
- Specified by:
setUnkSymbol in interface WordIndexer<java.lang.String>
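The symbol setters and getters are typically used to mark sentence boundaries before indexing. The sketch below illustrates one way that could look. It is not the BerkeleyLM implementation: the class SentenceBoundarySketch and its helper indexSentence are hypothetical names, and requiring both symbols to be set before indexing is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in showing how start/end symbols might frame a sentence. */
final class SentenceBoundarySketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> words = new ArrayList<>();
    private String startSymbol;
    private String endSymbol;

    void setStartSymbol(String sym) { startSymbol = sym; }
    void setEndSymbol(String sym) { endSymbol = sym; }
    String getStartSymbol() { return startSymbol; }
    String getEndSymbol() { return endSymbol; }

    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    /** Hypothetical helper: wraps the tokens in the boundary symbols and indexes every token. */
    int[] indexSentence(List<String> tokens) {
        // Assumption of this sketch: both symbols must be configured first.
        Objects.requireNonNull(startSymbol, "start symbol not set");
        Objects.requireNonNull(endSymbol, "end symbol not set");
        int[] ids = new int[tokens.size() + 2];
        ids[0] = getOrAddIndex(startSymbol);
        for (int i = 0; i < tokens.size(); i++) {
            ids[i + 1] = getOrAddIndex(tokens.get(i));
        }
        ids[ids.length - 1] = getOrAddIndex(endSymbol);
        return ids;
    }
}
```

A caller would set the symbols once, for example to <s> and </s>, and then pass each tokenized sentence to the helper to obtain an index array that is two entries longer than the sentence.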
-
trimAndLock
public void trimAndLock()
Description copied from interface: WordIndexer
Informs the implementation that no more words can be added to the vocabulary. Implementations may perform some space optimization, and should trigger an error if an attempt is made to add a word after this point.
- Specified by:
trimAndLock in interface WordIndexer<java.lang.String>
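The locking contract described above can be sketched as follows. This is an illustration under stated assumptions, not the BerkeleyLM implementation: the class LockableIndexerSketch is hypothetical, the error type (IllegalStateException) is an assumption because the documentation only says an error should be triggered, and allowing lookups of already known words after locking is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the trim-and-lock behaviour; not the library class. */
final class LockableIndexerSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final ArrayList<String> words = new ArrayList<>();
    private boolean locked = false;

    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            // Assumption: known words can still be looked up after locking.
            return existing;
        }
        if (locked) {
            // Assumption: the error type is not specified by the documentation.
            throw new IllegalStateException("Vocabulary is locked; cannot add: " + word);
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    int numWords() {
        return words.size();
    }

    /** Releases spare capacity (the "trim") and rejects further additions (the "lock"). */
    void trimAndLock() {
        words.trimToSize();
        locked = true;
    }
}
```

Failing loudly on a late addition, rather than silently ignoring it, surfaces vocabulary-building bugs at the point where they occur.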
-
getIndexPossiblyUnk
public int getIndexPossiblyUnk(java.lang.String word)
Description copied from interface: WordIndexer
Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
- Specified by:
getIndexPossiblyUnk in interface WordIndexer<java.lang.String>
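A read-only lookup with an unknown-word fallback could look like the sketch below. It is not the BerkeleyLM implementation: the class UnkLookupSketch is hypothetical, and having setUnkSymbol register the symbol in the vocabulary and cache its index is an assumption made so that the fallback has an index to return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the unknown-word fallback; not the library class. */
final class UnkLookupSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> words = new ArrayList<>();
    private String unkSymbol;
    private int unkIndex = -1; // -1 until an unk symbol has been set

    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    /** Assumption: setting the unk symbol also adds it to the vocabulary. */
    void setUnkSymbol(String sym) {
        unkSymbol = sym;
        unkIndex = getOrAddIndex(sym);
    }

    String getUnkSymbol() {
        return unkSymbol;
    }

    int numWords() {
        return words.size();
    }

    /** Pure lookup: never grows the vocabulary; unseen words map to the unk index. */
    int getIndexPossiblyUnk(String word) {
        Integer existing = indexOf.get(word);
        return existing != null ? existing : unkIndex;
    }
}
```

This is the lookup to use at query time, when unseen words should be scored as unknown instead of growing the vocabulary.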