Package edu.berkeley.nlp.lm.map
Class HashNgramMap<T>
- java.lang.Object
-
- edu.berkeley.nlp.lm.map.AbstractNgramMap<T>
-
- edu.berkeley.nlp.lm.map.HashNgramMap<T>
-
- Type Parameters:
T-
- All Implemented Interfaces:
ContextEncodedNgramMap<T>, NgramMap<T>, java.io.Serializable
public final class HashNgramMap<T> extends AbstractNgramMap<T> implements ContextEncodedNgramMap<T>
- Author:
- adampauls
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.map.NgramMap
NgramMap.Entry<T>
-
-
Field Summary
-
Fields inherited from class edu.berkeley.nlp.lm.map.AbstractNgramMap
NUM_BITS_PER_BYTE, NUM_SUFFIX_BITS, NUM_WORD_BITS, opts, SUFFIX_BIT_MASK, values, WORD_BIT_MASK
-
-
Method Summary
Modifier and Type, Method, Description
void clearStorage()
boolean contains(int[] ngram, int startPos, int endPos)
static <T> HashNgramMap<T> createExplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, int maxNgramOrder, boolean reversed)
    Note: Explicit HashNgramMap can grow beyond maxNgramOrder.
static <T> HashNgramMap<T> createImplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, LongArray[] numNgramsForEachWord, boolean reversed)
T get(int[] ngram, int startPos, int endPos)
int getFirstWordForOffset(long offset, int ngramOrder)
int getLastWordForOffset(long offset, int ngramOrder)
int getMaxNgramOrder()
long getNextContextOffset(long offset, int ngramOrder)
int getNextWord(long offset, int ngramOrder)
int[] getNgramForOffset(long offset, int ngramOrder)
int[] getNgramForOffset(long offset, int ngramOrder, int[] ret)
int[] getNgramFromContextEncoding(long contextOffset, int contextOrder, int word)
java.lang.Iterable<java.lang.Long> getNgramOffsetsForOrder(int ngramOrder)
java.lang.Iterable<NgramMap.Entry<T>> getNgramsForOrder(int ngramOrder)
long getNumNgrams(int ngramOrder)
long getOffset(long contextOffset, int contextOrder, int word)
ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
long getOffsetForNgramInModel(int[] ngram, int startPos, int endPos)
    Like getOffsetForNgram(int[], int, int), but assumes that the full n-gram is in the map (i.e. does not back off to the largest suffix which is in the model).
long getPrefixOffset(long offset, int ngramOrder)
    Gets the offset of the context for an n-gram (represented by offset).
long getTotalSize()
long getValueAndOffset(long contextOffset, int contextOrder, int word, T outputVal)
CustomWidthArray getValueStoringArray(int ngramOrder)
void handleNgramsFinished(int justFinishedOrder)
void initWithLengths(java.util.List<java.lang.Long> numNGrams)
boolean isReversed()
long put(int[] ngram, int startPos, int endPos, T val)
long putWithOffset(int[] ngram, int startPos, int endPos, long contextOffset, T val)
    Warning: does not rehash if the load factor is exceeded; rehashIfNecessary must be called explicitly.
long putWithOffsetAndSuffix(int[] ngram, int startPos, int endPos, long contextOffset, long suffixOffset, T val)
    Warning: does not rehash if the load factor is exceeded; rehashIfNecessary must be called explicitly.
void rehashIfNecessary(int num)
void trim()
boolean wordHasBigrams(int word)
-
Methods inherited from class edu.berkeley.nlp.lm.map.AbstractNgramMap
combineToKey, containsOutOfVocab, contextOffsetOf, equals, getSubArray, getValues, wordOf
-
-
-
-
Method Detail
-
createImplicitWordHashNgramMap
public static <T> HashNgramMap<T> createImplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, LongArray[] numNgramsForEachWord, boolean reversed)
-
createExplicitWordHashNgramMap
public static <T> HashNgramMap<T> createExplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, int maxNgramOrder, boolean reversed)
Note: Explicit HashNgramMap can grow beyond maxNgramOrder.
- Type Parameters:
T -
- Parameters:
values -
opts -
maxNgramOrder -
reversed -
- Returns:
-
put
public long put(int[] ngram, int startPos, int endPos, T val)
-
putWithOffset
public long putWithOffset(int[] ngram, int startPos, int endPos, long contextOffset, T val)
Warning: this method does not rehash if the load factor is exceeded; you must call rehashIfNecessary explicitly. This is so that the offsets returned remain valid. You should not use this function unless you really know what you're doing.
- Parameters:
ngram -
startPos -
endPos -
contextOffset -
val -
- Returns:
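The reason for deferring the rehash can be illustrated with a toy table. The sketch below is not the berkeleylm implementation: ToyOffsetMap, putNoRehash, keyAt, the hash function, and the 0.7 load factor are all hypothetical, and its rehashIfNecessary takes no argument, unlike the real method. It only shows the general idea that when an offset is a slot index, growing the table moves entries, so previously returned offsets stop being valid.

```java
import java.util.Arrays;

/** Toy open-addressing table whose "offsets" are slot indices (illustration only). */
final class ToyOffsetMap {
    private static final long EMPTY = -1L;
    private static final double MAX_LOAD_FACTOR = 0.7; // assumed value for this sketch
    private long[] keys;
    private int size;

    ToyOffsetMap(int capacity) {
        keys = new long[capacity];
        Arrays.fill(keys, EMPTY);
    }

    /** Inserts a non-negative key without ever resizing; returns its slot, or -1 if the table is full. */
    long putNoRehash(long key) {
        int start = (int) Math.floorMod(key * 31L, (long) keys.length);
        for (int probe = 0; probe < keys.length; probe++) {
            int slot = (start + probe) % keys.length; // linear probing
            if (keys[slot] == EMPTY) {
                keys[slot] = key;
                size++;
                return slot;
            }
            if (keys[slot] == key) {
                return slot; // already present: report the existing slot
            }
        }
        return -1L;
    }

    /** Reads back the key stored at a previously returned offset. */
    long keyAt(long offset) {
        return keys[(int) offset];
    }

    /** Doubles the table if the load factor is exceeded; slots handed out earlier become stale. */
    boolean rehashIfNecessary() {
        if (size <= MAX_LOAD_FACTOR * keys.length) {
            return false;
        }
        long[] old = keys;
        keys = new long[old.length * 2];
        Arrays.fill(keys, EMPTY);
        size = 0;
        for (long k : old) {
            if (k != EMPTY) {
                putNoRehash(k);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyOffsetMap map = new ToyOffsetMap(4);
        long offset = map.putNoRehash(5L);
        map.putNoRehash(9L);
        map.putNoRehash(13L);
        System.out.println("offset of 5 before rehash: " + offset);
        if (map.rehashIfNecessary()) {
            System.out.println("offset of 5 after rehash: " + map.putNoRehash(5L));
        }
    }
}
```

In this toy, a caller that holds offsets from a batch of insertions can finish using them and only then trigger the rehash, which is the pattern the warning above implies.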
-
putWithOffsetAndSuffix
public long putWithOffsetAndSuffix(int[] ngram, int startPos, int endPos, long contextOffset, long suffixOffset, T val)
Warning: this method does not rehash if the load factor is exceeded; you must call rehashIfNecessary explicitly. This is so that the offsets returned remain valid. You should not use this function unless you really know what you're doing.
- Parameters:
ngram -
startPos -
endPos -
contextOffset -
val -
- Returns:
-
rehashIfNecessary
public void rehashIfNecessary(int num)
-
getValueAndOffset
public long getValueAndOffset(long contextOffset, int contextOrder, int word, T outputVal)
- Specified by:
getValueAndOffset in interface NgramMap<T>
-
getOffset
public long getOffset(long contextOffset, int contextOrder, int word)
- Specified by:
getOffset in interface ContextEncodedNgramMap<T>
-
getNgramFromContextEncoding
public int[] getNgramFromContextEncoding(long contextOffset, int contextOrder, int word)
- Specified by:
getNgramFromContextEncoding in interface ContextEncodedNgramMap<T>
-
getNextWord
public int getNextWord(long offset, int ngramOrder)
-
getNextContextOffset
public long getNextContextOffset(long offset, int ngramOrder)
-
getFirstWordForOffset
public int getFirstWordForOffset(long offset, int ngramOrder)
-
getLastWordForOffset
public int getLastWordForOffset(long offset, int ngramOrder)
-
getNgramForOffset
public int[] getNgramForOffset(long offset, int ngramOrder)
-
getNgramForOffset
public int[] getNgramForOffset(long offset, int ngramOrder, int[] ret)
-
getOffsetForNgram
public ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
- Specified by:
getOffsetForNgram in interface ContextEncodedNgramMap<T>
-
getOffsetForNgramInModel
public long getOffsetForNgramInModel(int[] ngram, int startPos, int endPos)
Like getOffsetForNgram(int[], int, int), but assumes that the full n-gram is in the map (i.e. does not back off to the largest suffix which is in the model).
- Parameters:
ngram -
startPos -
endPos -
- Returns:
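The difference between an exact lookup and a lookup that backs off to the largest suffix can be sketched with an ordinary hash map. Everything below is hypothetical and independent of berkeleylm: ToyBackoffLookup and its methods are invented for illustration, endPos is assumed to be exclusive, and -1 is used as an assumed "not found" marker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy n-gram index contrasting exact lookup with back-off to the longest stored suffix. */
final class ToyBackoffLookup {
    private final Map<List<Integer>, Long> offsets = new HashMap<>();

    /** Registers an n-gram (given as word ids) under an arbitrary offset. */
    void add(long offset, int... ngram) {
        offsets.put(toList(ngram, 0, ngram.length), offset);
    }

    /** Exact lookup of ngram[startPos, endPos): returns -1 unless the full n-gram is stored. */
    long exactOffset(int[] ngram, int startPos, int endPos) {
        Long off = offsets.get(toList(ngram, startPos, endPos));
        return off == null ? -1L : off;
    }

    /** Back-off lookup: drops leading words until a stored suffix is found; returns {offset, length}. */
    long[] backoffOffset(int[] ngram, int startPos, int endPos) {
        for (int s = startPos; s < endPos; s++) {
            long off = exactOffset(ngram, s, endPos);
            if (off >= 0) {
                return new long[] { off, endPos - s };
            }
        }
        return new long[] { -1L, 0L };
    }

    private static List<Integer> toList(int[] words, int from, int to) {
        List<Integer> out = new ArrayList<>(to - from);
        for (int i = from; i < to; i++) {
            out.add(words[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        ToyBackoffLookup index = new ToyBackoffLookup();
        index.add(0L, 7);
        index.add(1L, 8);
        index.add(0L, 7, 8);
        int[] query = { 6, 7, 8 };
        System.out.println("exact: " + index.exactOffset(query, 0, 3));
        long[] backedOff = index.backoffOffset(query, 0, 3);
        System.out.println("back-off: offset " + backedOff[0] + ", length " + backedOff[1]);
    }
}
```

The exact variant does a single probe, which is why it is only appropriate when the caller already knows the full n-gram is present.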
-
handleNgramsFinished
public void handleNgramsFinished(int justFinishedOrder)
- Specified by:
handleNgramsFinished in interface NgramMap<T>
-
initWithLengths
public void initWithLengths(java.util.List<java.lang.Long> numNGrams)
- Specified by:
initWithLengths in interface NgramMap<T>
-
getPrefixOffset
public long getPrefixOffset(long offset, int ngramOrder)
Gets the offset of the context for an n-gram (represented by offset).
- Parameters:
offset -
- Returns:
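One way to picture "the offset of the context for an n-gram" is a layout in which each stored n-gram is a packed pair of its last word and the offset of its context in the next lower order. The sketch below is an assumption-laden illustration, not the library's actual layout: ToyContextEncoding, the helper names, the 40-bit split, and the 0-based order convention are all hypothetical.

```java
import java.util.Arrays;

/** Toy context encoding: each n-gram is stored as (last word, offset of its prefix). */
final class ToyContextEncoding {
    static final int CONTEXT_BITS = 40; // assumed bit split for this sketch
    static final long CONTEXT_MASK = (1L << CONTEXT_BITS) - 1;

    /** Packs a non-negative word id and a context offset into one long key. */
    static long packKey(int word, long contextOffset) {
        return ((long) word << CONTEXT_BITS) | contextOffset;
    }

    static int wordOfKey(long key) {
        return (int) (key >>> CONTEXT_BITS);
    }

    static long contextOffsetOfKey(long key) {
        return key & CONTEXT_MASK;
    }

    /** Offset of the prefix (context) of the n-gram at the given offset; order is 0-based here. */
    static long prefixOffset(long[][] keysByOrder, long offset, int ngramOrder) {
        return contextOffsetOfKey(keysByOrder[ngramOrder][(int) offset]);
    }

    /** Rebuilds the word ids of an n-gram by following context offsets down to the unigram. */
    static int[] ngramForOffset(long[][] keysByOrder, long offset, int ngramOrder) {
        int[] ngram = new int[ngramOrder + 1];
        long current = offset;
        for (int order = ngramOrder; order >= 0; order--) {
            long key = keysByOrder[order][(int) current];
            ngram[order] = wordOfKey(key);
            current = contextOffsetOfKey(key);
        }
        return ngram;
    }

    public static void main(String[] args) {
        long[][] keys = {
            { packKey(10, 0), packKey(11, 0) }, // unigrams: (10), (11)
            { packKey(10, 1), packKey(11, 0) }, // bigrams: (11, 10), (10, 11)
            { packKey(12, 1) }                  // trigram: (10, 11, 12)
        };
        System.out.println(Arrays.toString(ngramForOffset(keys, 0, 2)));
        System.out.println("prefix offset: " + prefixOffset(keys, 0, 2));
    }
}
```

Under this layout, a prefix lookup is a single array read plus a mask, and a full n-gram can be recovered by repeating that step once per order.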
-
getMaxNgramOrder
public int getMaxNgramOrder()
- Specified by:
getMaxNgramOrder in interface NgramMap<T>
-
getNumNgrams
public long getNumNgrams(int ngramOrder)
- Specified by:
getNumNgrams in interface NgramMap<T>
-
getNgramsForOrder
public java.lang.Iterable<NgramMap.Entry<T>> getNgramsForOrder(int ngramOrder)
- Specified by:
getNgramsForOrder in interface NgramMap<T>
-
getNgramOffsetsForOrder
public java.lang.Iterable<java.lang.Long> getNgramOffsetsForOrder(int ngramOrder)
-
isReversed
public boolean isReversed()
-
wordHasBigrams
public boolean wordHasBigrams(int word)
- Specified by:
wordHasBigrams in interface ContextEncodedNgramMap<T>
-
contains
public boolean contains(int[] ngram, int startPos, int endPos)
-
getTotalSize
public long getTotalSize()
-
getValueStoringArray
public CustomWidthArray getValueStoringArray(int ngramOrder)
- Specified by:
getValueStoringArray in interface NgramMap<T>
-
clearStorage
public void clearStorage()
- Specified by:
clearStorage in interface NgramMap<T>
-
-