public static class VectorIndexer.CategoryStats
extends java.lang.Object
implements scala.Serializable
TODO: Track which features are known to be continuous already; do not update counts for them.
param: numFeatures This class fails if it encounters a Vector whose length is not numFeatures.
param: maxCategories This class caps the number of unique values collected at maxCategories.
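The two parameters above can be illustrated with a minimal, self-contained sketch. It is not Spark's implementation: `UniqueValueCollector` and `uniqueCount` are hypothetical names, plain `double[]` arrays stand in for `Vector`, and the exact point at which collection stops (here, once a set holds more than `maxCategories` values) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for CategoryStats; a sketch, not Spark's code.
final class UniqueValueCollector {
    private final int numFeatures;
    private final int maxCategories;
    // One set of observed values per feature (column).
    private final List<Set<Double>> valueSets = new ArrayList<>();

    UniqueValueCollector(int numFeatures, int maxCategories) {
        this.numFeatures = numFeatures;
        this.maxCategories = maxCategories;
        for (int i = 0; i < numFeatures; i++) {
            valueSets.add(new HashSet<>());
        }
    }

    // Rejects vectors of the wrong length, then records each feature's value.
    void addVector(double[] v) {
        if (v.length != numFeatures) {
            throw new IllegalArgumentException(
                "expected length " + numFeatures + " but got " + v.length);
        }
        for (int i = 0; i < numFeatures; i++) {
            addCapped(valueSets.get(i), v[i]);
        }
    }

    // Folds another collector's values into this one and returns this instance.
    UniqueValueCollector merge(UniqueValueCollector other) {
        if (other.numFeatures != numFeatures) {
            throw new IllegalArgumentException("numFeatures mismatch");
        }
        for (int i = 0; i < numFeatures; i++) {
            for (double value : other.valueSets.get(i)) {
                addCapped(valueSets.get(i), value);
            }
        }
        return this;
    }

    // Assumption: a set stops growing once it holds more than maxCategories
    // values, which is enough to know the feature exceeds the cap.
    private void addCapped(Set<Double> seen, double value) {
        if (seen.size() <= maxCategories) {
            seen.add(value);
        }
    }

    int uniqueCount(int feature) {
        return valueSets.get(feature).size();
    }
}
```

With `maxCategories = 2`, a feature that takes five distinct values ends up with three recorded values in this sketch: one past the cap, after which further values are ignored.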
Constructor and Description |
---|
VectorIndexer.CategoryStats(int numFeatures, int maxCategories) |
Modifier and Type | Method and Description |
---|---|
void | addVector(Vector v) Add a new vector to this index, updating sets of unique feature values. |
scala.collection.immutable.Map<java.lang.Object,scala.collection.immutable.Map<java.lang.Object,java.lang.Object>> | getCategoryMaps() Based on stats collected, decide which features are categorical, and choose indices for categories. |
VectorIndexer.CategoryStats | merge(VectorIndexer.CategoryStats other) Merge with another instance, modifying this instance. |
public VectorIndexer.CategoryStats(int numFeatures, int maxCategories)
public VectorIndexer.CategoryStats merge(VectorIndexer.CategoryStats other)
public void addVector(Vector v)
public scala.collection.immutable.Map<java.lang.Object,scala.collection.immutable.Map<java.lang.Object,java.lang.Object>> getCategoryMaps()
Sparsity: This tries to maintain sparsity by treating value 0.0 specially. If a categorical feature takes value 0.0, then value 0.0 is given index 0.
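The zero-first rule can be sketched as follows. This is an illustration only, not Spark's code: `CategoryIndexSketch` and `indexValues` are hypothetical names, and indexing the remaining non-zero values in ascending order is an assumption the text above does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; a sketch of index assignment for one categorical feature.
final class CategoryIndexSketch {
    // Maps each distinct value to a 0-based index, giving 0.0 index 0 if present.
    static Map<Double, Integer> indexValues(Set<Double> values) {
        boolean hasZero = false;
        TreeSet<Double> nonZero = new TreeSet<>();
        for (double value : values) {
            if (value == 0.0) {
                hasZero = true; // primitive comparison, so -0.0 counts as zero too
            } else {
                nonZero.add(value);
            }
        }
        Map<Double, Integer> indices = new LinkedHashMap<>();
        if (hasZero) {
            indices.put(0.0, 0);
        }
        // Assumption: non-zero values follow in ascending order.
        for (double value : nonZero) {
            indices.put(value, indices.size());
        }
        return indices;
    }
}
```

For the value set {-1.0, 0.0, 2.0}, this sketch maps 0.0 to index 0, -1.0 to index 1, and 2.0 to index 2, so a sparse vector's implicit zeros keep the value 0 after indexing and the vector can stay sparse.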