diff options
author | Davies Liu <davies@databricks.com> | 2014-11-20 15:31:28 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-11-20 15:31:28 -0800 |
commit | 1c53a5db993193122bfa79574d2540149fe2cc08 (patch) | |
tree | 68efb0022ee91fc29450473eede8ac1ce9bccb39 /python/docs | |
parent | b8e6886fb8ff8f667fb7e600cd727d8649cad1d1 (diff) | |
download | spark-1c53a5db993193122bfa79574d2540149fe2cc08.tar.gz spark-1c53a5db993193122bfa79574d2540149fe2cc08.tar.bz2 spark-1c53a5db993193122bfa79574d2540149fe2cc08.zip |
[SPARK-4439] [MLlib] add python api for random forest
```
class RandomForestModel
| A model trained by RandomForest
|
| numTrees(self)
| Get number of trees in forest.
|
| predict(self, x)
| Predict values for a single data point or an RDD of points using the model trained.
|
| toDebugString(self)
| Full model
|
| totalNumNodes(self)
| Get total number of nodes, summed over all trees in the forest.
|
class RandomForest
| trainClassifier(cls, data, numClassesForClassification, categoricalFeaturesInfo, numTrees, featureSubsetStrategy='auto', impurity='gini', maxDepth=4, maxBins=32, seed=None):
| Method to train a decision tree model for binary or multiclass classification.
|
| :param data: Training dataset: RDD of LabeledPoint.
| Labels should take values {0, 1, ..., numClasses-1}.
| :param numClassesForClassification: number of classes for classification.
| :param categoricalFeaturesInfo: Map storing arity of categorical features.
| E.g., an entry (n -> k) indicates that feature n is categorical
| with k categories indexed from 0: {0, 1, ..., k-1}.
| :param numTrees: Number of trees in the random forest.
| :param featureSubsetStrategy: Number of features to consider for splits at each node.
| Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
| If "auto" is set, this parameter is set based on numTrees:
| if numTrees == 1, set to "all";
| if numTrees > 1 (forest) set to "sqrt".
| :param impurity: Criterion used for information gain calculation.
| Supported values: "gini" (recommended) or "entropy".
| :param maxDepth: Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means
| 1 internal node + 2 leaf nodes. (default: 4)
| :param maxBins: maximum number of bins used for splitting features (default: 100)
| :param seed: Random seed for bootstrapping and choosing feature subsets.
| :return: RandomForestModel that can be used for prediction
|
| trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetStrategy='auto', impurity='variance', maxDepth=4, maxBins=32, seed=None):
| Method to train a decision tree model for regression.
|
| :param data: Training dataset: RDD of LabeledPoint.
| Labels are real numbers.
| :param categoricalFeaturesInfo: Map storing arity of categorical features.
| E.g., an entry (n -> k) indicates that feature n is categorical
| with k categories indexed from 0: {0, 1, ..., k-1}.
| :param numTrees: Number of trees in the random forest.
| :param featureSubsetStrategy: Number of features to consider for splits at each node.
| Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
| If "auto" is set, this parameter is set based on numTrees:
| if numTrees == 1, set to "all";
| if numTrees > 1 (forest) set to "onethird".
| :param impurity: Criterion used for information gain calculation.
| Supported values: "variance".
| :param maxDepth: Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means
| 1 internal node + 2 leaf nodes.(default: 4)
| :param maxBins: maximum number of bins used for splitting features (default: 100)
| :param seed: Random seed for bootstrapping and choosing feature subsets.
| :return: RandomForestModel that can be used for prediction
|
```
Author: Davies Liu <davies@databricks.com>
Closes #3320 from davies/forest and squashes the following commits:
8003dfc [Davies Liu] reorder
53cf510 [Davies Liu] fix docs
4ca593d [Davies Liu] fix docs
e0df852 [Davies Liu] fix docs
0431746 [Davies Liu] rebased
2b6f239 [Davies Liu] Merge branch 'master' of github.com:apache/spark into forest
885abee [Davies Liu] address comments
dae7fc0 [Davies Liu] address comments
89a000f [Davies Liu] fix docs
565d476 [Davies Liu] add python api for random forest
Diffstat (limited to 'python/docs')
-rw-r--r-- | python/docs/epytext.py | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/python/docs/epytext.py b/python/docs/epytext.py index 19fefbfc05..e884d5e6b1 100644 --- a/python/docs/epytext.py +++ b/python/docs/epytext.py @@ -1,7 +1,7 @@ import re RULES = ( - (r"<[\w.]+>", r""), + (r"<(!BLANKLINE)[\w.]+>", r""), (r"L{([\w.()]+)}", r":class:`\1`"), (r"[LC]{(\w+\.\w+)\(\)}", r":func:`\1`"), (r"C{([\w.()]+)}", r":class:`\1`"), |