author | coderxiang <shuoxiangpub@gmail.com> | 2014-10-21 15:45:47 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-10-21 15:45:47 -0700 |
commit | 814a9cd7fabebf2a06f7e2e5d46b6a2b28b917c2 (patch) | |
tree | d5c7cf40a503e2d1e6950ebd3e2889d75acf043a /sql | |
parent | 5fdaf52a9df21cac69e2a4612aeb4e760e4424e7 (diff) | |
SPARK-3568 [mllib] add ranking metrics
Add common metrics for ranking algorithms (http://www-nlp.stanford.edu/IR-book/), including:
- Mean Average Precision
- Precision@n: top-n precision
- Discounted cumulative gain (DCG) and NDCG
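For reference, the usual binary-relevance definitions of these metrics can be written as below. The notation is introduced here and is not taken from the patch: rel_i is 1 if the item at rank i is in the ground-truth set R and 0 otherwise, n is the length of the predicted ranking, Q is the set of queries, and IDCG@k is the DCG@k of an ideal ranking. The committed implementation may differ in details such as the logarithm base.

```latex
\begin{align*}
\mathrm{P@}k    &= \frac{1}{k}\sum_{i=1}^{k} \mathrm{rel}_i \\
\mathrm{AP}     &= \frac{1}{|R|}\sum_{i=1}^{n} \mathrm{rel}_i \cdot \mathrm{P@}i \\
\mathrm{MAP}    &= \frac{1}{|Q|}\sum_{q \in Q} \mathrm{AP}(q) \\
\mathrm{DCG@}k  &= \sum_{i=1}^{k} \frac{\mathrm{rel}_i}{\log_2(i+1)} \\
\mathrm{NDCG@}k &= \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
\end{align*}
```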
The following methods and the corresponding tests are implemented:
```
class RankingMetrics[T](predictionAndLabels: RDD[(Array[T], Array[T])]) {
  /* Returns the precision@k for each query */
  lazy val precAtK: RDD[Array[Double]]

  /**
   * @param k the position to compute the truncated precision
   * @return the average precision at the first k ranking positions
   */
  def precision(k: Int): Double

  /* Returns the average precision for each query */
  lazy val avePrec: RDD[Double]

  /* Returns the mean average precision (MAP) of all the queries */
  lazy val meanAvePrec: Double

  /* Returns the normalized discounted cumulative gain for each query */
  lazy val ndcgAtK: RDD[Array[Double]]

  /**
   * @param k the position to compute the truncated ndcg
   * @return the average ndcg at the first k ranking positions
   */
  def ndcg(k: Int): Double
}
```
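To make the per-query quantities concrete, here is a minimal sketch on plain Scala collections rather than RDDs. It assumes binary relevance, no duplicate predictions, and a log2 discount. The object `RankingSketch` and its method names are hypothetical and are not the API added by this patch. Returning 0.0 for an empty ground-truth set follows the squashed commits listed below.

```scala
// Local-collection sketch of the metrics for binary relevance.
// `RankingSketch` and its method names are hypothetical, not Spark's API.
object RankingSketch {

  /** Fraction of the first k predictions that appear in the ground-truth set. */
  def precisionAt[T](pred: Seq[T], labels: Set[T], k: Int): Double = {
    require(k > 0, "k must be positive")
    // Divides by k even when fewer than k predictions are available.
    pred.take(k).count(labels.contains).toDouble / k
  }

  /** Sum of precision@i over positions holding a relevant item, divided by |labels|. */
  def averagePrecision[T](pred: Seq[T], labels: Set[T]): Double =
    if (labels.isEmpty) 0.0 // empty ground truth is scored as 0.0
    else {
      var hits = 0
      var sum = 0.0
      pred.zipWithIndex.foreach { case (p, i) =>
        if (labels.contains(p)) {
          hits += 1
          sum += hits.toDouble / (i + 1)
        }
      }
      sum / labels.size
    }

  /** DCG@k divided by the DCG@k of an ideal ranking. */
  def ndcgAt[T](pred: Seq[T], labels: Set[T], k: Int): Double = {
    require(k > 0, "k must be positive")
    // Discount for a 0-based index i, i.e. 1 / log2(position + 1).
    def discount(i: Int): Double = math.log(2.0) / math.log(i + 2.0)
    val dcg = pred.take(k).zipWithIndex.collect {
      case (p, i) if labels.contains(p) => discount(i)
    }.sum
    val ideal = (0 until math.min(labels.size, k)).map(discount).sum
    if (ideal == 0.0) 0.0 else dcg / ideal
  }

  /** Mean of a per-query metric, e.g. MAP from per-query average precision. */
  def mean(xs: Seq[Double]): Double = if (xs.isEmpty) 0.0 else xs.sum / xs.size
}
```

For a predicted ranking `Seq(1, 6, 2, 7, 8)` and ground truth `Set(1, 2, 3, 4, 5)`, `precisionAt(..., 3)` counts two hits in the first three positions and `averagePrecision` sums 1/1 and 2/3 before dividing by the five relevant items. Averaging such per-query values with `mean` gives the aggregate figures that `precision(k)`, `meanAvePrec` and `ndcg(k)` are described as returning.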
Author: coderxiang <shuoxiangpub@gmail.com>
Closes #2667 from coderxiang/rankingmetrics and squashes the following commits:
d881097 [coderxiang] update doc
14d9cd9 [coderxiang] remove unexpected files
d7fb93f [coderxiang] style change and remove ignored files
f113ee1 [coderxiang] modify doc for displaying superscript and subscript
f626896 [coderxiang] improve doc and remove unnecessary computation while labSet is empty
be6645e [coderxiang] set the precision of empty labset to 0.0
d64c120 [coderxiang] add logWarning for empty ground truth set
dfae292 [coderxiang] handle empty labSet for map. add test
62047c4 [coderxiang] style change and add documentation
f66612d [coderxiang] add additional test of precisionAt
b794cb2 [coderxiang] move private members precAtK, ndcgAtK into public methods. style change
77c9e5d [coderxiang] set precAtK and ndcgAtK as private member. Improve documentation
5f87bce [coderxiang] add API to calculate precision and ndcg at each ranking position
b7851cc [coderxiang] Use generic type to represent IDs
e443fee [coderxiang] change style and use alternative builtin methods
3a5a6ff [coderxiang] add ranking metrics
Diffstat (limited to 'sql')
0 files changed, 0 insertions, 0 deletions