public final class DataFrameStatFunctions
extends Object
Statistic functions for DataFrames.
Modifier and Type | Method and Description |
---|---|
double | corr(String col1, String col2): Calculates the Pearson Correlation Coefficient of two columns of a DataFrame. |
double | corr(String col1, String col2, String method): Calculates the correlation of two columns of a DataFrame. |
double | cov(String col1, String col2): Calculates the sample covariance of two numerical columns of a DataFrame. |
DataFrame | crosstab(String col1, String col2): Computes a pair-wise frequency table of the given columns. |
DataFrame | freqItems(scala.collection.Seq<String> cols): (Scala-specific) Finds frequent items for columns, possibly with false positives. |
DataFrame | freqItems(scala.collection.Seq<String> cols, double support): (Scala-specific) Finds frequent items for columns, possibly with false positives. |
DataFrame | freqItems(String[] cols): Finds frequent items for columns, possibly with false positives. |
DataFrame | freqItems(String[] cols, double support): Finds frequent items for columns, possibly with false positives. |
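The two-column statistics listed in the summary can be illustrated with a small plain-Java sketch. This is not Spark code and not the library's implementation: the class `PairStatsSketch` and its method names are hypothetical, and the columns are assumed to be already materialized as `double[]` arrays of equal length.

```java
/** Plain-Java sketch of the statistics behind cov and corr (hypothetical helper, not Spark code). */
final class PairStatsSketch {
    private PairStatsSketch() {}

    /** Arithmetic mean of the values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample covariance: sum((x_i - meanX) * (y_i - meanY)) / (n - 1). */
    static double sampleCovariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two columns of equal length >= 2");
        }
        double meanX = mean(x);
        double meanY = mean(y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (x.length - 1);
    }

    /**
     * Pearson correlation: cov(x, y) / (stddev(x) * stddev(y)).
     * Returns NaN when either column is constant (zero standard deviation).
     */
    static double pearsonCorrelation(double[] x, double[] y) {
        double stdX = Math.sqrt(sampleCovariance(x, x));
        double stdY = Math.sqrt(sampleCovariance(y, y));
        return sampleCovariance(x, y) / (stdX * stdY);
    }
}
```

For example, with x = {1, 2, 3, 4, 5} and y = {2, 4, 6, 8, 10}, `sampleCovariance(x, y)` is 5.0 and `pearsonCorrelation(x, y)` is 1.0 up to floating-point rounding.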
public double cov(String col1, String col2)
Parameters:
col1 - the name of the first column
col2 - the name of the second column

public double corr(String col1, String col2, String method)
Parameters:
col1 - the name of the column
col2 - the name of the column to calculate the correlation against
method - (undocumented)

public double corr(String col1, String col2)
Parameters:
col1 - the name of the column
col2 - the name of the column to calculate the correlation against

public DataFrame crosstab(String col1, String col2)
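Computes a pair-wise frequency table of the given columns. The first column of each row will be the distinct values of col1, and the column names will be the distinct values of col2. The name of the first column will be $col1_$col2. Counts will be returned as Longs. Pairs that have no occurrences will have null as their counts.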
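The shape of that table can be sketched in plain Java. This is only an illustration under stated assumptions, not Spark's implementation: `CrosstabSketch` and its methods are hypothetical names, the two columns are assumed to be non-null `String` arrays of equal length, and a nested sorted map stands in for the resulting DataFrame.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of a pair-wise frequency table (hypothetical helper, not Spark code). */
final class CrosstabSketch {
    private CrosstabSketch() {}

    /**
     * Outer keys are the distinct values of col1 (one per row);
     * inner keys are the distinct values of col2 (the column names).
     * A pair that never occurs has no entry, so looking it up yields null.
     */
    static Map<String, Map<String, Long>> crosstab(String[] col1, String[] col2) {
        if (col1.length != col2.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        Map<String, Map<String, Long>> table = new TreeMap<>();
        for (int i = 0; i < col1.length; i++) {
            table.computeIfAbsent(col1[i], k -> new TreeMap<>())
                 .merge(col2[i], 1L, Long::sum); // count this (col1, col2) pair
        }
        return table;
    }

    /** Name of the first column, following the $col1_$col2 pattern described above. */
    static String firstColumnName(String col1Name, String col2Name) {
        return col1Name + "_" + col2Name;
    }
}
```

With col1 = {"a", "a", "a", "b"} and col2 = {"x", "x", "y", "x"}, the pair (a, x) is counted twice, (a, y) and (b, x) once each, and looking up (b, y) returns null.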
Parameters:
col1 - The name of the first column. Distinct items will make the first item of each row.
col2 - The name of the second column. Distinct items will make the column names of the DataFrame.

public DataFrame freqItems(String[] cols, double support)
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm described in http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou. The support should be greater than 1e-4.

This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
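The counter-based idea behind this kind of frequent element count can be sketched in plain Java for a single column. This is a minimal sketch under stated assumptions, not Spark's implementation: `FrequentItemsSketch` is a hypothetical name, the column is assumed to be an in-memory `List`, and the choice of keeping at most 1/support counters is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch of a counter-based frequent-items pass (hypothetical helper, not Spark code). */
final class FrequentItemsSketch {
    private FrequentItemsSketch() {}

    /**
     * Returns a superset of the items whose relative frequency exceeds support.
     * Infrequent items may also be returned (false positives).
     */
    static <T> Set<T> frequentItems(List<T> items, double support) {
        if (support <= 1e-4 || support > 1.0) {
            throw new IllegalArgumentException("support must be in (1e-4, 1]");
        }
        // Assumption of this sketch: keep at most floor(1 / support) counters.
        int maxCounters = (int) (1.0 / support);
        Map<T, Long> counts = new HashMap<>();
        for (T item : items) {
            Long current = counts.get(item);
            if (current != null) {
                counts.put(item, current + 1);      // already tracked: bump its counter
            } else if (counts.size() < maxCounters) {
                counts.put(item, 1L);               // free slot: start tracking
            } else {
                // No free slot: decrement every counter and drop those that reach zero.
                Iterator<Map.Entry<T, Long>> it = counts.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<T, Long> entry = it.next();
                    if (entry.getValue() == 1L) {
                        it.remove();
                    } else {
                        entry.setValue(entry.getValue() - 1);
                    }
                }
            }
        }
        return new HashSet<>(counts.keySet());
    }
}
```

For the items a, b, a, c, a, d, a with a support of 0.5, the result contains a (4 of 7 occurrences) and also d, which survives only as a false positive. A second pass over the data would be needed to filter such items out.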
Parameters:
cols - the names of the columns to search frequent items in
support - The minimum frequency for an item to be considered frequent. Should be greater than 1e-4.

public DataFrame freqItems(String[] cols)
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm described in http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou. Uses a default support of 1%.

This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
Parameters:
cols - the names of the columns to search frequent items in

public DataFrame freqItems(scala.collection.Seq<String> cols, double support)
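(Scala-specific) Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm described in http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou.

This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.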
Parameters:
cols - the names of the columns to search frequent items in
support - (undocumented)

public DataFrame freqItems(scala.collection.Seq<String> cols)
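(Scala-specific) Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm described in http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou. Uses a default support of 1%.

This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.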
Parameters:
cols - the names of the columns to search frequent items in