final class DataFrameStatFunctions extends Logging
Provides eagerly computed statistical functions for DataFrames.
To access an object of this class, use DataFrame.stat.
- Since
0.2.0
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
action[T](funcName: String)(func: ⇒ T): T
- Attributes
- protected
- Annotations
- @inline()
-
def
approxQuantile(cols: Array[String], percentile: Array[Double]): Array[Array[Option[Double]]]
For an array of numeric columns and an array of desired quantiles, returns a matrix of approximate values for each column at each of the desired quantiles. For example, result(0)(1) contains the approximate value for column cols(0) at quantile percentile(1).
This function uses the t-Digest algorithm.
For example, the following code:
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res = df.stat.approxQuantile(Array("a", "b"), Array(0, 0.1, 0.6))
prints out the following result:
res: Array(Array(Some(0.05), Some(0.15000000000000002), Some(0.25)), Array(Some(0.45), Some(0.55), Some(0.6499999999999999)))
- cols
An array of column names.
- percentile
An array of double values greater than or equal to 0.0 and less than 1.0.
- returns
A matrix with the dimensions (cols.size * percentile.size) containing the approximate percentile values. If there is not enough data to calculate the quantile, the method returns None.
- Since
0.2.0
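The layout of the returned matrix (one row per column name, one entry per requested quantile) can be illustrated with a small local sketch. It needs no session: the values are copied from the example above, and labelQuantiles is a hypothetical helper introduced here for illustration, not part of this class.

```scala
// Result matrix copied from the example above:
// one row per entry in cols, one cell per entry in percentile.
val cols = Array("a", "b")
val percentile = Array(0.0, 0.1, 0.6)
val res: Array[Array[Option[Double]]] = Array(
  Array(Some(0.05), Some(0.15000000000000002), Some(0.25)),
  Array(Some(0.45), Some(0.55), Some(0.6499999999999999))
)

// Hypothetical helper (not part of the API): key each cell by (column, quantile).
def labelQuantiles(
    cols: Array[String],
    percentile: Array[Double],
    res: Array[Array[Option[Double]]]): Map[(String, Double), Option[Double]] =
  (for {
    (c, i) <- cols.zipWithIndex
    (p, j) <- percentile.zipWithIndex
  } yield (c, p) -> res(i)(j)).toMap

val labeled = labelQuantiles(cols, percentile, res)

// A None cell marks a quantile that could not be computed; getOrElse supplies a fallback.
val bAt60 = labeled(("b", 0.6)).getOrElse(Double.NaN)
```

Keeping the cells as Option values until the last step makes the "not enough data" case explicit instead of hiding it behind a sentinel number.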
-
def
approxQuantile(col: String, percentile: Array[Double]): Array[Option[Double]]
For a specified numeric column and an array of desired quantiles, returns an approximate value for the column at each of the desired quantiles.
This function uses the t-Digest algorithm.
For example, the following code:
import session.implicits._
val df = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 0).toDF("a")
val res = df.stat.approxQuantile("a", Array(0, 0.1, 0.4, 0.6, 1))
prints out the following result:
res: Array(Some(-0.5), Some(0.5), Some(3.5), Some(5.5), Some(9.5))
- col
The name of the numeric column.
- percentile
An array of double values greater than or equal to 0.0 and less than 1.0.
- returns
An array of approximate percentile values. If there is not enough data to calculate the quantile, the method returns None.
- Since
0.2.0
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
corr(col1: String, col2: String): Option[Double]
Calculates the correlation coefficient for non-null pairs in two numeric columns.
For example, the following code:
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res = df.stat.corr("a", "b").get
prints out the following result:
res: 0.9999999999999991
- col1
The name of the first numeric column to use.
- col2
The name of the second numeric column to use.
- returns
The correlation of the two numeric columns. If there is not enough data to generate the correlation, the method returns None.
- Since
0.2.0
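To make "correlation coefficient for non-null pairs" concrete, here is a minimal local sketch of a Pearson correlation over in-memory data. It is not the library's implementation: pearson is a hypothetical helper, nulls are modelled as None, and treating fewer than two complete pairs (or a constant column) as "not enough data" is an assumption made for this sketch.

```scala
// Hypothetical local helper, not the library implementation.
def pearson(pairs: Seq[(Option[Double], Option[Double])]): Option[Double] = {
  // Keep only pairs in which both values are present.
  val xs = pairs.collect { case (Some(x), Some(y)) => (x, y) }
  val n = xs.size
  if (n < 2) None // assumption: fewer than two complete pairs is "not enough data"
  else {
    val meanX = xs.map(_._1).sum / n
    val meanY = xs.map(_._2).sum / n
    val sxy = xs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val syy = xs.map { case (_, y) => (y - meanY) * (y - meanY) }.sum
    // A constant column has zero variance, so the coefficient is undefined.
    if (sxx == 0.0 || syy == 0.0) None else Some(sxy / math.sqrt(sxx * syy))
  }
}

// Same data as the example above, plus one pair with a missing value that is skipped.
val data: Seq[(Option[Double], Option[Double])] =
  Seq((Some(0.1), Some(0.5)), (Some(0.2), Some(0.6)), (Some(0.3), Some(0.7)), (None, Some(0.9)))
val r = pearson(data)
```

Because the two columns differ by a constant offset, r is 1 up to floating-point rounding, which is consistent with the value shown in the example.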
-
def
cov(col1: String, col2: String): Option[Double]
Calculates the sample covariance for non-null pairs in two numeric columns.
For example, the following code:
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res = df.stat.cov("a", "b").get
prints out the following result:
res: 0.010000000000000037
- col1
The name of the first numeric column to use.
- col2
The name of the second numeric column to use.
- returns
The sample covariance of the two numeric columns. If there is not enough data to generate the covariance, the method returns None.
- Since
0.2.0
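The word "sample" means the sum of products of deviations is divided by n - 1 rather than n. The following local sketch shows that arithmetic on in-memory data; sampleCov is a hypothetical helper, not the library's implementation, and returning None for fewer than two complete pairs is an assumption of the sketch.

```scala
// Hypothetical local helper, not the library implementation.
def sampleCov(pairs: Seq[(Option[Double], Option[Double])]): Option[Double] = {
  // Keep only pairs in which both values are present.
  val xs = pairs.collect { case (Some(x), Some(y)) => (x, y) }
  val n = xs.size
  if (n < 2) None // assumption: n - 1 would be zero, so there is not enough data
  else {
    val meanX = xs.map(_._1).sum / n
    val meanY = xs.map(_._2).sum / n
    // Sample covariance: divide by n - 1 (Bessel's correction), not n.
    Some(xs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum / (n - 1))
  }
}

// Same data as the example above.
val covData: Seq[(Option[Double], Option[Double])] =
  Seq((Some(0.1), Some(0.5)), (Some(0.2), Some(0.6)), (Some(0.3), Some(0.7)))
val c = sampleCov(covData)
```

For this data the deviations are -0.1, 0 and 0.1 in both columns, so the sum of products is 0.02 and the sample covariance is 0.02 / 2 = 0.01, up to floating-point rounding.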
-
def
crosstab(col1: String, col2: String): DataFrame
Computes a pair-wise frequency table (a contingency table) for the specified columns. The method returns a DataFrame containing this table.
In the returned contingency table:
- The first column of each row contains the distinct values of col1.
- The name of the first column is the name of col1.
- The rest of the column names are the distinct values of col2.
- The counts are returned as Longs.
- For pairs that have no occurrences, the contingency table contains 0 as the count.
Note: The number of distinct values in col2 should not exceed 1000.
For example, the following code:
import session.implicits._
val df = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)).toDF("key", "value")
val ct = df.stat.crosstab("key", "value")
ct.show()
prints out the following result:
---------------------------------------------------------------------------------------------
|"KEY"  |"CAST(1 AS NUMBER(38,0))"  |"CAST(2 AS NUMBER(38,0))"  |"CAST(3 AS NUMBER(38,0))"  |
---------------------------------------------------------------------------------------------
|1      |1                          |1                          |0                          |
|2      |2                          |0                          |1                          |
|3      |0                          |1                          |1                          |
---------------------------------------------------------------------------------------------
- col1
The name of the first column to use.
- col2
The name of the second column to use.
- returns
A DataFrame containing the contingency table.
- Since
0.2.0
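The shape of the contingency table can be reproduced on a plain Scala collection. This is only a local sketch of the counting logic using the same pairs as the example above; it does not show how the library computes the table, and the names counts, keys, values and table are introduced here for illustration.

```scala
// Same (key, value) pairs as the example above.
val rows = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3))

// Count the occurrences of each (key, value) pair as a Long.
val counts: Map[(Int, Int), Long] =
  rows.groupBy(identity).map { case (pair, occurrences) => pair -> occurrences.size.toLong }

// Distinct values of the first column label the rows;
// distinct values of the second column label the remaining columns.
val keys = rows.map(_._1).distinct.sorted
val values = rows.map(_._2).distinct.sorted

// One row per distinct key, one count per distinct value; missing pairs count as 0.
val table: Seq[(Int, Seq[Long])] =
  keys.map(k => k -> values.map(v => counts.getOrElse((k, v), 0L)))

table.foreach { case (k, cs) => println(s"$k | ${cs.mkString(" | ")}") }
```

The number of output columns grows with the number of distinct values in the second column, which is why that count needs to stay small.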
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
log(): Logger
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logDebug(msg: String, throwable: Throwable): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logDebug(msg: String): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logError(msg: String, throwable: Throwable): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logError(msg: String): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logInfo(msg: String, throwable: Throwable): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logInfo(msg: String): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logTrace(msg: String, throwable: Throwable): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logTrace(msg: String): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logWarning(msg: String, throwable: Throwable): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
def
logWarning(msg: String): Unit
- Attributes
- protected[internal]
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
sampleBy[T](col: String, fractions: Map[T, Double]): DataFrame
Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.
For example, the following code:
import session.implicits._
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy("name", fractions).show()
prints out the following result:
------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------
- T
The type of the stratum.
- col
The name of the column that defines the strata.
- fractions
A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.
- returns
A new DataFrame that contains the stratified sample.
- Since
0.2.0
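The role of the fractions Map, including the default of 0 for strata it does not mention, can be sketched locally. The sketch assumes each row is kept independently with the probability given for its stratum; that is an assumption made for illustration, not a statement about how the library samples. sampleByLocal is a hypothetical helper.

```scala
import scala.util.Random

// Hypothetical local helper, not the library implementation.
// Assumption: each row is kept independently with the probability of its stratum;
// a stratum missing from the Map gets fraction 0, so its rows are never kept.
def sampleByLocal[K, R](rows: Seq[R], fractions: Map[K, Double], rng: Random)(stratum: R => K): Seq[R] =
  rows.filter(r => rng.nextDouble() < fractions.getOrElse(stratum(r), 0.0))

val people = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12))

// Fractions of 1.0 keep every row of a stratum; "Alice" is absent, so she is dropped.
val keepAll = sampleByLocal(people, Map("Bob" -> 1.0, "Nico" -> 1.0), new Random(42))(_._1)

// With a fraction of 0.5, each "Bob" row has an even chance of being kept,
// so the result varies with the random seed.
val half = sampleByLocal(people, Map("Bob" -> 0.5, "Nico" -> 1.0), new Random(42))(_._1)
```

Because rows are only ever filtered out, a row can appear at most once in the result, which matches sampling without replacement.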
-
def
sampleBy[T](col: Column, fractions: Map[T, Double]): DataFrame
Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.
For example, the following code:
import com.snowflake.snowpark.functions.col
import session.implicits._
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy(col("name"), fractions).show()
prints out the following result:
------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------
- T
The type of the stratum.
- col
An expression for the column that defines the strata.
- fractions
A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.
- returns
A new DataFrame that contains the stratified sample.
- Since
0.2.0
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
transformation(funcName: String)(func: ⇒ DataFrame): DataFrame
- Attributes
- protected
- Annotations
- @inline()
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated