Feature Transformation -- QuantileDiscretizer

Takes a column of continuous features and outputs a column of binned categorical features. The bin ranges are chosen by taking a sample of the data and dividing it into roughly equal-sized parts. The lowest and highest bin bounds are -Infinity and +Infinity, so every real value falls into some bin. The transformer attempts to find n.buckets bins (the numBuckets parameter of the underlying Spark QuantileDiscretizer) based on a sample of the input data, but it may find fewer, depending on the sampled values (for example, when many sampled values are identical).
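
The bucketing idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not Spark's implementation: the helpers quantile_splits() and bucketize() are hypothetical, and base R's quantile() applied to a random sample stands in for the approximate quantile computation that Spark performs.

```r
# Hypothetical helper: choose bucket boundaries from a random sample of x.
# Base R's quantile() stands in for Spark's approximate quantiles (assumption).
quantile_splits <- function(x, n_buckets, sample_frac = 0.5) {
  # Sample by index so that a length-one x is not treated as 1:x by sample()
  size <- max(1L, floor(length(x) * sample_frac))
  s <- x[sample.int(length(x), size)]
  # Interior cut points at evenly spaced probabilities (0 and 1 are dropped)
  probs <- seq(0, 1, length.out = n_buckets + 1L)
  interior <- quantile(s, probs = probs[-c(1L, length(probs))], names = FALSE)
  # Repeated cut points collapse, so fewer buckets than requested may result;
  # the outer bounds are -Inf and Inf so every real value lands in a bucket
  c(-Inf, unique(interior), Inf)
}

# Hypothetical helper: 0-based bucket index, each bucket covering [lower, upper)
bucketize <- function(x, splits) {
  findInterval(x, splits) - 1L
}

splits <- quantile_splits(1:10, n_buckets = 2, sample_frac = 1)
print(splits)
print(bucketize(1:10, splits))

# Heavily repeated values: 4 buckets requested, but the interior cuts coincide
print(quantile_splits(c(rep(1, 9), 2), n_buckets = 4, sample_frac = 1))
```

In the last call all three interior quantiles equal 1, so only two buckets survive, which mirrors the "may find fewer" behaviour.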

ft_quantile_discretizer(x, input.col, output.col, n.buckets = 5L, ...)

Arguments

x	An object (usually a `spark_tbl`) coercible to a Spark DataFrame.
input.col	The name of the input column(s).
output.col	The name of the output column.
n.buckets	The number of buckets to produce; fewer may be returned, depending on the sampled data.
...	Optional arguments; currently unused.

Details

Note that the result may differ between runs, because the sampling used to choose the bin boundaries is non-deterministic.
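
The effect of sampling on the boundaries can be illustrated without Spark. In this base R sketch (a stand-in for the idea, not sparklyr code), the hypothetical helper sample_cuts() computes interior cut points from a fresh 10% random sample on each call, so two calls on the same data generally return different boundaries.

```r
# Hypothetical helper: interior cut points from a fresh random sample of x
sample_cuts <- function(x, n_buckets, sample_frac) {
  s <- x[sample.int(length(x), floor(length(x) * sample_frac))]
  probs <- seq(0, 1, length.out = n_buckets + 1L)
  quantile(s, probs = probs[-c(1L, length(probs))], names = FALSE)
}

set.seed(42)
x <- rnorm(1000)  # the same input data is used for both runs

run1 <- sample_cuts(x, n_buckets = 5, sample_frac = 0.1)
run2 <- sample_cuts(x, n_buckets = 5, sample_frac = 0.1)
print(rbind(run1, run2))  # the two rows differ because the samples differ
```

If stable boundaries matter, one option is to compute the split points once and reuse them rather than refitting on every run.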