pyspark.sql.functions.theta_sketch_agg
- pyspark.sql.functions.theta_sketch_agg(col, lgNomEntries=None)
Aggregate function: returns the compact binary representation of the DataSketches ThetaSketch built from the values in the input column, configured by lgNomEntries, the log-base-2 of the number of nominal entries.
New in version 4.1.0.
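- Parameters
col : Column or str
The column whose values are added to the sketch.
lgNomEntries : int, optional
The log-base-2 of the number of nominal entries. When omitted, 12 is used, as the column name theta_sketch_agg(value, 12) in the first example shows.
- Returns
Column
The binary representation of the ThetaSketch.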
See also
pyspark.sql.functions.theta_sketch_estimate()
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([1, 2, 2, 3], "INT")
>>> df.agg(sf.theta_sketch_estimate(sf.theta_sketch_agg("value"))).show()
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
>>> df.agg(sf.theta_sketch_estimate(sf.theta_sketch_agg("value", 15))).show()
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 15))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
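Both examples above return 3, presumably because the input has only three distinct values, far fewer than the nominal entries in either configuration, so the sketch can count them exactly. The choice of lgNomEntries matters once the number of distinct values exceeds the nominal entries: a larger value keeps more hashes, using more memory in exchange for a smaller error. The helpers below are hypothetical, and the 1/sqrt(k) error figure is an assumed rule of thumb rather than a guarantee from Spark or DataSketches.

```python
import math


def nominal_entries(lg_nom_entries: int) -> int:
    """Hypothetical helper: nominal entries k = 2**lgNomEntries."""
    return 1 << lg_nom_entries


def approx_relative_std_error(lg_nom_entries: int) -> float:
    """Hypothetical helper: assumed rule of thumb, relative standard error ~ 1 / sqrt(k)."""
    return 1.0 / math.sqrt(nominal_entries(lg_nom_entries))


# The default (12) and the explicit value (15) used in the examples: k=4096 and k=32768.
for lg in (12, 15):
    k = nominal_entries(lg)
    print(f"lgNomEntries={lg}: k={k}, approx RSE={approx_relative_std_error(lg):.2%}")
```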