pyspark.sql.functions.theta_sketch_agg

pyspark.sql.functions.theta_sketch_agg(col, lgNomEntries=None)

Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch built from the values in the input column. The sketch is configured with 2^lgNomEntries nominal entries.

New in version 4.1.0.

Parameters
col : Column or column name
lgNomEntries : Column or int, optional

The log-base-2 of the nominal entries, where nominal entries is the size of the sketch. Must be between 4 and 26; defaults to 12.

Returns
Column

The binary representation of the ThetaSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([1,2,2,3], "INT")
>>> df.agg(sf.theta_sketch_estimate(sf.theta_sketch_agg("value"))).show()
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
>>> df.agg(sf.theta_sketch_estimate(sf.theta_sketch_agg("value", 15))).show()
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 15))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
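
The relationship between lgNomEntries and sketch size can be sketched in plain Python, without a Spark session. The helper names below (nominal_entries, approx_relative_error) are hypothetical and not part of PySpark. The 1/sqrt(k) error figure is a commonly quoted rule of thumb for Theta sketches, used here as an assumption rather than something this page states.

```python
import math


def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Return k = 2**lg_nom_entries (hypothetical helper, not a PySpark API).

    Enforces the documented range of 4 to 26; the default of 12 matches
    the documented default of theta_sketch_agg.
    """
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26")
    return 1 << lg_nom_entries


def approx_relative_error(lg_nom_entries: int = 12) -> float:
    """Rough relative standard error, assuming the 1/sqrt(k) rule of thumb."""
    return 1.0 / math.sqrt(nominal_entries(lg_nom_entries))


# Print the sketch size and the assumed error level for a few settings.
for lg in (4, 12, 15, 26):
    k = nominal_entries(lg)
    print(f"lgNomEntries={lg:>2}  k={k:>10,}  ~error={approx_relative_error(lg):.4%}")
```

Under these assumptions, a larger lgNomEntries trades a bigger sketch for a tighter estimate. In the examples above the input holds only three distinct values, far fewer than the nominal entries for either setting, which is consistent with both calls returning 3.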