Aggregations
The aggregations listed below can be included in a formula. Each argument to an aggregation is either a column name, e.g. amount, or itself a formula, e.g. risk_factor in ["low_risk"].
argmax(formula)
Calculate the row index at which the maximum value of formula occurs. If used with a group by, the index is relative to the ungrouped dataframe.
argmax(submit_date)
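The group-by behaviour can be illustrated with a minimal plain-Python sketch. It is not this library's implementation: the `argmax` helper, the `customer` column, and the list-of-dicts row layout are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical dataset: each dict is one row of the ungrouped dataframe.
rows = [
    {"customer": "a", "submit_date": date(2023, 1, 5)},
    {"customer": "b", "submit_date": date(2023, 3, 1)},
    {"customer": "a", "submit_date": date(2023, 2, 9)},
    {"customer": "b", "submit_date": date(2023, 1, 20)},
]


def argmax(rows, formula, indices=None):
    """Return the index into `rows` at which formula(row) is largest.

    `indices` restricts the search to a subset of rows (one group), but the
    returned index still refers to the full, ungrouped list.
    """
    if indices is None:
        indices = range(len(rows))
    return max(indices, key=lambda i: formula(rows[i]))


# Without a group by: one index for the whole dataset.
overall = argmax(rows, lambda r: r["submit_date"])

# With a group by on "customer": collect the original row indices per group.
groups = {}
for i, row in enumerate(rows):
    groups.setdefault(row["customer"], []).append(i)

per_group = {
    key: argmax(rows, lambda r: r["submit_date"], idx)
    for key, idx in groups.items()
}
```

Here `overall` is 1, and `per_group` is `{"a": 2, "b": 1}`: each group's result is a position in the original four-row list, not a position within the group.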
count([formula])
Calculate the number of rows in the dataset, or if a formula is passed, the number of rows in which the formula is true.
count()
count(risk_factor in ["low_risk"])
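The two forms of count can be sketched in plain Python as follows. The `count` helper and the sample rows are hypothetical, and the sketch does not try to reproduce how the library treats missing values.

```python
# Hypothetical dataset: each dict is one row.
rows = [
    {"risk_factor": "low_risk"},
    {"risk_factor": "high_risk"},
    {"risk_factor": "low_risk"},
]


def count(rows, formula=None):
    """Count all rows, or only the rows for which formula(row) is true."""
    if formula is None:
        return len(rows)
    return sum(1 for row in rows if formula(row))


total = count(rows)  # corresponds to count()
low = count(rows, lambda r: r["risk_factor"] in ["low_risk"])
```

With these rows, `total` is 3 and `low` is 2.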
max(formula)
max(amount)
min(formula)
min(amount)
mean(formula)
mean(amount)
median(formula)
median(amount)
newest(index_formula, value_formula)
Find the row with the newest (maximum) value of index_formula (internally using argmax) and return the value of value_formula for that row.
Given \(n\) is the number of rows, \(x\) is a vector of the calculated index formula, and \(y\) is a vector of the calculated value formula, this is equivalent to:
\[ y_{k}, \quad \text{where } k = \operatorname*{arg\,max}_{1 \le i \le n} x_i \]
newest(submit_date, amount)
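As a rough illustration of the description above, newest can be sketched in plain Python as an argmax over the index formula followed by a lookup of the value formula. The `newest` helper and the sample rows are assumptions, not the library's code.

```python
from datetime import date

# Hypothetical dataset: each dict is one row.
rows = [
    {"submit_date": date(2023, 1, 5), "amount": 120},
    {"submit_date": date(2023, 3, 1), "amount": 80},
    {"submit_date": date(2023, 2, 9), "amount": 200},
]


def newest(rows, index_formula, value_formula):
    """Return value_formula at the row where index_formula is largest."""
    # argmax over the index formula ...
    k = max(range(len(rows)), key=lambda i: index_formula(rows[i]))
    # ... then evaluate the value formula on that row.
    return value_formula(rows[k])


latest_amount = newest(rows, lambda r: r["submit_date"], lambda r: r["amount"])
```

The most recent submit_date is in the second row, so `latest_amount` is 80, even though 200 is the largest amount.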
pearson(formula1, formula2)
Calculate the Pearson correlation coefficient and p-value for two columns defined by formula1 and formula2. The p-value is stored in a column named after the original name with the suffix “_pvalue”.
For example, we may expect that the number of teachers is correlated with the number of students:
pearson(num_teachers, num_students)
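For reference, the coefficient itself can be computed as in the plain-Python sketch below. The `pearson_r` helper and the sample numbers are made up for illustration. The p-value is left out because it needs a t-distribution, which the standard library does not provide; `scipy.stats.pearsonr` is one function that returns both values, though this page does not say what the library uses internally.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, one entry per school.
num_teachers = [10, 12, 15, 20, 22]
num_students = [200, 250, 290, 410, 430]

r = pearson_r(num_teachers, num_students)
```

For this sample, `r` is close to 1, indicating a strong positive linear relationship between the two columns.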
ratio(numerator_formula, denominator_formula)
Calculate the sum of the numerator values divided by the sum of the denominator values. Rows that have a missing value in the numerator or denominator, or a denominator of zero, are ignored. Given \(n\) is the number of rows, \(x\) is a vector of the calculated numerator, and \(y\) is a vector of the calculated denominator, this is equivalent to:
\[ \frac{\sum_{i \in S} x_i}{\sum_{i \in S} y_i}, \quad S = \{\, i \in \{1, \dots, n\} : x_i \text{ and } y_i \text{ are not missing},\ y_i \neq 0 \,\} \]
ratio(amount, number_of_guests)
ratio(risk_factor in ["low_risk"], risk_factor in ["low_risk", "medium_risk"])
ratio(risk_factor in ["low_risk"], 1)
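The row-filtering rule can be made concrete with a plain-Python sketch. The `ratio` helper and the sample rows are hypothetical, missing values are represented by `None`, and the `None` result when no rows remain is an assumption, since the text above does not cover that case.

```python
# Hypothetical dataset: None marks a missing value.
rows = [
    {"amount": 100, "number_of_guests": 4, "risk_factor": "low_risk"},
    {"amount": None, "number_of_guests": 2, "risk_factor": "high_risk"},
    {"amount": 50, "number_of_guests": 0, "risk_factor": "low_risk"},
    {"amount": 60, "number_of_guests": 1, "risk_factor": "medium_risk"},
]


def ratio(rows, numerator_formula, denominator_formula):
    """Sum of numerators over sum of denominators, skipping unusable rows."""
    num_total = 0
    den_total = 0
    for row in rows:
        x = numerator_formula(row)
        y = denominator_formula(row)
        # Ignore rows with a missing value or a zero denominator.
        if x is None or y is None or y == 0:
            continue
        num_total += x
        den_total += y
    if den_total == 0:
        return None  # assumption: the source does not define this case
    return num_total / den_total


# Rows 2 (missing amount) and 3 (zero guests) are ignored: (100 + 60) / (4 + 1).
per_guest = ratio(rows, lambda r: r["amount"], lambda r: r["number_of_guests"])

# Boolean formulas count as 1 (true) or 0 (false), so this is a share of rows.
low_share = ratio(rows, lambda r: r["risk_factor"] in ["low_risk"], lambda r: 1)

# Low-risk rows as a share of the rows that are low or medium risk.
low_of_low_medium = ratio(
    rows,
    lambda r: r["risk_factor"] in ["low_risk"],
    lambda r: r["risk_factor"] in ["low_risk", "medium_risk"],
)
```

With these rows, `per_guest` is 32.0, `low_share` is 0.5, and `low_of_low_medium` is 2/3. In the last call the high-risk row has a false (zero) denominator and is therefore skipped, which does not change the result because its numerator is also false.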
sum(formula)
sum(amount)
sum(risk_factor in ["low_risk"])