Aggregations¶
Below is a list of valid aggregations which can be included in a formula.
The arguments to the aggregation are column names, e.g. amount, or formulas
themselves, e.g. risk_factor in ["low_risk"].
argmax(formula)¶
Calculate the row index at which the value of formula occurs. If used with a group by the index is relative to the ungrouped dataframe.
argmax(submit_date)
count([formula])¶
Calculate the number of rows in the dataset, or if a formula is passed, the number of rows in which the formula is true.
count()
count(risk_factor in ["low_risk"])
max(formula)¶
max(amount)
min(formula)¶
min(amount)
mean(formula)¶
mean(amount)
median(formula)¶
median(amount)
newest(index_formula, value_formula)¶
Calculate the row with the newest (maximum) value of index_formula
(internally using argmax) and return the value of the value_formula for
that row.
Given \(n\) is the number of rows, \(x\) is a vector of the calculated index formula, and \(y\) is a vector of the calculated value formula, this is equivalent to:
newest(submit_date, amount)
pearson(formula1, formula2)¶
Calculate the Pearson correlation coefficient and p-value for two columns defined by formula1 and formula2. The p-value is stored in a column named after the original name with the suffix “_pvalue”.
For example, we may expect that the number of teachers is correlated with the number of students:
pearson(num_teachers, num_students)
ratio(numerator_formula, denominator_formula)¶
Calculate the ratio of the sum of values in the numerator divided by the sum of values in the denominator, where any rows containing a missing value in the numerator or denominator, or having a denominator of zero, are ignored. Given \(n\) is the number of rows, \(x\) is a vector of the calculated numerator, and \(y\) is a vector of the calculated denominator, this is equivalent to:
ratio(amount, number_of_guests)
ratio(risk_factor in ["low_risk"], risk_factor in ["low_risk", "medium_risk"])
ratio(risk_factor in ["low_risk"], 1)
sum(formula)¶
sum(amount)
sum(risk_factor in ["low_risk"])