Aggregations¶

Below is a list of valid aggregations which can be included in a formula.

The arguments to the aggregation are column names, e.g. amount, or formulas themselves, e.g. risk_factor in ["low_risk"].

`argmax(formula)`¶

Calculate the row index at which the value of formula occurs. If used with a group by the index is relative to the ungrouped dataframe.

argmax(submit_date)

`count([formula])`¶

Calculate the number of rows in the dataset, or if a formula is passed, the number of rows in which the formula is true.

count()
count(risk_factor in ["low_risk"])

`max(formula)`¶

max(amount)

`min(formula)`¶

min(amount)

`mean(formula)`¶

mean(amount)

`median(formula)`¶

median(amount)

`newest(index_formula, value_formula)`¶

Calculate the row with the newest (maximum) value of index_formula (internally using argmax) and return the value of the value_formula for that row.

Given $n$ is the number of rows, $x$ is a vector of the calculated index formula, and $y$ is a vector of the calculated value formula, this is equivalent to:

\[newest(x, y) = y_{{\operatorname{argmax}}(x)}\]

newest(submit_date, amount)

`pearson(formula1, formula2)`¶

Calculate the Pearson correlation coefficient and p-value for two columns defined by formula1 and formula2. The p-value is stored in a column named after the original name with the suffix “_pvalue”.

For example, we may expect that the number of teachers is correlated with the number of students:

pearson(num_teachers, num_students)

`ratio(numerator_formula, denominator_formula)`¶

Calculate the ratio of the sum of values in the numerator divided by the sum of values in the denominator, where any rows containing a missing value in the numerator or denominator, or having a denominator of zero, are ignored. Given $n$ is the number of rows, $x$ is a vector of the calculated numerator, and $y$ is a vector of the calculated denominator, this is equivalent to:

\[\begin{split}ratio(x, y) = \sum_{i=1}^n \left\{ \begin{array}{l l} 0 & \quad \text{if $x_i$ or $y_i$ is NaN, or $y_i$ = 0}\\ \frac{x_i}{y_i} & \quad \text{otherwise} \end{array} \right.\end{split}\]

ratio(amount, number_of_guests)
ratio(risk_factor in ["low_risk"], risk_factor in ["low_risk", "medium_risk"])
ratio(risk_factor in ["low_risk"], 1)

`std(formula)`¶

Standard deviation:

std(amount)

`sum(formula)`¶

sum(amount)
sum(risk_factor in ["low_risk"])

`var(formula)`¶

Variance:

var(amount)

Aggregations¶

`argmax(formula)`¶

`count([formula])`¶

`max(formula)`¶

`min(formula)`¶

`mean(formula)`¶

`median(formula)`¶

`newest(index_formula, value_formula)`¶

`pearson(formula1, formula2)`¶

`ratio(numerator_formula, denominator_formula)`¶

`std(formula)`¶

`sum(formula)`¶

`var(formula)`¶

Table Of Contents

Previous topic

Next topic

This Page

Aggregations¶

argmax(formula)¶

count([formula])¶

max(formula)¶

min(formula)¶

mean(formula)¶

median(formula)¶

newest(index_formula, value_formula)¶

pearson(formula1, formula2)¶

ratio(numerator_formula, denominator_formula)¶

std(formula)¶

sum(formula)¶

var(formula)¶

`argmax(formula)`¶

`count([formula])`¶

`max(formula)`¶

`min(formula)`¶

`mean(formula)`¶

`median(formula)`¶

`newest(index_formula, value_formula)`¶

`pearson(formula1, formula2)`¶

`ratio(numerator_formula, denominator_formula)`¶

`std(formula)`¶

`sum(formula)`¶

`var(formula)`¶