Core¶
Aggregation¶
-
class
bamboo.core.aggregations.
Aggregation
(name, groups, dframe)[source]¶ Abstract class for all aggregations.
Parameters: - column – Column to aggregate.
- columns – List of columns to aggregate.
- formula_name – The string to refer to this aggregation.
-
class
bamboo.core.aggregations.
ArgMaxAggregation
(name, groups, dframe)[source]¶ Return the index for the maximum of a column.
Written as
argmax(FORMULA)
. Where FORMULA is a valid formula.
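As a rough illustration of what "the index for the maximum" means, the sketch below works on a plain Python list rather than a DataFrame column. The helper name `argmax` and the choice to skip `None` as N/A are assumptions made for this example, not bamboo's implementation.

```python
def argmax(values):
    """Return the position of the largest non-missing value, or None."""
    best_index = None
    for index, value in enumerate(values):
        if value is None:  # assumption: None stands in for N/A and is skipped
            continue
        if best_index is None or value > values[best_index]:
            best_index = index
    return best_index


amounts = [2.0, 9.0, None, 20.0, 9.0]
print(argmax(amounts))  # 3
```

Ties resolve to the first occurrence in this sketch, because a later value replaces the current best only when it is strictly greater.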
-
class
bamboo.core.aggregations.
CountAggregation
(name, groups, dframe)[source]¶ Calculate the count of rows fulfilling the criteria in the formula.
N/A values are ignored unless there are no arguments to the function, in which case it returns the number of rows in the dataset.
Written as
count(CRITERIA)
. Where CRITERIA is an optional boolean expression that signifies which rows are to be counted.
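The two behaviours described above (no arguments versus a boolean criterion) can be sketched in plain Python. The function `count`, the `criteria` callable, and the use of `None` for N/A are all hypothetical stand-ins chosen for this example.

```python
def count(rows, criteria=None):
    """Count rows; with no criteria, count every row in the dataset."""
    if criteria is None:
        return len(rows)
    total = 0
    for row in rows:
        result = criteria(row)
        if result is None:  # N/A results are ignored
            continue
        if result:
            total += 1
    return total


def amount_over_five(row):
    """Hypothetical criterion equivalent to the formula `5 < amount`."""
    if row["amount"] is None:
        return None  # propagate N/A instead of guessing
    return row["amount"] > 5


rows = [{"amount": 2.0}, {"amount": None}, {"amount": 20.0}, {"amount": 9.0}]
print(count(rows))                    # 4
print(count(rows, amount_over_five))  # 2
```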
-
class
bamboo.core.aggregations.
MaxAggregation
(name, groups, dframe)[source]¶ Calculate the maximum.
Written as
max(FORMULA)
. Where FORMULA is a valid formula.
-
class
bamboo.core.aggregations.
MeanAggregation
(name, groups, dframe)[source]¶ Calculate the arithmetic mean.
Written as
mean(FORMULA)
. Where FORMULA is a valid formula. Because mean is a ratio, this inherits from RatioAggregation to use its generic reduce implementation.
-
class
bamboo.core.aggregations.
MedianAggregation
(name, groups, dframe)[source]¶ Calculate the median. Written as
median(FORMULA)
. Where FORMULA is a valid formula.
-
class
bamboo.core.aggregations.
MinAggregation
(name, groups, dframe)[source]¶ Calculate the minimum.
Written as
min(FORMULA)
. Where FORMULA is a valid formula.
-
class
bamboo.core.aggregations.
NewestAggregation
(name, groups, dframe)[source]¶ Return the second column’s value at the newest row in the first column.
Find the maximum value for the first column and return the entry at that row from the second column.
Written as
newest(INDEX_FORMULA, VALUE_FORMULA)
where INDEX_FORMULA
and VALUE_FORMULA
are valid formulae.
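The lookup described above (find the row with the largest index value, then read the value column from that row) might look like the following in plain Python. The function name `newest` and the sample data are assumptions; the column names are borrowed from the formula examples elsewhere in this reference.

```python
def newest(rows, index_key, value_key):
    """Return rows[value_key] from the row whose index_key is largest."""
    candidates = [row for row in rows if row[index_key] is not None]
    if not candidates:
        return None
    latest = max(candidates, key=lambda row: row[index_key])
    return latest[value_key]


rows = [
    {"submit_date": "2012-01-03", "rating": "epic_eat"},
    {"submit_date": "2012-03-14", "rating": "delectible"},
    {"submit_date": "2012-02-20", "rating": "epic_eat"},
]
print(newest(rows, "submit_date", "rating"))  # delectible
```

ISO-formatted date strings are used here only because they sort correctly as text; a real index column would more likely hold timestamps or numbers.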
-
class
bamboo.core.aggregations.
PearsonAggregation
(name, groups, dframe)[source]¶ Calculate the Pearson correlation and associated p-value.
Calculate the Pearson correlation coefficient between two columns and the p-value for that correlation coefficient.
Written as
pearson(FORMULA1, FORMULA2)
. Where FORMULA1
and FORMULA2
are valid formulae.
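For readers who want the arithmetic behind the coefficient, here is a minimal standard-library sketch. It computes only the correlation coefficient; the p-value is left out because it needs a t-distribution, which the standard library does not provide. The function name `pearson` is chosen for this example and is not taken from bamboo.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / math.sqrt(variance_x * variance_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```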
-
class
bamboo.core.aggregations.
RatioAggregation
(name, groups, dframe)[source]¶ Calculate the ratio.
Columns with N/A for either the numerator or denominator are ignored. This will store associated numerator and denominator columns. Written as
ratio(NUMERATOR, DENOMINATOR)
. Where NUMERATOR and DENOMINATOR are both valid formulas.
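One plausible reason for storing the numerator and denominator separately is that a ratio can then be updated with new rows without rereading the old ones. The sketch below illustrates that idea in plain Python; `ratio_parts`, `combine`, and `ratio` are hypothetical helpers, `None` stands in for N/A, and the column name `total` is invented for the example.

```python
def ratio_parts(rows, numerator_key, denominator_key):
    """Sum numerator and denominator, skipping rows with N/A in either."""
    numerator = 0.0
    denominator = 0.0
    for row in rows:
        n, d = row[numerator_key], row[denominator_key]
        if n is None or d is None:
            continue
        numerator += n
        denominator += d
    return numerator, denominator


def combine(stored, update):
    """Merge stored sums with sums computed from newly added rows."""
    return stored[0] + update[0], stored[1] + update[1]


def ratio(parts):
    numerator, denominator = parts
    return numerator / denominator if denominator else None


batch = [
    {"amount": 2.0, "total": 4.0},
    {"amount": None, "total": 10.0},  # ignored: numerator is N/A
    {"amount": 3.0, "total": 6.0},
]
stored = ratio_parts(batch, "amount", "total")
print(ratio(stored))  # 0.5

update = ratio_parts([{"amount": 10.0, "total": 10.0}], "amount", "total")
print(ratio(combine(stored, update)))  # 0.75
```

Under the same reading, a mean is the special case where the numerator is the column sum and the denominator is the number of non-N/A rows.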
-
class
bamboo.core.aggregations.
StandardDeviationAggregation
(name, groups, dframe)[source]¶ Calculate the standard deviation. Written as
std(FORMULA)
. Where FORMULA is a valid formula.
Aggregator¶
-
class
bamboo.core.aggregator.
Aggregator
(dframe, groups, _type, name, columns)[source]¶ Perform aggregations on datasets.
Group the columns of the dframe by groups and apply the aggregation. Store the resulting dframe as a linked dataset for the dataset. If a linked dataset with the same groups already exists, update that dataset; otherwise, create a new linked dataset.
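The grouping step can be pictured with a small plain-Python sketch: rows are bucketed by their group values and a reducer is applied to each bucket. The `aggregate` function and its parameters are assumptions made for this example; storing the result as a linked dataset is not modelled.

```python
from collections import defaultdict


def aggregate(rows, groups, column, name, reducer):
    """Group rows by the `groups` columns and reduce `column` per group."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[group] for group in groups)
        buckets[key].append(row[column])
    result = []
    for key, values in sorted(buckets.items()):
        aggregated = dict(zip(groups, key))
        aggregated[name] = reducer(values)  # store under the aggregation name
        result.append(aggregated)
    return result


rows = [
    {"food_type": "lunch", "amount": 2.0},
    {"food_type": "dinner", "amount": 20.0},
    {"food_type": "lunch", "amount": 9.0},
]
for line in aggregate(rows, ["food_type"], "amount", "max_amount", max):
    print(line)
# {'food_type': 'dinner', 'max_amount': 20.0}
# {'food_type': 'lunch', 'max_amount': 9.0}
```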
Calculator¶
-
bamboo.core.calculator.
__calculation_data
(dataset)[source]¶ Create a list of aggregate calculation information.
Builds a list of calculation information from the current dataset's aggregated datasets and aggregate calculations.
-
bamboo.core.calculator.
__propagate_column
(dataset, parent_dataset)[source]¶ Propagate columns in parent_dataset to dataset.
When a new calculation is added to a dataset this will propagate the new column to all child (merged) datasets.
Parameters: - dataset – The child dataset.
- parent_dataset – The dataset to propagate.
-
bamboo.core.calculator.
__update_aggregate_dataset
(dataset, formula, new_dframe, name, groups, a_dataset, reducible)[source]¶ Update the aggregated dataset built for dataset with calculation.
Proceed with the following steps:
- delete the rows in this dataset from the parent
- recalculate aggregated dataframe from aggregation
- update aggregated dataset with new dataframe and add parent id
- recur on all merged datasets descending from the aggregated dataset
Parameters: - formula – The formula to execute.
- new_dframe – The DataFrame to aggregate on.
- name – The name of the aggregation.
- groups – A column or columns to group on.
- a_dataset – The DataSet to store the aggregation in.
-
bamboo.core.calculator.
__update_is_valid
(dataset, new_dframe)[source]¶ Check if the update is valid.
Check whether this dataset is the right-hand side of any joins, and deny the update if it would produce an invalid join.
Parameters: - dataset – The dataset to check if update valid for.
- new_dframe – The update dframe to check.
Returns: True if the update is valid, False otherwise.
-
bamboo.core.calculator.
__update_joined_datasets
(dataset, update)[source]¶ Update any joined datasets.
-
bamboo.core.calculator.
calculate_columns
(dataset, calculations)[source]¶ Calculate and store new columns for calculations.
The new columns are joined to the Calculation dframe and replace the dataset’s observations.
Note
This can result in race-conditions when:
- deleting
controllers.Datasets.DELETE
- updating
controllers.Datasets.POST([dataset_id])
Therefore, perform these actions asynchronously.
Parameters: - dataset – The dataset to calculate for.
- calculations – A list of calculations.
Frame¶
-
bamboo.core.frame.
add_parent_column
(df, parent_dataset_id)[source]¶ Add parent ID column to this DataFrame.
-
bamboo.core.frame.
join_dataset
(left, other, on)[source]¶ Left join another dataset.
Parameters: - other – Other dataset to join.
- on – Column or 2 comma-separated columns to join on.
Returns: Joined DataFrame.
Raises: KeyError if join columns not in datasets.
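The following plain-Python sketch shows the general shape of a left join with the documented KeyError behaviour. It is not bamboo's code: `left_join` is a hypothetical helper, it reads "2 comma-separated columns" as "left column, right column" (an assumption), and it assumes the right-hand join column holds unique values.

```python
def left_join(left_rows, right_rows, on):
    """Left join lists of dict rows; `on` is 'col' or 'left_col,right_col'."""
    columns = [name.strip() for name in on.split(",")]
    left_on, right_on = columns[0], columns[-1]
    for rows, column in ((left_rows, left_on), (right_rows, right_on)):
        if any(column not in row for row in rows):
            raise KeyError(column)
    # Collect the right-hand columns that the join will add.
    right_columns = []
    for row in right_rows:
        for column in row:
            if column != right_on and column not in right_columns:
                right_columns.append(column)
    lookup = {row[right_on]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[left_on], {})
        merged = dict(row)
        for column in right_columns:
            # Unmatched left rows get None; left values win on name clashes.
            merged.setdefault(column, match.get(column))
        joined.append(merged)
    return joined


left = [{"food_type": "lunch", "amount": 2.0},
        {"food_type": "snack", "amount": 1.0}]
right = [{"food_type": "lunch", "code": "L"}]
print(left_join(left, right, "food_type"))
```

Every left row survives the join, and rows with no match on the right receive `None` in the added columns.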
Merge¶
-
bamboo.core.merge.
merge_dataset_ids
(dataset_ids, mapping)[source]¶ Load a JSON array of dataset IDs and start a background merge task.
Parameters: dataset_ids – An array of dataset IDs to merge.
Raises: MergeError if fewer than 2 datasets are provided.
If a dataset cannot be found for a dataset ID, it is ignored. Therefore, if two dataset IDs are provided and one of them is bad, an error is raised. However, if three dataset IDs are provided and one of them is bad, an error is not raised.
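The rule about bad IDs follows from checking the count after unknown IDs are dropped. A minimal sketch of that check, using a local stand-in `MergeError` class and a hypothetical `known_datasets` dictionary in place of a real dataset lookup:

```python
class MergeError(Exception):
    """Local stand-in for the documented MergeError."""


def resolve_datasets(dataset_ids, known_datasets):
    """Drop IDs with no dataset, then require at least two survivors."""
    found = [known_datasets[dataset_id]
             for dataset_id in dataset_ids
             if dataset_id in known_datasets]
    if len(found) < 2:
        raise MergeError("merge requires 2 datasets (found %d)" % len(found))
    return found


known_datasets = {"a": "dataset A", "b": "dataset B"}

# Three IDs, one bad: two datasets remain, so no error is raised.
print(resolve_datasets(["a", "bad", "b"], known_datasets))

# Two IDs, one bad: only one dataset remains, so MergeError is raised.
try:
    resolve_datasets(["a", "bad"], known_datasets)
except MergeError as error:
    print(error)
```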
Parsing Operations¶
-
class
bamboo.core.operations.
EvalAndOp
(tokens)[source]¶ Class to distinguish precedence of and expressions.
-
class
bamboo.core.operations.
EvalBinaryArithOp
(tokens)[source]¶ Class for evaluating binary arithmetic operations.
-
class
bamboo.core.operations.
EvalBinaryBooleanOp
(tokens)[source]¶ Class for evaluating binary boolean operations.
-
class
bamboo.core.operations.
EvalComparisonOp
(tokens)[source]¶ Class to evaluate comparison expressions.
-
class
bamboo.core.operations.
EvalConstant
(tokens)[source]¶ Class to evaluate a parsed constant or variable.
-
class
bamboo.core.operations.
EvalExpOp
(tokens)[source]¶ Class to distinguish precedence of exponentiation expressions.
-
class
bamboo.core.operations.
EvalMultOp
(tokens)[source]¶ Class to distinguish precedence of multiplication/division expressions.
-
class
bamboo.core.operations.
EvalOrOp
(tokens)[source]¶ Class to distinguish precedence of or expressions.
-
class
bamboo.core.operations.
EvalPercentile
(tokens)[source]¶ Class to evaluate percentile expressions.
-
class
bamboo.core.operations.
EvalPlusOp
(tokens)[source]¶ Class to distinguish precedence of addition/subtraction expressions.
-
class
bamboo.core.operations.
EvalSignOp
(tokens)[source]¶ Class to evaluate expressions with a leading + or - sign.
Formula Parser¶
-
class
bamboo.core.parser.
Parser
[source]¶ Class for parsing and evaluating formulas.
Attributes:
- aggregation: Aggregation parsed from formula.
- aggregation_names: Possible aggregations.
- bnf: Cached Backus-Naur Form of formula.
- column_functions: Cached additional columns as aggregation parameters.
- function_names: Names of possible functions in formulas.
- operator_names: Names of possible operators in formulas.
- parsed_expr: Cached parsed expression.
- special_names: Names of possible reserved names in formulas.
- reserved_words: List of all possible reserved words that may be used in formulas.
-
_Parser__build_bnf
()¶ Parse formula to function based on language definition.
Backus-Naur Form of formula language:
Operation   Expression
addop       '+' | '-'
multop      '*' | '/'
expop       '^'
compop      '==' | '<' | '>' | '<=' | '>='
notop       'not'
andop       'and'
orop        'or'
real        \d+(.\d+)
integer     \d+
variable    \w+
string      ".+"
atom        real | integer | variable
func        func ( atom )
factor      atom [ expop factor ]*
term        factor [ multop factor ]*
expr        term [ addop term ]*
equation    expr [ compop expr ]*
in          string in '[' "string"[, "string"]* ']'
neg         [notop]* equation | in
conj        neg [ andop neg ]*
disj        conj [ orop conj ]*
case        'case' disj: atom[, disj: atom]*[, 'default': atom]
trans       trans ( case )
agg         agg ( trans[, trans]* )
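To show how the factor, term, and expr rules encode operator precedence, here is a small recursive-descent evaluator for the arithmetic subset only. It is an independent sketch using the standard library, not the parser bamboo builds; the function names are invented, and parentheses are accepted in `atom` because the formula examples in this reference use them even though the table above does not list them.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.\d+|\d+|\w+|[-+*/^()])")


def tokenize(formula):
    formula = formula.rstrip()
    tokens, position = [], 0
    while position < len(formula):
        match = TOKEN.match(formula, position)
        if match is None:
            raise ValueError("unexpected character at %d" % position)
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def evaluate(formula, row):
    """Evaluate an arithmetic formula against one row (a dict)."""
    tokens = tokenize(formula)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        token = tokens[position]
        position += 1
        return token

    def atom():  # real | integer | variable | '(' expr ')'
        if peek() is None:
            raise ValueError("unexpected end of formula")
        token = advance()
        if token == "(":
            value = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return value
        if re.fullmatch(r"\d+(\.\d+)?", token):
            return float(token)
        if re.fullmatch(r"\w+", token):
            return row[token]  # variable: look the column up in the row
        raise ValueError("unexpected token %r" % token)

    def factor():  # atom [ expop factor ]*, right-associative
        base = atom()
        if peek() == "^":
            advance()
            return base ** factor()
        return base

    def term():  # factor [ multop factor ]*
        value = factor()
        while peek() in ("*", "/"):
            operator, right = advance(), factor()
            value = value * right if operator == "*" else value / right
        return value

    def expr():  # term [ addop term ]*
        value = term()
        while peek() in ("+", "-"):
            operator, right = advance(), term()
            value = value + right if operator == "+" else value - right
        return value

    result = expr()
    if peek() is not None:
        raise ValueError("unexpected token %r" % peek())
    return result


row = {"amount": 2.0, "gps_alt": 3.0, "gps_precision": 4.0}
print(evaluate("amount + gps_alt * gps_precision", row))    # 14.0
print(evaluate("(amount + gps_alt) * gps_precision", row))  # 20.0
```

Each rule calls the next tighter-binding rule for its operands, which is why multiplication binds before addition without any explicit precedence table.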
-
classmethod
parse
(formula)[source]¶ Parse formula and return evaluation function.
Parse formula into an aggregation name and functions. There will be multiple functions if the aggregation takes multiple arguments, e.g. ratio, which takes a numerator and a denominator formula.
Examples:
- constants
  9 + 5
- aliases
  rating
  gps
- arithmetic
  amount + gps_alt
  amount - gps_alt
  amount + 5
  amount - gps_alt + 2.5
  amount * gps_alt
  amount / gps_alt
  amount * gps_alt / 2.5
  amount + gps_alt * gps_precision
- precedence
  (amount + gps_alt) * gps_precision
- comparison
  amount == 2
  10 < amount
  10 < amount + gps_alt
- logical
  not amount == 2
  not(amount == 2)
  amount == 2 and 10 < amount
  amount == 2 or 10 < amount
  not not amount == 2 or 10 < amount
  not amount == 2 or 10 < amount
  (not amount == 2) or 10 < amount
  not(amount == 2 or 10 < amount)
  amount ^ 3
  (amount + gps_alt) ^ 2 + 100
  amount
  amount < gps_alt - 100
- membership
  rating in ["delectible"]
  risk_factor in ["low_risk"]
  amount in ["9.0", "2.0", "20.0"]
  (risk_factor in ["low_risk"]) and (amount in ["9.0", "20.0"])
- dates
  date("09-04-2012") - submit_date > 21078000
- cases
  case food_type in ["morning_food"]: 1, default: 3
- transformations: row-wise, column-based aggregations
  percentile(amount)
Parameters: formula – The string to parse.
Returns: A tuple with the name of the aggregation in the formula, if any, and a list of functions built from the input string.
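As an informal reading of the case example in the list above, the formula `case food_type in ["morning_food"]: 1, default: 3` can be thought of as a function applied to each row. The hand-written function below is a hypothetical equivalent for illustration; it is not produced by the parser, and the sample rows are invented.

```python
def case_food_type(row):
    """Hand translation of: case food_type in ["morning_food"]: 1, default: 3"""
    if row["food_type"] in ["morning_food"]:
        return 1
    return 3  # the default branch


rows = [{"food_type": "morning_food"}, {"food_type": "dinner"}]
print([case_food_type(row) for row in rows])  # [1, 3]
```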
-
classmethod
validate
(dataset, formula, groups)[source]¶ Validate formula and groups for dataset.
Validate the formula and group string by attempting to get a row from the dframe for the dataset and then running parser validation on this row. Additionally, ensure that the groups in the group string are columns in the dataset.
Parameters: - dataset – The dataset to validate for.
- formula – The formula to validate.
- groups – A list of columns to group by.
Returns: The aggregation (or None) for the formula.
Summary Statistic Utilities¶
-
exception
bamboo.core.summary.
ColumnTypeError
[source]¶ Exception when grouping on a non-dimensional column.
-
bamboo.core.summary.
summarizable
(dframe, col, groups, dataset)[source]¶ Check if column should be summarized.
Parameters: - dframe – DataFrame to check unique values in.
- col – Column to check for factor and number of uniques.
- groups – List of groups if summarizing with group, can be empty.
- dataset – Dataset to pull schema from.
Returns: True if the column, with the given parameters, should be summarized; otherwise False.
-
bamboo.core.summary.
summarize
(dataset, dframe, groups, no_cache, update=False)[source]¶ Raises a ColumnTypeError if grouping on a non-dimensional column.