Core

Aggregation

class bamboo.core.aggregations.Aggregation(name, groups, dframe)[source]

Abstract class for all aggregations.

Parameters:
  • column – Column to aggregate.
  • columns – List of columns to aggregate.
  • formula_name – The string to refer to this aggregation.
agg()[source]

For when aggregation is called without a group parameter.

group()[source]

For when aggregation is called with a group parameter.

class bamboo.core.aggregations.ArgMaxAggregation(name, groups, dframe)[source]

Return the index for the maximum of a column.

Written as argmax(FORMULA), where FORMULA is a valid formula.

group()[source]

For when aggregation is called with a group parameter.

class bamboo.core.aggregations.CountAggregation(name, groups, dframe)[source]

Calculate the count of rows fulfilling the criteria in the formula.

N/A values are ignored unless there are no arguments to the function, in which case it returns the number of rows in the dataset.

Written as count(CRITERIA), where CRITERIA is an optional boolean expression that selects which rows are counted.
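The counting rule above can be sketched in plain Python. This is an illustration only, not bamboo's implementation: the helper `count`, the list-of-dicts row layout, and the use of None for N/A are all assumptions.

```python
def count(rows, criteria=None):
    """Count rows; with a criteria, count only rows where it is True.

    `rows` is a list of dicts and `criteria` a function that returns
    True, False, or None (N/A) for a row. Both are hypothetical names.
    """
    if criteria is None:
        # No argument: return the number of rows in the dataset.
        return len(rows)
    # N/A (None) results are ignored, so only an explicit True counts.
    return sum(1 for row in rows if criteria(row) is True)


def over_ten(row):
    # Propagate N/A when the value is missing.
    return None if row["amount"] is None else row["amount"] > 10


rows = [{"amount": 2}, {"amount": None}, {"amount": 12}, {"amount": 30}]
total = count(rows)               # all rows, including the N/A one
matching = count(rows, over_ten)  # only rows where amount > 10
```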

class bamboo.core.aggregations.MaxAggregation(name, groups, dframe)[source]

Calculate the maximum.

Written as max(FORMULA), where FORMULA is a valid formula.

class bamboo.core.aggregations.MeanAggregation(name, groups, dframe)[source]

Calculate the arithmetic mean.

Written as mean(FORMULA), where FORMULA is a valid formula.

Because mean is a ratio this inherits from RatioAggregation to use its generic reduce implementation.

class bamboo.core.aggregations.MedianAggregation(name, groups, dframe)[source]

Calculate the median.

Written as median(FORMULA), where FORMULA is a valid formula.

class bamboo.core.aggregations.MinAggregation(name, groups, dframe)[source]

Calculate the minimum.

Written as min(FORMULA), where FORMULA is a valid formula.

class bamboo.core.aggregations.NewestAggregation(name, groups, dframe)[source]

Return the second column’s value at the newest row in the first column.

Find the maximum value for the first column and return the entry at that row from the second column.

Written as newest(INDEX_FORMULA, VALUE_FORMULA) where INDEX_FORMULA and VALUE_FORMULA are valid formulae.
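A minimal sketch of the lookup described above, using plain Python lists of dicts rather than a DataFrame. The helper `newest`, the column names, and the choice to skip rows with an N/A index are assumptions made for illustration.

```python
def newest(rows, index_key, value_key):
    """Return value_key's entry at the row where index_key is largest."""
    # Skip rows whose index is N/A (None); an assumption of this sketch.
    candidates = [row for row in rows if row[index_key] is not None]
    if not candidates:
        return None
    latest = max(candidates, key=lambda row: row[index_key])
    return latest[value_key]


rows = [
    {"submit_date": 20120101, "amount": 5},
    {"submit_date": 20120904, "amount": 9},
    {"submit_date": 20120315, "amount": 7},
]
latest_amount = newest(rows, "submit_date", "amount")
```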

class bamboo.core.aggregations.PearsonAggregation(name, groups, dframe)[source]

Calculate the Pearson correlation and associated p-value.

Calculate the Pearson correlation coefficient between two columns and the p-value for that correlation coefficient.

Written as pearson(FORMULA1, FORMULA2), where FORMULA1 and FORMULA2 are valid formulae.
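For reference, the correlation coefficient itself can be computed as below. This sketch covers only the coefficient; the p-value needs a t-distribution and is left out (scipy.stats.pearsonr returns both). The helper name `pearson_r` and the dropping of pairs that contain N/A (None) are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    # Drop pairs with a missing value; an assumption of this sketch.
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, None, 3], [3, 2, 5, 1])
```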

class bamboo.core.aggregations.RatioAggregation(name, groups, dframe)[source]

Calculate the ratio.

Rows with N/A for either the numerator or denominator are ignored. The associated numerator and denominator columns are stored.

Written as ratio(NUMERATOR, DENOMINATOR), where NUMERATOR and DENOMINATOR are both valid formulae.

reduce(dframe, columns)[source]

Reduce the columns and store in dframe.

Parameters:
  • dframe – The DataFrame to reduce.
  • columns – Columns in the DataFrame to reduce on.
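Storing the numerator and denominator separately is what makes a ratio reducible: a new batch of rows can be folded in without revisiting the old rows. The sketch below illustrates that idea with plain Python; the helpers `ratio_parts`, `reduce_ratio`, and `ratio` are hypothetical and are not the signature of reduce() above.

```python
def ratio_parts(rows, numerator, denominator):
    """Sum numerator and denominator over rows where both are present."""
    num_total = 0.0
    den_total = 0.0
    for row in rows:
        num, den = row.get(numerator), row.get(denominator)
        if num is None or den is None:
            continue  # N/A on either side: ignore the row
        num_total += num
        den_total += den
    return num_total, den_total


def reduce_ratio(stored, update):
    """Fold a new batch's (numerator, denominator) into the stored pair."""
    return stored[0] + update[0], stored[1] + update[1]


def ratio(parts):
    num_total, den_total = parts
    return num_total / den_total if den_total else None


existing = ratio_parts(
    [{"won": 3, "played": 4}, {"won": None, "played": 2}], "won", "played")
batch = ratio_parts([{"won": 1, "played": 4}], "won", "played")
combined = reduce_ratio(existing, batch)
result = ratio(combined)
```

A mean fits the same shape if the denominator contributes 1 for each row that has a value, which is consistent with MeanAggregation reusing this reduce step.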
class bamboo.core.aggregations.StandardDeviationAggregation(name, groups, dframe)[source]

Calculate the standard deviation.

Written as std(FORMULA), where FORMULA is a valid formula.

class bamboo.core.aggregations.SumAggregation(name, groups, dframe)[source]

Calculate the sum.

Written as sum(FORMULA), where FORMULA is a valid formula.

class bamboo.core.aggregations.VarianceAggregation(name, groups, dframe)[source]

Calculate the variance.

Written as var(FORMULA), where FORMULA is a valid formula.

Aggregator

class bamboo.core.aggregator.Aggregator(dframe, groups, _type, name, columns)[source]

Perform aggregations on datasets.

Apply the aggregation to the columns of the dframe, grouped by groups. Store the resulting dframe as a linked dataset for dataset. If a linked dataset with the same groups already exists, update that dataset; otherwise, create a new linked dataset.

save(dataset)[source]

Save this aggregation.

If an aggregated dataset for this aggregation's group already exists, store the aggregation in that dataset. If not, create a new aggregated dataset and store the aggregation there.

update(dataset, child_dataset, formula, reducible)[source]

Attempt to reduce an update and store.

updated_dframe(dataset, formula, dframe)[source]

Create a new aggregation and return the updated dframe.

Calculator

bamboo.core.calculator.__calculation_data(dataset)[source]

Create a list of aggregate calculation information.

Builds a list of calculation information from the current dataset's aggregated datasets and aggregate calculations.

bamboo.core.calculator.__propagate_column(dataset, parent_dataset)[source]

Propagate columns in parent_dataset to dataset.

When a new calculation is added to a dataset this will propagate the new column to all child (merged) datasets.

Parameters:
  • dataset – The child dataset.
  • parent_dataset – The dataset to propagate.
bamboo.core.calculator.__update_aggregate_dataset(dataset, formula, new_dframe, name, groups, a_dataset, reducible)[source]

Update the aggregated dataset built for dataset with calculation.

Proceed with the following steps:

  • delete the rows in this dataset from the parent
  • recalculate aggregated dataframe from aggregation
  • update aggregated dataset with new dataframe and add parent id
  • recur on all merged datasets descending from the aggregated dataset
Parameters:
  • formula – The formula to execute.
  • new_dframe – The DataFrame to aggregate on.
  • name – The name of the aggregation.
  • groups – A column or columns to group on.
  • a_dataset – The DataSet to store the aggregation in.
bamboo.core.calculator.__update_is_valid(dataset, new_dframe)[source]

Check if the update is valid.

Check whether this is a right-hand side of any joins and deny the update if the update would produce an invalid join as a result.

Parameters:
  • dataset – The dataset to check if update valid for.
  • new_dframe – The update dframe to check.
Returns:

True if the update is valid, False otherwise.

bamboo.core.calculator.__update_joined_datasets(dataset, update)[source]

Update any joined datasets.

bamboo.core.calculator.calculate_columns(dataset, calculations)[source]

Calculate and store new columns for calculations.

The new columns are joined to the Calculation dframe and replace the dataset’s observations.

Note

This can result in race-conditions when:

  • deleting controllers.Datasets.DELETE
  • updating controllers.Datasets.POST([dataset_id])

Therefore, perform these actions asynchronously.

Parameters:
  • dataset – The dataset to calculate for.
  • calculations – A list of calculations.
bamboo.core.calculator.dframe_from_update(dataset, new_data)[source]

Make a DataFrame for the new_data.

Parameters:new_data (List.) – Data to add to dframe.

Frame

bamboo.core.frame.add_parent_column(df, parent_dataset_id)[source]

Add parent ID column to this DataFrame.

bamboo.core.frame.join_dataset(left, other, on)[source]

Left join another dataset.

Parameters:
  • other – Other dataset to join.
  • on – Column or 2 comma-separated columns to join on.
Returns:

Joined DataFrame.

Raises:

KeyError if join columns not in datasets.
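A rough illustration of the left join and its KeyError behaviour, using lists of dicts in place of DataFrames. Everything here is an assumption for the sake of the sketch: the helper `left_join`, the reading of the 2-column form of `on` as "left column, right column", and the uniqueness of join values on the right-hand side.

```python
def left_join(left_rows, right_rows, on):
    """Left join lists of dicts; `on` is "col" or "left_col,right_col"."""
    parts = [part.strip() for part in on.split(",")]
    left_on, right_on = parts[0], parts[-1]
    # Mirror the documented KeyError when a join column is missing.
    for rows, column in ((left_rows, left_on), (right_rows, right_on)):
        if any(column not in row for row in rows):
            raise KeyError(column)
    # Index the right-hand side by its join value (assumed unique here).
    lookup = {row[right_on]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[left_on], {})
        merged = dict(row)
        # Copy right-hand columns, except the join column itself.
        merged.update({k: v for k, v in match.items() if k != right_on})
        joined.append(merged)
    return joined


left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 1, "score": 10}]
joined = left_join(left, right, "id")
```

Every left row is kept. In this sketch an unmatched left row simply lacks the right-hand columns, where a DataFrame join would fill them with N/A.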

bamboo.core.frame.remove_reserved_keys(df, exclude=[])[source]

Remove reserved internal columns in this DataFrame.

Parameters:keep_parent_ids – Keep parent column if True, default False.
bamboo.core.frame.rows_for_parent_id(df, parent_id)[source]

DataFrame with only rows for parent_id.

Parameters:parent_id – The ID to restrict rows to.
Returns:A DataFrame including only rows with a parent ID equal to that passed in.

Merge

exception bamboo.core.merge.MergeError[source]

For errors while merging datasets.

bamboo.core.merge.merge_dataset_ids(dataset_ids, mapping)[source]

Load a JSON array of dataset IDs and start a background merge task.

Parameters:dataset_ids – An array of dataset IDs to merge.
Raises:MergeError if fewer than 2 datasets are found. A dataset ID for which no dataset can be found is ignored. Therefore, if 2 dataset IDs are provided and one of them is bad, an error is raised; if 3 dataset IDs are provided and one of them is bad, no error is raised.
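The rule for bad dataset IDs can be sketched as follows. The `MergeError` class here is a local stand-in, and `resolve_datasets` and the `known_datasets` dict are hypothetical; the real function also starts a background merge task, which this sketch omits.

```python
import json


class MergeError(Exception):
    """Local stand-in for bamboo.core.merge.MergeError."""


def resolve_datasets(dataset_ids_json, known_datasets):
    """Load a JSON array of IDs and return the datasets that exist."""
    dataset_ids = json.loads(dataset_ids_json)
    # IDs with no matching dataset are silently ignored.
    found = [known_datasets[dataset_id] for dataset_id in dataset_ids
             if dataset_id in known_datasets]
    if len(found) < 2:
        raise MergeError("merging requires at least 2 datasets")
    return found


known_datasets = {"a": "dataset A", "b": "dataset B"}
# Three IDs, one bad: two datasets remain, so no error is raised.
merged_inputs = resolve_datasets('["a", "b", "bad"]', known_datasets)
```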

Parsing Operations

class bamboo.core.operations.EvalAndOp(tokens)[source]

Class to distinguish precedence of and expressions.

class bamboo.core.operations.EvalBinaryArithOp(tokens)[source]

Class for evaluating binary arithmetic operations.

class bamboo.core.operations.EvalBinaryBooleanOp(tokens)[source]

Class for evaluating binary boolean operations.

class bamboo.core.operations.EvalCaseOp(tokens)[source]

Class to eval case statements.

class bamboo.core.operations.EvalComparisonOp(tokens)[source]

Class to evaluate comparison expressions.

class bamboo.core.operations.EvalConstant(tokens)[source]

Class to evaluate a parsed constant or variable.

class bamboo.core.operations.EvalDate(tokens)[source]

Class to evaluate date expressions.

class bamboo.core.operations.EvalExpOp(tokens)[source]

Class to distinguish precedence of exponentiation expressions.

class bamboo.core.operations.EvalFunction(tokens)[source]

Class to eval functions.

class bamboo.core.operations.EvalInOp(tokens)[source]

Class to eval in expressions.

class bamboo.core.operations.EvalMapOp(tokens)[source]

Class to eval map statements.

class bamboo.core.operations.EvalMultOp(tokens)[source]

Class to distinguish precedence of multiplication/division expressions.

class bamboo.core.operations.EvalNotOp(tokens)[source]

Class to evaluate not expressions.

class bamboo.core.operations.EvalOrOp(tokens)[source]

Class to distinguish precedence of or expressions.

class bamboo.core.operations.EvalPercentile(tokens)[source]

Class to evaluate percentile expressions.

class bamboo.core.operations.EvalPlusOp(tokens)[source]

Class to distinguish precedence of addition/subtraction expressions.

class bamboo.core.operations.EvalSignOp(tokens)[source]

Class to evaluate expressions with a leading + or - sign.

class bamboo.core.operations.EvalString(tokens)[source]

Class to evaluate a parsed string.

class bamboo.core.operations.EvalTerm(tokens)[source]

Base class for evaluation.

operator_operands(tokenlist)[source]

Generator to extract operators and operands in pairs.
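A generator of this kind typically looks like the sketch below, which walks a flat token list two items at a time. It is modelled on the common pyparsing arithmetic-evaluation pattern and is not copied from bamboo; the usage that follows is illustrative.

```python
def operator_operands(tokenlist):
    """Yield (operator, operand) pairs from a flat token list."""
    iterator = iter(tokenlist)
    while True:
        try:
            # Take the next two tokens together as one pair.
            yield next(iterator), next(iterator)
        except StopIteration:
            # Tokens exhausted (a trailing unpaired token is dropped).
            return


# A parsed "2 + 3 - 4" arrives as operand, operator, operand, ...
tokens = [2, "+", 3, "-", 4]
pairs = list(operator_operands(tokens[1:]))

total = tokens[0]
for operator, operand in operator_operands(tokens[1:]):
    total = total + operand if operator == "+" else total - operand
```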

class bamboo.core.operations.EvalToday(tokens)[source]

Class to produce the current date time.

Formula Parser

class bamboo.core.parser.Parser[source]

Class for parsing and evaluating formulas.

Attributes:

  • aggregation: Aggregation parsed from formula.
  • aggregation_names: Possible aggregations.
  • bnf: Cached Backus-Naur Form of formula.
  • column_functions: Cached additional columns as aggregation parameters.
  • function_names: Names of possible functions in formulas.
  • operator_names: Names of possible operators in formulas.
  • parsed_expr: Cached parsed expression.
  • special_names: Names of possible reserved names in formulas.
  • reserved_words: List of all possible reserved words that may be used in formulas.
_Parser__build_bnf()

Parse formula to function based on language definition.

Backus-Naur Form of formula language:

Operation   Expression
addop       '+' | '-'
multop      '*' | '/'
expop       '^'
compop      '==' | '<' | '>' | '<=' | '>='
notop       'not'
andop       'and'
orop        'or'
real        \d+(.\d+)
integer     \d+
variable    \w+
string      ".+"
atom        real | integer | variable
func        func ( atom )
factor      atom [ expop factor ]*
term        factor [ multop factor ]*
expr        term [ addop term ]*
equation    expr [ compop expr ]*
in          string in '[' "string"[, "string"]* ']'
neg         [notop]* equation | in
conj        neg [ andop neg ]*
disj        conj [ orop conj ]*
case        'case' disj: atom[, disj: atom]*[, 'default': atom]
trans       trans ( case )
agg         agg ( trans[, trans]* )
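To make the precedence encoded by the atom, factor, term, and expr rules concrete, here is a small recursive-descent evaluator for the arithmetic subset only. It is a sketch under assumptions: it does not use the library's parser, it adds parentheses (which the grammar listing omits but the formula examples use), and `tokenize` and `evaluate` are hypothetical names.

```python
import re

# A number, a name, or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|\w+|[-+*/^()])")


def tokenize(formula):
    tokens, position = [], 0
    formula = formula.rstrip()
    while position < len(formula):
        match = TOKEN.match(formula, position)
        if match is None:
            raise ValueError("unexpected input at %d" % position)
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def evaluate(formula, row):
    """Evaluate an arithmetic formula against one row (a dict)."""
    tokens = tokenize(formula)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        token = peek()
        position += 1
        return token

    def atom():  # real | integer | variable | '(' expr ')'
        token = advance()
        if token is None:
            raise ValueError("unexpected end of formula")
        if token == "(":
            value = expr()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token[0].isdigit():
            return float(token)
        return row[token]  # variable: look up the column value

    def factor():  # atom [ expop factor ]*, right-associative
        base = atom()
        if peek() == "^":
            advance()
            return base ** factor()
        return base

    def term():  # factor [ multop factor ]*
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def expr():  # term [ addop term ]*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if peek() is not None:
        raise ValueError("unexpected token %r" % peek())
    return result


row = {"amount": 2.0, "gps_alt": 3.0, "gps_precision": 4.0}
unparenthesized = evaluate("amount + gps_alt * gps_precision", row)
parenthesized = evaluate("(amount + gps_alt) * gps_precision", row)
```

Because each rule calls the next-tighter rule for its operands, multiplication binds before addition and exponentiation before both, without any explicit precedence table.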
classmethod parse(formula)[source]

Parse formula and return evaluation function.

Parse formula into an aggregation name and functions. There will be multiple functions if the aggregation takes multiple arguments, e.g. ratio, which takes a numerator and a denominator formula.

Examples:

  • constants
    • 9 + 5
  • aliases
    • rating
    • gps
  • arithmetic
    • amount + gps_alt
    • amount - gps_alt
    • amount + 5
    • amount - gps_alt + 2.5
    • amount * gps_alt
    • amount / gps_alt
    • amount * gps_alt / 2.5
    • amount + gps_alt * gps_precision
  • precedence
    • (amount + gps_alt) * gps_precision
  • comparison
    • amount == 2
    • 10 < amount
    • 10 < amount + gps_alt
  • logical
    • not amount == 2
    • not(amount == 2)
    • amount == 2 and 10 < amount
    • amount == 2 or 10 < amount
    • not not amount == 2 or 10 < amount
    • not amount == 2 or 10 < amount
    • (not amount == 2) or 10 < amount
    • not(amount == 2 or 10 < amount)
    • amount ^ 3
    • (amount + gps_alt) ^ 2 + 100
    • amount
    • amount < gps_alt - 100
  • membership
    • rating in ["delectible"]
    • risk_factor in ["low_risk"]
    • amount in ["9.0", "2.0", "20.0"]
    • (risk_factor in ["low_risk"]) and (amount in ["9.0", "20.0"])
  • dates
    • date("09-04-2012") - submit_date > 21078000
  • cases
    • case food_type in ["morning_food"]: 1, default: 3
  • transformations: row-wise column based aggregations
    • percentile(amount)
Parameters:formula – The string to parse.
Returns:A tuple with the name of the aggregation in the formula, if any and a list of functions built from the input string.
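The functions returned for a formula are applied row by row. As an illustration of what such a function computes for the case example in the list above, here is a hand-written equivalent; `evaluate_case` and the `branches` layout are hypothetical and are not the parser's output format.

```python
def evaluate_case(row, branches, default=None):
    """Return the value of the first branch whose condition holds."""
    for condition, value in branches:
        if condition(row):
            return value
    return default


# Hand-written equivalent of:
#   case food_type in ["morning_food"]: 1, default: 3
branches = [(lambda row: row["food_type"] in ["morning_food"], 1)]

rows = [{"food_type": "morning_food"}, {"food_type": "lunch"}]
values = [evaluate_case(row, branches, default=3) for row in rows]
```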
store_aggregation(_, __, tokens)[source]

Cache a parsed aggregation.

classmethod validate(dataset, formula, groups)[source]

Validate formula and groups for dataset.

Validate the formula and group string by attempting to get a row from the dframe for the dataset and then running parser validation on this row. Additionally, ensure that the groups in the group string are columns in the dataset.

Parameters:
  • dataset – The dataset to validate for.
  • formula – The formula to validate.
  • groups – A list of columns to group by.
Returns:

The aggregation (or None) for the formula.

classmethod validate_formula(formula, dataset)[source]

Validate the formula on an example row of data.

Rebuild the BNF then parse the formula given the sample row.

Parameters:
  • formula – The formula to validate.
  • dataset – The dataset to validate against.
Returns:

The aggregation for the formula.

Summary Statistic Utilities

exception bamboo.core.summary.ColumnTypeError[source]

Exception when grouping on a non-dimensional column.

bamboo.core.summary.summarizable(dframe, col, groups, dataset)[source]

Check if column should be summarized.

Parameters:
  • dframe – DataFrame to check unique values in.
  • col – Column to check for factor and number of uniques.
  • groups – List of groups if summarizing with group, can be empty.
  • dataset – Dataset to pull schema from.
Returns:

True if column, with parameters should be summarized, otherwise False.

bamboo.core.summary.summarize(dataset, dframe, groups, no_cache, update=False)[source]

Raises a ColumnTypeError if grouping on a non-dimensional column.

bamboo.core.summary.summarize_df(dframe, dataset, groups=[])[source]

Calculate summary statistics.

bamboo.core.summary.summarize_series(is_factor, data)[source]

Call summary function dependent on dtype type.

Parameters:
  • dtype – The dtype of the column to be summarized.
  • data – The data to be summarized.
Returns:

The appropriate summarization for the type of dtype.
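One plausible shape for such a dispatch is sketched below, assuming factor (categorical) columns are summarized by value counts and numeric columns by descriptive statistics. That split, the helper name `summarize_values`, and the chosen statistics are all assumptions, not a description of the library's output.

```python
from collections import Counter
import statistics


def summarize_values(is_factor, data):
    """Summarize a column's values according to its kind."""
    # Ignore N/A (None) entries in either case.
    values = [value for value in data if value is not None]
    if is_factor:
        # Categorical data: how often each distinct value occurs.
        return dict(Counter(values))
    # Numeric data: a few descriptive statistics.
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


factor_summary = summarize_values(True, ["a", "b", "a", None])
numeric_summary = summarize_values(False, [1, 2, 3, None])
```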

bamboo.core.summary.summarize_with_groups(dframe, groups, dataset)[source]

Calculate summary statistics for group.