Models

AbstractModel

class bamboo.models.abstract_model.AbstractModel(record=None)[source]

An abstract class for all MongoDB models.

Attributes:

  • __collection__: The MongoDB collection to communicate with.
  • STATE: A key with which to store state.
  • STATE_PENDING: A value for the pending state.
  • STATE_READY: A value for the ready state.
clean_record

Remove reserved keys from records.

delete(query)[source]

Delete rows matching query.

Parameters:query – The query for rows to delete.
failed(message=None)[source]

Persist the state of the current instance to STATE_FAILED.

Parameters:message – A string to store as the error message, default None.
classmethod find(query_args, as_dict=False, as_cursor=False)[source]

An interface to MongoDB’s find functionality.

Parameters:
  • query_args – An optional QueryArgs to hold the query arguments.
  • as_cursor – If True, return the cursor.
  • as_dict – If True, return dicts and not model instances.
Returns:

A list of dicts or model instances for each row returned.

classmethod find_one(query, select=None, as_dict=False)[source]

Return the first row matching query and select from MongoDB.

Parameters:
  • query – A query to pass to MongoDB.
  • select – An optional select to pass to MongoDB.
  • as_dict – If true, return dicts and not model instances.
Returns:

A model instance of the row returned for this query and select.

pending()[source]

Persist the state of the current instance to STATE_PENDING.

ready()[source]

Persist the state of the current instance to STATE_READY.

save(record)[source]

Save record in this model’s collection.

Save the record in the model instance’s collection and set the internal record of this instance to the passed in record.

Parameters:record – The dict to save in the model’s collection.
Returns:The record passed in.
classmethod set_collection(collection_name)[source]

Return a MongoDB collection for the passed name.

Parameters:collection_name – The name of collection to return.
Returns:A MongoDB collection from the current database.
split_groups(groups)[source]

Split a string based on the group delimiter.

classmethod unset(query, unset_query)[source]

Call unset with the spec query and the unset document unset_query.

Parameters:
  • query – The spec to restrict updates to.
  • unset_query – The query to pass to unset.
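For orientation, MongoDB removes fields with the $unset update operator. The sketch below builds such an update document and applies it to an in-memory dict. The helpers build_unset_update and apply_unset are hypothetical and not part of bamboo, and whether unset_query holds the field mapping or the full update document is an assumption here.

```python
def build_unset_update(fields):
    """Build a MongoDB-style update document that removes `fields`."""
    # MongoDB ignores the values given to $unset; an empty string is conventional.
    return {"$unset": {field: "" for field in fields}}


def apply_unset(document, update):
    """Apply the $unset part of `update` to a copy of an in-memory dict.

    Illustration only: a real call would send the update to MongoDB.
    """
    result = dict(document)
    for field in update.get("$unset", {}):
        result.pop(field, None)  # a missing field is simply left alone
    return result


row = {"_id": 1, "amount": 10, "scratch": "tmp"}
update = build_unset_update(["scratch"])
cleaned = apply_unset(row, update)
```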
update(record)[source]

Update the current instance with record.

Update the current model instance, identified by its _id, and set its data to the passed in record.

Parameters:record – The record to replace the instance’s data with.

Calculation

class bamboo.models.calculation.Calculation(record=None)[source]
add_dependencies(dataset, dependent_columns)[source]

Store calculation dependencies.

delete(dataset)[source]

Delete this calculation.

First ensure that no other calculations depend on this one. If there are none, start a background task to delete the calculation.

Parameters:dataset – Dataset for this calculation.
Raises:DependencyError if dependent calculations exist.
Raises:ArgumentError if group is not in DataSet or calculation does not exist for DataSet.
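The dependency check described above can be sketched in plain Python. This is a minimal illustration, not bamboo's implementation: the dependencies dict, the delete_calculation function, and the two exception classes are local stand-ins, and the deletion runs inline rather than in a background task.

```python
class DependencyError(Exception):
    """Local stand-in for the error raised when dependents exist."""


class ArgumentError(Exception):
    """Local stand-in for the error raised when the calculation is missing."""


def delete_calculation(dependencies, name):
    """Delete `name` unless another calculation depends on it.

    `dependencies` is a hypothetical dict mapping each calculation name to
    the set of calculation names it depends on.
    """
    if name not in dependencies:
        raise ArgumentError("no calculation named %r" % name)
    dependents = sorted(
        other for other, needs in dependencies.items() if name in needs
    )
    if dependents:
        raise DependencyError(
            "%r is required by: %s" % (name, ", ".join(dependents))
        )
    # The documented method starts a background task; here it is done inline.
    del dependencies[name]


calcs = {"sum_x": set(), "ratio": {"sum_x"}}
try:
    delete_calculation(calcs, "sum_x")
    blocked = False
except DependencyError:
    blocked = True

delete_calculation(calcs, "ratio")  # nothing depends on ratio
delete_calculation(calcs, "sum_x")  # now safe to delete
```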
classmethod find(dataset, include_aggs=True, only_aggs=False)[source]

Return the calculations for dataset.

Parameters:
  • dataset – The dataset to retrieve the calculations for.
  • include_aggs – Include aggregations, default True.
  • only_aggs – Exclude non-aggregations, default False.
save(dataset, formula, name, group_str=None)[source]

Parse, save, and calculate a formula.

Validate formula and group_str for the given dataset. If the formula and group are valid for the dataset, then save a new calculation for them under name. Finally, create a background task to compute the calculation.

Calculations are initially saved in a pending state. After a calculation has finished processing, it will be in a ready state.

Parameters:
  • dataset – The DataSet to save.
  • formula – The formula to save.
  • name – The name of the formula.
  • group_str (String, list of strings, or None) – Columns to group on.
Raises:

ParseError if an invalid formula was supplied.
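The pending-to-ready lifecycle can be illustrated with a small sketch. The attribute names STATE, STATE_PENDING, and STATE_READY come from AbstractModel, but their values below are assumed, and save_calculation, store, and compute are hypothetical stand-ins for the real persistence and background task.

```python
STATE = "state"            # key under which the state is stored (value assumed)
STATE_PENDING = "pending"  # assumed value
STATE_READY = "ready"      # assumed value


def save_calculation(store, name, formula, compute):
    """Save a record in the pending state and return it with a deferred task.

    `store` is a dict standing in for the collection; `compute` stands in
    for the work the background task would do.
    """
    record = {"name": name, "formula": formula, STATE: STATE_PENDING}
    store[name] = record

    def task():
        compute(formula)
        record[STATE] = STATE_READY  # mark ready only after the work finishes

    return record, task


store = {}
record, task = save_calculation(store, "total", "a + b", lambda formula: None)
state_before = record[STATE]
task()  # in a real deployment this would run asynchronously
state_after = store["total"][STATE]
```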

Dataset

class bamboo.models.dataset.Dataset(record=None)[source]
add_joined_dataset(new_data)[source]

Add the ID of new_dataset to the list of joined datasets.

add_merged_dataset(mapping, new_dataset)[source]

Add the ID of new_dataset to the list of merged datasets.

add_observations(new_data)[source]

Update dataset with new_data.

build_schema(dframe, overwrite=False, set_num_columns=True)[source]

Build schema for a dataset.

If no schema exists, build a schema from the passed dframe and store that schema for this dataset. Otherwise, if a schema does exist, build a schema for the passed dframe and merge this schema with the current schema. Keys in the new schema replace keys in the current schema but keys in the current schema not in the new schema are retained.

If set_num_columns is True the number of columns will be set to the number of keys (columns) in the new schema.

Parameters:
  • dframe – The DataFrame whose schema to merge with the current schema.
  • overwrite – If true replace schema, otherwise update.
  • set_num_columns – If True also set the number of columns.
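The merge rule described above can be sketched with plain dicts. This is an illustration under assumptions: merge_schema is a hypothetical helper, and the schema layout (a dict keyed by column name with a "simpletype" entry) is invented for the example.

```python
def merge_schema(current, new, overwrite=False):
    """Return the schema that results from combining `current` with `new`."""
    if overwrite or not current:
        # No existing schema, or the caller asked to replace it outright.
        return dict(new)
    merged = dict(current)  # keep keys that only the current schema has
    merged.update(new)      # keys in the new schema replace current ones
    return merged


current = {"amount": {"simpletype": "float"}, "city": {"simpletype": "string"}}
new = {"amount": {"simpletype": "integer"}, "date": {"simpletype": "datetime"}}

updated = merge_schema(current, new)
replaced = merge_schema(current, new, overwrite=True)
```

With overwrite left at False, updated keeps city from the current schema while amount takes its definition from the new one. With overwrite=True, only the keys of the new schema remain.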
calculations(include_aggs=True, only_aggs=False)[source]

Return the calculations for this dataset.

Parameters:
  • include_aggs – Include aggregations, default True.
  • only_aggs – Exclude non-aggregations, default False.
clear_summary_stats(group=None, column=None)[source]

Remove summary stats for group and optional column.

By default will remove all stats.

Parameters:
  • group – The group to remove stats for, default None.
  • column – The column to remove stats for, default None.
count(query_args=None)[source]

Return the count of rows matching query in dataset.

Parameters:query_args – An optional QueryArgs to hold the query arguments.
delete(query=None, countdown=0)[source]

Delete this dataset.

Parameters:countdown – Delete dataset after this number of seconds.
delete_columns(columns)[source]

Delete columns from this dataset.

Parameters:columns – The columns to delete.
delete_observation(index)[source]

Delete observation at index.

Parameters:index – The index of the observation to delete.
dframe(query_args=None, keep_parent_ids=False, padded=False, index=False, reload_=False, keep_mongo_keys=False)[source]

Fetch the dframe for this dataset.

Parameters:
  • query_args – An optional QueryArgs to hold the query arguments.
  • keep_parent_ids – Do not remove parent IDs from the dframe, default False.
  • padded – Used for joining, default False.
  • index – Return the index with dframe, default False.
  • reload_ – Force refresh of data, default False.
  • keep_mongo_keys – Used for updating documents, default False.
Returns:

Return DataFrame with contents based on query parameters passed to MongoDB. DataFrame will not have parent ids if keep_parent_ids is False.

classmethod find(dataset_id)[source]

Return datasets for dataset_id.

classmethod find_one(dataset_id)[source]

Return dataset for dataset_id.

has_pending_updates(update_id)[source]

Check if this dataset has pending updates.

Call the update identified by update_id the current update. A dataset has pending updates if, not including the current update, there are any pending updates and the update at the top of the queue is not the current update.

Parameters:update_id – An update to exclude when checking for pending updates.
Returns:True if there are pending updates, False otherwise.
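The rule above can be expressed as a standalone function. This is a sketch under assumptions: the queue is modeled as a list of update IDs with the top of the queue at index 0, which the documentation does not specify.

```python
def has_pending_updates(pending_updates, update_id):
    """Return True if updates other than `update_id` must run first.

    `pending_updates` is a hypothetical list of update IDs, oldest first.
    """
    others = [u for u in pending_updates if u != update_id]
    # Pending only if other updates exist AND the current one is not on top.
    return bool(others) and pending_updates[0] != update_id


waiting = has_pending_updates(["a", "b"], "b")  # "a" is ahead of "b"
on_top = has_pending_updates(["a", "b"], "a")   # "a" is at the top
alone = has_pending_updates(["a"], "a")         # no other updates
```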
info(update=None)[source]

Return or update meta-data for this dataset.

Parameters:update – Dictionary to update info with, default None.
Returns:Dictionary of info for this dataset.
join(other, on)[source]

Join with dataset other on the passed columns.

Parameters:
  • other – The other dataset to join.
  • on – The column in this and the other dataset to join on.
observations(query_args=None, as_cursor=False)[source]

Return observations for this dataset.

Parameters:
  • query_args – An optional QueryArgs to hold the query arguments.
  • as_cursor – Return the observations as a cursor.
reload()[source]

Reload the dataset from DB and clear any cache.

remove_parent_observations(parent_id)[source]

Remove observations for this dataset with the passed parent_id.

Parameters:parent_id – Remove observations with this ID as their parent dataset ID.
replace_observations(dframe, overwrite=False, set_num_columns=True)[source]

Remove all rows for this dataset and save the rows in dframe.

Parameters:
  • dframe – Replace rows in this dataset with this DataFrame’s rows.
  • overwrite – If true replace the schema, otherwise update it. Default False.
  • set_num_columns – If true update the dataset stored number of columns. Default True.
Returns:

DataFrame equivalent to the passed in dframe.

resample(date_column, interval, how, query=None)[source]

Resample a dataset given a new time frame.

Parameters:
  • date_column – The date column to use as the index for resampling.
  • interval – The interval code for resampling.
  • how – How to aggregate in the resample.
Returns:

A DataFrame of the resampled DataFrame for this dataset.

rolling(win_type, window)[source]

Calculate a rolling window over all numeric columns.

Parameters:
  • win_type – The type of window, see pandas.rolling_window.
  • window – The number of observations used for calculating the window.
Returns:

A DataFrame of the rolling window calculated for this dataset.

save(dataset_id=None)[source]

Store dataset with dataset_id as the unique internal ID.

Store a new dataset with the ID given by dataset_id if it is passed; otherwise create a random UUID for this dataset. Additionally, set the created-at time to the current time and the state to pending.

Parameters:dataset_id – The ID to store for this dataset, default is None.
Returns:A dict representing this dataset.
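A minimal sketch of the ID fallback and initial fields follows. The function new_dataset_record and the keys "dataset_id", "created_at", and "state" are assumptions for illustration; the actual stored field names and timestamp format are not given here.

```python
import uuid
from datetime import datetime, timezone


def new_dataset_record(dataset_id=None):
    """Build the initial record for a dataset (hypothetical field names)."""
    if dataset_id is None:
        # No ID supplied, so fall back to a random UUID.
        dataset_id = uuid.uuid4().hex
    return {
        "dataset_id": dataset_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "state": "pending",
    }


named = new_dataset_record("abc123")
generated = new_dataset_record()
```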
save_observations(dframe)[source]

Save rows in dframe for this dataset.

Parameters:dframe – DataFrame to save rows from.
set_olap_type(column, olap_type)[source]

Set the OLAP Type for this column of dataset.

Only columns with an original OLAP Type of ‘measure’ can be modified. This includes columns with Simple Type integer, float, and datetime.

Parameters:
  • column – The column to set the OLAP Type for.
  • olap_type – The OLAP Type to set. Must be ‘dimension’ or ‘measure’.
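The validation rules above can be sketched as follows. The schema layout and the keys "simpletype" and "olap_type" are assumed for the example, and deriving the original OLAP Type from the Simple Type is an interpretation of the description rather than confirmed behavior.

```python
MEASURE_SIMPLETYPES = {"integer", "float", "datetime"}
OLAP_TYPES = {"dimension", "measure"}


def set_olap_type(schema, column, olap_type):
    """Set the OLAP Type of `column` if the rules above allow it."""
    if olap_type not in OLAP_TYPES:
        raise ValueError("olap_type must be 'dimension' or 'measure'")
    if schema[column]["simpletype"] not in MEASURE_SIMPLETYPES:
        # Columns that were never measures cannot be changed.
        raise ValueError("column %r was not originally a measure" % column)
    schema[column]["olap_type"] = olap_type


schema = {
    "amount": {"simpletype": "float", "olap_type": "measure"},
    "city": {"simpletype": "string", "olap_type": "dimension"},
}
set_olap_type(schema, "amount", "dimension")  # allowed: float is a measure

try:
    set_olap_type(schema, "city", "measure")  # rejected: string column
    rejected = False
except ValueError:
    rejected = True
```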
set_schema(schema, set_num_columns=True)[source]

Set the schema from an existing one.

summarize(dframe, groups=[], no_cache=False, update=False, flat=False)[source]

Build and return a summary of the data in this dataset.

Return a summary of dframe grouped by groups, or the overall summary if no groups are specified.

Parameters:
  • dframe – dframe to summarize
  • groups – A list of columns to group on.
  • no_cache – Do not fetch a cached summary.
  • flat – Return a flattened list of groups.
Returns:

A summary of the dataset as a dict. Numeric columns will be summarized by the arithmetic mean, standard deviation, and percentiles. Dimensional columns will be summarized by counts.
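To make the shape of such a summary concrete, here is a sketch using only the Python standard library. It is not bamboo's implementation: rows are modeled as a list of dicts, a column counts as numeric when all of its values are ints or floats, and quartiles stand in for the percentiles.

```python
from collections import Counter
from statistics import mean, quantiles, stdev


def summarize(rows):
    """Summarize numeric columns by statistics and other columns by counts."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)

    summary = {}
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            summary[name] = {
                "mean": mean(values),
                "std": stdev(values),  # sample standard deviation
                "quartiles": quantiles(values, n=4),
            }
        else:
            summary[name] = dict(Counter(values))  # counts per distinct value
    return summary


rows = [
    {"city": "Nairobi", "amount": 1},
    {"city": "Nairobi", "amount": 2},
    {"city": "Accra", "amount": 3},
    {"city": "Accra", "amount": 4},
]
result = summarize(rows)
```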

update(record)[source]

Update this dataset with record.

update_complete(update_id)[source]

Remove update_id from this dataset’s list of pending updates.

Parameters:update_id – The ID of the completed update.
update_stats(dframe, update=False)[source]

Update stored statistics for this dataset.

Parameters:
  • dframe – Use this DataFrame for summary statistics.
  • update – Update or replace summary statistics, default False.

Observation

class bamboo.models.observation.Observation(record=None)[source]
classmethod append(dframe, dataset)[source]

Append an additional dframe to an existing dataset.

Parameters:
  • dframe – The DataFrame to append.
  • dataset – The DataSet to add dframe to.
classmethod batch_read_dframe_from_cursor(dataset, observations, distinct, limit)[source]

Read a DataFrame from a MongoDB Cursor in batches.

classmethod delete(dataset, index)[source]

Delete observation at index for dataset.

Parameters:
  • dataset – The dataset to delete the observation from.
  • index – The index of the observation to delete.
classmethod delete_all(dataset, query=None)[source]

Delete the observations for dataset.

Parameters:
  • dataset – The dataset to delete observations for.
  • query – An optional query to restrict deletion.
classmethod delete_columns(dataset, columns)[source]

Delete a column from the dataset.

classmethod find(dataset, query_args=None, as_cursor=False, include_deleted=False)[source]

Return observation rows matching parameters.

Parameters:
  • dataset – Dataset to return rows for.
  • include_deleted – If True, return deleted records, default False.
  • query_args – An optional QueryArgs to hold the query arguments.
Raises:

JSONError if the query could not be parsed.

Returns:

A list of dictionaries matching the passed in query and other parameters.

classmethod find_one(dataset, index, decode=True)[source]

Return row by index.

Parameters:
  • dataset – The dataset to find the row for.
  • index – The index of the row to find.
classmethod save(dframe, dataset)[source]

Save data in dframe with the dataset.

Encode dframe for MongoDB, and add fields to identify it with the passed in dataset. All column names in dframe are converted to slugs using the dataset’s schema. The dataset is updated to store the size of the stored data.

Parameters:
  • dframe – The DataFrame to store.
  • dataset – The dataset to store the dframe in.
classmethod update(dataset, index, record)[source]

Update a dataset row by index.

The record dictionary will update, not replace, the data in the row at index.

Parameters:
  • dataset – The dataset to update a row for.
  • index – The index of the row to update.
  • record – The dictionary to update the row with.
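The difference between updating and replacing a row can be shown with plain dicts. The update_row helper below is hypothetical, and rows are modeled as an in-memory list rather than MongoDB documents.

```python
def update_row(rows, index, record):
    """Merge `record` into the row at `index`, keeping its other fields."""
    rows[index].update(record)  # update in place; do not replace the row
    return rows[index]


rows = [
    {"city": "Nairobi", "amount": 1},
    {"city": "Accra", "amount": 3},
]
changed = update_row(rows, 1, {"amount": 5})
```

The row at index 1 keeps its city value because only the keys present in record are overwritten.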