Basic Commands

Note

On the command line some special characters may need to be escaped for the commands to function correctly. E.g. & as \&, ? as \?, = as \=.
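
When requests are built in a program rather than typed into a shell, percent-encoding the query string avoids most of these escaping problems. The following is a minimal sketch using only the Python standard library; the helper name build_url is hypothetical and not part of bamboo, and it only constructs the URL without sending a request.

```python
import json
from urllib.parse import urlencode

BASE = "http://bamboo.io"  # host used in the examples of this section


def build_url(path, **params):
    """Return BASE + path with params percent-encoded into the query string.

    Dict and list values are serialized to compact JSON first, because the
    select and query parameters shown in this section take JSON documents.
    """
    encoded = {}
    for key, value in params.items():
        if isinstance(value, (dict, list)):
            value = json.dumps(value, separators=(",", ":"))
        encoded[key] = value
    query = urlencode(encoded)
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"


dataset = "/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea"
select_url = build_url(dataset, select={"rating": 1})
count_url = build_url(dataset, query={"rating": "delectible"}, count="True")
print(select_url)
print(count_url)
```

The resulting URLs contain no characters that a shell or curl would treat specially, so they can be passed to any HTTP client unchanged.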

Note

Spelling errors in the example dataset (e.g. “delectible”, “deserts”) are reproduced as-is [sic].

Storing data in bamboo

Upload data from a URL to bamboo

curl -X POST -d "url=http://formhub.org/mberg/forms/good_eats/data.csv" http://bamboo.io/datasets

returns:

{
    "id": "8a3d74711475d8a51c84484fe73f24bd151242ea"
}

Upload data from a CSV file to bamboo

given the file /home/modilabs/good_eats.csv exists locally on your filesystem

curl -X POST -F csv_file=@/home/modilabs/good_eats.csv http://bamboo.io/datasets

returns:

{
    "id": "8a3d74711475d8a51c84484fe73f24bd151242ea"
}

Upload data from a JSON file to bamboo

given the file /home/modilabs/good_eats.json exists locally on your filesystem

curl -X POST -F json_file=@/home/modilabs/good_eats.json http://bamboo.io/datasets

returns:

{
    "id": "8a3d74711475d8a51c84484fe73f24bd151242ea"
}

Missing Data in bamboo

NA Values in uploaded data are interpreted using pandas. The following values are by default interpreted as NA:

  • missing values,
  • the string ‘NA’,
  • the string ‘NaN’,
  • the string ‘NaT’ for datetime columns.

For details see the pandas docs.

You can specify custom values to interpret as NA using the na_values parameter. For example, to interpret the string ‘n/a’ as missing data, call:

curl -g -X POST -d "url=http://formhub.org/mberg/forms/good_eats/data.csv" 'http://bamboo.io/datasets?na_values=["n/a"]'
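
The effect of na_values can be pictured with a rough model of the rule described above. This sketch covers only the default markers listed in this section and is not pandas' actual parser; the function name to_missing and the sample row are hypothetical.

```python
# Defaults listed above; "NaT" is only treated as missing in datetime columns.
DEFAULT_NA = {"", "NA", "NaN"}


def to_missing(value, na_values=(), is_datetime=False):
    """Return None if value should be read as missing, else value unchanged."""
    markers = DEFAULT_NA | set(na_values)
    if is_datetime:
        markers = markers | {"NaT"}
    return None if value is None or value in markers else value


row = {"rating": "delectible", "comments": "n/a", "amount": "NA"}

# Without na_values, "n/a" is kept as an ordinary string.
print({k: to_missing(v) for k, v in row.items()})

# With na_values=["n/a"], it is read as missing data.
print({k: to_missing(v, na_values=["n/a"]) for k, v in row.items()})
```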

Deleting a dataset

To delete a dataset, pass the dataset ID in a DELETE request.

curl -X DELETE http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea

returns:

{
    "success": "deleted dataset",
    "id": "8a3d74711475d8a51c84484fe73f24bd151242ea"
}

Retrieve information about a dataset

given the id is 8a3d74711475d8a51c84484fe73f24bd151242ea

curl http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/info

returns:

{
    "id": "8a3d74711475d8a51c84484fe73f24bd151242ea",
    "schema": {
        "amount": {
            "label": "Amount",
            "olap_type": "measure",
            "simpletype": "float"
        },
        "rating": {
            "label": "Rating",
            "olap_type": "dimension",
            "simpletype": "string",
            "cardinality": 2
        },
        "food_type": {
            "label": "Food Type",
            "olap_type": "dimension",
            "simpletype": "string",
            "cardinality": 8
        },
        ...
    },
    "created_at": "2012-6-18 14:43:32",
    "updated_at": "2012-6-18 14:43:32",
    "num_rows": "500",
    "num_columns": "30",
    "state": "ready"
}

Retrieve data

given the id is 8a3d74711475d8a51c84484fe73f24bd151242ea

By ID

curl http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea

This returns the dataset as JSON.

returns:

[
    {
        "rating": "delectible",
        "_percentage_complete": "n/a",
        "_xform_id_string": "good_eats",
        "risk_factor": "low_risk",
        "gps_alt": "39.5",
        "food_type": "lunch",
        ...
    },
    ...
]

Alternatively, return the dataset as CSV:

curl http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea.csv

returns:

rating,_percentage_complete,_xform_id_string,risk_factor,gps_alt,food_type
delectible,n/a,good_eats,low_risk,39.5,lunch
...

By ID with select

curl -g http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea?select='{"rating":1}'

This returns the dataset as JSON given the select, i.e. only the rating column.

returns:

[
    {"rating": "epic_eat"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "epic_eat"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "epic_eat"},
    {"rating": "epic_eat"},
    {"rating": "epic_eat"},
    {"rating": "delectible"},
    {"rating": "epic_eat"},
    {"rating": "epic_eat"},
    {"rating": "epic_eat"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "delectible"},
    {"rating": "epic_eat"}
]

By ID with distinct

To retrieve only the unique values in a column, pass the distinct parameter:

curl -g 'http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea?select={"rating":1}&distinct=rating'

This returns the distinct values of the given column, for the rows matched by the request, as a JSON array.

returns:

[
    "delectible",
    "epic_eat"
]

By ID and query

The query must be valid MongoDB extended JSON.

curl -g http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea?query='{"food_type":"lunch"}'

This returns the dataset as JSON given the query, i.e. only rows with a food_type of “lunch”.

returns:

[
    {
        "rating": "delectible",
        "location_name": "Tolga Copsis ",
        "description": "Cotsi ",
        "_gps_precision": "85.0",
        "submit_date": {"$date": 1325635200000},
        "_gps_latitude": "37.951282449999994",
        "_gps_altitude": "0.0",
        "submit_data": {"$date": 1325635200000},
        "_gps_longitude": "27.3700048",
        "comments": "n/a",
        "amount": 8.0,
        "risk_factor": "low_risk",
        "imei": 358490042584319,
        "food_type": "lunch",
        "gps": "37.951282449999994 27.3700048 0.0 85.0",
        "location_photo": "1325672494341.jpg",
        "food_photo": "1325672462974.jpg"
    },
    ...
]

Query with dates

To query with dates use the MongoDB query format and specify dates as Unix epochs.

curl -g http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea?query='{"submit_date": {"$lt": 1320000000}}'

Returns the rows with a timestamp less than 1320000000, which is 18:40 UTC on October 30th, 2011.

You may also pass dates in the form “YYYY-MM-DD”, and other common formats:

curl -g http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea?query='{"submit_date": {"$lt": "2011-10-30"}}'
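
To move between the two date forms, the conversion can be sketched with the Python standard library. This assumes dates are interpreted as UTC, which this section does not state explicitly; the helper names to_epoch and from_epoch are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch(date_string):
    """Seconds since the Unix epoch for a YYYY-MM-DD date at 00:00 UTC."""
    day = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def from_epoch(seconds):
    """ISO 8601 string (UTC) for a Unix epoch given in seconds."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print(from_epoch(1320000000))  # 2011-10-30T18:40:00+00:00
print(to_epoch("2011-10-30"))  # 1319932800
```

Under this assumption the two queries are not identical: "2011-10-30" corresponds to midnight, whereas 1320000000 falls later on the same day.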

Only return the count

To return only the number of records matching your query, pass count=True. The count takes the query, distinct, and limit parameters into account.

curl -g 'http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea?query={"rating":"delectible"}&count=True'

returns:

11

Retrieve summary statistics for dataset

By ID

curl http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/summary?select=all

This returns a summary of the dataset. Columns of type float and integer are shown as summary statistics. Columns of type string and boolean are shown as counts of unique values.

The select argument is required. It can either be all or a MongoDB JSON select query.

returns:

{
    "rating": {
        "summary": {
            "delectible": 12,
            "epic_eat": 10
        }
    },
    "amount": {
        "summary": {
            "count": 22.0,
            "std": 339.16360630207191,
            "min": 2.0,
            "max": 1600.0,
            "50%": 12.0,
            "25%": 4.6875,
            "75%": 19.5,
            "mean": 92.772727272727266
        }
    },
    ...
}

With a query

curl -g 'http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/summary?query={"food_type": "lunch"}&select=all'

Returns the summary restricted to rows that match the MongoDB query passed as query.

returns:

{
    "rating": {
        "summary": {
            "delectible": 5,
            "epic_eat": 2
        }
    },
    "amount": {
        "summary": {
            "count": 7.0,
            "std": 71.321017238959797,
            "min": 4.25,
            "max": 200.0,
            "50%": 12.0,
            "25%": 8.5,
            "75%": 19.0,
            "mean": 38.75
        }
    },
    "risk_factor": {
        "summary": {
            "low_risk": 7
        }
    },
    "food_type": {
        "summary": {
            "lunch": 7
        }
    },
    ...
}

With a grouping

curl 'http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/summary?select=all&group=food_type'

Return the summary grouping on the value passed as group.

returns:

{
    "food_type": {
        "caffeination": {
            "rating": {
                "summary": {
                    "epic_eat": 1
                }
            },
            "description": {
                "summary": {
                    "Turkish coffee": 1
                }
            },
            "amount": {
                "summary": {
                    "count": 1.0,
                    "std": "null",
                    "min": 2.5,
                    "max": 2.5,
                    "50%": 2.5,
                    "25%": 2.5,
                    "75%": 2.5,
                    "mean": 2.5
                }
            },
            "risk_factor": {
                "summary": {
                    "low_risk": 1
                }
            },
            ...
        },
        "deserts": {
            "rating": {
                "summary": {
                    "epic_eat": 2
                }
            },
            "description": {
                "summary": {
                    "Baklava": 1,
                    "Rice Pudding ": 1
                }
            },
            "amount": {
                "summary": {
                    "count": 2.0,
                    "std": 2.2980970388562794,
                    "min": 2.75,
                    "max": 6.0,
                    "50%": 4.375,
                    "25%": 3.5625,
                    "75%": 5.1875,
                    "mean": 4.375
                }
            },
            "risk_factor": {
                "summary": {
                    "low_risk": 2
                }
            },
            ...
        }
        ...
    }
}
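
The numeric summaries above can be reproduced locally as a sanity check. The sketch below uses only the Python standard library rather than pandas, and assumes the two amounts in the "deserts" group are 2.75 and 6.0 (its reported min and max, with a count of 2); the function name summarize is hypothetical.

```python
import statistics


def summarize(values):
    """Summary statistics for a non-empty numeric column, using the keys shown above."""
    data = sorted(values)
    if len(data) > 1:
        # Quartiles by linear interpolation between the closest ranks.
        q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")
        std = statistics.stdev(data)  # sample standard deviation
    else:
        q25 = q50 = q75 = data[0]
        std = None  # undefined for a single observation
    return {
        "count": float(len(data)),
        "mean": statistics.fmean(data),
        "std": std,
        "min": data[0],
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": data[-1],
    }


deserts = summarize([2.75, 6.0])
print(deserts)
```

For a group with a single row, such as "caffeination", the standard deviation is undefined, which is consistent with the null value in the output above.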

With a grouping and a select

curl -g 'http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/summary?select={"rating":1}&group=food_type'

Return the summary grouping on the value passed as group and only showing the columns specified by the select.

returns:

{
    "food_type": {
        "caffeination": {
            "rating": {
                "summary": {
                    "epic_eat": 1
                }
            }
        },
        "deserts": {
            "rating": {
                "summary": {
                    "epic_eat": 2
                }
            }
        },
        ...
    }
}

With a multi-grouping

curl 'http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/summary?select=all&group=food_type,rating'

returns:

{
    "food_type,rating": {
        "(u'dinner', u'delectible')": {
            "rating": {
                "summary": {
                    "delectible": 2
                }
            },
            "amount": {
                "summary": {
                    "count": 2.0,
                    "std": 1.4142135623730951,
                    "min": 12.0,
                    "max": 14.0,
                    "50%": 13.0,
                    "25%": 12.5,
                    "75%": 13.5,
                    "mean": 13.0
                }
            },
            "risk_factor": {
                "summary": {
                    "low_risk": 2
                }
            },
            "food_type": {
                "summary": {
                    "dinner": 2
                }
            },
            ...
        },
        "(u'deserts', u'epic_eat')": {
            "rating": {
                "summary": {
                    "epic_eat": 2
                }
            },
            "amount": {
                "summary": {
                    "count": 2.0,
                    "std": 2.2980970388562794,
                    "min": 2.75,
                    "max": 6.0,
                    "50%": 4.375,
                    "25%": 3.5625,
                    "75%": 5.1875,
                    "mean": 4.375
                }
            },
            "risk_factor": {
                "summary": {
                    "low_risk": 2
                }
            },
            "food_type": {
                "summary": {
                    "deserts": 2
                }
            },
            ...
        }
        ...
    }
}
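
The group keys in the multi-grouping output above, such as (u'dinner', u'delectible'), look like Python tuple representations serialized to strings. Assuming that is the case, a client written in Python could recover the individual group values as sketched below; the helper name parse_group_key is hypothetical.

```python
import ast


def parse_group_key(key):
    """Turn a key like "(u'dinner', u'delectible')" into a tuple of strings.

    ast.literal_eval only evaluates literals, so it is safe to use on
    untrusted input, unlike eval.
    """
    value = ast.literal_eval(key)
    return value if isinstance(value, tuple) else (value,)


columns = "food_type,rating".split(",")
values = parse_group_key("(u'dinner', u'delectible')")
print(dict(zip(columns, values)))
```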

Calculation formulas

Calculations are specified by a name, which is the label, and a formula, which is either calculated per row or aggregated over multiple rows.

The calculation formula can contain a combination of integers, floats, and/or strings which must map to column names, as well as operators and functions (specified in the Parser).

Calculations that are aggregations can also be specified with a group and a query. The dataset will be grouped by the group parameter and limited to rows matching the query parameter.

The results of aggregations are stored in a dataset with one column for the unique groups and another for the result of the formula. This dataset is indexed by the group parameter and unique per dataset ID.
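
The shape of such an aggregated dataset can be illustrated locally. The sketch below is not bamboo's implementation; it groups a few illustrative rows and sums a column, producing one record per unique group. The function name aggregate_sum and the sample rows are hypothetical, with amounts chosen to agree with the summary figures shown earlier in this section.

```python
from collections import defaultdict


def aggregate_sum(rows, name, column, group):
    """Sum `column` per unique combination of the comma-separated `group` columns."""
    keys = group.split(",")
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[column]
    # One output record per group: the group columns plus the named result.
    return [dict(zip(keys, key), **{name: total}) for key, total in totals.items()]


rows = [
    {"food_type": "deserts", "rating": "epic_eat", "amount": 2.75},
    {"food_type": "deserts", "rating": "epic_eat", "amount": 6.0},
    {"food_type": "dinner", "rating": "delectible", "amount": 12.0},
    {"food_type": "dinner", "rating": "delectible", "amount": 14.0},
]
result = aggregate_sum(rows, "sum_of_amount", "amount", "food_type,rating")
print(result)
```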

Note

When two calculations with the same name are added, the first calculation is not overwritten.

The second calculation will have a label equal to the same name as the first calculation but it will have a unique slug. You can determine this slug via a dataset info call.

Note

It is possible to have the same calculation label with different formulas, but impossible to have the same calculation slug with different formulas.

Store calculation formula

curl -X POST -d "name=amount_less_than_10&formula=amount<10" http://bamboo.io/calculations/8a3d74711475d8a51c84484fe73f24bd151242ea

returns:

{
    "success": "created calculation: amount_less_than_10",
    "id": "8a3d74711475d8a51c84484fe73f24bd151242ea"
}
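
Formulas often contain characters that have a special meaning in a form-encoded request body. When the request is built in a program, encoding the body explicitly keeps the formula intact; a minimal sketch with the Python standard library, which only constructs the body and sends nothing:

```python
from urllib.parse import parse_qs, urlencode

# Field names and values are taken from the curl example above.
body = urlencode({"name": "amount_less_than_10", "formula": "amount<10"})
print(body)  # name=amount_less_than_10&formula=amount%3C10

# Decoding the body, as a server would, recovers the original formula.
decoded = parse_qs(body)
print(decoded["formula"][0])  # amount<10
```

With curl, the --data-urlencode option provides comparable encoding for a single field.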

Retrieve a list of stored calculations

curl http://bamboo.io/calculations/8a3d74711475d8a51c84484fe73f24bd151242ea

returns:

[
    {
        "formula": "amount<10",
        "group": null,
        "name": "amount_less_than_10"
    }
]

Retrieve newly calculated column

curl -g http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea?select='{"amount_less_than_10":1}'

returns:

[
    {"amount_less_than_10": true},
    {"amount_less_than_10": false},
    {"amount_less_than_10": false},
    {"amount_less_than_10": true},
    {"amount_less_than_10": true},
    {"amount_less_than_10": true},
    {"amount_less_than_10": true},
    {"amount_less_than_10": false},
    {"amount_less_than_10": true},
    {"amount_less_than_10": false},
    {"amount_less_than_10": false},
    {"amount_less_than_10": false},
    {"amount_less_than_10": true},
    {"amount_less_than_10": false},
    {"amount_less_than_10": false},
    {"amount_less_than_10": false},
    {"amount_less_than_10": true},
    {"amount_less_than_10": true},
    {"amount_less_than_10": false},
    {"amount_less_than_10": false},
    {"amount_less_than_10": true}
]

Delete a calculation

To delete a calculation use the format datasets/[dataset ID]/calculations/[name] or datasets/[dataset ID]/calculations?name=[name]. For example,

curl -X DELETE http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/calculations/amount_less_than_10

returns:

{
    "success": "deleted calculation: 'amount_less_than_10'",
    "id": "8a3d74711475d8a51c84484fe73f24bd151242ea"
}

Store aggregation formula

curl -X POST -d "name=sum_of_amount&formula=sum(amount)" http://bamboo.io/calculations/8a3d74711475d8a51c84484fe73f24bd151242ea

returns:

{
    "formula": "sum(amount)",
    "group": null,
    "name": "sum_of_amount"
}

Store aggregation formula with group

curl -X POST -d "name=sum_of_amount&formula=sum(amount)&group=food_type" http://bamboo.io/calculations/8a3d74711475d8a51c84484fe73f24bd151242ea

returns:

{
    "formula": "sum(amount)",
    "group": "food_type",
    "name": "sum_of_amount"
}

Store aggregation formula with multi-group

curl -X POST -d "name=sum_of_amount&formula=sum(amount)&group=food_type,rating" http://bamboo.io/calculations/8a3d74711475d8a51c84484fe73f24bd151242ea

returns:

{
    "formula": "sum(amount)",
    "group": "food_type,rating",
    "name": "sum_of_amount"
}

Retrieve lists of aggregated datasets

curl -g http://bamboo.io/datasets/8a3d74711475d8a51c84484fe73f24bd151242ea/aggregations

Returns a map of groups (including an empty group) to dataset IDs for aggregation calculations.

returns:

{
    "": "9ae0ee32b78d445588742ac818c3d533",
    "food_type": "643eaccb31e74216bfa7c16bfb0e79e5",
    "food_type,rating": "10cedc551e40418caa72495d771703b3"
}
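
A client can use this map to find the dataset that holds the results for a particular grouping. The sketch below assumes the key is the group columns joined by commas in the order they were given when the aggregation was stored, and that an ungrouped aggregation uses the empty key; the helper name linked_dataset_url is hypothetical.

```python
# The map returned by the aggregations call above.
aggregations = {
    "": "9ae0ee32b78d445588742ac818c3d533",
    "food_type": "643eaccb31e74216bfa7c16bfb0e79e5",
    "food_type,rating": "10cedc551e40418caa72495d771703b3",
}


def linked_dataset_url(aggregations, *group_columns):
    """URL of the linked dataset for the given group columns (none = ungrouped)."""
    key = ",".join(group_columns)
    return "http://bamboo.io/datasets/" + aggregations[key]


print(linked_dataset_url(aggregations, "food_type", "rating"))
print(linked_dataset_url(aggregations))
```

A grouping that has no stored aggregation raises a KeyError here, which a real client would likely want to handle.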

Retrieve the linked dataset that groups on food_type and rating

curl -g http://bamboo.io/datasets/10cedc551e40418caa72495d771703b3

Linked datasets are the same as any other dataset.

returns:

[
    {
        "rating": "epic_eat",
        "food_type": "deserts",
        "sum_of_amount": 8.75
    },
    {
        "rating": "delectible",
        "food_type": "dinner",
        "sum_of_amount": 26.0
    },
    {
        "rating": "epic_eat",
        "food_type": "lunch",
        "sum_of_amount": 22.25
    },
    {
        "rating": "delectible",
        "food_type": "street_meat",
        "sum_of_amount": 2.0
    },
    {
        "rating": "epic_eat",
        "food_type": "caffeination",
        "sum_of_amount": 2.5
    },
    {
        "rating": "epic_eat",
        "food_type": "dinner",
        "sum_of_amount": 1612.0
    },
    {
        "rating": "delectible",
        "food_type": "drunk_food",
        "sum_of_amount": 20.0
    },
    {
        "rating": "epic_eat",
        "food_type": "libations",
        "sum_of_amount": 9.5
    },
    {
        "rating": "delectible",
        "food_type": "lunch",
        "sum_of_amount": 249.0
    },
    {
        "rating": "delectible",
        "food_type": "morning_food",
        "sum_of_amount": 12.0
    },
    {
        "rating": "epic_eat",
        "food_type": "morning_food",
        "sum_of_amount": 28.0
    },
    {
        "rating": "delectible",
        "food_type": "streat_sweets",
        "sum_of_amount": 4.0
    }
]

Check the bamboo version

curl http://bamboo.io/version