Skip to main content

Fact Tables

With Fact Tables, you write SQL once for an event (e.g. SELECT * FROM orders) and from that you can quickly and easily create a bunch of related metrics - for example, Revenue per User, Average Order Value, and Items per Order.

These metrics can be used as Goals and Guardrails in Experiments.

New

Fact Tables are a brand new feature of GrowthBook. If you experience any issues or have questions, check out our Slack Channel or file an issue on GitHub.

Fact Table SQL

First, you have to write SQL that transforms your event data into a format that GrowthBook understands. The following column names are required:

  • timestamp - The date the event happened
  • One column per supported identifier type (e.g. user_id and anonymous_id). The possible identifier types depend on your data source settings.

Any other columns you return can be used to create metrics or used as a filter.

Here's a full example:

SELECT
-- Required columns
purchase_date as timestamp,
user_id,
anonymous_id,
-- Additional columns (depends on your event)
amount,
coupon_code,
device_type
FROM
orders
WHERE
status = 'completed'

Columns

When you create a Fact Table, we query the first few rows and inspect the returned data to determine which columns you've selected and what data types they are.

This process isn't perfect. For example, if the first few rows happen to have null as the value of a column, we can't tell if it's supposed to be a number, string, etc.. When this happens, we will show you a warning and you can manually specify the type.

Any column can be used to create filters, but only numeric columns can be used as a metric value.

Number Formats

For numeric columns, you can also specify a number format, which controls how the metric is displayed on the front-end. Possible values are currency and time:seconds.

note

The default display currency is USD, but you can change it under Settings > General > Metric Settings

Filters

Filters are reusable SQL snippets to filter the rows in the Fact Table. These let you easily create metric variants. For example, Purchase, Mobile Purchase, and Desktop Purchase

Here's an example Filter SQL:

device_type = 'mobile'

If a metric uses this Filter, GrowthBook will add it to the WHERE clause automatically. If there are multiple Filters, they will be ANDed together.

Metrics

There are 3 types of supported Fact Table metrics today - Proportion, Mean, and Ratio.

Proportion Metrics

Proportion Metrics measure the percent of users in an experiment who exist in a Fact Table. The number of rows the user has doesn't matter - we just care whether or not they have at least 1 row.

For example, if you have an Orders Fact Table, you can create a Proportion Metric to see what percent of users purchased something. Whether a user made 1 purchase or 10 purchases, it will just count as 1 conversion.

You can optionally add Filters to limit which rows are considered. For example, adding a Mobile filter, you can see what percent of users completed a purchase on a mobile device.

Mean Metrics

Mean Metrics are useful when there are more than 2 possible states a user can be in. For example, instead of just purchase/not purchase, you might care about the average number of orders a user makes, or the average revenue you earned per user in an experiment.

For the metric value, there are 2 types of aggregations you can do - COUNT(*) and SUM(column). For SUM, you will be able to choose any numeric column in your Fact Table.

For example, an Orders per User metric would use COUNT(*) since you want to count the total number of rows. A Revenue per User metric would use SUM and select the column where the order total is stored.

The denominator for Mean Metrics is always the number of users in an experiment variation. So, even though you may be doing SUM(revenue) as the aggregation, it will be divided by number of users and you will actually end up with an average value per user, not an overall sum for the variation.

Ratio Metrics

Ratio Metrics let you pick a custom denominator and allow for metrics such as Average Order Value (denominator is number of orders) or Session Duration (denominator is number of sessions).

For both the numerator and denominator, you can select a Fact Table, an aggregation, and optional filters. GrowthBook allows the numerator and denominator to have different Fact Tables, which can open up some really advanced use cases.

The types of aggregations supported for ratio metrics are:

  • COUNT(*) - Number of rows in the Fact Table
  • COUNT(DISTINCT 'User Identifier') - Sample size of the experiment variation
  • SUM(column) - Total of a numeric column in the Fact Table

Under the hood, GrowthBook uses the Delta Method to accurately determine the variance.

For example, to create a Session Duration metric, you would do the following:

  • Numerator: SUM(duration) from a Sessions fact table
  • Denominator: COUNT(*) from a Sessions fact table

Conversion Windows

When used in an experiment, we only consider rows in a Fact Table where the timestamp is greater than or equal to the first time the user was exposed to the experiment. In other words, if someone purchases something before seeing your experiment, it won't be included in the analysis. This behavior is ideal for the vast majority of metrics, but you can change it with the Conversion Delay setting if desired (see Advanced Settings below).

The first exposure date sets the lower bound on the rows that are included. There are 2 options for setting the upper bound:

  • No conversion window (default) - use the experiment end date as the upper bound
  • Specific conversion window (e.g. 3 days) - only include rows within the specified time range of the first exposure event

Advanced Settings

What is the Goal?

For the vast majority of metrics, the goal is to increase the value. But for some metrics like "Bounce Rate" and "Page Load Time", lower is actually better.

Setting this to "decrease" basically inverts the "Chance to Beat Control" value in experiment results so that "beating" the control means decreasing the value. This will also reverse the red and green coloring on graphs.

Capped Value

Large outliers can have an outsized effect on experiment results. For example, if your normal revenue per user $40 and someone happens to make a $5000 order, whatever variation that person is in will be much more likely "win" any experiment because that one order is an outlier.

Capping (also known as winsorization) works by ensuring that all aggregate unit (e.g. user) values are no more than some value. So in the above example, if the cap was $100, the $5000 purchase will still be counted, but the aggregated value for that user will be capped at $100 and will have a much smaller effect on the results. It will still give a boost to whatever variation the person is in, but it won't completely dominate all of the other orders and is unlikely to make a winner just on its own. Another way to think about this is that you are slightly biasing your results by truncating large values, but you are reducing variance to prevent the outsized effect of outliers.

There are two ways to cap metric values in GrowthBook:

1. Absolute capping - if set above zero, all aggregated user values will be capped at exactly this value. For example, if the cap is $100 on total revenue per user, then after we sum all of a users orders up, any user with an aggregate sum of greater than $100 will be set to $100.

2. Percentile capping - when this is set to between 0 and 1, it uses that percentile to select a cap based on the data in your experiment so far. This cap is therefore specific to each experiment and specific to each analysis run in that experiment if new data has come in. It works like so: after we calculate the unit-level aggregate values for all units (e.g. users) during an experiment analysis, we find the specified percentile of these unit-level aggregates and then cap these aggregated values at this percentile. Using the above example, if you were to specify percentile capping with a value of 0.95, then we find the 95th percentile of total revenue per users (say this turns out to be $135). We then cap those user-level aggregates at $135.

Because the percentile cap depends on the data in your experiment, it can be different from experiment to experiment, or even analysis to analysis. To find out what value was actually used for capping you can do the following: on the Experiment Results tab, click the three dot menu in the top right and select "View Queries". Each percentile capped metric will have a column with the main_cap_value that was used to cap that metric and represents the computed percentile of unit-level aggregate values.

Conversion Delay

Conversions within the first X hours of being put into an experiment are ignored (default = 0). This is useful for metrics like "day 2 retention". In that case, if your underlying table reports whether a user is retained on any given day, you could set a conversion delay to 24 hours.

Negative conversion delays

The conversion delay can also be negative to include some conversions before a user is put into an experiment. For example, a value of -2 would mean conversions up to 2 hours before will be included. You might be wondering when this would ever be useful.

Imagine the average person stays on your site for 60 seconds and your experiment can trigger at any time.

If you just look at the average time spent after the experiment, the numbers will lose a lot of meaning. A value of 20 seconds might be horrible if it happened to someone after only 5 seconds on your site since they are staying a lot less time than average. But, that same 20 seconds might be great if it happened to someone after 55 seconds since their visit is a lot longer than usual. Over time, these things will average out and you can eventually see patterns, but you need an enormous amount of data to get to that point.

If you set the conversion delay to something negative, say -0.5 (30 minutes), you can reduce the amount of data you need to see patterns. For example, you may see your average go from 60 seconds to 65 seconds.

Keep in mind, these two things are answering slightly different questions. How much longer do people stay after viewing the experiment? vs How much longer is an average session that includes the experiment?. The first question is more direct and often a more strict test of your hypothesis, but it may not be worth the extra running time.

Migrating Existing Metrics to Fact Tables

Fact Tables are brand new to GrowthBook, first launching in October 2023. Eventually, we see Fact Tables completely replacing the existing way of defining metrics. Right now though, Fact Tables are still in early preview mode and there are some rough edges.

If you'd like to get an early start migrating your existing metrics, there are a few differences to be aware of:

Reusable Definitions

Metric definitions have been split up into a few different reusable pieces. It's a couple more steps to create your first metric, but it should drastically simplify adding subsequent ones and building out your metric library.

  • The SQL and supported user identifiers are defined in the Fact Table
  • The display formatting (e.g. currency vs duration) is attached to a Fact Table Column
  • Any WHERE clauses for metrics are defined as Filters (e.g. device_type = 'mobile')

Once these pieces are in place, defining Metrics is now much easier. The form went from 13+ steps down to just 4 and no longer requires any specialized SQL or database knowledge.

Metric SQL

Before, SQL for a metric would select at most 1 numeric column and it had to be named value. With Fact Tables, you can have as many numeric columns as you want and there are no naming restrictions. This allows you to have 1 complex SQL definition be re-used across many related metrics.

Also, Custom aggregations are no longer supported. Fact Tables only support a few pre-defined aggregations - COUNT and SUM (we may add more in the future). Using these pre-defined aggregations greatly simplifies the queries we run and enables advanced performance and cost optimizations.

Metric Types

Metric types have changed.

  • Binomial Metrics have been renamed to Proportion Metrics
  • Count, Duration, and Revenue are all now just Mean Metrics (the display formatting is controlled by the Fact Table columns now)
  • Instead of just adding a denominator to any metric, there is now a dedicated type for Ratio Metrics.

Ratio Metrics

There are a lot of changes to how Ratio Metrics are defined and how they behave.

Ratio metrics are now self-contained. Previously, the denominator would just be a pointer to an entirely separate metric. This allowed some weird edge cases like the denominator itself being a ratio metric with it's own denominator. With Fact Tables, you define both the numerator and denominator in one flow and these nested denominator edge cases are no longer supported.

During analysis, Ratio metrics used to behave like a Funnel - the denominator had to happen first before the numerator. This was unintuitive for many people, so With Fact Tables, we changed them to act like true ratios - we calculate the numerator and denominator independently and then divide them. We plan to add a dedicated Funnel metric type in the future to support this use case.

Lastly, Ratio metrics now have a single conversion window and capping behavior. Previously, this would be controlled separately for the numerator and denominator. Because of this change, only "percentile" capping is allowed for ratio metrics.

Attribution Model and Conversion Windows

The Attribution Model is now controlled on a per-metric basis. If you disable conversion windows for a metric, it will force the experiment to always use the Experiment Duration attribution model for that metric during analysis.

This means an experiment can now have a mix of some metrics with conversion windows and some without. There is no longer a single setting at the experiment level.