This project uses dbt to transform raw data in BigQuery into clean, analytics-ready tables. The dataset was generated using Faker for testing and development purposes. Key components include:
- Staging models: standardize and clean raw data
- Core marts: fact and dimension tables
- Analytics marts: pre-aggregated metrics for reporting, such as:
  - Monthly Revenue
  - Customer Lifetime Value
  - Top Products
- Power BI dashboards can be built on top of these marts
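For example, a staging model typically selects from a raw source, renames columns, and casts types. A minimal sketch, assuming a source named `raw` with an `orders` table (these names are illustrative, not taken from this project):

```sql
-- models/staging/stg_orders.sql (illustrative; source and column names are assumed)
with source as (

    select * from {{ source('raw', 'orders') }}

)

select
    cast(order_id as string)      as order_id,
    cast(customer_id as string)   as customer_id,
    cast(order_date as date)      as order_date,
    cast(order_amount as numeric) as order_amount
from source
```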
```
dbt_project/
├── models/
│   ├── staging/
│   └── marts/
│       ├── core/
│       └── analytics/
```
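To illustrate the analytics layer, a pre-aggregated mart such as `monthly_revenue` could be sketched as follows (assuming a `fct_orders` model with `order_date` and `order_amount` columns):

```sql
-- models/marts/analytics/monthly_revenue.sql (illustrative)
select
    date_trunc(order_date, month) as month,
    sum(order_amount)             as monthly_revenue
from {{ ref('fct_orders') }}
group by 1
```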
Prerequisites:

- Python (>=3.9)
- dbt-core and dbt-bigquery
- Access to BigQuery (project ID, dataset, and read/write permissions)
- Power BI Desktop (optional)
Setup:

- Clone the repository

  ```shell
  git clone <repo_url>
  cd dbt_project
  ```

- Configure the dbt profile for BigQuery by editing `~/.dbt/profiles.yml`:
  ```yaml
  my_project:
    target: dev
    outputs:
      dev:
        type: bigquery
        method: oauth  # or service-account with a keyfile; adjust for your auth setup
        project: your-gcp-project
        dataset: your_dataset
        threads: 4
        timeout_seconds: 300
  ```

- Install dependencies
  ```shell
  pip install dbt-core dbt-bigquery
  ```

- Run dbt
  ```shell
  dbt deps
  dbt seed
  dbt run
  dbt test
  ```

- Generate documentation

  ```shell
  dbt docs generate
  dbt docs serve
  ```

- Query your materialized tables in BigQuery to check metrics:
  ```sql
  SELECT DATE_TRUNC(order_date, MONTH) AS month,
         SUM(order_amount) AS monthly_revenue
  FROM `your_project.your_dataset.fct_orders`
  GROUP BY month
  ORDER BY month;
  ```

- Use sample queries on small slices or partitions to minimize BigQuery costs.
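For example, adding a date filter (and, where available, a partition filter) to the verification query above keeps the scanned bytes small; the three-month window here is an arbitrary choice:

```sql
-- Check only recent data; the WHERE clause limits the bytes BigQuery scans
SELECT DATE_TRUNC(order_date, MONTH) AS month,
       SUM(order_amount) AS monthly_revenue
FROM `your_project.your_dataset.fct_orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)
GROUP BY month
ORDER BY month;
```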
- Open Power BI Desktop → Get Data → Google BigQuery
- Sign in with your Google account or service account key
- Select your BigQuery project and dataset
- Load the analytics marts (`monthly_revenue`, `customer_lifetime_value`, `top_products`) for dashboards
- Recommended: use aggregated tables to reduce query costs
- The dataset was generated using Faker for testing purposes.
- Always filter or limit queries to reduce costs when verifying data
- Consider dbt snapshots for slowly changing dimensions
- Pre-aggregated analytics tables make dashboards faster and cheaper
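A snapshot for a slowly changing dimension could be sketched as follows, assuming a raw `customers` table with a reliable `updated_at` column (all names here are illustrative):

```sql
-- snapshots/customers_snapshot.sql (illustrative)
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ source('raw', 'customers') }}

{% endsnapshot %}
```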