Data Pipelines Explained: What Happens Between Your Tools and Your Dashboard

April 1, 2026 · 6 min read

You have tools — maybe Stripe for payments, HubSpot for CRM, Google Ads for marketing, and QuickBooks for accounting. You want a dashboard that shows you everything in one place. Between those two points, something needs to happen.

That something is a data pipeline. Here is how it works, in plain language.

The Three Steps: Extract, Transform, Load

Every data pipeline does three things:

Extract — Pull data out of your source systems. This means connecting to the Stripe API and downloading transaction records, connecting to HubSpot and downloading deal data, and so on.

Transform — Clean and reshape that data so it is useful. Raw Stripe data has hundreds of fields you do not need. A transformation step filters it down to what matters: transaction date, amount, customer, product, status.

Load — Put the cleaned data somewhere you can query it. Usually this is a data warehouse (more on that in a moment).
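To make the three steps concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not how any particular tool works: the records are hard-coded instead of coming from the Stripe API, the field names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records, loosely shaped like a payments API response.
# Real payloads carry many more fields; these names are illustrative only.
RAW_PAYMENTS = [
    {"id": "txn_1", "created": "2026-03-01", "amount_cents": 4900,
     "customer": "acme", "product": "pro", "status": "succeeded",
     "internal_note": "not needed downstream"},
    {"id": "txn_2", "created": "2026-03-02", "amount_cents": 1900,
     "customer": "globex", "product": "basic", "status": "failed",
     "internal_note": "not needed downstream"},
]


def extract():
    """Extract: stand-in for an API call that returns raw records."""
    return list(RAW_PAYMENTS)


def transform(records):
    """Transform: keep only the useful fields and convert cents to dollars."""
    return [
        (r["created"], r["amount_cents"] / 100, r["customer"],
         r["product"], r["status"])
        for r in records
    ]


def load(conn, rows):
    """Load: write the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "transaction_date TEXT, amount REAL, customer TEXT, "
        "product TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order."""
    load(conn, transform(extract()))


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
run_etl(warehouse)
```

Each step is a separate function on purpose: in a real pipeline the extract step is usually the part a connector replaces, while the transform step is where your business rules live.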

ETL vs. ELT: The Order Matters

Historically, companies would Extract data, Transform it, then Load it into the destination. This is ETL — Extract, Transform, Load.

The modern approach flips the last two steps: Extract, Load, then Transform. This is ELT. You pull the raw data into the warehouse first, then clean it up inside the warehouse.

Why does this matter to you? For most companies under $50M in revenue, ELT is cheaper, faster to set up, and more flexible. You do not need to build complex transformation pipelines before you can see your data: load it raw, then model it as needed.
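The ELT order can be sketched the same way. In this hedged example the raw rows land in the warehouse untouched, and the cleanup happens afterwards as SQL that runs inside the warehouse. The table and column names are hypothetical, and in-memory SQLite again stands in for BigQuery or Snowflake.

```python
import sqlite3

# Hypothetical raw rows as an ingestion tool might land them:
# every column kept, nothing filtered or cleaned.
RAW_ROWS = [
    ("txn_1", "2026-03-01", 4900, "acme", "pro", "succeeded", "note"),
    ("txn_2", "2026-03-02", 1900, "globex", "basic", "failed", "note"),
    ("txn_3", "2026-03-02", 9900, "initech", "pro", "succeeded", "note"),
]


def load_raw(conn):
    """Extract + Load: copy the source rows into a raw table as-is."""
    conn.execute(
        "CREATE TABLE raw_payments (id TEXT, created TEXT, "
        "amount_cents INTEGER, customer TEXT, product TEXT, "
        "status TEXT, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_payments VALUES (?, ?, ?, ?, ?, ?, ?)", RAW_ROWS
    )


def transform_in_warehouse(conn):
    """Transform: model the raw table with SQL after it has been loaded."""
    conn.execute(
        """
        CREATE VIEW successful_payments AS
        SELECT created AS transaction_date,
               amount_cents / 100.0 AS amount,
               customer,
               product
        FROM raw_payments
        WHERE status = 'succeeded'
        """
    )


warehouse = sqlite3.connect(":memory:")
load_raw(warehouse)
transform_in_warehouse(warehouse)

total = warehouse.execute(
    "SELECT SUM(amount) FROM successful_payments"
).fetchone()[0]
print(total)  # 148.0
```

Because the raw table is still there, changing a definition later (say, counting refunded payments differently) means rewriting one view, not re-extracting the data.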

The Tools: What Does What

Ingestion tools (the Extract and Load part):

Fivetran and Airbyte are the two most common. Both provide connectors: pre-built integrations that know how to pull data from hundreds of SaaS tools and load it into a warehouse.

Fivetran is managed and reliable. You configure a connector, point it at your warehouse, and it syncs automatically. It costs roughly $1-$2 per million rows synced. For most small-to-mid-sized companies, the bill is $300-$1,000/month.

Airbyte is open-source and can be self-hosted or run as a managed cloud service. It is cheaper and more flexible, but requires more technical management.

The warehouse (where data lives):

BigQuery (Google), Snowflake, and Redshift (AWS) are the three major options. Think of the warehouse as a giant, fast, queryable database that holds all your data in one place.

For companies under $50M in revenue, BigQuery is often the best starting point. The free tier is generous, scaling is automatic, and on-demand pricing charges for the data each query scans, which keeps costs low at smaller volumes. Most companies in this range spend $50-$500/month on warehousing.

Transformation tools (the modeling layer):

dbt (data build tool) is the industry standard. It sits on top of your warehouse and lets you write SQL-based transformations that turn raw data into clean, business-ready tables.

For example, dbt takes your raw Stripe transactions, your raw HubSpot deals, and your raw QuickBooks entries, and creates a unified "revenue" table that combines all three with consistent definitions.
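As a rough sketch of what such a model does, the example below builds a unified revenue view from three raw tables. It is not dbt itself: plain SQL run through Python's sqlite3 module stands in for a dbt model and the warehouse, and every table and column name is a hypothetical placeholder, not the schema a real connector produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

# Hypothetical raw tables: each source uses its own column names and units.
conn.executescript(
    """
    CREATE TABLE raw_stripe_transactions (
        created TEXT, amount_cents INTEGER, customer_email TEXT);
    CREATE TABLE raw_hubspot_deals (
        close_date TEXT, deal_amount REAL, contact_email TEXT);
    CREATE TABLE raw_quickbooks_entries (
        txn_date TEXT, total REAL, customer_email TEXT);

    INSERT INTO raw_stripe_transactions
        VALUES ('2026-03-01', 4900, 'a@example.com');
    INSERT INTO raw_hubspot_deals
        VALUES ('2026-03-02', 1200.0, 'b@example.com');
    INSERT INTO raw_quickbooks_entries
        VALUES ('2026-03-03', 300.0, 'c@example.com');
    """
)

# The "model": one SELECT per source, renamed and converted so that every
# row shares the same columns (source, revenue_date, customer_email, amount_usd).
REVENUE_MODEL = """
    CREATE VIEW revenue AS
    SELECT 'stripe' AS source,
           created AS revenue_date,
           customer_email,
           amount_cents / 100.0 AS amount_usd
    FROM raw_stripe_transactions
    UNION ALL
    SELECT 'hubspot', close_date, contact_email, deal_amount
    FROM raw_hubspot_deals
    UNION ALL
    SELECT 'quickbooks', txn_date, customer_email, total
    FROM raw_quickbooks_entries
"""
conn.execute(REVENUE_MODEL)

revenue_rows = conn.execute(
    "SELECT source, revenue_date, customer_email, amount_usd "
    "FROM revenue ORDER BY revenue_date"
).fetchall()
```

The "consistent definitions" part is the renaming and unit conversion: cents become dollars, and three different date and customer columns collapse into one name each. The source column is kept so a dashboard can still break the numbers down by system.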

dbt Cloud costs $50-$100/month for small teams. The open-source version (dbt Core) is free.

BI tools (the dashboard layer):

Looker Studio (free), Metabase (free/open-source), Tableau, or Looker connect to the warehouse and visualize the modeled data. This is where your dashboards live.

How It All Fits Together

Here is the typical flow:

1. Fivetran extracts data from Stripe, HubSpot, Google Ads, QuickBooks
2. Fivetran loads raw data into BigQuery
3. dbt transforms the raw data into clean, modeled tables
4. Looker Studio (or Metabase) reads the modeled tables and displays dashboards

The pipeline runs automatically — usually every hour or every day, depending on your needs.

What This Costs in Total

For a typical company with 4-6 data sources:

| Component | Monthly Cost |
|-----------|-------------|
| Fivetran | $300-$800 |
| BigQuery | $50-$300 |
| dbt Cloud | $50-$100 |
| Looker Studio | Free |
| Total | $400-$1,200/month |

Add a one-time setup cost of $5,000-$15,000 to build the pipeline, configure the models, and create the dashboards.
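The arithmetic behind these totals is easy to check. The short sketch below adds up the monthly ranges quoted above and combines them with the one-time setup cost for a first-year estimate. The first-year figure is a derived illustration, not a number from a vendor, and the function names are hypothetical.

```python
# (low, high) monthly cost in USD for each component, as quoted above.
MONTHLY_COSTS = {
    "Fivetran": (300, 800),
    "BigQuery": (50, 300),
    "dbt Cloud": (50, 100),
    "Looker Studio": (0, 0),
}

# One-time setup cost range in USD, as quoted above.
SETUP_COST = (5_000, 15_000)


def monthly_total(costs):
    """Sum the low ends and the high ends of every component's range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


def first_year_total(costs, setup):
    """Setup cost plus twelve months of running cost."""
    low, high = monthly_total(costs)
    return setup[0] + 12 * low, setup[1] + 12 * high


print(monthly_total(MONTHLY_COSTS))                 # (400, 1200)
print(first_year_total(MONTHLY_COSTS, SETUP_COST))  # (9800, 29400)
```

Under these figures, the first year lands between roughly $9,800 and $29,400, with the setup work accounting for about half of it at either end of the range.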

That is the entire data stack. No mystery, no magic. Just plumbing that moves your data from where it lives to where it is useful.
