
pg-cdc

PostgreSQL CDC to Parquet

pg-cdc uses PostgreSQL logical replication to stream change data from the WAL into compacted Parquet files in cloud storage, giving teams a direct path from WAL to lake-style datasets.

Get Started View GitHub

CLI examples

Start the CDC server

pg-cdc serve --dsn $DATABASE_URL

Attach to PostgreSQL through its native logical replication pattern.

Write to storage

pg-cdc stream --target s3://bucket/prefix

Persist typed change batches as Parquet.

Compact output

pg-cdc compact --target s3://bucket/prefix

Keep downstream partitions query-friendly.
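As a rough illustration of how the three commands fit together, the Python sketch below builds them as argument lists. It uses only the subcommands and flags shown above. The helper name `build_commands`, the fallback DSN, and the order of invocation are assumptions: this page does not say whether `serve` and `stream` run side by side or one after the other.

```python
import os
import shlex


def build_commands(dsn: str, target: str) -> list[list[str]]:
    """Return the three pg-cdc invocations from this page as argv lists.

    Only the subcommands and flags shown on the page are used; nothing
    here is executed, so the sketch is safe to run without pg-cdc installed.
    """
    return [
        ["pg-cdc", "serve", "--dsn", dsn],          # start the CDC server
        ["pg-cdc", "stream", "--target", target],   # write Parquet to storage
        ["pg-cdc", "compact", "--target", target],  # compact the output
    ]


# Hypothetical fallback DSN, used only when DATABASE_URL is not set.
dsn = os.environ.get("DATABASE_URL", "postgres://localhost/postgres")

for argv in build_commands(dsn, "s3://bucket/prefix"):
    # Print a shell-safe rendering of each command instead of running it.
    print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string means it can be handed to `subprocess.run` later without shell quoting problems.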

What it does

A focused workflow, not a generic platform pitch


Reads PostgreSQL changes through the native replication pattern.

Writes typed, compacted Parquet files for downstream analytics.

Keeps CDC understandable enough for platform and data teams to operate together.

Architecture diagram

pg-cdc in the stack


01 -> PostgreSQL WAL: Native logical replication source

02 -> CDC Server: Streams, types, and batches change events

03 -> Cloud Storage: Compacted Parquet landing zone

Warehouse + AI: Feeds analytics, feature pipelines, and agent context
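To make the CDC Server stage more concrete, here is a minimal Python sketch of grouping change events into per-table batches before they would be written out. It is not pg-cdc's implementation. The `ChangeEvent` shape, the `batch_by_table` helper, and the `max_rows` flush threshold are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of one decoded change from the WAL."""
    lsn: int              # position in the WAL, increasing over time
    table: str            # source table name
    op: str               # "insert", "update", or "delete"
    row: dict[str, Any]   # column values carried by the change


def batch_by_table(
    events: Iterable[ChangeEvent], max_rows: int
) -> list[tuple[str, list[ChangeEvent]]]:
    """Group events per table, flushing a batch once it holds max_rows events."""
    pending: dict[str, list[ChangeEvent]] = defaultdict(list)
    batches: list[tuple[str, list[ChangeEvent]]] = []
    for event in events:
        buffer = pending[event.table]
        buffer.append(event)
        if len(buffer) >= max_rows:
            batches.append((event.table, list(buffer)))
            buffer.clear()
    # Flush whatever is left, in the order tables were first seen.
    for table, buffer in pending.items():
        if buffer:
            batches.append((table, list(buffer)))
    return batches


events = [
    ChangeEvent(1, "users", "insert", {"id": 1}),
    ChangeEvent(2, "orders", "insert", {"id": 7}),
    ChangeEvent(3, "users", "update", {"id": 1}),
    ChangeEvent(4, "users", "delete", {"id": 1}),
]
for table, batch in batch_by_table(events, max_rows=2):
    print(table, [e.lsn for e in batch])
```

In a real pipeline each batch would presumably be written as one Parquet file with a schema derived from the table. That step is left out here because this page does not describe the schema mapping.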

Key features

What makes this product useful in practice


Native Postgres pattern

Use replication semantics teams already understand instead of an opaque connector stack.

Typed Parquet output

Land changes in a portable format that works for analytics and ML workflows.

Compaction built in

Keep downstream files usable as change volume grows.

Pairs with pg-warehouse

Use CDC as the ingestion layer and pg-warehouse as the local-first transform path.
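The "Compaction built in" feature can be pictured as merging many small Parquet files into fewer, larger ones. The Python sketch below shows one possible planning step under that reading: a greedy grouping of files up to a target size. The function name `plan_compaction`, the file names, and the size threshold are assumptions for illustration, and pg-cdc's actual compaction strategy may differ.

```python
def plan_compaction(file_sizes: dict[str, int], target_bytes: int) -> list[list[str]]:
    """Greedily group files, in name order, into groups of at most target_bytes.

    A file larger than target_bytes ends up in a group of its own.
    Each returned group would be rewritten as a single output file.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name in sorted(file_sizes):
        size = file_sizes[name]
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical file listing; sizes are in bytes and kept tiny for readability.
sizes = {
    "part-0001.parquet": 40,
    "part-0002.parquet": 50,
    "part-0003.parquet": 30,
    "part-0004.parquet": 200,
}
for group in plan_compaction(sizes, target_bytes=100):
    print(group)
```

Processing files in name order keeps changes that landed close together in the same output file, assuming file names sort by write time.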

Use cases

Where teams get leverage

Simple, concrete use cases are more credible than broad category claims.

Postgres-to-lake ingestion

Create a reliable landing zone for operational change data.

Analytics foundations

Feed DuckDB, warehouse, or BI workflows from the same change stream.

Agent context capture

Expose database changes as structured evidence for AI-assisted operations.

GitHub plus consulting

A clean split between GitHub evaluation and consulting rollout

Each repo should be useful on its own, with Burnside consulting available when teams want help turning it into a production workflow.

Open source

Core CDC server

PostgreSQL replication workflow

Parquet output path

Consulting

Storage layout and retention design

Cloud deployment and security review

Data pipeline adoption support

Call to action

Start building with pg-cdc

Review the repo, then bring Burnside in when you want help applying it to a real PostgreSQL environment.
