CDC
pg-cdc
PostgreSQL CDC to Parquet
pg-cdc follows PostgreSQL replication semantics to stream change data into compacted Parquet in cloud storage, giving teams a clear path from WAL to lake-style datasets.
CLI examples
Start the CDC server
pg-cdc serve --dsn $DATABASE_URL
Attach to PostgreSQL through the replica pattern.
Write to storage
pg-cdc stream --target s3://bucket/prefix
Persist typed change batches as Parquet.
Compact output
pg-cdc compact --target s3://bucket/prefix
Keep downstream partitions query-friendly.
What it does
A focused workflow, not a generic platform pitch
Each product page follows the same template so the navigation scales while the content stays readable.
Reads PostgreSQL changes through the native replication pattern.
Writes typed, compacted Parquet files for downstream analytics.
Keeps CDC understandable enough for platform and data teams to operate together.
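To make "typed" output concrete, here is a minimal sketch of how PostgreSQL column types might map onto Parquet physical and logical types. The `PG_TO_PARQUET` table and the `parquet_type_for` helper are hypothetical names used for illustration; the page does not document pg-cdc's actual type handling, which may differ.

```python
from typing import Optional, Tuple

# Hypothetical mapping from PostgreSQL type names to
# (Parquet physical type, Parquet logical type or None).
# This is an assumption for illustration, not pg-cdc's documented behavior.
PG_TO_PARQUET = {
    "bool": ("BOOLEAN", None),
    "int4": ("INT32", None),
    "int8": ("INT64", None),
    "float4": ("FLOAT", None),
    "float8": ("DOUBLE", None),
    "text": ("BYTE_ARRAY", "STRING"),
    "varchar": ("BYTE_ARRAY", "STRING"),
    "bytea": ("BYTE_ARRAY", None),
    "date": ("INT32", "DATE"),
    "timestamptz": ("INT64", "TIMESTAMP"),
    "uuid": ("FIXED_LEN_BYTE_ARRAY", "UUID"),
}


def parquet_type_for(pg_type: str) -> Tuple[str, Optional[str]]:
    """Look up a PostgreSQL type name; unknown types fall back to a string column."""
    return PG_TO_PARQUET.get(pg_type, ("BYTE_ARRAY", "STRING"))
```

Falling back to a string column for unmapped types is one possible design choice: it keeps every change readable downstream at the cost of losing type information for those columns.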
Architecture diagram
pg-cdc in the stack
Dark, monospace, and direct. The point is legibility, not decoration.
PostgreSQL WAL
Native logical replication source
CDC Server
Streams, types, and batches change events
Cloud Storage
Compacted Parquet landing zone
Warehouse + AI
Feeds analytics, feature pipelines, and agent context
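The "streams, types, and batches" step in the middle of the stack can be pictured with a short sketch. It assumes change events arrive in WAL order and are grouped per table until a row limit is reached. The `ChangeEvent` class and `batch_by_table` function are hypothetical and are not taken from the pg-cdc codebase.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One decoded row change (hypothetical shape, for illustration only)."""
    lsn: int            # WAL position of the change
    table: str          # source table name
    op: str             # "insert", "update", or "delete"
    row: Dict[str, Any]


def batch_by_table(
    events: Iterable[ChangeEvent], max_rows: int
) -> Iterator[Tuple[str, List[ChangeEvent]]]:
    """Yield (table, batch) pairs, flushing a table's batch once it holds max_rows events."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    pending: Dict[str, List[ChangeEvent]] = defaultdict(list)
    for event in events:
        bucket = pending[event.table]
        bucket.append(event)
        if len(bucket) >= max_rows:
            yield event.table, bucket
            pending[event.table] = []
    # Flush whatever is left once the input is exhausted.
    for table, bucket in pending.items():
        if bucket:
            yield table, bucket
```

In a real pipeline each yielded batch would become one Parquet file write. A time-based flush would normally sit alongside the row limit so that quiet tables are not held back indefinitely.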
Key features
What makes this product useful in practice
The feature list stays product-specific while reusing the same card language across the site.
Native Postgres pattern
Use replication semantics teams already understand instead of an opaque connector stack.
Typed Parquet output
Land changes in a portable format that works for analytics and ML workflows.
Compaction built in
Keep downstream files usable as change volume grows.
Pairs with pg-warehouse
Use CDC as the ingestion layer and pg-warehouse as the local-first transform path.
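As a rough illustration of what compaction planning can look like, the sketch below groups small files into merge groups that stay under a target size. It is a simplified greedy approach under stated assumptions: file names sort in write order, and sizes are known up front. The `plan_compaction` function is hypothetical and does not describe how `pg-cdc compact` is implemented.

```python
from typing import List, Tuple


def plan_compaction(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Greedily group small files so each group's total size stays within target_bytes.

    files is a list of (name, size_in_bytes). Files already at or above the
    target are left alone, and groups of a single file are skipped because
    rewriting one file gains nothing.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(files):  # assumes names sort in write order
        if size >= target_bytes:
            continue  # already large enough
        if current and current_size + size > target_bytes:
            if len(current) > 1:
                groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if len(current) > 1:
        groups.append(current)
    return groups
```

Each returned group would be read, merged, and rewritten as a single larger Parquet file, which keeps the file count per partition low as change volume grows.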
Use cases
Where teams get leverage
Simple, concrete use cases are more credible than broad category claims.
Postgres-to-lake ingestion
Create a reliable landing zone for operational change data.
Analytics foundations
Feed DuckDB, warehouse, or BI workflows from the same change stream.
Agent context capture
Expose database changes as structured evidence for AI-assisted operations.
GitHub plus consulting
A clean split between GitHub evaluation and consulting rollout
Each repo should be useful on its own, with Burnside consulting available when teams want help turning it into a production workflow.
Open source
Core CDC server
PostgreSQL replication workflow
Parquet output path
Consulting
Storage layout and retention design
Cloud deployment and security review
Data pipeline adoption support
Call to action
Start building with pg-cdc
Review the repo, then bring Burnside in when you want help applying it to a real PostgreSQL environment.