<- back to projects
ETFtracker
ETF holdings pipeline and analytics
PostgreSQLTimescaleDBPythonAlpaca API
// overview
A data engineering pipeline that automates ingestion and analysis of daily holdings data for 5+ ARK ETFs. Features a TimescaleDB-backed time-series database optimized for high-frequency financial data queries.
// architecture
A Python ETL pipeline integrates with the Alpaca API to fetch daily fund holdings. Data flows through extraction, transformation, and loading stages into a PostgreSQL database extended with TimescaleDB for time-series optimization. Complex SQL window functions power the analytics layer.
// challenges
- ·Designing a time-series schema that could handle high-frequency daily updates while maintaining query performance for historical trend analysis
- ·Building reliable data pipelines with proper error handling for API rate limits and intermittent network failures
- ·Writing complex SQL aggregations and window functions to accurately track allocation changes across multiple funds over time
// learnings
- ·TimescaleDB's hypertable abstraction dramatically simplifies time-series data management compared to manual partitioning
- ·ETL pipelines need idempotency built in from day one — reprocessing data should produce identical results
- ·Window functions in SQL are incredibly powerful for financial analysis but require careful attention to frame specifications