ETFtracker

ETF holdings pipeline and analytics

PostgreSQLTimescaleDBPythonAlpaca API

// overview

A data engineering pipeline that automates ingestion and analysis of daily holdings data for 5+ ARK ETFs. Features a TimescaleDB-backed time-series database optimized for high-frequency financial data queries.

// architecture

A Python ETL pipeline integrates with the Alpaca API to fetch daily fund holdings. Data flows through extraction, transformation, and loading stages into a PostgreSQL database extended with TimescaleDB for time-series optimization. Complex SQL window functions power the analytics layer.

// challenges

·Designing a time-series schema that could handle high-frequency daily updates while maintaining query performance for historical trend analysis
·Building reliable data pipelines with proper error handling for API rate limits and intermittent network failures
·Writing complex SQL aggregations and window functions to accurately track allocation changes across multiple funds over time

// learnings

·TimescaleDB's hypertable abstraction dramatically simplifies time-series data management compared to manual partitioning
·ETL pipelines need idempotency built in from day one — reprocessing data should produce identical results
·Window functions in SQL are incredibly powerful for financial analysis but require careful attention to frame specifications