<- back to projects

ETFtracker

ETF holdings pipeline and analytics

PostgreSQLTimescaleDBPythonAlpaca API

// overview

A data engineering pipeline that automates ingestion and analysis of daily holdings data for 5+ ARK ETFs. Features a TimescaleDB-backed time-series database optimized for high-frequency financial data queries.

// architecture

A Python ETL pipeline integrates with the Alpaca API to fetch daily fund holdings. Data flows through extraction, transformation, and loading stages into a PostgreSQL database extended with TimescaleDB for time-series optimization. Complex SQL window functions power the analytics layer.

// challenges

  • ·Designing a time-series schema that could handle high-frequency daily updates while maintaining query performance for historical trend analysis
  • ·Building reliable data pipelines with proper error handling for API rate limits and intermittent network failures
  • ·Writing complex SQL aggregations and window functions to accurately track allocation changes across multiple funds over time

// learnings

  • ·TimescaleDB's hypertable abstraction dramatically simplifies time-series data management compared to manual partitioning
  • ·ETL pipelines need idempotency built in from day one — reprocessing data should produce identical results
  • ·Window functions in SQL are incredibly powerful for financial analysis but require careful attention to frame specifications