Reddit Sentiment Tracker
A command-line tool that fetches posts and comments from any public subreddit, scores their sentiment using natural language processing, and stores everything in a local database for querying and export. Built end-to-end as an independent project to deepen my Python skills and demonstrate real-world API integration, NLP, database design, and test-driven development.
The Challenge
I wanted to build something that pulls live data from a real platform, processes it with real NLP techniques, and stores it in a queryable, exportable form: not a tutorial project, but a tool that works on real data today. Reddit's API approval was still pending weeks into development, so I also needed a way to access live data without waiting on external gatekeepers.
My Solution
Built a complete CLI pipeline: fetch pulls posts and comments from Reddit's public .json endpoints (no API keys needed), sentiment.py scores every piece of text with VADER sentiment analysis, database.py stores everything in SQLite with timestamp indexing, and query/export commands let you filter and extract the data. The tool can also run as a continuous polling loop on a configurable interval.
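To make the storage step concrete, here is a minimal sketch of a SQLite layer with a timestamp index and idempotent inserts. The table name, column names, and the open_db / save_items helpers are illustrative assumptions, not the project's actual schema or API.

```python
import sqlite3

# Hypothetical schema: one row per post or comment, keyed by Reddit's id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id          TEXT PRIMARY KEY,   -- Reddit fullname, e.g. "t3_abc123"
    subreddit   TEXT NOT NULL,
    body        TEXT,
    compound    REAL,               -- sentiment score for the text
    created_utc REAL NOT NULL       -- Unix timestamp reported by Reddit
);
CREATE INDEX IF NOT EXISTS idx_items_created ON items (created_utc);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_items(conn, rows):
    """Insert rows, skipping ids already stored; return how many were new."""
    count_sql = "SELECT COUNT(*) FROM items"
    before = conn.execute(count_sql).fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO items "
            "(id, subreddit, body, compound, created_utc) "
            "VALUES (:id, :subreddit, :body, :compound, :created_utc)",
            rows,
        )
    after = conn.execute(count_sql).fetchone()[0]
    return after - before
```

Because the primary key is Reddit's own id, saving the same batch twice leaves the table unchanged, and the index on created_utc keeps time-range queries cheap.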
Rather than wait on Reddit's API approval, I found that appending .json to any public Reddit URL returns JSON structured much like the authenticated API's responses. I built the fetcher around this approach: it requires zero credentials, throttles its own requests to stay within Reddit's rate guidance, and lets anyone clone and run the tool immediately.
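The fetch approach can be sketched roughly as follows, using only the standard library. The names RateLimiter, listing_url, and fetch_json, the User-Agent string, and the backoff policy are assumptions for illustration; the real fetcher may be organized differently.

```python
import json
import time
import urllib.error
import urllib.request


class RateLimiter:
    """Keep successive wait() calls at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # monotonic: unaffected by wall-clock changes
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def listing_url(subreddit, sort="new", limit=25):
    """Build the public .json URL for a subreddit listing."""
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"


def fetch_json(url, limiter, max_retries=3,
               user_agent="sentiment-tracker-sketch/0.1"):
    """GET a .json endpoint, retrying on HTTP 429 with a simple backoff."""
    for attempt in range(max_retries + 1):
        limiter.wait()
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Honor Retry-After when it is a plain number of seconds;
            # otherwise fall back to exponential backoff (an assumed policy).
            retry_after = err.headers.get("Retry-After", "")
            delay = float(retry_after) if retry_after.isdigit() else 2 ** attempt
            time.sleep(delay)
```

Passing the clock and sleep functions into the limiter is a design choice that lets tests drive it with a fake clock instead of actually waiting.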
- Designed a modular architecture with five independent, testable modules following separation of concerns
- Implemented VADER sentiment scoring with edge-case handling (None, empty strings, truncation for long text)
- Built a rate limiter using time.monotonic() to respect Reddit's ~1 req/sec guideline, with automatic 429 retry
- Wrote recursive comment tree flattening to handle Reddit's deeply nested reply structure
- Created an idempotent storage layer (INSERT OR IGNORE) so re-fetching never duplicates data
- Separated runtime and development dependencies (requirements.txt vs requirements-dev.txt)
- Achieved 162 passing tests across all modules (~0.2s total runtime) using pytest
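The recursive comment flattening mentioned above can be sketched like this. It assumes Reddit's usual Listing shape, where each comment has kind "t1", nested replies are themselves a Listing, and a comment with no replies carries an empty string; the function name flatten_comments is hypothetical.

```python
def flatten_comments(listing, depth=0):
    """Yield (depth, comment_data) for every comment in a Reddit Listing."""
    if not isinstance(listing, dict):
        # Leaf comments report replies as "" rather than a Listing.
        return
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            # Skip non-comment entries such as "more" placeholders.
            continue
        data = child["data"]
        yield depth, data
        # Recurse into this comment's replies, one level deeper.
        yield from flatten_comments(data.get("replies"), depth + 1)
```

Comments come out in depth-first order, so each reply follows its parent, and the depth value can be stored alongside the text if thread structure matters later.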
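The sentiment edge cases listed above (None, empty strings, long text) might be handled with a small guard like the one below. This is a sketch under assumptions: the 5000-character cap, the neutral 0.0 fallback, and the safe_compound name are illustrative, and the scorer is passed in as a plain function so the guard can be tested without the NLP library installed.

```python
MAX_CHARS = 5000  # hypothetical truncation limit, not the project's value


def safe_compound(text, score_fn, max_chars=MAX_CHARS):
    """Return a sentiment score, treating missing or blank text as neutral."""
    if text is None:
        return 0.0
    text = text.strip()
    if not text:
        return 0.0
    # Truncate very long bodies before scoring to bound the work per item.
    return score_fn(text[:max_chars])
```

With the vaderSentiment package, score_fn could be something like lambda t: SentimentIntensityAnalyzer().polarity_scores(t)["compound"], ideally reusing a single analyzer instance across calls.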
162 tests passing and live-verified on real Reddit data: a fully functional CLI tool with zero external credentials required.