Roadmap
Planned features and improvements, ordered by priority.
Short Term
- Build real-time incremental scraper — scrape every 30 minutes, stop when cached URLs detected
- Implement stop-signal mechanism — abort Apify actor mid-scrape via webhook when all page URLs are already cached
- Delete `jobs_austria_cache_key_sync.py` — confirmed duplicate, safe to remove
- Build Airflow DAG — orchestrate all pipeline stages with proper dependencies and retry logic
- Write unit tests for synchronizer and payload processor
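The incremental scraper's stop condition can be sketched as a simple cache check: once every URL on the current results page is already cached, the run has caught up with the previous scrape. The names `CachedUrlStore` and `should_stop_run` are assumptions for illustration, not existing code.

```python
# Hypothetical sketch of the incremental-scrape stop condition.
# CachedUrlStore and should_stop_run are illustrative names, not real code.
from typing import Iterable, Set


class CachedUrlStore:
    """In-memory stand-in for the pipeline's URL cache."""

    def __init__(self, urls: Iterable[str] = ()):
        self._urls: Set[str] = set(urls)

    def contains(self, url: str) -> bool:
        return url in self._urls

    def add(self, url: str) -> None:
        self._urls.add(url)


def should_stop_run(page_urls: Iterable[str], cache: CachedUrlStore) -> bool:
    """Return True when every URL on the current page is already cached,
    i.e. the scraper has caught up with the previous run."""
    urls = list(page_urls)
    return bool(urls) and all(cache.contains(u) for u in urls)
```

When this returns `True`, the stop-signal mechanism would fire — for example, a webhook handler aborting the running actor through Apify's run-abort endpoint.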
Medium Term
- Build additional info enrichment stage — LinkedIn data, company firmographics, salary benchmarks
- Build JobsSlovakia pipeline — mirror JobsAustria architecture for Slovak job boards
- Extract shared utilities into `utils/parsing.py` — `_extract_portal()`, `_parse_date()`, `_str_or_none()`
Long Term
- Build analytics layer — trend reports and market intelligence queries on top of normalized data
- Write formal business requirements document — define what the pipeline produces and for whom
- Explore real-time dashboard for job market monitoring
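One analytics-layer query on top of the normalized data could be monthly posting counts per portal — the basis for trend reports. The field names (`portal`, `posted_at`) are assumptions about the normalized schema:

```python
# Sketch of one analytics query: monthly posting counts per portal.
# Field names "portal" and "posted_at" are assumed, not confirmed schema.
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_postings_by_portal(
    rows: Iterable[Dict],
) -> Dict[Tuple[str, str], int]:
    """Count job postings per (portal, 'YYYY-MM') bucket."""
    counts: Counter = Counter()
    for row in rows:
        posted: date = row["posted_at"]
        counts[(row["portal"], f"{posted:%Y-%m}")] += 1
    return dict(counts)
```

The same aggregation would likely move to SQL once the analytics layer sits directly on the warehouse, but the bucketing logic stays the same.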