Roadmap

Planned features and improvements, ordered by priority.


Short Term

  • Build real-time incremental scraper — scrape every 30 minutes, stop when cached URLs detected
  • Implement stop-signal mechanism — abort Apify actor mid-scrape via webhook when all page URLs are already cached
  • Delete jobs_austria_cache_key_sync.py — confirmed duplicate, safe to remove
  • Build Airflow DAG — orchestrate all pipeline stages with proper dependencies and retry logic
  • Write unit tests for synchronizer and payload processor
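The stop condition shared by the first two items (stop once every URL on a page is already cached) could be sketched roughly as follows. This is only an illustration under assumptions: the names `should_stop` and `scrape_incrementally` are hypothetical, pages are modelled as plain lists of URLs, and the Apify actor and webhook wiring are left out entirely.

```python
from typing import Iterable, List, Set


def should_stop(page_urls: Iterable[str], cached_urls: Set[str]) -> bool:
    """Return True when every URL on the page is already cached (hypothetical helper)."""
    urls = list(page_urls)
    # An empty page carries no evidence either way, so it never triggers a stop.
    return bool(urls) and all(url in cached_urls for url in urls)


def scrape_incrementally(pages: Iterable[List[str]], cached_urls: Set[str]) -> List[str]:
    """Collect uncached URLs page by page until a fully cached page is seen."""
    new_urls: List[str] = []
    for page_urls in pages:
        if should_stop(page_urls, cached_urls):
            # In the real pipeline this is where the stop signal would be sent.
            break
        new_urls.extend(url for url in page_urls if url not in cached_urls)
    return new_urls
```

Checking the whole page rather than a single URL is one possible design choice: a page that mixes new and cached listings is still processed, and only a fully cached page ends the run.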

Medium Term

  • Build additional info enrichment stage — LinkedIn data, company firmographics, salary benchmarks
  • Build JobsSlovakia pipeline — mirror JobsAustria architecture for Slovak job boards
  • Extract shared utilities into utils/parsing.py: _extract_portal(), _parse_date(), _str_or_none()

Long Term

  • Build analytics layer — trend reports and market intelligence queries on top of normalized data
  • Write formal business requirements document — define what the pipeline produces and for whom
  • Explore real-time dashboard for job market monitoring