Project Summary
Overview
JobsIntelligence is an automated data pipeline that collects, processes, and stores job market data from Austrian job portals. It provides Interconnection Consulting with structured, queryable intelligence on the Austrian job market — positions, companies, locations, employment types, and salary indicators — updated on a regular schedule without manual intervention.
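To illustrate what "structured, queryable" can mean in practice, the following Python sketch maps a raw scraped listing onto a typed record and turns a free-text salary line into a numeric range. The `JobRecord` fields, the raw payload keys (`title`, `company`, `location`, `employment_type`, `salary`) and the `parse_salary` and `to_record` helpers are hypothetical names chosen for this example, not the project's actual schema. The parser assumes Austrian number formatting (dot as thousands separator, comma as decimal separator).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical structured form of one listing (illustrative field names)."""
    title: str
    company: str
    location: str
    employment_type: Optional[str]
    salary_min_eur: Optional[float]
    salary_max_eur: Optional[float]


# Austrian-style amounts: "3.500", "3.500,00" or plain "3500".
_AMOUNT = re.compile(r"\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?")


def parse_salary(text: Optional[str]) -> tuple[Optional[float], Optional[float]]:
    """Return (min, max) of all amounts found in a free-text salary line."""
    if not text:
        return None, None
    amounts = [
        float(m.replace(".", "").replace(",", "."))  # "3.500,00" -> 3500.0
        for m in _AMOUNT.findall(text)
    ]
    if not amounts:
        return None, None
    return min(amounts), max(amounts)


def to_record(raw: dict) -> JobRecord:
    """Map a raw scraped payload (hypothetical keys) onto a JobRecord."""
    salary_min, salary_max = parse_salary(raw.get("salary"))
    return JobRecord(
        title=raw["title"].strip(),
        company=raw["company"].strip(),
        location=raw["location"].strip(),
        employment_type=raw.get("employment_type"),
        salary_min_eur=salary_min,
        salary_max_eur=salary_max,
    )
```

For example, `parse_salary("€ 45.000 - 55.000 jährlich")` yields `(45000.0, 55000.0)`. Because the regex picks up every number in the string, salary text that also mentions other figures (such as weekly hours) would need a stricter pattern.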
Latest Updates
March 2026
- Created comprehensive technical documentation hosted as a browsable website
- Sped up scraping with asynchronous requests: multiple job listings are now fetched concurrently instead of one at a time
- Built a deduplication system that eliminates redundant records, keeping the database clean and storage costs low
- Completed the core pipeline: data is now collected, processed, and stored automatically, end to end
- Implemented multi-stage enrichment: raw scraped data is progressively structured into queryable fields (company, location, employment type, salary)
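The async scraping mentioned above comes down to running many requests concurrently while capping how many are in flight, so the job portal is not flooded. Below is a minimal Python sketch using only `asyncio`. The `fetch` callable is injected and is an assumption here (in a real scraper it would wrap an async HTTP client such as aiohttp or httpx), `fake_fetch` only simulates network latency, and `max_concurrency=10` is an arbitrary illustrative default.

```python
import asyncio
from typing import Awaitable, Callable


async def fetch_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 10,
) -> dict[str, str]:
    """Fetch many URLs concurrently, never running more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> tuple[str, str]:
        async with semaphore:  # wait for a free slot before starting the request
            return url, await fetch(url)

    # gather runs all coroutines concurrently and returns results in input order
    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(results)


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request (hypothetical)."""
    await asyncio.sleep(0.1)  # simulated network latency
    return f"<html>{url}</html>"


if __name__ == "__main__":
    urls = [f"https://example.com/job/{i}" for i in range(20)]
    pages = asyncio.run(fetch_all(urls, fake_fetch, max_concurrency=10))
    print(len(pages))  # 20
```

With 20 URLs, 0.1 s of simulated latency and a limit of 10, the demo finishes in roughly 0.2 s rather than the 2 s a sequential loop would take. Bounding concurrency with a semaphore, rather than firing every request at once, keeps the request rate polite and predictable.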
Implementation Status
| Stage | Status | Description |
|---|---|---|
| 1a — Real-time scraping | 🔴 Not built | Incremental scrape every 30 min. Stops once already-cached URLs are detected. |
| 1b — Full refresh | 🟠 Needs improvements | Full scrape every Monday. Deduplicates via `INSERT IGNORE`. |
| 2 — Payload Sync | 🟠 In progress | Imports raw scrape data into jobs, companies, locations. |
| 3 — Detail Enrichment | 🟠 In progress | Scrapes full job detail pages. Writes descriptions, salary, education. |
| 4 — Additional Info | 🔴 Not built | LinkedIn data, company firmographics, salary benchmarks. |
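Stages 1a and 1b rely on two simple mechanisms: stopping an incremental scrape once it reaches listings that are already stored, and letting the database reject duplicate URLs. The sketch below shows both in Python with the standard-library `sqlite3` module, using SQLite's `INSERT OR IGNORE` as a stand-in for the MySQL-style `INSERT IGNORE` named in the table. It assumes that result pages are ordered newest first, that a listing's URL uniquely identifies it, and that the "cache" is simply the stored table; the `raw_jobs` table and the `init_db`, `store` and `incremental_scrape` functions are hypothetical.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical page fetcher: page number -> list of (url, raw_payload) pairs, newest first.
FetchPage = Callable[[int], list[tuple[str, str]]]


def init_db(conn: sqlite3.Connection) -> None:
    # The PRIMARY KEY on url is what lets INSERT OR IGNORE drop duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_jobs (url TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )


def store(conn: sqlite3.Connection, listings: Iterable[tuple[str, str]]) -> int:
    """Insert (url, payload) rows, skipping URLs already present; return rows inserted."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO raw_jobs (url, payload) VALUES (?, ?)", listings
    )
    conn.commit()
    return conn.total_changes - before  # ignored rows are not counted as changes


def incremental_scrape(conn: sqlite3.Connection, fetch_page: FetchPage, max_pages: int = 50) -> int:
    """Walk result pages until one contains an already-stored URL; return rows inserted."""
    inserted = 0
    for page in range(1, max_pages + 1):
        listings = fetch_page(page)
        if not listings:  # ran out of results
            break
        placeholders = ",".join("?" * len(listings))  # e.g. "?,?,?"
        known = conn.execute(
            f"SELECT 1 FROM raw_jobs WHERE url IN ({placeholders}) LIMIT 1",
            [url for url, _ in listings],
        ).fetchone()
        inserted += store(conn, listings)  # new URLs on this page are still saved
        if known:  # overlap with a previous run: older pages are already stored
            break
    return inserted
```

Because deduplication is enforced by the primary key, a weekly full refresh can re-insert every listing without creating duplicates, while the incremental run only needs to detect the first overlap to know it can stop.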
Planned Updates
- Real-time data refresh — update listings every 30 minutes rather than relying solely on the weekly full refresh
- Job detail enrichment — capture full job descriptions and additional metadata per listing
- Automated scheduling — full pipeline orchestration via Apache Airflow with monitoring and alerting
- Slovak market expansion — extend the pipeline to cover Slovak job boards alongside Austrian data
- Analytics and reporting layer — market intelligence reports and trend analysis on top of collected data