Stage 3 — Detail Enrichment

Status: 🟠 In progress

Classes

| Class | File | Role |
| --- | --- | --- |
| `JobsAustriaDetailsETL` | `jobs_austria_details_scraping.py` | Main ETL: fetches pending URLs, fires Apify actors, writes results |
| `PortalRouter` | `jobs_austria_details_scraping.py` | Routes each URL to the Apify actor `run_input` for its portal |
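
The routing role of `PortalRouter` can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the project's actual code: the `ams.at` domain rule, the `startUrls` input shape, and the `extract_portal`, `group_by_portal`, and `run_input_for` names are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical mapping from domain to portal name; the real rules are not shown here.
_PORTAL_DOMAINS = {"ams.at": "ams"}


def extract_portal(url: str) -> str:
    """Return the portal name for a URL, or "unknown" if no rule matches."""
    host = (urlparse(url).hostname or "").lower()
    for domain, portal in _PORTAL_DOMAINS.items():
        if host == domain or host.endswith("." + domain):
            return portal
    return "unknown"


class PortalRouter:
    # Hypothetical run_input builders, one per supported portal.
    _PORTAL_INPUTS = {
        "ams": lambda urls: {"startUrls": [{"url": u} for u in urls]},
    }

    def group_by_portal(self, urls):
        """Group URLs by portal name, including an "unknown" bucket."""
        groups = defaultdict(list)
        for url in urls:
            groups[extract_portal(url)].append(url)
        return dict(groups)

    def run_input_for(self, portal, urls):
        """Build the actor input for a portal, or None if it is unsupported."""
        builder = self._PORTAL_INPUTS.get(portal)
        return builder(urls) if builder else None
```

In this sketch callers go through `run_input_for()` rather than reading `_PORTAL_INPUTS` directly, and an unsupported portal yields `None` so the caller can decide whether to skip or log it.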

1. Polls the `jobs` table for rows where `order_number IS NULL`; these rows have not been detail-scraped yet.
2. Routes each URL by portal via `PortalRouter`. Only AMS is currently supported; URLs from other portals are grouped as `unknown` and skipped.
3. Fires Apify detail actors in batches of 100 URLs, with at most 3 actors running concurrently.
4. Streams results back through a producer/consumer async queue.
5. Writes the enriched fields to `jobs`: `order_number`, `education`, `salary`, `employment_relationship`.
6. Inserts the full job descriptions into the `descriptions` table.
7. Repeats until the queue is empty.
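
The batching, concurrency limit, and producer/consumer queue described above can be sketched with `asyncio` alone. This is a simplified illustration under stated assumptions, not the actual ETL: `run_actor` is a hypothetical stand-in for an Apify actor run, and the consumer appends to a list where the real one would write to the database.

```python
import asyncio

BATCH_SIZE = 100           # URLs per actor run
MAX_CONCURRENT_ACTORS = 3  # upper bound on simultaneous actor runs


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def run_actor(batch):
    """Hypothetical stand-in for one actor run; returns one item per URL."""
    await asyncio.sleep(0)  # simulate network I/O
    return [{"url": url, "order_number": f"ON-{i}"} for i, url in enumerate(batch)]


async def producer(batch, semaphore, queue):
    """Run one actor under the concurrency limit and enqueue its results."""
    async with semaphore:
        for item in await run_actor(batch):
            await queue.put(item)


async def consumer(queue, sink):
    """Drain the queue until the None sentinel arrives."""
    while (item := await queue.get()) is not None:
        sink.append(item)  # the real consumer would write to the database


async def enrich(urls):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_ACTORS)
    queue = asyncio.Queue()
    sink = []
    consumer_task = asyncio.create_task(consumer(queue, sink))
    batches = chunked(urls, BATCH_SIZE)
    await asyncio.gather(*(producer(b, semaphore, queue) for b in batches))
    await queue.put(None)  # sentinel: all producers are done
    await consumer_task
    return sink
```

Calling `asyncio.run(enrich(urls))` returns the collected items. The semaphore caps the number of in-flight actor runs, while the queue decouples fetching from writing so that results can be stored as soon as each batch finishes.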

| Issue | File | Notes |
| --- | --- | --- |
| `_fetch_pending_urls()` creates its own engine | `jobs_austria_details_scraping.py` | Should reuse `self.engine` instead |
| `_PORTAL_INPUTS` accessed from outside the class | `jobs_austria_details_scraping.py` | Should stay private; expose it through a method |
| `_extract_portal()` duplicated | Multiple files | Same function as in `CacheSynchronizer`; move to `utils/parsing.py` |
| Only AMS supported | `PortalRouter` | Non-AMS URLs silently fall through as `unknown` and are never scraped |
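
As an illustration of the first issue, the sketch below shows a fetch method that reuses the engine held by the instance instead of creating its own. It rests on assumptions: `sqlite3` stands in for the real database engine, the `url` column name is hypothetical, and `DetailsETLSketch` is a trimmed-down stand-in rather than the actual `JobsAustriaDetailsETL`.

```python
import sqlite3


class DetailsETLSketch:
    """Hypothetical, trimmed-down stand-in for JobsAustriaDetailsETL."""

    def __init__(self, engine):
        # One shared connection, created once by the caller and reused everywhere.
        self.engine = engine

    def _fetch_pending_urls(self, limit=100):
        """Return URLs of rows that have not been detail-scraped yet."""
        rows = self.engine.execute(
            "SELECT url FROM jobs WHERE order_number IS NULL ORDER BY url LIMIT ?",
            (limit,),
        ).fetchall()
        return [row[0] for row in rows]

    def _write_details(self, url, order_number):
        """Store an enriched field, which removes the row from the pending set."""
        self.engine.execute(
            "UPDATE jobs SET order_number = ? WHERE url = ?", (order_number, url)
        )
        self.engine.commit()


# Assumption: an in-memory SQLite database stands in for the real engine.
engine = sqlite3.connect(":memory:")
engine.execute("CREATE TABLE jobs (url TEXT PRIMARY KEY, order_number TEXT)")
engine.executemany("INSERT INTO jobs (url) VALUES (?)", [("a",), ("b",)])
etl = DetailsETLSketch(engine)
```

Because both methods share `self.engine`, reads and writes go through the same connection, and no per-call engine has to be created or disposed of.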

- Add a `crawl4ai_jobs` `run_input` to `PortalRouter._PORTAL_INPUTS` for non-AMS portals
- Add salary scraping from external salary benchmarking sites
- Extract shared parsing utilities to `utils/parsing.py`