
Stage 2 — Payload Sync

Status: 🟠 In progress


Classes

| Class | File | Role |
| --- | --- | --- |
| JobsAustriaCacheProcess | jobs_austria_cache_process_data_payload.py | Orchestration loop — polls and calls the synchronizer |
| JobsAustriaCacheSynchronizer | jobs_austria_cache_synchronizer.py | Business logic — unpacks the payload, writes to jobs, updates FKs |

What It Does

  1. Polls scrape_cache every 30 seconds for rows where fk_job_id IS NULL
  2. Fetches a batch and unpacks the data_payload JSON column
  3. Extracts: url, url_hash, position, company, location, publication_date, portal
  4. Inserts new rows into jobs (deduplicates via url_hash unique constraint)
  5. Writes jobs.id back into scrape_cache.fk_job_id — marks the row as processed
  6. Enriches jobs with company_id, location_id, publication_date, portal
  7. Repeats until the queue is empty
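The loop above can be sketched roughly as follows. This is a minimal illustration, assuming a SQLite backing store with the scrape_cache and jobs tables named in this doc; the real classes may use a different database or ORM, and the function name sync_once and the exact column set are assumptions, not the project's actual API.

```python
import json
import sqlite3


def sync_once(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Process one batch of unprocessed cache rows; return the rows handled."""
    rows = conn.execute(
        "SELECT id, data_payload FROM scrape_cache "
        "WHERE fk_job_id IS NULL LIMIT ?",
        (batch_size,),
    ).fetchall()
    for cache_id, payload in rows:
        data = json.loads(payload)
        # INSERT OR IGNORE relies on the url_hash unique constraint for dedup:
        # a second row with the same hash is silently skipped.
        conn.execute(
            "INSERT OR IGNORE INTO jobs "
            "(url, url_hash, position, company, location, publication_date, portal) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                data.get("url"),
                data["url_hash"],
                data.get("position"),
                data.get("company"),
                data.get("location"),
                data.get("publication_date"),
                data.get("portal"),
            ),
        )
        job_id = conn.execute(
            "SELECT id FROM jobs WHERE url_hash = ?", (data["url_hash"],)
        ).fetchone()[0]
        # Writing the FK back is what marks the cache row as processed.
        conn.execute(
            "UPDATE scrape_cache SET fk_job_id = ? WHERE id = ?",
            (job_id, cache_id),
        )
    conn.commit()
    return len(rows)
```

A caller would invoke this every 30 seconds until it returns 0, which corresponds to "repeats until the queue is empty" above.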

Known Issues

| Issue | File | Notes |
| --- | --- | --- |
| Overlapping responsibilities | Both files | JobsAustriaCacheProcess and JobsAustriaCacheSynchronizer overlap — should be consolidated |
| process_once() does too much | cache_process_data_payload.py | Needs to be split into focused, single-responsibility helpers |
| _extract_portal() duplicated | Multiple files | Same function exists in CacheSynchronizer and DetailsETL — move to utils/parsing.py |
| _parse_date() duplicated | Multiple files | Same as above |
| Leftover draft class | jobs_austria_cache_key_sync.py | JobsAustriaCacheProcessRework is an unused draft — safe to delete |

Future Extension Points

  • Extract _extract_portal(), _parse_date(), _str_or_none() into a shared utils/parsing.py
  • Consolidate the two classes once the refactor is stable