Add archivist incremental sync state

Request: Add archivist support for remembering which IPO archive stages have already been synced and which stages should be updated next. Changes: - Add sync_runs, ticker_sync_state, sync_tasks, and price_performance tables to the archive schema. - Add scripts/update_sync_state.py to derive per-ticker stage status and rebuild the next-sync task queue. - Export the new sync-state tables as Git-friendly CSV snapshots. - Document the incremental archive flow in the archivist skill and README. Verification: - Ran scripts/bootstrap_historical_data.py. - Ran scripts/update_sync_state.py with a deterministic as-of timestamp. - Checked SQLite integrity and DB-to-snapshot row counts with Python sqlite3. - Parsed Python scripts with ast.parse. - Ran git diff --check and checked for temporary SQLite/cache files. Next useful context: - Current derived queue has 2 open tasks for 06658 and 15 waiting_until_due tasks for future stages.
2026-06-15 06:29:54 +00:00
parent fb715cd446
commit 08db218b6d
10 changed files with 601 additions and 3 deletions
@@ -97,6 +97,18 @@ python3 -m venv .venv

 The extractor reads PDF paths from `data/hk_ipo.sqlite`, writes derived text files under `data/extracted_text/`, and exports `data/snapshots/extracted_text_manifest.csv` with page counts, text hashes, and extraction status.

+## Incremental Archive Sync
+
+The archivist keeps a per-ticker sync ledger so repeated updates can focus on missing stages:
+
+```bash
+python3 scripts/update_sync_state.py
+```
+
+This writes `ticker_sync_state` and `sync_tasks` into `data/hk_ipo.sqlite`, then exports `data/snapshots/ticker_sync_state.csv`, `data/snapshots/sync_tasks.csv`, and `data/snapshots/sync_runs.csv`.
+
+Use `sync_tasks` as the next-sync queue. Tasks marked `open` are due now; tasks marked `waiting_until_due` are known future updates.
+
 ## Git Discipline

 The repository uses automatic focused commits for completed project changes.