Add archivist incremental sync state

Request:
Add archivist support for remembering which IPO archive stages have already been synced and which stages should be updated next.

Changes:
- Add sync_runs, ticker_sync_state, sync_tasks, and price_performance tables to the archive schema.
- Add scripts/update_sync_state.py to derive per-ticker stage status and rebuild the next-sync task queue.
- Export the new sync-state tables as Git-friendly CSV snapshots.
- Document the incremental archive flow in the archivist skill and README.

Verification:
- Ran scripts/bootstrap_historical_data.py.
- Ran scripts/update_sync_state.py with a deterministic as-of timestamp.
- Checked SQLite integrity and DB-to-snapshot row counts with Python sqlite3.
- Parsed Python scripts with ast.parse.
- Ran git diff --check and checked for temporary SQLite/cache files.

Next useful context:
- Current derived queue has 2 open tasks for 06658 and 15 waiting_until_due tasks for future stages.
This commit is contained in:
2026-06-15 06:29:54 +00:00
parent fb715cd446
commit 08db218b6d
10 changed files with 601 additions and 3 deletions
+12
View File
@@ -97,6 +97,18 @@ python3 -m venv .venv
The extractor reads PDF paths from `data/hk_ipo.sqlite`, writes derived text files under `data/extracted_text/`, and exports `data/snapshots/extracted_text_manifest.csv` with page counts, text hashes, and extraction status.
## Incremental Archive Sync
The archivist keeps a per-ticker sync ledger so repeated updates can focus on missing stages:
```bash
python3 scripts/update_sync_state.py
```
This writes `ticker_sync_state` and `sync_tasks` into `data/hk_ipo.sqlite`, then exports `data/snapshots/ticker_sync_state.csv`, `data/snapshots/sync_tasks.csv`, and `data/snapshots/sync_runs.csv`.
Use `sync_tasks` as the next-sync queue. Tasks marked `open` are due now; tasks marked `waiting_until_due` are known future updates.
## Git Discipline
The repository uses automatic focused commits for completed project changes.