Add archivist incremental sync state
Request: Add archivist support for remembering which IPO archive stages have already been synced and which stages should be updated next.

Changes:
- Add sync_runs, ticker_sync_state, sync_tasks, and price_performance tables to the archive schema.
- Add scripts/update_sync_state.py to derive per-ticker stage status and rebuild the next-sync task queue.
- Export the new sync-state tables as Git-friendly CSV snapshots.
- Document the incremental archive flow in the archivist skill and README.

Verification:
- Ran scripts/bootstrap_historical_data.py.
- Ran scripts/update_sync_state.py with a deterministic as-of timestamp.
- Checked SQLite integrity and DB-to-snapshot row counts with Python sqlite3.
- Parsed Python scripts with ast.parse.
- Ran git diff --check and checked for temporary SQLite/cache files.

Next useful context:
- Current derived queue has 2 open tasks for 06658 and 15 waiting_until_due tasks for future stages.
@@ -97,6 +97,18 @@ python3 -m venv .venv
The extractor reads PDF paths from `data/hk_ipo.sqlite`, writes derived text files under `data/extracted_text/`, and exports `data/snapshots/extracted_text_manifest.csv` with page counts, text hashes, and extraction status.
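The manifest columns are only described in outline above. The following is a minimal sketch of how one manifest row could be summarised and written. It assumes SHA-256 as the text hash and uses hypothetical column names (`pdf_path`, `text_path`, `page_count`, `text_sha256`, `status`) and status values (`ok`, `empty`). The real extractor may hash, name, or classify differently.

```python
import csv
import hashlib
import io

# Hypothetical manifest columns; the real extractor's column names may differ.
FIELDS = ["pdf_path", "text_path", "page_count", "text_sha256", "status"]


def manifest_row(pdf_path: str, text_path: str, pages: list[str]) -> dict:
    """Summarise one extracted PDF as a manifest row (assumes SHA-256 hashing)."""
    text = "\n".join(pages)
    return {
        "pdf_path": pdf_path,
        "text_path": text_path,
        "page_count": len(pages),
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        # Hypothetical rule: a PDF with no recoverable text counts as an empty extraction.
        "status": "ok" if text.strip() else "empty",
    }


def write_manifest(rows: list[dict], out) -> None:
    """Write manifest rows as CSV, sorted by PDF path so snapshot diffs stay stable."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["pdf_path"]):
        writer.writerow(row)


if __name__ == "__main__":
    # Illustrative paths only; they are placeholders, not files from the repository.
    buf = io.StringIO()
    write_manifest([manifest_row("example.pdf", "example.txt", ["Page one", "Page two"])], buf)
    print(buf.getvalue())
```

Hashing the extracted text, rather than the PDF bytes, lets a re-extraction that yields identical text leave the manifest unchanged in Git.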
## Incremental Archive Sync
The archivist keeps a per-ticker sync ledger so that repeated updates only need to touch stages that are missing or due. Rebuild the ledger and the next-sync queue with:
```bash
python3 scripts/update_sync_state.py
```
This writes `ticker_sync_state` and `sync_tasks` into `data/hk_ipo.sqlite`, then exports `data/snapshots/ticker_sync_state.csv`, `data/snapshots/sync_tasks.csv`, and `data/snapshots/sync_runs.csv`.
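How `scripts/update_sync_state.py` orders and formats its CSV output is not shown here. The following is a minimal sketch of one way to keep such snapshots Git-friendly: write every row in a deterministic order with LF line endings, and return the row count so it can be compared with the database. The function name `export_table` is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_table(conn: sqlite3.Connection, table: str, out_dir: Path) -> int:
    """Write `table` to out_dir/<table>.csv in a stable row order; return the row count."""
    # Identifiers cannot be bound as SQL parameters, so quote the table name explicitly.
    quoted = '"' + table.replace('"', '""') + '"'
    cur = conn.execute(f"SELECT * FROM {quoted}")
    header = [col[0] for col in cur.description]
    # Sort in Python (on string forms, NULL as "") so the file does not depend on scan order.
    rows = sorted(cur.fetchall(), key=lambda row: ["" if v is None else str(v) for v in row])
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{table}.csv"
    # newline="" plus an explicit "\n" terminator yields LF-only files on every OS.
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)  # csv writes None as an empty field
    return len(rows)
```

Calling it once per table (`ticker_sync_state`, `sync_tasks`, `sync_runs`) with `Path("data/snapshots")` would produce the three files listed above. The returned counts can then be checked against `SELECT COUNT(*)` on each table, in the spirit of the DB-to-snapshot row-count check in the commit message.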
Use `sync_tasks` as the next-sync queue. Tasks marked `open` are due now; tasks marked `waiting_until_due` are known future updates.
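The following is a minimal sketch of how the `open` / `waiting_until_due` split could be derived from a fixed as-of timestamp and how the due-now part of the queue could be read back. It assumes a hypothetical `sync_tasks` layout with `ticker`, `stage`, `due_at` (timezone-aware ISO 8601 text), and `status` columns. The real schema and derivation rules in `scripts/update_sync_state.py` may differ.

```python
import sqlite3
from datetime import datetime, timezone


def task_status(due_at: str, as_of: datetime) -> str:
    """Classify a stage as due now ('open') or known but future ('waiting_until_due').

    Assumes timezone-aware ISO 8601 strings such as '2024-06-01T00:00:00+00:00';
    comparing a naive timestamp with an aware as_of raises TypeError.
    """
    due = datetime.fromisoformat(due_at)
    return "open" if due <= as_of else "waiting_until_due"


def open_tasks(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return tasks to sync now, earliest due date first (hypothetical column names)."""
    return conn.execute(
        "SELECT ticker, stage, due_at FROM sync_tasks "
        "WHERE status = 'open' ORDER BY due_at, ticker, stage"
    ).fetchall()


if __name__ == "__main__":
    # A fixed as-of timestamp keeps the derived queue reproducible between runs.
    as_of = datetime(2024, 6, 1, tzinfo=timezone.utc)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sync_tasks (ticker TEXT, stage TEXT, due_at TEXT, status TEXT)")
    # Illustrative rows only; the ticker and stage names are placeholders.
    for ticker, stage, due_at in [
        ("00000", "prospectus", "2024-05-20T00:00:00+00:00"),
        ("00000", "listing", "2024-06-10T00:00:00+00:00"),
    ]:
        conn.execute(
            "INSERT INTO sync_tasks VALUES (?, ?, ?, ?)",
            (ticker, stage, due_at, task_status(due_at, as_of)),
        )
    print(open_tasks(conn))
```

Ordering by `due_at` as text is only correct while every timestamp uses the same UTC offset and format. Normalising to UTC before storing keeps that assumption true.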
## Git Discipline
The repository uses automatic focused commits for completed project changes.