Backfill first HKEX IPO document batch

Request:
Start progressively filling in detailed information for recent HK IPO targets.

Changes:
- Add scripts/archive_hkex_documents.py to map tickers to HKEXnews stock IDs, select official prospectus and allotment-results PDFs, archive them under data/raw/{ticker}, parse high-confidence T0/T1 facts, export snapshots, and refresh sync state.
- Document the small-batch HKEX document backfill workflow in README.md and the archivist skill.
- Archive prospectus and allotment-results PDFs for 00901, 01081, 01779, 02290, 02553, and 03388.
- Fill T0 details including application dates, expected allotment date, board lot, minimum subscription amount, and offer-share counts for the six tickers.
- Fill T1 allotment-demand details including valid/successful applications, public subscription level, international placees, international subscription level, and final offer-share allocations.
- Refresh source_refs, ipo_master, offering_terms, ipo_demand, ticker_sync_state, and sync_tasks snapshots.

Verification:
- Ran archive_hkex_documents.py on a first small batch, then re-ran the tickers that needed correction after hardening the parser.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, source files exist, and SHA-256 hashes match.
- Confirmed batch field completeness for the six processed tickers.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.
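
A minimal sketch of how three of the checks above (syntax parse, SQLite integrity, and source_refs path/hash validation) could be scripted. The function names and the shape of a source_refs entry (a repo-relative path plus an expected SHA-256 hex digest) are assumptions for illustration, not the project's actual helpers.

```python
import ast
import hashlib
import sqlite3
from pathlib import Path, PurePosixPath


def syntax_errors(paths):
    """Return (path, message) pairs for scripts that fail ast.parse."""
    failures = []
    for path in paths:
        try:
            ast.parse(Path(path).read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            failures.append((str(path), exc.msg))
    return failures


def sqlite_is_intact(db_path):
    """Run PRAGMA integrity_check; a clean database yields a single 'ok' row."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return rows == [("ok",)]


def check_source_ref(repo_root, rel_path, expected_sha256):
    """Return a list of problems for one source_refs entry (empty list = pass)."""
    posix = PurePosixPath(rel_path)
    # Hypothetical rule: repo-relative means not absolute and no parent hops.
    if posix.is_absolute() or ".." in posix.parts:
        return ["path is not repo-relative"]
    target = Path(repo_root) / rel_path
    if not target.is_file():
        return ["file missing"]
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    return [] if digest == expected_sha256.lower() else ["sha256 mismatch"]
```

Returning a list of problems rather than raising lets a caller collect every failing entry in one pass before reporting.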

Next useful context:
- This batch added about 55MB of official HKEXnews PDFs.
- Sync state now has 16 complete stages, 1993 pending_due stages, and 42 pending_not_due stages.
- Continue with small --limit batches because HKEXnews title search can include historical or postponed offering documents for the same stock code.
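
A rough sketch of how a --limit flag can cap the batch size, so that each run's document selections stay small enough to review by hand. Only the flag name comes from this commit; the default value, the helper names, and the way pending tickers are supplied are assumptions, not the behaviour of archive_hkex_documents.py.

```python
import argparse


def select_batch(pending_tickers, limit):
    """Return at most `limit` tickers, preserving input order."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return list(pending_tickers)[:limit]


def build_parser():
    """Build a hypothetical CLI parser exposing only the --limit flag."""
    parser = argparse.ArgumentParser(description="Batch-size sketch")
    parser.add_argument(
        "--limit",
        type=int,
        default=3,  # assumed default, chosen only for this illustration
        help="maximum number of tickers to process in one run",
    )
    return parser


# Tickers from this commit's batch; a real run would read pending tickers from sync state.
pending = ["00901", "01081", "01779", "02290", "02553", "03388"]
args = build_parser().parse_args(["--limit", "2"])
batch = select_batch(pending, args.limit)
```

Keeping the cap in a separate function makes it easy to test the slicing without touching the network or the database.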
2026-06-15 07:07:46 +00:00
parent c65b20a1c4
commit 993d7b26fa
23 changed files with 4908 additions and 4110 deletions
@@ -1,3 +1,10 @@
sync_run_id,mode,as_of,started_at,finished_at,status,notes
sync_state_20260615T073000Z,recent_ipo_list_refresh,2026-06-15T07:30:00Z,2026-06-15T07:30:00Z,2026-06-15T07:30:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T081500Z,hkex_document_archive,2026-06-15T08:15:00Z,2026-06-15T08:15:00Z,2026-06-15T08:15:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T082000Z,hkex_document_archive,2026-06-15T08:20:00Z,2026-06-15T08:20:00Z,2026-06-15T08:20:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T082500Z,hkex_document_archive,2026-06-15T08:25:00Z,2026-06-15T08:25:00Z,2026-06-15T08:25:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T083000Z,hkex_document_archive,2026-06-15T08:30:00Z,2026-06-15T08:30:00Z,2026-06-15T08:30:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T083500Z,hkex_document_archive,2026-06-15T08:35:00Z,2026-06-15T08:35:00Z,2026-06-15T08:35:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T084500Z,hkex_document_archive,2026-06-15T08:45:00Z,2026-06-15T08:45:00Z,2026-06-15T08:45:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T085000Z,hkex_document_archive,2026-06-15T08:50:00Z,2026-06-15T08:50:00Z,2026-06-15T08:50:00Z,complete,Derived ticker sync state refreshed.
sync_state_seed_2026_06_15,bootstrap_state_refresh,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,complete,Derived ticker sync state refreshed.