Backfill first HKEX IPO document batch

Request:
Start progressively filling detailed information for recent HK IPO targets.
Changes:
- Add scripts/archive_hkex_documents.py to map tickers to HKEXnews stock IDs, select official prospectus and allotment-results PDFs, archive them under data/raw/{ticker}, parse high-confidence T0/T1 facts, export snapshots, and refresh sync state.
- Document the small-batch HKEX document backfill workflow in README.md and the archivist skill.
- Archive prospectus and allotment-results PDFs for 00901, 01081, 01779, 02290, 02553, and 03388.
- Fill T0 details including application dates, expected allotment date, board lot, minimum subscription amount, and offer-share counts for the six tickers.
- Fill T1 allotment-demand details including valid/successful applications, public subscription level, international placees, international subscription level, and final offer-share allocations.
- Refresh source_refs, ipo_master, offering_terms, ipo_demand, ticker_sync_state, and sync_tasks snapshots.
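
The archiving step described above could look roughly like the following. This is a minimal sketch, not the code in scripts/archive_hkex_documents.py: the helper name `archive_document` and the returned field names are hypothetical, and only the data/raw/{ticker} layout, the repo-relative paths, and the SHA-256 hashes come from this commit message.

```python
import hashlib
from pathlib import Path


def archive_document(repo_root: Path, ticker: str, filename: str, content: bytes) -> dict:
    """Write one downloaded PDF under data/raw/{ticker} and describe it.

    Hypothetical helper: the keys below are illustrative and are not taken
    from the real source_refs schema.
    """
    target_dir = repo_root / "data" / "raw" / ticker
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(content)
    return {
        "ticker": ticker,
        # Repo-relative POSIX path, so the reference still resolves after a fresh clone.
        "path": target.relative_to(repo_root).as_posix(),
        # Hash the bytes that were written, so later checks can detect drift.
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }
```

Hashing the in-memory bytes rather than re-reading the file keeps the sketch short; a stricter variant would re-read the archived file before recording the hash.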
Verification:
- Ran archive_hkex_documents.py on a first small batch, then re-ran the tickers that needed correction after hardening the parser.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, source files exist, and SHA-256 hashes match.
- Confirmed batch field completeness for the six processed tickers.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.
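
Several of the checks above can be reproduced with the standard library alone. The sketch below is illustrative and is not the tooling used for this commit: the function names are hypothetical, and it assumes source references are available as (relative path, expected SHA-256) pairs.

```python
import ast
import hashlib
import sqlite3
from pathlib import Path


def syntax_errors(script_paths):
    """Return (path, message) pairs for scripts that fail ast.parse."""
    errors = []
    for path in script_paths:
        try:
            ast.parse(Path(path).read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors.append((str(path), exc.msg))
    return errors


def sqlite_is_intact(db_path):
    """True when PRAGMA integrity_check reports a single 'ok' row."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return rows == [("ok",)]


def source_ref_problems(repo_root, refs):
    """Check each (relative_path, expected_sha256) pair against the working tree."""
    problems = []
    for rel_path, expected in refs:
        path = Path(rel_path)
        # Reject absolute paths and paths that climb out of the repository.
        if path.is_absolute() or ".." in path.parts:
            problems.append((rel_path, "not repo-relative"))
            continue
        full = Path(repo_root) / path
        if not full.is_file():
            problems.append((rel_path, "missing file"))
            continue
        if hashlib.sha256(full.read_bytes()).hexdigest() != expected:
            problems.append((rel_path, "sha256 mismatch"))
    return problems
```

An empty list from `syntax_errors` and `source_ref_problems`, together with `True` from `sqlite_is_intact`, corresponds to a clean run of these three checks.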
Next useful context:
- This batch added about 55 MB of official HKEXnews PDFs.
- Sync state now has 16 complete stages, 1993 pending_due stages, and 42 pending_not_due stages.
- Continue with small --limit batches because HKEXnews title search can return historical or postponed offering documents for the same stock code.
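
One way to guard against stale search hits is to filter by release date before selecting a document. The sketch below is only an illustration of that idea, not the selection logic in archive_hkex_documents.py: the `SearchHit` shape, the `plausible_hits` name, and the 60-day window are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical shape of one title-search result."""
    stock_code: str
    title: str
    released: date


def plausible_hits(hits, stock_code, listing_date, window_days=60):
    """Keep hits for stock_code released in the window_days before listing_date.

    The 60-day default is an assumed window, not a value taken from the project.
    """
    earliest = listing_date - timedelta(days=window_days)
    return [
        hit
        for hit in hits
        if hit.stock_code == stock_code and earliest <= hit.released <= listing_date
    ]
```

A date window does not catch a postponed offering that was relaunched within the same period, so manual review of each small batch remains useful.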
@@ -1,3 +1,10 @@
sync_run_id,mode,as_of,started_at,finished_at,status,notes
sync_state_20260615T073000Z,recent_ipo_list_refresh,2026-06-15T07:30:00Z,2026-06-15T07:30:00Z,2026-06-15T07:30:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T081500Z,hkex_document_archive,2026-06-15T08:15:00Z,2026-06-15T08:15:00Z,2026-06-15T08:15:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T082000Z,hkex_document_archive,2026-06-15T08:20:00Z,2026-06-15T08:20:00Z,2026-06-15T08:20:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T082500Z,hkex_document_archive,2026-06-15T08:25:00Z,2026-06-15T08:25:00Z,2026-06-15T08:25:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T083000Z,hkex_document_archive,2026-06-15T08:30:00Z,2026-06-15T08:30:00Z,2026-06-15T08:30:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T083500Z,hkex_document_archive,2026-06-15T08:35:00Z,2026-06-15T08:35:00Z,2026-06-15T08:35:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T084500Z,hkex_document_archive,2026-06-15T08:45:00Z,2026-06-15T08:45:00Z,2026-06-15T08:45:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T085000Z,hkex_document_archive,2026-06-15T08:50:00Z,2026-06-15T08:50:00Z,2026-06-15T08:50:00Z,complete,Derived ticker sync state refreshed.
sync_state_seed_2026_06_15,bootstrap_state_refresh,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,complete,Derived ticker sync state refreshed.