Backfill first HKEX IPO document batch

Request:
Start progressively filling detailed information for recent HK IPO targets.

Changes:
- Add scripts/archive_hkex_documents.py to map tickers to HKEXnews stock IDs, select official prospectus and allotment-results PDFs, archive them under data/raw/{ticker}, parse high-confidence T0/T1 facts, export snapshots, and refresh sync state.
- Document the small-batch HKEX document backfill workflow in README.md and the archivist skill.
- Archive prospectus and allotment-results PDFs for 00901, 01081, 01779, 02290, 02553, and 03388.
- Fill T0 details including application dates, expected allotment date, board lot, minimum subscription amount, and offer-share counts for the six tickers.
- Fill T1 allotment-demand details including valid/successful applications, public subscription level, international placees, international subscription level, and final offer-share allocations.
- Refresh source_refs, ipo_master, offering_terms, ipo_demand, ticker_sync_state, and sync_tasks snapshots.
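
The archive step described above (store each PDF under data/raw/{ticker}, then record a repo-relative path and a SHA-256 hash for source_refs) can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/archive_hkex_documents.py: the function name `archive_document`, its parameters, and the returned dictionary keys are all hypothetical.

```python
import hashlib
from pathlib import Path


def archive_document(repo_root: Path, ticker: str, filename: str, content: bytes) -> dict:
    """Write one document under data/raw/{ticker} and describe it for source_refs.

    Hypothetical helper: the real script's layout and field names may differ.
    """
    target_dir = repo_root / "data" / "raw" / ticker
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(content)
    return {
        "ticker": ticker,
        # Store a POSIX-style path relative to the repo root, never an absolute one.
        "path": target.relative_to(repo_root).as_posix(),
        # Hash the exact bytes that were written so later checks can detect drift.
        "sha256": hashlib.sha256(content).hexdigest(),
    }
```

Hashing the bytes at write time means a later integrity check only needs the stored path and hash, not the original download.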

Verification:
- Ran archive_hkex_documents.py on a first small batch, then re-ran the affected tickers after hardening the parser.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, source files exist, and SHA-256 hashes match.
- Confirmed batch field completeness for the six processed tickers.
- Ran git diff --check and git diff --cached --check.
- Checked that no Python cache or SQLite transient files were left behind.
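
Two of the checks above, the ast.parse pass and the source_refs path and hash verification, might look something like the sketch below. The helper names `script_parses` and `check_source_ref` are assumptions made for illustration; the commit does not show how the checks were actually run.

```python
import ast
import hashlib
from pathlib import Path, PurePosixPath


def script_parses(source: str) -> bool:
    """Return True if the text is syntactically valid Python (it is not executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def check_source_ref(repo_root: Path, rel_path: str, expected_sha256: str) -> list[str]:
    """Return a list of problems for one source_refs entry; empty means it passed."""
    path = PurePosixPath(rel_path)
    # Reject absolute paths and paths that climb out of the repository.
    if path.is_absolute() or ".." in path.parts:
        return ["path is not repo-relative"]
    target = repo_root / rel_path
    if not target.is_file():
        return ["file missing"]
    if hashlib.sha256(target.read_bytes()).hexdigest() != expected_sha256:
        return ["sha256 mismatch"]
    return []
```

Returning a list of problems rather than raising lets a caller collect failures across every row before reporting.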

Next useful context:
- This batch added about 55MB of official HKEXnews PDFs.
- Sync state now has 16 complete stages, 1993 pending_due stages, and 42 pending_not_due stages.
- Continue with small --limit batches because HKEXnews title search can include historical or postponed offering documents for the same stock code.
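
One possible guard against picking up historical or postponed documents for a reused stock code is to filter title-search hits by publication date. The sketch below is only an assumption about how that could be done, not a description of the script's current selection logic: `pick_current_document`, the `published` key, and the `not_before` cutoff are hypothetical names.

```python
from datetime import date
from typing import Optional


def pick_current_document(candidates: list[dict], not_before: date) -> Optional[dict]:
    """Return the most recently published candidate dated on or after the cutoff.

    Each candidate is assumed to carry a 'published' date; returns None if
    nothing qualifies, so the caller can leave the stage pending for review.
    """
    recent = [c for c in candidates if c["published"] >= not_before]
    return max(recent, key=lambda c: c["published"], default=None)


hits = [
    {"title": "Prospectus (earlier offering)", "published": date(2021, 3, 1)},
    {"title": "Prospectus", "published": date(2026, 5, 20)},
]
current = pick_current_document(hits, not_before=date(2026, 1, 1))
```

Keeping batches small still matters with a filter like this, because a date cutoff cannot tell a postponed offering from a relaunched one published shortly afterwards.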
2026-06-15 07:07:46 +00:00
parent c65b20a1c4
commit 993d7b26fa
23 changed files with 4908 additions and 4110 deletions
@@ -1,2 +1,8 @@
demand_id,ticker,source_id,stage_date,valid_applications,successful_applications,public_oversubscription_times,international_placees,international_oversubscription_times,final_hk_offer_shares,final_international_offer_shares,data_as_of,notes
00901_allotment_2026_05_27_2026052700001,00901,00901_allotment_results_2026_05_27_2026052700001,2026-05-27,177196,17058,1971.99,215,2.23,1920800,17286500,2026-06-15T08:35:00Z,Parsed from HKEXnews allotment results announcement.
01081_allotment_2026_06_04_2026060402919,01081,01081_allotment_results_2026_06_04_2026060402919,2026-06-04,122627,28788,134.39,125,10.68,8696600,91314000,2026-06-15T08:35:00Z,Parsed from HKEXnews allotment results announcement.
01779_allotment_2026_06_04_2026060402923,01779,01779_allotment_results_2026_06_04_2026060402923,2026-06-04,266377,28057,4762.58,80,10.94,1419350,12773800,2026-06-15T08:35:00Z,Parsed from HKEXnews allotment results announcement.
02290_allotment_2026_06_04_2026060402521,02290,02290_allotment_results_2026_06_04_2026060402521,2026-06-04,133189,16359,664.92,78,3.18,12500000,112500000,2026-06-15T08:35:00Z,Parsed from HKEXnews allotment results announcement.
02553_allotment_2026_06_02_2026060202644,02553,02553_allotment_results_2026_06_02_2026060202644,2026-06-02,109125,21872,1421.54,97,0.95,6000000,34000000,2026-06-15T08:50:00Z,Parsed from HKEXnews allotment results announcement.
03388_allotment_2026_05_28_2026052802543,03388,03388_allotment_results_2026_05_28_2026052802543,2026-05-28,251375,44336,3829.42,183,26.8,7342800,66084750,2026-06-15T08:35:00Z,Parsed from HKEXnews allotment results announcement.
06658_allotment_2026_06_12,06658,06658_allotment_results_2026_06_12,2026-06-12,180507,11465,6586.73,64,2.64,1146500,10317600,2026-06-15T06:15:00Z,Claw-back shown as N/A in the HKEXnews allotment results.