Archive recent HKEX IPO targets
Request: Use the project archivist workflow to update IPO target coverage for the most recent three-year window.

Changes:
- Add scripts/update_recent_ipo_list.py to discover HKEXnews annual new listing reports, archive XLSX sources, parse subscription-relevant IPO rows, and update SQLite plus snapshots.
- Add new_listing_report_entries to preserve annual report row-level evidence.
- Archive 2023-2026 Main Board new listing reports and 2024-2026 GEM new listing reports.
- Seed 290 report-backed IPO targets for 2023-06-15 through 2026-06-15, skipping 10 non-IPO rows without numeric offer prices.
- Refresh ipo_master, missing offering_terms fields, source_refs, ticker_sync_state, and sync_tasks.
- Add openpyxl as the XLSX parser dependency and document the archivist refresh flow.
- Limit sync summary output while keeping the full queue in SQLite and CSV snapshots.

Verification:
- Ran update_recent_ipo_list.py for 2023-06-15 to 2026-06-15 with as-of 2026-06-15T07:30:00Z.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, files exist, and SHA-256 hashes match.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.

Next useful context:
- ipo_master now has 293 tickers; new_listing_report_entries has 290 report-backed targets.
- Current sync queue has 2005 open tasks and 42 waiting_until_due tasks for deeper per-ticker archival stages.
@@ -97,6 +97,18 @@ python3 -m venv .venv
The extractor reads PDF paths from `data/hk_ipo.sqlite`, writes derived text files under `data/extracted_text/`, and exports `data/snapshots/extracted_text_manifest.csv` with page counts, text hashes, and extraction status.
## Recent IPO Target Refresh
Use HKEXnews annual new listing reports to seed recent subscription-relevant IPO targets:
```bash
.venv/bin/python scripts/update_recent_ipo_list.py --start-date 2023-06-15 --end-date 2026-06-15 --as-of 2026-06-15T07:30:00Z
```
The updater archives the HKEXnews XLSX reports under `data/raw/hkex_new_listing_reports/`, records report-backed source references, writes `new_listing_report_entries`, updates `ipo_master` and missing `offering_terms` fields, exports CSV snapshots, and refreshes `sync_tasks`.

Rows without a numeric IPO offer price, such as transfers of listing, introductions, or de-SPAC transactions, are skipped by default because they are not ordinary public subscription targets.
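
The skip rule above can be illustrated with a minimal sketch. It is not the code in `scripts/update_recent_ipo_list.py`; the helper names (`parse_offer_price`, `select_ipo_targets`), the row keys (`ticker`, `listing_date`, `offer_price`), and the sample rows are all hypothetical and only show one way to keep in-window rows with a numeric offer price:

```python
import math
from datetime import date


def parse_offer_price(value):
    """Return the offer price as a float, or None when it is not numeric."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        price = float(value)
    elif isinstance(value, str):
        try:
            # Tolerate thousands separators such as "1,020.00".
            price = float(value.replace(",", "").strip())
        except ValueError:
            return None
    else:
        return None
    return price if math.isfinite(price) else None


def select_ipo_targets(rows, start, end):
    """Split in-window rows into (kept, skipped) by numeric offer price.

    Rows outside [start, end] are ignored entirely.
    """
    kept, skipped = [], []
    for row in rows:
        if not (start <= row["listing_date"] <= end):
            continue
        price = parse_offer_price(row.get("offer_price"))
        if price is None:
            skipped.append(row)  # e.g. introduction or transfer of listing
        else:
            kept.append({**row, "offer_price": price})
    return kept, skipped


# Made-up sample rows; tickers and prices are placeholders, not HKEX data.
sample_rows = [
    {"ticker": "T1", "listing_date": date(2024, 1, 10), "offer_price": 12.5},
    {"ticker": "T2", "listing_date": date(2024, 3, 5), "offer_price": "N/A"},
    {"ticker": "T3", "listing_date": date(2022, 1, 1), "offer_price": 3.0},
    {"ticker": "T4", "listing_date": date(2025, 7, 1), "offer_price": "1,020.00"},
]
kept, skipped = select_ipo_targets(
    sample_rows, date(2023, 6, 15), date(2026, 6, 15)
)
print([r["ticker"] for r in kept], [r["ticker"] for r in skipped])
```

Returning the skipped rows alongside the kept ones makes it straightforward to report how many rows were dropped, in the same spirit as the skipped-row count recorded in the commit message.
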
## Incremental Archive Sync
The archivist keeps a per-ticker sync ledger so repeated updates can focus on missing stages: