6d05056609
Request:
- Use archivist to close the 137 T1 ipo_demand source-only gaps using extracted PDF text.

Changes:
- Add an incremental T1 demand text backfill script.
- Parse existing allotment-result extracted text into ipo_demand.
- Archive linked Summary PDFs from old HKEX HTML allotment-result pages.
- Correct allotment-result selection to prefer primary result announcements over clarification or supplemental notices.
- Add robust line-aware allotment parsing and document the workflow in archivist and README.
- Record the backfill result in a report.

Execution:
- Selected 137 source-only T1 demand gaps.
- Wrote 137 ipo_demand rows, increasing ipo_demand from 154 to 291 rows.
- Archived 38 new HKEX allotment-result PDFs and extracted their text.
- Confirmed an incremental rerun selects 0 gaps and writes 0 rows.

Verification:
- Ran git diff --cached --check.
- Ran py_compile for archive_hkex_documents.py and backfill_t1_demand_from_text.py.
- Checked SQLite integrity and foreign keys.
- Confirmed DB row counts match CSV snapshots.
- Verified no T1 complete row is missing ipo_demand.
- Verified source_refs paths/files/hashes and PDF extracted-text manifest hashes.

Next useful context:
- T1 demand structure is complete for listed rows; 06106 and 06675 remain pending_not_due.
- T2 grey-market and due price-performance gaps remain separate archivist priorities.
- Analyst output should be regenerated before using the new T1 demand facts for scoring.
# T1 Demand Text Backfill

- Run date: 2026-06-15
- Archive mode: `t1_demand_text_backfill`
- Target: T1 allotment rows with archived allotment-results sources but missing `ipo_demand`

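
The target definition can be read as an anti-join: rows that have an archived allotment-results source but no `ipo_demand` row. The following is a minimal sketch of that selection under stated assumptions, not the actual backfill script. The table names come from this report, but every column name (`is_listed`, `doc_type`, `public_subscription_multiple`) and the tickers `00001` to `00003` are hypothetical placeholders.

```python
import sqlite3

# Hypothetical minimal schema: the real column names are not given in this report.
SCHEMA = """
CREATE TABLE ipo_master (ticker TEXT PRIMARY KEY, is_listed INTEGER);
CREATE TABLE source_refs (ticker TEXT, doc_type TEXT, path TEXT);
CREATE TABLE ipo_demand (ticker TEXT PRIMARY KEY, public_subscription_multiple REAL);
"""

# Source-only gap: an allotment-results source exists, but no ipo_demand row does.
GAP_QUERY = """
SELECT m.ticker
FROM ipo_master AS m
WHERE m.is_listed = 1
  AND EXISTS (SELECT 1 FROM source_refs AS s
              WHERE s.ticker = m.ticker AND s.doc_type = 'allotment_results')
  AND NOT EXISTS (SELECT 1 FROM ipo_demand AS d WHERE d.ticker = m.ticker)
ORDER BY m.ticker
"""


def select_source_only_gaps(conn):
    """Return tickers that still need an ipo_demand row."""
    return [row[0] for row in conn.execute(GAP_QUERY)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ipo_master VALUES (?, ?)",
        [("00001", 1), ("00002", 1), ("00003", 0)],
    )
    conn.executemany(
        "INSERT INTO source_refs VALUES (?, ?, ?)",
        [("00001", "allotment_results", "a.pdf"), ("00002", "prospectus", "b.pdf")],
    )
    first = select_source_only_gaps(conn)
    # Backfill: write one ipo_demand row per selected gap.
    for ticker in first:
        conn.execute("INSERT INTO ipo_demand (ticker) VALUES (?)", (ticker,))
    # An incremental rerun should now select nothing.
    second = select_source_only_gaps(conn)
    return first, second
```

Because the selection is driven by the absence of an `ipo_demand` row, a rerun after a successful backfill naturally selects nothing, which is the property the incremental rerun check relies on.
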
## Result

The T1 source-only demand gap was closed.

- Initial source-only T1 demand gaps: 137.
- `ipo_demand` rows before backfill: 154.
- `ipo_demand` rows after backfill: 291.
- T1 complete rows without `ipo_demand`: 0.
- T1 pending-not-due rows: 2 (`06106`, `06675`).

## Source Handling

- Existing extracted PDF text supplied most of the backfill.
- Old HKEX HTML allotment-result pages were followed to their linked Summary PDFs.
- Where a clarification or supplemental notice had been selected as the allotment-results source, the selection was corrected by archiving the primary allotment-results announcement from the same HKEX title-search window.
- New allotment-result PDF sources archived: 38.
- PDF source refs after backfill: 595.
- Extracted-text manifest rows after backfill: 595.
- Extracted-text manifest status: 595 `ok`.

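
The preference for primary result announcements over clarification or supplemental notices could be expressed as a title filter along the following lines. This is an illustrative sketch only: the `Announcement` type, the `pick_primary` helper, the two regular expressions, and the example URLs are all assumptions, since the report does not show the rules the archiver applies.

```python
import re
from dataclasses import dataclass

# Hypothetical title patterns: the real selection rules are not shown in this report.
SECONDARY = re.compile(r"clarification|supplemental|supplementary", re.IGNORECASE)
PRIMARY = re.compile(r"allotment results?", re.IGNORECASE)


@dataclass(frozen=True)
class Announcement:
    title: str
    url: str


def pick_primary(candidates):
    """Return the first candidate that looks like a primary result notice, else None."""
    for ann in candidates:
        if PRIMARY.search(ann.title) and not SECONDARY.search(ann.title):
            return ann
    return None


# Usage with made-up titles from one search window.
window = [
    Announcement("Clarification Announcement - Allotment Results", "https://example.invalid/1"),
    Announcement("Announcement of Allotment Results", "https://example.invalid/2"),
]
chosen = pick_primary(window)
```

Returning `None` when only secondary notices are present keeps the caller from silently archiving a clarification as if it were the primary source.
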
## Field Policy

Only explicitly disclosed demand fields were stored.

No missing demand field was inferred from share counts or other derived calculations. For example, where a Summary PDF disclosed valid applications, public subscription, international placee count, and final share counts but omitted successful applicants or international subscription level, the omitted fields were left null.

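
A line-aware parser that follows this policy might look like the sketch below. The field names, the label wording in the patterns, and the sample text are invented for illustration and are not taken from the real parser or from any HKEX document. The point is only that a field with no matching line stays `None` rather than being derived from other values.

```python
import re

# Hypothetical labels and field names, each paired with a converter for the captured number.
FIELD_PATTERNS = {
    "valid_applications": (
        re.compile(r"^\s*No\. of valid applications\s*:\s*([\d,]+)\s*$", re.I), int),
    "successful_applications": (
        re.compile(r"^\s*No\. of successful applications\s*:\s*([\d,]+)\s*$", re.I), int),
    "public_subscription_multiple": (
        re.compile(r"^\s*Subscription level\s*:\s*([\d,]*\.?\d+)\s*times\s*$", re.I), float),
}


def parse_demand(text):
    """Store only explicitly disclosed fields; anything not matched stays None."""
    row = {name: None for name in FIELD_PATTERNS}
    for line in text.splitlines():
        for name, (pattern, convert) in FIELD_PATTERNS.items():
            match = pattern.match(line)
            if match and row[name] is None:
                row[name] = convert(match.group(1).replace(",", ""))
    return row


# Invented sample text: successful applications are deliberately not disclosed.
SAMPLE = """No. of valid applications: 12,345
Subscription level: 15.2 times
Final no. of Offer Shares: 1,000,000
"""
parsed = parse_demand(SAMPLE)
```

Matching whole lines rather than searching the full text reduces the chance that a number from an unrelated line is attached to the wrong label.
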
## Verification

- SQLite integrity check: `ok`.
- Foreign-key violations: 0.
- DB row counts match CSV snapshots for `ipo_master`, `offering_terms`, `ipo_demand`, `source_refs`, `sync_runs`, `ticker_sync_state`, and `sync_tasks`.
- `source_refs`: 1,187 rows, 0 bad paths, 0 missing files, 0 hash mismatches.
- PDF manifest reconciliation: 595 PDF sources, 595 manifest rows, 0 missing manifest rows, 0 orphan manifest rows, 0 missing text files, 0 PDF hash mismatches, 0 text hash mismatches.
- An incremental rerun selected 0 source-only gaps and wrote 0 rows.

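
The database and file checks listed above can be reproduced with standard tooling. The sketch below uses SQLite's `PRAGMA integrity_check` and `PRAGMA foreign_key_check` through Python's `sqlite3` module, plus a streaming file hash. The helper names are hypothetical, and SHA-256 is an assumption: the report does not say which hash algorithm `source_refs` or the manifest use.

```python
import hashlib
import sqlite3


def check_database(conn):
    """Return (integrity_ok, foreign_key_violation_count) for an open connection."""
    # integrity_check yields a single row containing 'ok' when no problems are found.
    integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    # foreign_key_check yields one row per violating child row, even when
    # foreign-key enforcement is switched off for the connection.
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    return integrity == ["ok"], len(violations)


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

`PRAGMA integrity_check` does not report foreign-key errors, which is why the two checks are run and reported separately.
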
## Remaining Non-T1 Gaps

The T1 structural gap is closed, but the historical dataset is still not complete:

- T2 grey-market remains blocked for 291 listed tickers pending a reproducible source strategy.
- Open price-performance tasks remain for D1/D5/D20/D60.
- Context fields such as industry label, market cap, and net proceeds remain incomplete.

The v0 analysis dataset should be regenerated by `analyst` before using the new T1 demand facts for scoring or calibration.