Files
hk-ipo/reports/2026-06-15_t1_demand_text_backfill.md
geometrybase 6d05056609 Backfill structured T1 demand from archived text
Request:
- Use archivist to close the 137 T1 ipo_demand source-only gaps using extracted PDF text.

Changes:
- Add an incremental T1 demand text backfill script.
- Parse existing allotment-result extracted text into ipo_demand.
- Archive linked Summary PDFs from old HKEX HTML allotment-result pages.
- Correct allotment-result selection to prefer primary result announcements over clarification or supplemental notices.
- Add robust line-aware allotment parsing and document the workflow in archivist and README.
- Record the backfill result in a report.

Execution:
- Selected 137 source-only T1 demand gaps.
- Wrote 137 ipo_demand rows, increasing ipo_demand from 154 to 291 rows.
- Archived 38 new HKEX allotment-result PDFs and extracted their text.
- Confirmed an incremental rerun selects 0 gaps and writes 0 rows.

Verification:
- Ran git diff --cached --check.
- Ran py_compile for archive_hkex_documents.py and backfill_t1_demand_from_text.py.
- Checked SQLite integrity and foreign keys.
- Confirmed DB row counts match CSV snapshots.
- Verified no T1 complete row is missing ipo_demand.
- Verified source_refs paths/files/hashes and PDF extracted-text manifest hashes.

Next useful context:
- T1 demand structure is complete for listed rows; 06106 and 06675 remain pending_not_due.
- T2 grey-market and due price-performance gaps remain separate archivist priorities.
- Analyst output should be regenerated before using the new T1 demand facts for scoring.
2026-06-15 13:59:06 +00:00


T1 Demand Text Backfill

Run date: 2026-06-15
Archive mode: t1_demand_text_backfill
Target: T1 allotment rows with archived allotment-results sources but missing ipo_demand

Result

The T1 source-only demand gap was closed.

  • Initial source-only T1 demand gaps: 137.
  • ipo_demand rows before backfill: 154.
  • ipo_demand rows after backfill: 291.
  • T1 complete rows without ipo_demand: 0.
  • T1 pending-not-due rows: 2 (06106, 06675).
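
The gap definition used in this report (an archived allotment-results source exists, but no ipo_demand row does) can be sketched as an anti-join. This is a minimal illustration only: the table names come from the report, while the `ticker` and `doc_type` columns and the `'allotment_results'` label are assumptions, not the project's actual schema.

```python
import sqlite3


def select_source_only_gaps(conn):
    """Return tickers that have an allotment-results source but no ipo_demand row."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.ticker
        FROM source_refs AS s
        LEFT JOIN ipo_demand AS d ON d.ticker = s.ticker
        WHERE s.doc_type = 'allotment_results'  -- assumed label
          AND d.ticker IS NULL                  -- no demand row yet
        ORDER BY s.ticker
        """
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical miniature database for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_refs (ticker TEXT, doc_type TEXT);
    CREATE TABLE ipo_demand (ticker TEXT PRIMARY KEY);
    INSERT INTO source_refs VALUES
        ('00001', 'allotment_results'),
        ('00002', 'allotment_results'),
        ('00003', 'prospectus');
    INSERT INTO ipo_demand VALUES ('00001');
    """
)

gaps = select_source_only_gaps(conn)

# Backfill the selected gaps, then rerun: an incremental rerun should find nothing.
conn.executemany("INSERT INTO ipo_demand VALUES (?)", [(t,) for t in gaps])
gaps_after = select_source_only_gaps(conn)
```

Because the selection depends only on the current database state, rerunning it after a successful backfill naturally yields an empty list.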

Source Handling

  • Existing extracted PDF text supplied most of the backfill.
  • Old HKEX HTML allotment-result pages were followed to their linked Summary PDFs.
  • Where a clarification or supplemental notice had been selected as the allotment-results source, the primary allotment-results announcement from the same HKEX title-search window was archived and used instead.
  • New allotment-result PDF sources archived: 38.
  • PDF source refs after backfill: 595.
  • Extracted-text manifest rows after backfill: 595.
  • Extracted-text manifest status: 595 ok.
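
The preference for primary result announcements over clarification or supplemental notices could look roughly like the filter below. It is a sketch under stated assumptions: the keyword patterns and the `pick_primary` helper are hypothetical, and real HKEX headlines may be worded differently.

```python
import re

# Assumed marker words for secondary notices; not taken from the actual script.
SECONDARY = re.compile(r"\b(clarification|supplemental|supplementary)\b", re.IGNORECASE)
RESULTS = re.compile(r"\ballotment results\b", re.IGNORECASE)


def pick_primary(titles):
    """Return the first allotment-results title that is not a secondary notice.

    Returns None when only secondary notices are present, which signals that
    the primary announcement still has to be located and archived.
    """
    candidates = [t for t in titles if RESULTS.search(t)]
    primary = [t for t in candidates if not SECONDARY.search(t)]
    return primary[0] if primary else None


# Hypothetical headlines from one title-search window.
window = [
    "Clarification Announcement - Allotment Results",
    "Announcement of Allotment Results",
    "Supplemental Announcement",
]
chosen = pick_primary(window)
only_secondary = pick_primary(["Clarification Announcement - Allotment Results"])
```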

Field Policy

Only explicitly disclosed demand fields were stored.

No missing demand field was inferred from share counts or other derived calculations. For example, where a Summary PDF disclosed valid applications, public subscription, international placee count, and final share counts but omitted successful applicants or international subscription level, the omitted fields were left null.
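
A minimal sketch of this policy: a line-aware parser fills a field only when a line explicitly discloses it, and every other field stays `None`. The field names, label wording, and regular expressions are assumptions for illustration, not the patterns used by the backfill script.

```python
import re

# Hypothetical fields: (pattern matched against one stripped line, converter).
FIELDS = {
    "valid_applications": (
        re.compile(r"No\. of valid applications\s+([\d,]+)$", re.IGNORECASE), int),
    "successful_applicants": (
        re.compile(r"No\. of successful applications\s+([\d,]+)$", re.IGNORECASE), int),
    "public_subscription_level": (
        re.compile(r"Subscription level\s+([\d,]+(?:\.\d+)?) times$", re.IGNORECASE), float),
}


def parse_demand(text):
    """Extract explicitly disclosed fields; undisclosed fields remain None."""
    record = {name: None for name in FIELDS}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        for name, (pattern, convert) in FIELDS.items():
            match = pattern.match(line)
            if match and record[name] is None:  # keep the first disclosure only
                record[name] = convert(match.group(1).replace(",", ""))
    return record


# Hypothetical extracted text that omits the successful-applicant count.
sample_text = """
No. of valid applications 12,345
Subscription level 56.7 times
"""
parsed = parse_demand(sample_text)
```

Nothing in the parser derives one field from another, so an omitted disclosure can only ever surface as a null.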

Verification

  • SQLite integrity check: ok.
  • Foreign-key violations: 0.
  • DB row counts match CSV snapshots for ipo_master, offering_terms, ipo_demand, source_refs, sync_runs, ticker_sync_state, and sync_tasks.
  • source_refs: 1,187 rows, 0 bad paths, 0 missing files, 0 hash mismatches.
  • PDF manifest reconciliation: 595 PDF sources, 595 manifest rows, 0 missing manifest rows, 0 orphan manifest rows, 0 missing text files, 0 PDF hash mismatches, 0 text hash mismatches.
  • An incremental rerun after the backfill selected 0 source-only gaps and wrote 0 rows.
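
Checks of this kind can be reproduced with SQLite's built-in pragmas and a file-hash audit. The sketch below is illustrative: the helper names are hypothetical, and SHA-256 is an assumption, since the report does not name the hash algorithm.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def check_db(conn):
    """Return (integrity status, number of foreign-key violations)."""
    integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    return integrity, len(violations)


def sha256_file(path):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_refs(refs):
    """refs: iterable of (path, expected_hash). Returns (missing, mismatched)."""
    missing = mismatched = 0
    for path, expected in refs:
        if not Path(path).is_file():
            missing += 1
        elif sha256_file(path) != expected:
            mismatched += 1
    return missing, mismatched


# Demonstration on an empty in-memory database and a temporary file.
db_status = check_db(sqlite3.connect(":memory:"))

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"allotment results")
    good_hash = sha256_file(sample)
    audit = audit_refs([
        (sample, good_hash),                  # passes
        (sample, "0" * 64),                   # hash mismatch
        (Path(tmp) / "gone.pdf", good_hash),  # missing file
    ])
```

A clean run corresponds to a status of `ok`, zero foreign-key violations, and zero missing or mismatched files.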

Remaining Non-T1 Gaps

The T1 structural gap is closed, but the historical dataset is still incomplete:

  • T2 grey-market remains blocked for 291 listed tickers pending a reproducible source strategy.
  • Price-performance open tasks remain for D1/D5/D20/D60.
  • Context fields such as industry label, market cap, and net proceeds remain incomplete.

The v0 analysis dataset should be regenerated by analyst before using the new T1 demand facts for scoring or calibration.