Backfill structured T1 demand from archived text

Request:
- Use archivist to close the 137 T1 ipo_demand source-only gaps using extracted PDF text.

Changes:
- Add an incremental T1 demand text backfill script.
- Parse existing allotment-result extracted text into ipo_demand.
- Archive linked Summary PDFs from old HKEX HTML allotment-result pages.
- Correct allotment-result selection to prefer primary result announcements over clarification or supplemental notices.
- Add robust line-aware allotment parsing and document the workflow in archivist and README.
- Record the backfill result in a report.
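
The selection fix above (prefer the primary result announcement over clarification or supplemental notices) can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the function name, the keyword pattern, and the sample titles are all assumptions.

```python
import re

# Hypothetical markers for secondary notices; the commit's actual rules may differ.
SECONDARY_MARKERS = re.compile(r"clarification|supplement", re.IGNORECASE)


def pick_primary_result(titles):
    """Return the first title that does not look like a clarification or
    supplemental notice; fall back to the first title, or None if empty."""
    primary = [t for t in titles if not SECONDARY_MARKERS.search(t)]
    if primary:
        return primary[0]
    return titles[0] if titles else None
```

Falling back to the first candidate keeps a ticker selectable when only secondary notices are archived; whether the real script does this is not stated in the commit message.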

Execution:
- Selected 137 source-only T1 demand gaps.
- Wrote 137 ipo_demand rows, increasing ipo_demand from 154 to 291 rows.
- Archived 38 new HKEX allotment-result PDFs and extracted their text.
- Confirmed an incremental rerun selects 0 gaps and writes 0 rows.
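
The incremental behaviour reported above (a rerun selects 0 gaps and writes 0 rows) follows from selecting only tickers that have archived text but no ipo_demand row. A minimal sketch under assumed names: the source_text table, its body column, the demand_note column, and the parse callback are hypothetical; only the ipo_demand table name comes from this commit.

```python
import sqlite3


def select_gaps(conn):
    """Tickers with archived text but no ipo_demand row yet."""
    rows = conn.execute(
        """
        SELECT s.ticker
        FROM source_text AS s
        LEFT JOIN ipo_demand AS d ON d.ticker = s.ticker
        WHERE d.ticker IS NULL
        ORDER BY s.ticker
        """
    ).fetchall()
    return [r[0] for r in rows]


def backfill(conn, parse):
    """Write one ipo_demand row per gap; return (gaps selected, rows written)."""
    gaps = select_gaps(conn)
    written = 0
    for ticker in gaps:
        (body,) = conn.execute(
            "SELECT body FROM source_text WHERE ticker = ?", (ticker,)
        ).fetchone()
        value = parse(body)
        if value is None:
            continue  # unparseable text stays a gap for a later run
        conn.execute(
            "INSERT INTO ipo_demand (ticker, demand_note) VALUES (?, ?)",
            (ticker, value),
        )
        written += 1
    conn.commit()
    return len(gaps), written
```

Because the gap query is driven by what ipo_demand already contains, a second run over the same data has nothing left to select.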

Verification:
- Ran git diff --cached --check.
- Ran py_compile for archive_hkex_documents.py and backfill_t1_demand_from_text.py.
- Checked SQLite integrity and foreign keys.
- Confirmed DB row counts match CSV snapshots.
- Verified no T1 complete row is missing ipo_demand.
- Verified source_refs paths/files/hashes and PDF extracted-text manifest hashes.
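
The SQLite and hash checks listed above could be reproduced along these lines. This is a sketch rather than the commands actually run: PRAGMA integrity_check and PRAGMA foreign_key_check are standard SQLite pragmas, but the helper names are hypothetical and the hash algorithm (SHA-256 here) is an assumption, since the commit message does not name one.

```python
import hashlib
import sqlite3


def check_sqlite(conn):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    if integrity != ["ok"]:
        problems.extend(integrity)
    # foreign_key_check yields one row per violating child row.
    for row in conn.execute("PRAGMA foreign_key_check"):
        problems.append(f"foreign key violation: {tuple(row)}")
    return problems


def sha256_hex(data: bytes) -> str:
    """Hex digest to compare against a manifest entry (algorithm assumed)."""
    return hashlib.sha256(data).hexdigest()
```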

Next useful context:
- T1 demand structure is complete for listed rows; 06106 and 06675 remain pending_not_due.
- T2 grey-market and due price-performance gaps remain separate archivist priorities.
- Analyst output should be regenerated before using the new T1 demand facts for scoring.
2026-06-15 13:59:06 +00:00
parent 33d0bc056e
commit 6d05056609
88 changed files with 55785 additions and 2456 deletions
@@ -14,4 +14,8 @@ sync_state_20260615T100500Z,hkex_document_archive,2026-06-15T10:05:00Z,2026-06-1
 sync_state_20260615T110000Z,hkex_document_archive,2026-06-15T11:00:00Z,2026-06-15T11:00:00Z,2026-06-15T11:00:00Z,complete,Derived ticker sync state refreshed.
 sync_state_20260615T120000Z,hkex_document_archive,2026-06-15T12:00:00Z,2026-06-15T12:00:00Z,2026-06-15T12:00:00Z,complete,Derived ticker sync state refreshed.
 sync_state_20260615T120500Z,grey_market_gap_review,2026-06-15T12:05:00Z,2026-06-15T12:05:00Z,2026-06-15T12:05:00Z,complete,Derived ticker sync state refreshed.
+sync_state_20260615T140000Z,t1_demand_text_backfill,2026-06-15T14:00:00Z,2026-06-15T14:00:00Z,2026-06-15T14:00:00Z,complete,Derived ticker sync state refreshed.
+sync_state_20260615T141000Z,t1_demand_text_backfill,2026-06-15T14:10:00Z,2026-06-15T14:10:00Z,2026-06-15T14:10:00Z,complete,Derived ticker sync state refreshed.
+sync_state_20260615T141500Z,t1_demand_text_backfill,2026-06-15T14:15:00Z,2026-06-15T14:15:00Z,2026-06-15T14:15:00Z,complete,Derived ticker sync state refreshed.
+sync_state_20260615T142000Z,t1_demand_text_backfill,2026-06-15T14:20:00Z,2026-06-15T14:20:00Z,2026-06-15T14:20:00Z,complete,Derived ticker sync state refreshed.
 sync_state_seed_2026_06_15,bootstrap_state_refresh,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,complete,Derived ticker sync state refreshed.
CSV columns: sync_run_id, mode, as_of, started_at, finished_at, status, notes