Backfill structured T1 demand from archived text
Request:
- Use archivist to close the 137 T1 ipo_demand source-only gaps using extracted PDF text.

Changes:
- Add an incremental T1 demand text backfill script.
- Parse existing allotment-result extracted text into ipo_demand.
- Archive linked Summary PDFs from old HKEX HTML allotment-result pages.
- Correct allotment-result selection to prefer primary result announcements over clarification or supplemental notices.
- Add robust line-aware allotment parsing and document the workflow in archivist and README.
- Record the backfill result in a report.

Execution:
- Selected 137 source-only T1 demand gaps.
- Wrote 137 ipo_demand rows, increasing ipo_demand from 154 to 291 rows.
- Archived 38 new HKEX allotment-result PDFs and extracted their text.
- Confirmed an incremental rerun selects 0 gaps and writes 0 rows.

Verification:
- Ran git diff --cached --check.
- Ran py_compile for archive_hkex_documents.py and backfill_t1_demand_from_text.py.
- Checked SQLite integrity and foreign keys.
- Confirmed DB row counts match CSV snapshots.
- Verified no T1 complete row is missing ipo_demand.
- Verified source_refs paths/files/hashes and PDF extracted-text manifest hashes.

Next useful context:
- T1 demand structure is complete for listed rows; 06106 and 06675 remain pending_not_due.
- T2 grey-market and due price-performance gaps remain separate archivist priorities.
- Analyst output should be regenerated before using the new T1 demand facts for scoring.
@@ -123,6 +123,16 @@ The archiver maps stock codes to HKEXnews title-search stock IDs, downloads the
HKEX `.htm`/`.html` notices and Yahoo Finance JSON market data stay in `data/raw/`; they are not copied into `data/extracted_text/`.
## T1 Demand Text Backfill
Use the T1 demand text backfill after HKEX allotment-result sources have been archived and PDF text extraction is available:
```bash
.venv/bin/python scripts/backfill_t1_demand_from_text.py --as-of 2026-06-15T14:15:00Z
```
The backfill is incremental. It fills only `T1_allotment` rows that have an archived allotment-result source but no `ipo_demand` row. For old HKEX HTML allotment-result pages, it archives the linked Summary PDF, extracts text, records the new source, and stores only demand fields that are explicitly present.
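
The gap selection described above can be pictured as a query for rows that have a source but no demand record. The following is a minimal sketch, not the script's actual code: only the `ipo_demand` and `source_refs` table names and the `T1_allotment` value come from this repository, while the `checkpoints` table, every column name, the `allotment_results` document-type value, and the stock codes are hypothetical placeholders.

```python
import sqlite3


def select_t1_demand_gaps(conn):
    """Return stock codes with an archived allotment-result source but no ipo_demand row."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.stock_code
        FROM checkpoints AS c              -- hypothetical table
        JOIN source_refs AS s              -- columns are assumed
          ON s.stock_code = c.stock_code
         AND s.doc_type = 'allotment_results'
        LEFT JOIN ipo_demand AS d
          ON d.stock_code = c.stock_code
        WHERE c.checkpoint = 'T1_allotment'
          AND d.stock_code IS NULL         -- no demand row yet
        ORDER BY c.stock_code
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Build a toy in-memory database and run the selection twice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE checkpoints (stock_code TEXT, checkpoint TEXT);
        CREATE TABLE source_refs (stock_code TEXT, doc_type TEXT);
        CREATE TABLE ipo_demand (stock_code TEXT PRIMARY KEY);
        -- Placeholder codes: 00001 is a gap, 00002 is already filled,
        -- 00003 has no archived source and must be skipped.
        INSERT INTO checkpoints VALUES
            ('00001', 'T1_allotment'),
            ('00002', 'T1_allotment'),
            ('00003', 'T1_allotment');
        INSERT INTO source_refs VALUES
            ('00001', 'allotment_results'),
            ('00002', 'allotment_results');
        INSERT INTO ipo_demand VALUES ('00002');
        """
    )
    first = select_t1_demand_gaps(conn)
    for code in first:
        # Stand-in for parsing the extracted text and writing demand fields.
        conn.execute("INSERT INTO ipo_demand VALUES (?)", (code,))
    second = select_t1_demand_gaps(conn)
    conn.close()
    return first, second
```

Because the selection excludes any row that already has a demand record, a second pass over the same data finds nothing to do, which is the property an incremental rerun relies on.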
## Price Performance Backfill
Use the price-performance archiver to fill due D1/D5/D20/D60 review checkpoints: