Improve IPO archive gap handling

Request:
- Rework archivist handling for stubborn T0/T1 HKEX document gaps and unresolved T2 grey-market gaps.

Changes:
- Query HKEXnews titleSearchServlet with IPO-date windows instead of only the latest title-search page.
- Recognize SHARE OFFER listing documents and archive official HTML allotment-result notices when no PDF is published.
- Mark allotment results as source-only complete when the official notice is archived but structured demand parsing does not yet cover it.
- Add a reusable grey-market gap marker and archivist source policy for T2 data.
- Archive newly discovered HKEX raw sources, update SQLite, and refresh CSV snapshots.
- Treat raw evidence files as binary in Git attributes.
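
The IPO-date window idea in the first change can be sketched roughly as follows. This is a minimal illustration, not the code in archive_hkex_documents.py: the function name `ipo_date_windows`, the 45/14-day margins, and the 30-day span are all assumptions, and the titleSearchServlet request parameters are deliberately left out because this commit message does not show them.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def ipo_date_windows(
    listing_date: date,
    days_before: int = 45,   # assumed margin for prospectus-type documents
    days_after: int = 14,    # assumed margin for allotment-result notices
    max_span_days: int = 30, # assumed upper bound per search request
) -> Iterator[Tuple[date, date]]:
    """Yield contiguous, inclusive (start, end) windows around a listing date."""
    if max_span_days < 1:
        raise ValueError("max_span_days must be at least 1")
    start = listing_date - timedelta(days=days_before)
    end = listing_date + timedelta(days=days_after)
    cursor = start
    while cursor <= end:
        # Clamp the final window so it never runs past the overall end date.
        window_end = min(cursor + timedelta(days=max_span_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


if __name__ == "__main__":
    for window_start, window_end in ipo_date_windows(date(2026, 6, 15)):
        print(window_start.isoformat(), window_end.isoformat())
```

Each window would then drive one title search, so older IPOs are reachable even when their documents have dropped off the latest results page.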

Verification:
- Ran py_compile for archive_hkex_documents.py, update_sync_state.py, and mark_grey_market_gaps.py.
- Ran HKEX document archive backfill and grey-market gap marker.
- Checked SQLite integrity, foreign keys, source paths, source hashes, and DB-vs-snapshot row counts.
- Ran git diff --cached --check after marking raw archives binary.
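
For reference, the SQLite checks listed above could look something like the sketch below. It is an illustration under stated assumptions, not the script that was run: the table name `raw_sources`, its `path` and `sha256` columns, and all function names are hypothetical. Only the `PRAGMA integrity_check` and `PRAGMA foreign_key_check` statements are standard SQLite.

```python
import csv
import hashlib
import sqlite3
from pathlib import Path
from typing import List


def check_database(conn: sqlite3.Connection) -> List[str]:
    """Run SQLite's built-in integrity and foreign-key checks."""
    problems: List[str] = []
    rows = conn.execute("PRAGMA integrity_check").fetchall()
    if rows != [("ok",)]:
        problems.extend(f"integrity: {row[0]}" for row in rows)
    # foreign_key_check yields (table, rowid, parent, fkid) per violation.
    for row in conn.execute("PRAGMA foreign_key_check"):
        problems.append(f"foreign key violation in {row[0]} (rowid {row[1]})")
    return problems


def check_sources(conn: sqlite3.Connection, root: Path) -> List[str]:
    """Compare recorded paths and hashes with files on disk.

    Assumes a hypothetical table raw_sources(path, sha256).
    """
    problems: List[str] = []
    for rel_path, expected in conn.execute("SELECT path, sha256 FROM raw_sources"):
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"hash mismatch: {rel_path}")
    return problems


def snapshot_row_count_matches(
    conn: sqlite3.Connection, table: str, csv_path: Path
) -> bool:
    """Check that a CSV snapshot has as many data rows as its table."""
    # The table name is interpolated, so pass trusted names only.
    db_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    with csv_path.open(newline="", encoding="utf-8") as handle:
        csv_count = max(sum(1 for _ in csv.reader(handle)) - 1, 0)  # skip header
    return db_count == csv_count
```

Returning a list of problems rather than raising on the first failure lets one run report every broken path or hash at once.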

Next useful context:
- T0 is now complete for 293 tickers.
- T1 has 291 complete and 2 pending_not_due tickers.
- T2 has 291 blocked gaps pending an approved grey-market source strategy.
2026-06-15 09:47:36 +00:00
parent 078f56998b
commit 5f9546b16c
184 changed files with 1204626 additions and 2857 deletions
@@ -11,4 +11,7 @@ sync_state_20260615T085000Z,hkex_document_archive,2026-06-15T08:50:00Z,2026-06-1
sync_state_20260615T090000Z,hkex_document_archive,2026-06-15T09:00:00Z,2026-06-15T09:00:00Z,2026-06-15T09:00:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T100000Z,price_performance_archive,2026-06-15T10:00:00Z,2026-06-15T10:00:00Z,2026-06-15T10:00:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T100500Z,hkex_document_archive,2026-06-15T10:05:00Z,2026-06-15T10:05:00Z,2026-06-15T10:05:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T110000Z,hkex_document_archive,2026-06-15T11:00:00Z,2026-06-15T11:00:00Z,2026-06-15T11:00:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T120000Z,hkex_document_archive,2026-06-15T12:00:00Z,2026-06-15T12:00:00Z,2026-06-15T12:00:00Z,complete,Derived ticker sync state refreshed.
sync_state_20260615T120500Z,grey_market_gap_review,2026-06-15T12:05:00Z,2026-06-15T12:05:00Z,2026-06-15T12:05:00Z,complete,Derived ticker sync state refreshed.
sync_state_seed_2026_06_15,bootstrap_state_refresh,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,2026-06-15T06:30:00Z,complete,Derived ticker sync state refreshed.
Columns: sync_run_id, mode, as_of, started_at, finished_at, status, notes