3 Commits

Author SHA1 Message Date
geometrybase 5f9546b16c Improve IPO archive gap handling
Request:
- Rework archivist handling for stubborn T0/T1 HKEX document gaps and unresolved T2 grey-market gaps.

Changes:
- Query HKEXnews titleSearchServlet with IPO-date windows instead of only the latest title-search page.
- Recognize SHARE OFFER listing documents and archive official HTML allotment-result notices when no PDF is published.
- Mark source-only allotment completion clearly when structured demand parsing is not yet covered.
- Add a reusable grey-market gap marker and archivist source policy for T2 data.
- Archive newly discovered HKEX raw sources, update SQLite, and refresh CSV snapshots.
- Treat raw evidence files as binary in Git attributes.

Verification:
- Ran py_compile for archive_hkex_documents.py, update_sync_state.py, and mark_grey_market_gaps.py.
- Ran HKEX document archive backfill and grey-market gap marker.
- Checked SQLite integrity, foreign keys, source paths, source hashes, and DB-to-snapshot row counts.
- Ran git diff --cached --check after marking raw archives binary.
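
The SQLite integrity and foreign-key checks listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual verification code: it relies only on Python's sqlite3 module and SQLite's built-in PRAGMA integrity_check and PRAGMA foreign_key_check, and the function name check_database is hypothetical.

```python
import sqlite3


def check_database(conn):
    """Return (integrity_ok, fk_violations) for an open SQLite connection.

    check_database is a hypothetical helper name used for illustration only.
    """
    # PRAGMA integrity_check yields a single row containing "ok" when the
    # database file is structurally sound; otherwise it lists the problems.
    integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    # PRAGMA foreign_key_check yields one row per child row whose parent is
    # missing; an empty result means no foreign-key violations.
    fk_violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    return integrity == ["ok"], fk_violations
```

Each violation row starts with the name of the child table, which makes it easy to report which table holds the orphaned rows.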

Next useful context:
- T0 is now complete for 293 tickers.
- T1 has 291 complete and 2 pending_not_due tickers.
- T2 has 291 blocked gaps pending an approved grey-market source strategy.
2026-06-15 09:47:36 +00:00
geometrybase c65b20a1c4 Archive recent HKEX IPO targets
Request:
- Use the project archivist workflow to update IPO target coverage for the most recent three-year window.

Changes:
- Add scripts/update_recent_ipo_list.py to discover HKEXnews annual new listing reports, archive XLSX sources, parse subscription-relevant IPO rows, and update SQLite plus snapshots.
- Add new_listing_report_entries to preserve annual report row-level evidence.
- Archive 2023-2026 Main Board new listing reports and 2024-2026 GEM new listing reports.
- Seed 290 report-backed IPO targets for 2023-06-15 through 2026-06-15, skipping 10 non-IPO rows without numeric offer prices.
- Refresh ipo_master, missing offering_terms fields, source_refs, ticker_sync_state, and sync_tasks.
- Add openpyxl as the XLSX parser dependency and document the archivist refresh flow.
- Limit sync summary output while keeping the full queue in SQLite and CSV snapshots.

Verification:
- Ran update_recent_ipo_list.py for 2023-06-15 to 2026-06-15 with as-of 2026-06-15T07:30:00Z.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, files exist, and SHA-256 hashes match.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.
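
The source_refs check above (repo-relative path, file exists, SHA-256 matches) could look something like the sketch below. It is an assumption-laden illustration rather than the script that was run: the function name verify_source_ref and its arguments are hypothetical, and the actual source_refs column layout is not shown in this log.

```python
import hashlib
from pathlib import Path, PurePosixPath


def verify_source_ref(repo_root, rel_path, expected_sha256):
    """Return a list of problems for one source reference (empty if it passes).

    Hypothetical helper: the name and parameters are illustrative only.
    """
    pure = PurePosixPath(rel_path)
    # A repo-relative path must not be absolute and must not climb out of
    # the repository with ".." components.
    if pure.is_absolute() or ".." in pure.parts:
        return ["path is not repo-relative"]
    full_path = Path(repo_root) / rel_path
    if not full_path.is_file():
        return ["file does not exist"]
    # Hash the raw bytes so binary evidence files are compared exactly.
    digest = hashlib.sha256(full_path.read_bytes()).hexdigest()
    if digest != expected_sha256.lower():
        return ["SHA-256 mismatch"]
    return []
```

Returning a list of problems instead of raising lets a caller collect every failing reference in one pass.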

Next useful context:
- ipo_master now has 293 tickers; new_listing_report_entries has 290 report-backed targets.
- Current sync queue has 2005 open tasks and 42 waiting_until_due tasks for deeper per-ticker archival stages.
2026-06-15 06:42:31 +00:00
geometrybase 08db218b6d Add archivist incremental sync state
Request:
- Add archivist support for remembering which IPO archive stages have already been synced and which stages should be updated next.

Changes:
- Add sync_runs, ticker_sync_state, sync_tasks, and price_performance tables to the archive schema.
- Add scripts/update_sync_state.py to derive per-ticker stage status and rebuild the next-sync task queue.
- Export the new sync-state tables as Git-friendly CSV snapshots.
- Document the incremental archive flow in the archivist skill and README.

Verification:
- Ran scripts/bootstrap_historical_data.py.
- Ran scripts/update_sync_state.py with a deterministic as-of timestamp.
- Checked SQLite integrity and DB-to-snapshot row counts with Python sqlite3.
- Parsed Python scripts with ast.parse.
- Ran git diff --check and checked for temporary SQLite/cache files.
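
The DB-to-snapshot row-count check above can be approximated as shown below. This is a sketch under stated assumptions: each CSV snapshot is assumed to carry exactly one header row and one line per table row, and the helper names (count_table_rows, count_csv_rows, snapshot_matches) are hypothetical rather than taken from the project's scripts.

```python
import csv
import sqlite3


def count_table_rows(conn, table):
    """Count rows in one SQLite table."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


def count_csv_rows(csv_path):
    """Count data rows in a CSV snapshot, assuming a single header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the assumed header row
        return sum(1 for _ in reader)


def snapshot_matches(conn, table, csv_path):
    """True when the table and its CSV snapshot have the same row count."""
    return count_table_rows(conn, table) == count_csv_rows(csv_path)
```

Using csv.reader instead of counting newlines keeps the count correct when a quoted field contains embedded line breaks.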

Next useful context:
- Current derived queue has 2 open tasks for 06658 and 15 waiting_until_due tasks for future stages.
2026-06-15 06:29:54 +00:00