Request:
- Run the scheduled HK IPO analyst refresh as of 2026-06-23T23:00:03Z: refresh online archive facts first, rebuild the analysis dataset, write the latest Chinese broad candidate report, mirror it to reports/README.md, and preserve stage discipline.
Changes:
- Refreshed HKEX current-listing pages, current HKEX document searches, VBKR/Jieli T0.95 market heat, ipohk external history, and A/H quote evidence at the new as-of timestamp.
- Preserved unofficial subscription heat only in ipo_market_heat and kept official T1 demand fields sourced from HKEX allotment-result documents.
- Marked T2 grey-market source-strategy gaps for the tickers whose T2 stage is due or has already elapsed in Hong Kong time: 01392, 02335, 02667, 06067, 06106, 06132, and 06675.
- Backfilled 06675 stock_short_name from archived Yahoo shortName evidence and rebuilt analysis_model_v0_dataset.csv.
- Updated reports/2026-06-23_latest_ipo_candidates_analysis.md and mirrored the same content to reports/README.md with actionable ranking, fundamentals, unresolved-D1 risk/reward coverage, closed/waiting names, A/H overlay, recent 30-day review, guardrails, and sources.
Verification:
- git diff --check
- Rebuilt analysis dataset for 2026-06-23T23:00:03Z
- Python check that reports/README.md matches the dated report and required current/recent tickers are present
- Python check that 23:00Z heat has 8 T0_95_final_heat rows and active heat tickers have no official ipo_demand rows
- Python check that 02335 and 06106 official T1 fields match HKEX allotment results
- Python check that 77 source refs archived at 2026-06-23T23:00:03Z use repo-relative paths, files exist, and hashes match
- Python check that T2 data gaps are marked for 01392, 02335, 02667, 06067, 06106, 06132, and 06675
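The mirror check listed above could look roughly like the sketch below. It is an illustration only, not the script that was run: the function name check_mirror and the way tickers are passed in are assumptions, and the check simply compares the two files as text and looks for each ticker string in the report.

```python
from pathlib import Path


def check_mirror(report, mirror, required_tickers):
    """Return a list of problems; an empty list means the check passed.

    Hypothetical helper: compares the dated report with its mirror and
    confirms that every required ticker string appears in the report.
    """
    problems = []
    report_text = Path(report).read_text(encoding="utf-8")
    mirror_text = Path(mirror).read_text(encoding="utf-8")
    if report_text != mirror_text:
        problems.append(f"{mirror} differs from {report}")
    for ticker in required_tickers:
        if ticker not in report_text:
            problems.append(f"ticker {ticker} missing from {report}")
    return problems
```

A call such as check_mirror("reports/2026-06-23_latest_ipo_candidates_analysis.md", "reports/README.md", ["02335", "06106"]) would return an empty list when both conditions hold.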
Next useful context:
- VBKR/Jieli heat values for the 8 still-actionable names were unchanged from 15:00Z but now have a fresh 23:00Z observed_at.
- 02335 and 06106 have official T1 demand, but D1 had not opened by the report as-of time; T2 remains a source-strategy data_gap.
- 00901 Yahoo D1 fetch still returns 404; ipohk remains only a third-party cross-check.
Request:
- Rework archivist handling for stubborn T0/T1 HKEX document gaps and unresolved T2 grey-market gaps.
Changes:
- Query HKEXnews titleSearchServlet with IPO-date windows instead of only the latest title-search page.
- Recognize SHARE OFFER listing documents and archive official HTML allotment-result notices when no PDF is published.
- Mark source-only allotment completion clearly when structured demand parsing is not yet covered.
- Add a reusable grey-market gap marker and archivist source policy for T2 data.
- Archive newly discovered HKEX raw sources, update SQLite, and refresh CSV snapshots.
- Treat raw evidence files as binary in Git attributes.
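The IPO-date windowing idea can be sketched as below. This is a minimal illustration under stated assumptions: the function name search_windows and the default look-back, look-ahead, and window-size values are hypothetical and are not taken from archive_hkex_documents.py. The sketch only computes the date ranges; how they are passed to the HKEXnews title search is left out on purpose.

```python
from datetime import date, timedelta


def search_windows(listing_date, days_before=45, days_after=10, max_span_days=14):
    """Split the period around a listing date into consecutive date windows.

    All defaults are illustrative assumptions. Each window is an inclusive
    (start, end) pair covering at most max_span_days days.
    """
    start = listing_date - timedelta(days=days_before)
    end = listing_date + timedelta(days=days_after)
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=max_span_days - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows
```

Searching per window rather than reading only the latest results page means an older prospectus or allotment notice is still found after newer filings have pushed it off that page.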
Verification:
- Ran py_compile for archive_hkex_documents.py, update_sync_state.py, and mark_grey_market_gaps.py.
- Ran HKEX document archive backfill and grey-market gap marker.
- Checked SQLite integrity, foreign keys, source paths, source hashes, and DB-vs-snapshot row counts.
- Ran git diff --cached --check after marking raw archives binary.
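The SQLite checks above can be approximated with the standard sqlite3 module. The sketch below is an assumption about how such a check might be written, not the command that was run: the helper name check_database and the convention that each table has a snapshot named <table>.csv with one header row are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def check_database(db_path, snapshot_dir, tables):
    """Return a list of problems found in the database and its CSV snapshots."""
    problems = []
    conn = sqlite3.connect(db_path)
    try:
        # integrity_check returns a single row ("ok",) for a healthy database.
        result = conn.execute("PRAGMA integrity_check").fetchall()
        if result != [("ok",)]:
            problems.append(f"integrity_check reported: {result}")
        # foreign_key_check returns one row per violating row, none if clean.
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        if violations:
            problems.append(f"{len(violations)} foreign key violation(s)")
        for table in tables:
            db_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            snapshot = Path(snapshot_dir) / f"{table}.csv"  # assumed naming
            with snapshot.open(newline="", encoding="utf-8") as fh:
                csv_rows = sum(1 for _ in csv.reader(fh)) - 1  # exclude header
            if db_rows != csv_rows:
                problems.append(f"{table}: {db_rows} DB rows vs {csv_rows} snapshot rows")
    finally:
        conn.close()
    return problems
```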
Next useful context:
- T0 is now complete for 293 tickers.
- T1 has 291 complete and 2 pending_not_due tickers.
- T2 has 291 blocked gaps pending an approved grey-market source strategy.
Request:
- Provide a way to install or develop a PDF extraction tool for archived HK IPO documents.
Changes:
- Add requirements.txt with pypdf as the lightweight PDF text extraction dependency.
- Add scripts/extract_pdf_text.py to extract text from PDF source_refs into repo-relative data/extracted_text files.
- Add extracted text outputs and an extracted_text_manifest snapshot for the six archived HKEXnews PDFs.
- Document the extraction workflow in README.md.
- Ignore .venv and keep generated SQLite/Python transient files out of git.
- Use extracted text to verify the 06106 full prospectus, update source_refs, remove the related data gap, and fill 06106 offering terms.
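For orientation, a pypdf-based extraction step might look like the sketch below. It is not the contents of scripts/extract_pdf_text.py: the function names and the rule for mapping data/raw paths onto data/extracted_text paths are assumptions made for illustration. Only PdfReader, its pages attribute, and extract_text() are pypdf calls.

```python
from pathlib import Path


def extracted_text_path(pdf_rel_path, raw_root="data/raw", text_root="data/extracted_text"):
    """Map a repo-relative PDF path to a repo-relative .txt path (assumed layout)."""
    rel = Path(pdf_rel_path).relative_to(raw_root)
    return (Path(text_root) / rel).with_suffix(".txt")


def extract_pdf_text(pdf_path):
    """Return the text of every page, joined with newlines."""
    from pypdf import PdfReader  # imported lazily so the path helper works without pypdf

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield an empty result for image-only pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)
```

Keeping the output path repo-relative means the extracted text manifest stays portable across machines, in the same way as the source_refs paths.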
Verification:
- Installed python3.14-venv system support, created a local .venv, and installed requirements.txt.
- Re-ran scripts/bootstrap_historical_data.py and scripts/extract_pdf_text.py.
- Verified extracted text paths and hashes against data/snapshots/extracted_text_manifest.csv.
- Verified SQLite integrity and snapshot row counts.
- Ran git diff --cached --check and searched durable files for machine-specific absolute paths.
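The search for machine-specific absolute paths could be done with a small scanner like the one below. This is a sketch under assumptions: the pattern list, the scanned file suffixes, and the skipped directories are illustrative choices, not the exact search that was run.

```python
import re
from pathlib import Path

# Assumed markers of machine-specific paths: common home prefixes and Windows drives.
ABSOLUTE_PATH_RE = re.compile(r"(?:/home/|/Users/|/root/|\b[A-Za-z]:\\)")


def find_absolute_paths(root, suffixes=(".csv", ".md", ".sql", ".py")):
    """Return (relative_file, line_number) pairs for lines that look machine-specific."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        rel = path.relative_to(root)
        if any(part in {".git", ".venv"} for part in rel.parts):
            continue  # skip tool-managed directories
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if ABSOLUTE_PATH_RE.search(line):
                hits.append((rel.as_posix(), lineno))
    return hits
```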
Request:
- Use the project archivist workflow to update historical IPO data.
Changes:
- Add an embedded SQLite archive at data/hk_ipo.sqlite.
- Add schema/hk_ipo.schema.sql and scripts/bootstrap_historical_data.py for reproducible archive generation.
- Archive HKEXnews source PDFs for 06658, 06675, and 06106 under repo-relative data/raw paths.
- Export Git-friendly snapshots for ipo_master, offering_terms, ipo_demand, source_refs, and data_gaps.
- Add .gitignore rules for Python cache and SQLite transient files.
Verification:
- Re-ran the bootstrap script successfully.
- Ran PRAGMA integrity_check on the SQLite database.
- Verified source_refs paths are repo-relative, files exist, and SHA-256 hashes match.
- Verified snapshot row counts match SQLite table counts.
- Ran git diff --check and searched generated durable files for machine-specific absolute paths.
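The source_refs verification (repo-relative paths, files exist, SHA-256 hashes match) can be sketched as follows. The helper name and the CSV column names path and sha256 are assumptions; the real source_refs snapshot may use different column names.

```python
import csv
import hashlib
from pathlib import Path


def verify_source_refs(repo_root, manifest_csv, path_col="path", hash_col="sha256"):
    """Return a list of problems for the rows of a source_refs-style CSV snapshot."""
    problems = []
    root = Path(repo_root)
    with (root / manifest_csv).open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            rel = Path(row[path_col])
            # Reject absolute paths and paths that climb out of the repository.
            if rel.is_absolute() or ".." in rel.parts:
                problems.append(f"not repo-relative: {row[path_col]}")
                continue
            target = root / rel
            if not target.is_file():
                problems.append(f"missing file: {row[path_col]}")
                continue
            digest = hashlib.sha256(target.read_bytes()).hexdigest()
            if digest != row[hash_col].strip().lower():
                problems.append(f"hash mismatch: {row[path_col]}")
    return problems
```

An empty return value corresponds to all three checks passing for every row.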