Request:
- Run the scheduled HK IPO analyst refresh as of 2026-06-23T15:00:19Z.
- Refresh online archive facts first, then rebuild the analysis dataset.
- Write the latest Chinese broad candidate report, mirror it to reports/README.md, and preserve stage discipline.
Changes:
- Refreshed HKEX current-listing pages, VBKR/Jieli T0.95 market heat, ipohk external history, A/H quote evidence, and current HKEX document searches.
- Archived official HKEX allotment-result PDFs and extracted text for 02335 and 06106; parsed official T1 demand into ipo_demand without copying market heat into official fields.
- Rebuilt analysis_model_v0_dataset.csv and refreshed sync/source snapshots.
- Updated reports/2026-06-23_latest_ipo_candidates_analysis.md and mirrored the same content to reports/README.md, including current ranking, fundamentals, unresolved-D1 risk/reward table, closed/waiting names, 30-day review, guardrails, and sources.
Verification:
- git diff --check
- Rebuilt analysis dataset for 2026-06-23T15:00:19Z
- Python check that reports/README.md matches the dated report and required new facts are present
- Python check that 15:00Z heat has 8 ipo_market_heat rows and current actionable names have no official ipo_demand rows
- Python check that 02335 and 06106 official T1 fields match HKEX allotment results
- Python check that 77 source refs archived at 2026-06-23T15:00:19Z use repo-relative paths, files exist, and hashes match
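A minimal sketch of the kind of source-ref check described above, assuming each reference carries a repo-relative path and an expected SHA-256 digest. The function name and its arguments are hypothetical and are not taken from the repository's scripts.

```python
import hashlib
from pathlib import Path


def check_source_ref(repo_root: Path, rel_path: str, expected_sha256: str) -> list[str]:
    """Return a list of problems for one source reference (empty list means ok)."""
    path = Path(rel_path)
    # A repo-relative path must not be absolute and must not climb out of the repo.
    if path.is_absolute() or ".." in path.parts:
        return ["path is not repo-relative"]
    full = repo_root / path
    if not full.is_file():
        return ["file is missing"]
    # Hash the archived bytes and compare against the recorded digest.
    digest = hashlib.sha256(full.read_bytes()).hexdigest()
    if digest != expected_sha256.lower():
        return ["sha256 mismatch"]
    return []
```

Returning a list of problems rather than a boolean keeps the three failure modes (path, existence, hash) distinguishable in a summary.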
Next useful context:
- 02335 and 06106 now have official T1 demand, but D1/T2 remain data_gap until listing-day evidence is archived.
- 00901 Yahoo D1 fetch still returns 404; ipohk remains only a third-party cross-check.
Request:
- Run the scheduled hk-ipo-analyst refresh as of 2026-06-23T07:00:25Z.
- Refresh the latest IPO candidate universe and online facts through the archivist before analysis.
- Rebuild the analysis dataset and publish the latest broad candidate report in Simplified Chinese.
Changes:
- Archived the HKEX current new-listing page for 2026-06-23 and added the new 00668 prospectus plus extracted text.
- Archived a fresh VBKR/Jieli T0.95 market-heat snapshot with 8 still-actionable rows while leaving same-day closed names out of the live T0.95 set.
- Archived 06067 and 06132 D1 price-performance responses, refreshed ipohk external history, and archived the 06067 2026-06-23 allotment-results clarification PDF.
- Rebuilt data/snapshots/analysis_model_v0_dataset.csv after the archive refresh.
- Wrote reports/2026-06-23_latest_ipo_candidates_analysis.md and mirrored the same content to reports/README.md.
Verification:
- Ran archive_hkex_current_new_listings.py, archive_hkex_documents.py, backfill_t1_demand_from_text.py, archive_t0_5_market_heat.py, archive_price_performance.py, archive_ipohk_history.py, extract_pdf_text.py, update_sync_state.py, and build_analysis_dataset.py with as-of 2026-06-23T07:00:25Z.
- Confirmed reports/README.md matches the dated report with cmp.
- Ran git diff --check and git diff --cached --check.
- Checked source_refs paths are repo-relative, existing, and hash-matching.
- Checked the latest 8 T0.95 live heat rows remain separate from official T1 demand rows.
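The separation check above could be sketched as a single query. The table names ipo_market_heat and ipo_demand come from this log; the column names (ticker, as_of) are assumptions for illustration and may not match the real schema.

```python
import sqlite3


def tickers_with_heat_and_official_demand(conn: sqlite3.Connection, as_of: str) -> list[str]:
    """List tickers in the given heat snapshot that also have an official demand row.

    For a snapshot of still-open offers, an empty result means no official T1
    demand exists yet for any name that still has live heat.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT h.ticker
        FROM ipo_market_heat AS h
        JOIN ipo_demand AS d ON d.ticker = h.ticker
        WHERE h.as_of = ?
        ORDER BY h.ticker
        """,
        (as_of,),
    ).fetchall()
    return [row[0] for row in rows]
```

An empty list is the expected outcome when live heat and official demand are kept apart.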
Next useful context:
- As of 2026-06-23T07:00:25Z, 02335 and 06106 still have no archived official T1 demand rows and remain T1 data_gap names.
- 06067 D1 was positive at about +30.0%, while 06132 D1 was negative at about -49.6%, reinforcing the 18A-B risk guardrail.
- The 2026-06-23 HKEX current page shows 00668 as a new live candidate; its first VBKR/Jieli heat snapshot was only 0.32x.
Request:
- Run the scheduled hk-ipo-analyst refresh as of 2026-06-22T15:47:32Z.
- Refresh the IPO candidate universe and network facts through the archivist before analysis.
- Rebuild the analysis dataset and publish the latest broad candidate report in Simplified Chinese.
Changes:
- Archived the HKEX current new-listing page, new official allotment-result PDFs for 06067 and 06132, extracted text, a fresh VBKR/Jieli T0.95 market-heat snapshot, ipohk external history, and recent Yahoo price-performance responses.
- Updated structured SQLite facts and CSV snapshots, including official T1 demand for 06067 and 06132 while keeping live subscription heat in ipo_market_heat.
- Rebuilt data/snapshots/analysis_model_v0_dataset.csv after the archive refresh.
- Rewrote reports/2026-06-22_latest_ipo_candidates_analysis.md and mirrored the same content to reports/README.md.
Verification:
- Ran archive_hkex_current_new_listings.py, archive_hkex_documents.py, backfill_t1_demand_from_text.py, archive_t0_5_market_heat.py, archive_price_performance.py, archive_ipohk_history.py, and build_analysis_dataset.py with as-of 2026-06-22T15:47:32Z.
- Confirmed reports/README.md matches the dated report with cmp.
- Ran git diff --check and git diff --cached --check.
- Checked source_refs paths are repo-relative and existing.
- Checked the latest 13 T0.95 live heat rows remain separate from official T1 demand rows.
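For environments without cmp, the README mirror check above has a rough Python equivalent. This is a sketch using the standard-library filecmp module; the function name is hypothetical.

```python
import filecmp
from pathlib import Path


def mirror_matches(dated_report: Path, mirror: Path) -> bool:
    """Return True when the mirror is byte-identical to the dated report."""
    # shallow=False forces a content comparison instead of trusting os.stat() metadata.
    return filecmp.cmp(dated_report, mirror, shallow=False)
```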
Next useful context:
- 06067 and 06132 now have official T1 demand in the archive; 06106 and 02335 remain T1 data_gap names as of this run.
- The 15:47Z VBKR/Jieli live heat values matched the earlier 13:57Z values for active candidates.
- Price refresh still has provider gaps for some historical tickers, including internal D1 price data for 00901.
Request:
- Update the latest Hong Kong IPO candidate list and rescore it based on subscription multiples.
Changes:
- Archived the 2026-06-22 HKEX Main Board New Listing Information page, adding 02697, 03952, 06715, and 06915 to the current candidate set.
- Archived and extracted the four new prospectuses, refreshed current HKEX document facts, and rebuilt the v0 analysis dataset to 311 rows.
- Archived a 2026-06-22T08:55:00Z VBKR/Jieli market-heat snapshot and wrote only still-actionable T0.95 rows to avoid look-ahead leakage for already-closed IPOs.
- Improved prospectus date parsing for split weekday/month text, glued noon/commence phrases, and current new-listing expected listing-date updates.
- Added a Chinese 2026-06-22 latest IPO report ranking candidates after the subscription-multiple overlay.
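One way to express the look-ahead guard mentioned above is to keep only heat rows whose order cutoff is still in the future at snapshot time. This is a sketch under that assumption; the order_cutoff field name and the row shape are hypothetical.

```python
from datetime import datetime, timezone


def still_actionable(rows: list[dict], as_of: datetime) -> list[dict]:
    """Keep rows whose order cutoff is strictly after the snapshot timestamp.

    Rows for already-closed IPOs are dropped so that heat observed after the
    cutoff cannot leak into a pre-cutoff feature.
    """
    return [row for row in rows if row["order_cutoff"] > as_of]


# Usage with made-up rows (not real archive data):
snapshot_time = datetime(2026, 6, 22, 8, 55, tzinfo=timezone.utc)
example_rows = [
    {"ticker": "OPEN", "order_cutoff": datetime(2026, 6, 25, 4, 0, tzinfo=timezone.utc)},
    {"ticker": "CLOSED", "order_cutoff": datetime(2026, 6, 19, 4, 0, tzinfo=timezone.utc)},
]
kept = still_actionable(example_rows, snapshot_time)
```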
Verification:
- Ran py_compile for archive_hkex_documents.py, archive_t0_5_market_heat.py, archive_hkex_current_new_listings.py, and build_analysis_dataset.py.
- Re-ran HKEX current-page seeding, document archiving, market-heat archiving, and analysis dataset build as of 2026-06-22T08:55:00Z.
- Ran git diff --check and git diff --cached --check.
- Ran SQLite integrity_check and foreign_key_check.
- Verified source_refs paths, file existence, and SHA-256 hashes.
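The two SQLite checks above can be run from Python with the standard sqlite3 module. A minimal sketch follows; the function name is hypothetical, while the PRAGMA statements are standard SQLite.

```python
import sqlite3


def check_sqlite_health(conn: sqlite3.Connection) -> tuple[bool, list[tuple]]:
    """Run integrity_check and foreign_key_check on an open connection.

    Returns (integrity_ok, fk_violations). integrity_check yields a single
    'ok' row for a healthy database; foreign_key_check yields one row per
    violating child row and no rows when all references resolve.
    """
    integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    fk_violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    return integrity == ["ok"], fk_violations
```

foreign_key_check reports violations even when foreign-key enforcement is switched off for the connection, which makes it useful as a post-hoc audit.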
Next useful context:
- 01956 is the only current candidate with both strong T0 structure and >100x actionable heat in this snapshot.
- Recheck 03952 and 06715 near the 2026-06-25 cutoff; their structure is strong but 2026-06-22 heat is below 10x.
- Official T1 allotment facts for 06067 and 06132 were still unavailable at this archive timestamp.
Request:
- Use the analyst workflow to analyze the latest Hong Kong IPOs, connect their source data, and produce a current report.
Changes:
- Added a current HKEX New Listing Information page seeder that archives the official page, seeds visible tickers, and records source_refs.
- Archived current HKEX prospectus and allotment-result sources for the 16 visible Main Board candidates and extracted their text.
- Extended prospectus parsing for offer price, derived gross proceeds, HDR offerings, and listing-date text extracted with split characters.
- Rebuilt the analysis dataset and added a Chinese 2026-06-21 latest IPO report separating live T0 watchlist names from past-cutoff T1/D1 candidates.
Verification:
- Ran py_compile for update_recent_ipo_list.py, archive_hkex_current_new_listings.py, archive_hkex_documents.py, and build_analysis_dataset.py.
- Re-ran HKEX current page seeding, document archiving, and analysis dataset build as of 2026-06-21T08:44:59Z.
- Ran git diff --check and git diff --cached --check.
- Ran SQLite integrity_check and foreign_key_check.
- Verified source_refs paths, file existence, SHA-256 hashes, and report source paths.
Next useful context:
- Capture T0.95 market heat ahead of the 2026-06-23 and 2026-06-24 order cutoffs, and only then convert the new watchlist into execution calls.
- Treat 02667 as a stale/special HKEX page item until a fresh June timetable or official result appears.
Request:
- Use the project analyst workflow to analyze the latest upcoming Hong Kong IPO candidates.
Changes:
- Refreshed recent HK IPO target coverage through 2026-06-17 and archived current HKEX source updates.
- Archived 06675 allotment results and D1 Yahoo price performance for boundary-case review.
- Archived a 2026-06-17 T0.5 VBKR/Jieli market-heat snapshot for still-actionable 02335 and 06106.
- Rebuilt the v0 analysis dataset and snapshots at 2026-06-17T08:20:00Z.
- Added a Chinese horizontal analyst report ranking 06106, 02335, 06132, 06067, and 01392, with 06675 separated as a T1/D1 review sample.
Verification:
- Ran SQLite PRAGMA integrity_check and foreign_key_check.
- Ran git diff --check and git diff --cached --check.
- Confirmed report source paths exist.
Next useful context:
- 06106 is the top still-actionable T0.5 candidate at this as-of time.
- 02335 needs another pre-deadline heat sample before a stronger call.
- 01392, 06067, and 06132 are now mainly waiting for T1 official allotment results.
Request:
- Combine the currently selected T0 IPO reports into one cross-sectional analysis report.
Changes:
- Add a Chinese horizontal T0 report comparing 01392, 02335, 06067, 06106, and 06132.
- Rank the selected IPOs by the current T0 model and short-exit discipline focused on T2/D1 selling.
- Backfill 02335's Chinese company name from its Chinese HKEX prospectus and archive the source PDF plus extracted text.
- Refresh the v0 analysis dataset and sync-state snapshots at 2026-06-15T18:20:00Z.
Verification:
- .venv/bin/python -m py_compile scripts/build_analysis_dataset.py scripts/generate_ipo_report.py scripts/extract_pdf_text.py scripts/update_sync_state.py
- Python sqlite3 PRAGMA integrity_check returned ok and foreign_key_check returned zero rows.
- Confirmed 02335 Chinese source_ref, extracted text manifest row, and selected horizontal report content.
- git diff --cached --check
Next useful context:
- Untracked PDF exports of individual reports and the horizontal report were left out of this focused commit.
Request:
- Generate an analyst report for HK IPO ticker 02335.
Changes:
- Archived the official HKEXnews 02335 prospectus PDF and extracted text under project-relative data paths.
- Seeded 02335 T0 prospectus facts, source references, sync state, and analysis snapshots.
- Generated reports/2026-06-15_02335_T0_prospectus_analysis.md in Simplified Chinese with concrete T0/T1/T2/D1 dates and short-exit T2/D1 discipline.
- Made PDF text extraction tolerant of invalid Unicode surrogate characters emitted by pypdf.
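A lone surrogate code point in extracted text makes a plain UTF-8 write fail with UnicodeEncodeError. One common way to tolerate it is to round-trip the string with a replacing error handler. This is a sketch of that technique, not necessarily what extract_pdf_text.py does; the function name is hypothetical.

```python
def sanitize_surrogates(text: str) -> str:
    """Replace code points that cannot be encoded as UTF-8 (lone surrogates) with '?'."""
    # errors="replace" substitutes '?' for each unencodable character on encode.
    return text.encode("utf-8", errors="replace").decode("utf-8")
```

Other handlers such as errors="ignore" would drop the character instead; replacing keeps a visible marker where the extraction was lossy.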
Verification:
- Compiled archive_hkex_documents.py, generate_ipo_report.py, build_analysis_dataset.py, extract_pdf_text.py, and update_sync_state.py.
- Ran SQLite integrity_check and foreign_key_check.
- Verified the archived 02335 PDF hash, extracted-text manifest row, and analysis dataset row.
- Ran git diff --check.
Next useful context:
- 02335 is currently T0_prospectus; T1_allotment is pending for 2026-06-23.
Request:
- Generate an analyst report for HK IPO ticker 06067.
Changes:
- Archived the official HKEXnews 06067 prospectus PDF and extracted text under project-relative data paths.
- Seeded 06067 T0 prospectus facts, source references, sync state, and analysis snapshots.
- Generated reports/2026-06-15_06067_T0_prospectus_analysis.md in Simplified Chinese with concrete T0/T1/T2/D1 dates and short-exit T2/D1 discipline.
- Updated the HKEX document archiver so over-allotment shares are only recorded when the prospectus supports them, with explicit no-option cases stored as zero.
Verification:
- Compiled archive_hkex_documents.py, generate_ipo_report.py, build_analysis_dataset.py, extract_pdf_text.py, and update_sync_state.py.
- Ran SQLite integrity_check and foreign_key_check.
- Verified the archived 06067 PDF hash, extracted-text manifest row, and analysis dataset row.
- Ran git diff --check.
Next useful context:
- 06067 is currently T0_prospectus; T1_allotment is pending for 2026-06-22.
Request:
- Analyze HK IPO ticker 01392 with the analyst skill.
- Preserve the in-flight 06132 archive/report work already created for the prior request.
Changes:
- Archived official HKEX prospectus PDFs and extracted text for 01392 and 06132.
- Seeded structured T0 facts into the SQLite archive and refreshed CSV snapshots and sync state.
- Rebuilt the v0 analysis dataset and model calibration report.
- Generated Simplified Chinese T0 prospectus-stage analyst reports for 01392 and 06132.
- Adjusted report stage calendars so that T2 falls on the last business day before D1 when a weekend separates allocation from listing.
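The T2 calendar rule above can be sketched as a step back from D1 over non-business days. The function name is hypothetical, and the holiday set is an assumption: it is empty by default, so Hong Kong public holidays are not handled unless the caller supplies them.

```python
from datetime import date, timedelta


def previous_business_day(d1: date, holidays: frozenset = frozenset()) -> date:
    """Return the last weekday strictly before d1, skipping any supplied holidays."""
    day = d1 - timedelta(days=1)
    # weekday() is 5 for Saturday and 6 for Sunday.
    while day.weekday() >= 5 or day in holidays:
        day -= timedelta(days=1)
    return day
```

For a Monday D1 this returns the preceding Friday; for a midweek D1 it returns the previous calendar day.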
Verification:
- Compiled modified Python scripts with in-memory syntax checks.
- Ran SQLite quick_check and foreign_key_check.
- Confirmed DB row counts match CSV snapshots for key tables.
- Verified 01392/06132 source paths are repo-relative, raw files exist, hashes match, and PDF text manifest rows are ok.
- Ran git diff --cached --check.
Next useful context:
- 01392 T1 is due on 2026-06-18; rerun analyst after allotment results are archived.
- 06132 T1 is due on 2026-06-22; rerun analyst after allotment results are archived.
Request:
- Use archivist to close the 137 T1 ipo_demand source-only gaps using extracted PDF text.
Changes:
- Add an incremental T1 demand text backfill script.
- Parse existing allotment-result extracted text into ipo_demand.
- Archive linked Summary PDFs from old HKEX HTML allotment-result pages.
- Correct allotment-result selection to prefer primary result announcements over clarification or supplemental notices.
- Add robust line-aware allotment parsing and document the workflow in archivist and README.
- Record the backfill result in a report.
Execution:
- Selected 137 source-only T1 demand gaps.
- Wrote 137 ipo_demand rows, increasing ipo_demand from 154 to 291 rows.
- Archived 38 new HKEX allotment-result PDFs and extracted their text.
- Confirmed an incremental rerun selects 0 gaps and writes 0 rows.
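The "rerun selects 0 gaps" property above follows when gap selection is defined against what is already written. A sketch with in-memory dictionaries standing in for the extracted-text files and the ipo_demand table; all names here are hypothetical.

```python
from typing import Callable, Optional


def select_gaps(result_texts: dict, demand_rows: dict) -> list:
    """Tickers that have archived allotment text but no demand row yet."""
    return sorted(t for t in result_texts if t not in demand_rows)


def backfill(result_texts: dict, demand_rows: dict,
             parse: Callable[[str], Optional[dict]]) -> int:
    """Parse each gap's text into a demand row; return the number of rows written."""
    written = 0
    for ticker in select_gaps(result_texts, demand_rows):
        parsed = parse(result_texts[ticker])
        if parsed is not None:  # leave the gap open when parsing fails
            demand_rows[ticker] = parsed
            written += 1
    return written
```

Because a written row removes its ticker from the next selection, a second run over unchanged inputs writes nothing, provided every gap parsed successfully the first time.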
Verification:
- Ran git diff --cached --check.
- Ran py_compile for archive_hkex_documents.py and backfill_t1_demand_from_text.py.
- Checked SQLite integrity and foreign keys.
- Confirmed DB row counts match CSV snapshots.
- Verified that no T1-complete row is missing its ipo_demand record.
- Verified source_refs paths/files/hashes and PDF extracted-text manifest hashes.
Next useful context:
- T1 demand structure is complete for listed rows; 06106 and 06675 remain pending_not_due.
- T2 grey-market and due price-performance gaps remain separate archivist priorities.
- Analyst output should be regenerated before using the new T1 demand facts for scoring.
Request:
- Add extracted PDF text generation to the archivist workflow as a standard step.
Changes:
- Run PDF text extraction automatically for newly archived HKEX PDF sources.
- Make the PDF text extractor incremental and manifest-preserving.
- Document extracted-text handling in the archivist skill and README.
- Mark generated extracted text as no-diff data evidence.
- Backfill extracted text for all archived PDF source references.
Verification:
- Ran git diff --cached --check.
- Ran .venv/bin/python -m py_compile scripts/extract_pdf_text.py scripts/archive_hkex_documents.py.
- Ran full PDF extraction, then confirmed an incremental rerun skips unchanged files.
- Verified 557 PDF source_refs, 557 manifest rows, all status ok, and zero missing text/hash/path issues.
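An incremental extractor of the kind described above can decide what to redo by comparing each source PDF's current hash with the hash recorded at the last extraction. This is a sketch assuming a manifest keyed by source path; the structure and names are hypothetical and may differ from extracted_text_manifest.

```python
def plan_extraction(source_hashes: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return source paths that are new or whose hash changed since the last run.

    source_hashes: path -> current SHA-256 of the PDF
    manifest:      path -> SHA-256 of the PDF when its text was last extracted
    """
    return sorted(
        path for path, digest in source_hashes.items()
        if manifest.get(path) != digest
    )
```

Manifest entries for sources outside the current run are left untouched by this plan, which is one way to keep the manifest intact across partial runs.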
Next useful context:
- HKEX HTML notices and Yahoo JSON market data remain under data/raw and are not expected in data/extracted_text.
Request:
- Provide a way to install or develop a PDF extraction tool for archived HK IPO documents.
Changes:
- Add requirements.txt with pypdf as the lightweight PDF text extraction dependency.
- Add scripts/extract_pdf_text.py to extract text from PDF source_refs into repo-relative data/extracted_text files.
- Add extracted text outputs and an extracted_text_manifest snapshot for the six archived HKEXnews PDFs.
- Document the extraction workflow in README.md.
- Ignore .venv and keep generated SQLite/Python transient files out of git.
- Use extracted text to verify the 06106 full prospectus, update source_refs, remove the related data gap, and fill 06106 offering terms.
Verification:
- Installed python3.14-venv system support, created a local .venv, and installed requirements.txt.
- Re-ran scripts/bootstrap_historical_data.py and scripts/extract_pdf_text.py.
- Verified extracted text paths and hashes against data/snapshots/extracted_text_manifest.csv.
- Verified SQLite integrity and snapshot row counts.
- Ran git diff --cached --check and searched durable files for machine-specific absolute paths.