# HK IPO

HK IPO is a project for building a repeatable, auditable research workflow for Hong Kong new listing subscription decisions.

The project is designed around a feedback loop:

1. Archive IPO facts and source documents.
2. Freeze the analysis that was possible at each decision stage.
3. Compare predictions with post-listing outcomes.
4. Improve the scoring rules only from reviewed evidence.

## Goals

- Maintain a local, Git-tracked history of Hong Kong IPO data.
- Separate factual archiving from investment judgment.
- Keep every subscription decision tied to the information available at that time.
- Review actual IPO outcomes against prior predictions.
- Build a better IPO scoring process through structured error attribution.

## Workflow

Each IPO is evaluated by stage:

- `T0_prospectus`: prospectus and offer terms only.
- `T1_allotment`: allotment results, public subscription, placing, allocation, and final pricing.
- `T2_grey_market`: grey-market result and immediate pre-listing context.
- `D1`, `D5`, `D20`, `D60`: post-listing review checkpoints.

The key discipline is to avoid hindsight leakage. A T0 prediction should only use T0 information, even after the IPO has listed.

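
The stage discipline above can be pictured as an ordered list of stages plus a filter that drops any fact recorded at a later stage. This is a minimal sketch, not the project's implementation: `STAGE_ORDER`, `visible_facts`, and the `(stage, name, value)` fact tuples are hypothetical, and the values are made up for illustration. Only the stage names come from this README.

```python
# Hypothetical sketch: stage names are from this README; everything else is assumed.
STAGE_ORDER = ["T0_prospectus", "T1_allotment", "T2_grey_market", "D1", "D5", "D20", "D60"]


def visible_facts(facts, as_of_stage):
    """Return only the facts recorded at or before `as_of_stage`."""
    cutoff = STAGE_ORDER.index(as_of_stage)
    return [fact for fact in facts if STAGE_ORDER.index(fact[0]) <= cutoff]


# Illustrative values only, not real IPO data.
facts = [
    ("T0_prospectus", "offer_price_high", 12.5),
    ("T1_allotment", "public_subscription_multiple", 80.0),
    ("D1", "first_day_return", 0.15),
]

# A T0 prediction sees only the prospectus-stage fact, even after listing.
t0_inputs = visible_facts(facts, "T0_prospectus")
print(t0_inputs)
```

Filtering by stage rather than by calendar date keeps the rule simple to audit: a fact is either tagged with an allowed stage or it is excluded.
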
## Project Skills

This repository includes project-local Codex skills under `.codex/skills/`.

### `archivist`

Owns facts and source control:

- archive prospectuses, allotment results, listing facts, and market data;
- record source URLs, as-of timestamps, repo-relative paths, and file hashes;
- update the embedded SQLite database;
- export Git-friendly CSV snapshots.

It does not make investment recommendations.

### `analyst`

Owns IPO judgment and review:

- produce T0/T1/T2 prediction cards;
- score IPO candidates;
- compare multiple IPOs;
- write research memos and review cards;
- classify forecast errors;
- recommend scoring-rule updates.

It should use archived facts when available and keep prediction cards append-only.

## Storage Model

The project is intended to be self-contained and portable across machines. Durable paths should always be relative to the repository root.

Expected layout:

```text
data/
  hk_ipo.sqlite
  raw/
  snapshots/
memos/
reports/
rules/
schema/
scripts/
references/
```

Path rules:

- store paths like `data/raw/06658/prospectus.pdf`;
- do not store absolute paths;
- do not store paths with a leading `./`;
- use POSIX `/` separators;
- store file hashes for archived source documents when practical.

SQLite is the embedded source of structured facts. CSV snapshots provide readable Git diffs. Markdown memos preserve the reasoning at each decision point.

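
The path rules above could be checked with a small validator before a path is stored. The sketch below is illustrative only: `is_repo_relative` and `sha256_hex` are hypothetical helper names, the rejection of `..` segments and drive letters is an added assumption, and SHA-256 is assumed because the README does not name a hash algorithm.

```python
import hashlib
from pathlib import PurePosixPath, PureWindowsPath


def is_repo_relative(path: str) -> bool:
    """Check a candidate stored path against the README's path rules."""
    if not path or "\\" in path:  # POSIX `/` separators only
        return False
    if path.startswith("/") or path.startswith("./"):  # no absolute paths, no leading ./
        return False
    if PureWindowsPath(path).drive:  # assumption: also reject drive letters such as C:/
        return False
    # Assumption: reject parent-directory escapes, which the rules do not mention.
    return ".." not in PurePosixPath(path).parts


def sha256_hex(data: bytes) -> str:
    """Hash file content for the archive record (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


print(is_repo_relative("data/raw/06658/prospectus.pdf"))
print(is_repo_relative("./data/raw/06658/prospectus.pdf"))
```

The first call accepts the README's own example path; the second is rejected because of the leading `./`.
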
## PDF Text Extraction

Archived PDFs can be converted into searchable text files:

```bash
python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python scripts/extract_pdf_text.py
```

The extractor reads PDF paths from `data/hk_ipo.sqlite`, writes derived text files under `data/extracted_text/`, and exports `data/snapshots/extracted_text_manifest.csv` with page counts, text hashes, and extraction status.

The extractor is incremental. If a PDF hash and manifest row are unchanged, the existing text output is reused. Use `--force` only when extraction behavior changes and all derived text should be regenerated.

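
The incremental rule can be sketched as a single decision function. This is not the code in `scripts/extract_pdf_text.py`: `needs_extraction` and the `pdf_sha256` manifest key are hypothetical names, and SHA-256 is an assumed hash algorithm.

```python
import hashlib


def needs_extraction(pdf_bytes, manifest_row, force=False):
    """Decide whether a PDF must be (re)extracted.

    `manifest_row` is a hypothetical dict such as {"pdf_sha256": "..."},
    or None when the PDF has no manifest row yet.
    """
    if force or manifest_row is None:
        return True
    current_hash = hashlib.sha256(pdf_bytes).hexdigest()
    # Reuse the existing text output only when the stored hash still matches.
    return manifest_row.get("pdf_sha256") != current_hash


row = {"pdf_sha256": hashlib.sha256(b"same bytes").hexdigest()}
print(needs_extraction(b"same bytes", row))              # unchanged PDF
print(needs_extraction(b"changed bytes", row))           # content changed
print(needs_extraction(b"same bytes", row, force=True))  # forced regeneration
```
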
## Recent IPO Target Refresh

Use HKEXnews annual new listing reports to seed recent subscription-relevant IPO targets:

```bash
.venv/bin/python scripts/update_recent_ipo_list.py --start-date 2023-06-15 --end-date 2026-06-15 --as-of 2026-06-15T07:30:00Z
```

The updater archives the HKEXnews XLSX reports under `data/raw/hkex_new_listing_reports/`, records report-backed source references, writes `new_listing_report_entries`, updates `ipo_master` and missing `offering_terms` fields, exports CSV snapshots, and refreshes `sync_tasks`.

Rows without an IPO offer price, such as transfers of listing, introductions, or de-SPAC transactions, are skipped by default because they are not ordinary public subscription targets.

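
The default skip rule amounts to a filter on the offer-price field. The sketch below assumes each report row is a dict with a hypothetical `offer_price` key; the real column names in the HKEXnews reports and in `scripts/update_recent_ipo_list.py` may differ, and the rows are made-up examples.

```python
def is_subscription_target(row: dict) -> bool:
    """Return True when the report row carries a usable IPO offer price."""
    price = row.get("offer_price")
    if price is None:
        return False
    return str(price).strip() != ""


# Made-up rows for illustration only.
rows = [
    {"name": "Example A", "offer_price": 10.0},  # ordinary IPO with an offer price
    {"name": "Example B", "offer_price": None},  # e.g. an introduction: no offer price
    {"name": "Example C", "offer_price": ""},    # blank cell read from a spreadsheet
]

targets = [row["name"] for row in rows if is_subscription_target(row)]
print(targets)  # ['Example A']
```
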
## HKEX Document Backfill

Use the HKEX document archiver to fill detailed T0/T1 facts for open sync tasks:

```bash
.venv/bin/python scripts/archive_hkex_documents.py --as-of 2026-06-15T08:30:00Z
```

The archiver maps stock codes to HKEXnews title-search stock IDs, downloads the selected prospectus and allotment-results documents under `data/raw/{ticker}/`, records `source_refs`, parses high-confidence T0/T1 fields into `ipo_master`, `offering_terms`, and `ipo_demand`, exports snapshots, refreshes `sync_tasks`, and extracts text for newly archived PDF sources.

HKEX `.htm`/`.html` notices and Yahoo Finance JSON market data stay in `data/raw/`; they are not copied into `data/extracted_text/`.

## T1 Demand Text Backfill

Use the T1 demand text backfill after HKEX allotment-result sources have been archived and PDF text extraction is available:

```bash
.venv/bin/python scripts/backfill_t1_demand_from_text.py --as-of 2026-06-15T14:15:00Z
```

The backfill is incremental. It fills only `T1_allotment` rows that have an archived allotment-results source but no `ipo_demand` row. For old HKEX HTML allotment-result pages, it archives the linked Summary PDF, extracts text, records the new source, and stores only demand fields that are explicitly present.

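
The candidate selection described above (an allotment-results source exists, but no `ipo_demand` row) can be expressed as an anti-join. The sketch below runs against an in-memory database with a deliberately simplified, hypothetical schema: the `ticker` and `doc_type` columns, the `'allotment_results'` value, and the placeholder tickers are assumptions, not the project's actual schema or data.

```python
import sqlite3

# In-memory stand-in for data/hk_ipo.sqlite with an assumed, minimal schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_refs (ticker TEXT, doc_type TEXT);
    CREATE TABLE ipo_demand (ticker TEXT);
    INSERT INTO source_refs VALUES
        ('90001', 'allotment_results'),
        ('90002', 'allotment_results'),
        ('90003', 'prospectus');
    INSERT INTO ipo_demand VALUES ('90001');
    """
)

# Tickers with an archived allotment-results source but no demand row yet.
candidates = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT s.ticker
        FROM source_refs AS s
        WHERE s.doc_type = 'allotment_results'
          AND NOT EXISTS (SELECT 1 FROM ipo_demand AS d WHERE d.ticker = s.ticker)
        ORDER BY s.ticker
        """
    )
]
print(candidates)  # ['90002']
```

Because already-filled tickers drop out of the query, rerunning the backfill touches only the remaining gaps.
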
## Price Performance Backfill

Use the price-performance archiver to fill due D1/D5/D20/D60 review checkpoints:

```bash
.venv/bin/python scripts/archive_price_performance.py --as-of 2026-06-15T10:00:00Z
```

The archiver stores raw Yahoo Finance chart responses under `data/raw/{ticker}/`, records source references and hashes, writes structured rows into `price_performance`, exports snapshots, and refreshes `sync_tasks`.

## Analysis Model

Use the analyst model builder to digest archived data into a stage-safe scoring dataset and calibration report:

```bash
.venv/bin/python scripts/build_analysis_dataset.py --as-of 2026-06-15T13:00:00Z
```

The v0 model is documented in `rules/ipo_score_v0.yaml`. It writes `data/snapshots/analysis_model_v0_dataset.csv` and `reports/2026-06-15_analysis_model_v0.md`.

The model separates T0 prospectus inputs from T1 allotment inputs. D1/D5/D20/D60 returns are labels for calibration and review, not prediction inputs.

## Single IPO Markdown Report

Use the analyst report generator after the archive and model dataset are current:

```bash
.venv/bin/python scripts/build_analysis_dataset.py --as-of 2026-06-15T15:00:00Z
.venv/bin/python scripts/generate_ipo_report.py 06106 --stage auto
```

The generator writes `reports/{date}_{ticker}_{stage}_analysis.md` by default. It auto-selects `T1_allotment` when structured allotment-demand facts exist; otherwise it generates a `T0_prospectus` report. Use `--stdout` for a dry run or `--output` to choose a specific Markdown path.

Prediction reports are stage-safe: T0 reports use only prospectus-stage facts and T0 calibration, while T1 reports add allotment demand and T1 calibration. Post-listing D1/D5/D20/D60 performance stays out of prediction reports and is reserved for review cards.

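
The auto-selection rule and default output path can be sketched as two small functions. These are hypothetical helpers, not the contents of `scripts/generate_ipo_report.py`; in particular, the ISO `YYYY-MM-DD` form of `{date}` is an assumption based on the report filename shown in the Analysis Model section.

```python
from datetime import date


def select_stage(has_allotment_demand: bool, requested: str = "auto") -> str:
    """Resolve `--stage auto` to a concrete stage; pass explicit stages through."""
    if requested != "auto":
        return requested
    return "T1_allotment" if has_allotment_demand else "T0_prospectus"


def default_report_path(ticker: str, stage: str, report_date: date) -> str:
    """Build reports/{date}_{ticker}_{stage}_analysis.md (ISO date is assumed)."""
    return f"reports/{report_date.isoformat()}_{ticker}_{stage}_analysis.md"


stage = select_stage(has_allotment_demand=False)
print(default_report_path("06106", stage, date(2026, 6, 15)))
# reports/2026-06-15_06106_T0_prospectus_analysis.md
```
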
## Incremental Archive Sync

The archivist keeps a per-ticker sync ledger so repeated updates can focus on missing stages:

```bash
python3 scripts/update_sync_state.py
```

This writes `ticker_sync_state` and `sync_tasks` into `data/hk_ipo.sqlite`, then exports `data/snapshots/ticker_sync_state.csv`, `data/snapshots/sync_tasks.csv`, and `data/snapshots/sync_runs.csv`.

Use `sync_tasks` as the next-sync queue. Tasks marked `open` are due now; tasks marked `waiting_until_due` are known future updates.

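
One way to read the two task states is as a comparison between a task's due time and the sync's as-of time. The sketch below is an assumption about how that could work: `task_status` and the due timestamps are hypothetical, and the real `sync_tasks` table may carry other columns and states.

```python
from datetime import datetime, timezone


def task_status(due_at: datetime, as_of: datetime) -> str:
    """Classify a pending task using the two statuses named in this README."""
    return "open" if due_at <= as_of else "waiting_until_due"


as_of = datetime(2026, 6, 15, 10, 0, tzinfo=timezone.utc)

# Hypothetical due times for two review checkpoints.
d5_due = datetime(2026, 6, 12, 8, 0, tzinfo=timezone.utc)
d20_due = datetime(2026, 7, 3, 8, 0, tzinfo=timezone.utc)

print(task_status(d5_due, as_of))   # open
print(task_status(d20_due, as_of))  # waiting_until_due
```
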
## Git Discipline

The repository uses automatic focused commits for completed project changes.

Before committing, check that unrelated dirty files are not included and that generated durable files use repo-relative paths.