Add external IPO history to heat model

Request:
- Add historical data around T0.5 margin heat and rebuild the model.

Changes:
- Add external_ipo_history to store third-party historical IPO records separately from true T0.5 market-heat snapshots.
- Add scripts/archive_ipohk_history.py to archive ipohk structured listed IPO history.
- Archive 807 ipohk rows, including final oversubscription, one-lot win rate, grey-market return, and first-day return where available.
- Extend the v0 analysis dataset with true T0.5 market-heat columns and separate external final-heat columns.
- Rebuild reports/2026-06-15_analysis_model_v0.md with T0.5 coverage and external final-heat calibration.
- Add a Chinese report explaining why historical final oversubscription cannot be treated as T0.5 margin snapshots.
- Update analyst and archivist skills to keep T0.5 and external final history separate.

Verification:
- .venv/bin/python -m py_compile scripts/build_analysis_dataset.py scripts/archive_ipohk_history.py scripts/archive_t0_5_market_heat.py
- .venv/bin/python scripts/build_analysis_dataset.py --as-of 2026-06-15T19:20:00Z
- Python sqlite3 PRAGMA integrity_check returned ok and foreign_key_check returned zero rows.
- Confirmed 807 external_ipo_history rows, 792 rows with external final oversubscription, 5 true T0.5 market-heat rows, and 297 analysis dataset rows.
- git diff --cached --check

Next useful context:
- True T0.5 historical backtesting still requires ongoing frozen margin-heat snapshots during each IPO subscription window.
2026-06-15 16:06:56 +00:00
parent 222f55c140
commit 943eab27cb
12 changed files with 1589 additions and 299 deletions
+29 -1
@@ -1,7 +1,7 @@
# HK IPO Analysis Model v0
- Model version: `ipo_score_v0`
- Analysis as of: `2026-06-15T18:20:00Z`
- Analysis as of: `2026-06-15T19:20:00Z`
- Rule file: `rules/ipo_score_v0.yaml`
- Dataset: `data/snapshots/analysis_model_v0_dataset.csv`
@@ -21,6 +21,11 @@ The model is built for a short IPO allocation trade: sell in T2 grey market when
- Rows with offer size: 297
- Rows with public oversubscription: 281
- Rows with international oversubscription: 277
- Rows with T0.5 margin heat snapshots: 5
- Rows with T0.5 margin heat and D1 labels: 0
- Rows matched to external ipohk history: 102
- Rows with external final oversubscription: 95
- Rows with external final oversubscription and D1 labels: 85
- Rows pending T1 structure: 6 (01392, 02335, 06067, 06106, 06132, 06675)
- T1 field-level blanks: public oversubscription 10, international oversubscription 14, valid applications 6, successful applications 18
@@ -47,6 +52,29 @@ T1 adds allotment-stage demand: public subscription, international placing deman
| total_gte_26 | 59 | 94.9% | 88.1% | 86.7 | 80.0 |
| total_lt_0 | 68 | 61.8% | 23.5% | 0.4 | 1.0 |
## T0.5 Market Heat
T0.5 uses archived subscription-period margin heat snapshots. These are non-official live signals and are kept separate from T1 allotment demand. The current archive is not yet a historical training set: it has too few rows and no D1 labels for calibration.
- T0.5 margin rows: 5
- T0.5 rows with D1 labels: 0
## External Final Heat Proxy
The ipohk history archive adds final public oversubscription, one-lot win rate, grey-market return, and first-day return where available. These fields are useful for coverage checks and post-hoc calibration, but they are not T0.5 inputs because they are final or near-final history.
- External history rows matched into this dataset: 102
- Matched rows with final oversubscription: 95
- Matched rows with final oversubscription and D1 labels: 85
| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return | Median D1 return |
| --- | ---: | ---: | ---: | ---: | ---: |
| external_os_1000x_to_5000x | 33 | 93.9% | 78.8% | 60.4 | 44.2 |
| external_os_100x_to_1000x | 21 | 61.9% | 38.1% | 8.8 | 4.2 |
| external_os_10x_to_100x | 7 | 28.6% | 14.3% | -23.0 | -21.9 |
| external_os_gte_5000x | 18 | 83.3% | 72.2% | 101.7 | 89.7 |
| external_os_lt_10x | 6 | 50.0% | 16.7% | 4.7 | -4.1 |
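A minimal sketch of how a calibration table like the one above could be computed. The bucket edges are inferred from the bucket names, and the inclusivity of each edge, the function names, and the `(multiple, d1_return)` input shape are all assumptions for illustration rather than the logic in `scripts/build_analysis_dataset.py`.

```python
from statistics import mean, median


def external_os_bucket(multiple: float) -> str:
    """Map a final oversubscription multiple to a bucket name.

    Edges follow the bucket names in the report; treating each lower
    edge as inclusive is an assumption.
    """
    if multiple < 10:
        return "external_os_lt_10x"
    if multiple < 100:
        return "external_os_10x_to_100x"
    if multiple < 1000:
        return "external_os_100x_to_1000x"
    if multiple < 5000:
        return "external_os_1000x_to_5000x"
    return "external_os_gte_5000x"


def summarize(rows):
    """Summarize D1 returns per bucket.

    rows: iterable of (final_oversubscription_multiple, d1_return_pct).
    Rows missing either value are skipped, so only labelled rows count.
    """
    by_bucket = {}
    for multiple, d1 in rows:
        if multiple is None or d1 is None:
            continue
        by_bucket.setdefault(external_os_bucket(multiple), []).append(d1)

    summary = {}
    for name, returns in sorted(by_bucket.items()):
        n = len(returns)
        summary[name] = {
            "n": n,
            "d1_positive": sum(r > 0 for r in returns) / n,
            "d1_gte_10": sum(r >= 10 for r in returns) / n,
            "avg_d1": mean(returns),
            "median_d1": median(returns),
        }
    return summary
```

Because the inputs are final multiples, a summary like this is a post-hoc calibration check only and says nothing about what was observable during the subscription window.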
## Current Read
After the T1 demand text backfill, the strongest v0 T1 bucket is `total_gte_26` with 59 historical D1 observations and a 94.9% D1 positive rate. The model is most useful after allotment results are available; T0 is a watchlist filter rather than a final subscription call.
@@ -0,0 +1,58 @@
# 2026-06-15 T0.5 Historical Data and Model Rebuild Notes
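## Conclusion
This rebuild wires reproducible historical data into the model, but that data cannot be described as a complete history of T0.5 margin multiples.
- True `T0_5_market_heat`: so far only live snapshots from the 华盛 (Huasheng) / 捷利 (Jieli) pages exist, with 5 rows archived for IPOs currently open for subscription.
- Structured history: `ipohk` provides 807 historical listing records, including final oversubscription multiple, one-lot win rate, grey-market return, and first-day return.
- Key limitation: the `ipohk` oversubscription multiple is a final or near-final figure, not a T0.5 margin snapshot taken during the subscription window, so it must not be backfilled into T0.5.
The rebuild therefore handles the data in two layers:
1. `t0_5_*` fields: true margin heat during the subscription window, sourced only from archivable snapshots.
2. `external_*` fields: historical final heat and outcome references, used only for coverage checks and post-hoc calibration.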
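The two-layer rule can be enforced mechanically with a guard on feature names. The sketch below is illustrative: only the `t0_5_` and `external_` prefixes come from this report, while the stage labels, the function name, and the example column names are hypothetical.

```python
# Stage -> column prefixes that must not be used as decision inputs.
# This report states external_* fields are not usable for T0/T0.5
# decisions; the dictionary layout itself is an assumption.
FORBIDDEN_PREFIXES = {
    "T0": ("external_",),
    "T0.5": ("external_",),
}


def unsafe_columns(stage, feature_columns):
    """Return the feature columns that would leak final history at this stage."""
    forbidden = FORBIDDEN_PREFIXES.get(stage, ())
    # str.startswith accepts a tuple of prefixes; an empty tuple never matches.
    return [c for c in feature_columns if c.startswith(forbidden)]


# Hypothetical column names, for illustration only.
features = ["offer_size", "t0_5_margin_multiple", "external_final_oversubscription"]
leaks = unsafe_columns("T0.5", features)
if leaks:
    print("not stage-safe at T0.5:", leaks)
```

A check of this kind can run before any T0.5 scoring step so that a final-heat column cannot slip into a decision made during the subscription window.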
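## Data Added in This Change
| Dataset | Rows | Purpose | Stage safety |
| --- | ---: | --- | --- |
| `ipo_market_heat` | 5 | T0.5 margin heat for IPOs currently open for subscription | Usable at T0.5, but the historical sample is too small |
| `external_ipo_history` | 807 | Historical final oversubscription, win rate, grey-market and first-day performance | Post-hoc reference only; not for T0/T0.5 decisions |
| `analysis_model_v0_dataset.csv` | 297 | Main model training/analysis dataset | T0.5 and external fields added |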
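## Model Coverage After the Rebuild
- Main model IPO rows: 297
- D1 labels: 273
- True T0.5 margin snapshots: 5
- True T0.5 with D1 labels: 0
- Matched to `ipohk` history records: 102
- Matched to an `ipohk` final oversubscription multiple: 95
- `ipohk` final oversubscription multiple with D1 labels: 85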
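Coverage counts like these can be recomputed directly from the dataset CSV. The sketch below uses only the standard library. The column names `d1_return`, `t0_5_margin_multiple`, and `external_final_oversubscription` are placeholders, since the actual headers of `analysis_model_v0_dataset.csv` are not shown here.

```python
import csv


def has_value(row, column):
    """True when the column is present and not blank in this row."""
    return (row.get(column) or "").strip() != ""


def coverage(rows,
             d1="d1_return",
             t05="t0_5_margin_multiple",
             ext="external_final_oversubscription"):
    """Count rows carrying each signal, alone and jointly with a D1 label.

    The default column names are assumptions, not the real schema.
    """
    rows = list(rows)
    return {
        "rows": len(rows),
        "d1_labels": sum(has_value(r, d1) for r in rows),
        "t0_5": sum(has_value(r, t05) for r in rows),
        "t0_5_and_d1": sum(has_value(r, t05) and has_value(r, d1) for r in rows),
        "external_os": sum(has_value(r, ext) for r in rows),
        "external_os_and_d1": sum(has_value(r, ext) and has_value(r, d1) for r in rows),
    }


def coverage_from_csv(path, **columns):
    """Read a CSV with a header row and compute the same counts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return coverage(csv.DictReader(handle), **columns)
```

Recomputing the joint counts is what shows why T0.5 cannot be calibrated yet: the snapshot rows exist, but none of them overlaps with a D1 label.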
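## How to Read the Model Results
True T0.5 cannot be backtested statistically yet: all 5 rows are IPOs that have not listed, so none has a D1 result.
The `ipohk` historical final oversubscription multiple can test one directional question: does higher final heat correspond to a better D1 win rate? It cannot answer whether the margin multiple visible partway through the subscription window was predictive, because it is not a frozen snapshot from that time.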
## Updated Files
- Raw historical data: `data/raw/external_history/ipohk_listed_20260615T191000Z.json`
- Structured historical data: `data/snapshots/external_ipo_history.csv`
- Main model dataset: `data/snapshots/analysis_model_v0_dataset.csv`
- Model report: `reports/2026-06-15_analysis_model_v0.md`
- History archiving script: `scripts/archive_ipohk_history.py`
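## Next Steps
Making T0.5 truly backtestable requires continuous sampling of subscription-window snapshots, starting today.
Suggested sampling cadence:
- T0.5 early: around the close of the first subscription day.
- T0.5 mid: the middle of the subscription window.
- T0.5 final: half a day before the deadline, or the evening before.
Going forward, keep at least one and up to three frozen snapshots per IPO, then run the `heat_miss`, `structure_miss`, and `market_window_miss` reviews once T1/T2/D1 results are out.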
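One possible shape for a frozen snapshot record is sketched below, assuming the three sampling stages listed above. The record layout, field names, and the `freeze_snapshot` helper are hypothetical and do not describe `scripts/archive_t0_5_market_heat.py`.

```python
import hashlib
import json
from datetime import datetime, timezone

STAGES = ("early", "mid", "final")  # the three sampling points suggested above


def freeze_snapshot(stock_code, stage, payload, captured_at=None):
    """Build an immutable-by-convention record for one margin-heat observation.

    payload is whatever was scraped at capture time; the hash lets a later
    review confirm the stored values were not edited after the fact.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown T0.5 stage: {stage!r}")
    # Naive datetimes are interpreted as local time by astimezone().
    captured_at = (captured_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return {
        "stock_code": stock_code,
        "stage": f"t0_5_{stage}",
        "captured_at": captured_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "payload_sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "payload": payload,
    }
```

Storing the capture time and a content hash with each record is what makes the snapshot "frozen": once T1/T2/D1 results arrive, the review can compare outcomes against exactly what was visible at that point in the subscription window.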