Add external IPO history to heat model

Request:
- Add historical data around T0.5 margin heat and rebuild the model.

Changes:
- Add external_ipo_history to store third-party historical IPO records separately from true T0.5 market-heat snapshots.
- Add scripts/archive_ipohk_history.py to archive ipohk structured listed IPO history.
- Archive 807 ipohk rows, including final oversubscription, one-lot win rate, grey-market return, and first-day return where available.
- Extend the v0 analysis dataset with true T0.5 market-heat columns and separate external final-heat columns.
- Rebuild reports/2026-06-15_analysis_model_v0.md with T0.5 coverage and external final-heat calibration.
- Add a Chinese report explaining why historical final oversubscription cannot be treated as T0.5 margin snapshots.
- Update analyst and archivist skills to keep T0.5 and external final history separate.

Verification:
- .venv/bin/python -m py_compile scripts/build_analysis_dataset.py scripts/archive_ipohk_history.py scripts/archive_t0_5_market_heat.py
- .venv/bin/python scripts/build_analysis_dataset.py --as-of 2026-06-15T19:20:00Z
- Python sqlite3 PRAGMA integrity_check returned ok and foreign_key_check returned zero rows.
- Confirmed 807 external_ipo_history rows, 792 rows with external final oversubscription, 5 true T0.5 market-heat rows, and 297 analysis dataset rows.
- git diff --cached --check

Next useful context:
- True T0.5 historical backtesting still requires ongoing frozen margin-heat snapshots during each IPO subscription window.
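
The SQLite checks listed under Verification could be reproduced along these lines. This is a minimal sketch: the function name `check_database` and the in-memory demo schema are assumptions for illustration and are not taken from the repository.

```python
import sqlite3


def check_database(conn: sqlite3.Connection) -> dict:
    """Run SQLite's built-in consistency checks and summarize the results."""
    # integrity_check yields a single row ("ok",) when no problems are found.
    integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    # foreign_key_check yields one row per violating record; zero rows means clean.
    fk_violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    return {
        "integrity_ok": integrity == ["ok"],
        "fk_violations": len(fk_violations),
    }


# Demo on an in-memory database with a hypothetical two-table schema.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE ipo (stock_code TEXT PRIMARY KEY);
    CREATE TABLE external_ipo_history (
        id INTEGER PRIMARY KEY,
        stock_code TEXT REFERENCES ipo(stock_code)
    );
    INSERT INTO ipo VALUES ('01392');
    INSERT INTO external_ipo_history (stock_code) VALUES ('01392');
    """
)
print(check_database(demo))
```

`PRAGMA foreign_key_check` reports violations even when foreign-key enforcement is off for the connection, which makes it a useful after-the-fact check on archived tables.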
@@ -1,7 +1,7 @@
# HK IPO Analysis Model v0

- Model version: `ipo_score_v0`
- Analysis as of: `2026-06-15T18:20:00Z`
- Analysis as of: `2026-06-15T19:20:00Z`
- Rule file: `rules/ipo_score_v0.yaml`
- Dataset: `data/snapshots/analysis_model_v0_dataset.csv`

@@ -21,6 +21,11 @@ The model is built for a short IPO allocation trade: sell in T2 grey market when
- Rows with offer size: 297
- Rows with public oversubscription: 281
- Rows with international oversubscription: 277
- Rows with T0.5 margin heat snapshots: 5
- Rows with T0.5 margin heat and D1 labels: 0
- Rows matched to external ipohk history: 102
- Rows with external final oversubscription: 95
- Rows with external final oversubscription and D1 labels: 85
- Rows pending T1 structure: 6 (01392, 02335, 06067, 06106, 06132, 06675)
- T1 field-level blanks: public oversubscription 10, international oversubscription 14, valid applications 6, successful applications 18

@@ -47,6 +52,29 @@ T1 adds allotment-stage demand: public subscription, international placing deman
| total_gte_26 | 59 | 94.9% | 88.1% | 86.7 | 80.0 |
| total_lt_0 | 68 | 61.8% | 23.5% | 0.4 | 1.0 |

## T0.5 Market Heat

T0.5 uses archived subscription-period margin heat snapshots. These are non-official live signals and are kept separate from T1 allotment demand. The current archive is not yet a historical training set: it has too few rows and no D1 labels for calibration.

- T0.5 margin rows: 5
- T0.5 rows with D1 labels: 0

## External Final Heat Proxy

The ipohk history archive adds final public oversubscription, one-lot win rate, grey-market return, and first-day return where available. These fields are useful for coverage checks and post-hoc calibration, but they are not T0.5 inputs because they are final or near-final history.

- External history rows matched into this dataset: 102
- Matched rows with final oversubscription: 95
- Matched rows with final oversubscription and D1 labels: 85

| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return | Median D1 return |
| --- | ---: | ---: | ---: | ---: | ---: |
| external_os_1000x_to_5000x | 33 | 93.9% | 78.8% | 60.4 | 44.2 |
| external_os_100x_to_1000x | 21 | 61.9% | 38.1% | 8.8 | 4.2 |
| external_os_10x_to_100x | 7 | 28.6% | 14.3% | -23.0 | -21.9 |
| external_os_gte_5000x | 18 | 83.3% | 72.2% | 101.7 | 89.7 |
| external_os_lt_10x | 6 | 50.0% | 16.7% | 4.7 | -4.1 |

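A bucket table of this shape could be derived roughly as follows. This is a sketch under stated assumptions, not the report's actual build code: the bucket edges are inferred from the bucket names, the lower-inclusive boundaries are a guess, D1 returns are assumed to be in percent, and the names `external_os_bucket` and `summarize` are hypothetical.

```python
from statistics import mean, median


def external_os_bucket(oversubscription: float) -> str:
    """Map a final oversubscription multiple to a bucket label.

    Edges follow the bucket names; treating each range as
    lower-inclusive / upper-exclusive is an assumption.
    """
    if oversubscription < 10:
        return "external_os_lt_10x"
    if oversubscription < 100:
        return "external_os_10x_to_100x"
    if oversubscription < 1000:
        return "external_os_100x_to_1000x"
    if oversubscription < 5000:
        return "external_os_1000x_to_5000x"
    return "external_os_gte_5000x"


def summarize(rows):
    """Aggregate (final_oversubscription, d1_return_pct) pairs per bucket.

    Pairs with a missing value on either side are skipped, mirroring the
    "final oversubscription and D1 labels" row filter.
    """
    grouped = {}
    for os_multiple, d1_return in rows:
        if os_multiple is None or d1_return is None:
            continue
        grouped.setdefault(external_os_bucket(os_multiple), []).append(d1_return)
    return {
        bucket: {
            "n": len(values),
            "d1_positive": sum(v > 0 for v in values) / len(values),
            "d1_gte_10": sum(v >= 10 for v in values) / len(values),
            "avg_d1": mean(values),
            "median_d1": median(values),
        }
        for bucket, values in grouped.items()
    }
```

Because these inputs are final or near-final values, a summary like this only calibrates after the fact; it says nothing about what was observable during the subscription window.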
## Current Read

After the T1 demand text backfill, the strongest v0 T1 bucket is `total_gte_26` with 59 historical D1 observations and a 94.9% D1 positive rate. The model is most useful after allotment results are available; T0 is a watchlist filter rather than a final subscription call.

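@@ -0,0 +1,58 @@
# 2026-06-15 T0.5 Historical Data and Model Rebuild Notes

## Conclusion

This change wires reproducible historical data into the model, but that data cannot be described as "complete historical T0.5 margin multiples".

- True `T0_5_market_heat`: currently limited to live snapshots from the 华盛/捷利 pages; 5 rows are archived, all for IPOs currently open for subscription.
- Structured historical data: `ipohk` provides 807 historical listing records, including final oversubscription multiple, one-lot win rate, grey-market return, and first-day return.
- Key limitation: the `ipohk` "oversubscription multiple" is a final or near-final result, not a T0.5 margin snapshot taken during the subscription window, so it must not be backfilled into T0.5.

The model rebuild therefore uses two layers:

1. `t0_5_*` fields: true subscription-period margin heat, sourced only from archivable snapshots.
2. `external_*` fields: historical final heat and outcome references, used only for coverage checks and post-hoc calibration.
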
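The two-layer separation above can be enforced mechanically with a prefix check. The following is a minimal sketch, assuming the `t0_5_` and `external_` prefixes are applied consistently; the function name and the example column names are hypothetical.

```python
T0_5_PREFIX = "t0_5_"
EXTERNAL_PREFIX = "external_"


def check_t0_5_features(feature_columns):
    """Return the T0.5 columns, refusing any external final-history column.

    Raises ValueError if a column with the external prefix is present,
    since final or near-final values must not feed a T0.5 decision.
    """
    leaked = [c for c in feature_columns if c.startswith(EXTERNAL_PREFIX)]
    if leaked:
        raise ValueError(
            f"external final-history columns cannot be T0.5 inputs: {leaked}"
        )
    return [c for c in feature_columns if c.startswith(T0_5_PREFIX)]


# Hypothetical column names, for illustration only.
print(check_t0_5_features(["t0_5_margin_multiple", "offer_size"]))
```

Failing loudly instead of silently dropping the column is a deliberate choice here: a leaked final value in a T0.5 feature set is a look-ahead bug and worth stopping the build for.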
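## Data Added in This Change

| Dataset | Rows | Purpose | Stage safety |
| --- | ---: | --- | --- |
| `ipo_market_heat` | 5 | T0.5 margin heat for IPOs currently open for subscription | Usable at T0.5, but too few historical samples |
| `external_ipo_history` | 807 | Historical final oversubscription, win rate, grey-market and first-day performance | Post-hoc reference; not usable for T0/T0.5 decisions |
| `analysis_model_v0_dataset.csv` | 297 | Main model training/analysis dataset | Now includes T0.5 and external fields |

## Model Coverage After the Rebuild

- Main model IPO rows: 297
- D1 labels: 273
- True T0.5 margin snapshots: 5
- True T0.5 with D1 labels: 0
- Matched to `ipohk` history records: 102
- Matched to `ipohk` final oversubscription multiples: 95
- `ipohk` final oversubscription multiple with D1 labels: 85

## How to Read the Model Results

True T0.5 cannot yet be backtested statistically: all 5 rows are IPOs that have not listed yet, so they have no D1 outcome.

The `ipohk` historical final oversubscription multiples can validate one direction: whether higher final heat corresponds to a better D1 win rate. They cannot answer whether the margin multiple visible partway through the subscription window was predictive, because they are not frozen snapshots from that time.

## Updated Files

- Raw historical data: `data/raw/external_history/ipohk_listed_20260615T191000Z.json`
- Structured historical data: `data/snapshots/external_ipo_history.csv`
- Main model dataset: `data/snapshots/analysis_model_v0_dataset.csv`
- Model report: `reports/2026-06-15_analysis_model_v0.md`
- History archive script: `scripts/archive_ipohk_history.py`
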
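## Next Steps

Making T0.5 backtestable requires continuous sampling of subscription-period snapshots, starting today.

Suggested sampling cadence:

- T0.5 early: near the close of the first subscription day.
- T0.5 mid: the middle of the subscription window.
- T0.5 final: half a day before the deadline, or the evening before it.

Going forward, keep at least one to three frozen snapshots per IPO, then run the `heat_miss`, `structure_miss`, and `market_window_miss` reviews once T1/T2/D1 results are in.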
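
One way to represent the frozen snapshots and track which sampling stages are still missing for an IPO is sketched below. The `HeatSnapshot` class, its field names, and `missing_stages` are assumptions for illustration; they do not describe the existing `ipo_market_heat` schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Stage labels follow the suggested cadence: early, mid, final.
STAGES = ("early", "mid", "final")


@dataclass(frozen=True)
class HeatSnapshot:
    """An immutable T0.5 margin-heat observation (hypothetical fields)."""

    stock_code: str
    stage: str
    margin_multiple: float
    captured_at: datetime

    def __post_init__(self):
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")
        if self.captured_at.tzinfo is None:
            # A naive timestamp cannot prove the snapshot predates T1 results.
            raise ValueError("captured_at must be timezone-aware")


def missing_stages(snapshots, stock_code):
    """List the cadence stages not yet captured for one IPO, in cadence order."""
    seen = {s.stage for s in snapshots if s.stock_code == stock_code}
    return [stage for stage in STAGES if stage not in seen]


snap = HeatSnapshot(
    stock_code="01392",
    stage="early",
    margin_multiple=12.5,  # illustrative value, not real data
    captured_at=datetime(2026, 6, 15, 8, 0, tzinfo=timezone.utc),
)
print(missing_stages([snap], "01392"))
```

Marking the dataclass as frozen keeps a captured value from being edited in place later, which is the property a look-ahead-free backtest depends on.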