hk-ipo/reports/2026-06-15_analysis_model_v0.md

# HK IPO Analysis Model v0

- Model version: `ipo_score_v0`
- Analysis as of: `2026-06-15T13:00:00Z`
- Rule file: `rules/ipo_score_v0.yaml`
- Dataset: `data/snapshots/analysis_model_v0_dataset.csv`

## What This Model Does

This is the first analyst model built from the downloaded archive. It creates a repeatable feature table, scores each IPO using stage-safe rules, and calibrates the score buckets against archived D1 outcomes. It is intentionally transparent: the output includes every score component and the archived source paths used for each ticker.

The model does not use grey-market data in v0 because T2 currently has no approved reproducible source. It also does not use post-listing returns as inputs; returns are labels only.

## Data Inventory

- IPO rows scored: 293
- Rows with D1 labels: 273
- Rows with structured T1 demand fields: 154
- Rows with prospectus source path: 293
- Rows with allotment source path: 291
- Rows with offer size: 293
- Rows with public oversubscription: 144
- Rows with international oversubscription: 153

## T0 Calibration

T0 uses only prospectus-stage structure: offer size, initial public offer percentage, minimum subscription amount, offer price band, and over-allotment availability.

| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return (%) | Median D1 return (%) |
| --- | ---: | ---: | ---: | ---: | ---: |
| t0_lt_1 | 36 | 58.3% | 33.3% | 12.8 | 2.3 |
| t0_1_to_4 | 60 | 63.3% | 40.0% | 9.6 | 3.1 |
| t0_5_to_7 | 105 | 73.3% | 51.4% | 40.1 | 13.2 |
| t0_gte_8 | 72 | 76.4% | 47.2% | 28.6 | 9.6 |
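A minimal sketch of how a calibration table like this could be computed from scored rows. It is illustrative only and is not taken from `scripts/build_analysis_dataset.py`. The function name `calibrate`, the `(bucket, d1_return_pct)` row shape, and the reading of "D1 positive" as a strictly positive return are assumptions.

```python
from collections import defaultdict
from statistics import mean, median


def calibrate(rows):
    """Summarise D1 outcomes per score bucket.

    rows: iterable of (bucket, d1_return_pct) pairs. A return of None
    means the row has no D1 label and is skipped (assumed convention).
    """
    by_bucket = defaultdict(list)
    for bucket, d1 in rows:
        if d1 is not None:
            by_bucket[bucket].append(d1)

    table = {}
    for bucket, returns in by_bucket.items():
        n = len(returns)
        table[bucket] = {
            "n": n,
            # Assumption: "D1 positive" means a return strictly above zero.
            "d1_positive": sum(r > 0 for r in returns) / n,
            "d1_gte_10": sum(r >= 10 for r in returns) / n,
            "avg_d1": mean(returns),
            "median_d1": median(returns),
        }
    return table


# Toy rows, not real data from the archive.
sample = [
    ("t0_5_to_7", 12.0),
    ("t0_5_to_7", -3.0),
    ("t0_5_to_7", 25.0),
    ("t0_lt_1", None),  # unlabeled row, ignored
    ("t0_lt_1", 1.5),
]
result = calibrate(sample)
```

Because returns enter only on the label side of this function, the same routine can be rerun on any bucket column without the labels influencing the score.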

## T1 Calibration

T1 adds allotment-stage demand: public subscription, international placing demand, valid application count, application success rate, and HK public offer reallocation.

| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return (%) | Median D1 return (%) |
| --- | ---: | ---: | ---: | ---: | ---: |
| total_lt_0 | 15 | 73.3% | 20.0% | 0.3 | 3.4 |
| total_0_to_9 | 35 | 60.0% | 34.3% | 3.5 | 0.2 |
| total_10_to_17 | 17 | 47.1% | 23.5% | 1.6 | -0.1 |
| total_18_to_25 | 33 | 72.7% | 54.5% | 28.0 | 14.8 |
| total_gte_26 | 43 | 97.7% | 95.3% | 101.2 | 88.8 |
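The bucket labels in this table imply a simple mapping from total score to bucket. The sketch below shows one way to express it. The function name `total_bucket` is hypothetical, and the thresholds assume integer scores (the labels leave fractional scores such as 9.5 undefined); the actual boundaries live in `rules/ipo_score_v0.yaml`.

```python
def total_bucket(total_score: int) -> str:
    """Map an integer total score to the bucket labels used in the table.

    Thresholds are inferred from the label names only, not read from
    the rule file.
    """
    if total_score < 0:
        return "total_lt_0"
    if total_score <= 9:
        return "total_0_to_9"
    if total_score <= 17:
        return "total_10_to_17"
    if total_score <= 25:
        return "total_18_to_25"
    return "total_gte_26"


buckets = [total_bucket(s) for s in (-2, 0, 9, 10, 17, 18, 25, 26, 40)]
```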

## Initial Read

The strongest v0 T1 bucket is `total_gte_26` with 43 historical D1 observations and a 97.7% D1 positive rate. The model is most useful after allotment results are available; T0 is a watchlist filter rather than a final subscription call.

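The score buckets are not yet monotonic. In T1, `total_10_to_17` (47.1% D1 positive) trails `total_0_to_9` (60.0%), and `total_lt_0` (73.3%) beats both; in T0, `t0_gte_8` has a lower average D1 return (28.6) than `t0_5_to_7` (40.1). This is useful diagnostic output rather than a defect: v0 is exposing where the current rules are too coarse and where missing T1 demand facts weaken calibration. Future rule changes should come from reviewed prediction cards, not from overfitting this initial sample.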
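The monotonicity check can be made mechanical. The sketch below lists the T1 buckets in ascending score order with their D1 positive rates from the T1 calibration table and reports every adjacent pair where the higher-score bucket has the lower rate. The helper name `monotonic_breaks` is hypothetical.

```python
# D1 positive rates (%) from the T1 calibration table, in ascending score order.
t1_positive_rate = [
    ("total_lt_0", 73.3),
    ("total_0_to_9", 60.0),
    ("total_10_to_17", 47.1),
    ("total_18_to_25", 72.7),
    ("total_gte_26", 97.7),
]


def monotonic_breaks(ordered):
    """Return (lower_bucket, higher_bucket) pairs where the rate falls."""
    return [
        (lower[0], higher[0])
        for lower, higher in zip(ordered, ordered[1:])
        if higher[1] < lower[1]
    ]


breaks = monotonic_breaks(t1_positive_rate)
# → [("total_lt_0", "total_0_to_9"), ("total_0_to_9", "total_10_to_17")]
```

Both breaks sit in the buckets with the smallest samples (N = 15 and N = 17), which is consistent with treating them as calibration noise to review rather than as grounds for an immediate rule change.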

## Usage

1. Run `scripts/build_analysis_dataset.py` after the archivist updates the database.
2. Use `t0_score` for prospectus-stage watchlisting.
3. Use `total_score`, `decision_band`, and `calibrated_d1_positive_rate` for T1-stage subscription cards.
4. Treat D1/D5/D20/D60 columns as review labels only, never as prediction inputs.
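Step 4 can be enforced with a small guard before any scoring or fitting step. This is a sketch under assumptions: the function name `assert_no_label_leakage` is hypothetical, and so is the convention that label columns start with `d1`, `d5`, `d20`, or `d60` (optionally followed by an underscore suffix such as `d1_return`). Adjust the pattern to the real column names in the dataset.

```python
import re

# Assumed naming convention: label columns are d1, d5, d20, d60, optionally
# followed by "_<suffix>". Matching is case-insensitive.
LABEL_PATTERN = re.compile(r"^d(1|5|20|60)(_|$)", re.IGNORECASE)


def assert_no_label_leakage(feature_columns):
    """Raise ValueError if any post-listing label column is used as an input."""
    leaked = [c for c in feature_columns if LABEL_PATTERN.match(c)]
    if leaked:
        raise ValueError(f"label columns used as inputs: {leaked}")
    return list(feature_columns)


ok = assert_no_label_leakage(["t0_score", "total_score", "decision_band"])

try:
    assert_no_label_leakage(["t0_score", "D1_return"])
    leak_detected = False
except ValueError:
    leak_detected = True
```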

## Known Gaps

- T1 demand parsing is incomplete for older HTML-only allotment announcements.
- Industry and issuer fundamentals are not sufficiently structured for model input.
- T2 grey-market signal is blocked pending an approved source.
- Extreme D1 returns should be audited before they drive rule changes.