geometrybase 48b89552fe Add IPO analysis model baseline

Request:
- Use the analyst skill to digest downloaded IPO archive data and start building an analysis model.

Changes:
- Add ipo_score_v0 as the first transparent stage-safe scoring rule set.
- Add build_analysis_dataset.py to derive model features, scores, decision bands, and empirical D1 calibration from SQLite.
- Generate analysis_model_v0_dataset.csv with 293 scored IPO rows and archived source paths.
- Add a model calibration report documenting coverage, T0/T1 bucket performance, usage, and known gaps.
- Record the initial model entry in the rule change log and document the command in README.

Verification:
- Ran py_compile for scripts/build_analysis_dataset.py.
- Regenerated the analysis dataset and report with as-of 2026-06-15T13:00:00Z.
- Checked CSV row count, source path coverage, and repo-relative path hygiene.
- Ran git diff --cached --check.

Next useful context:
- v0 should be treated as a transparent baseline, with T1 high-score calibration strongest and middle buckets still non-monotonic.
- T2 is excluded until a reliable grey-market source is approved.

2026-06-15 12:49:48 +00:00

3.2 KiB


HK IPO Analysis Model v0

Model version: ipo_score_v0
Analysis as of: 2026-06-15T13:00:00Z
Rule file: rules/ipo_score_v0.yaml
Dataset: data/snapshots/analysis_model_v0_dataset.csv

What This Model Does

This is the first analyst model built from the downloaded archive. It creates a repeatable feature table, scores each IPO using stage-safe rules, and calibrates the score buckets against archived D1 outcomes. It is intentionally transparent: the output includes every score component and the archived source paths used for each ticker.

The model does not use grey-market data in v0 because T2 currently has no approved reproducible source. It also does not use post-listing returns as inputs; returns are labels only.

Data Inventory

IPO rows scored: 293
Rows with D1 labels: 273
Rows with structured T1 demand fields: 154
Rows with prospectus source path: 293
Rows with allotment source path: 291
Rows with offer size: 293
Rows with public oversubscription: 144
Rows with international oversubscription: 153

T0 Calibration

T0 uses only prospectus-stage structure: offer size, initial public offer percentage, minimum subscription amount, offer price band, and over-allotment availability.

Bucket	N	D1 positive	D1 >= 10%	Avg D1 return (%)	Median D1 return (%)
t0_lt_1	36	58.3%	33.3%	12.8	2.3
t0_1_to_4	60	63.3%	40.0%	9.6	3.1
t0_5_to_7	105	73.3%	51.4%	40.1	13.2
t0_gte_8	72	76.4%	47.2%	28.6	9.6
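
As an illustration of how a T0 score could map onto the buckets above, here is a small sketch. The boundaries are inferred from the bucket names alone and the scores are assumed to be integers; the authoritative thresholds live in rules/ipo_score_v0.yaml and may differ. The function name t0_bucket is hypothetical.

```python
def t0_bucket(t0_score: int) -> str:
    """Map an integer T0 score to a bucket label.

    Assumption: boundaries are read off the bucket names
    (lt_1, 1_to_4, 5_to_7, gte_8); they are not taken from the rule file.
    """
    if t0_score < 1:
        return "t0_lt_1"
    if t0_score <= 4:
        return "t0_1_to_4"
    if t0_score <= 7:
        return "t0_5_to_7"
    return "t0_gte_8"


for score in (-2, 1, 4, 5, 7, 8, 12):
    print(score, t0_bucket(score))
```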

T1 Calibration

T1 adds allotment-stage demand: public subscription, international placing demand, valid application count, application success rate, and HK public offer reallocation.

Bucket	N	D1 positive	D1 >= 10%	Avg D1 return (%)	Median D1 return (%)
total_lt_0	15	73.3%	20.0%	0.3	3.4
total_0_to_9	35	60.0%	34.3%	3.5	0.2
total_10_to_17	17	47.1%	23.5%	1.6	-0.1
total_18_to_25	33	72.7%	54.5%	28.0	14.8
total_gte_26	43	97.7%	95.3%	101.2	88.8
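
The columns in the calibration tables can be reproduced with a simple group-and-summarize pass. The sketch below is not the code in scripts/build_analysis_dataset.py; it assumes D1 returns are expressed in percent, that "D1 positive" means a return strictly greater than zero, and that unlabeled rows are skipped. The calibrate function and the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def calibrate(rows):
    """Summarize D1 outcomes per bucket.

    rows: iterable of (bucket, d1_return_pct) pairs. Rows whose return is
    None (no D1 label) are skipped. Assumes "positive" means return > 0.
    """
    by_bucket = defaultdict(list)
    for bucket, d1 in rows:
        if d1 is not None:
            by_bucket[bucket].append(d1)

    table = {}
    for bucket, returns in sorted(by_bucket.items()):
        n = len(returns)
        table[bucket] = {
            "n": n,
            "d1_positive_rate": sum(r > 0 for r in returns) / n,
            "d1_gte_10_rate": sum(r >= 10 for r in returns) / n,
            "avg_d1_return": mean(returns),
            "median_d1_return": median(returns),
        }
    return table


# Made-up sample rows, not taken from the archived dataset.
sample = [
    ("total_gte_26", 50.0),
    ("total_gte_26", 12.0),
    ("total_gte_26", -2.0),
    ("total_gte_26", None),  # unlabeled row, ignored
    ("total_0_to_9", 0.0),
    ("total_0_to_9", 4.0),
]
result = calibrate(sample)
for bucket, stats in result.items():
    print(bucket, stats)
```

Reporting the median alongside the average matters here because a few extreme D1 returns can pull the average far from the typical outcome.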

Initial Read

The strongest v0 T1 bucket is total_gte_26 with 43 historical D1 observations and a 97.7% D1 positive rate. The model is most useful after allotment results are available; T0 is a watchlist filter rather than a final subscription call.

The middle score buckets are not yet monotonic: in T1, total_10_to_17 has a lower D1 positive rate (47.1%) than total_0_to_9 (60.0%), and total_lt_0 (73.3%) sits above both. This is useful signal rather than a defect to hide: v0 exposes where the current rules are too coarse and where missing T1 demand facts weaken calibration. Future rule changes should come from reviewed prediction cards, not from overfitting this initial sample.
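
One way to make the non-monotonicity concrete is to list adjacent bucket pairs where the higher-score bucket has a lower D1 positive rate. The sketch below uses the rates from the T1 Calibration table, ordered from lowest to highest score; monotonic_breaks is a hypothetical helper, not part of the repository.

```python
# D1 positive rates (%) from the T1 Calibration table, in ascending
# score order.
t1_positive_rate = [
    ("total_lt_0", 73.3),
    ("total_0_to_9", 60.0),
    ("total_10_to_17", 47.1),
    ("total_18_to_25", 72.7),
    ("total_gte_26", 97.7),
]


def monotonic_breaks(ordered):
    """Return adjacent (lower, higher) bucket pairs where the rate drops."""
    breaks = []
    for (lo_name, lo_rate), (hi_name, hi_rate) in zip(ordered, ordered[1:]):
        if hi_rate < lo_rate:
            breaks.append((lo_name, hi_name))
    return breaks


breaks = monotonic_breaks(t1_positive_rate)
for lower, higher in breaks:
    print(f"rate drops from {lower} to {higher}")
```

With these figures the check flags two drops, both in the lower half of the score range, while the top two buckets are ordered as expected. Given the small bucket sizes (15 to 43 observations), such drops may partly reflect sampling noise.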

Usage

Run scripts/build_analysis_dataset.py after archivist updates the database.
Use t0_score for prospectus-stage watchlisting.
Use total_score, decision_band, and calibrated_d1_positive_rate for T1-stage subscription cards.
Treat D1/D5/D20/D60 columns as review labels only, never as prediction inputs.
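
The rule that D1/D5/D20/D60 columns are labels only can be enforced mechanically before any feature set is used. The following is a rough sketch of such a guard. It assumes label columns are named with a d1, d5, d20, or d60 prefix (for example d1_return), which is a guess about the CSV layout, and the helper names are hypothetical.

```python
# Assumption: label columns start with one of these prefixes followed by
# an underscore (e.g. "d1_return"). The real CSV naming may differ.
LABEL_PREFIXES = ("d1", "d5", "d20", "d60")


def is_label_column(name):
    """True if the column name looks like a post-listing return label."""
    lowered = name.lower()
    return any(
        lowered == prefix or lowered.startswith(prefix + "_")
        for prefix in LABEL_PREFIXES
    )


def assert_no_label_inputs(feature_columns):
    """Raise ValueError if any proposed input column is a review label."""
    leaked = [c for c in feature_columns if is_label_column(c)]
    if leaked:
        raise ValueError(f"label columns used as inputs: {leaked}")


# Columns named in the Usage list pass the check.
assert_no_label_inputs(
    ["t0_score", "total_score", "decision_band", "calibrated_d1_positive_rate"]
)

# A hypothetical label column is rejected.
try:
    assert_no_label_inputs(["t0_score", "d5_return"])
except ValueError as exc:
    print(exc)
```

Note that calibrated_d1_positive_rate is not flagged: it is a bucket-level historical rate rather than the row's own outcome, which matches how the Usage list treats it.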

Known Gaps

T1 demand parsing is incomplete for older HTML-only allotment announcements.
Industry and issuer fundamentals are not sufficiently structured for model input.
T2 grey-market signal is blocked pending an approved source.
Extreme D1 returns should be audited before they drive rule changes.
