
geometrybase 58ad869f84 Refresh IPO analysis model calibration

Request:
- Re-analyze the IPO model using the updated historical archive after T1 demand backfill.

Changes:
- Regenerate the v0 analysis dataset from the current SQLite archive.
- Refresh the v0 calibration report with expanded T1 coverage and new empirical bucket rates.
- Update the report template to show pending T1 rows and field-level blanks.
- Clarify v0 limitations and record why the score formula stays unchanged for this refresh.

Verification:
- Ran scripts/build_analysis_dataset.py against data/hk_ipo.sqlite.
- Ran py_compile for scripts/build_analysis_dataset.py.
- Checked dataset row count, T1 demand coverage, source-only T1 gaps, and repo-relative paths.
- Ran git diff --check.

Next useful context:
- T1 structured coverage is now 291 rows, with 06106 and 06675 still pending_not_due.
- The high-conviction T1 bucket remains differentiated, but middle and low buckets are still not monotonic enough for a v1 rule change.

2026-06-15 14:05:34 +00:00

3.5 KiB


HK IPO Analysis Model v0

Model version: ipo_score_v0
Analysis as of: 2026-06-15T14:04:34Z
Rule file: rules/ipo_score_v0.yaml
Dataset: data/snapshots/analysis_model_v0_dataset.csv

What This Model Does

This is the first analyst model built from the downloaded archive. It creates a repeatable feature table, scores each IPO using stage-safe rules, and calibrates the score buckets against archived D1 outcomes. It is intentionally transparent: the output includes every score component and the archived source paths used for each ticker.

The model does not use grey-market data in v0 because T2 currently has no approved reproducible source. It also does not use post-listing returns as inputs; returns are labels only.

Data Inventory

IPO rows scored: 293
Rows with D1 labels: 273
Rows with structured T1 demand fields: 291
Rows with prospectus source path: 293
Rows with allotment source path: 291
Rows with offer size: 293
Rows with public oversubscription: 281
Rows with international oversubscription: 277
Rows pending T1 structure: 2 (06106, 06675)
T1 field-level blanks: public oversubscription 10, international oversubscription 14, valid applications 6, successful applications 18
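The counts above amount to "rows with a non-blank value" per column, so they can be re-derived from the dataset CSV. A minimal sketch, assuming hypothetical column names (`d1_return`, `public_oversubscription`) and made-up sample rows; the real headers in data/snapshots/analysis_model_v0_dataset.csv may differ:

```python
import csv
import io


def count_non_blank(rows, columns):
    """Return {column: number of rows whose value is present and non-empty}."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is not None and value.strip() != "":
                counts[col] += 1
    return counts


# Tiny in-memory sample; tickers, values, and column names are illustrative only.
SAMPLE = """ticker,d1_return,public_oversubscription
00001,12.5,30.2
00002,,4.1
00003,-3.0,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
inventory = count_non_blank(rows, ["d1_return", "public_oversubscription"])
print(len(rows), inventory)
```

A field-level blank count for a column is then the number of structured rows minus that column's non-blank count, for example 291 - 281 = 10 for public oversubscription.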

T0 Calibration

T0 uses only prospectus-stage structure: offer size, initial public offer percentage, minimum subscription amount, offer price band, and over-allotment availability.

Bucket	N	D1 positive	D1 >= 10%	Avg D1 return (%)	Median D1 return (%)
t0_lt_1	36	58.3%	33.3%	12.8	2.3
t0_1_to_4	60	63.3%	40.0%	9.6	3.1
t0_5_to_7	105	73.3%	51.4%	40.1	13.2
t0_gte_8	72	76.4%	47.2%	28.6	9.6
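The bucket names imply simple score cut-offs. A minimal sketch of that mapping, assuming integer scores and inclusive boundaries as the names suggest; the function name `t0_bucket` is hypothetical, and if rules/ipo_score_v0.yaml defines the cut-offs differently, that file takes precedence:

```python
def t0_bucket(score: int) -> str:
    """Map a T0 score to its calibration bucket label.

    Assumption: scores are integers, so the ranges lt_1, 1..4, 5..7, gte_8
    cover every value without gaps.
    """
    if score < 1:
        return "t0_lt_1"
    if score <= 4:
        return "t0_1_to_4"
    if score <= 7:
        return "t0_5_to_7"
    return "t0_gte_8"


for s in (-2, 1, 4, 5, 8):
    print(s, t0_bucket(s))
```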

T1 Calibration

T1 adds allotment-stage demand: public subscription, international placing demand, valid application count, application success rate, and HK public offer reallocation.

Bucket	N	D1 positive	D1 >= 10%	Avg D1 return (%)	Median D1 return (%)
total_lt_0	68	61.8%	23.5%	0.4	1.0
total_0_to_9	68	58.8%	30.9%	3.3	0.2
total_10_to_17	29	55.2%	34.5%	13.9	1.5
total_18_to_25	49	75.5%	51.0%	31.3	13.4
total_gte_26	59	94.9%	88.1%	86.7	80.0
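Each row of a calibration table is a per-bucket summary of D1 returns. The following is a sketch of how such a summary could be computed, not the actual logic in scripts/build_analysis_dataset.py. It assumes returns are expressed in percent, that "D1 positive" means strictly greater than zero, and that rows without a D1 label are skipped; the sample values are invented:

```python
from collections import defaultdict
from statistics import mean, median


def calibrate(rows):
    """Summarise (bucket, d1_return) pairs into per-bucket statistics."""
    by_bucket = defaultdict(list)
    for bucket, d1_return in rows:
        if d1_return is not None:  # skip rows with no D1 label yet
            by_bucket[bucket].append(d1_return)

    report = {}
    for bucket, returns in sorted(by_bucket.items()):
        n = len(returns)
        report[bucket] = {
            "n": n,
            "d1_positive": sum(r > 0 for r in returns) / n,   # assumed: strictly > 0
            "d1_gte_10": sum(r >= 10 for r in returns) / n,
            "avg": mean(returns),
            "median": median(returns),
        }
    return report


# Invented sample data, for illustration only.
sample = [
    ("total_gte_26", 80.0),
    ("total_gte_26", 20.0),
    ("total_gte_26", -5.0),
    ("total_lt_0", 1.0),
    ("total_lt_0", None),
]
report = calibrate(sample)
for bucket, stats in report.items():
    print(bucket, stats)
```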

Current Read

After the T1 demand text backfill, the strongest v0 T1 bucket is total_gte_26 with 59 historical D1 observations and a 94.9% D1 positive rate. The model is most useful after allotment results are available; T0 is a watchlist filter rather than a final subscription call.

The high-conviction bucket remains clearly differentiated, but the lower score buckets are still not monotonic: the D1 positive rate falls from 61.8% (total_lt_0) to 58.8% (total_0_to_9) to 55.2% (total_10_to_17) before rising to 75.5% (total_18_to_25). This refresh therefore keeps the v0 score formula unchanged and updates empirical calibration only; future rule changes should come from reviewed prediction cards rather than from overfitting this historical sample.
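The monotonicity concern can be checked mechanically. A small sketch using the D1 positive rates from the T1 calibration table, ordered from the lowest to the highest score bucket; the helper name `monotonic_violations` is hypothetical:

```python
# D1 positive rates (%) from the T1 calibration table, lowest score bucket first.
T1_POSITIVE_RATE = [
    ("total_lt_0", 61.8),
    ("total_0_to_9", 58.8),
    ("total_10_to_17", 55.2),
    ("total_18_to_25", 75.5),
    ("total_gte_26", 94.9),
]


def monotonic_violations(ordered):
    """Return adjacent bucket pairs where the rate falls as the score rises."""
    return [
        (lo_name, hi_name)
        for (lo_name, lo_rate), (hi_name, hi_rate) in zip(ordered, ordered[1:])
        if hi_rate < lo_rate
    ]


violations = monotonic_violations(T1_POSITIVE_RATE)
print(violations)
# [('total_lt_0', 'total_0_to_9'), ('total_0_to_9', 'total_10_to_17')]
```

Both violations sit in the three lowest buckets, while the two highest buckets rise in order.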

Usage

Run scripts/build_analysis_dataset.py after the archivist updates the database.
Use t0_score for prospectus-stage watchlisting.
Use total_score, decision_band, and calibrated_d1_positive_rate for T1-stage subscription cards.
Treat D1/D5/D20/D60 columns as review labels only, never as prediction inputs.
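One way to enforce the labels-only rule is to strip label columns before any feature selection. A minimal sketch, assuming label columns are named with a `d1_`, `d5_`, `d20_`, or `d60_` prefix; both that naming scheme and the helper `feature_columns` are assumptions, not taken from the dataset:

```python
LABEL_PREFIXES = ("d1", "d5", "d20", "d60")  # assumed label column prefixes


def feature_columns(columns):
    """Return the columns that may be used as prediction inputs.

    A column counts as a label when the part before its first underscore
    is one of LABEL_PREFIXES (case-insensitive).
    """
    def is_label(name):
        head = name.lower().split("_", 1)[0]
        return head in LABEL_PREFIXES

    return [c for c in columns if not is_label(c)]


# Illustrative column list; only the score and decision fields are named in the report.
columns = [
    "ticker", "t0_score", "total_score", "decision_band",
    "d1_return", "d5_return", "d20_return", "d60_return",
    "calibrated_d1_positive_rate",
]
features = feature_columns(columns)
print(features)
```

Under this prefix rule, calibrated_d1_positive_rate is kept: it is a historical bucket rate attached to a score, not the row's own post-listing return.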

Known Gaps

T1 is structurally complete for listed rows; residual field-level NULLs remain when the archived source does not explicitly state a demand field.
Industry and issuer fundamentals are not sufficiently structured for model input.
T2 grey-market signal is blocked pending an approved source.
Extreme D1 returns should be audited before they drive rule changes.
