hk-ipo/reports/2026-06-15_analysis_model_v0.md
geometrybase 77b405e4f3 Add T0 analyst reports for active IPOs
Request:
- Analyze HK IPO ticker 01392 with the analyst skill.
- Preserve the in-flight 06132 archive/report work already created for the prior request.

Changes:
- Archived official HKEX prospectus PDFs and extracted text for 01392 and 06132.
- Seeded structured T0 facts into the SQLite archive and refreshed CSV snapshots and sync state.
- Rebuilt the v0 analysis dataset and model calibration report.
- Generated Simplified Chinese T0 prospectus-stage analyst reports for 01392 and 06132.
- Adjusted report stage calendars so that T2 falls on the last business day before D1 when a weekend separates allocation from listing.
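The calendar rule above amounts to stepping back from D1 to the nearest weekday. Here is a minimal sketch. It assumes only Saturdays and Sundays are non-business days, so Hong Kong public holidays are ignored and would need a real exchange calendar. `previous_business_day` and `t2_date` are hypothetical helper names, not the repository's scripts.

```python
from datetime import date, timedelta


def previous_business_day(day: date) -> date:
    """Return the closest weekday strictly before `day`.

    Assumption: only Saturdays and Sundays are skipped; Hong Kong public
    holidays are not modelled.
    """
    candidate = day - timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate -= timedelta(days=1)
    return candidate


def t2_date(d1: date) -> date:
    """T2 (grey-market day) is the business day immediately before D1 (listing day)."""
    return previous_business_day(d1)
```

Take a Monday listing such as 2026-06-15. The sketch places T2 on Friday 2026-06-12 rather than on the Sunday.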

Verification:
- Compiled modified Python scripts with in-memory syntax checks.
- Ran SQLite quick_check and foreign_key_check.
- Confirmed DB row counts match CSV snapshots for key tables.
- Verified 01392/06132 source paths are repo-relative, raw files exist, hashes match, and PDF text manifest rows are ok.
- Ran git diff --cached --check.

Next useful context:
- 01392 T1 is due on 2026-06-18; rerun analyst after allotment results are archived.
- 06132 T1 is due on 2026-06-22; rerun analyst after allotment results are archived.
2026-06-15 14:51:44 +00:00


HK IPO Analysis Model v0

  • Model version: ipo_score_v0
  • Analysis as of: 2026-06-15T16:00:00Z
  • Rule file: rules/ipo_score_v0.yaml
  • Dataset: data/snapshots/analysis_model_v0_dataset.csv

What This Model Does

This is the first analyst model built from the downloaded archive. It creates a repeatable feature table and scores each IPO using stage-safe rules, meaning each stage's score uses only information available at that stage. It then calibrates the score buckets against archived D1 sell outcomes. The model is intentionally transparent: the output includes every score component and the archived source paths used for each ticker.

The model targets a short-horizon IPO allocation trade: sell in the T2 grey market when reliable, executable grey-market data exists; otherwise sell on D1. v0 does not use grey-market data because T2 currently has no approved, reproducible source. It also does not use post-listing returns as inputs: D1 is the primary sell label, and D5/D20/D60 are review labels only.
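One way to make "stage-safe" concrete is an explicit registry of which columns each stage may read. A check then guarantees that post-listing labels can never become inputs. Here is a minimal sketch. Every column name is a hypothetical stand-in for the dataset's real headers, and the feature lists follow the T0 and T1 descriptions in this report.

```python
# Hypothetical column names standing in for the dataset's real headers.
T0_FEATURES = (
    "offer_size",
    "initial_public_offer_pct",
    "min_subscription_amount",
    "offer_price_band",
    "over_allotment_available",
)
# T1 adds allotment-stage demand on top of the T0 structure.
T1_FEATURES = T0_FEATURES + (
    "public_oversubscription",
    "international_oversubscription",
    "valid_applications",
    "application_success_rate",
    "hk_public_offer_reallocation",
)
# Post-listing outcomes are labels only (D1 primary, D5/D20/D60 review).
LABEL_COLUMNS = frozenset({"d1_return", "d5_return", "d20_return", "d60_return"})
# No T2 entry: v0 has no approved, reproducible grey-market source.
STAGE_FEATURES = {"T0": T0_FEATURES, "T1": T1_FEATURES}

# Fail fast if any label ever slips into a stage's feature list.
for _stage, _features in STAGE_FEATURES.items():
    if LABEL_COLUMNS.intersection(_features):
        raise RuntimeError(f"label column used as an input at {_stage}")


def stage_inputs(row: dict, stage: str) -> dict:
    """Return only the columns a score at `stage` is allowed to read."""
    if stage not in STAGE_FEATURES:
        raise ValueError(f"no approved features for stage {stage!r}")
    return {name: row.get(name) for name in STAGE_FEATURES[stage]}
```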

Data Inventory

  • IPO rows scored: 295
  • Rows with D1 labels: 273
  • Rows with structured T1 demand fields: 291
  • Rows with prospectus source path: 295
  • Rows with allotment source path: 291
  • Rows with offer size: 295
  • Rows with public oversubscription: 281
  • Rows with international oversubscription: 277
  • Rows pending T1 structure: 4 (01392, 06106, 06132, 06675)
  • T1 field-level blanks: public oversubscription 10, international oversubscription 14, valid applications 6, successful applications 18
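The inventory counts can be recomputed from dataset rows loaded with `csv.DictReader`. Here is a minimal sketch; the column names are hypothetical. It treats a row as T1-structured once its allotment source path is present. That is an assumption, chosen because the 291 allotment paths match the 291 structured T1 rows above. Field-level blanks are counted only among T1-structured rows, which is consistent with 291 - 281 = 10 public-oversubscription blanks and 291 - 277 = 14 international ones.

```python
# Hypothetical column names standing in for the dataset's real headers.
T1_DEMAND_FIELDS = (
    "public_oversubscription",
    "international_oversubscription",
    "valid_applications",
    "successful_applications",
)


def present(value) -> bool:
    """Treat None and empty or whitespace-only strings (CSV blanks) as missing."""
    return value is not None and str(value).strip() != ""


def inventory(rows: list[dict]) -> dict:
    # Assumption: a row has T1 structure once its allotment source path is archived.
    t1_rows = [r for r in rows if present(r.get("allotment_source_path"))]
    return {
        "scored": len(rows),
        "with_d1_label": sum(present(r.get("d1_return")) for r in rows),
        "with_t1_structure": len(t1_rows),
        "pending_t1": sorted(
            r["ticker"] for r in rows if not present(r.get("allotment_source_path"))
        ),
        # Field-level blanks are counted only among rows that already have T1 structure.
        "t1_blanks": {f: sum(not present(r.get(f)) for r in t1_rows) for f in T1_DEMAND_FIELDS},
    }
```

To run it against the snapshot, pass `list(csv.DictReader(f))` for a file `f` opened on data/snapshots/analysis_model_v0_dataset.csv. Blank CSV cells arrive as empty strings, which `present` treats as missing.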

T0 Calibration

T0 uses only prospectus-stage structure: offer size, initial public offer percentage, minimum subscription amount, offer price band, and over-allotment availability.

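| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return (%) | Median D1 return (%) |
|---|---|---|---|---|---|
| t0_lt_1 | 36 | 58.3% | 33.3% | 12.8 | 2.3 |
| t0_1_to_4 | 60 | 63.3% | 40.0% | 9.6 | 3.1 |
| t0_5_to_7 | 105 | 73.3% | 51.4% | 40.1 | 13.2 |
| t0_gte_8 | 72 | 76.4% | 47.2% | 28.6 | 9.6 |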
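Each T0 row is computed per bucket from labeled rows only, so the four N values sum to 273, the number of rows with D1 labels. Here is a minimal sketch of the bucketing and summary. It assumes integer T0 scores with edges read off the bucket names, D1 returns stored in percent, and "D1 positive" meaning strictly greater than zero. The helper and column names are hypothetical apart from t0_score.

```python
from collections import defaultdict
from statistics import mean, median


def t0_bucket(t0_score: int) -> str:
    """Map an integer T0 score to its bucket (edges read off the bucket names)."""
    if t0_score < 1:
        return "t0_lt_1"
    if t0_score <= 4:
        return "t0_1_to_4"
    if t0_score <= 7:
        return "t0_5_to_7"
    return "t0_gte_8"


def calibrate(rows, bucket_key: str, label_key: str = "d1_return") -> dict:
    """Summarise D1 outcomes (in percent) per bucket, mirroring the table columns."""
    by_bucket = defaultdict(list)
    for row in rows:
        value = row.get(label_key)
        if value is None or value == "":
            continue  # rows without a D1 label do not enter calibration
        by_bucket[row[bucket_key]].append(float(value))
    table = {}
    for bucket, returns in by_bucket.items():
        n = len(returns)
        table[bucket] = {
            "n": n,
            "d1_positive_pct": round(100 * sum(r > 0 for r in returns) / n, 1),
            "d1_ge_10_pct": round(100 * sum(r >= 10 for r in returns) / n, 1),
            "avg_d1_return": round(mean(returns), 1),
            "median_d1_return": round(median(returns), 1),
        }
    return table
```

For example, first add a bucket column with `t0_bucket(int(row["t0_score"]))`, then call `calibrate(rows, "t0_bucket")`.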

T1 Calibration

T1 adds allotment-stage demand: public subscription, international placing demand, valid application count, application success rate, and HK public offer reallocation.

| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return (%) | Median D1 return (%) |
|---|---|---|---|---|---|
| total_lt_0 | 68 | 61.8% | 23.5% | 0.4 | 1.0 |
| total_0_to_9 | 68 | 58.8% | 30.9% | 3.3 | 0.2 |
| total_10_to_17 | 29 | 55.2% | 34.5% | 13.9 | 1.5 |
| total_18_to_25 | 49 | 75.5% | 51.0% | 31.3 | 13.4 |
| total_gte_26 | 59 | 94.9% | 88.1% | 86.7 | 80.0 |
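A T1-stage card's calibrated_d1_positive_rate can be read from this table by bucketing total_score. Here is a minimal sketch. It assumes integer scores, bucket edges taken from the bucket names, and the rate expressed in percent. The helper names are hypothetical, and the real rule file (rules/ipo_score_v0.yaml) may define the edges differently.

```python
# Assumption: scores are integers and edges follow the bucket names
# (e.g. "total_10_to_17" covers 10..17 inclusive).
def t1_bucket(total_score: int) -> str:
    if total_score < 0:
        return "total_lt_0"
    if total_score <= 9:
        return "total_0_to_9"
    if total_score <= 17:
        return "total_10_to_17"
    if total_score <= 25:
        return "total_18_to_25"
    return "total_gte_26"


# D1 positive rates (%) copied from the T1 calibration table in this report.
T1_D1_POSITIVE_PCT = {
    "total_lt_0": 61.8,
    "total_0_to_9": 58.8,
    "total_10_to_17": 55.2,
    "total_18_to_25": 75.5,
    "total_gte_26": 94.9,
}


def calibrated_d1_positive_rate(total_score: int) -> float:
    """Look up the historical D1 positive rate (%) for a T1 total score."""
    return T1_D1_POSITIVE_PCT[t1_bucket(total_score)]
```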

Current Read

After the T1 demand text backfill, the strongest v0 T1 bucket is total_gte_26 with 59 historical D1 observations and a 94.9% D1 positive rate. The model is most useful after allotment results are available; T0 is a watchlist filter rather than a final subscription call.

The high-conviction bucket remains clearly differentiated, but the middle and low T1 score buckets are still not monotonic. For example, total_10_to_17 has a lower D1 positive rate (55.2%) than both total_0_to_9 (58.8%) and total_lt_0 (61.8%). This refresh keeps the v0 score formula unchanged and updates the empirical calibration only. Future rule changes should come from reviewed prediction cards rather than from overfitting this historical sample.
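Monotonicity can be checked mechanically by walking adjacent buckets in score order. Here is a small sketch using the D1 positive rates from the two calibration tables; `monotonic_breaks` is a hypothetical helper.

```python
# D1 positive rates (%) from this report's calibration tables,
# ordered from the lowest to the highest score bucket.
T0_RATES = [("t0_lt_1", 58.3), ("t0_1_to_4", 63.3), ("t0_5_to_7", 73.3), ("t0_gte_8", 76.4)]
T1_RATES = [
    ("total_lt_0", 61.8),
    ("total_0_to_9", 58.8),
    ("total_10_to_17", 55.2),
    ("total_18_to_25", 75.5),
    ("total_gte_26", 94.9),
]


def monotonic_breaks(ordered):
    """Return adjacent (lower, higher) bucket pairs where the higher bucket scores worse."""
    return [
        (low_name, high_name)
        for (low_name, low), (high_name, high) in zip(ordered, ordered[1:])
        if high < low
    ]


print(monotonic_breaks(T0_RATES))  # []
print(monotonic_breaks(T1_RATES))  # [('total_lt_0', 'total_0_to_9'), ('total_0_to_9', 'total_10_to_17')]
```

The same check can be applied to the T0 "D1 >= 10%" column (33.3, 40.0, 51.4, 47.2). It flags t0_5_to_7 followed by t0_gte_8, so T0 is not monotonic on that metric even though its positive rate is.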

Usage

  1. Run scripts/build_analysis_dataset.py after the archivist updates the database.
  2. Use t0_score for prospectus-stage watchlisting.
  3. Use total_score, decision_band, and calibrated_d1_positive_rate for T1-stage subscription cards.
  4. Frame live decisions around a T2 or D1 sell, not long-term holding.
  5. Treat D5/D20/D60 columns as review labels only, never as prediction inputs or holding targets.
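The usage rules can be enforced where cards are assembled. Each stage exposes only its own score fields, review labels are never copied, and the exit is framed as a sell. Here is a minimal sketch. `build_card`, `ticker`, `sell_stage` and the d5/d20/d60 column names are hypothetical. t0_score, total_score, decision_band and calibrated_d1_positive_rate are the names used above. Because v0 has no approved T2 grey-market source, the sketch defaults the exit to D1.

```python
# Hypothetical review-label column names; these must never be copied onto a card.
REVIEW_LABELS = ("d5_return", "d20_return", "d60_return")
# t0_score, total_score, decision_band and calibrated_d1_positive_rate are the
# dataset columns named in this report; "ticker" is a hypothetical key column.
CARD_FIELDS = {
    "T0": ("ticker", "t0_score"),
    "T1": ("ticker", "total_score", "decision_band", "calibrated_d1_positive_rate"),
}


def build_card(row: dict, stage: str) -> dict:
    """Project a dataset row onto the fields a stage card may show."""
    fields = CARD_FIELDS[stage]
    if set(REVIEW_LABELS) & set(fields):
        raise RuntimeError("review labels must not appear on a card")
    card = {name: row.get(name) for name in fields}
    # v0 has no approved T2 grey-market source, so the planned exit defaults to D1.
    card["sell_stage"] = "D1"
    return card
```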

Known Gaps

  • T1 is structurally complete for listed rows; residual field-level NULLs remain when the archived source does not explicitly state a demand field.
  • Industry and issuer fundamentals are not sufficiently structured for model input.
  • T2 grey-market signal is blocked pending an approved source.
  • Extreme D1 returns should be audited before they drive rule changes.
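Before an extreme D1 return feeds a rule change, it can be flagged for manual audit. Here is a minimal sketch that swaps in Tukey's fences (Q1 - k*IQR, Q3 + k*IQR) with k = 3, one common outlier convention. The report does not specify an audit rule, so both the method and `flag_extreme_returns` are assumptions.

```python
from statistics import quantiles


def flag_extreme_returns(returns: dict[str, float], k: float = 3.0) -> list[str]:
    """Return tickers whose D1 return (%) lies outside Tukey's fences.

    Tukey's fences with k=3.0 are an illustrative choice, not the report's rule.
    """
    values = list(returns.values())
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points; needs at least two values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(t for t, r in returns.items() if r < low or r > high)
```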