hk-ipo/reports/2026-06-15_analysis_model_v0.md
geometrybase 5f94bbfde9 Refresh latest HK IPO report with A/H and T2 overlays
Request:
- Regenerate the latest HK IPO candidate report.
- Include current IPO candidate refresh, updated subscription heat, A/H dual-listed pricing analysis for 03661 and 01688, and T2 grey-market review context.

Changes:
- Refreshed HKEX current-listing, VBKR/Jieli T0.95 heat, ipohk external history, sync-state, and analysis-model snapshots as of 2026-06-23T08:53:26Z.
- Archived raw Yahoo chart evidence for 300661.SZ, 002600.SZ, and HKD/CNY so the A/H discount overlay has a reproducible local source.
- Regenerated reports/2026-06-23_latest_ipo_candidates_analysis.md and mirrored it to reports/README.md.
- Added the generated model v0 report from the rebuilt analysis dataset.
- Marked T2 grey-market evidence quality explicitly: ipohk grey-market returns are Tier 3 historical summaries, while newer June IPOs remain data_gap.

Verification:
- Confirmed reports/README.md matches the dated latest report with cmp.
- Ran git diff --check and git diff --cached --check.
- Verified all repo-relative paths referenced by the latest report exist.
- Verified source_refs paths are repo-relative, existing, and hash-matching.
- Recomputed A/H overlay values from archived raw JSON: 03661 discount about 46.2%, 01688 discount about 45.1%.

Next useful context:
- The 2026-06-23T08:53:26Z heat snapshot leaves the same 8 currently actionable candidates as the 07:00 refresh, with small heat updates.
- 02335 and 06106 still have no official T1 demand rows in the project archive.
- A/H discounts are valuation anchors, not direct arbitrage, because the A and H shares are not fungible.
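The overlay arithmetic can be sketched as follows. This is a minimal illustration, assuming the discount is defined as one minus the ratio of the H-share price (converted to CNY) to the A-share price. The function name `ah_discount` and the input values are placeholders, not figures from the archived JSON.

```python
def ah_discount(h_price_hkd: float, a_price_cny: float, cny_per_hkd: float) -> float:
    """Return the H-share discount to the A share as a fraction (0.25 means 25%)."""
    if h_price_hkd < 0 or a_price_cny <= 0 or cny_per_hkd <= 0:
        raise ValueError("prices and FX rate must be positive")
    # Convert the H-share price into CNY so both legs share one currency.
    h_price_cny = h_price_hkd * cny_per_hkd
    return 1.0 - h_price_cny / a_price_cny


# Placeholder inputs for illustration only.
example_discount = ah_discount(h_price_hkd=40.0, a_price_cny=80.0, cny_per_hkd=0.9)
print(f"H-share discount: {example_discount:.1%}")
```

A positive result means the H share is cheaper than the A share after conversion. Because the two share classes are not fungible, the number is a valuation reference rather than a tradable spread.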
2026-06-23 09:02:48 +00:00

HK IPO Analysis Model v0

  • Model version: ipo_score_v0
  • Analysis as of: 2026-06-23T08:53:26Z
  • Rule file: rules/ipo_score_v0.yaml
  • Dataset: data/snapshots/analysis_model_v0_dataset.csv

What This Model Does

This is the first analyst model built from the downloaded archive. It creates a repeatable feature table, scores each IPO using stage-safe rules, and calibrates the score buckets against archived D1 sell outcomes. It is intentionally transparent: the output includes every score component and the archived source paths used for each ticker.

The model is built for a short-horizon IPO allocation trade: sell in the T2 grey market when reliable, executable data exists, and otherwise sell on D1. v0 does not use grey-market data because T2 currently has no approved, reproducible source. It also does not use post-listing returns as inputs: D1 is the primary sell label, while D5/D20/D60 are review labels only.
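The sell rule can be sketched as a small selector. This is an illustration under stated assumptions, not the project's code: `SellQuote`, `pick_sell_label`, and the `source_approved` flag are hypothetical names. Under v0 the selector would always fall through to D1, because no T2 source is approved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SellQuote:
    stage: str             # "T2" (grey market) or "D1" (first trading day)
    return_pct: float      # return versus the offer price, in percent
    source_approved: bool  # hypothetical flag: is the data source approved and reproducible?


def pick_sell_label(t2: Optional[SellQuote], d1: Optional[SellQuote]) -> Optional[SellQuote]:
    """Prefer an approved T2 quote; otherwise fall back to D1 (which may be missing)."""
    if t2 is not None and t2.source_approved:
        return t2
    return d1


# An unapproved grey-market quote is ignored and the D1 label is used instead.
chosen = pick_sell_label(
    SellQuote("T2", 5.0, source_approved=False),
    SellQuote("D1", 3.0, source_approved=True),
)
print(chosen.stage)
```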

Data Inventory

  • IPO rows scored: 312
  • Rows with D1 labels: 278
  • Rows with structured T1 demand fields: 295
  • Rows with prospectus source path: 312
  • Rows with allotment source path: 295
  • Rows with offer size: 312
  • Rows with public oversubscription: 285
  • Rows with international oversubscription: 280
  • Rows with market heat snapshots: 19
  • Rows with T0.5 margin heat snapshots: 5
  • Rows with T0.95 late-order heat snapshots: 14
  • Rows with T0.5 margin heat and D1 labels: 3
  • Rows with T0.95 late-order heat and D1 labels: 0
  • Rows matched to external ipohk history: 102
  • Rows with external final oversubscription: 95
  • Rows with external final oversubscription and D1 labels: 86
  • Rows pending T1 structure: 17 (00668, 01191, 01688, 01956, 02272, 02335, 02667, 02672, 02697, 03661, 03952, 06106, 06228, 06715, 06915, 09630, 09637)
  • T1 field-level blanks: public oversubscription 10, international oversubscription 15, valid applications 6, successful applications 18
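Inventory counts of this kind can be recomputed by counting non-empty cells per column. The sketch below uses only the standard library. The column names (`d1_return`, `public_oversubscription`) and the sample tickers are hypothetical, since the real headers of the dataset CSV are not shown here.

```python
import csv
import io


def coverage(rows, columns):
    """Return (total row count, {column: rows with a non-empty value})."""
    counts = {col: 0 for col in columns}
    total = 0
    for row in rows:
        total += 1
        for col in columns:
            # Treat missing keys, None, and whitespace-only cells as blank.
            if (row.get(col) or "").strip():
                counts[col] += 1
    return total, counts


# Made-up sample standing in for the dataset CSV.
sample = io.StringIO(
    "ticker,d1_return,public_oversubscription\n"
    "00001,12.5,35.2\n"
    "00002,,4.1\n"
    "00003,-3.0,\n"
)
total, counts = coverage(csv.DictReader(sample), ["d1_return", "public_oversubscription"])
print(total, counts)
```

For the real file, `csv.DictReader(open(path, newline=""))` could replace the in-memory sample, with the column list adjusted to the actual headers.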

T0 Calibration

T0 uses only prospectus-stage structure: offer size, initial public offer percentage, minimum subscription amount, offer price band, and over-allotment availability.

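| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return (%) | Median D1 return (%) |
|---|---|---|---|---|---|
| t0_lt_1 | 36 | 58.3% | 33.3% | 12.8 | 2.3 |
| t0_1_to_4 | 60 | 63.3% | 40.0% | 9.6 | 3.1 |
| t0_5_to_7 | 107 | 73.8% | 52.3% | 42.6 | 14.1 |
| t0_gte_8 | 75 | 76.0% | 48.0% | 28.8 | 9.8 |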
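Each calibration row summarises the D1 returns of the IPOs that fall into one score bucket. A minimal sketch of how such a row could be derived, assuming returns are expressed in percent and that "D1 positive" means a return strictly above zero (the report does not state how a flat close is treated):

```python
from statistics import mean, median


def calibrate(returns_pct):
    """Summarise one bucket's D1 returns (percent) into calibration statistics."""
    n = len(returns_pct)
    if n == 0:
        raise ValueError("bucket has no labelled rows")
    return {
        "n": n,
        "d1_positive": sum(r > 0 for r in returns_pct) / n,  # assumption: strictly positive
        "d1_gte_10": sum(r >= 10 for r in returns_pct) / n,
        "avg": mean(returns_pct),
        "median": median(returns_pct),
    }


# Made-up returns for a single bucket.
stats = calibrate([-5.0, 0.0, 10.0, 35.0])
print(stats)
```

Reporting the median next to the mean matters here: a few very large first-day gains can pull the average far above what a typical IPO in the bucket returned.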

T1 Calibration

T1 adds allotment-stage demand: public subscription, international placing demand, valid application count, application success rate, and HK public offer reallocation.

| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return (%) | Median D1 return (%) |
|---|---|---|---|---|---|
| total_lt_0 | 68 | 61.8% | 23.5% | 0.4 | 1.0 |
| total_0_to_9 | 68 | 58.8% | 30.9% | 3.3 | 0.2 |
| total_10_to_17 | 29 | 55.2% | 34.5% | 13.9 | 1.5 |
| total_18_to_25 | 49 | 75.5% | 51.0% | 31.3 | 13.4 |
| total_gte_26 | 64 | 93.8% | 87.5% | 86.9 | 78.1 |

T0.5 Market Heat

T0.5 uses archived subscription-period margin heat snapshots. T0.95 is the near-deadline subset of the heat snapshots that is still actionable before the user's order cutoff. These are non-official live signals and are kept separate from T1 allotment demand. The current archive is not yet a historical training set: it has too few rows, and too few D1 labels (3 for T0.5, none for T0.95), for calibration.

  • Total market heat rows: 19
  • T0.5 margin rows: 5
  • T0.5 rows with D1 labels: 3
  • T0.95 late-order heat rows: 14
  • T0.95 rows with D1 labels: 0

External Final Heat Proxy

The ipohk history archive adds final public oversubscription, one-lot win rate, grey-market return, and first-day return where available. These fields are useful for coverage checks and post-hoc calibration, but they are not T0.5 inputs because they are final or near-final history.

  • External history rows matched into this dataset: 102
  • Matched rows with final oversubscription: 95
  • Matched rows with final oversubscription and D1 labels: 86

| Bucket | N | D1 positive | D1 >= 10% | Avg D1 return (%) | Median D1 return (%) |
|---|---|---|---|---|---|
| external_os_lt_10x | 6 | 50.0% | 16.7% | 4.7 | -4.1 |
| external_os_10x_to_100x | 7 | 28.6% | 14.3% | -23.0 | -21.9 |
| external_os_100x_to_1000x | 21 | 61.9% | 38.1% | 8.8 | 4.2 |
| external_os_1000x_to_5000x | 34 | 94.1% | 79.4% | 60.7 | 56.7 |
| external_os_gte_5000x | 18 | 83.3% | 72.2% | 101.7 | 89.7 |
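The bucket labels imply boundaries at 10x, 100x, 1000x, and 5000x. The sketch below maps a final public oversubscription multiple to one of those labels. It assumes each boundary belongs to the higher bucket, which the `gte_5000x` label suggests but the report does not confirm for the inner edges. `external_os_bucket` is a hypothetical helper name.

```python
from bisect import bisect_right

EDGES = [10, 100, 1000, 5000]
LABELS = [
    "external_os_lt_10x",
    "external_os_10x_to_100x",
    "external_os_100x_to_1000x",
    "external_os_1000x_to_5000x",
    "external_os_gte_5000x",
]


def external_os_bucket(multiple: float) -> str:
    """Map an oversubscription multiple to its bucket label (lower edge inclusive)."""
    if multiple < 0:
        raise ValueError("oversubscription multiple cannot be negative")
    # bisect_right counts how many edges are <= multiple, which is the bucket index.
    return LABELS[bisect_right(EDGES, multiple)]


for value in (3.5, 10, 999.9, 5000):
    print(value, external_os_bucket(value))
```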

Current Read

After the T1 demand text backfill, the strongest v0 T1 bucket is total_gte_26 with 64 historical D1 observations and a 93.8% D1 positive rate. The model is most useful after allotment results are available; T0 is a watchlist filter rather than a final subscription call.

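The high-conviction bucket remains clearly differentiated, but the middle and low score buckets are still not monotonic: in the T1 calibration, the D1 positive rate falls from 61.8% (total_lt_0) to 58.8% (total_0_to_9) to 55.2% (total_10_to_17) before rising in the higher buckets. This refresh keeps the v0 score formula unchanged and updates empirical calibration only; future rule changes should come from reviewed prediction cards rather than from overfitting this historical sample.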
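The monotonicity point can be checked mechanically. The sketch below takes the D1 positive rates from the T1 Calibration table, ordered from lowest to highest score bucket, and lists adjacent pairs where the higher bucket has the lower rate. `monotonic_breaks` is an illustrative helper, not part of the project's scripts.

```python
# D1 positive rates (%) from the T1 Calibration table, lowest score bucket first.
T1_POSITIVE_RATE = [
    ("total_lt_0", 61.8),
    ("total_0_to_9", 58.8),
    ("total_10_to_17", 55.2),
    ("total_18_to_25", 75.5),
    ("total_gte_26", 93.8),
]


def monotonic_breaks(buckets):
    """Return (lower, higher) bucket pairs where the higher bucket has a lower rate."""
    return [
        (low_name, high_name)
        for (low_name, low_rate), (high_name, high_rate) in zip(buckets, buckets[1:])
        if high_rate < low_rate
    ]


breaks = monotonic_breaks(T1_POSITIVE_RATE)
print(breaks)
```

Both breaks sit in the three lowest buckets. With 29 observations in total_10_to_17, differences of a few percentage points there are within what sampling noise alone could produce.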

Usage

  1. Run scripts/build_analysis_dataset.py after archivist updates the database.
  2. Use t0_score for prospectus-stage watchlisting.
  3. Use total_score, decision_band, and calibrated_d1_positive_rate for T1-stage subscription cards.
  4. Frame live decisions around a T2 or D1 sell, not long-term holding.
  5. Treat D5/D20/D60 columns as review labels only, never as prediction inputs or holding targets.
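Step 5 can be enforced with a guard that rejects post-listing columns before any scoring runs. This is a sketch only: the column names `d1_return`, `d5_return`, `d20_return`, and `d60_return` are assumed for illustration and may not match the dataset's actual headers.

```python
# Hypothetical names for the post-listing label columns.
FORBIDDEN_INPUTS = frozenset({"d1_return", "d5_return", "d20_return", "d60_return"})


def assert_stage_safe(feature_columns):
    """Return the feature list unchanged, or raise if it contains post-listing labels."""
    leaked = sorted(FORBIDDEN_INPUTS.intersection(feature_columns))
    if leaked:
        raise ValueError(f"post-listing columns used as inputs: {leaked}")
    return list(feature_columns)


features = assert_stage_safe(["t0_score", "total_score"])
print(features)
```

Running the guard at dataset build time, rather than at scoring time, would catch a leaked label before it reaches any calibration table.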

Known Gaps

  • T1 is structurally complete for listed rows; residual field-level NULLs remain when the archived source does not explicitly state a demand field.
  • Industry and issuer fundamentals are not sufficiently structured for model input.
  • T2 grey-market signal is blocked pending an approved source.
  • Extreme D1 returns should be audited before they drive rule changes.
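The audit of extreme D1 returns could start from a simple absolute-return filter. In this sketch the 200% threshold, the function name `flag_extreme_d1`, and the sample tickers are all placeholders. The project has not defined what counts as extreme.

```python
def flag_extreme_d1(d1_returns_pct, threshold_pct=200.0):
    """Return tickers whose absolute D1 return (percent) meets the threshold, largest first."""
    flagged = [
        (ticker, ret)
        for ticker, ret in d1_returns_pct.items()
        if abs(ret) >= threshold_pct
    ]
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return [ticker for ticker, _ in flagged]


# Made-up tickers and returns for illustration.
sample_returns = {"AAAA": 12.0, "BBBB": 340.0, "CCCC": -8.5, "DDDD": 205.0}
to_audit = flag_extreme_d1(sample_returns)
print(to_audit)
```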