geometrybase 7cbdd533b0 Add A/H share-class mapping workflow
Request:
- Add a repeatable mechanism so HK IPO reports detect issuers that already have Mainland A shares.
- Include a third internet/official-exchange cross-check layer beyond structured history and prospectus scans.

Changes:
- Add listed_share_classes schema support for same-issuer A-share mappings and evidence links.
- Add scripts/archive_a_share_mappings.py to scan prospectus extracted text, reject sponsor/portfolio/cornerstone false positives, archive optional official web evidence and A-share/FX quote evidence, and export snapshots on write.
- Surface a_share_* fields in the analysis dataset and single-ticker report output.
- Update hk-ipo analyst/archivist skill rules and scheduled refresh prompt to require the three-layer A/H mapping check.

Verification:
- python3 -m py_compile scripts/archive_a_share_mappings.py scripts/build_analysis_dataset.py scripts/generate_ipo_report.py
- .venv/bin/python scripts/archive_a_share_mappings.py --as-of 2026-06-24T00:00:00Z --tickers 00668,01688,03661,09630 --dry-run
- .venv/bin/python scripts/build_analysis_dataset.py --db /tmp/hk_ipo_ah_dataset_test.sqlite --dataset /tmp/hk_ipo_ah_dataset_test.csv --report /tmp/hk_ipo_ah_model_test.md --as-of 2026-06-24T00:00:00Z
- .venv/bin/python scripts/generate_ipo_report.py 09630 --dataset /tmp/hk_ipo_ah_dataset_test.csv --stdout --as-of 2026-06-24T00:00:00Z
- git diff --check

Next useful context:
- Dry-run detected 00668->300866.SZ, 01688->002600.SZ, 03661->300661.SZ, and 09630->688630.SH.
- A false positive 01688->300476.SZ from a cornerstone investor parent was rejected by the issuer-context filter.
2026-06-24 07:21:21 +00:00


name: hk-ipo-analyst
description: Use for Hong Kong IPO subscription analysis in this project: T0/T1/T2 prediction cards, scoring, cross-IPO comparison, research reports, A/H dual-listed valuation overlays, T2 grey-market source-quality reviews, post-listing reviews, error attribution, and rule-update recommendations. Use archived facts when available and keep predictions append-only.

HK IPO Analyst

Purpose

Assess Hong Kong IPO subscription candidates using the project's archived facts, scoring rules, prediction cards, and review history. This skill owns judgment: whether to subscribe, wait, avoid, sell in the grey market or on D1, or revise a rule after outcomes arrive.

Use hk-ipo-archivist first when source documents, listing facts, allotment results, prices, or database snapshots need to be updated.

Core Discipline

Separate the decision stage from later facts:

  • T0_prospectus: prospectus and offer terms only.
  • T0_5_market_heat: subscription-period non-official market heat, such as broker-aggregated margin subscription multiples, observed before official allotment results.
  • T0_95_final_heat: near-deadline heat observed while the user can still place, amend, or cancel an IPO order; this is an actionable late-order stage, not an allotment-results stage.
  • T1_allotment: allotment results, public subscription, international placing, allocation, and final pricing.
  • T2_grey_market: grey-market result and immediate pre-listing trading context.
  • D1, D5, D20, D60: post-listing review checkpoints.

Do not let later facts leak into earlier prediction cards. When reviewing an older call, compare the frozen prediction against the actual outcome instead of rewriting the original judgment.
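The stage ordering can be checked mechanically before a card is written. A minimal sketch, assuming each archived fact is tagged with the earliest stage at which it was visible. The `Stage` encoding and the `leaked_fields` helper are hypothetical and are not part of the project's scripts:

```python
from enum import IntEnum


class Stage(IntEnum):
    """Decision stages in chronological order (hypothetical encoding)."""

    T0_PROSPECTUS = 0
    T0_5_MARKET_HEAT = 1
    T0_95_FINAL_HEAT = 2
    T1_ALLOTMENT = 3
    T2_GREY_MARKET = 4
    D1 = 5
    D5 = 6
    D20 = 7
    D60 = 8


def leaked_fields(card_stage: Stage, fact_stages: dict) -> list:
    """Return fact names that only became visible after the card's stage."""
    return sorted(name for name, stage in fact_stages.items() if stage > card_stage)


facts = {
    "max_offer_price": Stage.T0_PROSPECTUS,
    "margin_multiple": Stage.T0_5_MARKET_HEAT,
    "public_oversubscription_times": Stage.T1_ALLOTMENT,
    "d1_return_pct": Stage.D1,
}
print(leaked_fields(Stage.T0_5_MARKET_HEAT, facts))
# ['d1_return_pct', 'public_oversubscription_times']
```

A non-empty result means the card would use later facts. In that case, write a new stage card instead of editing the frozen one.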

Trading Horizon

The analyst model is a short-exit IPO subscription model, not a long-term holding model.

  • The intended exit is T2_grey_market when a reliable grey-market signal and executable price are available, or D1 otherwise.
  • The default assumption is to sell allocated shares by D1 unless a later rule explicitly creates a documented exception.
  • D5/D20/D60 are review labels for learning, not holding targets and not inputs for subscription decisions.
  • Reports should frame expected return, triggers, and exit discipline around T2/D1 realization rather than long-term fundamentals.
  • Recommendations should avoid long-hold language unless the user explicitly asks for a separate long-term investment thesis.

Project Storage Contract

Use repo-relative paths everywhere:

  • Memos: memos/{ticker}_{stage}_{date}.md
  • Reports: reports/{date}_{topic}.md or another repo-relative report path requested by the user.
  • Latest candidate report mirror: reports/README.md
  • Rules: rules/ipo_score_v*.yaml and rules/rule_change_log.md
  • Source references: cite archived files using paths such as data/raw/06658/prospectus.pdf

Never store or cite machine-specific absolute paths in durable project files.
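A pre-commit style check for this contract could look like the sketch below. It assumes paths are stored as plain strings. `is_repo_relative` is a hypothetical helper, not an existing project script:

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_repo_relative(path: str) -> bool:
    """Return True only for paths that stay inside the repository root."""
    if path.startswith("~"):
        return False  # home-relative, therefore machine-specific
    if PurePosixPath(path).is_absolute():
        return False  # e.g. /Users/... or /home/...
    win = PureWindowsPath(path)
    if win.drive or win.root:
        return False  # e.g. C:\... or \\server\share
    # PureWindowsPath splits on both "/" and "\", so ".." is caught with either separator.
    return ".." not in win.parts


for p in ["data/raw/06658/prospectus.pdf", "/Users/me/repo/reports/x.md", "C:\\repo\\x.md", "../other/x.md"]:
    print(p, is_repo_relative(p))
```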

Responsibilities

  • Produce IPO subscription analysis and cross-candidate rankings.
  • Write append-only T0/T1/T2 prediction cards.
  • Include probability forecasts, score breakdowns, key reasons, risks, triggers, and exit discipline.
  • Review actual outcomes against prior predictions.
  • Attribute errors using stable tags such as fundamental_miss, valuation_miss, heat_miss, structure_miss, market_window_miss, execution_miss, and data_gap.
  • Recommend scoring-rule changes only after evidence supports them.

Boundaries

Do not silently mutate archived source facts. If facts are missing or stale, call out the data gap and use hk-ipo-archivist to update the archive before relying on them.

Do not overwrite prediction cards. If a view changes, write a new stage card or review card that references the earlier prediction.

Workflow

  1. Inspect current repo state and recent commits before changing files.
  2. Determine the requested stage: T0, T1, T2, or post-listing review.
  3. Load available archived facts and rules from repo-relative project files.
  4. If facts are missing or stale, update the archive through hk-ipo-archivist or state the gap clearly.
  5. Score the IPO using the current rule version.
  6. Record probability forecasts rather than only directional language.
  7. Write a memo/report with data-as-of time, rule version, sources, score, decision, and triggers.
  8. For reviews, compare the frozen prediction to actual outcomes and classify the error type.
  9. Commit only the related memo/report/rule changes after verification.

Broad Candidate Report Layout

For broad/latest candidate reports whose purpose is deciding what can be subscribed now, order the body action-first:

  1. Scoring/ranking table for currently actionable IPOs.
  2. Fundamentals cross-check for the current batch.
  3. Break-probability, risk/reward, and capital-efficiency framework for every current or recently closed ticker whose D1 break/non-break outcome is not yet confirmed.
  4. Per-IPO notes, execution guidance, and closed/waiting names.
  5. Recent 30-day listed-IPO review as post-hoc calibration.
  6. Data-refresh guardrails and sources.

Do not lead broad candidate reports with post-listing reviews; keep those after the live decision layers so the reader sees the actionable ranking first.

Report Name Display Rules

For all analyst-generated Markdown reports, prediction cards, review cards, and broad candidate tables, display the market-facing stock name rather than the issuer's full legal name.

  • In every table that identifies IPOs, keep the stock-code column as ticker/code only and add a dedicated 股票名 column immediately after it.
  • Do not put company names in the stock-code column, and do not use legal company names as the primary display label in report tables.
  • Treat 股票名 as the stock-app-style display name. Prefer stock_short_name first, then an official or archived Chinese short name from HKEX Chinese pages, broker pages, or prospectus/allotment text. Examples: use 圣邦股份, not 圣邦微电子(北京)股份有限公司; use 领益智造, not 广东领益智造股份有限公司.
  • If no sourced Chinese short stock name is available but an archived source gives an English stock/display name, use that English stock name as-is. Example: use MERDEKAGOLD-DRS for 06228 instead of leaving the report display name blank.
  • For recent listed-IPO review rows, if stock_short_name is missing, an archived external_ipo_history.stock_name may be used as the 股票名 display fallback. Treat this only as a display name from the external history source, not as official HKEX issuer-name evidence.
  • Keep the full legal Chinese issuer name in company_name_zh for archival/source disambiguation, and mention it only in prose when the legal entity identity matters.
  • If no sourced stock/display name is available in any language, use hk-ipo-archivist to refresh or backfill it before writing the report. If it still cannot be sourced, write data_gap in the 股票名 column and state the gap; do not invent a machine-translated short name.
  • English legal names may remain in source paths, URLs, raw source titles, quoted source context, or narrow notes where needed for disambiguation, but they should not replace sourced stock/display names in the report body.
  • Apply this rule consistently to the actionable ranking, fundamentals table, closed/waiting list, A/H overlay, recent 30-day review, and generated single-ticker reports.
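The display-name fallback order above can be sketched as a function. The keys `short_name_zh`, `stock_name_en`, and `external_stock_name` are hypothetical column names; `external_stock_name` stands in for a joined external_ipo_history.stock_name. Only `stock_short_name` and `company_name_zh` come from this document:

```python
def resolve_display_name(row: dict, review_row: bool = False) -> str:
    """Pick the 股票名 display value using the fallback order described above."""
    candidates = [
        row.get("stock_short_name"),
        row.get("short_name_zh"),  # hypothetical: sourced Chinese short name
        row.get("stock_name_en"),  # hypothetical: sourced English stock/display name
    ]
    if review_row:
        # Display-only fallback for recent-review rows; not HKEX issuer-name evidence.
        candidates.append(row.get("external_stock_name"))
    for name in candidates:
        if name and name.strip():
            return name.strip()
    # Never fall back to company_name_zh or to a machine-translated short name.
    return "data_gap"


print(resolve_display_name({"stock_short_name": "圣邦股份", "company_name_zh": "圣邦微电子(北京)股份有限公司"}))
print(resolve_display_name({"stock_name_en": "MERDEKAGOLD-DRS"}))
print(resolve_display_name({"company_name_zh": "广东领益智造股份有限公司"}))
```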

Latest Candidate Report Refresh

For scheduled runs, latest IPO list refreshes, and broad candidate reports, first use hk-ipo-archivist to refresh the latest internet-sourced IPO universe and archive updates before making analyst judgments. This includes the HKEX current new-listing page, newly available prospectuses, allotment-result announcements, listing-calendar changes, recent price-performance rows for review, and subscription-period market heat such as broker-aggregated margin subscription multiples.

Before rebuilding the analysis dataset or latest report, refresh A/H or other onshore share-class mappings with a three-layer check:

  1. Use the structured listed_share_classes table and data/snapshots/listed_share_classes.csv from prior archive runs.
  2. Scan archived prospectus extracted text for issuer-context A-share evidence with scripts/archive_a_share_mappings.py; require same-issuer wording such as existing A Shares, SSE/SZSE, ChiNext, STAR Market, or an issuer stock code, and reject sponsor, portfolio-company, or unrelated shareholder contexts.
  3. Use internet search and supported official exchange pages as a public cross-check for current or recently closed candidates. Archive reproducible official exchange evidence when supported; otherwise state that web cross-check evidence is a data_gap and rely on prospectus evidence as the primary source.
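Layer 2 depends on reading each A-share code in issuer context. The following is a simplified sketch that assumes English prospectus text, with codes already written in the `300661.SZ` form. The cue lists and `issuer_a_share_codes` are illustrative and are not the logic inside scripts/archive_a_share_mappings.py. The sample text is invented to mirror the cornerstone-investor false positive described in the commit notes:

```python
import re

# Hypothetical normalized code form such as 300661.SZ or 688630.SH.
A_SHARE_CODE = re.compile(r"\b\d{6}\.(?:SZ|SH)\b")
# Split after "." or ";" followed by whitespace, so "300661.SZ" stays intact.
SENTENCE_SPLIT = re.compile(r"(?<=[.;])\s+")
ISSUER_CUES = ("our a shares", "existing a shares", "shenzhen stock exchange",
               "shanghai stock exchange", "chinext", "star market")
REJECT_CUES = ("sponsor", "cornerstone", "portfolio compan", "investee")


def issuer_a_share_codes(text: str) -> list:
    """A-share codes found in issuer-context sentences, excluding rejected contexts."""
    found = set()
    for sentence in SENTENCE_SPLIT.split(text):
        lowered = sentence.lower()
        codes = A_SHARE_CODE.findall(sentence)
        if not codes or any(cue in lowered for cue in REJECT_CUES):
            continue  # no code, or a sponsor/cornerstone/portfolio-company context
        if any(cue in lowered for cue in ISSUER_CUES):
            found.update(codes)
    return sorted(found)


sample = (
    "Our A Shares have been listed on the Shenzhen Stock Exchange under stock code 300661.SZ since 2017. "
    "One Cornerstone Investor is a subsidiary of a group whose shares are listed on the "
    "Shenzhen Stock Exchange (300476.SZ)."
)
print(issuer_a_share_codes(sample))  # ['300661.SZ']
```

A sentence-level window is deliberately narrow. A real filter would also need Chinese-language cues and a manual review path for low-confidence matches.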

Refresh the report content as a complete current snapshot, not as a partial patch. The dated report should be written to reports/{date}_latest_ipo_candidates_analysis.md and should update all relevant sections: actionable ranking, fundamentals, break-probability/risk-reward, capital efficiency, per-IPO notes, closed/waiting names, recent 30-day listed-IPO review with T2 grey-market context when available, data gaps, and sources.

The break-probability, risk/reward, and capital-efficiency section must cover all current or recently closed IPOs that do not yet have confirmed D1 break/non-break information, including names whose subscription window has already closed and are waiting for T1/T2/D1. Do not drop a ticker from this section merely because the user can no longer subscribe; keep its probability/risk view visible until D1 outcome is archived or explicitly confirmed. Once D1 is confirmed, move it out of the probability table and into the recent listed-IPO review.
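The routing rule depends only on whether D1 is confirmed, not on whether the subscription window is still open. A minimal sketch using a hypothetical `Candidate` record and placeholder tickers:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    ticker: str
    subscription_open: bool
    d1_outcome: Optional[str] = None  # "break" / "non_break" once archived, else None


def route_candidates(candidates: List[Candidate]) -> Tuple[List[str], List[str]]:
    """Split tickers into the probability table and the recent listed-IPO review."""
    probability_table, recent_review = [], []
    for c in candidates:
        # subscription_open is deliberately ignored: closed names without D1 stay visible.
        if c.d1_outcome is None:
            probability_table.append(c.ticker)
        else:
            recent_review.append(c.ticker)
    return probability_table, recent_review


batch = [
    Candidate("T_OPEN", subscription_open=True),
    Candidate("T_CLOSED", subscription_open=False),
    Candidate("T_LISTED", subscription_open=False, d1_outcome="non_break"),
]
print(route_candidates(batch))
```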

If any current or recently closed candidate already has A Shares or another onshore-listed share class, the latest report must include the A/H dual-listed overlay described below. Do not leave this only implicit in the fundamentals section.

Market heat and subscription multiples must remain stage-safe:

  • Unofficial live subscription or margin multiples belong in archived ipo_market_heat rows with provider, observed_at, stage, and source path.
  • Do not copy T0.5/T0.95 market heat into official T1 public-oversubscription fields.
  • Official public subscription, international placing demand, one-lot success, and final offer-price fields should come only from archived allotment-result sources or other documented official T1 sources.
  • If a current network source cannot be refreshed, label the affected field as data_gap and keep the stale timestamp visible.
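One way to enforce this separation is to populate official T1 fields only from an allowlist of official sources and leave everything else as data_gap. This sketch assumes hypothetical `source_kind` labels and a hypothetical `international_placing_times` field name. `public_oversubscription_times` follows the ipo_demand field named in this document:

```python
OFFICIAL_T1_SOURCES = {"allotment_results"}  # hypothetical source_kind label
REQUIRED_T1_FIELDS = (
    "public_oversubscription_times",
    "international_placing_times",  # hypothetical field name
    "final_offer_price",
)


def official_t1_fields(rows: list) -> dict:
    """Fill T1 fields only from official T1 rows; everything else stays data_gap."""
    fields = {name: "data_gap" for name in REQUIRED_T1_FIELDS}
    for row in rows:
        if row["stage"] != "T1_allotment" or row["source_kind"] not in OFFICIAL_T1_SOURCES:
            continue  # T0.5/T0.95 heat never populates official T1 fields
        if row["field"] in fields:
            fields[row["field"]] = row["value"]
    return fields


rows = [
    {"stage": "T0_95_final_heat", "source_kind": "broker_margin",
     "field": "public_oversubscription_times", "value": 2000.0},
    {"stage": "T1_allotment", "source_kind": "allotment_results",
     "field": "final_offer_price", "value": 12.5},
]
print(official_t1_fields(rows))
```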

After writing the dated latest report, copy the same final Markdown content to reports/README.md. This README is an overwriteable mirror of the latest broad candidate report so opening the reports/ directory surfaces the current report first; the dated report remains the durable historical copy.
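The dated-copy-plus-mirror pattern can be sketched with the standard library. It assumes `{date}` is formatted as YYYY-MM-DD. `publish_latest_report` is hypothetical, and the demo runs in a temporary directory:

```python
import shutil
import tempfile
from pathlib import Path


def publish_latest_report(repo_root: Path, date: str, markdown: str) -> Path:
    """Write the durable dated report, then overwrite the reports/README.md mirror."""
    reports = repo_root / "reports"
    reports.mkdir(parents=True, exist_ok=True)
    dated = reports / f"{date}_latest_ipo_candidates_analysis.md"
    dated.write_text(markdown, encoding="utf-8")
    shutil.copyfile(dated, reports / "README.md")  # same final content, overwritten each run
    return dated


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    publish_latest_report(root, "2026-06-23", "# old run\n")
    publish_latest_report(root, "2026-06-24", "# new run\n")
    readme_text = (root / "reports" / "README.md").read_text(encoding="utf-8")
    dated_reports = sorted(p.name for p in (root / "reports").glob("*_latest_ipo_candidates_analysis.md"))

print(readme_text, dated_reports)
```

Older dated reports survive each run, while README.md always reflects the latest one.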

A/H Dual-Listed IPO Overlay

When an issuer already has Mainland A Shares or another onshore listed share class, explicitly mark it as an A/H or dual-listed valuation case in every broad report, single-ticker report, and relevant prediction card. Treat this as a different pricing setup from a pure first-time HK IPO.

Detection cues include prospectus language such as existing A Shares, Shenzhen Stock Exchange, Shanghai Stock Exchange, ChiNext, STAR Market, an A-share stock code such as 300661.SZ or 002600.SZ, or HKEX waiver/pricing text that references the A-share closing price.

Do not rely on manual memory or a short hard-coded whitelist. Use archived structured mappings first, refresh them from prospectus text, and add internet/official-exchange search cross-checks for live candidates:

.venv/bin/python scripts/archive_a_share_mappings.py --as-of YYYY-MM-DDTHH:MM:SSZ --web-cross-check --archive-quotes
.venv/bin/python scripts/build_analysis_dataset.py --as-of YYYY-MM-DDTHH:MM:SSZ

If the dataset has a_share_ticker, the report must include the A/H overlay. If the prospectus text suggests an A-share relation but the mapping is uncertain, include it as an explicit data_gap or low-confidence mapping rather than omitting it.

The report should include a compact A/H note or table covering:

  • A-share ticker, exchange, and whether it is the same issuer, parent, subsidiary, or only a comparable/affiliate.
  • A-share latest close and recent 5/20-day trend as of data_as_of, with source and timestamp.
  • H-share maximum/final offer price, exchange-rate assumption, and implied H/A premium or discount.
  • A-share market capitalization/liquidity and the H-share offer size/free-float context.
  • Whether the prospectus says the H-share offer price references the A-share closing price or sets a maximum price linked to A-share trading.
  • A/H fungibility guardrail: A Shares and H Shares are usually not interchangeable or fungible, and there is no direct arbitrage unless the issuer has a documented full-circulation or conversion mechanism.
  • Analyst conclusion on how the A-share anchor changes D1/T2 logic: valuation support, premium/discount risk, A-share momentum spillover, HK liquidity gap, and whether hot A-share pricing is already embedded.

Do not mechanically treat an H-share discount to A shares as a guaranteed upside. Separate valuation anchor from trading demand: an A-share reference can reduce valuation uncertainty, but T0.95/T1 heat, Hong Kong float, cornerstone lockup, and market-window quality still drive short-exit IPO payoff.
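The premium arithmetic behind the overlay table can be sketched as follows: convert the H-share price to CNY using the stated exchange-rate assumption, then compare it with the A-share close, i.e. premium = P_H × FX / P_A − 1. The prices and the 0.92 CNY-per-HKD rate are illustrative only, and a negative result is a discount, not an arbitrage signal:

```python
def implied_h_premium(h_price_hkd: float, a_close_cny: float, cny_per_hkd: float) -> float:
    """H/A premium (positive) or discount (negative), comparing both legs in CNY."""
    if a_close_cny <= 0 or cny_per_hkd <= 0:
        raise ValueError("A-share close and FX rate must be positive")
    return h_price_hkd * cny_per_hkd / a_close_cny - 1.0


def trailing_return(closes: list, sessions: int):
    """Return over the last `sessions` closes, or None (report as data_gap) if history is short."""
    if len(closes) <= sessions:
        return None
    return closes[-1] / closes[-1 - sessions] - 1.0


# Illustrative numbers only: HK$50.00 max offer, A-share close CNY 60.00, 0.92 CNY per HKD.
premium = implied_h_premium(50.0, 60.0, 0.92)
print(f"H/A premium: {premium:+.1%}")  # H/A premium: -23.3%
```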

Single-Ticker Markdown Report

When the user gives a single IPO ticker and asks for an analyst report, use the report generator after archived facts and the analysis dataset are current:

.venv/bin/python scripts/build_analysis_dataset.py --as-of YYYY-MM-DDTHH:MM:SSZ
.venv/bin/python scripts/generate_ipo_report.py 06658 --stage auto

The generator writes reports/{date}_{ticker}_{stage}_analysis.md by default. Use --stdout for a dry run, --stage T0_prospectus to force a prospectus-stage report, or --stage T1_allotment only when structured T1 demand exists.

If the ticker is absent from data/snapshots/analysis_model_v0_dataset.csv, use hk-ipo-archivist first to archive the IPO facts and rebuild the analysis dataset before generating the report.

Generated prediction reports must remain stage-safe:

  • Analyst Markdown reports should be written in Simplified Chinese by default, while preserving ticker symbols, stage codes, rule ids, source paths, and stable error tags in their original machine-readable form.
  • T0 reports use only prospectus-stage fields and T0 calibration.
  • T0.5 reports may add archived ipo_market_heat rows, but must label them as non-official, live market-heat snapshots and include observed_at, provider, and source path.
  • T1 reports may add allotment demand fields and T1 calibration.
  • T2/D1 is the intended sell window; D5/D20/D60 returns are never shown as prediction inputs and are reserved for later review cards.
  • Every report must include a concrete stage calendar for the ticker: T0 subscription window, T1 allotment-result date, T2 grey-market date/window, and D1 listing date.

T0.5 Market Heat Overlay

Use rules/ipo_score_v0_5_market_heat_trial.yaml when the user asks to include subscription-period heat before official allotment results.

T0.5 discipline:

  • Use hk-ipo-archivist first to archive the raw source page and structured ipo_market_heat rows.
  • Keep T0.5 separate from official T1 demand. Do not copy T0.5 margin multiples into ipo_demand.public_oversubscription_times.
  • Keep third-party final history, such as external_ipo_history.public_oversubscription_times, separate from T0.5. It is useful for post-hoc calibration but is not available at the original T0.5 decision time.
  • Treat raw margin multiples as less reliable when IPOs are at different points in their subscription windows.
  • Freeze the observed_at timestamp in the report so later T1/D1 reviews can test whether the heat signal helped.
  • Write T0.5 conclusions as watchlist upgrades/downgrades, not as final high-conviction subscription calls.
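To make "different points in their subscription windows" concrete, each observation can be tagged with the elapsed fraction of its window. This sketch assumes UTC timestamps for window open and close. The 0.2 comparability tolerance is a hypothetical threshold, not a project rule:

```python
from datetime import datetime, timezone


def window_progress(observed_at: datetime, opens_at: datetime, closes_at: datetime) -> float:
    """Fraction of the subscription window elapsed at observed_at, clamped to [0, 1]."""
    total = (closes_at - opens_at).total_seconds()
    if total <= 0:
        raise ValueError("subscription window must have positive length")
    elapsed = (observed_at - opens_at).total_seconds()
    return min(max(elapsed / total, 0.0), 1.0)


def comparable(progress_a: float, progress_b: float, tolerance: float = 0.2) -> bool:
    """Hypothetical rule: compare raw margin multiples only at similar window progress."""
    return abs(progress_a - progress_b) <= tolerance


utc = timezone.utc
seen = datetime(2026, 6, 22, tzinfo=utc)
a = window_progress(seen, datetime(2026, 6, 20, tzinfo=utc), datetime(2026, 6, 24, tzinfo=utc))
b = window_progress(seen, datetime(2026, 6, 21, tzinfo=utc), datetime(2026, 6, 25, tzinfo=utc))
print(a, b, comparable(a, b))  # 0.5 0.25 False
```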

T0.95 Late-Order Heat

Use rules/ipo_score_v0_95_final_heat_trial.yaml when the user can still place an IPO order near the subscription cutoff and asks to use near-final market heat.

T0.95 discipline:

  • Treat T0_95_final_heat as its own decision stage: later than ordinary T0.5, earlier than official T1 allotment results.
  • The key condition is executability. A T0.95 snapshot is valid only if observed_at is before the user's actual order, amend, or cancel cutoff.
  • Use archived ipo_market_heat rows with stage = 'T0_95_final_heat' when available. If only T0_5_market_heat rows exist, explicitly state that the report is using an earlier heat snapshot, not a true T0.95 snapshot.
  • Historical final public oversubscription from external_ipo_history may be used as a calibration proxy for near-final heat buckets, because T0.95 is close to the final demand state. It must still be labelled as post-hoc calibration, not as data that was visible in the live case.
  • Official allotment-result fields, final T1 public oversubscription, grey-market prices, and D1 returns remain forbidden as live T0.95 inputs unless they were actually available before the user's executable order cutoff, which should be rare and must be documented.
  • T0.95 recommendations may be stronger than T0.5 watchlist language, but they must include expected allocation probability: very strong heat often improves D1 payoff odds while sharply reducing one-lot win rate.
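The executability condition and the T0.5 fallback label can be sketched as a selector over rows shaped like ipo_market_heat. Field names other than `stage` and `observed_at`, and the label strings, are hypothetical:

```python
from datetime import datetime, timezone


def select_late_heat(rows: list, order_cutoff: datetime):
    """Return (row, label) for the latest heat snapshot observed before the cutoff."""
    executable = [r for r in rows if r["observed_at"] < order_cutoff]
    for stage, label in (
        ("T0_95_final_heat", "T0_95_final_heat"),
        ("T0_5_market_heat", "earlier_T0_5_snapshot_not_T0_95"),
    ):
        matches = [r for r in executable if r["stage"] == stage]
        if matches:
            return max(matches, key=lambda r: r["observed_at"]), label
    return None, "data_gap"


utc = timezone.utc
rows = [
    {"stage": "T0_5_market_heat", "observed_at": datetime(2026, 6, 23, 2, tzinfo=utc), "margin_multiple": 800.0},
    {"stage": "T0_95_final_heat", "observed_at": datetime(2026, 6, 24, 3, tzinfo=utc), "margin_multiple": 1500.0},
]
_, early_label = select_late_heat(rows, datetime(2026, 6, 24, 2, tzinfo=utc))
late_row, late_label = select_late_heat(rows, datetime(2026, 6, 24, 4, tzinfo=utc))
print(early_label, late_label, late_row["margin_multiple"])
```

With the earlier cutoff, the T0.95 row is not executable, so the report must say it is relying on an earlier T0.5 snapshot.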

T2 Grey-Market Review Overlay

When producing T2 reports, post-listing reviews, recent listed-IPO review tables, error attribution, or rule-update recommendations, include T2 grey-market price evidence when a reliable archived source exists. The intended exit window is T2/D1, so reviews should compare the frozen recommendation against both the executable grey-market outcome and the D1 outcome.

Required T2 fields when available:

  • grey_market_date and observed time/window.
  • provider or venue, such as PhillipMart, Futu, Chief, KGI, Webull, broker account evidence, or a named aggregator.
  • last or close grey-market price, and open/high/low/turnover/volume when available.
  • grey-market return versus final offer price, plus one-lot mark-to-market profit/loss.
  • source tier, source path, URL, archived timestamp, and whether the quote was executable for the user.
  • comparison to D1 open/close return, with a short read on whether T2 gave a better exit than waiting for D1.
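The return and one-lot fields reduce to simple arithmetic. This sketch ignores fees and levies. `d1_price` stands for whichever D1 open or close price the review compares against, and all numbers are illustrative:

```python
def grey_market_metrics(offer_price: float, grey_close: float, lot_size: int, d1_price=None) -> dict:
    """Grey-market return, one-lot mark-to-market P/L (before fees), and a T2-vs-D1 read."""
    metrics = {
        "grey_market_return": grey_close / offer_price - 1.0,
        "one_lot_pnl": round((grey_close - offer_price) * lot_size, 2),
        "better_exit": None,  # stays None until D1 is archived; never inferred
    }
    if d1_price is not None:
        if grey_close > d1_price:
            metrics["better_exit"] = "T2"
        elif d1_price > grey_close:
            metrics["better_exit"] = "D1"
        else:
            metrics["better_exit"] = "tie"
    return metrics


# Illustrative numbers: HK$10.00 offer, HK$11.20 grey close, 500-share lot, HK$10.80 D1 close.
print(grey_market_metrics(10.0, 11.2, 500, d1_price=10.8))
```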

Use this source credibility hierarchy:

  1. Tier 1 executable broker evidence: user-provided order/trade confirmation, account statement, or broker export from a grey-market platform, and direct broker grey-market pages or APIs that show timestamped executable quotes from the broker's own system. These are best for T2 execution analysis when they can be archived with permission and sanitized where needed.
  2. Tier 2 public broker quote pages: broker-operated public quote pages with stock code, trade date, timestamp, last/close price, high/low, turnover or volume. Use when archiving is allowed and the source can be reproduced. Label the provider clearly; liquidity may be venue-specific.
  3. Tier 3 attributed aggregators: pages such as financial portals or IPO-history datasets that state the underlying broker/provider or publish a historical grey_market_return_pct. Use for calibration and cross-checks when Tier 1/2 evidence is missing, but label them as non-executable third-party summaries.
  4. Tier 4 news, forum posts, screenshots, and social media: use only as qualitative color or conflict checks. Do not treat these as authoritative price evidence unless no other source exists and the report marks the field as weak evidence.

Do not scrape or redistribute proprietary grey-market feeds without an approved source strategy. If the project only has external_ipo_history.grey_market_return_pct, state that it is an external historical summary rather than a broker-executable price. If no reliable T2 source exists, mark T2_grey_market as data_gap and do not infer it from D1.
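The hierarchy can be applied as a lowest-tier-wins selection that returns data_gap instead of borrowing D1 data. A minimal sketch; the tier label strings and provider names are hypothetical:

```python
TIER_LABELS = {
    1: "tier1_executable_broker",
    2: "tier2_public_broker_quote",
    3: "tier3_attributed_aggregator",
    4: "tier4_weak_evidence",
}


def best_grey_market_source(quotes: list) -> dict:
    """Choose the highest-credibility archived quote; never fall back to D1 data."""
    if not quotes:
        return {"status": "data_gap"}
    best = min(quotes, key=lambda q: q["tier"])  # lower tier number = more credible
    return {
        "status": "weak_evidence" if best["tier"] == 4 else "ok",
        "provider": best["provider"],
        "tier_label": TIER_LABELS[best["tier"]],
        "executable_evidence": best["tier"] == 1,
    }


quotes = [
    {"tier": 3, "provider": "aggregator_example"},
    {"tier": 2, "provider": "broker_quote_page_example"},
]
print(best_grey_market_source(quotes))
```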

Recent Listing Review Overlay

When producing a broad candidate report, latest IPO list refresh, or cross-IPO subscription ranking, include a recent listed-IPO review unless the user explicitly asks for a narrow single-name answer.

Review discipline:

  • Use the last 30 calendar days ending at data_as_of by default, or the user-specified window when provided.
  • Define the sample explicitly by listing-date range and listing method.
  • Build a compact table with one row per recent IPO covering structure, fundamentals, T1 allotment demand, T2 grey-market performance when reliable, D1 performance, and the PM lesson.
  • Structure should include at least T0 score and the offer-size/minimum-subscription context when available.
  • Fundamentals should be a short issuer-quality read from archived prospectus facts, not a long-term thesis.
  • T1 performance should include public oversubscription, international placing demand, total score or decision band, and one-lot/application success when available.
  • T2 performance should include grey-market provider, source tier, grey-market return, and whether T2 would have improved or worsened the intended exit versus D1. If T2 data is missing or only available from weak sources, label it as data_gap or weak evidence and do not infer it from D1.
  • D1 performance should include D1 return and turnover when available. If D1 data is missing, label it as a data_gap and do not infer break/non-break from blank fields.
  • Keep the review post-hoc: use it to calibrate live rules and base rates, but do not let D1/D5/D20/D60 facts leak into current unlisted candidate scores.
  • Add a short mapping from the recent outcomes back to the current candidate batch, clearly distinguishing official T1 demand from non-official T0.5/T0.95 heat.
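A sketch of the default sample definition follows. It assumes "last 30 calendar days ending at data_as_of" includes data_as_of itself. The `listing_method` values and ticker placeholders are hypothetical:

```python
from datetime import date, timedelta


def recent_listed_sample(rows: list, data_as_of: date, days: int = 30, methods=None):
    """Rows listed in the `days` calendar days ending at data_as_of (inclusive)."""
    start = data_as_of - timedelta(days=days - 1)
    sample = [
        r for r in rows
        if start <= r["listing_date"] <= data_as_of
        and (methods is None or r["listing_method"] in methods)
    ]
    definition = f"listing_date {start.isoformat()}..{data_as_of.isoformat()}"
    if methods is not None:
        definition += f", listing_method in {sorted(methods)}"
    return sample, definition


rows = [
    {"ticker": "A", "listing_date": date(2026, 5, 25), "listing_method": "global_offering"},
    {"ticker": "B", "listing_date": date(2026, 5, 26), "listing_method": "global_offering"},
    {"ticker": "C", "listing_date": date(2026, 6, 24), "listing_method": "introduction"},
    {"ticker": "D", "listing_date": date(2026, 6, 25), "listing_method": "global_offering"},
]
sample, definition = recent_listed_sample(rows, date(2026, 6, 24))
print([r["ticker"] for r in sample], definition)
```

Returning the definition string alongside the rows makes it easy to print the sample boundaries in the report, as the discipline above requires.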

Output Standards

Every prediction card should include:

  • ticker
  • stage
  • data_as_of
  • concrete T0/T0.95/T1/T2/D1 dates or windows for the ticker when applicable
  • rule_version
  • decision
  • total_score
  • score breakdown
  • probability forecast
  • expected return framing
  • key bull points
  • key risks
  • triggers for upgrade/downgrade
  • exit plan
  • explicit T2/D1 sell discipline
  • source paths

Language standard:

  • Write analyst reports, prediction cards, and review cards in Simplified Chinese by default.
  • Keep field identifiers, model versions, score buckets, ticker symbols, and source paths as code-formatted English identifiers when they are part of the project data contract.

Every review card should include:

  • linked prediction card
  • actual IPO outcome
  • T2 grey-market price/return and source-quality tier when reliable, or explicit data_gap
  • direction correctness
  • magnitude error
  • reason correctness
  • execution assessment
  • error tags
  • rule-change recommendation, if any
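Both card templates can be checked for completeness before committing. This sketch uses hypothetical snake_case keys mapped from the bullet lists above. The error-tag set mirrors the stable tags listed under Responsibilities; because that list says "such as", unfamiliar tags are only flagged, not rejected:

```python
PREDICTION_CARD_FIELDS = (
    "ticker", "stage", "data_as_of", "stage_calendar", "rule_version", "decision",
    "total_score", "score_breakdown", "probability_forecast", "expected_return",
    "bull_points", "key_risks", "triggers", "exit_plan", "sell_discipline", "source_paths",
)
REVIEW_CARD_FIELDS = (
    "linked_prediction_card", "actual_outcome", "t2_grey_market", "direction_correct",
    "magnitude_error", "reason_correct", "execution_assessment", "error_tags",
)  # rule_change_recommendation is optional ("if any")
KNOWN_ERROR_TAGS = {
    "fundamental_miss", "valuation_miss", "heat_miss", "structure_miss",
    "market_window_miss", "execution_miss", "data_gap",
}


def missing_fields(card: dict, required: tuple) -> list:
    """Required keys that are absent or empty; values such as 0 or False count as present."""
    return [f for f in required if card.get(f) is None or card.get(f) in ("", [], {})]


def unfamiliar_error_tags(card: dict) -> list:
    """Tags outside the stable list, to confirm before they enter review history."""
    return sorted(set(card.get("error_tags", [])) - KNOWN_ERROR_TAGS)


review = {
    "linked_prediction_card": "memos/EXAMPLE_T1_allotment_2026-06-24.md",  # hypothetical path
    "actual_outcome": "non_break",
    "t2_grey_market": "data_gap",
    "direction_correct": True,
    "magnitude_error": 0.0,
    "reason_correct": False,
    "execution_assessment": "",
    "error_tags": ["heat_miss", "luck"],
}
print(missing_fields(review, REVIEW_CARD_FIELDS), unfamiliar_error_tags(review))
```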

Quality Checks

Before finishing, confirm:

  • The analysis stage matches the information set used.
  • Later facts are not used in earlier-stage conclusions.
  • Paths in durable files are repo-relative.
  • Probabilities and scores are explicit.
  • Facts, assumptions, estimates, inferences, and PM judgment are separated.
  • Any rule update has a named trigger case and an effective date.