Add A/H share-class mapping workflow

Request:
- Add a repeatable mechanism so HK IPO reports detect issuers that already have Mainland A shares.
- Include a third internet/official-exchange cross-check layer beyond structured history and prospectus scans.

Changes:
- Add listed_share_classes schema support for same-issuer A-share mappings and evidence links.
- Add scripts/archive_a_share_mappings.py to scan prospectus extracted text, reject sponsor/portfolio/cornerstone false positives, archive optional official web evidence and A-share/FX quote evidence, and export snapshots on write.
- Surface a_share_* fields in the analysis dataset and single-ticker report output.
- Update hk-ipo analyst/archivist skill rules and scheduled refresh prompt to require the three-layer A/H mapping check.

Verification:
- python3 -m py_compile scripts/archive_a_share_mappings.py scripts/build_analysis_dataset.py scripts/generate_ipo_report.py
- .venv/bin/python scripts/archive_a_share_mappings.py --as-of 2026-06-24T00:00:00Z --tickers 00668,01688,03661,09630 --dry-run
- .venv/bin/python scripts/build_analysis_dataset.py --db /tmp/hk_ipo_ah_dataset_test.sqlite --dataset /tmp/hk_ipo_ah_dataset_test.csv --report /tmp/hk_ipo_ah_model_test.md --as-of 2026-06-24T00:00:00Z
- .venv/bin/python scripts/generate_ipo_report.py 09630 --dataset /tmp/hk_ipo_ah_dataset_test.csv --stdout --as-of 2026-06-24T00:00:00Z
- git diff --check

Next useful context:
- Dry-run detected 00668->300866.SZ, 01688->002600.SZ, 03661->300661.SZ, and 09630->688630.SH.
- A false positive 01688->300476.SZ from a cornerstone investor parent was rejected by the issuer-context filter.
@@ -109,6 +109,12 @@ For all analyst-generated Markdown reports, prediction cards, review cards, and
For scheduled runs, latest IPO list refreshes, and broad candidate reports, first use `hk-ipo-archivist` to refresh the latest internet-sourced IPO universe and archive updates before making analyst judgments. This includes the HKEX current new-listing page, newly available prospectuses, allotment-result announcements, listing-calendar changes, recent price-performance rows for review, and subscription-period market heat such as broker-aggregated margin subscription multiples.

Before rebuilding the analysis dataset or latest report, refresh A/H or other onshore share-class mappings with a three-layer check:

1. Use the structured `listed_share_classes` table and `data/snapshots/listed_share_classes.csv` from prior archive runs.
2. Scan archived prospectus extracted text for issuer-context A-share evidence with `scripts/archive_a_share_mappings.py`; require same-issuer wording such as existing A Shares, SSE/SZSE, ChiNext, STAR Market, or an issuer stock code, and reject sponsor, portfolio-company, or unrelated shareholder contexts.
3. Use internet search and supported official exchange pages as a public cross-check for current or recently closed candidates. Archive reproducible official exchange evidence when supported; otherwise state that web cross-check evidence is a `data_gap` and rely on prospectus evidence as the primary source.
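
The issuer-context rule in step 2 can be sketched roughly as follows. This is a minimal illustration, not the logic of `scripts/archive_a_share_mappings.py`, which is not shown here: the cue lists, the function name `classify_snippet`, and the return labels are all hypothetical.

```python
import re

# Hypothetical cue lists for illustration only; the real script's rules may differ.
ISSUER_CUES = ("our a shares", "a shares of our company", "a shares of the company")
REJECT_CUES = ("sponsor", "cornerstone investor", "portfolio company")

# Six-digit Mainland stock code with an exchange suffix, e.g. 300661.SZ or 688630.SH.
A_SHARE_CODE = re.compile(r"\b\d{6}\.(?:SZ|SH)\b")


def classify_snippet(snippet: str) -> str:
    """Classify one prospectus text window that may mention an A-share code."""
    if not A_SHARE_CODE.search(snippet):
        return "no_code"
    text = snippet.lower()
    # Rejection cues win over issuer cues, so a mixed context is never
    # promoted to a same-issuer mapping.
    if any(cue in text for cue in REJECT_CUES):
        return "rejected"
    if any(cue in text for cue in ISSUER_CUES):
        return "same_issuer"
    # A code without clear issuer wording is surfaced as a gap, not dropped.
    return "data_gap"
```

Letting rejection cues take precedence trades some recall for fewer false positives, which matches the instruction to treat uncertain mappings as a `data_gap` rather than as same-issuer evidence.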

Refresh the report content as a complete current snapshot, not as a partial patch. The dated report should be written to `reports/{date}_latest_ipo_candidates_analysis.md` and should update all relevant sections: actionable ranking, fundamentals, break-probability/risk-reward, capital efficiency, per-IPO notes, closed/waiting names, recent 30-day listed-IPO review with T2 grey-market context when available, data gaps, and sources.

The break-probability, risk/reward, and capital-efficiency section must cover all current or recently closed IPOs that do not yet have confirmed D1 break/non-break information, including names whose subscription window has already closed and are waiting for T1/T2/D1. Do not drop a ticker from this section merely because the user can no longer subscribe; keep its probability/risk view visible until D1 outcome is archived or explicitly confirmed. Once D1 is confirmed, move it out of the probability table and into the recent listed-IPO review.
@@ -130,6 +136,15 @@ When an issuer already has Mainland A Shares or another onshore listed share cla
Detection cues include prospectus language such as existing `A Shares`, `Shenzhen Stock Exchange`, `Shanghai Stock Exchange`, `ChiNext`, `STAR Market`, an A-share stock code such as `300661.SZ` or `002600.SZ`, or HKEX waiver/pricing text that references the A-share closing price.
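
As a rough sketch of how these cues could be located in extracted prospectus text, the snippet below matches the listed phrases and the `NNNNNN.SZ` / `NNNNNN.SH` code shape. The helper name `find_cues` and its output layout are assumptions for illustration; a hit is only a candidate and still needs the issuer-context check.

```python
import re

# Phrases taken from the detection cues above; matched case-sensitively so that
# ordinary text such as "extra shares" does not count as "A Shares".
CUE_PHRASES = (
    "A Shares",
    "Shenzhen Stock Exchange",
    "Shanghai Stock Exchange",
    "ChiNext",
    "STAR Market",
)

# Six digits plus an exchange suffix, e.g. 300661.SZ or 002600.SZ.
CODE_PATTERN = re.compile(r"\b\d{6}\.(?:SZ|SH)\b")


def find_cues(text: str) -> dict:
    """Return the cue phrases and A-share style codes present in the text."""
    phrases = [phrase for phrase in CUE_PHRASES if phrase in text]
    codes = sorted(set(CODE_PATTERN.findall(text)))
    return {"phrases": phrases, "codes": codes}
```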

Do not rely on manual memory or a short hard-coded whitelist. Use archived structured mappings first, refresh them from prospectus text, and add internet/official-exchange search cross-checks for live candidates:

```bash
.venv/bin/python scripts/archive_a_share_mappings.py --as-of YYYY-MM-DDTHH:MM:SSZ --web-cross-check --archive-quotes
.venv/bin/python scripts/build_analysis_dataset.py --as-of YYYY-MM-DDTHH:MM:SSZ
```

If the dataset has `a_share_ticker`, the report must include the A/H overlay. If the prospectus text suggests an A-share relation but the mapping is uncertain, include it as an explicit `data_gap` or low-confidence mapping rather than omitting it.

The report should include a compact A/H note or table covering:

- A-share ticker, exchange, and whether it is the same issuer, parent, subsidiary, or only a comparable/affiliate.
@@ -157,6 +157,24 @@ The extractor is incremental: unchanged PDFs with matching manifest rows are ski
Do not expect `data/extracted_text/` entries for Yahoo JSON market data or HKEX `.htm`/`.html` notices. Those are already text-like raw evidence files and are tracked under `data/raw/`.

## A/H Share-Class Mapping Archive

When current or recently closed IPO candidates may already have Mainland A Shares or another onshore listed share class, refresh the structured mapping archive before analyst reports consume the dataset:

```bash
.venv/bin/python scripts/archive_a_share_mappings.py --as-of YYYY-MM-DDTHH:MM:SSZ --web-cross-check --archive-quotes
```

The archive uses three evidence layers:

1. Structured prior mappings in `listed_share_classes` and `data/snapshots/listed_share_classes.csv`.
2. Issuer-context prospectus text scans from `data/snapshots/extracted_text_manifest.csv`.
3. Internet search or supported official exchange pages as public cross-check evidence.

Prospectus text remains the primary source for same-issuer identity. Public web evidence supports the mapping when it is reproducible and can be archived as a `source_refs` row. If a web cross-check is not supported for a market or exchange page, leave `web_source_id` blank and make the analyst report state the web-evidence gap rather than inventing a source.
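
A minimal sketch of how a blank `web_source_id` could surface as an explicit gap is shown below. Only `web_source_id` and `a_share_ticker` are names from this repository; the `ShareClassMapping` class, its other fields, and `data_gaps` are hypothetical and do not describe the actual `listed_share_classes` schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShareClassMapping:
    # Field names other than a_share_ticker and web_source_id are illustrative.
    hk_ticker: str
    a_share_ticker: str
    prospectus_source_id: str
    web_source_id: Optional[str] = None  # left blank when no web evidence is archived


def data_gaps(mapping: ShareClassMapping) -> List[str]:
    """List the evidence gaps an analyst report should state for this mapping."""
    gaps = []
    if not mapping.web_source_id:
        gaps.append(
            "data_gap: no archived web cross-check; prospectus is the primary source"
        )
    return gaps
```

Keeping the gap as data rather than filling in a placeholder source means the report can print it verbatim in its data-gaps section.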

For detected same-issuer A-share mappings, archive recent A-share quote data and HKD/CNY FX evidence when `--archive-quotes` is used so the analyst can compute A/H discount or premium without relying on stale manual prices. Do not use sponsor, portfolio-company, shareholder, or comparable-company stock codes as same-issuer mappings.
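
The discount or premium itself is simple arithmetic once both prices are in one currency. The sketch below assumes the FX rate is quoted as CNY per 1 HKD and that one H share corresponds to one A share; neither convention is stated in this document, and the function name `h_premium_over_a` is hypothetical.

```python
def h_premium_over_a(h_price_hkd: float, a_price_cny: float, cny_per_hkd: float) -> float:
    """Return the H-share premium over the A share as a fraction.

    A negative result is an H-share discount. Assumes a one-for-one share
    ratio and an FX rate expressed as CNY per 1 HKD.
    """
    if h_price_hkd <= 0 or a_price_cny <= 0 or cny_per_hkd <= 0:
        raise ValueError("prices and FX rate must be positive")
    # Convert the H-share price into CNY so both legs share a currency.
    h_price_cny = h_price_hkd * cny_per_hkd
    return h_price_cny / a_price_cny - 1.0


# Illustrative numbers only: HKD 8.00 at 0.90 CNY per HKD is CNY 7.20,
# which is a 28% discount to a CNY 10.00 A share.
example = h_premium_over_a(h_price_hkd=8.0, a_price_cny=10.0, cny_per_hkd=0.9)
print(f"H premium over A: {example:.2%}")
```

If the archived FX evidence is quoted the other way round (HKD per 1 CNY), invert the rate before calling the function.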

## T1 Demand Text Backfill

When audit finds T1 rows where an allotment-results source is archived but `ipo_demand` is missing, use the text backfill script: