geometrybase 797bbde201 Prefer Chinese company names in IPO reports

Request:
- Update the selected analyst reports so stock/company names include Chinese names and use Chinese names first.

Changes:
- Updated the selected T0 reports for 01392, 06067, 06106, and 06132 to show Chinese company names in the title and summary, with English names in parentheses.
- Added company_name_zh to the analyst dataset so report generation has access to Chinese names.
- Updated the report generator to prefer Chinese company names and fall back to English names only when Chinese names are unavailable.
- Filled Chinese company names for the selected tickers in ipo_master and refreshed snapshots.

Verification:
- Compiled build_analysis_dataset.py and generate_ipo_report.py.
- Ran generator dry-runs for 06132 and 01392 to confirm Chinese-first output.
- Ran SQLite integrity_check and foreign_key_check.
- Ran git diff --check.

Next useful context:
- Future generated analyst reports now use company_name_zh first when available.

HK IPO

HK IPO is a project for building a repeatable, auditable research workflow for Hong Kong new listing subscription decisions.

The project is designed around a feedback loop:

1. Archive IPO facts and source documents.
2. Freeze the analysis that was possible at each decision stage.
3. Compare predictions with post-listing outcomes.
4. Improve the scoring rules only from reviewed evidence.

The investment horizon is deliberately short. The subscription model assumes allocated shares are sold at the T2_grey_market stage when a reliable, executable grey-market signal exists, and on D1 otherwise. It is not a long-term holding model.

Goals

- Maintain a local, Git-tracked history of Hong Kong IPO data.
- Separate factual archiving from investment judgment.
- Keep every subscription decision tied to the information available at that time.
- Review actual IPO outcomes against prior predictions.
- Build a better IPO scoring process through structured error attribution.

Workflow

Each IPO is evaluated by stage:

- T0_prospectus: prospectus and offer terms only.
- T1_allotment: allotment results, public subscription, placing, allocation, and final pricing.
- T2_grey_market: grey-market result and immediate pre-listing context.
- D1, D5, D20, D60: post-listing review checkpoints.

The key discipline is to avoid hindsight leakage. A T0 prediction should only use T0 information, even after the IPO has listed.

The default exit discipline is T2/D1. D5/D20/D60 outcomes are kept to review whether the model missed later information, not to justify holding IPO allocations beyond the intended sell window.
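
The stage discipline above can be sketched in a few lines of Python. This is only an illustration: the stage names come from this README, but `STAGE_ORDER`, `visible_facts`, `choose_exit_stage`, and the `stage` key on each fact are hypothetical names, not part of the project's scripts.

```python
# Stage order as listed in this README. All helper names here are hypothetical.
STAGE_ORDER = [
    "T0_prospectus", "T1_allotment", "T2_grey_market",
    "D1", "D5", "D20", "D60",
]


def visible_facts(facts, as_of_stage):
    """Return only the facts recorded at or before as_of_stage.

    Each fact is assumed to be a dict carrying a 'stage' key.
    """
    cutoff = STAGE_ORDER.index(as_of_stage)
    return [f for f in facts if STAGE_ORDER.index(f["stage"]) <= cutoff]


def choose_exit_stage(grey_market_signal_is_reliable):
    """Sell at T2 when a reliable executable grey-market signal exists, else on D1."""
    return "T2_grey_market" if grey_market_signal_is_reliable else "D1"


facts = [
    {"stage": "T0_prospectus", "field": "offer_price_range"},
    {"stage": "T1_allotment", "field": "public_subscription_multiple"},
    {"stage": "D5", "field": "close_price"},
]
t0_inputs = visible_facts(facts, "T0_prospectus")
```

A T0 prediction built from `t0_inputs` sees only the prospectus-stage fact, even if the later rows already exist in the archive.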

Project Skills

This repository includes project-local Codex skills under .codex/skills/.

`archivist`

Owns facts and source control:

- archive prospectuses, allotment results, listing facts, and market data;
- record source URLs, as-of timestamps, repo-relative paths, and file hashes;
- update the embedded SQLite database;
- export Git-friendly CSV snapshots.

It does not make investment recommendations.

`analyst`

Owns IPO judgment and review:

- produce T0/T1/T2 prediction cards;
- score IPO candidates;
- compare multiple IPOs;
- write research memos and review cards;
- classify forecast errors;
- recommend scoring-rule updates.

It should use archived facts when available and keep prediction cards append-only.

Storage Model

The project is intended to be self-contained and portable across machines. Durable paths should always be relative to the repository root.

Expected layout:

data/
  hk_ipo.sqlite
  raw/
  snapshots/
memos/
reports/
rules/
schema/
scripts/
references/

Path rules:

- store paths like data/raw/06658/prospectus.pdf;
- do not store absolute paths;
- do not store paths with a leading ./;
- use POSIX / separators;
- store file hashes for archived source documents when practical.
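
A minimal check for the path rules might look like the sketch below. The function name `is_durable_repo_path` is hypothetical, and rejecting `..` segments and Windows drive letters is an extra assumption that the rules above do not state explicitly.

```python
from pathlib import PurePosixPath


def is_durable_repo_path(path: str) -> bool:
    """Return True if path follows the repo-relative path rules (hypothetical helper)."""
    if not path or "\\" in path:
        return False  # empty, or uses non-POSIX separators
    if path.startswith("/") or path.startswith("./"):
        return False  # absolute path, or leading ./
    if len(path) > 1 and path[1] == ":":
        return False  # assumption: treat Windows drive letters as absolute
    # Assumption: parent-directory segments could escape the repository root.
    return ".." not in PurePosixPath(path).parts


ok = is_durable_repo_path("data/raw/06658/prospectus.pdf")
```

A check like this could run before any path is written to SQLite or to a CSV snapshot.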

SQLite is the embedded source of structured facts. CSV snapshots provide readable Git diffs. Markdown memos preserve the reasoning at each decision point.

PDF Text Extraction

Archived PDFs can be converted into searchable text files:

python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python scripts/extract_pdf_text.py

The extractor reads PDF paths from data/hk_ipo.sqlite, writes derived text files under data/extracted_text/, and exports data/snapshots/extracted_text_manifest.csv with page counts, text hashes, and extraction status.

The extractor is incremental. If a PDF hash and manifest row are unchanged, the existing text output is reused. Use --force only when extraction behavior changes and all derived text should be regenerated.
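
The incremental rule can be pictured roughly as follows. This is a sketch under assumptions, not the logic of scripts/extract_pdf_text.py: the `manifest` dict, its `pdf_sha256` key, and the function names are hypothetical, and SHA-256 is assumed as the hash.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_extraction(pdf_path, pdf_bytes, manifest, force=False):
    """Decide whether a PDF must be (re)extracted.

    manifest maps a repo-relative PDF path to its last manifest row
    (hypothetical shape: a dict with a 'pdf_sha256' key).
    """
    if force:
        return True  # corresponds to --force: regenerate everything
    row = manifest.get(pdf_path)
    if row is None:
        return True  # never extracted before
    return row.get("pdf_sha256") != sha256_hex(pdf_bytes)


pdf_bytes = b"%PDF-1.7 example bytes"
manifest = {"data/raw/06658/prospectus.pdf": {"pdf_sha256": sha256_hex(pdf_bytes)}}
```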

Recent IPO Target Refresh

Use HKEXnews annual new listing reports to seed recent subscription-relevant IPO targets:

.venv/bin/python scripts/update_recent_ipo_list.py --start-date 2023-06-15 --end-date 2026-06-15 --as-of 2026-06-15T07:30:00Z

The updater archives the HKEXnews XLSX reports under data/raw/hkex_new_listing_reports/, records report-backed source references, writes new_listing_report_entries, updates ipo_master and missing offering_terms fields, exports CSV snapshots, and refreshes sync_tasks.

Rows without an IPO offer price, such as transfers of listing, introductions, or de-SPAC transactions, are skipped by default because they are not ordinary public subscription targets.
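
As a rough illustration of that default, a row filter could look like this. The `offer_price` key and the function name are hypothetical; the real report columns may be named differently.

```python
def is_subscription_target(row: dict) -> bool:
    """Keep only rows that carry an IPO offer price (hypothetical column name)."""
    price = row.get("offer_price")
    if price is None:
        return False
    return str(price).strip() != ""


rows = [
    {"ticker": "06106", "offer_price": "12.50"},
    {"ticker": "09999", "offer_price": ""},    # e.g. listing by introduction
    {"ticker": "08888", "offer_price": None},  # e.g. transfer of listing
]
targets = [r["ticker"] for r in rows if is_subscription_target(r)]
```

The tickers `09999` and `08888` and the price are made-up placeholders for the example.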

HKEX Document Backfill

Use the HKEX document archiver to fill detailed T0/T1 facts for open sync tasks:

.venv/bin/python scripts/archive_hkex_documents.py --as-of 2026-06-15T08:30:00Z

The archiver maps stock codes to HKEXnews title-search stock IDs, downloads the selected prospectus and allotment-results documents under data/raw/{ticker}/, records source_refs, parses high-confidence T0/T1 fields into ipo_master, offering_terms, and ipo_demand, exports snapshots, refreshes sync_tasks, and extracts text for newly archived PDF sources.

HKEX .htm/.html notices and Yahoo Finance JSON market data stay in data/raw/; they are not copied into data/extracted_text/.

T1 Demand Text Backfill

Use the T1 demand text backfill after HKEX allotment-result sources have been archived and PDF text extraction is available:

.venv/bin/python scripts/backfill_t1_demand_from_text.py --as-of 2026-06-15T14:15:00Z

The backfill is incremental. It fills only T1_allotment rows that have an archived allotment-results source but no ipo_demand row. For old HKEX HTML allotment-result pages, it archives the linked Summary PDF, extracts text, records the new source, and stores only demand fields that are explicitly present.
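
The selection of rows to fill resembles an anti-join: tickers with an archived allotment-results source and no demand row. The sketch below shows the idea against an in-memory SQLite database. The table names source_refs and ipo_demand appear in this README, but the columns `ticker` and `doc_type` and the value `allotment_results` are assumptions about a schema that is not shown here.

```python
import sqlite3


def tickers_missing_demand(conn):
    """Tickers with an allotment-results source but no ipo_demand row."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.ticker
        FROM source_refs AS s
        LEFT JOIN ipo_demand AS d ON d.ticker = s.ticker
        WHERE s.doc_type = 'allotment_results'
          AND d.ticker IS NULL
        ORDER BY s.ticker
        """
    ).fetchall()
    return [r[0] for r in rows]


# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_refs (ticker TEXT, doc_type TEXT);
    CREATE TABLE ipo_demand (ticker TEXT);
    INSERT INTO source_refs VALUES
        ('06106', 'allotment_results'),
        ('06132', 'allotment_results'),
        ('01392', 'prospectus');
    INSERT INTO ipo_demand VALUES ('06132');
    """
)
pending = tickers_missing_demand(conn)
```

Because rows that already have demand facts never match, rerunning the query after a successful backfill returns fewer tickers, which is what makes the step safe to repeat.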

Price Performance Backfill

Use the price-performance archiver to fill due D1/D5/D20/D60 review checkpoints:

.venv/bin/python scripts/archive_price_performance.py --as-of 2026-06-15T10:00:00Z

The archiver stores raw Yahoo Finance chart responses under data/raw/{ticker}/, records source references and hashes, writes structured rows into price_performance, exports snapshots, and refreshes sync_tasks.

Analysis Model

Use the analyst model builder to digest archived data into a stage-safe scoring dataset and calibration report:

.venv/bin/python scripts/build_analysis_dataset.py --as-of 2026-06-15T13:00:00Z

The v0 model is documented in rules/ipo_score_v0.yaml. It writes data/snapshots/analysis_model_v0_dataset.csv and reports/2026-06-15_analysis_model_v0.md.

The model separates T0 prospectus inputs from T1 allotment inputs. Its live trading target is a T2 or D1 sale. D5/D20/D60 returns are labels for calibration and review, not prediction inputs or planned holding targets.

Single IPO Markdown Report

Use the analyst report generator after the archive and model dataset are current:

.venv/bin/python scripts/build_analysis_dataset.py --as-of 2026-06-15T15:00:00Z
.venv/bin/python scripts/generate_ipo_report.py 06106 --stage auto

The generator writes reports/{date}_{ticker}_{stage}_analysis.md by default. It auto-selects T1_allotment when structured allotment-demand facts exist; otherwise it generates a T0_prospectus report. Use --stdout for a dry run or --output to choose a specific Markdown path.
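
The stage selection and default output path described above can be sketched as follows. Both function names are hypothetical, and the ISO `YYYY-MM-DD` date format is an assumption inferred from the existing report name reports/2026-06-15_analysis_model_v0.md.

```python
from datetime import date


def select_stage(has_allotment_demand: bool) -> str:
    """Mirror --stage auto: T1 when structured allotment-demand facts exist, else T0."""
    return "T1_allotment" if has_allotment_demand else "T0_prospectus"


def default_report_path(report_date: date, ticker: str, stage: str) -> str:
    """Build reports/{date}_{ticker}_{stage}_analysis.md as a repo-relative path."""
    return f"reports/{report_date.isoformat()}_{ticker}_{stage}_analysis.md"


stage = select_stage(has_allotment_demand=False)
path = default_report_path(date(2026, 6, 15), "06106", stage)
```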

Single-IPO analyst reports are written in Simplified Chinese by default, while ticker symbols, stage codes, rule ids, and source paths remain machine-readable. Prediction reports are stage-safe: T0 reports use only prospectus-stage facts and T0 calibration, while T1 reports add allotment demand and T1 calibration. Each report includes a concrete stage calendar for that ticker: T0 subscription window, T1 allotment-result date, T2 grey-market date/window, and D1 listing date. Reports should frame the trade as a T2/D1 exit. Post-listing D5/D20/D60 performance stays out of prediction reports and is reserved for review cards.

Incremental Archive Sync

The archivist keeps a per-ticker sync ledger so repeated updates can focus on missing stages:

python3 scripts/update_sync_state.py

This writes ticker_sync_state and sync_tasks into data/hk_ipo.sqlite, then exports data/snapshots/ticker_sync_state.csv, data/snapshots/sync_tasks.csv, and data/snapshots/sync_runs.csv.

Use sync_tasks as the next-sync queue. Tasks marked open are due now; tasks marked waiting_until_due are known future updates.
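
Reading the queue could look like the sketch below, which splits exported task rows by status. The status values `open` and `waiting_until_due` come from this README; the `status` and `task` keys, the task labels, and the function name are assumptions about the snapshot's columns.

```python
def split_sync_queue(tasks):
    """Split task rows into those due now and those scheduled for later."""
    due_now = [t for t in tasks if t["status"] == "open"]
    later = [t for t in tasks if t["status"] == "waiting_until_due"]
    return due_now, later


# Hypothetical rows, e.g. as loaded from data/snapshots/sync_tasks.csv.
tasks = [
    {"ticker": "06106", "task": "T1_allotment", "status": "open"},
    {"ticker": "06106", "task": "D20", "status": "waiting_until_due"},
    {"ticker": "01392", "task": "D1", "status": "open"},
]
due_now, later = split_sync_queue(tasks)
```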

Git Discipline

The repository uses automatic focused commits for completed project changes.

Before committing, check that unrelated dirty files are not included and that generated durable files use repo-relative paths.