
geometrybase c65b20a1c4 Archive recent HKEX IPO targets

Request:
Use the project archivist workflow to update IPO target coverage for the most recent three-year window.

Changes:
- Add scripts/update_recent_ipo_list.py to discover HKEXnews annual new listing reports, archive XLSX sources, parse subscription-relevant IPO rows, and update SQLite plus snapshots.
- Add new_listing_report_entries to preserve annual report row-level evidence.
- Archive 2023-2026 Main Board new listing reports and 2024-2026 GEM new listing reports.
- Seed 290 report-backed IPO targets for 2023-06-15 through 2026-06-15, skipping 10 non-IPO rows without numeric offer prices.
- Refresh ipo_master, missing offering_terms fields, source_refs, ticker_sync_state, and sync_tasks.
- Add openpyxl as the XLSX parser dependency and document the archivist refresh flow.
- Limit sync summary output while keeping the full queue in SQLite and CSV snapshots.

Verification:
- Ran update_recent_ipo_list.py for 2023-06-15 to 2026-06-15 with as-of 2026-06-15T07:30:00Z.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, files exist, and SHA-256 hashes match.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.

Next useful context:
- ipo_master now has 293 tickers; new_listing_report_entries has 290 report-backed targets.
- Current sync queue has 2005 open tasks and 42 waiting_until_due tasks for deeper per-ticker archival stages.

2026-06-15 06:42:31 +00:00

4.4 KiB


HK IPO

HK IPO is a project for building a repeatable, auditable research workflow for Hong Kong new listing subscription decisions.

The project is designed around a feedback loop:

1. Archive IPO facts and source documents.
2. Freeze the analysis that was possible at each decision stage.
3. Compare predictions with post-listing outcomes.
4. Improve the scoring rules only from reviewed evidence.

Goals

Maintain a local, Git-tracked history of Hong Kong IPO data.
Separate factual archiving from investment judgment.
Keep every subscription decision tied to the information available at that time.
Review actual IPO outcomes against prior predictions.
Build a better IPO scoring process through structured error attribution.

Workflow

Each IPO is evaluated by stage:

T0_prospectus: prospectus and offer terms only.
T1_allotment: allotment results, public subscription, placing, allocation, and final pricing.
T2_grey_market: grey-market result and immediate pre-listing context.
D1, D5, D20, D60: post-listing review checkpoints.

The key discipline is to avoid hindsight leakage: a T0 prediction must use only information that was available at T0, even if it is written or revised after the IPO has listed.
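One way to enforce this rule in code is to tag every fact with the stage at which it first became available and filter by stage order before building a prediction card. The sketch below is illustrative only: the stage names come from the list above, while the `Fact` type and `facts_visible_at` helper are hypothetical and not part of the repository.

```python
from dataclasses import dataclass

# Stage order as listed in the Workflow section.
STAGES = ["T0_prospectus", "T1_allotment", "T2_grey_market", "D1", "D5", "D20", "D60"]
STAGE_RANK = {name: rank for rank, name in enumerate(STAGES)}


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact record; the real schema may differ."""
    ticker: str
    stage: str   # stage at which the fact first became available
    name: str
    value: str


def facts_visible_at(facts, stage):
    """Return only the facts available at or before the given stage."""
    cutoff = STAGE_RANK[stage]  # raises KeyError for an unknown stage
    return [fact for fact in facts if STAGE_RANK[fact.stage] <= cutoff]
```

A T0 card would then be built from `facts_visible_at(all_facts, "T0_prospectus")`, so allotment or post-listing facts cannot reach it regardless of when the card is generated.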

Project Skills

This repository includes project-local Codex skills under .codex/skills/.

`archivist`

Owns facts and source control:

archive prospectuses, allotment results, listing facts, and market data;
record source URLs, as-of timestamps, repo-relative paths, and file hashes;
update the embedded SQLite database;
export Git-friendly CSV snapshots.

It does not make investment recommendations.
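As a rough illustration of the provenance fields listed above, a source reference could be assembled as follows. This is a minimal sketch: the `build_source_ref` helper and the dictionary keys are assumptions, not the actual `source_refs` schema.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_source_ref(repo_root, rel_path, source_url, as_of):
    """Build a provenance record; key names are hypothetical."""
    full_path = Path(repo_root) / rel_path
    return {
        "source_url": source_url,
        "as_of": as_of,              # timestamp at which the source was observed
        "path": rel_path,            # stored repo-relative, never absolute
        "sha256": sha256_file(full_path),
        "size_bytes": full_path.stat().st_size,
    }
```

Storing the hash next to the repo-relative path lets a later check confirm that the archived file still exists and has not changed.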

`analyst`

Owns IPO judgment and review:

produce T0/T1/T2 prediction cards;
score IPO candidates;
compare multiple IPOs;
write research memos and review cards;
classify forecast errors;
recommend scoring-rule updates.

It should use archived facts when available and keep prediction cards append-only.

Storage Model

The project is intended to be self-contained and portable across machines. Durable paths should always be relative to the repository root.

Expected layout:

data/
  hk_ipo.sqlite
  raw/
  snapshots/
memos/
reports/
rules/
schema/
scripts/
references/

Path rules:

store paths like data/raw/06658/prospectus.pdf;
do not store absolute paths;
do not store paths with a leading ./;
use POSIX / separators;
store file hashes for archived source documents when practical.
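The path rules above can be checked mechanically before a path is written to the database. The following is a sketch of such a check; `is_repo_relative` is a hypothetical helper, and rejecting `..` components is an extra assumption that the rules do not state explicitly.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_repo_relative(path):
    """Return True if the path follows the repository path rules."""
    if not path:
        return False
    if "\\" in path:                     # POSIX "/" separators only
        return False
    if path.startswith(("/", "./")):     # no absolute paths, no leading "./"
        return False
    if PureWindowsPath(path).drive:      # no Windows drive prefixes such as "C:"
        return False
    # Assumption: paths must not climb out of the repository root.
    return ".." not in PurePosixPath(path).parts
```

For example, `is_repo_relative("data/raw/06658/prospectus.pdf")` is true, while `./data/raw/x.pdf` and `/home/user/x.pdf` are both rejected.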

SQLite is the embedded source of structured facts. CSV snapshots provide readable Git diffs. Markdown memos preserve the reasoning at each decision point.
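To show why CSV snapshots diff cleanly, here is a minimal sketch of a table export with a deterministic row order. The `export_table` helper is hypothetical and is not the repository's exporter; the table and column names a caller passes in are likewise assumptions.

```python
import csv
import sqlite3


def export_table(conn, table, order_by, out_path):
    """Write one table to CSV in a stable order and return the row count."""
    # Identifiers cannot be bound as SQL parameters, so only pass trusted names.
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{order_by}"')
    headers = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        # Fixed "\n" line endings keep snapshots identical across platforms.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)
```

Sorting by a stable key means that adding one ticker produces a one-line Git diff instead of a reshuffled file, and the returned count can be compared with the table's row count in SQLite.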

PDF Text Extraction

Archived PDFs can be converted into searchable text files:

python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python scripts/extract_pdf_text.py

The extractor reads PDF paths from data/hk_ipo.sqlite, writes derived text files under data/extracted_text/, and exports data/snapshots/extracted_text_manifest.csv with page counts, text hashes, and extraction status.

Recent IPO Target Refresh

Use HKEXnews annual new listing reports to seed recent subscription-relevant IPO targets:

.venv/bin/python scripts/update_recent_ipo_list.py --start-date 2023-06-15 --end-date 2026-06-15 --as-of 2026-06-15T07:30:00Z

The updater archives the HKEXnews XLSX reports under data/raw/hkex_new_listing_reports/, records report-backed source references, writes new_listing_report_entries, updates ipo_master and missing offering_terms fields, exports CSV snapshots, and refreshes sync_tasks.

Rows without an IPO offer price, such as transfers of listing, introductions, or de-SPAC transactions, are skipped by default because they are not ordinary public subscription targets.
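The skip rule can be pictured as a small filter over parsed report rows. This is a sketch under assumptions: the `offer_price` key, the two helper names, and the requirement that a price be positive are all hypothetical and may not match what `update_recent_ipo_list.py` does.

```python
import math


def parse_offer_price(raw):
    """Return the offer price as a float, or None if it is not a usable number."""
    if raw is None or isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        value = float(raw)
    else:
        try:
            value = float(str(raw).strip().replace(",", ""))
        except ValueError:
            return None
    # Assumption: a real offer price is finite and greater than zero.
    return value if value > 0 and math.isfinite(value) else None


def split_rows(rows):
    """Split report rows into (ipo_targets, skipped) by offer price."""
    targets, skipped = [], []
    for row in rows:
        price = parse_offer_price(row.get("offer_price"))
        (targets if price is not None else skipped).append(row)
    return targets, skipped
```

Keeping the skipped rows, rather than discarding them, makes it possible to report how many rows were excluded and why.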

Incremental Archive Sync

The archivist keeps a per-ticker sync ledger so repeated updates can focus on missing stages:

python3 scripts/update_sync_state.py

This writes ticker_sync_state and sync_tasks into data/hk_ipo.sqlite, then exports data/snapshots/ticker_sync_state.csv, data/snapshots/sync_tasks.csv, and data/snapshots/sync_runs.csv.

Use sync_tasks as the next-sync queue. Tasks marked open are due now; tasks marked waiting_until_due are known future updates.
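The distinction between the two statuses can be sketched as a comparison of a task's due time with the run's as-of time. Only the status names `open` and `waiting_until_due` come from this README; the `due_at` field and the helpers below are hypothetical, and the real `sync_tasks` columns may differ.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def task_status(due_at, as_of):
    """Classify a task as due now or as a known future update."""
    return "open" if parse_utc(due_at) <= parse_utc(as_of) else "waiting_until_due"


def next_sync_queue(tasks, as_of):
    """Return the tasks that are due at as_of, earliest due first."""
    due = [task for task in tasks if task_status(task["due_at"], as_of) == "open"]
    return sorted(due, key=lambda task: parse_utc(task["due_at"]))
```

Passing the as-of time explicitly, instead of reading the clock, keeps a rerun with the same `--as-of` value reproducible.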

Git Discipline

Each completed project change is committed automatically as its own focused commit.

Before committing, check that unrelated dirty files are not included and that generated durable files use repo-relative paths.
