Request:
- Adjust archivist after the audit findings and update historical data.
Changes:
- Teach the archivist skill to close audit-discovered gaps in priority order.
- Add scripts/archive_price_performance.py for due D1/D5/D20/D60 price-performance backfills.
- Document the price-performance backfill command in README.
- Archive raw Yahoo Finance chart responses under repo-relative data/raw/{ticker}/ paths.
- Populate price_performance with D1/D5/D20/D60 checkpoints and refresh source_refs, sync_runs, sync_tasks, and ticker_sync_state snapshots.
Execution:
- Ran .venv/bin/python scripts/archive_price_performance.py --as-of 2026-06-15T10:00:00Z.
- Selected 291 due price-performance tickers.
- Archived 273 price-history sources and wrote 1063 price-performance rows.
- Re-ran .venv/bin/python scripts/archive_hkex_documents.py --as-of 2026-06-15T10:05:00Z for the remaining open T0/T1 tasks; no additional T0/T1 stages were completed as a result.
Verification:
- Compiled the new price-performance script.
- Ran git diff --check.
- Checked SQLite integrity and foreign keys.
- Confirmed database row counts match CSV snapshots.
- Verified all 979 source_refs use valid repo-relative paths, have files, have hashes, and SHA256 hashes match.
- Confirmed no generated Python caches or SQLite transient files remain.
Next useful context:
- price_performance now has 1063 rows: D1 273, D5 272, D20 267, D60 251.
- Remaining due price-performance gaps are 18 tickers where Yahoo history was unavailable or the request failed.
- T0/T1 gaps remain at T0 93 and T1 77; T2 grey-market remains unresolved pending a reproducible source strategy.
HK IPO
HK IPO is a project for building a repeatable, auditable research workflow for Hong Kong new listing subscription decisions.
The project is designed around a feedback loop:
- Archive IPO facts and source documents.
- Freeze the analysis that was possible at each decision stage.
- Compare predictions with post-listing outcomes.
- Improve the scoring rules only from reviewed evidence.
Goals
- Maintain a local, Git-tracked history of Hong Kong IPO data.
- Separate factual archiving from investment judgment.
- Keep every subscription decision tied to the information available at that time.
- Review actual IPO outcomes against prior predictions.
- Build a better IPO scoring process through structured error attribution.
Workflow
Each IPO is evaluated by stage:
- T0_prospectus: prospectus and offer terms only.
- T1_allotment: allotment results, public subscription, placing, allocation, and final pricing.
- T2_grey_market: grey-market result and immediate pre-listing context.
- D1, D5, D20, D60: post-listing review checkpoints.
The key discipline is to avoid hindsight leakage. A T0 prediction should only use T0 information, even after the IPO has listed.
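One way to enforce this rule in code is to tag each fact with the stage at which it first became available and filter by stage before building a prediction. The following is a minimal sketch, not the project's implementation: the `Fact` class, its `stage` field, and the `facts_visible_at` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Stage order as listed in the workflow above.
STAGES = ["T0_prospectus", "T1_allotment", "T2_grey_market", "D1", "D5", "D20", "D60"]


@dataclass(frozen=True)
class Fact:
    """A single archived fact (hypothetical structure, for illustration only)."""
    name: str
    stage: str  # stage at which the fact first became available
    value: object


def facts_visible_at(facts, stage):
    """Return only the facts that were available at or before `stage`."""
    cutoff = STAGES.index(stage)
    return [f for f in facts if STAGES.index(f.stage) <= cutoff]
```

With this shape, a T0 prediction built from `facts_visible_at(all_facts, "T0_prospectus")` cannot see allotment results or post-listing prices, even when they are already archived.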
Project Skills
This repository includes project-local Codex skills under .codex/skills/.
archivist
Owns facts and source control:
- archive prospectuses, allotment results, listing facts, and market data;
- record source URLs, as-of timestamps, repo-relative paths, and file hashes;
- update the embedded SQLite database;
- export Git-friendly CSV snapshots.
It does not make investment recommendations.
analyst
Owns IPO judgment and review:
- produce T0/T1/T2 prediction cards;
- score IPO candidates;
- compare multiple IPOs;
- write research memos and review cards;
- classify forecast errors;
- recommend scoring-rule updates.
It should use archived facts when available and keep prediction cards append-only.
Storage Model
The project is intended to be self-contained and portable across machines. Durable paths should always be relative to the repository root.
Expected layout:
data/
  hk_ipo.sqlite
  raw/
  snapshots/
memos/
reports/
rules/
schema/
scripts/
references/
Path rules:
- store paths like data/raw/06658/prospectus.pdf;
- do not store absolute paths;
- do not store paths with a leading ./;
- use POSIX / separators;
- store file hashes for archived source documents when practical.
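The path rules above could be checked before a path is written to the database. This is a minimal sketch under stated assumptions: `is_valid_repo_path` is a hypothetical helper, and the rejection of `..` segments and Windows drive letters is an extra safeguard that the rules above do not spell out.

```python
from pathlib import PurePosixPath


def is_valid_repo_path(path: str) -> bool:
    """Check a stored path against the repo-relative path rules."""
    if not path or "\\" in path:
        return False  # empty, or uses non-POSIX separators
    if path.startswith("/") or path.startswith("./"):
        return False  # absolute path, or leading ./
    if len(path) > 1 and path[1] == ":":
        return False  # Windows drive letter such as C: (assumed to count as absolute)
    # Assumption: parent-directory segments are rejected so that a stored
    # path can never point outside the repository root.
    return ".." not in PurePosixPath(path).parts
```

For example, `is_valid_repo_path("data/raw/06658/prospectus.pdf")` passes, while the same path with a leading `./` or `/` is rejected.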
SQLite is the embedded source of structured facts. CSV snapshots provide readable Git diffs. Markdown memos preserve the reasoning at each decision point.
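Because the database and its CSV snapshots describe the same facts, a quick consistency check can confirm they agree. The sketch below uses SQLite's built-in `PRAGMA integrity_check` and `PRAGMA foreign_key_check`; the helper names and the row-count comparison are illustrative assumptions, not the project's verification script.

```python
import csv
import io
import sqlite3


def check_database(conn):
    """Run SQLite's built-in integrity and foreign-key checks."""
    integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
    fk_violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    return integrity == "ok" and not fk_violations


def snapshot_matches(conn, table, csv_text):
    """Compare a table's row count with a CSV snapshot that has a header row.

    `table` must come from trusted code, because it is interpolated into SQL.
    """
    db_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    csv_rows = sum(1 for _ in csv.reader(io.StringIO(csv_text))) - 1
    return db_rows == csv_rows
```

In practice `conn` would be opened on data/hk_ipo.sqlite and `csv_text` read from the matching file under data/snapshots/.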
PDF Text Extraction
Archived PDFs can be converted into searchable text files:
python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python scripts/extract_pdf_text.py
The extractor reads PDF paths from data/hk_ipo.sqlite, writes derived text files under data/extracted_text/, and exports data/snapshots/extracted_text_manifest.csv with page counts, text hashes, and extraction status.
Recent IPO Target Refresh
Use HKEXnews annual new listing reports to seed recent subscription-relevant IPO targets:
.venv/bin/python scripts/update_recent_ipo_list.py --start-date 2023-06-15 --end-date 2026-06-15 --as-of 2026-06-15T07:30:00Z
The updater archives the HKEXnews XLSX reports under data/raw/hkex_new_listing_reports/, records report-backed source references, writes new_listing_report_entries, updates ipo_master and missing offering_terms fields, exports CSV snapshots, and refreshes sync_tasks.
Rows without an IPO offer price, such as transfers of listing, introductions, or de-SPAC transactions, are skipped by default because they are not ordinary public subscription targets.
HKEX Document Backfill
Use the HKEX document archiver to fill detailed T0/T1 facts for open sync tasks:
.venv/bin/python scripts/archive_hkex_documents.py --as-of 2026-06-15T08:30:00Z
The archiver maps stock codes to HKEXnews title-search stock IDs, downloads the selected prospectus and allotment-results PDFs under data/raw/{ticker}/, records source_refs, parses high-confidence T0/T1 fields into ipo_master, offering_terms, and ipo_demand, exports snapshots, and refreshes sync_tasks.
Price Performance Backfill
Use the price-performance archiver to fill due D1/D5/D20/D60 review checkpoints:
.venv/bin/python scripts/archive_price_performance.py --as-of 2026-06-15T10:00:00Z
The archiver stores raw Yahoo Finance chart responses under data/raw/{ticker}/, records source references and hashes, writes structured rows into price_performance, exports snapshots, and refreshes sync_tasks.
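To illustrate how checkpoint rows might be derived from a price history, the sketch below assumes that Dn means the close on the n-th trading day counting the listing day as day 1, and that performance is measured against the offer price. Both definitions are assumptions made for this example; the script itself may define the checkpoints differently. `due_checkpoints` is a hypothetical helper.

```python
# Assumed mapping from checkpoint label to trading-day number.
CHECKPOINTS = {"D1": 1, "D5": 5, "D20": 20, "D60": 60}


def due_checkpoints(closes, offer_price):
    """Map each reachable checkpoint to its return versus the offer price.

    `closes` is a list of daily closing prices starting on the listing day.
    Checkpoints without enough trading history are omitted.
    """
    results = {}
    for label, day in CHECKPOINTS.items():
        if len(closes) >= day:
            results[label] = closes[day - 1] / offer_price - 1.0
    return results
```

Under these assumptions, a ticker with only five trading days of history yields D1 and D5 values and nothing for D20 or D60.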
Incremental Archive Sync
The archivist keeps a per-ticker sync ledger so repeated updates can focus on missing stages:
python3 scripts/update_sync_state.py
This writes ticker_sync_state and sync_tasks into data/hk_ipo.sqlite, then exports data/snapshots/ticker_sync_state.csv, data/snapshots/sync_tasks.csv, and data/snapshots/sync_runs.csv.
Use sync_tasks as the next-sync queue. Tasks marked open are due now; tasks marked waiting_until_due are known future updates.
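Reading the queue could look like the following minimal sketch. The status values `open` and `waiting_until_due` come from the description above, but the column names (`ticker`, `stage`, `status`) and the sample tickers are assumptions; the real sync_tasks schema may differ.

```python
import sqlite3


def open_tasks(conn):
    """Return (ticker, stage) pairs for tasks that are due now."""
    rows = conn.execute(
        "SELECT ticker, stage FROM sync_tasks "
        "WHERE status = 'open' ORDER BY ticker, stage"
    )
    return rows.fetchall()


# Demo against an in-memory table with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sync_tasks (ticker TEXT, stage TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO sync_tasks VALUES (?, ?, ?)",
    [
        ("00001", "D60", "waiting_until_due"),
        ("00001", "T1_allotment", "open"),
        ("00002", "D1", "open"),
    ],
)
print(open_tasks(conn))
```

Only the two `open` rows are returned; the `waiting_until_due` row stays out of the queue until a later sync marks it as due.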
Git Discipline
The repository uses automatic focused commits for completed project changes.
Before committing, check that unrelated dirty files are not included and that generated durable files use repo-relative paths.