hk-ipo/.codex/skills/archivist/SKILL.md
geometrybase c65b20a1c4 Archive recent HKEX IPO targets
Request:
Use the project archivist workflow to update IPO target coverage for the most recent three-year window.

Changes:
- Add scripts/update_recent_ipo_list.py to discover HKEXnews annual new listing reports, archive XLSX sources, parse subscription-relevant IPO rows, and update SQLite plus snapshots.
- Add new_listing_report_entries to preserve annual report row-level evidence.
- Archive 2023-2026 Main Board new listing reports and 2024-2026 GEM new listing reports.
- Seed 290 report-backed IPO targets for 2023-06-15 through 2026-06-15, skipping 10 non-IPO rows without numeric offer prices.
- Refresh ipo_master, missing offering_terms fields, source_refs, ticker_sync_state, and sync_tasks.
- Add openpyxl as the XLSX parser dependency and document the archivist refresh flow.
- Limit sync summary output while keeping the full queue in SQLite and CSV snapshots.

Verification:
- Ran update_recent_ipo_list.py for 2023-06-15 to 2026-06-15 with as-of 2026-06-15T07:30:00Z.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, files exist, and SHA-256 hashes match.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.

Next useful context:
- ipo_master now has 293 tickers; new_listing_report_entries has 290 report-backed targets.
- Current sync queue has 2005 open tasks and 42 waiting_until_due tasks for deeper per-ticker archival stages.
2026-06-15 06:42:31 +00:00

name: archivist
description: Use for Hong Kong IPO fact archiving in this project: downloading or recording prospectuses, allotment results, listing facts, market data, source references, file hashes, SQLite updates, and CSV snapshots. Do not use for investment conclusions, subscription recommendations, score interpretation, or research memos.

HK IPO Archivist

Purpose

Maintain the project-local Hong Kong IPO evidence archive and structured fact database. This skill owns facts, sources, database updates, path hygiene, and reproducible snapshots.

It does not decide whether an IPO is worth subscribing to. Route judgment, scoring, prediction cards, review cards, and reports to analyst.

Project Storage Contract

Use repo-relative paths everywhere. Never store machine-specific absolute paths.

  • Resolve the repo root at runtime, for example with git rev-parse --show-toplevel.
  • Store paths without a leading ./.
  • Store paths with POSIX separators, such as data/raw/06658/prospectus.pdf.
  • Store path_base = "repo_root" when a table needs an explicit base.
  • Store file_sha256 for archived source files whenever practical.
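
The storage rules above can be sketched in Python. This is a minimal illustration, not the project's actual helper code: the function names repo_root, to_stored_path, and file_sha256 are hypothetical, and the sketch assumes git is on PATH when repo_root is called.

```python
import hashlib
import subprocess
from pathlib import Path


def repo_root() -> Path:
    """Resolve the repo root at runtime (assumes git is available)."""
    out = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        check=True, capture_output=True, text=True,
    )
    return Path(out.stdout.strip())


def to_stored_path(path: Path, root: Path) -> str:
    """Return a repo-relative POSIX path with no leading './'.

    Raises ValueError if the file lies outside the repo root.
    """
    rel = path.resolve().relative_to(root.resolve())
    return rel.as_posix()


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A row would then store to_stored_path(file, repo_root()) alongside path_base = "repo_root" and the file_sha256 value.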

Expected project layout:

data/
  hk_ipo.sqlite
  raw/
  snapshots/
memos/
reports/
rules/
schema/
scripts/
references/

Responsibilities

  • Archive primary source files under data/raw/{ticker}/.
  • Record source references, URLs, as-of timestamps, relative paths, and hashes.
  • Update embedded SQLite tables for IPO facts.
  • Export Git-friendly CSV snapshots after database updates.
  • Maintain sync_runs, ticker_sync_state, and sync_tasks so repeated syncs know what is already archived and what remains pending.
  • Use HKEXnews annual new listing reports to seed broad recent-IPO target coverage before collecting deeper per-ticker documents.
  • Preserve raw source files; do not overwrite without first checking whether the contents changed.
  • Label missing, stale, inconsistent, or estimated fields explicitly.
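
One way to honor the no-blind-overwrite rule is to compare hashes before writing. The sketch below is an assumption about how that check could look; archive_if_new and its return values are hypothetical names. It never overwrites an existing file, and leaves the decision about changed upstream content to the caller.

```python
import hashlib
from pathlib import Path


def archive_if_new(dest: Path, content: bytes) -> str:
    """Write content to dest only when dest does not exist yet.

    Returns "created", "unchanged", or "changed". In the "changed" case
    nothing is written, so the existing raw file is preserved.
    """
    new_hash = hashlib.sha256(content).hexdigest()
    if dest.exists():
        old_hash = hashlib.sha256(dest.read_bytes()).hexdigest()
        return "unchanged" if old_hash == new_hash else "changed"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(content)
    return "created"
```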

Boundaries

Do not write:

  • Subscription decisions.
  • Investment ratings.
  • Scoring interpretations.
  • Prediction cards.
  • Review conclusions.
  • Rule-change recommendations.

If a user asks for both a data update and analysis, complete the archive/update step first, then hand the frozen as-of dataset to analyst.

Workflow

  1. Inspect current repo state and recent commits before changing files.
  2. Identify the IPO ticker, company, stage, and source documents needed.
  3. Save raw source files under data/raw/{ticker}/ using descriptive names.
  4. Compute hashes for archived files.
  5. Insert or update structured facts in data/hk_ipo.sqlite.
  6. Record every source in the source reference table using repo-relative paths.
  7. Refresh sync state with scripts/update_sync_state.py after fact updates.
  8. Export key tables to data/snapshots/ for readable Git diffs.
  9. Verify path rules, required fields, hash checks, sync state, and snapshot generation.
  10. Commit only the related archive/database/snapshot changes.

Incremental Sync State

Use ticker_sync_state as the per-ticker stage ledger and sync_tasks as the next-sync queue.

Stages:

  • T0_prospectus
  • T1_allotment
  • T2_grey_market
  • D1
  • D5
  • D20
  • D60

Status values:

  • complete: required facts or source files are archived.
  • pending_not_due: the stage is expected in the future.
  • pending_due: the stage is due and should be updated on the next sync.
  • blocked: the missing data has no known resolution date or needs manual intervention.
  • not_applicable: the stage does not apply.
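
The status values can be read as a small decision function. The sketch below is an interpretation of the definitions above, not the logic in scripts/update_sync_state.py; stage_status and its parameters are hypothetical, and treating a missing due date as blocked is an assumption.

```python
from datetime import date
from typing import Optional


def stage_status(
    archived: bool,
    due: Optional[date],
    today: date,
    applicable: bool = True,
) -> str:
    """Map one ticker stage to a status value from the list above."""
    if not applicable:
        return "not_applicable"
    if archived:
        return "complete"
    if due is None:
        # Assumption: no known resolution date means manual follow-up.
        return "blocked"
    return "pending_due" if due <= today else "pending_not_due"
```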

Default incremental flow:

python3 scripts/update_sync_state.py

Then update only rows in sync_tasks whose task_status is open or blocked. Do not re-download existing source files unless the upstream source changed or the stored hash no longer matches.
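
Selecting the actionable queue could look like the following sketch. Only the sync_tasks table and its task_status column come from this document; the function name actionable_tasks is hypothetical, and any other columns are whatever the real schema defines.

```python
import sqlite3
from typing import List


def actionable_tasks(conn: sqlite3.Connection) -> List[sqlite3.Row]:
    """Return sync_tasks rows whose task_status is open or blocked."""
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        "SELECT * FROM sync_tasks WHERE task_status IN ('open', 'blocked')"
    )
    return cur.fetchall()
```

Called with sqlite3.connect("data/hk_ipo.sqlite") from the repo root, this leaves every other row in the queue untouched.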

Recent IPO Target Coverage

Use the recent IPO updater when the user asks to update a broad date range of HK IPO targets:

.venv/bin/python scripts/update_recent_ipo_list.py --start-date YYYY-MM-DD --end-date YYYY-MM-DD --as-of YYYY-MM-DDTHH:MM:SSZ

The script discovers HKEXnews annual new listing report XLSX files, archives them under data/raw/hkex_new_listing_reports/, inserts new_listing_report_entries, updates ipo_master and missing offering_terms fields, records report-backed source_refs, exports snapshots, and refreshes sync state.

By default, exclude report rows without a numeric IPO offer price. Such rows are typically transfers, introductions, or de-SPAC transactions, which are not ordinary public subscription targets.
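
The numeric-offer-price filter can be sketched as below. This is an illustration rather than the parser in scripts/update_recent_ipo_list.py: the function names and the "offer_price" key are hypothetical, and the real report columns may be named and formatted differently.

```python
import math
from typing import Any, Dict, List, Optional


def parse_offer_price(cell: Any) -> Optional[float]:
    """Return a positive finite price, or None when the cell is not numeric."""
    if isinstance(cell, bool):
        return None
    if isinstance(cell, (int, float)):
        value = float(cell)
    elif isinstance(cell, str):
        try:
            value = float(cell.replace(",", "").strip())
        except ValueError:
            return None
    else:
        return None
    return value if math.isfinite(value) and value > 0 else None


def subscription_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only rows whose hypothetical 'offer_price' cell parses as a price."""
    return [r for r in rows if parse_offer_price(r.get("offer_price")) is not None]
```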

Quality Checks

Before finishing, confirm:

  • No stored local path is absolute.
  • No stored local path starts with ./.
  • Raw files referenced by the database exist.
  • Source hashes match current file contents.
  • CSV snapshots reflect the database update.
  • sync_tasks reflects only missing or future work, not completed stages.
  • Any unavailable field is marked as a data gap rather than invented.
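
The path and hash checks above could be automated along these lines. This is a minimal sketch under stated assumptions: path_violations and check_source_ref are hypothetical helpers, and the stored path and hash would come from the project's source reference table.

```python
import hashlib
import re
from pathlib import Path
from typing import List, Optional


def path_violations(stored: str) -> List[str]:
    """List the storage-contract rules a stored path breaks."""
    problems = []
    if stored.startswith("/") or re.match(r"^[A-Za-z]:[\\/]", stored):
        problems.append("absolute path")
    if stored.startswith("./"):
        problems.append("leading ./")
    if "\\" in stored:
        problems.append("non-POSIX separator")
    return problems


def check_source_ref(
    root: Path, stored: str, expected_sha256: Optional[str]
) -> List[str]:
    """Check one source reference: path rules, file existence, hash match."""
    problems = path_violations(stored)
    if problems:
        return problems
    target = root / stored
    if not target.is_file():
        return ["file missing"]
    if expected_sha256 is not None:
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected_sha256:
            problems.append("hash mismatch")
    return problems
```

An empty list means the reference passes; anything else should be fixed or recorded as a data gap before committing.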