Request:
- Adjust archivist after the audit findings and update historical data.
Changes:
- Teach the archivist skill to close audit-discovered gaps in priority order.
- Add scripts/archive_price_performance.py for due D1/D5/D20/D60 price-performance backfills.
- Document the price-performance backfill command in README.
- Archive raw Yahoo Finance chart responses under repo-relative data/raw/{ticker}/ paths.
- Populate price_performance with D1/D5/D20/D60 checkpoints and refresh source_refs, sync_runs, sync_tasks, and ticker_sync_state snapshots.
Execution:
- Ran .venv/bin/python scripts/archive_price_performance.py --as-of 2026-06-15T10:00:00Z.
- Selected 291 due price-performance tickers.
- Archived 273 price-history sources and wrote 1063 price-performance rows.
- Re-ran .venv/bin/python scripts/archive_hkex_documents.py --as-of 2026-06-15T10:05:00Z for the remaining open T0/T1 tasks; the run completed no additional T0/T1 stages.
Verification:
- Compiled the new price-performance script.
- Ran git diff --check.
- Checked SQLite integrity and foreign keys.
- Confirmed database row counts match CSV snapshots.
- Verified all 979 source_refs use valid repo-relative paths, have files, have hashes, and SHA256 hashes match.
- Confirmed no generated Python caches or SQLite transient files remain.
Next useful context:
- price_performance now has 1063 rows: D1 273, D5 272, D20 267, D60 251.
- 18 due price-performance tickers (291 selected, 273 archived) remain open because Yahoo history was unavailable or the request failed.
- T0/T1 gaps remain at T0 93 and T1 77; T2 grey-market remains unresolved pending a reproducible source strategy.
name: archivist
description: Use for Hong Kong IPO fact archiving in this project: downloading or recording prospectuses, allotment results, listing facts, market data, source references, file hashes, SQLite updates, and CSV snapshots. Do not use for investment conclusions, subscription recommendations, score interpretation, or research memos.
HK IPO Archivist
Purpose
Maintain the project-local Hong Kong IPO evidence archive and structured fact database. This skill owns facts, sources, database updates, path hygiene, and reproducible snapshots.
It does not decide whether an IPO is worth subscribing for. Route judgment, scoring, prediction cards, review cards, and reports to analyst.
Project Storage Contract
Use repo-relative paths everywhere. Never store machine-specific absolute paths.
- Resolve the repo root at runtime, for example with git rev-parse --show-toplevel.
- Store paths without a leading ./.
- Store paths with POSIX separators, such as data/raw/06658/prospectus.pdf.
- Store path_base = "repo_root" when a table needs an explicit base.
- Store file_sha256 for archived source files whenever practical.
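The storage rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual helper code: the function names repo_root, to_repo_relative, and file_sha256 are hypothetical, and only the git command and the path rules come from the contract itself.

```python
import hashlib
import subprocess
from pathlib import Path


def repo_root() -> Path:
    """Resolve the repo root at runtime via git (must run inside a checkout)."""
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        check=True, capture_output=True, text=True,
    )
    return Path(result.stdout.strip())


def to_repo_relative(path: Path, root: Path) -> str:
    """Return a POSIX-style path relative to root, with no leading './'.

    Raises ValueError if the file lives outside the repo.
    """
    return path.resolve().relative_to(root.resolve()).as_posix()


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large PDFs do not load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Resolving both paths before calling relative_to keeps symlinked checkouts from producing a spurious "outside the repo" error.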
Expected project layout:
data/
hk_ipo.sqlite
raw/
snapshots/
memos/
reports/
rules/
schema/
scripts/
references/
Responsibilities
- Archive primary source files under data/raw/{ticker}/.
- Record source references, URLs, as-of timestamps, relative paths, and hashes.
- Update embedded SQLite tables for IPO facts.
- Export Git-friendly CSV snapshots after database updates.
- Maintain sync_runs, ticker_sync_state, and sync_tasks so repeated syncs know what is already archived and what remains pending.
- Use HKEXnews annual new listing reports to seed broad recent-IPO target coverage before collecting deeper per-ticker documents.
- Preserve raw source files; do not overwrite without first checking whether the contents changed.
- Label missing, stale, inconsistent, or estimated fields explicitly.
- Use audit findings to prioritize historical data gaps before expanding analysis coverage.
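The responsibility to preserve raw source files can be illustrated with a small guard that compares content hashes before writing. This is a sketch under assumptions: the function name archive_raw and its three return labels are hypothetical, and what to do with a changed upstream file is deliberately left to the caller.

```python
import hashlib
from pathlib import Path


def archive_raw(dest: Path, payload: bytes) -> str:
    """Write payload only if dest does not exist yet.

    Returns "created", "unchanged", or "changed". On "changed" the existing
    file is left untouched so the caller can decide how to keep both versions.
    """
    new_hash = hashlib.sha256(payload).hexdigest()
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(payload)
        return "created"
    old_hash = hashlib.sha256(dest.read_bytes()).hexdigest()
    if old_hash == new_hash:
        return "unchanged"
    return "changed"
```

Returning a label instead of overwriting keeps the decision about a changed source explicit and reviewable.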
Boundaries
Do not write:
- Subscription decisions.
- Investment ratings.
- Scoring interpretations.
- Prediction cards.
- Review conclusions.
- Rule-change recommendations.
If a user asks for both data update and analysis, complete the archive/update step first, then hand the frozen as-of dataset to analyst.
Workflow
- Inspect current repo state and recent commits before changing files.
- Identify the IPO ticker, company, stage, and source documents needed.
- Save raw source files under data/raw/{ticker}/ using descriptive names.
- Compute hashes for archived files.
- Insert or update structured facts in data/hk_ipo.sqlite.
- Record every source in the source reference table using repo-relative paths.
- Refresh sync state with scripts/update_sync_state.py after fact updates.
- Export key tables to data/snapshots/ for readable Git diffs.
- Verify path rules, required fields, hash checks, sync state, and snapshot generation.
- Commit only the related archive/database/snapshot changes.
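The snapshot export step in the workflow might look roughly like the following. It is a sketch, not the project's exporter: export_snapshot is a hypothetical name, and ordering by the first column is an assumption chosen so that repeated exports produce stable Git diffs.

```python
import csv
import sqlite3
from pathlib import Path


def export_snapshot(conn: sqlite3.Connection, table: str, out_dir: Path) -> int:
    """Write one table to out_dir/<table>.csv and return the number of data rows."""
    # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY 1')
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    out_dir.mkdir(parents=True, exist_ok=True)
    with (out_dir / f"{table}.csv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

The returned row count can be compared with SELECT COUNT(*) on the same table to confirm that the snapshot matches the database.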
Incremental Sync State
Use ticker_sync_state as the per-ticker stage ledger and sync_tasks as the next-sync queue.
Stages:
- T0_prospectus
- T1_allotment
- T2_grey_market
- D1
- D5
- D20
- D60
Status values:
- complete: required facts or source files are archived.
- pending_not_due: the stage is expected in the future.
- pending_due: the stage is due and should be updated on the next sync.
- blocked: the missing data has no known resolution date or needs manual intervention.
- not_applicable: the stage does not apply.
Default incremental flow:
python3 scripts/update_sync_state.py
Then update only rows in sync_tasks whose task_status is open or blocked. Do not re-download existing source files unless the upstream source changed or the stored hash no longer matches.
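Selecting the work queue for an incremental run could be sketched as below. The table name sync_tasks and the task_status values open and blocked come from this document; the ticker and stage column names are assumptions about the schema, so adjust them to the real table definition.

```python
import sqlite3


def open_tasks(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (ticker, stage, task_status) for tasks that still need work.

    Column names ticker and stage are assumed, not confirmed by the schema.
    """
    query = (
        "SELECT ticker, stage, task_status FROM sync_tasks "
        "WHERE task_status IN ('open', 'blocked') "
        "ORDER BY ticker, stage"
    )
    return conn.execute(query).fetchall()
```

Against the project database this would be called with sqlite3.connect("data/hk_ipo.sqlite") from the repo root.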
Audit-Driven Gap Closure
When audit finds historical data gaps, close them in this order unless the user specifies otherwise:
- Integrity blockers: missing raw files, bad hashes, absolute paths, broken snapshots, or failed foreign keys.
- Stage blockers: open due T0_prospectus and T1_allotment tasks that prevent stage-correct analysis.
- Outcome blockers: due D1, D5, D20, and D60 price performance needed for feedback and review.
- Context fields: industry labels, market cap, net proceeds, timetable gaps, and other comparison fields.
- Hard-to-source signals: T2_grey_market, only after a reproducible source strategy is available.
After each gap-closure run, refresh sync_tasks, export snapshots, and report what remains open. Do not mark unavailable data complete just to reduce the queue.
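The priority order above can be expressed as a simple sort key. This is an illustrative sketch: the category labels and the dictionary shape of a gap record are hypothetical, chosen only to mirror the five tiers listed in this section.

```python
# Lower rank means "close this first". The labels are hypothetical shorthand
# for the five tiers listed in this section.
GAP_PRIORITY = {
    "integrity": 0,
    "stage": 1,
    "outcome": 2,
    "context": 3,
    "hard_to_source": 4,
}


def order_gaps(gaps: list[dict]) -> list[dict]:
    """Sort gap records by tier, then by ticker for a stable, readable queue."""
    return sorted(gaps, key=lambda gap: (GAP_PRIORITY[gap["category"]], gap["ticker"]))
```

An unknown category raises KeyError rather than being silently ranked, which surfaces audit findings that do not fit the agreed tiers.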
Recent IPO Target Coverage
Use the recent IPO updater when the user asks to update a broad date range of HK IPO targets:
.venv/bin/python scripts/update_recent_ipo_list.py --start-date YYYY-MM-DD --end-date YYYY-MM-DD --as-of YYYY-MM-DDTHH:MM:SSZ
The script discovers HKEXnews annual new listing report XLSX files, archives them under data/raw/hkex_new_listing_reports/, inserts new_listing_report_entries, updates ipo_master and missing offering_terms fields, records report-backed source_refs, exports snapshots, and refreshes sync state.
By default, exclude report rows without a numeric IPO offer price because transfers, introductions, and de-SPAC transactions are not ordinary public subscription targets.
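The default exclusion rule can be sketched as a predicate on the offer price cell. This is an assumption about how the check might be written, not the updater's actual logic; the name has_numeric_offer_price is hypothetical, and so are the sample values used to describe non-numeric cells.

```python
import math


def has_numeric_offer_price(value) -> bool:
    """True when a report cell holds a finite number, False for blanks or text.

    Cells such as None, "", "N/A", or "-" are treated as non-numeric.
    """
    if value is None or isinstance(value, bool):
        return False
    try:
        price = float(str(value).replace(",", "").strip())
    except ValueError:
        return False
    # float() accepts "nan" and "inf", which are not usable prices.
    return math.isfinite(price)
```

Rows failing the predicate would be skipped by default rather than deleted, so the archived report itself stays complete.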
HKEX Document Backfill
Use the document archiver to fill detailed T0/T1 facts from official HKEXnews PDFs:
.venv/bin/python scripts/archive_hkex_documents.py --as-of YYYY-MM-DDTHH:MM:SSZ
The script resolves HKEXnews stock IDs, archives prospectus and allotment-results PDFs under data/raw/{ticker}/, updates source_refs, parses high-confidence fields into ipo_master, offering_terms, and ipo_demand, exports snapshots, and refreshes sync state.
Price Performance Backfill
Use the price-performance archiver to fill due D1, D5, D20, and D60 review checkpoints:
.venv/bin/python scripts/archive_price_performance.py --as-of YYYY-MM-DDTHH:MM:SSZ
The script archives one raw market-data response per ticker under data/raw/{ticker}/, records it in source_refs, writes structured rows into price_performance, exports snapshots, and refreshes sync state. Checkpoints use the configured calendar due date and the next available trading day in the archived market data.
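One way to read the checkpoint rule is sketched below. It rests on two assumptions that the text does not confirm: that each checkpoint is a calendar-day offset from the listing date (the CHECKPOINT_OFFSETS values are hypothetical), and that "next available trading day" means the first archived trading day on or after the due date.

```python
from bisect import bisect_left
from datetime import date, timedelta
from typing import Optional

# Hypothetical calendar-day offsets; the real values live in the script's configuration.
CHECKPOINT_OFFSETS = {"D1": 1, "D5": 5, "D20": 20, "D60": 60}


def resolve_checkpoint(
    listing_date: date, checkpoint: str, trading_days: list[date]
) -> Optional[date]:
    """Return the first archived trading day on or after the calendar due date.

    trading_days must be sorted ascending. Returns None when the archived
    history does not reach the due date, which should stay an open gap.
    """
    due = listing_date + timedelta(days=CHECKPOINT_OFFSETS[checkpoint])
    index = bisect_left(trading_days, due)
    return trading_days[index] if index < len(trading_days) else None
```

Returning None instead of the last known price keeps a checkpoint from being marked complete before its data exists.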
Quality Checks
Before finishing, confirm:
- No stored local path is absolute.
- No stored local path starts with ./.
- Raw files referenced by the database exist.
- Source hashes match current file contents.
- CSV snapshots reflect the database update.
- sync_tasks reflects only missing or future work, not completed stages.
- Any unavailable field is marked as a data gap rather than invented.
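The path and hash checks in this list can be sketched as a single validator per source reference. The function name check_source_ref and the problem labels are hypothetical; the rules themselves come from the checklist above.

```python
import hashlib
from pathlib import Path, PurePosixPath, PureWindowsPath


def check_source_ref(stored_path: str, expected_sha256: str, root: Path) -> list[str]:
    """Return a list of problems for one stored path; an empty list means it passes."""
    problems = []
    if PurePosixPath(stored_path).is_absolute() or PureWindowsPath(stored_path).is_absolute():
        problems.append("absolute path")
    if stored_path.startswith("./"):
        problems.append("leading ./")
    if "\\" in stored_path:
        problems.append("non-POSIX separator")
    if problems:
        # Do not touch the filesystem for paths that already break the storage contract.
        return problems
    target = root / stored_path
    if not target.is_file():
        return ["file missing"]
    actual = hashlib.sha256(target.read_bytes()).hexdigest()
    if actual != expected_sha256:
        problems.append("hash mismatch")
    return problems
```

Running this over every row of source_refs and reporting the non-empty results gives a concrete pass or fail for the first four checks.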