rename .codex to .agents

2026-06-22 15:20:37 +00:00
parent 05adcb1ec3
commit 8f0d8d5013
5 changed files with 0 additions and 0 deletions
@@ -0,0 +1,198 @@
---
name: hk-ipo-analyst
description: Use for Hong Kong IPO subscription analysis in this project: T0/T1/T2 prediction cards, scoring, cross-IPO comparison, research reports, post-listing reviews, error attribution, and rule-update recommendations. Use archived facts when available and keep predictions append-only.
---
# HK IPO Analyst
## Purpose
Assess Hong Kong IPO subscription candidates using the project's archived facts, scoring rules, prediction cards, and review history. This skill owns judgment: whether to subscribe, wait, avoid, sell in grey market or on D1, or revise a rule after outcomes arrive.
Use `hk-ipo-archivist` first when source documents, listing facts, allotment results, prices, or database snapshots need to be updated.
## Core Discipline
Separate the decision stage from later facts:
- `T0_prospectus`: prospectus and offer terms only.
- `T0_5_market_heat`: subscription-period non-official market heat, such as broker-aggregated margin subscription multiples, observed before official allotment results.
- `T0_95_final_heat`: near-deadline heat observed while the user can still place, amend, or cancel an IPO order; this is an actionable late-order stage, not an allotment-results stage.
- `T1_allotment`: allotment results, public subscription, international placing, allocation, and final pricing.
- `T2_grey_market`: grey-market result and immediate pre-listing trading context.
- `D1`, `D5`, `D20`, `D60`: post-listing review checkpoints.
Do not let later facts leak into earlier prediction cards. When reviewing an older call, compare the frozen prediction against the actual outcome instead of rewriting the original judgment.
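The leakage rule can be sketched as a simple ordering check. This is a minimal illustration, assuming the stages rank in the order listed in this section; `STAGE_ORDER`, `leaks_into`, and `leaked_facts` are hypothetical names, not existing project code.

```python
# Hypothetical helper: the ordering mirrors the stage list in this section.
STAGE_ORDER = [
    "T0_prospectus",
    "T0_5_market_heat",
    "T0_95_final_heat",
    "T1_allotment",
    "T2_grey_market",
    "D1",
    "D5",
    "D20",
    "D60",
]


def leaks_into(card_stage: str, fact_stage: str) -> bool:
    """Return True when a fact belongs to a stage later than the card's stage."""
    return STAGE_ORDER.index(fact_stage) > STAGE_ORDER.index(card_stage)


def leaked_facts(card_stage: str, fact_stages: list[str]) -> list[str]:
    """List the fact stages that must not appear in a card frozen at card_stage."""
    return [stage for stage in fact_stages if leaks_into(card_stage, stage)]
```

For example, `leaked_facts("T0_5_market_heat", ["T0_prospectus", "T1_allotment"])` flags only `T1_allotment`.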
## Trading Horizon
The analyst model is a short-exit IPO subscription model, not a long-term holding model.
- The intended exit is `T2_grey_market` when a reliable grey-market signal and executable price are available, or `D1` otherwise.
- The default assumption is to sell allocated shares by D1 unless a later rule explicitly creates a documented exception.
- D5/D20/D60 are review labels for learning, not holding targets and not inputs for subscription decisions.
- Reports should frame expected return, triggers, and exit discipline around T2/D1 realization rather than long-term fundamentals.
- Recommendations should avoid long-hold language unless the user explicitly asks for a separate long-term investment thesis.
## Project Storage Contract
Use repo-relative paths everywhere:
- Memos: `memos/{ticker}_{stage}_{date}.md`
- Reports: `reports/{date}_{topic}.md` or another repo-relative report path requested by the user.
- Rules: `rules/ipo_score_v*.yaml` and `rules/rule_change_log.md`
- Source references: cite archived files using paths such as `data/raw/06658/prospectus.pdf`
Never store or cite machine-specific absolute paths in durable project files.
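As an illustration of the memo naming pattern, the sketch below builds a repo-relative memo path. The `memo_path` helper is hypothetical, and the ISO `YYYY-MM-DD` date format is an assumption; the contract above only fixes the `{ticker}_{stage}_{date}` shape.

```python
from datetime import date
from pathlib import PurePosixPath


def memo_path(ticker: str, stage: str, as_of: date) -> str:
    """Build a repo-relative memo path with POSIX separators.

    Assumption: the {date} placeholder is rendered as an ISO date.
    """
    name = f"{ticker}_{stage}_{as_of.isoformat()}.md"
    return (PurePosixPath("memos") / name).as_posix()
```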
## Responsibilities
- Produce IPO subscription analysis and cross-candidate rankings.
- Write append-only T0/T1/T2 prediction cards.
- Include probability forecasts, score breakdowns, key reasons, risks, triggers, and exit discipline.
- Review actual outcomes against prior predictions.
- Attribute errors using stable tags such as `fundamental_miss`, `valuation_miss`, `heat_miss`, `structure_miss`, `market_window_miss`, `execution_miss`, and `data_gap`.
- Recommend scoring-rule changes only after evidence supports them.
## Boundaries
Do not silently mutate archived source facts. If facts are missing or stale, call out the data gap and use `hk-ipo-archivist` to update the archive before relying on them.
Do not overwrite prediction cards. If a view changes, write a new stage card or review card that references the earlier prediction.
## Workflow
1. Inspect current repo state and recent commits before changing files.
2. Determine the requested stage: T0, T0.5, T0.95, T1, T2, or post-listing review.
3. Load available archived facts and rules from repo-relative project files.
4. If facts are missing or stale, update the archive through `hk-ipo-archivist` or state the gap clearly.
5. Score the IPO using the current rule version.
6. Record probability forecasts rather than only directional language.
7. Write a memo/report with data-as-of time, rule version, sources, score, decision, and triggers.
8. For reviews, compare the frozen prediction to actual outcomes and classify the error type.
9. Commit only the related memo/report/rule changes after verification.
## Broad Candidate Report Layout
For broad/latest candidate reports whose purpose is deciding what can be subscribed now, order the body for action first:
1. Scoring/ranking table for currently actionable IPOs.
2. Fundamentals cross-check for the current batch.
3. Break-probability, risk/reward, and capital-efficiency framework.
4. Per-IPO notes, execution guidance, and closed/waiting names.
5. Recent 30-day listed-IPO review as post-hoc calibration.
6. Data-refresh guardrails and sources.
Do not lead broad candidate reports with post-listing reviews; keep those after the live decision layers so the reader sees the actionable ranking first.
## Single-Ticker Markdown Report
When the user gives a single IPO ticker and asks for an analyst report, use the report generator after archived facts and the analysis dataset are current:
```bash
.venv/bin/python scripts/build_analysis_dataset.py --as-of YYYY-MM-DDTHH:MM:SSZ
.venv/bin/python scripts/generate_ipo_report.py 06658 --stage auto
```
The generator writes `reports/{date}_{ticker}_{stage}_analysis.md` by default. Use `--stdout` for a dry run and `--stage T0_prospectus` to force a prospectus-stage report. Use `--stage T1_allotment` only when structured T1 demand exists.
If the ticker is absent from `data/snapshots/analysis_model_v0_dataset.csv`, use `hk-ipo-archivist` first to archive the IPO facts and rebuild the analysis dataset before generating the report.
Generated prediction reports must remain stage-safe:
- Analyst Markdown reports should be written in Simplified Chinese by default, while preserving ticker symbols, stage codes, rule ids, source paths, and stable error tags in their original machine-readable form.
- T0 reports use only prospectus-stage fields and T0 calibration.
- T0.5 reports may add archived `ipo_market_heat` rows, but must label them as non-official, live market-heat snapshots and include `observed_at`, provider, and source path.
- T1 reports may add allotment demand fields and T1 calibration.
- T2/D1 is the intended sell window; D5/D20/D60 returns are never shown as prediction inputs and are reserved for later review cards.
- Every report must include a concrete stage calendar for the ticker: T0 subscription window, T1 allotment-result date, T2 grey-market date/window, and D1 listing date.
## T0.5 Market Heat Overlay
Use `rules/ipo_score_v0_5_market_heat_trial.yaml` when the user asks to include subscription-period heat before official allotment results.
T0.5 discipline:
- Use `hk-ipo-archivist` first to archive the raw source page and structured `ipo_market_heat` rows.
- Keep T0.5 separate from official T1 demand. Do not copy T0.5 margin multiples into `ipo_demand.public_oversubscription_times`.
- Keep third-party final history, such as `external_ipo_history.public_oversubscription_times`, separate from T0.5. It is useful for post-hoc calibration but is not available at the original T0.5 decision time.
- Treat raw margin multiples as less reliable when IPOs are at different points in their subscription windows.
- Freeze the `observed_at` timestamp in the report so later T1/D1 reviews can test whether the heat signal helped.
- Write T0.5 conclusions as watchlist upgrades/downgrades, not as final high-conviction subscription calls.
## T0.95 Late-Order Heat
Use `rules/ipo_score_v0_95_final_heat_trial.yaml` when the user can still place an IPO order near the subscription cutoff and asks to use near-final market heat.
T0.95 discipline:
- Treat `T0_95_final_heat` as its own decision stage: later than ordinary T0.5, earlier than official T1 allotment results.
- The key condition is executability. A T0.95 snapshot is valid only if `observed_at` is before the user's actual order, amend, or cancel cutoff.
- Use archived `ipo_market_heat` rows with `stage = 'T0_95_final_heat'` when available. If only `T0_5_market_heat` rows exist, explicitly state that the report is using an earlier heat snapshot, not a true T0.95 snapshot.
- Historical final public oversubscription from `external_ipo_history` may be used as a calibration proxy for near-final heat buckets, because T0.95 is close to the final demand state. It must still be labelled as post-hoc calibration, not as data that was visible in the live case.
- Official allotment-result fields, final T1 public oversubscription, grey-market prices, and D1 returns remain forbidden as live T0.95 inputs unless they were actually available before the user's executable order cutoff, which should be rare and must be documented.
- T0.95 recommendations may be stronger than T0.5 watchlist language, but they must include expected allocation probability: very strong heat often improves D1 payoff odds while sharply reducing one-lot win rate.
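The executability condition can be sketched as a timestamp comparison. This is a minimal sketch, assuming both timestamps are timezone-aware; `is_executable_snapshot` and `resolve_heat_stage` are hypothetical helpers, not project functions.

```python
from datetime import datetime


def is_executable_snapshot(observed_at: datetime, order_cutoff: datetime) -> bool:
    """A snapshot is executable only if observed strictly before the cutoff."""
    if observed_at.tzinfo is None or order_cutoff.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return observed_at < order_cutoff


def resolve_heat_stage(observed_at: datetime, order_cutoff: datetime) -> str:
    """Label the snapshot T0.95 only when it was still actionable."""
    if is_executable_snapshot(observed_at, order_cutoff):
        return "T0_95_final_heat"
    # Otherwise treat it as an ordinary, earlier-style heat snapshot.
    return "T0_5_market_heat"
```

Here `order_cutoff` stands for the user's actual order, amend, or cancel cutoff, which has to be supplied per case.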
## Recent Listing Review Overlay
When producing a broad candidate report, latest IPO list refresh, or cross-IPO subscription ranking, include a recent listed-IPO review unless the user explicitly asks for a narrow single-name answer.
Review discipline:
- Use the last 30 calendar days ending at `data_as_of` by default, or the user-specified window when provided.
- Define the sample explicitly by listing-date range and listing method.
- Build a compact table with one row per recent IPO covering structure, fundamentals, T1 allotment demand, D1 performance, and the PM lesson.
- Structure should include at least T0 score and the offer-size/minimum-subscription context when available.
- Fundamentals should be a short issuer-quality read from archived prospectus facts, not a long-term thesis.
- T1 allotment demand should include public oversubscription, international placing demand, total score or decision band, and one-lot/application success when available.
- D1 performance should include D1 return and turnover when available. If D1 data is missing, label it as a `data_gap` and do not infer break/non-break from blank fields.
- Keep the review post-hoc: use it to calibrate live rules and base rates, but do not let D1/D5/D20/D60 facts leak into current unlisted candidate scores.
- Add a short mapping from the recent outcomes back to the current candidate batch, clearly distinguishing official T1 demand from non-official T0.5/T0.95 heat.
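The default window and the missing-D1 rule above can be sketched as follows. This is an illustration only: the helper names are hypothetical, the window is assumed to be inclusive at both ends, and the `break`/`non_break` labels with a zero-return threshold are assumptions rather than project conventions.

```python
from datetime import date, timedelta
from typing import Optional


def review_window(data_as_of: date, days: int = 30) -> tuple[date, date]:
    """Return the (start, end) listing-date range ending at data_as_of."""
    return data_as_of - timedelta(days=days), data_as_of


def in_review_sample(listing_date: date, data_as_of: date, days: int = 30) -> bool:
    """Check whether a listing date falls inside the review window."""
    start, end = review_window(data_as_of, days)
    return start <= listing_date <= end


def d1_label(d1_return: Optional[float]) -> str:
    """Label a missing D1 return as a data gap instead of inferring an outcome."""
    if d1_return is None:
        return "data_gap"
    return "break" if d1_return < 0 else "non_break"
```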
## Output Standards
Every prediction card should include:
- `ticker`
- `stage`
- `data_as_of`
- concrete T0/T0.95/T1/T2/D1 dates or windows for the ticker when applicable
- `rule_version`
- `decision`
- `total_score`
- score breakdown
- probability forecast
- expected return framing
- key bull points
- key risks
- triggers for upgrade/downgrade
- exit plan
- explicit T2/D1 sell discipline
- source paths
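A lightweight completeness check for the identifier-style fields in this list might look like the sketch below. It assumes a prediction card has been parsed into a dictionary; `REQUIRED_CARD_KEYS` and `missing_card_keys` are hypothetical names, and the free-text items (score breakdown, risks, triggers, and so on) are not covered.

```python
# Only the code-formatted identifiers from the list above are checked here.
REQUIRED_CARD_KEYS = (
    "ticker",
    "stage",
    "data_as_of",
    "rule_version",
    "decision",
    "total_score",
)


def missing_card_keys(card: dict) -> list[str]:
    """Return required keys that are absent or empty; a score of 0 counts as present."""
    return [key for key in REQUIRED_CARD_KEYS if card.get(key) in (None, "")]
```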
Language standard:
- Write analyst reports, prediction cards, and review cards in Simplified Chinese by default.
- Keep field identifiers, model versions, score buckets, ticker symbols, and source paths as code-formatted English identifiers when they are part of the project data contract.
Every review card should include:
- linked prediction card
- actual IPO outcome
- direction correctness
- magnitude error
- reason correctness
- execution assessment
- error tags
- rule-change recommendation, if any
## Quality Checks
Before finishing, confirm:
- The analysis stage matches the information set used.
- Later facts are not used in earlier-stage conclusions.
- Paths in durable files are repo-relative.
- Probabilities and scores are explicit.
- Facts, assumptions, estimates, inferences, and PM judgment are separated.
- Any rule update has a named trigger case and an effective date.
@@ -0,0 +1,7 @@
interface:
display_name: "hk-ipo-analyst"
short_description: "Score HK IPOs and review prediction quality"
default_prompt: "Use $hk-ipo-analyst to evaluate a Hong Kong IPO subscription candidate from archived project data."
policy:
allow_implicit_invocation: true
@@ -0,0 +1,238 @@
---
name: hk-ipo-archivist
description: Use for Hong Kong IPO fact archiving in this project: downloading or recording prospectuses, allotment results, listing facts, market data, source references, file hashes, SQLite updates, and CSV snapshots. Do not use for investment conclusions, subscription recommendations, score interpretation, or research memos.
---
# HK IPO Archivist
## Purpose
Maintain the project-local Hong Kong IPO evidence archive and structured fact database. This skill owns facts, sources, database updates, path hygiene, and reproducible snapshots.
It does not decide whether an IPO is worth subscribing for. Route judgment, scoring, prediction cards, review cards, and reports to `hk-ipo-analyst`.
## Project Storage Contract
Use repo-relative paths everywhere. Never store machine-specific absolute paths.
- Resolve the repo root at runtime, for example with `git rev-parse --show-toplevel`.
- Store paths without a leading `./`.
- Store paths with POSIX separators, such as `data/raw/06658/prospectus.pdf`.
- Store `path_base = "repo_root"` when a table needs an explicit base.
- Store `file_sha256` for archived source files whenever practical.
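The path and hash rules above can be sketched with the standard library. This is a minimal sketch, not the project's implementation: `to_repo_relative` and `file_sha256` are hypothetical helper names, and the repo root is assumed to have been resolved already (for example from `git rev-parse --show-toplevel`).

```python
import hashlib
from pathlib import Path, PurePosixPath


def to_repo_relative(path: Path, repo_root: Path) -> str:
    """Convert a file path to a POSIX-style path relative to repo_root.

    Raises ValueError when the path lies outside the repo root.
    """
    relative = path.resolve().relative_to(repo_root.resolve())
    return PurePosixPath(*relative.parts).as_posix()


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result never carries a leading `./` because `relative_to` returns only the path components below the root.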
Expected project layout:
```text
data/
hk_ipo.sqlite
raw/
snapshots/
memos/
reports/
rules/
schema/
scripts/
references/
```
## Responsibilities
- Archive primary source files under `data/raw/{ticker}/`.
- Record source references, URLs, as-of timestamps, relative paths, and hashes.
- Update embedded SQLite tables for IPO facts.
- Export Git-friendly CSV snapshots after database updates.
- Maintain `sync_runs`, `ticker_sync_state`, and `sync_tasks` so repeated syncs know what is already archived and what remains pending.
- Use HKEXnews annual new listing reports to seed broad recent-IPO target coverage before collecting deeper per-ticker documents.
- Preserve raw source files; do not overwrite without first checking whether the contents changed.
- Label missing, stale, inconsistent, or estimated fields explicitly.
- Use audit findings to prioritize historical data gaps before expanding analysis coverage.
## Boundaries
Do not write:
- Subscription decisions.
- Investment ratings.
- Scoring interpretations.
- Prediction cards.
- Review conclusions.
- Rule-change recommendations.
If a user asks for both data update and analysis, complete the archive/update step first, then hand the frozen as-of dataset to `hk-ipo-analyst`.
## Workflow
1. Inspect current repo state and recent commits before changing files.
2. Identify the IPO ticker, company, stage, and source documents needed.
3. Save raw source files under `data/raw/{ticker}/` using descriptive names.
4. Compute hashes for archived files.
5. Insert or update structured facts in `data/hk_ipo.sqlite`.
6. Record every source in the source reference table using repo-relative paths.
7. Extract text for archived PDF sources with `scripts/extract_pdf_text.py`.
8. Refresh sync state with `scripts/update_sync_state.py` after fact updates.
9. Export key tables to `data/snapshots/` for readable Git diffs.
10. Verify path rules, required fields, hash checks, extracted text manifest, sync state, and snapshot generation.
11. Commit only the related archive/database/snapshot changes.
## Incremental Sync State
Use `ticker_sync_state` as the per-ticker stage ledger and `sync_tasks` as the next-sync queue.
Stages:
- `T0_prospectus`
- `T0_5_market_heat`
- `T1_allotment`
- `T2_grey_market`
- `D1`
- `D5`
- `D20`
- `D60`
Status values:
- `complete`: required facts or source files are archived.
- `pending_not_due`: the stage is expected in the future.
- `pending_due`: the stage is due and should be updated on the next sync.
- `blocked`: the missing data has no known resolution date or needs manual intervention.
- `not_applicable`: the stage does not apply.
Default incremental flow:
```bash
python3 scripts/update_sync_state.py
```
Then update only rows in `sync_tasks` whose `task_status` is `open` or `blocked`. Do not re-download existing source files unless the upstream source changed or the stored hash no longer matches.
## Audit-Driven Gap Closure
When `hk-ipo-audit` finds historical data gaps, close them in this order unless the user specifies otherwise:
1. Integrity blockers: missing raw files, bad hashes, absolute paths, broken snapshots, or failed foreign keys.
2. Stage blockers: open due `T0_prospectus` and `T1_allotment` tasks that prevent stage-correct analysis.
3. Outcome blockers: due `D1`, `D5`, `D20`, and `D60` price performance needed for feedback and review.
4. Context fields: industry labels, market cap, net proceeds, timetable gaps, and other comparison fields.
5. Hard-to-source signals: `T2_grey_market`, only after a reproducible source strategy is available.
After each gap-closure run, refresh `sync_tasks`, export snapshots, and report what remains open. Do not mark unavailable data complete just to reduce the queue.
## Recent IPO Target Coverage
Use the recent IPO updater when the user asks to update a broad date range of HK IPO targets:
```bash
.venv/bin/python scripts/update_recent_ipo_list.py --start-date YYYY-MM-DD --end-date YYYY-MM-DD --as-of YYYY-MM-DDTHH:MM:SSZ
```
The script discovers HKEXnews annual new listing report XLSX files, archives them under `data/raw/hkex_new_listing_reports/`, inserts `new_listing_report_entries`, updates `ipo_master` and missing `offering_terms` fields, records report-backed `source_refs`, exports snapshots, and refreshes sync state.
By default, exclude report rows without a numeric IPO offer price because transfers, introductions, and de-SPAC transactions are not ordinary public subscription targets.
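The default exclusion can be illustrated with a small filter. This is a sketch under assumptions: the `offer_price` key and both helper names are hypothetical, and treating only positive, finite values as numeric prices is an illustrative choice rather than the script's documented behavior.

```python
import math


def has_numeric_offer_price(value) -> bool:
    """True when the report cell parses as a positive, finite number."""
    try:
        price = float(str(value).replace(",", "").strip())
    except ValueError:
        return False
    return math.isfinite(price) and price > 0


def ordinary_ipo_rows(rows: list[dict]) -> list[dict]:
    """Keep only report rows that carry a numeric IPO offer price."""
    return [row for row in rows if has_numeric_offer_price(row.get("offer_price"))]
```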
## HKEX Document Backfill
Use the document archiver to fill detailed T0/T1 facts from official HKEXnews documents:
```bash
.venv/bin/python scripts/archive_hkex_documents.py --as-of YYYY-MM-DDTHH:MM:SSZ
```
The script resolves HKEXnews stock IDs, archives prospectus and allotment-results documents under `data/raw/{ticker}/`, updates `source_refs`, parses high-confidence fields into `ipo_master`, `offering_terms`, and `ipo_demand`, exports snapshots, and refreshes sync state.
The document archiver should use HKEXnews date-window title search around the IPO timetable, not only the latest title-search page. IPO documents for active listed companies are often buried behind later post-listing announcements. Treat official HKEXnews `.pdf`, `.htm`, and `.html` allotment-result notices as valid archived sources; parse structured demand facts only where parser coverage is reliable.
PDF text extraction is a standard HKEX document post-processing step. `scripts/archive_hkex_documents.py` extracts text for newly archived PDFs by default after source references are written:
```bash
.venv/bin/python scripts/extract_pdf_text.py
```
The extractor is incremental: unchanged PDFs with matching manifest rows are skipped, and `data/snapshots/extracted_text_manifest.csv` is preserved and updated. Use `--force` only when parser behavior changes and derived text should be regenerated.
Do not expect `data/extracted_text/` entries for Yahoo JSON market data or HKEX `.htm`/`.html` notices. Those are already text-like raw evidence files and are tracked under `data/raw/`.
## T1 Demand Text Backfill
When audit finds T1 rows where an allotment-results source is archived but `ipo_demand` is missing, use the text backfill script:
```bash
.venv/bin/python scripts/backfill_t1_demand_from_text.py --as-of YYYY-MM-DDTHH:MM:SSZ
```
The script is incremental. It selects only `T1_allotment` rows that are complete from source evidence but have no `ipo_demand` row. It parses archived PDF extracted text, follows old HKEX HTML allotment-result pages to their linked Summary PDFs, archives those PDFs, extracts their text, writes `ipo_demand`, exports snapshots, and refreshes sync state only when facts or sources changed.
Do not infer missing demand fields. If a Summary PDF gives valid applications and public subscription but omits successful applicants or international subscription level, store the available fields and leave unavailable fields null.
## T0.5 Market Heat Archive
When the user asks to include subscription-period heat before official T1 allotment results, archive a reproducible market-heat snapshot:
```bash
.venv/bin/python scripts/archive_t0_5_market_heat.py --as-of YYYY-MM-DDTHH:MM:SSZ --tickers 01392,06067
```
When the user can still place, amend, or cancel an order near the subscription cutoff, archive the late actionable snapshot as `T0_95_final_heat`:
```bash
.venv/bin/python scripts/archive_t0_5_market_heat.py --stage T0_95_final_heat --as-of YYYY-MM-DDTHH:MM:SSZ --tickers 01392,06067
```
The script stores the raw page under `data/raw/market_heat/`, records per-ticker `source_refs`, writes structured rows to `ipo_market_heat`, exports `data/snapshots/ipo_market_heat.csv`, and refreshes sync state. The default `source_type` is `t0_5_market_heat`.
For T0.95 runs, `source_type` is `t0_95_final_heat` and `ipo_market_heat.stage` is `T0_95_final_heat`. Use this stage only when the snapshot was observed before the user's actual executable order cutoff; otherwise store it as ordinary `T0_5_market_heat` or post-hoc research evidence.
Market-heat data is non-official and live. It may include broker-aggregated margin subscription multiples or similar estimates. Never store it as `ipo_demand`, never treat it as final HKEX subscription data, and always preserve provider, source URL, raw path, `observed_at`, and the intended decision stage.
## External IPO History Archive
When a historical third-party table is useful for coverage checks or calibration research, archive it separately from official HKEX data:
```bash
.venv/bin/python scripts/archive_ipohk_history.py --as-of YYYY-MM-DDTHH:MM:SSZ
```
The ipohk archive stores raw JSON under `data/raw/external_history/`, writes structured rows to `external_ipo_history`, and exports `data/snapshots/external_ipo_history.csv`.
Treat this as external historical context. Fields such as final oversubscription, one-lot win rate, grey-market return, and first-day return are not T0.5 margin snapshots and must not be backfilled into `ipo_market_heat`.
## Grey-Market Source Policy
`T2_grey_market` is not an HKEX official disclosure stage. Grey-market trading is broker or third-party OTC activity, so do not bulk archive a grey-market feed unless the source is reproducible and redistribution-safe.
Accept a T2 source only when one of these conditions is met:
- A licensed vendor or broker export is provided for this project and may be stored in Git.
- A user-provided evidence file is added under `data/raw/{ticker}/` with clear source notes.
- A public historical source has stable ticker/date records and clear reuse terms.
Until one of those conditions is met, mark due T2 tasks as blocked data gaps instead of repeatedly leaving them as open sync failures:
```bash
.venv/bin/python scripts/mark_grey_market_gaps.py --as-of YYYY-MM-DDTHH:MM:SSZ
```
Do not mark T2 complete from screenshots, unsourced forum posts, or proprietary pages whose terms prohibit copying or redistribution.
## Price Performance Backfill
Use the price-performance archiver to fill due `D1`, `D5`, `D20`, and `D60` review checkpoints:
```bash
.venv/bin/python scripts/archive_price_performance.py --as-of YYYY-MM-DDTHH:MM:SSZ
```
The script archives one raw market-data response per ticker under `data/raw/{ticker}/`, records it in `source_refs`, writes structured rows into `price_performance`, exports snapshots, and refreshes sync state. Each checkpoint takes its configured calendar due date and maps it to the next available trading day in the archived market data.
## Quality Checks
Before finishing, confirm:
- No stored local path is absolute.
- No stored local path starts with `./`.
- Raw files referenced by the database exist.
- Source hashes match current file contents.
- Extracted text exists or has a manifest status for archived PDF source references.
- CSV snapshots reflect the database update.
- `sync_tasks` reflects only missing or future work, not completed stages.
- Any unavailable field is marked as a data gap rather than invented.
@@ -0,0 +1,7 @@
interface:
display_name: "hk-ipo-archivist"
short_description: "Archive HK IPO facts, sources, and snapshots"
default_prompt: "Use $hk-ipo-archivist to update the project IPO archive for a Hong Kong new listing."
policy:
allow_implicit_invocation: true
@@ -0,0 +1,132 @@
---
name: hk-ipo-audit
description: Use for independent audit of Hong Kong IPO archive quality and analysis quality in this project: confirm data completeness, data sufficiency, source integrity, stage-appropriate evidence, and self-consistency of IPO subscription analysis logic. Do not archive new facts or make investment recommendations; route fact updates to hk-ipo-archivist and investment conclusions to hk-ipo-analyst.
---
# HK IPO Audit
## Purpose
Audit the evidence base and reasoning quality before a Hong Kong IPO analysis is trusted, compared with outcomes, or used to refine rules.
This skill answers two questions:
- Is the data complete and sufficient for the requested stage and conclusion?
- Is the analysis logic self-consistent, stage-correct, and supported by the available evidence?
Use `hk-ipo-archivist` first when required facts or source files are missing. Use `hk-ipo-analyst` for subscription decisions, score interpretation, prediction cards, and rule changes.
## Core Principles
Separate three standards:
- `integrity`: source files, hashes, repo-relative paths, database rows, and snapshots are internally consistent.
- `completeness`: the expected facts and sources for the analysis stage are present or explicitly marked as gaps.
- `sufficiency`: the available facts are strong enough to support the claims, scores, probabilities, and decision.
Do not treat a filled field as sufficient evidence by itself. A conclusion is only audit-ready when the source, stage, assumption, and reasoning chain can be followed.
Treat derived evidence as first-class audit material, not optional convenience. If an archived raw source is expected to generate a reusable derived artifact, the audit must reconcile the raw source, derived artifact, manifest row, and hashes.
## Stage Data Checklist
Use the stage being audited to decide what must exist:
- `T0_prospectus`: prospectus source, offer terms, timetable, sponsor, industry, business model, financial profile, valuation basis, cornerstone/lock-up facts when applicable, and explicit data gaps.
- `T1_allotment`: allotment-results source, final price, public subscription level, international placing signal when available, allocation outcome, clawback/reallocation facts, and demand-quality interpretation inputs.
- `T2_grey_market`: grey-market source, price move, turnover/liquidity context, and whether the signal is usable or noisy.
- `D1`, `D5`, `D20`, `D60`: post-listing prices, benchmark/market-window context, realized return, drawdown, liquidity, and comparison to the frozen prediction.
For broad historical or cross-IPO work, also check that the sample definition, inclusion/exclusion rules, and date range are explicit.
## Derived Evidence Checklist
For broad historical audits and any analysis-readiness audit:
- Every archived PDF in `source_refs` must have one row in `data/snapshots/extracted_text_manifest.csv`.
- Every extracted-text manifest row must point back to an existing PDF `source_id`.
- `pdf_sha256` in the manifest must match `source_refs.file_sha256`.
- `text_local_path` must be repo-relative, must exist, and must match `text_sha256`.
- Manifest extraction status must be reviewed. `error`, missing text, missing manifest rows, orphan manifest rows, hash mismatches, or non-repo-relative paths are `blocker` issues for historical-data completeness.
- HKEX `.htm`/`.html` notices and Yahoo JSON files are raw text-like evidence under `data/raw/`; do not require `data/extracted_text/` rows for them.
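The first three checks in this list can be sketched as a pure function over already-loaded rows. This is an illustration, not the audit's actual code: `reconcile_manifest` is a hypothetical name, `pdf_sources` is assumed to map each PDF `source_id` to its `source_refs.file_sha256`, and the extraction-status and `text_local_path` checks are left out.

```python
def reconcile_manifest(pdf_sources: dict[str, str], manifest_rows: list[dict]) -> list[str]:
    """Report orphan rows, hash mismatches, and PDFs without a manifest row."""
    issues = []
    seen = set()
    for row in manifest_rows:
        source_id = row["source_id"]
        seen.add(source_id)
        if source_id not in pdf_sources:
            issues.append(f"orphan manifest row: {source_id}")
        elif row["pdf_sha256"] != pdf_sources[source_id]:
            issues.append(f"hash mismatch: {source_id}")
    for source_id in pdf_sources:
        if source_id not in seen:
            issues.append(f"missing manifest row: {source_id}")
    return issues
```

Each returned string would map to a `blocker` finding under the rule above.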
## Data Audit Workflow
1. Inspect current repo state and recent commits before auditing.
2. Identify the ticker, report, rule version, stage, and data-as-of timestamp being audited.
3. Load the relevant archived facts from `data/hk_ipo.sqlite`, CSV snapshots, raw source paths, memo/report files, and rule files.
4. Check `source_refs` for repo-relative `local_path` values, existing files, and matching `file_sha256` values when present.
5. Reconcile derived artifacts, especially PDF extracted text, against their manifests and source hashes.
6. Compare database row counts with `data/snapshots/` exports for tables used by the audit.
7. Review `ticker_sync_state` and `sync_tasks` for the target ticker or sample. Treat open due tasks as possible blockers.
8. Mark each required stage fact as `present`, `missing`, `stale`, `estimated`, `inferred`, or `not_applicable`.
9. Decide whether remaining gaps are blocking or non-blocking for the specific conclusion being audited.
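Step 6 can be sketched with `sqlite3` and `csv` from the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes each snapshot CSV has a single header row followed by one line per database row.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_count(db_path: Path, table: str) -> int:
    """Count rows in one SQLite table; the name is validated before quoting."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def snapshot_row_count(csv_path: Path) -> int:
    """Count data rows in a CSV snapshot, excluding the header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def counts_match(db_path: Path, table: str, csv_path: Path) -> bool:
    """True when the table and its snapshot export have the same row count."""
    return table_row_count(db_path, table) == snapshot_row_count(csv_path)
```

A matching count does not prove the contents agree; it only catches snapshots that were not re-exported after a database update.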
## Logic Audit Workflow
Check the analysis artifact, memo, report, or proposed conclusion for:
- Stage leakage: later facts must not appear in earlier-stage conclusions.
- Source support: each material claim cites an archived source, structured fact, explicit assumption, or clearly labeled inference.
- Score arithmetic: subtotals and total score match the rule file.
- Rule alignment: the decision follows the stated rule thresholds, or any override is explicit and justified.
- Probability consistency: probabilities, expected return framing, and decision language do not contradict each other.
- Causal discipline: bull points, risks, and triggers explain why the IPO should behave differently from base rates.
- Comparable discipline: IPO history, industry comparisons, and peer cases use a defined sample rather than cherry-picked examples.
- Internal consistency: valuation, demand, market window, business quality, and exit plan point to a coherent conclusion or explicitly explain tension.
- Feedback readiness: predictions are frozen, measurable, and comparable with actual post-listing outcomes.
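The score-arithmetic check can be illustrated as below. This sketch assumes the total is a plain sum of the subtotals; if the rule file applies weights or caps, the comparison would need to follow the rule file instead. `score_arithmetic_ok` and the component names in the usage note are hypothetical.

```python
import math


def score_arithmetic_ok(subtotals: dict[str, float], total_score: float, tol: float = 1e-9) -> bool:
    """Check that the reported total equals the sum of the reported subtotals.

    Assumption: the rule defines total_score as an unweighted sum.
    """
    return math.isclose(sum(subtotals.values()), total_score, abs_tol=tol)
```

For instance, subtotals of `{"fundamentals": 20, "valuation": 15.5}` are consistent with a total of `35.5` and inconsistent with `36`.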
## Output Standard
Use this structure for audit reports or final audit summaries:
```text
Audit status: pass | pass_with_gaps | fail
Target:
Stage:
Data as of:
Data integrity:
- ...
Data completeness and sufficiency:
- ...
Analysis logic self-consistency:
- ...
Blocking issues:
- ...
Non-blocking gaps:
- ...
Required fixes:
- ...
```
Severity labels:
- `blocker`: conclusion should not be trusted until fixed.
- `major`: materially weakens confidence, but may be usable with explicit caveats.
- `minor`: clarity or traceability issue.
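One possible mapping from severity labels to the audit status line is sketched below. The mapping itself is an assumption for illustration (any `blocker` fails the audit, any other finding yields `pass_with_gaps`); the skill text does not prescribe it, and `audit_status` is a hypothetical helper.

```python
KNOWN_SEVERITIES = {"blocker", "major", "minor"}


def audit_status(severities: list[str]) -> str:
    """Derive pass | pass_with_gaps | fail from the findings' severity labels."""
    unknown = set(severities) - KNOWN_SEVERITIES
    if unknown:
        raise ValueError(f"unknown severity labels: {sorted(unknown)}")
    if "blocker" in severities:
        return "fail"
    if severities:
        return "pass_with_gaps"
    return "pass"
```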
## Boundaries
Do not silently repair data during an audit. If source facts need to be added or corrected, report the gap and route the update to `hk-ipo-archivist`.
Do not rewrite an analyst conclusion during an audit. If the logic fails, explain why and route the revised judgment to `hk-ipo-analyst`.
Do not pass an audit just because the final recommendation sounds reasonable. Pass only when the data and reasoning chain are traceable, sufficient, and internally consistent.
## Quality Checks
Before finishing, confirm:
- The audit target and stage are explicit.
- Data completeness and data sufficiency are judged separately.
- PDF source references are reconciled to extracted text and manifest hashes when auditing historical or analysis-ready data.
- Missing facts are not converted into assumptions without labels.
- Later facts are not used to validate earlier predictions.
- Any pass/fail result names the evidence that supports it.
- Durable audit files use repo-relative paths.