Commit Graph

24 Commits

Author SHA1 Message Date
geometrybase 29ed22e476 Clarify IPO short-exit strategy horizon
Request:
- Emphasize that the analyst model is for selling allocated IPO shares in T2 grey market or on D1, not for long-term holding.

Changes:
- Add explicit T2/D1 sell discipline to the analyst skill.
- Update ipo_score_v0 targets and holding policy so D1 sell return is primary and T2 is the intended extension when reliable grey-market data exists.
- Clarify that D5/D20/D60 are review labels only, not planned holding targets.
- Update the model report, single-ticker report generator, README, and the 06106 report language to reflect the short-exit horizon.

Verification:
- Rebuilt the model report with the same dataset timestamp and confirmed the analysis dataset did not change.
- Ran py_compile for build_analysis_dataset.py and generate_ipo_report.py.
- Generated a 06106 dry-run report showing T2/D1 exit discipline.
- Ran git diff --check.

Next useful context:
- T2 is still disabled in v0 until archivist approves a reliable grey-market data source; D1 remains the live modeled sell label.
2026-06-15 14:20:56 +00:00
geometrybase bd5a06465d Add 06106 T0 analyst report
Request:
- Use the analyst skill to analyze upcoming IPO 06106 and generate a Markdown report.

Changes:
- Add a stage-safe T0_prospectus analyst report for 06106.
- Record the v0 score, calibrated historical probability, key T0 positives, risks, triggers, and source path.

Verification:
- Confirmed 06106 has no structured T1 demand yet, so the report is T0_prospectus.
- Reviewed the report for stage safety and repo-relative source paths.
- Ran git diff --check.

Next useful context:
- Re-run analyst after 06106 allotment results are archived around 2026-06-23 to generate a T1_allotment report.
2026-06-15 14:13:27 +00:00
geometrybase 1227f2c7c4 Add single IPO analyst report generator
Request:
- Let the analyst skill generate a Markdown report directly when a new IPO ticker is provided.

Changes:
- Add scripts/generate_ipo_report.py for stage-safe single-ticker reports from the v0 analysis dataset.
- Auto-select T1 reports when structured allotment demand exists and otherwise use T0 prospectus-stage reporting.
- Keep post-listing D1/D5/D20/D60 outcomes out of prediction reports while using historical buckets for calibration.
- Document the workflow in the analyst skill and README.

Verification:
- Ran py_compile for scripts/generate_ipo_report.py.
- Generated stdout dry-run reports for 06106 and 06658.
- Wrote temporary Markdown reports under /tmp for output-path validation.
- Ran git diff --check.

Next useful context:
- Before generating a report for a ticker absent from the analysis dataset, run archivist updates and rebuild scripts/build_analysis_dataset.py.
2026-06-15 14:11:18 +00:00
geometrybase 58ad869f84 Refresh IPO analysis model calibration
Request:
- Re-analyze the IPO model using the updated historical archive after T1 demand backfill.

Changes:
- Regenerate the v0 analysis dataset from the current SQLite archive.
- Refresh the v0 calibration report with expanded T1 coverage and new empirical bucket rates.
- Update the report template to show pending T1 rows and field-level blanks.
- Clarify v0 limitations and record why the score formula stays unchanged for this refresh.

Verification:
- Ran scripts/build_analysis_dataset.py against data/hk_ipo.sqlite.
- Ran py_compile for scripts/build_analysis_dataset.py.
- Checked dataset row count, T1 demand coverage, source-only T1 gaps, and repo-relative paths.
- Ran git diff --check.

Next useful context:
- T1 structured coverage is now 291 rows, with 06106 and 06675 still pending_not_due.
- The high-conviction T1 bucket remains differentiated, but middle and low buckets are still not monotonic enough for a v1 rule change.
2026-06-15 14:05:34 +00:00
geometrybase 6d05056609 Backfill structured T1 demand from archived text
Request:
- Use archivist to close the 137 T1 ipo_demand source-only gaps using extracted PDF text.

Changes:
- Add an incremental T1 demand text backfill script.
- Parse existing allotment-result extracted text into ipo_demand.
- Archive linked Summary PDFs from old HKEX HTML allotment-result pages.
- Correct allotment-result selection to prefer primary result announcements over clarification or supplemental notices.
- Add robust line-aware allotment parsing and document the workflow in archivist and README.
- Record the backfill result in a report.

Execution:
- Selected 137 source-only T1 demand gaps.
- Wrote 137 ipo_demand rows, increasing ipo_demand from 154 to 291 rows.
- Archived 38 new HKEX allotment-result PDFs and extracted their text.
- Confirmed an incremental rerun selects 0 gaps and writes 0 rows.

Verification:
- Ran git diff --cached --check.
- Ran py_compile for archive_hkex_documents.py and backfill_t1_demand_from_text.py.
- Checked SQLite integrity and foreign keys.
- Confirmed DB row counts match CSV snapshots.
- Verified no T1 complete row is missing ipo_demand.
- Verified source_refs paths/files/hashes and PDF extracted-text manifest hashes.

Next useful context:
- T1 demand structure is complete for listed rows; 06106 and 06675 remain pending_not_due.
- T2 grey-market and due price-performance gaps remain separate archivist priorities.
- Analyst output should be regenerated before using the new T1 demand facts for scoring.
2026-06-15 13:59:06 +00:00
geometrybase 33d0bc056e Tighten historical data audit coverage
Request:
- Use the audit skill to check historical data completeness and self-correct the audit criteria after the missed PDF extracted-text gap.

Changes:
- Add a mandatory derived-evidence checklist to the audit skill.
- Require broad historical audits to reconcile PDF source_refs, extracted text files, manifest rows, and hashes.
- Add a historical data completeness audit report for the current archive.

Findings:
- Source integrity and PDF extracted-text completeness now pass.
- Full historical completeness still fails due to incomplete structured T1 demand, unresolved T2 grey-market data, open due price-performance tasks, and missing context fields.

Verification:
- Ran SQLite integrity, foreign-key, source hash, snapshot, PDF manifest, extracted-text hash, stage coverage, and analysis-dataset checks.
- Ran scripts/extract_pdf_text.py and confirmed 557 PDF sources were skipped unchanged with 557 manifest rows.
- Ran git diff --check.
2026-06-15 13:43:22 +00:00
geometrybase 8a0dfd88f0 Make PDF text extraction a standard archive step
Request:
- Add extracted PDF text generation to the archivist workflow as a standard step.

Changes:
- Run PDF text extraction automatically for newly archived HKEX PDF sources.
- Make the PDF text extractor incremental and manifest-preserving.
- Document extracted-text handling in the archivist skill and README.
- Mark generated extracted text as no-diff data evidence.
- Backfill extracted text for all archived PDF source references.

Verification:
- Ran git diff --cached --check.
- Ran .venv/bin/python -m py_compile scripts/extract_pdf_text.py scripts/archive_hkex_documents.py.
- Ran full PDF extraction, then confirmed an incremental rerun skips unchanged files.
- Verified 557 PDF source_refs, 557 manifest rows, all status ok, and zero missing text/hash/path issues.

Next useful context:
- HKEX HTML notices and Yahoo JSON market data remain under data/raw and are not expected in data/extracted_text.
2026-06-15 13:27:41 +00:00
geometrybase 48b89552fe Add IPO analysis model baseline
Request:
- Use the analyst skill to digest downloaded IPO archive data and start building an analysis model.

Changes:
- Add ipo_score_v0 as the first transparent stage-safe scoring rule set.
- Add build_analysis_dataset.py to derive model features, scores, decision bands, and empirical D1 calibration from SQLite.
- Generate analysis_model_v0_dataset.csv with 293 scored IPO rows and archived source paths.
- Add a model calibration report documenting coverage, T0/T1 bucket performance, usage, and known gaps.
- Record the initial model entry in the rule change log and document the command in README.

Verification:
- Ran py_compile for scripts/build_analysis_dataset.py.
- Regenerated the analysis dataset and report with as-of 2026-06-15T13:00:00Z.
- Checked CSV row count, source path coverage, and repo-relative path hygiene.
- Ran git diff --cached --check.

Next useful context:
- v0 should be treated as a transparent baseline, with T1 high-score calibration strongest and middle buckets still non-monotonic.
- T2 is excluded until a reliable grey-market source is approved.
2026-06-15 12:49:48 +00:00
geometrybase 5f9546b16c Improve IPO archive gap handling
Request:
- Rework archivist handling for stubborn T0/T1 HKEX document gaps and unresolved T2 grey-market gaps.

Changes:
- Query HKEXnews titleSearchServlet with IPO-date windows instead of only the latest title-search page.
- Recognize SHARE OFFER listing documents and archive official HTML allotment-result notices when no PDF is published.
- Mark source-only allotment completion clearly when structured demand parsing is not yet covered.
- Add a reusable grey-market gap marker and archivist source policy for T2 data.
- Archive newly discovered HKEX raw sources, update SQLite, and refresh CSV snapshots.
- Treat raw evidence files as binary in Git attributes.

Verification:
- Ran py_compile for archive_hkex_documents.py, update_sync_state.py, and mark_grey_market_gaps.py.
- Ran HKEX document archive backfill and grey-market gap marker.
- Checked SQLite integrity, foreign keys, source paths, source hashes, and DB-vs-snapshot row counts.
- Ran git diff --cached --check after marking raw archives binary.

Next useful context:
- T0 is now complete for 293 tickers.
- T1 has 291 complete and 2 pending_not_due tickers.
- T2 has 291 blocked gaps pending an approved grey-market source strategy.
2026-06-15 09:47:36 +00:00
geometrybase 078f56998b Backfill IPO price performance history
Request:
- Adjust archivist after the audit findings and update historical data.

Changes:
- Teach the archivist skill to close audit-discovered gaps in priority order.
- Add scripts/archive_price_performance.py for due D1/D5/D20/D60 price-performance backfills.
- Document the price-performance backfill command in README.
- Archive raw Yahoo Finance chart responses under repo-relative data/raw/{ticker}/ paths.
- Populate price_performance with D1/D5/D20/D60 checkpoints and refresh source_refs, sync_runs, sync_tasks, and ticker_sync_state snapshots.

Execution:
- Ran .venv/bin/python scripts/archive_price_performance.py --as-of 2026-06-15T10:00:00Z.
- Selected 291 due price-performance tickers.
- Archived 273 price-history sources and wrote 1063 price-performance rows.
- Re-ran .venv/bin/python scripts/archive_hkex_documents.py --as-of 2026-06-15T10:05:00Z for the remaining open T0/T1 tasks; no additional completed T0/T1 stages resulted.

Verification:
- Compiled the new price-performance script.
- Ran git diff --check.
- Checked SQLite integrity and foreign keys.
- Confirmed database row counts match CSV snapshots.
- Verified all 979 source_refs use valid repo-relative paths, have files, have hashes, and SHA256 hashes match.
- Confirmed no generated Python caches or SQLite transient files remain.

Next useful context:
- price_performance now has 1063 rows: D1 273, D5 272, D20 267, D60 251.
- Remaining due price-performance gaps are 18 tickers where Yahoo history was unavailable or the request failed.
- T0/T1 gaps remain at T0 93 and T1 77; T2 grey-market remains unresolved pending a reproducible source strategy.
2026-06-15 09:16:08 +00:00
geometrybase 53e5649ff4 Add HK IPO audit skill
Request:
- Add a project-local audit skill for checking IPO data completeness, sufficiency, and analysis self-consistency.

Changes:
- Create .codex/skills/audit/SKILL.md.
- Define audit scope across source integrity, stage data completeness, data sufficiency, and reasoning consistency.
- Separate responsibilities from archivist fact updates and analyst investment conclusions.

Verification:
- Reviewed the new skill body.
- Ran git diff --check.
- Confirmed the project-local skill list includes archivist, analyst, and audit.
2026-06-15 08:22:03 +00:00
geometrybase 098ae2073f Clarify HKEX backfill wording
Request:
- Remove wording that could imply batched HKEX document backfills.

Changes:
- Replace the remaining progressive-fill phrase in README with direct full-run wording.

Verification:
- Confirmed README, archivist skill, and archiver script contain no batch, small-batch, --limit 5, progressively, or 分批 wording.
- Ran git diff --check.
2026-06-15 08:08:01 +00:00
geometrybase 9aab267f80 Run full HKEX document backfill
Request:
- Remove small-batch guidance and execute the HKEX document archiver across all open T0/T1 sync tasks in one run.

Changes:
- Make archive_hkex_documents.py process every open T0/T1 ticker by default when --limit is omitted.
- Add per-ticker progress output and keep full refreshes moving if one ticker fails.
- Suppress noisy pypdf warnings during large official document extraction.
- Update archivist and README instructions to show the full-run command without batch notes.
- Archive official HKEXnews prospectus and allotment-results PDFs under repo-relative data/raw paths.
- Refresh hk_ipo.sqlite and CSV snapshots for parsed T0/T1 fields, source_refs, sync_runs, sync_tasks, and ticker_sync_state.

Execution:
- Ran .venv/bin/python scripts/archive_hkex_documents.py --as-of 2026-06-15T09:00:00Z.
- Selected 284 open T0/T1 tickers, processed 210 tickers, and archived 398 source files.
- Left 74 tickers as missing target docs because title search did not return target prospectus/allotment documents for this pass.

Verification:
- Parsed archivist scripts with Python ast.
- Confirmed README, archivist skill, and archiver script no longer contain batch guidance.
- Ran git diff --check.
- Checked SQLite integrity and DB/snapshot row counts.
- Verified 706 source_refs use relative local paths, all files exist, and SHA256 hashes match.

Next useful context:
- Current source_refs count is 706 and ipo_demand count is 134.
- Sync ledger now reports 414 complete, 1595 pending_due, and 42 pending_not_due states.
2026-06-15 07:57:33 +00:00
geometrybase 993d7b26fa Backfill first HKEX IPO document batch
Request:
Start progressively filling detailed information for recent HK IPO targets.

Changes:
- Add scripts/archive_hkex_documents.py to map tickers to HKEXnews stock IDs, select official prospectus and allotment-results PDFs, archive them under data/raw/{ticker}, parse high-confidence T0/T1 facts, export snapshots, and refresh sync state.
- Document the small-batch HKEX document backfill workflow in README.md and the archivist skill.
- Archive prospectus and allotment-results PDFs for 00901, 01081, 01779, 02290, 02553, and 03388.
- Fill T0 details including application dates, expected allotment date, board lot, minimum subscription amount, and offer-share counts for the six tickers.
- Fill T1 allotment-demand details including valid/successful applications, public subscription level, international placees, international subscription level, and final offer-share allocations.
- Refresh source_refs, ipo_master, offering_terms, ipo_demand, ticker_sync_state, and sync_tasks snapshots.

Verification:
- Ran archive_hkex_documents.py in a first small batch and re-ran corrected tickers after parser hardening.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, source files exist, and SHA-256 hashes match.
- Confirmed batch field completeness for the six processed tickers.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.

Next useful context:
- This batch added about 55MB of official HKEXnews PDFs.
- Sync state now has 16 complete stages, 1993 pending_due stages, and 42 pending_not_due stages.
- Continue with small --limit batches because HKEXnews title search can include historical or postponed offering documents for the same stock code.
2026-06-15 07:07:46 +00:00
geometrybase c65b20a1c4 Archive recent HKEX IPO targets
Request:
Use the project archivist workflow to update IPO target coverage for the most recent three-year window.

Changes:
- Add scripts/update_recent_ipo_list.py to discover HKEXnews annual new listing reports, archive XLSX sources, parse subscription-relevant IPO rows, and update SQLite plus snapshots.
- Add new_listing_report_entries to preserve annual report row-level evidence.
- Archive 2023-2026 Main Board new listing reports and 2024-2026 GEM new listing reports.
- Seed 290 report-backed IPO targets for 2023-06-15 through 2026-06-15, skipping 10 non-IPO rows without numeric offer prices.
- Refresh ipo_master, missing offering_terms fields, source_refs, ticker_sync_state, and sync_tasks.
- Add openpyxl as the XLSX parser dependency and document the archivist refresh flow.
- Limit sync summary output while keeping the full queue in SQLite and CSV snapshots.

Verification:
- Ran update_recent_ipo_list.py for 2023-06-15 to 2026-06-15 with as-of 2026-06-15T07:30:00Z.
- Parsed project Python scripts with ast.parse.
- Checked SQLite integrity and DB-to-snapshot row counts.
- Verified source_refs paths are repo-relative, files exist, and SHA-256 hashes match.
- Ran git diff --check and git diff --cached --check.
- Checked for Python cache and SQLite transient files.

Next useful context:
- ipo_master now has 293 tickers; new_listing_report_entries has 290 report-backed targets.
- Current sync queue has 2005 open tasks and 42 waiting_until_due tasks for deeper per-ticker archival stages.
2026-06-15 06:42:31 +00:00
geometrybase 08db218b6d Add archivist incremental sync state
Request:
Add archivist support for remembering which IPO archive stages have already been synced and which stages should be updated next.

Changes:
- Add sync_runs, ticker_sync_state, sync_tasks, and price_performance tables to the archive schema.
- Add scripts/update_sync_state.py to derive per-ticker stage status and rebuild the next-sync task queue.
- Export the new sync-state tables as Git-friendly CSV snapshots.
- Document the incremental archive flow in the archivist skill and README.

Verification:
- Ran scripts/bootstrap_historical_data.py.
- Ran scripts/update_sync_state.py with a deterministic as-of timestamp.
- Checked SQLite integrity and DB-to-snapshot row counts with Python sqlite3.
- Parsed Python scripts with ast.parse.
- Ran git diff --check and checked for temporary SQLite/cache files.

Next useful context:
- Current derived queue has 2 open tasks for 06658 and 15 waiting_until_due tasks for future stages.
2026-06-15 06:29:54 +00:00
geometrybase fb715cd446 Require automatic upstream pushes
Request:
- Make AGENTS.md explicitly require automatic pushes to the remote after commits.

Changes:
- Add a Git Workflow rule to push the current branch to its configured upstream remote after a successful commit.
- Add a reporting rule for missing upstream configuration or failed pushes.

Verification:
- Reviewed the updated Git Workflow section.
- Ran git diff --check.
2026-06-15 06:22:44 +00:00
geometrybase eae427d85b Add PDF text extraction workflow
Request:
- Provide a way to install or develop a PDF extraction tool for archived HK IPO documents.

Changes:
- Add requirements.txt with pypdf as the lightweight PDF text extraction dependency.
- Add scripts/extract_pdf_text.py to extract text from PDF source_refs into repo-relative data/extracted_text files.
- Add extracted text outputs and an extracted_text_manifest snapshot for the six archived HKEXnews PDFs.
- Document the extraction workflow in README.md.
- Ignore .venv and keep generated SQLite/Python transient files out of git.
- Use extracted text to verify the 06106 full prospectus, update source_refs, remove the related data gap, and fill 06106 offering terms.

Verification:
- Installed python3.14-venv system support, created a local .venv, and installed requirements.txt.
- Re-ran scripts/bootstrap_historical_data.py and scripts/extract_pdf_text.py.
- Verified extracted text paths and hashes against data/snapshots/extracted_text_manifest.csv.
- Verified SQLite integrity and snapshot row counts.
- Ran git diff --cached --check and searched durable files for machine-specific absolute paths.
2026-06-15 06:21:16 +00:00
geometrybase 7a8c648d87 Bootstrap HK IPO historical archive
Request:
- Use the project archivist workflow to update historical IPO data.

Changes:
- Add an embedded SQLite archive at data/hk_ipo.sqlite.
- Add schema/hk_ipo.schema.sql and scripts/bootstrap_historical_data.py for reproducible archive generation.
- Archive HKEXnews source PDFs for 06658, 06675, and 06106 under repo-relative data/raw paths.
- Export Git-friendly snapshots for ipo_master, offering_terms, ipo_demand, source_refs, and data_gaps.
- Add .gitignore rules for Python cache and SQLite transient files.

Verification:
- Re-ran the bootstrap script successfully.
- Ran PRAGMA integrity_check on the SQLite database.
- Verified source_refs paths are repo-relative, files exist, and SHA-256 hashes match.
- Verified snapshot row counts match SQLite table counts.
- Ran git diff --check and searched generated durable files for machine-specific absolute paths.
2026-06-15 06:13:27 +00:00
geometrybase 6b6df26271 Remove explicit push restriction
Request:
- Delete the AGENTS.md rule that allowed pushing only when explicitly requested.

Changes:
- Remove the single Git Workflow bullet that restricted push behavior.

Verification:
- Reviewed the focused diff for AGENTS.md.
- Confirmed no remaining push-related text with rg.
2026-06-15 06:05:04 +00:00
geometrybase 408ba59bc6 Document HK IPO project workflow
Request:
- Write a README introducing the project.

Changes:
- Describe the HK IPO research feedback loop.
- Document the stage-based workflow, project-local skills, storage model, path rules, and Git discipline.

Verification:
- Reviewed README contents with sed.
- Ran rg for machine-specific absolute path patterns; none found.
- Ran git diff --check.
2026-06-15 06:02:10 +00:00
geometrybase a138ef3193 Add project agent workflow instructions
Request:
- Review the project agent instructions and make the Git workflow explicitly automatic.
- Commit the project-local modification into the repository.

Changes:
- Add AGENTS.md to the repo.
- Rename the Git workflow section to emphasize automatic commits.
- Clarify that completed repository changes should be committed before the final response.
- Clarify that related project-local files such as .codex skills, schema, scripts, snapshots, memos, and documentation belong in the focused commit.

Verification:
- Reviewed the updated Git Workflow section with sed.
- Confirmed the expected automatic-commit language with rg.
- Checked the staged diff includes only AGENTS.md.
2026-06-15 06:00:26 +00:00
geometrybase 67b78cc172 Add project-local HK IPO skills
Request:
- Keep the HK IPO workflow skills inside the repo so they travel with the project.
- Use concise names while preserving clear HK IPO scope and repo-relative path rules.

Changes:
- Add .codex/skills/archivist for source archiving, SQLite fact updates, hashes, and CSV snapshots.
- Add .codex/skills/analyst for T0/T1/T2 IPO decisions, prediction cards, reviews, and rule-change recommendations.
- Add agents/openai.yaml metadata for both skills.

Verification:
- Checked staged changes include only .codex skill files.
- Searched .codex for machine-specific absolute path patterns; none found.

Next useful context:
- AGENTS.md remains untracked and was not included in this commit.
2026-06-15 05:53:53 +00:00
geometrybase 6907418731 first commit 2026-06-15 05:43:41 +00:00