hk-ipo/data/snapshots/offering_terms.csv
geometrybase eae427d85b Add PDF text extraction workflow
Request:
- Provide a way to install or develop a PDF extraction tool for archived HK IPO documents.

Changes:
- Add requirements.txt with pypdf as the lightweight PDF text extraction dependency.
- Add scripts/extract_pdf_text.py to extract text from PDF source_refs into repo-relative data/extracted_text files.
- Add extracted text outputs and an extracted_text_manifest snapshot for the six archived HKEXnews PDFs.
- Document the extraction workflow in README.md.
- Ignore .venv and keep generated SQLite/Python transient files out of git.
- Use extracted text to verify the 06106 full prospectus, update source_refs, remove the related data gap, and fill 06106 offering terms.
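
The extraction step described in the changes above could look roughly like the following. This is a minimal sketch, not the contents of scripts/extract_pdf_text.py: the function names, the default output directory argument, and the example PDF path are hypothetical, and it assumes pypdf's `PdfReader` and `page.extract_text()` are used for the text layer.

```python
from pathlib import Path


def output_path_for(pdf_rel, out_dir="data/extracted_text"):
    """Map a repo-relative PDF path to a repo-relative .txt path (hypothetical layout)."""
    return (Path(out_dir) / Path(pdf_rel).with_suffix(".txt").name).as_posix()


def extract_pdf_text(repo_root, pdf_rel, out_dir="data/extracted_text"):
    """Extract the text of one PDF and write it under out_dir.

    Returns the repo-relative output path so that no machine-specific
    absolute path needs to be recorded in a manifest.
    """
    # Imported lazily so the path helper above works without pypdf installed.
    from pypdf import PdfReader

    root = Path(repo_root)
    reader = PdfReader(str(root / pdf_rel))
    # extract_text() may return an empty string for image-only pages.
    pages = [(page.extract_text() or "") for page in reader.pages]

    out_rel = output_path_for(pdf_rel, out_dir)
    out_file = root / out_rel
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text("\n\n".join(pages), encoding="utf-8")
    return out_rel


# Hypothetical input path, shown only to illustrate the mapping.
print(output_path_for("data/archive/06106_prospectus.pdf"))
```

Returning a repo-relative path keeps the result portable across checkouts, which matches the stated goal of keeping absolute paths out of durable files.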

Verification:
- Installed python3.14-venv system support, created a local .venv, and installed requirements.txt.
- Re-ran scripts/bootstrap_historical_data.py and scripts/extract_pdf_text.py.
- Verified extracted text paths and hashes against data/snapshots/extracted_text_manifest.csv.
- Verified SQLite integrity and snapshot row counts.
- Ran git diff --cached --check and searched durable files for machine-specific absolute paths.
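
The path and hash verification listed above can be sketched as follows. The manifest column names `text_path` and `sha256` are assumptions (the actual header of extracted_text_manifest.csv is not shown here), as is the choice of SHA-256; adjust both to the real manifest.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(repo_root, manifest_rel, path_col="text_path", hash_col="sha256"):
    """Return (path, reason) pairs for every manifest row that fails a check."""
    root = Path(repo_root)
    failures = []
    with open(root / manifest_rel, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            rel = Path(row[path_col])
            if rel.is_absolute():
                # Durable files should only carry repo-relative paths.
                failures.append((row[path_col], "absolute path"))
                continue
            target = root / rel
            if not target.is_file():
                failures.append((row[path_col], "missing file"))
            elif sha256_of(target) != row[hash_col].strip().lower():
                failures.append((row[path_col], "hash mismatch"))
    return failures
```

An empty return value means every listed file exists, is referenced by a relative path, and matches its recorded digest.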
2026-06-15 06:21:16 +00:00

836 B

ticker,source_id,prospectus_date,offer_price_hkd,board_lot,min_subscription_amount_hkd,global_offer_shares,hk_offer_shares_initial,international_offer_shares_initial,public_offer_pct_initial,over_allotment_offer_shares,offer_size_adjustment_offer_shares,market_cap_hkd_m,gross_proceeds_hkd_m,net_proceeds_hkd_m,issued_shares_upon_listing,data_as_of
06106,06106_prospectus_candidate_2026_06_15,2026-06-15,101.6,50,5131.24,10497300,524900,9972400,0.05,1574550,1574550,11226.52568,1066.52568,995.4,110497300,2026-06-15T06:15:00Z
06658,06658_prospectus_2026_06_05,2026-06-05,43.58,100,4401.96,11464100,1146500,10317600,0.1,,,3434.59,499.6,440.1,78811208,2026-06-15T06:15:00Z
06675,06675_global_offering_announcement_2026_06_09,2026-06-09,18.36,200,3709.04,53407000,5340800,48066200,0.1,8011000,,6959.2,,906.7,379041820,2026-06-15T06:15:00Z
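
Several columns in this snapshot appear to be arithmetically related, which allows a quick sanity check on filled rows. The sketch below assumes (this is not stated in the file) that the Hong Kong and international tranches sum to the global offer, that gross proceeds equal offer price times global offer shares, and that market cap equals offer price times issued shares, both in HKD millions. The sample uses the 06106 values; `check_row` and the tolerances are hypothetical.

```python
import csv
import io
from decimal import Decimal

# Subset of columns for ticker 06106, copied from the snapshot.
SAMPLE = """ticker,offer_price_hkd,global_offer_shares,hk_offer_shares_initial,international_offer_shares_initial,public_offer_pct_initial,market_cap_hkd_m,gross_proceeds_hkd_m,issued_shares_upon_listing
06106,101.6,10497300,524900,9972400,0.05,11226.52568,1066.52568,110497300
"""

MILLION = Decimal(1_000_000)


def check_row(row):
    """Return a list of problems found in one row; empty cells are skipped."""

    def num(key):
        value = (row.get(key) or "").strip()
        return Decimal(value) if value else None

    def have(*values):
        return all(v is not None for v in values)

    price = num("offer_price_hkd")
    total = num("global_offer_shares")
    hk = num("hk_offer_shares_initial")
    intl = num("international_offer_shares_initial")
    pct = num("public_offer_pct_initial")
    cap = num("market_cap_hkd_m")
    gross = num("gross_proceeds_hkd_m")
    issued = num("issued_shares_upon_listing")

    problems = []
    if have(total, hk, intl) and hk + intl != total:
        problems.append("tranches do not sum to global offer")
    if have(total, hk, pct) and abs(hk / total - pct) > Decimal("0.001"):
        problems.append("public offer pct does not match tranche sizes")
    if have(price, total, gross) and abs(price * total / MILLION - gross) > Decimal("0.01"):
        problems.append("gross proceeds do not match price x shares")
    if have(price, issued, cap) and abs(price * issued / MILLION - cap) > Decimal("0.01"):
        problems.append("market cap does not match price x issued shares")
    return problems


for row in csv.DictReader(io.StringIO(SAMPLE)):
    print(row["ticker"], check_row(row) or "ok")
```

`Decimal` is used instead of `float` so that values such as 101.6 compare exactly; reading the ticker as a string also preserves its leading zero.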