eae427d85b
Request:
- Provide a way to install or develop a PDF extraction tool for archived HK IPO documents.

Changes:
- Add requirements.txt with pypdf as the lightweight PDF text extraction dependency.
- Add scripts/extract_pdf_text.py to extract text from PDF source_refs into repo-relative data/extracted_text files.
- Add extracted text outputs and an extracted_text_manifest snapshot for the six archived HKEXnews PDFs.
- Document the extraction workflow in README.md.
- Ignore .venv and keep generated SQLite/Python transient files out of git.
- Use extracted text to verify the 06106 full prospectus, update source_refs, remove the related data gap, and fill 06106 offering terms.

Verification:
- Installed python3.14-venv system support, created a local .venv, and installed requirements.txt.
- Re-ran scripts/bootstrap_historical_data.py and scripts/extract_pdf_text.py.
- Verified extracted text paths and hashes against data/snapshots/extracted_text_manifest.csv.
- Verified SQLite integrity and snapshot row counts.
- Ran git diff --cached --check and searched durable files for machine-specific absolute paths.
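The extraction and hash-verification steps described above can be sketched roughly as follows. This is not the contents of scripts/extract_pdf_text.py, which is not shown here. The function names and the manifest column names `text_path` and `sha256` are assumptions made for illustration. The only pypdf calls used are `PdfReader`, `reader.pages`, and `page.extract_text()`.

```python
import csv
import hashlib
from pathlib import Path


def extract_pdf_text(pdf_path: Path) -> str:
    """Return the text of every page, joined by newlines."""
    # Imported lazily so the hash helpers below work without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large extracts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_manifest_mismatches(manifest_csv: Path, repo_root: Path) -> list[str]:
    """Return manifest paths whose file is missing or whose hash differs.

    Assumes hypothetical columns `text_path` (repo-relative) and `sha256`.
    """
    mismatches = []
    with manifest_csv.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            target = repo_root / row["text_path"]
            if not target.is_file() or sha256_of_file(target) != row["sha256"]:
                mismatches.append(row["text_path"])
    return mismatches
```

Keeping paths in the manifest repo-relative and resolving them against `repo_root` at check time is one way to avoid writing machine-specific absolute paths into durable files.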
836 B
| 1 | ticker | source_id | prospectus_date | offer_price_hkd | board_lot | min_subscription_amount_hkd | global_offer_shares | hk_offer_shares_initial | international_offer_shares_initial | public_offer_pct_initial | over_allotment_offer_shares | offer_size_adjustment_offer_shares | market_cap_hkd_m | gross_proceeds_hkd_m | net_proceeds_hkd_m | issued_shares_upon_listing | data_as_of |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 06106 | 06106_prospectus_candidate_2026_06_15 | 2026-06-15 | 101.6 | 50 | 5131.24 | 10497300 | 524900 | 9972400 | 0.05 | 1574550 | 1574550 | 11226.52568 | 1066.52568 | 995.4 | 110497300 | 2026-06-15T06:15:00Z |
| 3 | 06658 | 06658_prospectus_2026_06_05 | 2026-06-05 | 43.58 | 100 | 4401.96 | 11464100 | 1146500 | 10317600 | 0.1 | | | 3434.59 | 499.6 | 440.1 | 78811208 | 2026-06-15T06:15:00Z |
| 4 | 06675 | 06675_global_offering_announcement_2026_06_09 | 2026-06-09 | 18.36 | 200 | 3709.04 | 53407000 | 5340800 | 48066200 | 0.1 | 8011000 | | 6959.2 | | 906.7 | 379041820 | 2026-06-15T06:15:00Z |
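
Several columns in this snapshot appear to be derivable from others, which gives a cheap consistency check for a filled row. The sketch below uses the 06106 row and assumes relationships inferred from the numbers themselves rather than stated anywhere in the repository: global offer shares = Hong Kong + international shares, market cap = offer price × issued shares, and gross proceeds = offer price × global offer shares. The function names and tolerance are illustrative.

```python
from decimal import Decimal

# Values copied from the 06106 row of the snapshot.
ROW_06106 = {
    "offer_price_hkd": Decimal("101.6"),
    "global_offer_shares": 10497300,
    "hk_offer_shares_initial": 524900,
    "international_offer_shares_initial": 9972400,
    "public_offer_pct_initial": Decimal("0.05"),
    "market_cap_hkd_m": Decimal("11226.52568"),
    "gross_proceeds_hkd_m": Decimal("1066.52568"),
    "issued_shares_upon_listing": 110497300,
}

MILLION = Decimal(1_000_000)


def derived_values(row):
    """Recompute the columns that are assumed to follow from other columns."""
    price = row["offer_price_hkd"]
    hk = row["hk_offer_shares_initial"]
    intl = row["international_offer_shares_initial"]
    return {
        "global_offer_shares": hk + intl,
        "public_offer_pct_initial": Decimal(hk) / Decimal(row["global_offer_shares"]),
        "market_cap_hkd_m": price * row["issued_shares_upon_listing"] / MILLION,
        "gross_proceeds_hkd_m": price * row["global_offer_shares"] / MILLION,
    }


def find_discrepancies(row, tolerance=Decimal("0.001")):
    """Return {column: recomputed value} for columns outside the tolerance."""
    issues = {}
    for name, expected in derived_values(row).items():
        if abs(Decimal(row[name]) - expected) > tolerance:
            issues[name] = expected
    return issues


print(find_discrepancies(ROW_06106))
```

Using `Decimal` rather than floats keeps the comparison exact for values such as 11226.52568, and the tolerance absorbs the rounding of `public_offer_pct_initial` to two decimal places.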