eae427d85b
Request:
- Provide a way to install or develop a PDF extraction tool for archived HK IPO documents.

Changes:
- Add requirements.txt with pypdf as the lightweight PDF text extraction dependency.
- Add scripts/extract_pdf_text.py to extract text from PDF source_refs into repo-relative data/extracted_text files.
- Add extracted text outputs and an extracted_text_manifest snapshot for the six archived HKEXnews PDFs.
- Document the extraction workflow in README.md.
- Ignore .venv and keep generated SQLite/Python transient files out of git.
- Use extracted text to verify the 06106 full prospectus, update source_refs, remove the related data gap, and fill 06106 offering terms.

Verification:
- Installed python3.14-venv system support, created a local .venv, and installed requirements.txt.
- Re-ran scripts/bootstrap_historical_data.py and scripts/extract_pdf_text.py.
- Verified extracted text paths and hashes against data/snapshots/extracted_text_manifest.csv.
- Verified SQLite integrity and snapshot row counts.
- Ran git diff --cached --check and searched durable files for machine-specific absolute paths.
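
The path-and-hash verification step could look roughly like the sketch below. It is not the repository's actual check: the manifest column names `text_path` and `sha256` and the helper names `sha256_of_file` and `verify_manifest` are assumptions made for illustration, since the real layout of data/snapshots/extracted_text_manifest.csv is not shown here.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, repo_root: Path) -> list[str]:
    """Return a list of problems; an empty list means every entry matched.

    Assumes hypothetical manifest columns `text_path` (repo-relative)
    and `sha256` (hex digest of the extracted text file).
    """
    problems = []
    with manifest_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            rel = Path(row["text_path"])
            # Durable files should not carry machine-specific absolute paths.
            if rel.is_absolute():
                problems.append(f"absolute path in manifest: {rel.as_posix()}")
                continue
            target = repo_root / rel
            if not target.is_file():
                problems.append(f"missing file: {rel.as_posix()}")
            elif sha256_of_file(target) != row["sha256"]:
                problems.append(f"hash mismatch: {rel.as_posix()}")
    return problems
```

Under those assumptions, calling `verify_manifest(Path("data/snapshots/extracted_text_manifest.csv"), Path("."))` from the repository root would return an empty list when every extracted text file is present and unchanged.
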
| ticker | company_name_en | company_name_zh | stock_short_name | exchange | board | status | listing_date | application_start_date | application_end_date | allotment_results_expected_date | industry_label | data_as_of | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 06106 | Shanghai Seer Intelligent Technology Co., Ltd. | 上海仙工智能科技股份有限公司 | | HKEX | Main Board | open_for_subscription | 2026-06-24 | 2026-06-15 | 2026-06-18 | 2026-06-23 | Industrial intelligent robots / robot controllers | 2026-06-15T06:15:00Z | Seeded from HKEXnews global offering announcement; full prospectus source classification needs follow-up. |
| 06658 | Liuliumei Co., Ltd. | 溜溜梅股份有限公司 | LIULIUMEI | HKEX | Main Board | listed | 2026-06-15 | 2026-06-05 | 2026-06-10 | 2026-06-12 | Snack food / preserved fruit | 2026-06-15T06:15:00Z | Seeded from HKEXnews prospectus and allotment results. |
| 06675 | SENASIC Electronics Technology Co., Ltd. | 琻捷電子科技(江蘇)股份有限公司 | | HKEX | Main Board | pending_listing | 2026-06-17 | 2026-06-09 | 2026-06-12 | 2026-06-16 | Automotive wireless sensing SoC / semiconductors | 2026-06-15T06:15:00Z | Seeded from HKEXnews prospectus and global offering announcement; allotment results not yet archived. |