eae427d85b
Request:
- Provide a way to install or develop a PDF extraction tool for archived HK IPO documents.

Changes:
- Add requirements.txt with pypdf as the lightweight PDF text extraction dependency.
- Add scripts/extract_pdf_text.py to extract text from PDF source_refs into repo-relative data/extracted_text files.
- Add extracted text outputs and an extracted_text_manifest snapshot for the six archived HKEXnews PDFs.
- Document the extraction workflow in README.md.
- Ignore .venv and keep generated SQLite/Python transient files out of git.
- Use extracted text to verify the 06106 full prospectus, update source_refs, remove the related data gap, and fill 06106 offering terms.

Verification:
- Installed python3.14-venv system support, created a local .venv, and installed requirements.txt.
- Re-ran scripts/bootstrap_historical_data.py and scripts/extract_pdf_text.py.
- Verified extracted text paths and hashes against data/snapshots/extracted_text_manifest.csv.
- Verified SQLite integrity and snapshot row counts.
- Ran git diff --cached --check and searched durable files for machine-specific absolute paths.
2.4 KiB
| source_id | ticker | source_type | title | path_base | local_path | url | file_sha256 | source_date | archived_at | notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 06106_prospectus_candidate_2026_06_15 | 06106 | prospectus | Shanghai Seer Intelligent Technology Co., Ltd. Prospectus | repo_root | data/raw/06106/prospectus_candidate_2026-06-15.pdf | https://www1.hkexnews.hk/listedco/listconews/sehk/2026/0615/2026061500013.pdf | e8b129296563e43b7834be9d59ac41926fbaeb4f088da2c908b1f04b4151967b | 2026-06-15 | 2026-06-15T06:15:00Z | HKEXnews prospectus; verified by text extraction as a 424-page GLOBAL OFFERING document. |
| 06106_prospectus_notice_2026_06_15 | 06106 | prospectus_notice | Shanghai Seer Intelligent Technology Co., Ltd. Prospectus Notice | repo_root | data/raw/06106/prospectus_notice_2026-06-15.pdf | https://www1.hkexnews.hk/listedco/listconews/sehk/2026/0615/2026061500011.pdf | 510983deaba5614975a57c5e77d3ea83af071a24609c28cd3f89914e1649bff5 | 2026-06-15 | 2026-06-15T06:15:00Z | HKEXnews announcement containing global offering terms and timetable. |
| 06658_allotment_results_2026_06_12 | 06658 | allotment_results | Liuliumei Co., Ltd. Announcement of Allotment Results | repo_root | data/raw/06658/allotment_results_2026-06-12.pdf | https://www1.hkexnews.hk/listedco/listconews/sehk/2026/0612/2026061202100.pdf | bb305cf55cc87809ecd845ea44243c4f41fcfaa31dbf496580e2ed8fc06d54a0 | 2026-06-12 | 2026-06-15T06:15:00Z | HKEXnews allotment results. |
| 06658_prospectus_2026_06_05 | 06658 | prospectus | Liuliumei Co., Ltd. Prospectus | repo_root | data/raw/06658/prospectus_2026-06-05.pdf | https://www1.hkexnews.hk/listedco/listconews/sehk/2026/0605/2026060500023.pdf | e928dd8082e8aaf28156a46f64c98bee308d8ae4d10a9571a4531a3f9a8f0eb1 | 2026-06-05 | 2026-06-15T06:15:00Z | HKEXnews prospectus. |
| 06675_global_offering_announcement_2026_06_09 | 06675 | global_offering_announcement | SENASIC Electronics Technology Co., Ltd. Global Offering Announcement | repo_root | data/raw/06675/global_offering_announcement_2026-06-09.pdf | https://www.hkexnews.hk/listedco/listconews/sehk/2026/0609/2026060900009.pdf | a6b0c03d6b7a42cab0865aa0abf6dfa2dd80e6d16e392d73ddd3cd3839f7aeff | 2026-06-09 | 2026-06-15T06:15:00Z | HKEXnews global offering announcement. |
| 06675_prospectus_2026_06_09 | 06675 | prospectus | SENASIC Electronics Technology Co., Ltd. Prospectus | repo_root | data/raw/06675/prospectus_2026-06-09.pdf | https://www.hkexnews.hk/listedco/listconews/sehk/2026/0609/2026060900029.pdf | 0c0c634786b7e7da921dd631fa7ba696043fae4ab29cf29dcc5f9e976c53b160 | 2026-06-09 | 2026-06-15T06:15:00Z | HKEXnews prospectus. |
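
Each row above pairs a repo-relative `local_path` with a `file_sha256`, so the archive can be re-checked on any checkout. The following is a minimal sketch of such a check, not the repository's own script: the function names `sha256_of_file` and `find_hash_mismatches` are hypothetical, and it assumes the table is stored as a UTF-8 CSV with the column names shown above and that every `local_path` is relative to the repository root.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_hash_mismatches(manifest_csv: Path, repo_root: Path) -> list[tuple[str, str]]:
    """Return (source_id, problem) pairs for rows whose file is missing or altered.

    Assumes the CSV has source_id, local_path and file_sha256 columns
    (hypothetical helper, column names taken from the table above).
    """
    problems: list[tuple[str, str]] = []
    with manifest_csv.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            pdf_path = repo_root / row["local_path"]
            expected = row["file_sha256"].strip().lower()
            if not pdf_path.is_file():
                problems.append((row["source_id"], "missing file"))
            elif sha256_of_file(pdf_path) != expected:
                problems.append((row["source_id"], "sha256 mismatch"))
    return problems
```

Calling `find_hash_mismatches` with the path to this CSV and the repository root returns an empty list when every archived PDF is present and unchanged. Resolving paths against an explicit `repo_root` argument keeps the check free of machine-specific absolute paths.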