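Digitization of 臺灣植物名彙
Why
ChhoeTaigi's existing digitization works well for input methods, but it only records the Taiwanese POJ and Han characters; it does not include the preface, the index, the list of scientific names, and so on. That makes sense for something that is part of ChhoeTaigi, but I would like to try digitizing the entire contents of the book.
Acknowledgements
- 台語文記憶, for the original scans and their online publication
- ChhoeTaigi, for its digitized version
- 教育部異體字字典 (the Ministry of Education's dictionary of variant characters), for looking up Han characters
- English Wiktionary, for looking up Han characters
- 意傳台文輸入法 (a Taiwanese input method)
Editing principles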
- English hyphenation is removed.
- Uses ' as the apostrophe.
- Text will be in an AsciiDoc file first, then an HTML file later.
- The dictionary part will be well-structured YAML.
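- If the original Han character exists in Unicode, it is used as-is; if it cannot be found in Unicode, a variant that does exist in Unicode is used instead. Characters I am unsure about are replaced with ? for now.
- Syntactic misspellings (for example, n.n. is misspelled once as n..n on page 18) are ignored.
- Misspelled words are preserved under a separate <…>-orig key.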
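The last two principles suggest two simple checks over the items in an entry's names list (see the plants.yaml schema): which fields still contain the ? placeholder, and which fields carry a corrected value alongside an -orig key. The following is a minimal Python sketch, not part of the project itself. It assumes entries have already been loaded into plain dicts and that the placeholder is an ASCII ?. The function names and the sample values (including the made-up typo) are hypothetical.

```python
ORIG_SUFFIX = "-orig"   # suffix of keys that keep the book's original misspelling
PLACEHOLDER = "?"       # assumption: uncertain characters are marked with ASCII "?"


def unresolved_fields(entry):
    """Return (index, key, value) for name fields that still contain '?'."""
    hits = []
    for i, item in enumerate(entry.get("names", [])):
        for key, value in item.items():
            if isinstance(value, str) and PLACEHOLDER in value:
                hits.append((i, key, value))
    return hits


def corrections(entry):
    """Return (index, key, original, corrected) for every '<key>-orig' field."""
    found = []
    for i, item in enumerate(entry.get("names", [])):
        for key, original in item.items():
            if key.endswith(ORIG_SUFFIX):
                base = key[: -len(ORIG_SUFFIX)]
                found.append((i, base, original, item.get(base)))
    return found


# Hypothetical sample entry; the values are illustrative, not taken from the book.
sample = {
    "title": "scientific name",
    "names": [
        {"poj": "Pe̍h-ōe-jī", "han": "白話字", "han-orig": "白話宇"},
        {"poj": "", "han": "?"},
    ],
}

print(unresolved_fields(sample))
print(corrections(sample))
```

Running checks like these over the whole file would give a to-do list of characters still to be identified and a record of every correction made to the printed text.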
plants.yaml schema
- title: "scientific name"
  by: "person"
  names?:
    - romaji: "Romaji"
      kana: "ロマジ"
      n.n.?: true # Only present if the "n.n." thing exists
    - poj: "Pe̍h-ōe-jī"
      han: "白話字"
      han-orig?: "白話字" # Only present if there is an original typo
    - poj: ""
      han: ""
      hakka: true # for Hakka words
    - native: "..."
      group: "..."
  where?: "全島"
  page: 10
  family: "Polypodiaceae"
  indigenous: true # if false, the plant is cultivated or introduced
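As a rough illustration of how the schema above could be checked mechanically, here is a minimal Python sketch. It is not the project's tooling. It assumes an entry has already been loaded into a plain dict, that keys written with a trailing ? in the schema are optional, that where? belongs to the entry rather than to a single name, and that han-orig may also appear on Hakka items. The names NAME_KINDS, classify_name, and validate_entry are hypothetical.

```python
# Required and optional keys for each kind of item under "names",
# read off the schema (a trailing "?" in the schema means optional).
NAME_KINDS = {
    "japanese": ({"romaji", "kana"}, {"n.n."}),
    "taiwanese": ({"poj", "han"}, {"han-orig"}),
    "hakka": ({"poj", "han", "hakka"}, {"han-orig"}),  # han-orig here is an assumption
    "native": ({"native", "group"}, set()),
}
ENTRY_REQUIRED = {"title", "by", "page", "family", "indigenous"}
ENTRY_OPTIONAL = {"names", "where"}


def classify_name(item):
    """Guess which kind of name an item is from the keys it carries."""
    if "romaji" in item:
        return "japanese"
    if "native" in item:
        return "native"
    if "poj" in item:
        return "hakka" if item.get("hakka") else "taiwanese"
    return None


def validate_entry(entry):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    keys = set(entry)
    for key in sorted(ENTRY_REQUIRED - keys):
        problems.append(f"missing key: {key}")
    for key in sorted(keys - ENTRY_REQUIRED - ENTRY_OPTIONAL):
        problems.append(f"unknown key: {key}")
    for i, item in enumerate(entry.get("names", [])):
        kind = classify_name(item)
        if kind is None:
            problems.append(f"names[{i}]: unrecognized item")
            continue
        required, optional = NAME_KINDS[kind]
        item_keys = set(item)
        for key in sorted(required - item_keys):
            problems.append(f"names[{i}] ({kind}): missing key: {key}")
        for key in sorted(item_keys - required - optional):
            problems.append(f"names[{i}] ({kind}): unknown key: {key}")
    return problems


# The schema's own placeholder entry, written as a Python dict.
example = {
    "title": "scientific name",
    "by": "person",
    "names": [
        {"romaji": "Romaji", "kana": "ロマジ", "n.n.": True},
        {"poj": "Pe̍h-ōe-jī", "han": "白話字"},
        {"poj": "", "han": "", "hakka": True},
        {"native": "...", "group": "..."},
    ],
    "where": "全島",
    "page": 10,
    "family": "Polypodiaceae",
    "indigenous": True,
}

print(validate_entry(example))
```

Returning a list of problems instead of raising on the first one makes it easier to proofread a whole page of entries in one pass.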
Some thoughts
- The English introduction is so easy to read since it's basically the same as modern English.
- I was not expecting to learn about Taiwanese tones in POJ here, of all places.
- The way I type the old Japanese kana usage and the old kanji is quite horrendous, even if it's reasonably fast: I type the Han characters using Bopomofo as if they were Mandarin (Traditional Chinese), and the kana with the Emacs japanese-katakana input method.
- I'm not really using any OCR because I don't believe any of them can handle a not-perfectly-clear scan of a mix of 1920s Japanese, POJ, and scientific names.
Other notes
The book uses POJ for Taiwanese (and even includes an introduction to POJ and Taiwanese tones).
Each entry is:
- scientific name (the word(s) after the comma are the name of the person who published the scientific name; this convention is still alive to this day)
- Indigenous (full-face) or introduced / cultivated (italics)
- Japanese name (in Romaji)
- Taiwanese name (in POJ)
- “Kanton dialect” (actually Hakka, in POJ) (italics)
- Aboriginal name (including which people)
- Where it's found
- [category and such]