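ChhoeTaigi's existing digitization works well for input methods, but it only records the Taiwanese POJ and Han characters; it doesn't include the preface, the index, the list of scientific names, and so on. That makes sense for something that is part of ChhoeTaigi, but I wanted to try digitizing the entire book.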
- English hyphenation is removed.
- `'` is used as the apostrophe.
- Text will be in an AsciiDoc file first, then an HTML file later.
- The dictionary part will be well-structured YAML.
- Syntactic misspellings (like the "n.n." thing, which is misspelled once as "n..n" on page 18) are ignored.
- Misspelled words are preserved in a separate field (such as `han-orig`).
```yaml
- title: "scientific name"
  by: "person"
  names?:
    - romaji: "Romaji"
      kana: "ロマジ"
      n.n.?: true # Only present if the "n.n." thing exists
    - poj: "Pe̍h-ōe-jī"
      han: "白話字"
      han-orig?: "白話字" # Only present if there is an original typo
    - poj: ""
      han: ""
      hakka: true # for hakka words
    - native: "..."
      group: "..."
  where?: "全島"
  page: 10
  family: "Polypodiaceae"
  indigenous: true # if false, the plant is cultivated or introduced
```
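A minimal Python sketch of how one parsed entry could be checked against this schema. It assumes a YAML loader has already turned the file into plain dicts and lists, and that the trailing `?` in the schema only marks a key as optional (so real data would say `names`, not `names?`). It also assumes each name item is identified by carrying exactly one of `romaji`, `poj`, or `native`. `Entry`, `load_entry`, and `name_kind` are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical in-memory model of one dictionary entry.
# Assumption: a trailing "?" in the schema only marks a key as optional,
# so the real data uses plain keys such as "names" and "where".
@dataclass
class Entry:
    title: str          # scientific name
    by: str             # person who published the scientific name
    page: int
    family: str
    indigenous: bool    # False means cultivated or introduced
    names: list = field(default_factory=list)
    where: Optional[str] = None


# Keys without "?" in the schema are treated as required.
REQUIRED = ("title", "by", "page", "family", "indigenous")
# Assumption: each name item carries exactly one of these keys.
NAME_KINDS = ("romaji", "poj", "native")


def name_kind(item: dict) -> str:
    """Return which kind of name an item is, or raise if that is unclear."""
    kinds = [k for k in NAME_KINDS if k in item]
    if len(kinds) != 1:
        raise ValueError(f"ambiguous or unknown name item: {item!r}")
    return kinds[0]


def load_entry(raw: dict) -> Entry:
    """Check a mapping produced by a YAML loader and turn it into an Entry."""
    missing = [k for k in REQUIRED if k not in raw]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    names = raw.get("names") or []  # tolerate an absent or empty "names"
    for item in names:
        name_kind(item)  # raises ValueError on a malformed name item
    return Entry(
        title=raw["title"],
        by=raw["by"],
        page=int(raw["page"]),
        family=raw["family"],
        indigenous=bool(raw["indigenous"]),
        names=list(names),
        where=raw.get("where"),
    )


entry = load_entry({
    "title": "scientific name",
    "by": "person",
    "names": [
        {"romaji": "Romaji", "kana": "ロマジ"},
        {"poj": "Pe̍h-ōe-jī", "han": "白話字"},
    ],
    "where": "全島",
    "page": 10,
    "family": "Polypodiaceae",
    "indigenous": True,
})
print([name_kind(n) for n in entry.names])  # → ['romaji', 'poj']
```

Checking the required keys up front means a transcription slip (say, a forgotten `page`) fails loudly instead of silently producing an incomplete entry.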
- The English introduction is so easy to read since it's basically the same as modern English.
- I was not expecting to learn about Taiwanese tones in POJ here, of all places.
- The way I type the old Japanese kana usage and the old kanji is quite horrendous, even if it's reasonably fast: I type the Han characters with Bopomofo as if they were Mandarin (Traditional Chinese), and kana with the Emacs
- I'm not using any OCR, because I don't believe any OCR engine can handle a not-perfectly-clear scan of a mix of 1920s Japanese, POJ, and scientific names.
The book uses POJ for Taiwanese (and even includes an introduction to POJ and Taiwanese tones).
Each entry consists of:
- scientific name (the word(s) after the comma are the name of the person who published the scientific name; this convention is still alive to this day)
- Indigenous (full-face) or introduced / cultivated (italics)
- Japanese name (in Romaji)
- Taiwanese name (in POJ)
- “Kanton dialect” (actually Hakka, in POJ) (italics)
- Aboriginal name (including which people it belongs to)
- Where it's found
- [category and such]
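Because the book encodes meaning in typography (italics for introduced or cultivated plants and for Hakka names), the `indigenous` and `hakka` fields are what would let an AsciiDoc version restore those distinctions. Here is a rough Python sketch that renders one entry as a single AsciiDoc line in the field order listed above, using AsciiDoc's `__text__` italic markup. The function names, the flat keyword arguments, the `; ` separator, and the choice to italicize only the scientific name (not the author) are all assumptions for illustration, not necessarily how the actual files are produced.

```python
from typing import Optional


def italic(text: str) -> str:
    """Wrap text in AsciiDoc unconstrained italic markup (__text__)."""
    return f"__{text}__"


def render_entry(title: str, by: str, indigenous: bool,
                 romaji: Optional[str] = None,
                 poj: Optional[str] = None,
                 hakka_poj: Optional[str] = None,
                 native: Optional[str] = None,
                 group: Optional[str] = None,
                 where: Optional[str] = None) -> str:
    """Render one entry as one AsciiDoc line, in the book's field order.

    Assumption: for introduced or cultivated plants only the scientific
    name is italicized, not the author; Hakka names are always italicized.
    """
    parts = [f"{title if indigenous else italic(title)}, {by}"]
    if romaji:
        parts.append(romaji)             # Japanese name
    if poj:
        parts.append(poj)                # Taiwanese name
    if hakka_poj:
        parts.append(italic(hakka_poj))  # "Kanton dialect", i.e. Hakka
    if native:
        # Aboriginal name, followed by the people it belongs to if known
        parts.append(f"{native} ({group})" if group else native)
    if where:
        parts.append(where)              # where the plant is found
    return "; ".join(parts)


print(render_entry("scientific name", "person", False,
                   romaji="Romaji", poj="Pe̍h-ōe-jī", where="全島"))
# → __scientific name__, person; Romaji; Pe̍h-ōe-jī; 全島
```

The unconstrained `__…__` form is used instead of `_…_` so the markup still works when a name is not surrounded by spaces or punctuation.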