Digitizing 臺灣植物名彙

Why

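ChhoeTaigi's existing digitization works well for input methods, but it only records the Taiwanese Pe̍h-ōe-jī and Han characters. It leaves out the introduction, the index, the list of scientific names, and so on. That makes sense for something that is part of ChhoeTaigi, but I wanted to try digitizing the entire book.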

Acknowledgements

Editing principles

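  • Hyphenation at line breaks in the English text is removed.
  • The ASCII ' is used as the apostrophe.
  • The text goes into an AsciiDoc file first, and into an HTML file later.
  • The dictionary part will be well-structured YAML.
  • Han characters that exist in Unicode are kept as printed; characters missing from Unicode are replaced with a variant that Unicode does have. Characters I'm unsure about are temporarily written as ?.
  • Syntactic misspellings (e.g. "n.n." is misprinted once as "n..n" on page 18) are fixed silently, without an -orig key.
  • Misspelled words are corrected, and the original spelling is kept in a matching <…>-orig key (such as han-orig).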
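If the English text is typed or pasted following the printed line breaks, the hyphenation and apostrophe principles can be applied mechanically. The sketch below assumes that setup; `normalize_english` and `LINE_BREAK_HYPHEN` are hypothetical names, not part of the project. It treats any lowercase letter, hyphen, line break, lowercase letter sequence as a word split by the printer, so a genuine compound that happens to break at a line end would lose its hyphen and still needs a check by eye.

```python
import re

# Hypothetical helper: only the right single quotation mark (U+2019), the
# usual "curly apostrophe", is mapped to the ASCII apostrophe.
APOSTROPHES = str.maketrans({"\u2019": "'"})

# Lowercase letter + hyphen + line break + lowercase letter is assumed to be
# a word the printer split across two lines.
LINE_BREAK_HYPHEN = re.compile(r"([a-z])-\n([a-z])")


def normalize_english(text: str) -> str:
    """Use ASCII apostrophes and rejoin words split at line breaks."""
    text = text.translate(APOSTROPHES)
    # Drop the hyphen and the line break, keeping the letters on both sides.
    return LINE_BREAK_HYPHEN.sub(r"\1\2", text)
```

For example, `normalize_english("the plant\u2019s leaves are hetero-\ngeneous")` gives `"the plant's leaves are heterogeneous"`.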

plants.yaml schema (keys ending in ? are optional)

- title: "scientific name"
  by: "person"
  names?:
    - romaji: "Romaji"
      kana: "ロマジ"
      n.n.?: true # Only present if the name is marked "n.n."
    - poj: "Pe̍h-ōe-jī"
      han: "白話字"
      han-orig?: "白話字" # Only present if there is an original typo
    - poj: ""
      han: ""
      hakka: true # Marks a Hakka name (the book's "Kanton dialect")
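    - native: "..." # Aboriginal name
      group: "..." # Which people the name comes from
  where?: "全島" # Where it's found, e.g. 全島 ("the whole island")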
  page: 10
  family: "Polypodiaceae"
  indigenous: true # if false, the plant is cultivated or introduced
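Once the YAML is loaded into plain dicts and lists, a few of the rules above can be checked automatically. The following is a minimal sketch under that assumption (the loader itself is not shown); `validate_entry`, `REQUIRED_KEYS`, and `NAME_SHAPES` are hypothetical names. It checks required keys, recognizes each kind of name by its leading key, flags the `?` placeholder for uncertain characters, and flags a `han-orig` that is identical to `han`, since `han-orig` is supposed to exist only when the printed text had a typo.

```python
from typing import Any

# Keys without "?" in the schema are treated as required.
REQUIRED_KEYS = {"title", "by", "page", "family", "indigenous"}

# Each kind of name is recognised by one key and needs the keys listed with it.
NAME_SHAPES = {
    "romaji": {"kana"},   # Japanese name
    "poj": {"han"},       # Taiwanese name, or Hakka with hakka: true
    "native": {"group"},  # Aboriginal name and which people use it
}


def validate_entry(entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems with one entry; empty means OK."""
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if "page" in entry and not isinstance(entry["page"], int):
        problems.append("page should be an integer")
    for i, name in enumerate(entry.get("names", [])):
        kind = next((k for k in NAME_SHAPES if k in name), None)
        if kind is None:
            problems.append(f"names[{i}]: unknown name shape {sorted(name)}")
            continue
        for key in sorted(NAME_SHAPES[kind] - set(name)):
            problems.append(f"names[{i}]: {kind} name missing {key!r}")
        # "?" stands in for a character that has not been identified yet.
        if "?" in name.get("han", ""):
            problems.append(f"names[{i}]: han has an uncertain character")
        # han-orig should only exist when the printed text had a typo.
        if "han-orig" in name and name["han-orig"] == name.get("han"):
            problems.append(f"names[{i}]: han-orig is the same as han")
    return problems
```

Returning a list of messages instead of raising on the first error lets one pass over plants.yaml report every problem at once.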

Some thoughts

  • The English introduction is so easy to read since it's basically the same as modern English.
  • I was not expecting to learn about Taiwanese tones in POJ here, of all places.
  • The way I type the old-style kana usage and the old kanji is quite horrendous, even if it's reasonably fast: I type the Han characters with Bopomofo as if they were Mandarin (Traditional Chinese), and the kana with the Emacs japanese-katakana input method.
  • I'm not really using any OCR, because I don't believe any OCR tool can handle a less-than-perfectly-clear scan that mixes 1920s Japanese, POJ, and scientific names.

Other notes

The book uses POJ for Taiwanese (and even includes an introduction to POJ and Taiwanese tones).
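As a small illustration of how POJ encodes tones, the sketch below reads a tone number off a syllable's diacritic. It follows the standard POJ marks (acute 2, grave 3, circumflex 5, macron 7, vertical line above 8; unmarked syllables are tone 4 if they end in p/t/k/h, else tone 1), which may not match the book's own presentation exactly; tone 6 is left out. `poj_tone` and `poj_tones` are hypothetical helper names.

```python
import unicodedata

# Standard POJ tone diacritics as Unicode combining marks -> tone number.
TONE_MARKS = {
    "\u0301": 2,  # acute accent: á
    "\u0300": 3,  # grave accent: à
    "\u0302": 5,  # circumflex: â
    "\u0304": 7,  # macron: ā
    "\u030d": 8,  # vertical line above: a̍ (syllables ending in p/t/k/h)
}
CHECKED_FINALS = ("p", "t", "k", "h")


def poj_tone(syllable: str) -> int:
    """Return the tone number (1-5, 7 or 8) of one POJ syllable."""
    # NFD splits precomposed letters such as "á" into "a" + combining mark.
    decomposed = unicodedata.normalize("NFD", syllable)
    for ch in decomposed:
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    # Unmarked syllable: drop other combining marks (e.g. the dot in o͘)
    # and a trailing nasal ⁿ (U+207F), then look at the final letter.
    letters = "".join(c for c in decomposed if not unicodedata.combining(c))
    if letters.lower().rstrip("\u207f").endswith(CHECKED_FINALS):
        return 4
    return 1


def poj_tones(word: str) -> list[int]:
    """Tone numbers for each syllable of a hyphenated POJ word."""
    return [poj_tone(s) for s in word.split("-") if s]
```

For example, `poj_tones("Pe̍h-ōe-jī")` returns `[8, 7, 7]`.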

Each entry consists of:

  • scientific name (the word(s) after the comma name the person who published the scientific name; this convention is still in use today)
  • Indigenous (full-face) or introduced / cultivated (italics)
  • Japanese name (in Romaji)
  • Taiwanese name (in POJ)
  • “Kanton dialect” (actually Hakka, in POJ) (italics)
  • Aboriginal name (including which people)
  • Where it's found
  • [category and such]
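The first line of an entry maps fairly directly onto the title and by keys of plants.yaml. The sketch below assumes the line is transcribed as "Scientific name, Author" and splits at the first comma, on the assumption that the scientific name itself never contains one. Whether a plant is indigenous is shown only by the typeface (upright vs italics), which plain text cannot carry, so the transcriber passes it in. `Header`, `parse_header`, and `entry_skeleton` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Header:
    title: str  # scientific name
    by: str     # who published the name ("" if none is given)


def parse_header(line: str) -> Header:
    """Split an entry's first line at its first comma.

    Assumes the scientific name contains no comma, so everything after the
    first comma is the author.
    """
    name, _sep, author = line.partition(",")
    return Header(title=name.strip(), by=author.strip())


def entry_skeleton(line: str, *, page: int, family: str, indigenous: bool) -> dict:
    """Start a plants.yaml entry; names and where are filled in by hand."""
    header = parse_header(line)
    return {
        "title": header.title,
        "by": header.by,
        "page": page,
        "family": family,
        # Typeface is lost in plain text, so this comes from the transcriber.
        "indigenous": indigenous,
    }
```

For example, `parse_header("Genus species, Author")` gives `Header(title="Genus species", by="Author")`.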