Jisho.org: Japanese Dictionary

4 Replies ・ Started by jojii at 2021-04-19 09:51:56 UTC ・ Last reply by Kimtaro Admin at 2021-09-11 22:40:21 UTC

How does the furigana algorithm work?

I'm trying to implement a furigana algorithm that generates readings for kanji compounds within a word, expression, or sentence. In each case, I have the kanji and the kana reading available (I'm using JMdict as my resource). I was wondering how jisho.org does that. The resources jisho.org uses don't provide the readings directly, and guessing them by looking up each kanji is kinda inefficient, right? How is it done by jisho.org?

koute at 2021-04-23 02:31:14 UTC

I'm guessing they probably use Mecab?

https://en.wikipedia.org/wiki/MeCab

Kimtaro Admin at 2021-06-05 03:21:10 UTC

If you mean the furigana on top of headwords, it's an algorithm that tries to pair kanji reading data from the Kanjidic2 database with the reading of the word. For furigana on example sentences, it's partly the metadata from the file linked here: http://www.edrdg.org/wiki/index.php/Sentence-Dictionary_Linking, with a fallback to data from MeCab, which @koute linked to.
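
For anyone who wants to experiment with the headword case, here is a minimal Python sketch of the general idea of pairing per-kanji readings with a word's kana reading. It is not Jisho's code, which has not been published; it is just one plausible way to do it, using backtracking. The `KANJI_READINGS` table, `is_kanji`, and `align` are hypothetical names, and the table is a tiny hand-written sample standing in for readings you would load from Kanjidic2.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hand-written sample of per-kanji readings in hiragana.
# A real table would be built from Kanjidic2; these few entries are
# only here to make the sketch runnable.
KANJI_READINGS: Dict[str, List[str]] = {
    "日": ["にち", "じつ", "ひ", "び", "か"],
    "本": ["ほん", "もと"],
    "食": ["しょく", "た", "く"],
    "読": ["どく", "とく", "よ"],
    "書": ["しょ", "か"],
}


def is_kanji(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def align(word: str, reading: str) -> Optional[List[Tuple[str, str]]]:
    """Split `reading` across the characters of `word`.

    Returns a list of (character, reading) pairs, or None when no
    combination of known readings consumes the whole reading.
    """
    if not word:
        # Success only if the reading is used up at the same time.
        return [] if not reading else None

    head, rest = word[0], word[1:]

    if not is_kanji(head):
        # Assumption: kana in the word appears verbatim in the reading.
        if reading.startswith(head):
            tail = align(rest, reading[len(head):])
            if tail is not None:
                return [(head, head)] + tail
        return None

    # Try each known reading of this kanji; fall through to the next
    # candidate (backtrack) if the rest of the word cannot be aligned.
    for candidate in KANJI_READINGS.get(head, []):
        if reading.startswith(candidate):
            tail = align(rest, reading[len(candidate):])
            if tail is not None:
                return [(head, candidate)] + tail
    return None
```

With the sample table, `align("食べる", "たべる")` returns `[("食", "た"), ("べ", "べ"), ("る", "る")]` and `align("読書", "どくしょ")` returns `[("読", "どく"), ("書", "しょ")]`. A plain prefix match like this does not handle sound changes such as rendaku or gemination, and it cannot split readings that belong to a whole compound (for example 今日 read as きょう), so a practical version would need extra rules or a fallback that attaches the reading to the compound as a whole.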

bacing at 2021-06-15 18:00:12 UTC

@Kimtaro
I've been trying to do something similar in Python and got stuck.
Is there somewhere I can read more about the "algorithm that tries to pair kanji reading data from the Kanjidic2 database with the reading of the word"?
Is it available somewhere?

Kimtaro Admin at 2021-09-11 22:40:21 UTC

@bacing The algorithm isn't very good, so I've opted not to make it public. I hope to improve it and maybe release it one day.
