Jisho

4 Replies ・ Started by jojii at 2021-04-19 09:51:56 UTC ・ Last reply by Kimtaro Admin at 2021-09-11 22:40:21 UTC

How does the furigana algorithm work?

I'm trying to implement a furigana algorithm that generates readings for the kanji compounds within a word, expression, or sentence. In each case I have both the kanji form and the kana reading available (I'm using JMdict as my resource). I was wondering how jisho.org does that. The resources jisho.org uses don't provide these readings directly, and guessing them by looking up each kanji is kinda inefficient, right? How is it done by jisho.org?

koute at 2021-04-23 02:31:14 UTC

I'm guessing they probably use MeCab?

https://en.wikipedia.org/wiki/MeCab

Kimtaro Admin at 2021-06-05 03:21:10 UTC

If you mean the furigana on top of headwords, then it's an algorithm that tries to pair kanji reading data from the Kanjidic2 database with the reading of the word. For furigana on example sentences, it's partly the metadata from the file linked here: http://www.edrdg.org/wiki/index.php/Sentence-Dictionary_Linking, with a fallback to data from MeCab, which @koute linked to.
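
For anyone trying something similar, here is a minimal sketch of what a pairing step like this could look like. It is not Jisho's implementation, only an illustration of the general idea: walk the word left to right, try each candidate reading of a kanji as a prefix of the remaining kana, and backtrack when a choice leads nowhere. The `READINGS` table is a tiny hand-written stand-in for data you would extract from Kanjidic2, and `align`, `is_kana`, and `to_hiragana` are hypothetical helper names. It assumes readings are already stripped of okurigana markers and converted to hiragana, and it ignores rendaku and other sound changes.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for readings extracted from Kanjidic2 (assumption: already
# normalised to hiragana, okurigana markers removed).
READINGS: Dict[str, List[str]] = {
    "食": ["しょく", "た"],
    "物": ["ぶつ", "もつ", "もの"],
    "日": ["にち", "ひ", "び", "か"],
    "曜": ["よう"],
}

# (surface character, reading) where reading is None for plain kana.
Pair = Tuple[str, Optional[str]]


def is_kana(ch: str) -> bool:
    """True for characters in the hiragana or katakana Unicode blocks."""
    return "\u3040" <= ch <= "\u30ff"


def to_hiragana(text: str) -> str:
    """Shift katakana letters down to hiragana so the two compare equal."""
    return "".join(
        chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text
    )


def align(
    word: str, reading: str, table: Dict[str, List[str]] = READINGS
) -> Optional[List[Pair]]:
    """Pair each character of `word` with a slice of `reading`.

    Returns None when no combination of candidate readings consumes the
    whole reading, so the caller can fall back to another source.
    """
    if not word:
        return [] if not reading else None
    head, rest = word[0], word[1:]
    if is_kana(head):
        # Kana in the word must match the reading literally.
        if reading and to_hiragana(head) == to_hiragana(reading[0]):
            tail = align(rest, reading[1:], table)
            if tail is not None:
                return [(head, None)] + tail
        return None
    # Kanji: try every candidate reading, backtracking on failure.
    for cand in table.get(head, []):
        if reading.startswith(cand):
            tail = align(rest, reading[len(cand):], table)
            if tail is not None:
                return [(head, cand)] + tail
    return None


print(align("食べ物", "たべもの"))
print(align("日曜日", "にちようび"))
```

Words with irregular readings that do not split per kanji (今日 read きょう, for example) return `None` here, which is where a fallback such as a whole-word annotation would be needed.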

bacing at 2021-06-15 18:00:12 UTC

@Kimtaro
I've been trying to do something similar in Python and got stuck.
Is there somewhere I can read more about the "algorithm that tries to pair kanji reading data from the Kanjidic2 database with the reading of the word"?
Is it available somewhere?

Kimtaro Admin at 2021-09-11 22:40:21 UTC

@bacing The algorithm isn't very good, so I've opted not to make it public. I hope to improve it and maybe release it one day.
