Jisho

3 Replies ・ Started by Chris10 at 2024-08-15 15:38:56 UTC ・ Last reply by Chris10 at 2024-08-21 10:57:01 UTC

Full word list in Word or Excel

Hello!
I'm trying to get a full list of all the words in Jisho for personal Japanese study, preferably in Word or Excel format, or in a format that can easily be converted to ".docx" or similar, but I don't know how to get this list.

Looking through the "About" section of Jisho, I understood that Jisho is mostly composed of words from JMDict, which seems to be available here:

http://www.edrdg.org/wiki/index.php/JMdict-EDICT_Dictionary_Project

However, the only format in which I could download JMDict is ".gz", which I couldn't open with any program except Notepad. So now I have that ".gz" file saved as ".txt", but it is a very large file, and when I try to convert it to a format like ".doc" or ".docx", my computer freezes.

So, to sum up: could you please guide me on how to get a list of all the words in Jisho in Word or Excel format?

Thank you all in advance for any help you could offer me.

Chris10 at 2024-08-15 16:09:10 UTC

In addition to what I said before, I also can't open that ".txt" file directly by right-clicking > Open with > Word, because an error message appears pointing to a problem in the file's content. For that reason, I tried to convert the ".txt" file to ".doc" or ".docx" format, but that's when my computer freezes and I have to force a shutdown.

If you could please help me somehow, I would appreciate it a lot.

flayxis at 2024-08-16 14:41:01 UTC

If a file name ends in ".gz", that's just an indication that the actual contents of the file have been compressed with gzip, one of the most ubiquitous compression formats in computing. It doesn't tell you what format the file itself is in after you have decompressed it. (As for decompression itself, there are many implementations available for different platforms. If you're using Windows, I've heard that 7-Zip is a popular program that supports a wide range of compression formats. Other platforms like Linux or macOS usually come with tools for that task built in, without the need to install any new software.)
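If you're comfortable running a short Python script, the standard library can also do the decompression without any extra software. This is only a minimal sketch; the file names used in the usage line below (`JMdict_e.gz`, `JMdict_e.xml`) are placeholders for whatever you actually downloaded and want to write.

```python
import gzip
import shutil


def decompress_gz(src_path: str, dst_path: str) -> None:
    """Decompress the gzip file at src_path into dst_path.

    shutil.copyfileobj copies in chunks, so even a large file is never
    read into memory all at once.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Usage (placeholder names): `decompress_gz("JMdict_e.gz", "JMdict_e.xml")`. The result is the plain XML text, not something Word is designed to display nicely.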

The format of the JMdict dictionary file is described accurately on the page you linked (in the section "Format"). It says that it is an XML file whose structure is further described by a DTD (http://www.edrdg.org/jmdict/jmdict_dtd_h.html). If you're not familiar with working with XML files, it's probably a bit daunting to get into.

XML files are just plain text (what you call ".txt"), but even if you use a reader program that doesn't give up on large files (on a side note: that's sloppy programming by Microsoft if their Notepad and Word programs fail on a bit of text), the XML format is mainly meant to be easy to process programmatically and doesn't necessarily lead to nice formatting for human consumption. After all, it's supposed to be very versatile and usable in different contexts, like creating apps, websites and all kinds of other tools. Your use case, "I want a list of every single entry as a Word document", is admittedly rare (after all, there are well over 200,000 entries in JMdict alone).
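To give an idea of what "processing it programmatically" can look like, here is a minimal sketch using only Python's standard library. It streams through the decompressed file with `xml.etree.ElementTree.iterparse`, so the whole file never has to sit in memory, and pulls out each entry's sequence number (`ent_seq`), kanji forms (`keb`), readings (`reb`) and glosses. The element names come from the DTD linked above; the function name `iter_entries` is just illustrative, and this is a starting point rather than a tested tool. Glosses marked with an `xml:lang` other than English are skipped, because the full JMdict file also carries glosses in other languages.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List, Tuple, Union

# ElementTree exposes the xml:lang attribute under its full namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

Entry = Tuple[str, List[str], List[str], List[str]]


def iter_entries(source: Union[str, IO[bytes]]) -> Iterator[Entry]:
    """Yield (ent_seq, kanji forms, readings, English glosses) per <entry>.

    iterparse reads the file incrementally instead of loading the whole
    document at once.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "entry":
            continue
        ent_seq = elem.findtext("ent_seq", default="")
        kanji = [k.text or "" for k in elem.iterfind("k_ele/keb")]
        readings = [r.text or "" for r in elem.iterfind("r_ele/reb")]
        # Glosses without an xml:lang attribute are treated as English;
        # glosses in other languages are skipped.
        glosses = [
            g.text or ""
            for g in elem.iterfind("sense/gloss")
            if g.get(XML_LANG, "eng") == "eng"
        ]
        yield ent_seq, kanji, readings, glosses
        # Drop the finished entry's children to keep memory use low.
        elem.clear()
```

Usage (placeholder file name): `for ent_seq, kanji, readings, glosses in iter_entries("JMdict_e.xml"): print(kanji, readings, glosses)`.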

If you still want to give it a try, I suspect there are programs available for your platform that let you open big XML files in a GUI and make the necessary modifications yourself relatively painlessly, without learning any programming language. A few search engine hits:
* http://www.firstobject.com/dn_editor.htm
* https://xml-copy-editor.sourceforge.io
* https://symbolclick.com
(not a full list, and I haven't tested any of these.)
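If you'd rather not rely on an XML GUI, another route is a CSV file, which Excel opens directly. An Excel worksheet holds up to 1,048,576 rows, so one row per entry for JMdict's 200,000-odd entries fits. The sketch below assumes the entries have already been extracted into (ID, kanji forms, readings, meanings) tuples by some parsing step that isn't shown here; `write_csv` and the column names are illustrative choices, not anything JMdict defines.

```python
import csv
from typing import Iterable, List, Tuple

Row = Tuple[str, List[str], List[str], List[str]]


def write_csv(rows: Iterable[Row], out_path: str) -> int:
    """Write one line per entry: id, kanji forms, readings, meanings.

    Multiple values are joined with "; " so each entry stays on one row.
    Returns the number of entry rows written (header excluded).
    """
    count = 0
    # "utf-8-sig" writes a byte-order mark, which helps Excel recognise
    # the file as UTF-8 so the Japanese text isn't garbled.
    with open(out_path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "kanji", "reading", "meaning"])
        for ent_seq, kanji, readings, glosses in rows:
            writer.writerow(
                [ent_seq, "; ".join(kanji), "; ".join(readings), "; ".join(glosses)]
            )
            count += 1
    return count
```

Once the CSV is open in Excel, it can be saved as ".xlsx", and a filtered subset can be copied into Word if needed.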

Whatever you do, though, Word will probably still have trouble with any big file. That's just a limitation of that software, I suppose.
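One possible workaround, if a Word document really is the goal, is to split the list into several smaller plain-text files and open them one at a time. Here's a rough sketch that does this for any sequence of text lines (for example, one line per entry). The function name, the 5,000-lines-per-file default and the `jmdict_001.txt` naming are arbitrary placeholders, not a documented Word limit.

```python
import os
from typing import Iterable, List


def write_chunks(
    lines: Iterable[str], out_dir: str, prefix: str = "jmdict", size: int = 5000
) -> List[str]:
    """Write lines into numbered UTF-8 text files of at most `size` lines each.

    Returns the paths of the files written, in order.
    """
    if size < 1:
        raise ValueError("size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    buffer: List[str] = []

    def flush() -> None:
        # Number files from 001 upwards based on how many exist so far.
        path = os.path.join(out_dir, f"{prefix}_{len(paths) + 1:03d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(buffer) + "\n")
        paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line)
        if len(buffer) == size:
            flush()
    if buffer:  # write any leftover lines as a final, shorter file
        flush()
    return paths
```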

Chris10 at 2024-08-21 10:57:01 UTC

@flayxis

Thank you very much for your answer. I really appreciate your help. :)

Thanks to your explanation, I now have a much clearer idea of what I'm working with and what steps to follow to deal with this kind of file. It has been very insightful and clarifying for me. Thanks a lot.
