Mozc UT2 Dictionary

20171002


Mozc NEologd UT Dictionary is here. Mozc UT Dictionary (Discontinued) is here.

Second mozc-ut

I lost the disk partition that held the tools for making the mozc-ut dictionary. For mozc-ut1 I used Yahoo's and Google's "hit numbers" to sort words, but I can't do that again because they no longer provide a free search API. So I wrote mozc-ut2 from scratch: I split Wikipedia's articles into 1 million files and got hit numbers with Hyper Estraier. mozc-ut2 adds over 500,000 words.

Default entries

My big thanks go to the authors/maintainers.

Type "いんたーねっと" and press space ⇨ Internet

If you don't want to use it, run

and uncheck "Katakana to English conversion" in the "Dictionary" tab.

Optional entries

Press Caps Lock, type "dolphin" and press Tab.

If you need a dictionary for human readers, check this page.

License

I think we can redistribute hatena's yomigana-hyouki (reading and spelling) pairs, but I doubt that we can redistribute niconico's pairs. If you want to make a redistributable mozc-ut, leave #NICODIC="true" commented out in generate-dictionary.sh.

Download

https://osdn.net/users/utuhiro/pf/utuhiro/files/

Install

See Mozc's official Build Instructions. If you are using Arch Linux (tested on Antergos Linux), you can build and install the packages as follows:

Advanced: Add optional entries

Get the latest Mozc.

Choose optional entries.

If you want to use an English-Japanese dictionary, uncomment the following line.

If you want to use a niconico dictionary, uncomment the following line.
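The License section gives this line as #NICODIC="true" in generate-dictionary.sh. Instead of editing by hand, something like the following could uncomment it. This is only a sketch: enable_nicodic is a hypothetical helper name, and it assumes the line appears exactly as #NICODIC="true" at the start of a line.

```shell
# Hypothetical helper: uncomment the NICODIC line in the given script.
# Assumes the line reads exactly  #NICODIC="true"  at the start of a line.
enable_nicodic() {
    script="$1"
    # Keep a backup so the redistributable configuration can be restored.
    cp "$script" "$script.bak" &&
    sed 's/^#NICODIC="true"/NICODIC="true"/' "$script.bak" > "$script"
}
```

Run it as enable_nicodic generate-dictionary.sh. Remember that the resulting mozc-ut should then not be treated as redistributable.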

Generate mozc-ut

Advanced: Refresh hit numbers

You need 35 GB of disk space (use an SSD), and the process takes about 8 hours.
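Since the run is long, it may be worth checking the free space before starting. The following is a minimal sketch, not part of mozc-ut2: has_free_space is a hypothetical helper, and it treats the 35 GB figure as 35 GiB for simplicity.

```shell
# Hypothetical pre-flight check: succeed only if the filesystem holding
# directory $1 has at least $2 GiB available.
has_free_space() {
    dir="$1"
    need_gib="$2"
    # df -P prints POSIX-format output; -k reports 1024-byte blocks.
    # Field 4 of the second line is the available space.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $((need_gib * 1024 * 1024)) ]
}
```

For example, has_free_space . 35 || echo "not enough free space" >&2 warns when the current directory's filesystem is too small.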

This will download the latest edict/hatena/niconico/skk-jisyo files, and refresh hit numbers with the latest Japanese Wikipedia articles.

Install ruby and gcc-6.4.1.

estcmd built with gcc-7.2.0 segfaulted. I emailed the author but did not get a reply.

Install QDBM and Hyper Estraier.

I use Hyper Estraier to get hit numbers.

wget http://fallabs.com/qdbm/qdbm-1.8.78.tar.gz
tar xf qdbm-1.8.78.tar.gz
cd qdbm-1.8.78/
./configure --prefix=/usr --enable-zlib
make -j4 CC=/usr/bin/gcc-6
sudo make install
wget http://fallabs.com/hyperestraier/hyperestraier-1.4.13.tar.gz
tar xf hyperestraier-1.4.13.tar.gz
cd hyperestraier-1.4.13/
./configure --prefix=/usr --enable-zlib
make -j4 CC=/usr/bin/gcc-6
sudo make install
cd ../..

Put mozcdic-ut2 into mozc-tmp.

mkdir -p mozc-tmp
mv mozcdic-ut2-date.tar.bz2 mozc-tmp/
cd mozc-tmp/
tar xf mozcdic-ut2-date.tar.bz2

Get alt-cannadic.

Get alt-cannadic-110208.tar.bz2.

mv alt-cannadic-110208.tar.bz2 mozcdic-ut2-date/alt-cannadic/

Change SEEDVER of mecab-user-dict-seed.

Check mecab-user-dict-seed.yyyymmdd.csv.xz and change SEEDVER in neologd/generate-dictionary.sh.

cd mozcdic-ut2-date/neologd/
leafpad generate-dictionary.sh
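The value for SEEDVER is the yyyymmdd part of the seed file name. A small sketch for reading it off a file name follows; seedver_from_filename is a hypothetical helper, and the date in the comment is a made-up example rather than a real release.

```shell
# Hypothetical helper: print the yyyymmdd part of a seed file name such as
# mecab-user-dict-seed.20170925.csv.xz (the date here is only an example).
# Prints nothing if the name does not match the expected pattern.
seedver_from_filename() {
    basename "$1" |
        sed -n 's/^mecab-user-dict-seed\.\([0-9]\{8\}\)\.csv\.xz$/\1/p'
}
```

The printed value is what you would then enter as SEEDVER in neologd/generate-dictionary.sh.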

Change MOZCVER and DICVER.

cd ../
leafpad generate-dictionary.sh

Change DICVER.

cd src/
leafpad generate-release.sh
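If you prefer not to open an editor for each version change, the edits above could be scripted. The sketch below is an assumption about the file layout, not something the scripts guarantee: it supposes each variable is assigned on its own line in the form NAME="value". set_shell_var is a hypothetical helper, and the value must not contain "/" or "&".

```shell
# Hypothetical helper: rewrite the line NAME=... in a script to NAME="value".
# Assumes the assignment starts at the beginning of a line.
set_shell_var() {
    file="$1"
    name="$2"
    value="$3"
    _tmpfile=$(mktemp) || return 1
    # Write to a temporary file first, then copy the content back so the
    # original file keeps its permissions.
    sed "s/^${name}=.*/${name}=\"${value}\"/" "$file" > "$_tmpfile" &&
        cat "$_tmpfile" > "$file"
    status=$?
    rm -f "$_tmpfile"
    return "$status"
}
```

For example, set_shell_var generate-release.sh DICVER 20171002 would set DICVER to that date (an example value). Check the result with grep DICVER generate-release.sh before running the update.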

Refresh hit numbers with the latest Japanese Wikipedia articles.

sh update-dictionary.sh
