mecab-dict-index '-a' option overwrites user-specified costs/ids unexpectedly #22

Open
wataradio opened this issue Oct 2, 2015 · 2 comments

Comments

@wataradio

mecab-dict-index '-a' option overwrites user-specified costs/ids unexpectedly.

The expected behavior of the '-a' option is that blank fields are filled in automatically, while user-specified fields are kept as they are.

Below is an example.

  1. Prepare a foo.csv file for building a user dictionary, as follows.

    田町,,,3000,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ

  2. Run the following command (you need the ipadic dictionary and its model file beforehand).

    mecab-dict-index -m mecab-ipadic.model -d ipadic -u foo2.csv -f euc-jp -t euc-jp -a foo.csv

  3. You then get the following output in foo2.csv.

    田町,1293,1293,8067,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ

As you can see, the user-specified cost, 3000, is overwritten with 8067.

The expected output in this case is:

    田町,1293,1293,3000,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ
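
To make the expected rule concrete, here is a minimal Python sketch of "fill in only the blank fields". It is not MeCab's code: `fill_blank_fields` and `ID_COST_COLUMNS` are hypothetical names, and the sketch assumes the usual dictionary CSV layout of surface, left-id, right-id, cost, followed by the features.

```python
# Columns holding left-id, right-id and cost in a MeCab dictionary CSV row
# (assumed layout: surface, left-id, right-id, cost, features...).
ID_COST_COLUMNS = (1, 2, 3)


def fill_blank_fields(user_row, assigned_row):
    """Return a copy of user_row with only its blank id/cost fields filled in.

    assigned_row carries the automatically assigned values (here, the values
    that '-a' wrote to foo2.csv). Fields the user already set are left alone.
    """
    merged = list(user_row)
    for col in ID_COST_COLUMNS:
        if merged[col].strip() == "":
            merged[col] = assigned_row[col]
    return merged


user_row = "田町,,,3000,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ".split(",")
assigned_row = "田町,1293,1293,8067,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ".split(",")

print(",".join(fill_blank_fields(user_row, assigned_row)))
# prints: 田町,1293,1293,3000,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ
```

The blank left-id and right-id take the assigned value 1293, and the cost stays at 3000 because it was not blank.
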
@wataradio

The MeCab version I tried is 0.996 (Windows version).

@wataradio

When a compiled user dictionary is built directly with a model file, but without the '-a' option, user-specified fields are taken into the compiled user dictionary as they are.

I expect the '-a' option to work the same way, because that is useful in the following case.

  1. you want to control the costs/ids of a limited number of records, but not those of the remaining records
  2. you want to see the automatically assigned values for the remaining records in a CSV file
  3. the generated CSV file can then be easily integrated into the system dictionary
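
Until '-a' behaves this way, one possible workaround is to post-process the generated CSV and put the user-specified values back. The sketch below is only an illustration under assumptions I have not verified against MeCab: the generated file is assumed to contain the same entries in the same order as the input, and Python's `csv` module is assumed to parse the rows the same way MeCab does. `restore_user_values` is a hypothetical helper, not part of MeCab.

```python
import csv
import io

# Assumed layout: surface, left-id, right-id, cost, features...
ID_COST_COLUMNS = {1: "left-id", 2: "right-id", 3: "cost"}


def restore_user_values(original_text, generated_text):
    """Put user-specified ids/costs back into the CSV written by '-a'.

    Assumes both texts hold the same entries in the same order. Returns the
    corrected CSV text and a list of
    (row number, field name, generated value, restored value) tuples.
    """
    original = list(csv.reader(io.StringIO(original_text)))
    generated = list(csv.reader(io.StringIO(generated_text)))
    if len(original) != len(generated):
        raise ValueError("row counts differ; cannot align entries")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    restored = []
    for row_no, (orig, gen) in enumerate(zip(original, generated), start=1):
        fixed = list(gen)
        for col, name in ID_COST_COLUMNS.items():
            user_value = orig[col].strip()
            # Only non-blank user fields that were changed are put back.
            if user_value and user_value != gen[col]:
                restored.append((row_no, name, gen[col], user_value))
                fixed[col] = user_value
        writer.writerow(fixed)
    return out.getvalue(), restored


original_text = "田町,,,3000,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ\n"
generated_text = "田町,1293,1293,8067,名詞,固有名詞,地域,一般,,,田町,タマチ,タマチ\n"

fixed_text, restored = restore_user_values(original_text, generated_text)
print(fixed_text, end="")
print(restored)
```

For real files, the texts would be read and written with the encoding passed to `-f`/`-t` (euc-jp in the command above). The `restored` list also serves as a report of which user-specified fields '-a' had overwritten.
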
