Issue running demos on Ubuntu 14 #14

Open
olivermueller opened this issue May 2, 2014 · 5 comments

@olivermueller

Hi there,

I get the following error message on all demos:

Importing a file into MALLET: [data/demo/20newsgroups/corpus/corpus.txt] --> [data/demo/20newsgroups/model-mallet/corpus.mallet]
Traceback (most recent call last):
File "bin/train_mallet.py", line 42, in <module>
main()
File "bin/train_mallet.py", line 39, in main
TrainMallet( args.corpus_path, args.model_path, args.token_regex, args.topics, args.iters, args.quiet, args.overwrite )
File "bin/train_mallet.py", line 25, in TrainMallet
BuildLDA( corpus_filename, model_path, tokenRegex = token_regex, numTopics = num_topics, numIters = num_iters )
File "/home/oliver/Desktop/termite-data-server-master/bin/modellers/MalletLDA.py", line 31, in __init__
importer.ImportFileOrFolder( tokenRegex )
File "/home/oliver/Desktop/termite-data-server-master/bin/modellers/MalletLDA.py", line 76, in ImportFileOrFolder
self.Shell( command )
File "/home/oliver/Desktop/termite-data-server-master/bin/modellers/MalletLDA.py", line 44, in Shell
p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

Best,
Oliver

@jcchuang
Member

jcchuang commented May 3, 2014

The file corpus.txt should have been extracted automatically from corpus.db when you downloaded the dataset.

Run the following command to generate the file:
bin/export_corpus.py data/demo/20newsgroups/corpus data/demo/20newsgroups/corpus/corpus.txt

You might also want to remove the following folder, so that you have a clean start when you train an LDA model with MALLET.
rm -rf data/demo/20newsgroups/model-mallet

Then, try running the following again.
./demo.py 20newsgroups mallet
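
The cleanup part of these steps can also be scripted. The following is only a minimal sketch, not part of the repository: the function name `prepare_clean_run` is hypothetical, and it assumes the dataset layout shown in the paths above (a `corpus/corpus.txt` file and a `model-mallet` folder under the dataset folder). It removes the stale model folder and reports whether corpus.txt exists and is non-empty.

```python
import os
import shutil


def prepare_clean_run(dataset_dir, model_name="model-mallet"):
    """Remove a stale model folder and check that corpus.txt is usable.

    Hypothetical helper. Returns True when corpus.txt exists and is non-empty.
    """
    corpus_txt = os.path.join(dataset_dir, "corpus", "corpus.txt")
    model_dir = os.path.join(dataset_dir, model_name)

    # Same effect as: rm -rf data/demo/20newsgroups/model-mallet
    if os.path.isdir(model_dir):
        shutil.rmtree(model_dir)

    # The text export must exist and contain data before it can be imported.
    return os.path.isfile(corpus_txt) and os.path.getsize(corpus_txt) > 0


if __name__ == "__main__":
    if prepare_clean_run("data/demo/20newsgroups"):
        print("corpus.txt is ready; rerun ./demo.py 20newsgroups mallet")
    else:
        print("corpus.txt is missing or empty; run bin/export_corpus.py first")
```

If the check fails, regenerate the file with the export_corpus.py command above before rerunning the demo.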

@olivermueller
Author

The file corpus.txt is being created. Here is a longer extract of the error message:

Copying [data/demo/infovis/corpus/corpus.db] --> [apps/temp_20140503_190729_997992_3269/data/corpus.db]
Copying [data/demo/infovis/corpus/corpus.txt] --> [apps/temp_20140503_190729_997992_3269/data/corpus.txt]
Extracting [data/demo/infovis/corpus/corpus.txt] --> [apps/temp_20140503_190729_997992_3269/data/sentences.txt]
An error occured while creating app: infovis_mallet [apps/infovis_mallet]
Traceback (most recent call last):
File "bin/read_mallet.py", line 87, in <module>
main()
File "bin/read_mallet.py", line 84, in main
ImportMalletLDA( args.app_name, args.model_path, args.corpus_path, args.database_path, args.quiet, args.overwrite )
File "bin/read_mallet.py", line 50, in ImportMalletLDA
SplitSentences( corpus_filename, app_sentences_filename )
File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 14, in __init__
self.Shell( command )
File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 17, in Shell
p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

The error occurs for mallet and gensim.

@jcchuang
Member

jcchuang commented May 6, 2014

There must be other issues that cause these files to be missing. Regenerating these files doesn't fix the root problem. Could you remove the data/demo/infovis and apps/infovis_mallet folders, and run "./demo.py infovis mallet"? What is the full console output?
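
One way to narrow this down: when subprocess.Popen raises "OSError: [Errno 2] No such file or directory", it commonly means that the program it was asked to run could not be found, rather than one of the data files passed to it. The sketch below is an illustration only and is not code from this repository. The helper names `find_on_path` and `run_command` are hypothetical, and checking for `java` and `curl` is an assumption based on MALLET and CoreNLP being Java tools and the dataset being downloaded with curl.

```python
import errno
import os
import subprocess


def find_on_path(program):
    """Return the path of an executable `program`, or None if not found."""
    if os.path.dirname(program):
        # A relative or absolute path is checked directly, not via PATH.
        ok = os.path.isfile(program) and os.access(program, os.X_OK)
        return program if ok else None
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def run_command(command):
    """Run `command` (a list) and name the program if it cannot be found."""
    try:
        p = subprocess.Popen(command, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # Errno 2 here refers to the executable in command[0].
            raise RuntimeError("Program not found: %r" % command[0])
        raise
    output, _ = p.communicate()
    return p.returncode, output


if __name__ == "__main__":
    # Assumed dependencies; adjust the names to whatever the scripts invoke.
    for name in ("java", "curl"):
        print("%s -> %s" % (name, find_on_path(name)))
```

Printing the `command` list just before the Popen call in MalletLDA.py or SplitSentences.py would show exactly which program is being launched.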

jcchuang closed this as completed May 6, 2014
jcchuang reopened this May 6, 2014
@olivermueller
Author

Here is the full console output:

oliver@ubuntu:~$ cd termite-data-server-master/
oliver@ubuntu:~/termite-data-server-master$ sudo python demo.py 20newsgroups mallet
[sudo] password for oliver:

Build a topic model (mallet) using a demo dataset (20newsgroups)
database = data/demo/20newsgroups/corpus
corpus = data/demo/20newsgroups/corpus
model = data/demo/20newsgroups/model-mallet
app = 20newsgroups_mallet

Setting up the 20newsgroups dataset...
Creating folder 'data/demo/20newsgroups'...
Downloading...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.1M  100 38.1M    0     0   279k      0  0:02:19  0:02:19 --:--:--  291k
Uncompressing...
Extracting corpus.txt from corpus.db...
Exporting database [data/demo/20newsgroups/corpus/corpus.db] to file [data/demo/20newsgroups/corpus/corpus.txt]
Corpus available: data/demo/20newsgroups/corpus
Available: tools/mallet-2.0.7
Available: tools/mallet-2.0.7
Available: tools/corenlp-3.3.1

Training an LDA topic model using MALLET...
corpus = data/demo/20newsgroups/corpus/corpus.txt
model = data/demo/20newsgroups/model-mallet
token_regex = \w{3,}
topics = 20
iters = 1000

Importing a file into MALLET: [data/demo/20newsgroups/corpus/corpus.txt] --> [data/demo/20newsgroups/model-mallet/corpus.mallet]
Traceback (most recent call last):
File "bin/train_mallet.py", line 42, in <module>
main()
File "bin/train_mallet.py", line 39, in main
TrainMallet( args.corpus_path, args.model_path, args.token_regex, args.topics, args.iters, args.quiet, args.overwrite )
File "bin/train_mallet.py", line 25, in TrainMallet
BuildLDA( corpus_filename, model_path, tokenRegex = token_regex, numTopics = num_topics, numIters = num_iters )
File "/home/oliver/termite-data-server-master/bin/modellers/MalletLDA.py", line 31, in __init__
importer.ImportFileOrFolder( tokenRegex )
File "/home/oliver/termite-data-server-master/bin/modellers/MalletLDA.py", line 76, in ImportFileOrFolder
self.Shell( command )
File "/home/oliver/termite-data-server-master/bin/modellers/MalletLDA.py", line 44, in Shell
p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

Import a MALLET LDA topic model as a web2py application...
app_name = 20newsgroups_mallet
app_path = apps/20newsgroups_mallet
model_path = data/demo/20newsgroups/model-mallet
corpus_filename = data/demo/20newsgroups/corpus/corpus.txt
database_filename = data/demo/20newsgroups/corpus/corpus.db

Creating app: 20newsgroups_mallet [apps/temp_20140518_102959_056298_2262]
Creating folder: [apps/temp_20140518_102959_056298_2262/data]
Creating folder: [apps/temp_20140518_102959_056298_2262/databases]
Linking folder: [apps/temp_20140518_102959_056298_2262/models]
Linking folder: [apps/temp_20140518_102959_056298_2262/views]
Linking folder: [apps/temp_20140518_102959_056298_2262/controllers]
Linking folder: [apps/temp_20140518_102959_056298_2262/static]
Linking folder: [apps/temp_20140518_102959_056298_2262/modules]
Creating file: [apps/temp_20140518_102959_056298_2262/__init__.py]
Copying [data/demo/20newsgroups/corpus/corpus.db] --> [apps/temp_20140518_102959_056298_2262/data/corpus.db]
Copying [data/demo/20newsgroups/corpus/corpus.txt] --> [apps/temp_20140518_102959_056298_2262/data/corpus.txt]
Extracting [data/demo/20newsgroups/corpus/corpus.txt] --> [apps/temp_20140518_102959_056298_2262/data/sentences.txt]
An error occured while creating app: 20newsgroups_mallet [apps/20newsgroups_mallet]
Traceback (most recent call last):
File "bin/read_mallet.py", line 87, in <module>
main()
File "bin/read_mallet.py", line 84, in main
ImportMalletLDA( args.app_name, args.model_path, args.corpus_path, args.database_path, args.quiet, args.overwrite )
File "bin/read_mallet.py", line 50, in ImportMalletLDA
SplitSentences( corpus_filename, app_sentences_filename )
File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 14, in __init__
self.Shell( command )
File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 17, in Shell
p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
oliver@ubuntu:~/termite-data-server-master$

@jsbarry

jsbarry commented Jun 5, 2014

Ran into some of the same problems here.

  1. Make sure curl is installed on your distro (e.g. sudo apt-get install curl). If it isn't, the following line in fetch_dataset.sh cannot execute:

curl --insecure --location http://homes.cs.washington.edu/~jcchuang/termite-datasets/$DEMO.zip > $DOWNLOAD_PATH/$DEMO.zip

  2. Make sure MALLET, CoreNLP and gensim have been fully downloaded into utils/tools. CoreNLP is quite large, so it may take time.
  3. Check that the demo.zip actually opens and that you can see its contents. That was part of the problem I ran into: the db wasn't being unzipped, so the .txt file couldn't be created.
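
The checks in this list can be scripted roughly as follows. This is a sketch under assumptions, not code from the repository: the helper names are hypothetical, the tool folder names `mallet-2.0.7` and `corenlp-3.3.1` are taken from the console output earlier in this thread, and the tools folder and zip path in the usage section are placeholders to adjust to your checkout.

```python
import os
import zipfile


def has_program(name):
    """Return True if an executable called `name` is found on PATH."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False


def missing_tools(tools_dir, folder_names):
    """Return the tool folders that do not exist under `tools_dir`."""
    return [name for name in folder_names
            if not os.path.isdir(os.path.join(tools_dir, name))]


def zip_problem(zip_path):
    """Return a description of what is wrong with the archive, or None."""
    if not zipfile.is_zipfile(zip_path):
        return "not a readable zip file: " + zip_path
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the first corrupt member name, or None.
        bad_member = archive.testzip()
    if bad_member is not None:
        return "corrupt member in archive: " + bad_member
    return None


if __name__ == "__main__":
    # Placeholder paths (assumptions); adjust to your setup.
    print("curl installed: %s" % has_program("curl"))
    print("missing tools: %s" % missing_tools(
        "tools", ["mallet-2.0.7", "corenlp-3.3.1"]))
    print("zip problem: %s" % zip_problem("20newsgroups.zip"))
```

A failed download often leaves a small file that is not a valid archive, which `zip_problem` reports without trying to extract anything.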
