Nutch crawl using Selenium plugin doesn't crawl data #3

Open
Yoganandh opened this issue Oct 6, 2015 · 0 comments

@Yoganandh

Hi, I am using Nutch 1.10, Selenium 2.44.0 and Firefox 40.0.3. I want to crawl the dynamic content of web pages, and I have followed the instructions at https://github.com/momer/nutch-selenium.
When I run the Nutch crawl, the process runs, but when I dump the segments, the dump contains no data content. This happens only when I include the "protocol-selenium" plugin. Without this plugin the crawl works and the dump contains the data content. I don't know where I am going wrong; please correct me and help me with this.
I am using the command below to start the Nutch crawl:
$ bin/crawl /home/yoganandh/yoga/testnutch/apache-nutch-1.10/runtime/local/urls/seed.txt /home/yoganandh/yoga/testnutch/apache-nutch-1.10/runtime/local/crawl 2

nutch-selenium

Dump command:
$ bin/nutch readseg -dump crawl/segments/20151006174816 dumpData1 -nocontent -nofetch -nogenerate -noparse -noparsedata

Thanks in advance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant