Elasticsearch File System Crawler (FS Crawler) - vectorization

Morphus1/fscrawler

 
 

File System Crawler for Elasticsearch

Welcome to the FS Crawler for Elasticsearch

This crawler helps index binary documents such as PDF, OpenOffice, and MS Office files.

Main features:

  • Crawls a local file system (or a mounted drive): indexes new files, updates existing ones, and removes old ones.
  • Crawls a remote file system over SSH/FTP.
  • Provides a REST interface that lets you "upload" your binary documents to Elasticsearch.
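
The first feature in the list above amounts to comparing two snapshots of a directory tree. The sketch below illustrates that idea only; it is not FS Crawler's implementation. It assumes file modification time is the change signal, and the names `scan` and `diff_snapshots` are hypothetical, chosen for this example.

```python
import os
from typing import Dict, List, Tuple


def scan(root: str) -> Dict[str, float]:
    """Map each file path under root (relative to root) to its modification time."""
    snapshot: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            snapshot[os.path.relpath(full, root)] = os.path.getmtime(full)
    return snapshot


def diff_snapshots(
    old: Dict[str, float], new: Dict[str, float]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, updated, removed) paths between two snapshots.

    Assumption: a file counts as updated when its modification time increased.
    """
    added = sorted(p for p in new if p not in old)
    updated = sorted(p for p in new if p in old and new[p] > old[p])
    removed = sorted(p for p in old if p not in new)
    return added, updated, removed
```

In a crawler built this way, paths in `added` and `updated` would be sent for indexing, and paths in `removed` would be deleted from the index. The snapshot from each run would then be kept as `old` for the next one.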

You need to install a version matching your Elasticsearch version:

Elasticsearch    FS Crawler       Released      Docs
6.x, 7.x         2.10-SNAPSHOT                  2.10-SNAPSHOT
6.x, 7.x         2.9              2022-01-10    2.9
6.x, 7.x         2.8              2021-12-13    2.8
6.x, 7.x         2.7              2021-08-05    2.7
2.x, 5.x, 6.x    2.6              2019-01-09    2.6
2.x, 5.x, 6.x    2.5              2018-08-04    2.5
2.x, 5.x, 6.x    2.4              2017-08-11    2.4
2.x, 5.x, 6.x    2.3              2017-07-10    2.3
1.x, 2.x, 5.x    2.2              2017-02-03    2.2
1.x, 2.x, 5.x    2.1              2016-07-26    2.1
es-2.0           2.0.0            2015-10-30    2.0.0

The guide has been moved to ReadTheDocs.

Contribute

Works on my machine, and yours! Spin up a pre-configured, standardized dev environment for this repository by clicking the button below.

Open in Gitpod

License

Read more about the License.

Thanks

Thanks to JetBrains for the IntelliJ IDEA License!
