A script written in PHP that queries Elasticsearch and fetches bulk data with the help of the Scroll API. You can specify the query in Query DSL syntax. The script bulk-writes the selected fields to a CSV file. It can export your data asynchronously: it fetches data by forking child processes and running them in parallel. There is also a class file, Export.class.php, which you can easily integrate with your custom script or application. It requires the official Elasticsearch-PHP SDK to run.
- PHP version >= 5
- Elasticsearch-PHP SDK - choose the version according to your ES version.
- Install the Elasticsearch-PHP SDK as mentioned here. You can find the docs for your ES version there. Also check version compatibility from here.
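For reference, the official SDK can be installed with Composer (the version constraint below is only an example; pick the one that matches your ES version):

```
composer require elasticsearch/elasticsearch "~5.0"
```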
git clone https://github.com/ashishtiwari1993/elasticsearch-csv-export.git
- Include the PHP-Elasticsearch SDK's `/vendor/autoload.php` in `process.php`, like:
require '/change-with-PHP-Elasticsearch-SDK-directory-path/vendor/autoload.php';
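For context, a minimal bootstrap with the official SDK might then look like this (a sketch only; the host is a placeholder and the namespace shown matches SDK v5-v7):

```php
<?php
// Sketch only: include the SDK's autoloader, then build a client.
require '/change-with-PHP-Elasticsearch-SDK-directory-path/vendor/autoload.php';

use Elasticsearch\ClientBuilder;

// Point the client at your cluster (the same value you pass via --host).
$client = ClientBuilder::create()
    ->setHosts(['localhost:9200'])
    ->build();
```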
php process.php [--host HOSTNAME:PORT] [--index INDEX] [--type TYPE]
[--query QUERY] [--stm TIMEOUT] [--fields FIELD1,FIELD2]
[--size SIZE] [--csvfile CSVPATH] [--logfile LOGPATH] [--async NUMBER_OF_SLICE]
Arguments
--host HOST:PORT          Elasticsearch hostname with port, e.g. example.com:9200. [required]
--index NAME              Index name; multiple indices can be given, comma-separated. [required]
--type TYPE               Document type.
--query QUERY             Query string in Lucene syntax or Query DSL.
--stm TIME                Time in seconds to keep the search context open in the Scroll API.
                          Defaults to 30.
--fields field1,field2    Fields to fetch, comma-separated. It won't work for nested fields
                          (see the notes below). [required]
--size SIZE               How many documents to fetch per scroll call. Defaults to 100.
--csvfile CSVPATH         Path to the CSV file to export to. [required]
--logfile LOGFILEPATH     Path to the log file where logs will be written.
--async NUMBER_OF_SLICE   The maximum number of slices for the scroll API. The script forks
                          the same number of child processes.
- It uses the Scroll API. You can find more on the Scroll API here.
- You can specify `--query` using a query string OR Query DSL. It should be the same as the `query` param you define in a POST call when querying Elasticsearch via curl. Check here for an example.
- For nested fields you need to specify the field as `field.key` in the `--fields` param. For example, with the structure `{info:[{name:php}]}`, you can access the key `name` as `info.name`.
- For `--async` requests, `pcntl_fork` is used to fork child processes. They fetch all data asynchronously, irrespective of any order or sorting. See the sketch below these notes for a rough illustration.
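As an illustration of how sliced scrolling and `pcntl_fork` can be combined (a minimal sketch, not the exact code in process.php; paths, index, query and field names are placeholders):

```php
<?php
// Minimal sketch of sliced scrolling with forked children, assuming an
// Elasticsearch-PHP SDK v5-v7 style client.
require '/change-with-PHP-Elasticsearch-SDK-directory-path/vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$slices = 2;                                // same idea as --async 2
$fields = ['firstname', 'gender', 'state']; // same idea as --fields

for ($slice = 0; $slice < $slices; $slice++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    }
    if ($pid === 0) {
        // Child process: scroll through its own slice and append rows to the CSV.
        // Note: concurrent appends from several children may interleave; a real
        // implementation would lock the file or merge per-child files.
        $client = ClientBuilder::create()->setHosts(['localhost:9200'])->build();
        $csv    = fopen('/tmp/records.csv', 'a');

        $response = $client->search([
            'index'  => 'myindex1',
            'scroll' => '30s',   // like --stm
            'size'   => 100,     // like --size
            'body'   => [
                'slice' => ['id' => $slice, 'max' => $slices],
                'query' => ['match' => ['gender' => 'M']],  // like --query
            ],
        ]);

        while (!empty($response['hits']['hits'])) {
            foreach ($response['hits']['hits'] as $hit) {
                $row = [];
                foreach ($fields as $f) {
                    $row[] = isset($hit['_source'][$f]) ? $hit['_source'][$f] : '';
                }
                fputcsv($csv, $row);
            }
            // Keep pulling the next batch until this slice is exhausted.
            $response = $client->scroll([
                'scroll_id' => $response['_scroll_id'],
                'scroll'    => '30s',
            ]);
        }

        fclose($csv);
        exit(0);
    }
}

// Parent: wait for every child to finish.
while (pcntl_wait($status) > 0);
```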
php process.php --host 'localhost:9200' --index 'myindex1,myindex2' --type 'logs' --fields 'balance,firstname,gender,state,city' --stm 60 --size 500 --query '{"query":{"match":{"gender":"M"}}}' --csvfile '/home/ashish/records.csv' --logfile '/tmp/b.log' --async 2
This project is licensed under the MIT License - see the LICENSE file for details