Efficiently manage and query vast log data volumes with a scalable Log Ingestor and Query Interface, featuring real-time ingestion, advanced filtering, and a user-friendly interface.
## About

This is an efficient implementation of a log ingestion & analysis solution. It uses Golang & Elasticsearch to efficiently ingest, store & analyse logs. It accepts logs from two sources: a Kafka queue or direct HTTP API requests. It also provides a web UI for analysing logs with different filters & search queries.
## Features

- Text search across all fields
- Regular-expression search on all fields
- Filters on specific fields
- Search within a given time range
- Combination of all filters & searches
- HTTP endpoint for posting logs
- Kafka queue for streamlined log processing
- Ingestion buffer & batch processing
- Efficient search queries leveraging Elasticsearch
- Scalable & efficient processing using the sharding provided by Elasticsearch
## Prerequisites

- Python
- Node.js & NPM
- Golang
- Kafka
- Elasticsearch
## Installation

1. Clone the repo

   ```bash
   git clone https://github.com/dyte-submissions/november-2023-hiring-OmkarPh
   cd november-2023-hiring-OmkarPh/
   ```

2. Set up the log ingestion server

   1. Go to the `log-server` directory

      ```bash
      cd log-server/
      ```

   2. Install Golang dependencies

      ```bash
      go mod download
      ```

   3. Install Python dependencies

      ```bash
      pip install -r requirements.txt
      ```

   4. Start the ingestion server

      ```bash
      cd cmd
      GIN_MODE=release go run .
      ```

      The server should now be running on http://localhost:3000.

   5. Simulate a huge number of sample logs simultaneously. Configure `LOGS_LENGTH` in `log-server/tests/performance_test.py` (default value: 3000), then run:

      ```bash
      python tests/performance_test.py
      ```
3. Set up the web UI

   1. Go to the `frontend` directory

      ```bash
      cd frontend/
      ```

   2. Install NPM dependencies

      ```bash
      npm install
      ```

   3. Start the React app

      ```bash
      npm start
      ```

   4. View the app at http://localhost:3006
4. Simulate log publishers (optional)

   1. Start ZooKeeper

      ```bash
      zookeeper-server-start /path-to-kafka-config/zoo.cfg
      ```

   2. Start Kafka

      ```bash
      kafka-server-start /path-to-kafka-config/server.properties
      ```

   3. Start the publisher script. Go to the `log-producers` directory:

      ```bash
      cd log-server/log-producers
      ```

      Start a producer to simulate a service using the `--topic` option, in different shells. Configured topics: `auth`, `database`, `email`, `payment`, `server`, `services`. A sketch of an equivalent publisher in Go follows this list.

      ```bash
      # Example:
      python producer.py --topic payment
      python producer.py --topic auth
      ```
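For reference, here is a minimal sketch of what such a publisher could look like in Go using the segmentio/kafka-go client. The repo's actual publisher is the Python `producer.py` script above; the broker address and payload shape here are assumptions based on the ingestion schema documented below.

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Assumed broker address; producer.py is the repo's canonical script.
	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"),
		Topic: "auth", // one of the configured topics
	}
	defer w.Close()

	// Log entry shaped like the ingestion request example below.
	entry := map[string]interface{}{
		"level":      "error",
		"message":    "Failed to connect to DB",
		"resourceId": "server-1234",
		"timestamp":  time.Now().UTC().Format(time.RFC3339),
	}
	payload, err := json.Marshal(entry)
	if err != nil {
		log.Fatal(err)
	}

	if err := w.WriteMessages(context.Background(), kafka.Message{Value: payload}); err != nil {
		log.Fatal(err)
	}
}
```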
## API Documentation

- Endpoint: `POST /`
- Description: Ingests a new log entry into the system.

Request example:

```json
{
  "level": "error",
  "message": "Failed to connect to DB",
  "resourceId": "server-1234",
  "timestamp": "2023-09-15T08:00:00Z",
  "traceId": "abc-xyz-123",
  "spanId": "span-456",
  "commit": "5e5342f",
  "metadata": {
    "parentResourceId": "server-0987"
  }
}
```

Response example:

```json
{ "status": "success" }
```
- Endpoint: `GET /logs-count`
- Description: Retrieves the count of logs stored in Elasticsearch.

Response example:

```json
{ "count": 5286 }
```
- Endpoint: `POST /search-logs`
- Description: Searches for logs based on the specified parameters. All the filter params, search text & time range are optional.

Request example:

```json
{
  "text": "email",
  "regexText": "jkl-*",
  "filters": [
    { "columnName": "level", "filterValues": ["error", "warning"] },
    { "columnName": "resourceId", "filterValues": ["user-123"] },
    { "columnName": "metadata.parentResourceId", "filterValues": ["9876", "1234"] }
    // ... Other columns
  ],
  "timeRange": {
    "startTime": "2023-11-19T00:00:00Z",
    "endTime": "2023-11-19T23:59:59Z"
  }
}
```

Response example:

```json
{
  "hits": {
    "total": 5,
    "hits": [
      {
        "_id": "1",
        "_source": {
          "level": "error",
          "message": "Database connection error",
          "resourceId": "user-123"
          // ... (other log fields)
        }
      }
      // Additional log entries
    ]
  }
}
```
## Configuration

- Elasticsearch index name: `log-ingestor`
- Server port: `:3000`
- CORS configuration: allows all origins (`*`) and supports HTTP methods: GET, POST, PUT, DELETE.
- Concurrency configuration (sketched below):
  - Buffered log channel with a default capacity of 5000 logs (can be changed via `logsBufferSize`).
  - Maximum concurrent log processing workers: 20.
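As a rough sketch of how such a buffered channel and worker pool fit together in Go: only `logsBufferSize`, the 5000-log capacity, and the 20-worker limit come from the configuration above; the rest is illustrative, not the repo's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

type LogEntry struct {
	Level   string
	Message string
}

func main() {
	const logsBufferSize = 5000 // default channel capacity (see above)
	const maxWorkers = 20       // maximum concurrent log processing workers

	logs := make(chan LogEntry, logsBufferSize) // ingestion buffer

	var wg sync.WaitGroup
	for i := 0; i < maxWorkers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for entry := range logs {
				// The real server would batch entries and index them into Elasticsearch here.
				fmt.Printf("worker %d processed: %s\n", id, entry.Message)
			}
		}(i)
	}

	// HTTP and Kafka handlers push into the buffered channel, so ingestion
	// is not blocked while workers drain the backlog.
	logs <- LogEntry{Level: "error", Message: "Failed to connect to DB"}
	close(logs)
	wg.Wait()
}
```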
## Future Improvements

- Kibana integration for better visual analysis
- Persistent TCP or WebSocket connection between servers (log producers) and the log ingestor for lower latency
- Redundant disk buffer for reliability
- Alert system to notify administrators or users about critical log events in real time
## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License

Distributed under the MIT License. See `LICENSE.txt` for more information.
## Contact

Omkar Phansopkar - @omkarphansopkar - [email protected]