Node Health status #131

Open. Wants to merge 1 commit into base: master.

YAMISHKA02

Added one panel to the dashboard showing the node's healthy/unhealthy status.
It's based on messages produced by the node in the last 5 minutes.
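The panel's rule can be sketched as follows. This is a minimal illustration, not the PR's actual Grafana query: it assumes the node exposes a monotonically increasing message counter, sampled as hypothetical `(timestamp, value)` pairs, and treats the node as healthy when that counter grew during the last 5 minutes. This is similar in spirit to a PromQL expression such as `increase(<counter>[5m]) > 0`. The helper names `messages_in_window` and `is_healthy` are hypothetical.

```python
from typing import Sequence, Tuple

WINDOW_SECONDS = 5 * 60  # the "last 5 minutes" window from the PR description


def messages_in_window(samples: Sequence[Tuple[float, float]], now: float,
                       window: float = WINDOW_SECONDS) -> float:
    """Return how much a message counter grew inside [now - window, now].

    samples: (unix_timestamp, counter_value) pairs sorted by timestamp.
    Hypothetical input format; a real setup would query Prometheus instead.
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    if len(recent) < 2:
        return 0.0  # not enough data points to measure any growth
    growth = 0.0
    for prev, cur in zip(recent, recent[1:]):
        if cur >= prev:
            growth += cur - prev
        else:
            # Treat a drop as a counter reset (e.g. node restart):
            # the counter restarted from 0, so its new value is all growth.
            growth += cur
    return growth


def is_healthy(samples: Sequence[Tuple[float, float]], now: float,
               window: float = WINDOW_SECONDS) -> bool:
    """Healthy if the node produced at least one message in the window."""
    return messages_in_window(samples, now, window) > 0
```

For example, `is_healthy([(0, 10), (120, 10), (240, 15)], now=300)` is true because the counter grew by 5, while a counter stuck at 10 gives false. As the maintainers point out later in the thread, this signal also depends on network traffic, not only on the node itself.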

added node Health status based on messages in last 5 mins
@fryorcraken requested a review from a team on October 4, 2024, 03:16
@fryorcraken
Contributor

@waku-org/nwaku, would it be possible to have a Prometheus entry that returns something similar to chkhealth.sh?

@NagyZoltanPeter
Contributor

Yes, I think we can start the metrics server ahead of initialization, just as we do with the REST service.
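To illustrate why starting the metrics server early helps, here is a minimal Python sketch, not nwaku code (nwaku is written in Nim and its real metrics differ). It serves `/metrics` in the Prometheus text exposition format from a background thread before a slow initialization step (such as the first RLN sync) completes, so a scraper can already see that the node is still booting. The metric name `node_init_complete` and the helpers `start_metrics_server`, `mark_initialized`, and `fetch_metrics` are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical boot state; not an existing nwaku metric.
STATE = {"init_complete": 0}
STATE_LOCK = threading.Lock()


def render_metrics() -> str:
    """Render the boot state in the Prometheus text exposition format."""
    with STATE_LOCK:
        done = STATE["init_complete"]
    return (
        "# HELP node_init_complete 1 once node initialization has finished.\n"
        "# TYPE node_init_complete gauge\n"
        f"node_init_complete {done}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging in the sketch
        pass


def start_metrics_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve /metrics in a background thread *before* initialization starts."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def mark_initialized() -> None:
    """Call once the (possibly slow) initialization has finished."""
    with STATE_LOCK:
        STATE["init_complete"] = 1


def fetch_metrics(server: ThreadingHTTPServer) -> str:
    """Scrape the endpoint the way Prometheus would."""
    host, port = server.server_address[:2]
    with urlopen(f"http://{host}:{port}/metrics") as resp:
        return resp.read().decode()
```

Usage: call `start_metrics_server()` first, run the slow boot steps, then call `mark_initialized()`. A scrape during boot reports `node_init_complete 0` and afterwards `node_init_complete 1`. Call `server.shutdown()` to stop.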

@YAMISHKA02: Thank you for the initiative; I had been thinking about this too. Although a node's ability to relay messages is a strong indicator of healthy operation, we have instead been checking the mounted protocols and the number of discovered nodes. These tell us that the node is up and ready to use. Message relaying depends heavily on actual network traffic, which is independent of the current node.
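The readiness check described above (mounted protocols plus discovered node count) can be sketched roughly as follows. This is an illustrative Python sketch, not the logic of chkhealth.sh or nwaku: the required protocol set, the minimum peer count, and the names `NodeStatus`, `HealthReport`, and `check_readiness` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, FrozenSet, List

# Hypothetical requirements; the thread does not define the real set or threshold.
REQUIRED_PROTOCOLS: FrozenSet[str] = frozenset({"relay", "store"})
MIN_DISCOVERED_PEERS = 1


@dataclass
class NodeStatus:
    mounted_protocols: FrozenSet[str]
    discovered_peers: int


@dataclass
class HealthReport:
    ready: bool
    problems: List[str] = field(default_factory=list)


def check_readiness(status: NodeStatus,
                    required: AbstractSet[str] = REQUIRED_PROTOCOLS,
                    min_peers: int = MIN_DISCOVERED_PEERS) -> HealthReport:
    """Ready when all required protocols are mounted and enough peers were discovered.

    Unlike a message-rate check, this does not depend on network traffic.
    """
    problems = []
    missing = sorted(set(required) - set(status.mounted_protocols))
    if missing:
        problems.append("protocols not mounted: " + ", ".join(missing))
    if status.discovered_peers < min_peers:
        problems.append(
            f"discovered {status.discovered_peers} peers, need at least {min_peers}")
    return HealthReport(ready=not problems, problems=problems)
```

Returning the list of problems, rather than a bare boolean, lets a dashboard or script explain *why* a node is not ready yet.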

@YAMISHKA02
Author

Hello, the best approach is of course to add something similar to chkhealth.sh.

Could you please send me a link to the file that implements the metrics exporter? I can modify it to export new metrics.

@NagyZoltanPeter
Contributor


@YAMISHKA02: Sorry for the late reply. I'm afraid there is no single link I can point to, because the health status of a node, if we think of it as a continuous report, consists of several properties. We need to think about what is worth measuring. Currently chkhealth.sh mainly helps node operators see the boot status of the node, because the very first boot with RLN sync can take a while and that was misunderstood in many ways. So there is plenty of room for improvement, and I believe it will come into scope shortly.

3 participants