Missing "Empty Notification" event in "hono.event.${tenant_id}" Kafka topic when a large number of devices disconnect from the MQTT adapter at the same time #3655
@petr-cada It would be helpful if you could provide DEBUG log output of the MQTT adapter and Command Router pods for such a case. (In the Hono Helm chart, DEBUG logs are enabled if you haven't set the …)
You can find the logs from the MQTT adapter and Command Router pods at the end of this post. They are from today's reproduction of the issue. Just a short summary of how I reproduced the issue today.
See the attached images.
The provided log files contain the logs from the MQTT adapter and Command Router pods from that time. Just don't be confused by the timestamps. Attachments: log files; list of all devices that appeared as connected (although in reality they were disconnected).
In the MQTT adapter logs I only see corresponding messages for 6 devices:
How are you performing the disconnect? Are you sending an MQTT DISCONNECT packet or just closing the socket? In the latter case, it may take a while until the MQTT adapter notices that the client isn't connected anymore (we had such scenarios in #2244, for example). This also depends on the MQTT keep-alive value reported by your MQTT client. Another thing: …
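For reference, the MQTT 3.1.1 specification allows the server to close a connection when it receives no control packet from the client within one and a half times the keep-alive period. If a device's connection dies silently (no DISCONNECT packet, no TCP reset), the disconnect may therefore only be noticed after roughly the bound below, measured from the last packet received. That the adapter applies exactly this factor is an assumption, and the 60 s keep-alive is only an illustrative value.

$$
t_{\text{detect}} \le 1.5 \times T_{\text{keep-alive}}, \qquad \text{e.g. } 1.5 \times 60\,\text{s} = 90\,\text{s}
$$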
Regarding "How are you performing the disconnect? ...": in this test scenario, the devices do not send an MQTT DISCONNECT packet, but that should not matter. I don't think the wrong detection of the device disconnection is a problem of the MQTT adapter because, as you wrote in your first response, "In contrast to that, the "Connection Event" messages get sent by the MQTT adapter directly." Regarding "In the Command Router logs, I see no log messages ...": there is only one instance of the Command Router service (one pod) in the test environment where the test was executed.
Then it seems to me that the interesting parts of the log output possibly came after the time period covered by the attached log files. A possible cause of the issue could be errors in the communication between the MQTT adapter and the Command Router due to the high volume of simultaneous requests. You could try adapting some configuration values:
The corresponding property of the Helm chart values.yaml would be … In the MQTT adapter deployment YAML, you could add an environment variable with …
The corresponding property of the Helm chart values.yaml would be … As a general fix in Hono, we should consider adding some sort of retry mechanism for "no credit" errors when unsubscribing devices and/or adding batch requests for unsubscribing (see #3445).
After testing, it seems that increasing the value of HONO_COMMANDROUTER_AMQP_RECEIVERLINKCREDIT from 100 to 300 fixed the issue (a sudden disconnection of 200 devices now works correctly). I guess that increasing HONO_COMMANDROUTER_AMQP_RECEIVERLINKCREDIT can also lead to increased memory consumption, so I increased hono.commandRouterService.resources.requests.memory (in values.yaml) from the default value of 256Mi to 512Mi. Do you have any idea how far I can go with the HONO_COMMANDROUTER_AMQP_RECEIVERLINKCREDIT value (e.g. whether 500 is safe) given this newly "guaranteed" memory of 512Mi? Thank you
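A minimal sketch of the two changes described above. It assumes Hono is pulled in as a chart dependency named `hono` (so its values sit under a `hono:` key, matching the property path quoted above) and assumes the link credit variable is set on the Command Router container, as the accompanying memory increase suggests; the thread does not say exactly where the environment variable was added.

```yaml
# values.yaml of the parent chart: Hono subchart values live under "hono"
hono:
  commandRouterService:
    resources:
      requests:
        memory: 512Mi   # raised from the default of 256Mi
---
# Container spec fragment (assumption: Command Router container)
env:
  - name: HONO_COMMANDROUTER_AMQP_RECEIVERLINKCREDIT
    value: "300"   # raised from 100; Kubernetes env values must be strings
```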
On the AMQP link between a protocol adapter and the Command Router, only messages concerning the management of command subscriptions get sent (not the command or command response messages themselves). These management messages are usually very small. Therefore, I wouldn't expect much of a memory usage increase with a link credit value of 500. You could check the memory metrics of the pods during a test run to get actual values.
Thank you for the information. We had already set HONO_COMMANDROUTER_REQUESTTIMEOUT to "10000" before we reported the issue, which I believe is more than enough (we didn't see any timeout errors with this setting). In our case, increasing HONO_COMMANDROUTER_AMQP_RECEIVERLINKCREDIT resolved the issue. I will close this issue as resolved.
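For completeness, a request timeout like the one mentioned above can be set as an environment variable on a container. The fragment below assumes it was added to the MQTT adapter container (the client side of the Command Router connection); Hono's client configuration interprets the value in milliseconds, so "10000" corresponds to 10 seconds.

```yaml
# Container spec fragment (assumption: MQTT adapter container)
env:
  - name: HONO_COMMANDROUTER_REQUESTTIMEOUT
    value: "10000"   # milliseconds, i.e. 10 s
```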
We are using Eclipse Hono in our product to connect devices via MQTT (hono-adapter-mqtt). We install Eclipse Hono into our Kubernetes cluster via the Helm chart:

```yaml
dependencies:
  - name: hono
    version: 2.6.3
    repository: "https://eclipse.org/packages/charts/"
```
We are also using Eclipse Ditto in our product. Hono and Ditto are communicating with each other via Kafka.
We found out that devices sometimes have wrong data in the ConnectionStatus feature in Ditto: the readyUntil value does not correspond to the real state of the device's MQTT connection (the device is disconnected, but it looks connected). We use the ConnectionStatus mapper https://eclipse.dev/ditto/connectivity-mapping.html#connectionstatus-mapper to set ConnectionStatus automatically (based on Eclipse Hono device notifications).
After some investigation, we found out the following.
When we disconnect a larger number of devices (e.g. 100) from the Hono MQTT adapter at the same time, the "Empty Notification" https://eclipse.dev/hono/docs/api/event/#empty-notification with "ttd":"0" is sometimes missing from the "hono.event.${tenant_id}" Kafka topic for some devices.
Just out of curiosity, we enabled the "Connection Event" https://eclipse.dev/hono/docs/api/event/#connection-event for the Hono MQTT adapter. Interestingly, the "Connection Event" with "cause": "disconnected" is always present in the "hono.event.${tenant_id}" Kafka topic.
As you can see in the picture below, the "Empty Notification" with "ttd":"0" is missing after the "Connection Event" with "cause": "disconnected".
As a consequence, the "ConnectionStatus" feature in Ditto does not contain correct values. Put simply, the device appears connected in Ditto even though it is disconnected.
We did not find any errors in the Hono MQTT adapter or Hono Command Router services that would point us to the cause of this problem.