[Bug]: start the overlay mesh before the routing is added #3227
Comments
@atanas18 can you please share more details?
Hi @yabinma, currently this is the v0.26.0 client. Thanks.
@atanas18 , if the ip route change is managed by a systemd service as well, can you please try adding the dependency in the systemd configuration file?
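As a minimal sketch of that suggestion (assuming the routes are installed by a separate unit, here hypothetically named `my-routes.service`, and that netclient runs as `netclient.service`), a drop-in could order netclient after the route unit:

```
# /etc/systemd/system/netclient.service.d/10-wait-for-routes.conf
# Hypothetical drop-in: delay netclient until the unit that installs the ip routes has run.
[Unit]
After=my-routes.service
Wants=my-routes.service
```

Then `systemctl daemon-reload && systemctl restart netclient` to pick it up.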
@yabinma ah, maybe I didn't understand your 3rd question correctly. I meant the ip route that netclient adds, not custom routes that I add myself.
@atanas18 , my bad, I may have misunderstood the issue.
I sat down and rethought the situation, and I think I misled you somehow.
@atanas18 , hope my understanding is correct. You could create a customized policy to tag all the nodes behind NAT as a group and allow them to communicate with each other, and set up a different policy for the nodes with public IPs.
Well, in the end they must all communicate with each other; it's just that in the beginning the nodes with only public IPs should not try to reach the ones with only private IPs. When the nodes with private IPs try to reach the public ones, the hole punch is enough to get communication working both ways. But when a node with only a public IP tries to reach the ones with private IPs, it's a no-go anyway; it can't reach them. It just generates traffic which Hetzner catches, flags as a portscan, and sends an abuse email about. There's really no point trying to reach rfc1918 IPs if you only have a public interface (except the netmaker interface of course, which can be rfc1918, but either way it shouldn't be the one initiating the first attempt to reach the clients).

Is your proposal going to work for what I'm trying to explain? Can I make the group behind NAT able to communicate with EVERYTHING, and make the nodes with only public IPs also able to communicate with the ones behind NAT, but only and ONLY after the ones behind NAT initiate the hole punch? The nodes with public IPs should not try to reach the ones behind NAT before that (otherwise, as I said, it generates an abuse ticket on Hetzner, which I have to reply to and explain so they don't shut off the server; without an explanation that's their final decision: shutting off the server). Thanks.
@atanas18 , what traffic is captured before the netmaker interface is up? Are there source and destination IPs in the Hetzner scan? As I checked the code, there is no peer communication before the netmaker interface is up.
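One way to see exactly what leaves the public interface toward private ranges before the netmaker interface comes up (the interface name `eth0` is an assumption; replace it with the actual public interface):

```
# Capture packets leaving the public interface toward rfc1918 destinations.
tcpdump -ni eth0 'dst net 10.0.0.0/8 or dst net 172.16.0.0/12 or dst net 192.168.0.0/16'
```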
Doesn't ENDPOINT_DETECTION affect only the server? I'll check the documentation about that. The problem for me is definitely on the client side.
@atanas18 the client will update its peer endpoint to a private IP only if it's able to communicate over it; otherwise it uses the public IP.
ENDPOINT_DETECTION is a server-side setting, but it is cascaded to the client side and changes netclient's behavior.
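For reference, a sketch of how that server-side setting is typically toggled (assuming a Docker-based deployment that reads its configuration from netmaker.env; verify the exact file and default value against the Netmaker docs for your version):

```
# netmaker.env (server side, assumed Docker/compose deployment)
# Disabling endpoint detection should stop clients from probing peers' private endpoints.
ENDPOINT_DETECTION=false
```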
What happened?
On a server reboot, the overlay mesh is started before the necessary ip route rules are added (the ones that route mesh traffic through the netmaker interface).
Because we have hundreds of mesh nodes on rfc1918 addresses (behind NAT), on a reboot a public node (one that is not behind NAT) starts searching for mesh nodes on their rfc1918 addresses, which triggers a Hetzner abuse report for "Netscan detected" (because of the hundreds of requests to rfc1918 addresses). Hetzner doesn't like traffic to rfc1918 subnets over the public interface, and while the route is not up yet, the node sends hundreds of TCP connection attempts trying to reach the nodes behind NAT. Once the route is up, the problem stops and rfc1918 traffic is no longer sent over the public interface.
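A quick way to check which interface a given peer's private address would currently leave through (the address 192.168.1.10 is just a placeholder for one of the NATed peers):

```
# Before the netmaker route exists this resolves to the default (public) interface;
# once the route is up it should show the netmaker interface instead.
ip route get 192.168.1.10
```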
This behavior started after updating from 0.24.1 to 0.25.0, and it also happens on 0.26.0. On 0.24.1 and earlier we didn't have this problem, so I guess something changed between these versions.
Thanks.
Version
v0.25.0
What OS are you using?
Linux
Relevant log output
No response