
Add consul_services_max_fails, consul_services_fail_timeout, fallback_peer and upsync_weight NodeMeta #200

Open · wants to merge 5 commits into master
Conversation

eugenepaniot

This patch adds the following changes:

  1. Backport changes from the lyokha/nginx-upsync-module repository (see the config sketch after this list):
    1.1 Add consul_services_max_fails and consul_services_fail_timeout directives to control the max_fails and fail_timeout values for backends
    1.2 Add a fallback_peer directive. It allows deleting all backends in an upstream and setting the fallback_peer address, marked as down, as the only backend in the upstream. Otherwise the error "cannot delete all peers" will occur
  2. Add functionality to change backend weight based on the consul agent node NodeMeta (available in Consul 0.7.3). The tag is named "upsync_weight". It can be useful when you have different hardware configurations in your environment, and can be more flexible than the "weight" ServiceTag.
    For example: CPUs from Intel and AMD have different speeds, and we should be able to balance requests based on CPU speed (that is, based on node weight). More powerful nodes will have a higher weight and should be able to serve more requests
  3. Seed the srandom PRNG with PID, seconds, and milliseconds
  4. Add current_weight, conns, max_conns and fails to show_upstream. This is more useful for backend status monitoring
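
A minimal sketch of how an upstream block could look with the backported directives; the directive names follow the discussion below, but the exact argument forms (a count, a time value, and an address:port) are assumptions, not the final syntax:

upstream app_pool {
    # assumed syntax: defaults applied to synced backends when the service has no max_fails/fail_timeout tags
    upsync_consul_services_max_fails 3;
    upsync_consul_services_fail_timeout 10s;

    # assumed syntax: address inserted, marked down, when every real backend is removed
    fallback_peer 127.0.0.1:11111;

    # placeholder server required by nginx so the block is never empty
    server 127.0.0.1:11111 down;
}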

@xiaokai-wang
Member

hi eugenepaniot,

Thanks for your contribution, it's great.

1.
  1.1 The directives are all about the conf-server (consul, etcd); they should not configure the backend
       servers, because the conf-server's function is to store all the information about the backends.
       Reading your code, the two directives only act as default values?
  1.2 What's the purpose of fallback_peer? You don't want requests to go to the old servers? If that's the case,
        can't you just mark them down?
2.
    I understand your usage scenario, but I don't recommend adding an "upsync_weight" tag; I think it will
    confuse users. You can just tune the "weight" of servers based on the hardware environment.

The above is my opinion; discussion is welcome.

About 3 and 4 I suggest a new commit.

Thanks again.

@eugenepaniot
Author

1.1 The directives are all about the conf-server (consul, etcd); they should not configure the backend
servers, because the conf-server's function is to store all the information about the backends.
Reading your code, the two directives only act as default values?

Yes, these two directives, upsync_consul_services_max_fails and upsync_consul_services_fail_timeout, control the default values when a service hasn't been tagged.
On the other hand, these directives can be used to set the backend max_fails and fail_timeout values per IP on the LoadBalancer (in this case max_fails and fail_timeout become a property of the VIP, not a property of the application). For example: Nginx as a LoadBalancer service (see our arch diagram).

[arch diagram: hlb-and-ccs]

I understand your usage scenario, but I don't recommend adding an "upsync_weight" tag; I think it will
confuse users. You can just tune the "weight" of servers based on the hardware environment.

Ok, agreed. I'll remove it.

@eugenepaniot
Author

1.2 What's the purpose of fallback_peer? You don't want requests to go to the old servers? If that's the case,
can't you just mark them down?

When our application has 0 instances, the upstream should be zeroed too. In our environment we use Mesos/Marathon for container orchestration. When a container starts on a compute node, Mesos assigns a random port for the iptables NAT rules. In the worst case (we have hit it a few times because we have a huge container cluster) nginx proxies requests to the wrong destination because the upstream hasn't been cleared.
For example (a curl sketch follows the list):

  1. Start two container-nginx instances on container port 80:
    1.1 Mesos assigns ports 32001 and 32002 and creates NAT rules:
      1. compute node1 IP port 32001 DNAT to container1 IP port 80
      2. compute node1 IP port 32002 DNAT to container2 IP port 80
  2. Suspend the nginx application. Without this patch the LoadBalancer nginx will keep the information about the last container-nginx instance (for example: compute node1 IP port 32001)
  3. Create another Mesos/Marathon application, for example a Java one
  4. Mesos/Marathon may assign the same port (32001) that was used by the container-nginx instance, and as a result traffic for the container-nginx instances is proxied to the Java instance.
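
The scale-to-zero step can be reproduced against a local consul agent with the standard deregister endpoint; the service id "111" is just a hypothetical example. After this call the service query returns no instances, but without the patch the nginx upstream still holds the stale backend:

# hypothetical example: deregister the last remaining instance, leaving the service with 0 instances
curl -X PUT http://127.0.0.1:8500/v1/agent/service/deregister/111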


@eugenepaniot left a comment


Typo in "if (tags == NULL)"

@xiaokai-wang
Member

Yes, these two directives, upsync_consul_services_max_fails and upsync_consul_services_fail_timeout, control the default values when a service hasn't been tagged.
On the other hand, these directives can be used to set the backend max_fails and fail_timeout values per IP on the LoadBalancer (in this case max_fails and fail_timeout become a property of the VIP, not a property of the application). For example: Nginx as a LoadBalancer service (see our arch diagram).

Sorry eugenepaniot, you haven't persuaded me. The attributes of a server can be fetched from consul. If you haven't tagged a server, it suggests you don't care about those values, which is why it isn't tagged.
Your purpose can be satisfied by setting max_fails and fail_timeout in consul, right? I hope I understand your scenario.
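
For reference, a sketch of what setting those values through consul service tags could look like; the service name, id and address below are placeholders, while the tag names match the ones the module already parses:

curl -X PUT -d '{"id":"web-01","name":"app","address":"10.0.0.11","port":8080, "tags":["weight=2", "max_fails=3", "fail_timeout=10s"]}' http://127.0.0.1:8500/v1/agent/service/register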

@xiaokai-wang
Member

When our application has 0 instances, the upstream should be zeroed too.

The module does not support deleting all servers of one upstream; this is to avoid making the service unusable.
The "fallback_peer" in your solution is meant to replace the deleted servers (all the real servers), right?
That can be achieved by registering the "fallback_peer" in consul in place of all the real servers, which will sync to nginx too. Would this satisfy you?

@eugenepaniot
Author

The module does not support deleting all servers of one upstream; this is to avoid making the service unusable.

But what if this is expected behaviour? In my case I need to clear the whole backend list because I have suspended my application.

The "fallback_peer" in your solution is meant to replace the deleted servers (all the real servers), right?

Yes, correct.

That can be achieved by registering the "fallback_peer" in consul in place of all the real servers, which will sync to nginx too. Would this satisfy you?

You mean "fallback_peer" as a tag? But I won't have a service registration (and therefore no tags) because I don't have any running service instances. Or do you mean from the KV store?

@gfrankliu
Collaborator

We also have such cases where all upstream servers get unregistered from consul. That's why we put a dummy "server" in the "upstream", e.g.:

upstream upstream_pool {
  server 127.0.0.1:11111 down;
  upsync ...
  ...
}

This is to make nginx happy, since nginx requires at least one server in the pool.

Why can't you use that for your fallback peer? It will get replaced once consul returns live servers.
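
A slightly fuller version of that dummy-server pattern, for illustration only; the upsync endpoint, parameters and dump path below are drawn from the module's documented directives rather than from this thread, so adjust them to your setup:

upstream upstream_pool {
    # dummy server keeps nginx happy when consul returns no live backends
    server 127.0.0.1:11111 down;

    # illustrative upsync settings; change the consul address and service name to match your environment
    upsync 127.0.0.1:8500/v1/catalog/service/upstream_pool upsync_timeout=6m upsync_interval=500ms upsync_type=consul_services strong_dependency=off;
    upsync_dump_path /usr/local/nginx/conf/servers/servers_upstream_pool.conf;
}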

2. Add max_stale GET parameter to service discovery request
3. Add wait GET parameter to service discovery request
@cai-personal

Why is this PR needed?

Scale up: curl -X PUT -d '{"id":"111","name":"testnginx","address":"127.0.0.1","port":8001, "tags":["weight=4", "fail_timeout=13s", "max_fails=19"]}' http://127.0.0.1:8500/v1/agent/service/register

Scale down: curl -X PUT -d '{"id":"111","name":"testnginx","address":"127.0.0.1","port":8001, "tags":["weight=4", "fail_timeout=13s", "max_fails=19", "down"]}' http://127.0.0.1:8500/v1/agent/service/register, or just deregister the service
