[alibaba/tengine] ngx_http_dyups_module and ngx_http_upstream_check_module cannot both take effect at the same time

2024-07-12
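  1. With the ngx_http_dyups_module module in use, requests are lost intermittently after a node goes down.

  2. After one backend node goes down, the ngx_http_upstream_check_module health check does not take effect: roughly every 10 seconds a request is still sent to the dead node, and the nginx log shows a 502 status code. Everything worked correctly before the dyups module was used.

  3. Expected result: once a node goes down, no requests are forwarded to it until it recovers (that is, until the TCP heartbeat succeeds again).

  4. Steps to reproduce: with the upstream configuration below, kill the process at 10.110.14.130:7347 and send continuous requests to im-test.lietou.com.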

file: /usr/local/nginx/conf/upstream.conf
upstream  my-test.lietou.com {
     server 10.110.14.206:7347;
     server 10.110.14.130:7347;
     check interval=2000 rise=1 fall=2 timeout=1000 type=tcp;
}
server {
    listen 80;
    server_name im-test.lietou.com;
    proxy_next_upstream off;
    set $access_filter "on";
    set $ups  "my-test.lietou.com";
    location /{
        proxy_pass http://$ups;
    }
}
  5. Access log:
10.110.252.55 [26/Apr/2019:23:17:10 +0800] GET /publish/error?editMode=1&code=200 HTTP/1.1 "200" 1077 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0" "-" 

... 200 status ... snipped...

"502" 7590 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0" "-" "-" "0.002" "0.002" "-" 10.110.14.130:7347 "im-test.lietou.com" "-" "-" "-" "http" "-"
10.110.252.55 [26/Apr/2019:23:17:16 +0800] GET /publish/error?editMode=1&code=200 HTTP/1.1 ..

...200 status ... snipped...

"502" 7590 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0" "-" "-" "0.001" "0.001" "-" 10.110.14.130:7347 "im-test.lietou.com" "-" "-" "-" "http" "-"
10.110.252.55 [26/Apr/2019:23:17:32 +0800] GET /publish/error?editMode=1&code=200 HTTP/1.1 

Error log:

2019/04/26 23:17:00 [error] 7006#0: *256491 connect() failed (111: Connection refused) while connecting to upstream, client: 10.110.252.55, server: im-test.lietou.com, request: "GET /publish/error?editMode=1&code=200 HTTP/1.1", upstream: "http://10.110.14.130:7347/publish/error?editMode=1&code=200", host: "im-test.lietou.com"
2019/04/26 23:17:14 [error] 7006#0: *256491 connect() failed (111: Connection refused) while connecting to upstream, client: 10.110.252.55, server: im-test.lietou.com, request: "GET /publish/error?editMode=1&code=200 HTTP/1.1", upstream: "http://10.110.14.130:7347/publish/error?editMode=1&code=200", host: "im-test.lietou.com"
2019/04/26 23:17:27 [error] 7006#0: *256491 connect() failed (111: Connection refused) while connecting to upstream, client: 10.110.252.55, server: im-test.lietou.com, request: "GET /publish/error?editMode=1&code=200 HTTP/1.1", upstream: "http://10.110.14.130:7347/publish/error?editMode=1&code=200", host: "im-test.lietou.com"
  6. Environment: tengine-2.2.1, CentOS 7
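
To check the "about every 10 seconds" observation against the error log, the gaps between failed connects can be computed directly. The following is a minimal sketch, not part of the original report: the function name `failed_connect_gaps` and the regular expression are illustrative assumptions, and the sample lines are shortened copies of the three error-log entries in the report (client, server, request, and host fields omitted).

```python
import re
from datetime import datetime

# Captures the leading timestamp of an nginx error-log line and the
# host:port of the upstream that refused the connection.
LINE_RE = re.compile(
    r'^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\].*?'
    r'connect\(\) failed.*?upstream: "http://([^/"]+)'
)


def failed_connect_gaps(lines, upstream):
    """Return the seconds between consecutive failed connects to `upstream`."""
    times = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(2) == upstream:
            times.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Shortened copies of the error-log entries from the report.
sample = [
    '2019/04/26 23:17:00 [error] 7006#0: *256491 connect() failed (111: Connection refused) while connecting to upstream, upstream: "http://10.110.14.130:7347/publish/error?editMode=1&code=200"',
    '2019/04/26 23:17:14 [error] 7006#0: *256491 connect() failed (111: Connection refused) while connecting to upstream, upstream: "http://10.110.14.130:7347/publish/error?editMode=1&code=200"',
    '2019/04/26 23:17:27 [error] 7006#0: *256491 connect() failed (111: Connection refused) while connecting to upstream, upstream: "http://10.110.14.130:7347/publish/error?editMode=1&code=200"',
]

print(failed_connect_gaps(sample, "10.110.14.130:7347"))  # [14.0, 13.0]
```

For these three entries the gaps are 14 and 13 seconds. Gaps of this size are in the same range as nginx's default passive-failure window (`fail_timeout=10s` with `max_fails=1`), though nothing in the thread confirms that this is the mechanism involved.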

Answers


@liutpmars I cannot find anywhere in the configuration you posted where the dyups module is used.


@wangfakang Sorry, I forgot to include the nginx.conf file.

file: /usr/local/nginx/conf/nginx.conf

user  nobody;
worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr     [$time_local]   $request        "$status"       $body_bytes_sent        "$http_referer"'
        '       "$http_user_agent"      "$http_x_forwarded_for" "$cookie_UniqueKey"     "$request_time"'
        '       "$upstream_response_time"       "-"     $upstream_addr  "$http_host"    "$http_X_Requested_With"        "$cookie__e_ld_auth_"'
                      ' "$cookie__h_ld_auth_"   "$scheme"       "$http_X_Alt_Referer"';

    log_format  proxy '$remote_addr [$time_local] $request "$status" $body_bytes_sent "$http_accept_encoding"'
                  '"$content_type" "$http_user_agent" "$http_x_forwarded_for"'
                  '"$request_time" "$upstream_addr" "$upstream_status" "$upstream_response_time" "$request_body"';

     access_log  logs/access.log  main;
    #access_log  "pipe:rollback logs/access_log interval=1d baknum=7 maxsize=2G"  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    # default
    server {
        listen       80;
        server_name  localhost;
        # proxy the PHP scripts to Apache listening on 127.0.0.1:80
        #
        location / {
            proxy_pass   http://127.0.0.1;
        }
    }
    # test file
    include upstream.conf;
    server {
        listen 9081;
        location / {
            dyups_interface; # this directive marks this server as the dyups interface
        }
    }
}
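
Since the dyups interface listens on port 9081 in this configuration, one way to see which servers dyups currently holds for an upstream is to query that interface. The sketch below is an illustration only: the helper names `dyups_url` and `dyups_get` are hypothetical, the host `127.0.0.1` is an assumption (the thread does not say where the requests are issued from), and the `/detail` and `/upstream/<name>` paths are the ones described in the dyups module's README.

```python
import urllib.request


def dyups_url(path, host="127.0.0.1", port=9081):
    """Build a URL for the dyups interface (host is an assumed value)."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def dyups_get(path, host="127.0.0.1", port=9081, timeout=2.0):
    """GET a dyups interface path and return the response body as text."""
    with urllib.request.urlopen(dyups_url(path, host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the server running, `dyups_get("/detail")` should return every upstream with its servers, and `dyups_get("/upstream/my-test.lietou.com")` the servers of the upstream from the report. Comparing that output before and after killing the backend would show whether the dead node is still listed.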

Hi @liutpmars,

  1. Please show your compilation options (the ./configure command, or the output of sbin/nginx -V).
  2. Please list your reproduction steps as minimally and precisely as possible, including every command needed to trigger the issue.

I tried to reproduce this issue, without success.

My steps were as follows:

  1. Set up two nginx upstream servers, each started as its own nginx master process.

First nginx server:

    server {
        listen       7346;
        server_name  localhost;
        return 200 "OK ! 7346\n";
    }

Second nginx server:

    server {
        listen 7345;
        return 200 "OK 7345\n";
    }
  2. Build tengine-2.2.1 with the dyups and upstream check modules:
./configure --prefix=$(pwd)/output   --with-http_ssl_module  \
    --with-http_lua_module \
    --with-http_upstream_check_module \
    --with-http_dyups_module \
    --with-http_dyups_lua_api \
    --with-cc-opt="-I/usr/local/opt/openssl/include -O0 -g" \
    --with-ld-opt="-L/usr/local/opt/openssl/lib" \
    --with-debug

tengine conf:

    req_status_zone server "$host,$server_addr:$server_port" 10M;

    upstream  ups {
        server 127.0.0.1:7345;
        server 127.0.0.1:7346;
        check interval=2000 rise=1 fall=2 timeout=1000 type=tcp;
    }

#    dyups_read_msg_log off;
#    dyups_read_msg_timeout 1h;

    server {
        listen 8081;

        location / {
            set $ups ups;
            proxy_pass http://$ups;
        }

        location /us {
            req_status_show;
        }
        req_status server;
    }

    server {
        listen 8082;

        location / {
            dyups_interface;
        }
    }
  3. Send a request to port 8081 once per second:
$ while true; do date; curl localhost:8081; sleep 1; done
  4. Kill the server on port 7345 while the loop from step 3 keeps running.

  5. No 5xx responses appear in access.log.
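
The curl loop in the steps above prints each response body, which makes an occasional 502 easy to miss. A small tally of status codes makes the result explicit. This is a sketch under stated assumptions, not the maintainer's tooling: the names `probe` and `tally` are hypothetical, and the URL in the usage note is taken from the port 8081 server in the test configuration.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def probe(url, timeout=2.0):
    """Return the HTTP status of one GET, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses (such as a 502 from the proxy) land here.
        return exc.code
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return None


def tally(url, count, interval=1.0, probe_fn=probe, sleep_fn=time.sleep):
    """Send `count` requests, `interval` seconds apart, and count the statuses."""
    seen = Counter()
    for i in range(count):
        seen[probe_fn(url)] += 1
        if i + 1 < count:
            sleep_fn(interval)
    return seen
```

Running `tally("http://localhost:8081/", 60)` while killing one backend would be expected to return only status 200 if the health check is excluding the dead server, and to include 502 entries if requests are still being routed to it.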


Following this. Is there any new progress?


Following this as well.