Skip to content
This repository was archived by the owner on Jan 30, 2020. It is now read-only.
This repository was archived by the owner on Jan 30, 2020. It is now read-only.

Random failure to submit units when etcd endpoint not specified #1650

@waddles

Description

@waddles

I have built a new environment with a cluster of 3 etcd servers and a build server running etcd in proxy mode:

root@tree:~# /usr/local/sbin/etcd -version
etcd Version: 2.3.3
Git SHA: c41345d
Go Version: go1.6.2
Go OS/Arch: linux/amd64
root@tree:~# /usr/local/sbin/etcd --proxy on \
 -initial-cluster elf=http://10.1.1.3:2380,dwarf=http://10.1.1.4:2380,halfling=http://10.1.1.5:2380 \
 -listen-client-urls http://172.17.42.1:2379,http://127.0.0.1:2379

In the cluster, starting with a clean registry, I run etcd (tried several versions including 2.3.3) using relevant options on all 3 servers:

root@elf:~# rm -fr /var/lib/etcd2/
root@elf:~# /tmp/etcd-2.3.6/etcd -initial-cluster-token mytoken \
 -initial-cluster elf=http://10.1.1.3:2380,dwarf=http://10.1.1.4:2380,halfling=http://10.1.1.5:2380 \
 -initial-cluster-state new -initial-advertise-peer-urls http://10.1.1.3:2380 \
 -advertise-client-urls http://10.1.1.3:2379 \
 -listen-client-urls http://127.0.0.1:2379,http://10.1.1.3:2379 \
 -listen-peer-urls http://127.0.0.1:2380,http://10.1.1.3:2380 \
 -data-dir /var/lib/etcd2 -name elf \
 -heartbeat-interval 600 -election-timeout 6000 -debug > /tmp/etcd.log 2>&1

And wait a few seconds for it to become healthy:

# etcdctl --debug --endpoints=http://10.1.1.3:2379 cluster-health
Cluster-Endpoints: http://10.1.1.3:2379
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
member b5fa9b282334905d is healthy: got healthy result from http://10.1.1.4:2379
member bdd359f77713beb4 is healthy: got healthy result from http://10.1.1.3:2379
member d45ce0c430449c65 is healthy: got healthy result from http://10.1.1.5:2379
cluster is healthy

Then I try to submit a unit from my build server using fleetctl:

root@tree:/tmp/fleet-v0.11.7-linux-amd64# while :; do 
 ./fleetctl --debug submit lb_feedcache_1002@.service && sleep 1;
 ./fleetctl --endpoint=http://10.1.1.3:2379 --driver=etcd --debug destroy lb_feedcache_1002@.service; sleep 1;
 done
2016/07/26 12:45:05 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:05 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 404 Not Found
2016/07/26 12:45:05 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:45:05 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:05 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 201 Created
2016/07/26 12:45:05 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/26 12:45:05 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:05 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 200 OK
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:45:07 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:07 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 404 Not Found
2016/07/26 12:45:07 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:45:07 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:07 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 201 Created
2016/07/26 12:45:07 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/26 12:45:07 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:07 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:07 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:08 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:08 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:08 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:08 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:08 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:08 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:09 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:09 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:09 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:09 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:10 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:10 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
^C^C

As above, it first accepted the unit then on 2nd try, it went into a loop reporting response code 500 from the cluster, yet the keys were correctly created in etcd. See https://gist.github.com/waddles/0e121d46c0499eaaef9685eef06120f0 for more info.

HOWEVER

I see quite a different output and no errors when I specify an endpoint in the cluster to send to (otherwise it is the same command):

root@tree:/tmp/fleet-v0.11.7-linux-amd64# while :; do
 ./fleetctl --endpoint=http://10.1.1.4:2379 --debug submit lb_feedcache_1002@.service && sleep 1;
 ./fleetctl --endpoint=http://10.1.1.3:2379 --driver=etcd --debug destroy lb_feedcache_1002@.service; sleep 1;
 done
2016/07/26 12:51:29 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:29 DEBUG fleetctl.go:612: Found Unit(lb_feedcache_1002@.service) in Registry, no need to recreate it
2016/07/26 12:51:29 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:31 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:31 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:31 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:33 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:33 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:33 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:35 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:35 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:35 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:37 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:37 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:37 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:39 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:39 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:39 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:41 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:41 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:41 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:43 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:43 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:43 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:45 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:45 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:45 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:47 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:47 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:47 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
^C
  1. I never seem to get a failure when the endpoint is specified in the submit command.
  2. I can specify --endpoint=http://127.0.0.1:2379 to force it to use the local etcd proxy and that also succeds reliably.
  3. We also have a similar setup using older versions and legacy ports 4001 and 7001 which does not exhibit this behaviour.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions