This repository was archived by the owner on Jan 30, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 299
This repository was archived by the owner on Jan 30, 2020. It is now read-only.
Random failure to submit units when etcd endpoint not specified #1650
Copy link
Copy link
Open
Description
I have built a new environment with a cluster of 3 etcd servers and a build server running etcd in proxy mode:
root@tree:~# /usr/local/sbin/etcd -version
etcd Version: 2.3.3
Git SHA: c41345d
Go Version: go1.6.2
Go OS/Arch: linux/amd64
root@tree:~# /usr/local/sbin/etcd --proxy on \
-initial-cluster elf=http://10.1.1.3:2380,dwarf=http://10.1.1.4:2380,halfling=http://10.1.1.5:2380 \
-listen-client-urls http://172.17.42.1:2379,http://127.0.0.1:2379
In the cluster, starting with a clean registry, I run etcd (tried several versions including 2.3.3) using relevant options on all 3 servers:
root@elf:~# rm -fr /var/lib/etcd2/
root@elf:~# /tmp/etcd-2.3.6/etcd -initial-cluster-token mytoken \
-initial-cluster elf=http://10.1.1.3:2380,dwarf=http://10.1.1.4:2380,halfling=http://10.1.1.5:2380 \
-initial-cluster-state new -initial-advertise-peer-urls http://10.1.1.3:2380 \
-advertise-client-urls http://10.1.1.3:2379 \
-listen-client-urls http://127.0.0.1:2379,http://10.1.1.3:2379 \
-listen-peer-urls http://127.0.0.1:2380,http://10.1.1.3:2380 \
-data-dir /var/lib/etcd2 -name elf \
-heartbeat-interval 600 -election-timeout 6000 -debug > /tmp/etcd.log 2>&1
And wait a few seconds for it to become healthy:
# etcdctl --debug --endpoints=http://10.1.1.3:2379 cluster-health
Cluster-Endpoints: http://10.1.1.3:2379
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
member b5fa9b282334905d is healthy: got healthy result from http://10.1.1.4:2379
member bdd359f77713beb4 is healthy: got healthy result from http://10.1.1.3:2379
member d45ce0c430449c65 is healthy: got healthy result from http://10.1.1.5:2379
cluster is healthy
Then I try to submit a unit from my build server using fleetctl:
root@tree:/tmp/fleet-v0.11.7-linux-amd64# while :; do
./fleetctl --debug submit lb_feedcache_1002@.service && sleep 1;
./fleetctl --endpoint=http://10.1.1.3:2379 --driver=etcd --debug destroy lb_feedcache_1002@.service; sleep 1;
done
2016/07/26 12:45:05 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:05 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 404 Not Found
2016/07/26 12:45:05 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:45:05 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:05 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 201 Created
2016/07/26 12:45:05 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/26 12:45:05 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:05 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 200 OK
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:45:07 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:07 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 404 Not Found
2016/07/26 12:45:07 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:45:07 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:07 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 201 Created
2016/07/26 12:45:07 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/26 12:45:07 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:07 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:07 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:08 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:08 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:08 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:08 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:08 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:08 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:09 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:09 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:09 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
2016/07/26 12:45:09 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json
2016/07/26 12:45:10 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb_feedcache_1002%40.service?alt=json 500 Internal Server Error
2016/07/26 12:45:10 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
^C^C
As above, it first accepted the unit then on 2nd try, it went into a loop reporting response code 500 from the cluster, yet the keys were correctly created in etcd. See https://gist.github.com/waddles/0e121d46c0499eaaef9685eef06120f0 for more info.
HOWEVER
I see quite a different output and no errors when I specify an endpoint in the cluster to send to (otherwise it is the same command):
root@tree:/tmp/fleet-v0.11.7-linux-amd64# while :; do
./fleetctl --endpoint=http://10.1.1.4:2379 --debug submit lb_feedcache_1002@.service && sleep 1;
./fleetctl --endpoint=http://10.1.1.3:2379 --driver=etcd --debug destroy lb_feedcache_1002@.service; sleep 1;
done
2016/07/26 12:51:29 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:29 DEBUG fleetctl.go:612: Found Unit(lb_feedcache_1002@.service) in Registry, no need to recreate it
2016/07/26 12:51:29 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:31 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:31 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:31 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:33 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:33 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:33 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:35 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:35 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:35 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:37 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:37 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:37 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:39 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:39 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:39 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:41 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:41 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:41 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:43 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:43 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:43 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:45 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:45 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:45 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
2016/07/26 12:51:47 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/26 12:51:47 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/26 12:51:47 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
Destroyed lb_feedcache_1002@.service
^C
- I never seem to get a failure when the endpoint is specified in the submit command.
- I can specify
--endpoint=http://127.0.0.1:2379to force it to use the local etcd proxy and that also succeds reliably. - We also have a similar setup using older versions and legacy ports 4001 and 7001 which does not exhibit this behaviour.