Skip to content

Old NSM interfaces might remain in proxy after OOM kill #407

@zolug

Description

@zolug

Describe the bug
When proxy process gets killed by OOM, there's no chance for the proxy to close them obviously. However, NSM might not be able to clean them up either when the MaxTokenLifetime expires due to rpc error: code = PermissionDenied desc = no sufficient privileges.

The old interfaces do not seem to cause any problem, but they are confusing. (Their IPs are also leaked.)
The connected peer interface in LB also remains. (and can be used to reach the proxy).

To Reproduce
Steps to reproduce the behavior:

  1. Kill a proxy process using KILL signal. (For example from its worker.)
  2. Wait the container to recover and check the NSM interfaces towards LBs.
  3. Wait MaxTokenLifetime (10 minutes by default) and check the interfaces agains. If they are present check the logs of the local nsmgr or forwarder for the Close.

Expected behavior
The interfaces should be event removed (probably by NSM).
(Resolving the leaked IP issue sound much more complex.)

Context

  • Network Service Mesh: v1.6.2

  • Meridio: 115ca87
    Author: Lionel Jouin lionel.jouin@est.tech
    Date: Fri Mar 24 19:25:56 2023 +0100

    Use Helm chart to deploy e2e operator env

    • Helm charts are now used in the kind-operator test env
    • Enable Meridio and TAPA versions in the test scope
    • Set hostPID to true in vpp-forwarder

Logs

Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(2.1)    close={"id":"4967924d-fc4b-42df-b071-20824f7fa158","network_service":"load-balancer.trench-a.default","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"inode://4/4026532326","name":"load-balan-fb82"}},"context":{"ip_context":{"src_ip_addrs":["172.16.3.3/24","fd00:0:0:3::3/64"],"dst_ip_addrs":["172.16.3.2/24","fd00:0:0:3::2/64"],"excluded_prefixes":["11.0.0.0/22","11.0.4.0/23","11.0.6.0/24","12.0.0.1/32","12.0.27.146/32","12.0.116.178/32","12.0.155.217/32","12.0.184.219/32","12.0.201.243/32","12.0.250.201/32"]},"MTU":1446},"labels":{"nodeName":"vm-002"},"path":{"index":2,"path_segments":[{"name":"proxy-trench-a-cfqvk","id":"fb82d6e2-ecaf-496e-8072-0c90da0ddfd1-proxy-trench-a-cfqvk-load-balancer.trench-a.default-load-balancer-trench-a-7c885b6d9b-xgvx8","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL25zbWdyLXNhIl0sImV4cCI6MTY4MjAwODE3NH0.0QtsWuA3vEKWj0dEwIkkCgPu5Rf9uBD5hzT50GAqnUcWrQ0P_XfEZyCbX-Z7l-Bzzg_S0tiyAp0pcDVrVg-CFw","expires":{"seconds":1682008174,"nanos":455339069}},{"name":"nsmgr-tr5n7","id":"44a7009f-5a59-4329-a253-38ae6ccd93d5","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNjgyMDA4MTc1fQ.exoviVXnqYh4aYdzm1Qx9JpxVS5mYlMYzDQaPs7RiMjm7ilyy9rJJAQLhY258EHpd1gUa6vlqy5LfD9-bJqc2Q","expires":{"seconds":1682008175,"nanos":429149769}},{"name":"forwarder-vpp-thdt2","id":"4967924d-fc4b-42df-b071-20824f7fa158","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNjgyMDA4MTc2fQ.kcaJ8HkJmLu5A9Q02R7ACiolZ6JUS00Vcuzxt1MgC5_fq4Vc-ob57E-yQTqDky9suZjjlMrevtJWw9fLX4pwfg","expires":{"seconds":1682008176,"nanos":273042269},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"nsmgr-27j6j","id":"ddfd106c-e964-4f9f-bb54-9b6e1912a0f8","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNjgyMDA4MTc3fQ.8BBI4hhSXXw-yoXJwSsNP3O08x9Q4TOLAuS_oTOnu37ZtjnFo3DbC6nmAJCUpSrfEOQesq8Nk2MhFeLqrKZrWA","expires":{"seconds":1682008177,"nanos":830338082}},{"name":"forwarder-vpp-cp8sp","id":"eebdbac2-5393-4cd1-a148-6aa2fb41544d","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSJdLCJleHAiOjE2ODIwMDgxNzh9.0FEurb6Wm_U1OXlwGNQ7CEHWHwx1uB37m3I_oXYotGckctlmfYihBcNDlPvvX3LKWGKbjzihbf8CQfYvHhfOqw","expires":{"seconds":1682008178,"nanos":249942792},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"load-balancer-trench-a-7c885b6d9b-xgvx8","id":"30958abe-2bde-4f45-83a0-e67fa66d801b","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCJdLCJleHAiOjE2ODIwMDgxNzh9.5q5UYPUNFCV5aw-4yPSmG0sjGNW9JhsdXTlq_xO2wjVsV5Bcx1EMSNCevk5PBtLNthIg3O152flg6lLUjOQFZA","expires":{"seconds":1682008178,"nanos":250666242}}]},"network_service_endpoint_name":"load-balancer-trench-a-7c885b6d9b-xgvx8","payload":"ETHERNET"}
Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3)   ⎆ sdk/pkg/networkservice/common/updatetoken/updateTokenServer.Close()
Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3.1)     close-diff={"path":{"path_segments":{"2":{"expires":{"nanos":434310849,"seconds":1682008775},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNjgyMDA4Nzc1fQ.nbjavxp1tsSZxvPiiKISnXo8WFsVmRWhX9AL7-ZuSLEpPjtfh8Xpddia3SKjjL7hDWS6hAlMOpT1NTXqM8s0Zg"}}}}
Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(4)    ⎆ sdk/pkg/networkservice/common/authorize/authorizeServer.Close()
Apr 20 16:29:35.435ESC[31m [ERRO] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(4.1)      rpc error: code = PermissionDenied desc = no sufficient privileges;  Error returned from sdk/pkg/networkservice/common/authorize/authorizeServer.Close;      github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.logError;               /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/trace/common.go:206;  github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceServer).Close;              /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/trace/server.go:71;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceServer).Close;                /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/trace/server.go:86;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/common/updatetoken.(*updateTokenServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/common/updatetoken/server.go:84;   github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceServer).Close;              /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/trace/server.go:69;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceServer).Close;                /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/trace/server.go:86;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/common/begin.(*eventFactoryServer).Close.func1;            /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.2/pkg/networkservice/common/begin/event_factory.go:225; github.com/edwarnicke/serialize.(*Executor).process;            /root/go/pkg/mod/github.com/edwarnicke/serialize@v1.0.7/serialize.go:68;        runtime.goexit;         /go/src/runtime/asm_amd64.s:1571;       
Apr 20 16:29:35.435ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3.2)     close-response-diff={"id":"4967924d-fc4b-42df-b071-20824f7fa158","path":{"index":2,"path_segments":{"2":{"expires":{"nanos":434310849,"seconds":1682008775},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNjgyMDA4Nzc1fQ.nbjavxp1tsSZxvPiiKISnXo8WFsVmRWhX9AL7-ZuSLEpPjtfh8Xpddia3SKjjL7hDWS6hAlMOpT1NTXqM8s0Zg"}}}}
Apr 20 16:29:35.435ESC[31m [ERRO] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3.3)     Error returned from sdk/pkg/networkservice/common/authorize/authorizeServer.Close: rpc error: code = PermissionDenied desc = no sufficient privileges
Apr 20 16:29:35.854ESC[37m [TRAC] [id:5ae244a3-d784-4ec5-b3b4-0d5951b1b7db] [type:networkService] ESC[0m(2.1)    close={"id":"5ae244a3-d784-4ec5-b3b4-0d5951b1b7db","network_service":"load-balancer.trench-a.default","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"inode://4/4026532326","name":"load-balan-0515"}},"context":{"ip_context":{"src_ip_addrs":["172.16.3.5/24","fd00:0:0:3::5/64"],"dst_ip_addrs":["172.16.3.4/24","fd00:0:0:3::4/64"],"excluded_prefixes":["11.0.0.0/22","11.0.4.0/23","11.0.6.0/24","12.0.0.1/32","12.0.27.146/32","12.0.116.178/32","12.0.155.217/32","12.0.184.219/32","12.0.201.243/32","12.0.250.201/32"]},"MTU":1500},"labels":{"nodeName":"vm-002"},"path":{"index":2,"path_segments":[{"name":"proxy-trench-a-cfqvk","id":"051579bb-4ced-459c-bd02-e17721370210-proxy-trench-a-cfqvk-load-balancer.trench-a.default-load-balancer-trench-a-7c885b6d9b-k7fnk","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL25zbWdyLXNhIl0sImV4cCI6MTY4MjAwODE3NH0.uNZ4KPC2NZbqHDIxpXFbLhxxXSWhkZ1VbD3s3k4yVr86mMP0w24Vw7DsWmVyIv3ywDMMqUIHyOvA2tHnIdY7UQ","expires":{"seconds":1682008174,"nanos":472660529}},{"name":"nsmgr-tr5n7","id":"3902abe8-3678-43b9-bcbf-569c716a8685","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNjgyMDA4MTc1fQ.1FuetkXxojB8W_CsYdydm3Bswz_e27-ttZoTvfCL_lRQiHDvCQtvOcBhitrIRq64snn8dfMSWw_p6LoNwGCkAw","expires":{"seconds":1682008175,"nanos":852200649}},{"name":"forwarder-vpp-thdt2","id":"5ae244a3-d784-4ec5-b3b4-0d5951b1b7db","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSJdLCJleHAiOjE2ODIwMDgxNzZ9.LNIU7MDeu7AJtRjU2n5xPRa4Q-zn_uagGjQ5NQTe3oSrbLf-r_Mw6AvnzR_Z_3izq77Yq4FEu3P9MxmMjvnFbg","expires":{"seconds":1682008176,"nanos":643886529},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"load-balancer-trench-a-7c885b6d9b-k7fnk","id":"db2bd643-5505-4c38-9704-7eb194b51f44","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCJdLCJleHAiOjE2ODIwMDgxNzZ9.m0FIfhJPKfomZ6eXTBBWef7gjuc1xddGLfo959MFBg4zvNapP2Ephv-sYKVSrcunCRzE2uU88cyTdU0CkkXMFw","expires":{"seconds":1682008176,"nanos":644706639}}]},"network_service_endpoint_name":"load-balancer-trench-a-7c885b6d9b-k7fnk","payload":"ETHERNET"}

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    📋 To Do

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions