Nodes lose WireGuard after they reboot and are uncordoned #31

Closed
opened 2023-02-15 00:20:48 +01:00 by aaron · 9 comments
Owner

Nodes lose WireGuard after they reboot and are uncordoned

`ufw disable` does not fix the issue
`wg-quick down wg0 && wg-quick up wg0` does not fix the issue

aaron added this to the Yolo Ready milestone 2023-02-15 00:21:03 +01:00
Author
Owner
2023-02-17T01:24:05.949559309+01:00 stderr F level=info msg="Serving cilium API at unix:///var/run/cilium/cilium.sock" subsys=daemon
2023-02-17T01:24:05.949733345+01:00 stderr F level=info msg="Configuring Hubble server" eventQueueSize=8192 maxFlows=4095 subsys=hubble
2023-02-17T01:24:05.949886928+01:00 stderr F level=info msg="Starting local Hubble server" address="unix:///var/run/cilium/hubble.sock" subsys=hubble
2023-02-17T01:24:05.950230906+01:00 stderr F level=info msg="Beginning to read perf buffer" startTime="2023-02-17 00:24:05.949901255 +0000 UTC m=+3.145054149" subsys=monitor-agent
2023-02-17T01:24:05.950264147+01:00 stderr F level=info msg="Starting Hubble server" address=":4244" subsys=hubble
2023-02-17T01:24:06.276590333+01:00 stderr F level=info msg="Create endpoint request" addressing="&{10.0.2.209 e205f9b5-a1f6-4514-8697-57fa1b918214 fd00::25e ec143425-9083-4ce7-98a0-d4eb1222a469}" containerID=ae735905a7c68022a3365ad4e2472f111a97c0453780956b93a16dc267924014 datapathConfiguration="&{false false false false false <nil>}" interface=lxc2a57c286befd k8sPodName=rook-ceph/rook-ceph-crashcollector-worker3.yolokube.de-66dccd7d46-ndjpb labels="[]" subsys=daemon sync-build=true
2023-02-17T01:24:06.276708404+01:00 stderr F level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3691 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.276903745+01:00 stderr F level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3691 identityLabels="k8s:app=rook-ceph-crashcollector,k8s:ceph-version=17.2.5-0,k8s:ceph_daemon_id=crash,k8s:crashcollector=crash,k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=rook-ceph,k8s:io.cilium.k8s.policy.cluster=default,k8s:io.cilium.k8s.policy.serviceaccount=default,k8s:io.kubernetes.pod.namespace=rook-ceph,k8s:node_name=worker3.yolokube.de,k8s:rook-version=v1.10.11,k8s:rook_cluster=rook-ceph" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.277146661+01:00 stderr F level=info msg="Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination" labels="map[k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name:rook-ceph]" subsys=crd-allocator
2023-02-17T01:24:06.360790442+01:00 stderr F level=info msg="Create endpoint request" addressing="&{10.0.2.210 961f2a2a-8c1b-4dcf-80da-63b08e8ae502 fd00::224 5e6604e5-feb3-404e-92b2-34e091d68334}" containerID=bcba488c07009d550864a441d6eb336fc4a9a1bebdd54975a0bab1b8c18a5fbd datapathConfiguration="&{false false false false false <nil>}" interface=lxcb5b483665644 k8sPodName=rook-ceph/rook-ceph-mon-b-cd7fcffd-jqk6s labels="[]" subsys=daemon sync-build=true
2023-02-17T01:24:06.360954285+01:00 stderr F level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=1562 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.361140417+01:00 stderr F level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=1562 identityLabels="k8s:app.kubernetes.io/component=cephclusters.ceph.rook.io,k8s:app.kubernetes.io/created-by=rook-ceph-operator,k8s:app.kubernetes.io/instance=b,k8s:app.kubernetes.io/managed-by=rook-ceph-operator,k8s:app.kubernetes.io/name=ceph-mon,k8s:app.kubernetes.io/part-of=rook-ceph,k8s:app=rook-ceph-mon,k8s:ceph_daemon_id=b,k8s:ceph_daemon_type=mon,k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=rook-ceph,k8s:io.cilium.k8s.policy.cluster=default,k8s:io.cilium.k8s.policy.serviceaccount=default,k8s:io.kubernetes.pod.namespace=rook-ceph,k8s:mon=b,k8s:mon_cluster=rook-ceph,k8s:rook.io/operator-namespace=rook-ceph,k8s:rook_cluster=rook-ceph" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.444728474+01:00 stderr F level=info msg="Create endpoint request" addressing="&{10.0.2.235 30e42ab7-9f59-489a-bcf9-ceeda08604d7 fd00::245 150764f9-45b6-45ec-8f31-684e0da6a108}" containerID=94cd9ae28a834f6e59989a89fa145a443f55fe8656e392e74f4190fd36f2a4bd datapathConfiguration="&{false false false false false <nil>}" interface=lxc8ac4189b401d k8sPodName=rook-ceph/rook-ceph-osd-2-7878dbb475-46qbl labels="[]" subsys=daemon sync-build=true
2023-02-17T01:24:06.444874464+01:00 stderr F level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3159 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.445021313+01:00 stderr F level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3159 identityLabels="k8s:app.kubernetes.io/component=cephclusters.ceph.rook.io,k8s:app.kubernetes.io/created-by=rook-ceph-operator,k8s:app.kubernetes.io/instance=2,k8s:app.kubernetes.io/managed-by=rook-ceph-operator,k8s:app.kubernetes.io/name=ceph-osd,k8s:app.kubernetes.io/part-of=rook-ceph,k8s:app=rook-ceph-osd,k8s:ceph-osd-id=2,k8s:ceph-version=17.2.5-0,k8s:ceph_daemon_id=2,k8s:ceph_daemon_type=osd,k8s:device-class=nvme,k8s:failure-domain=worker3.yolokube.de,k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=rook-ceph,k8s:io.cilium.k8s.policy.cluster=default,k8s:io.cilium.k8s.policy.serviceaccount=rook-ceph-osd,k8s:io.kubernetes.pod.namespace=rook-ceph,k8s:osd=2,k8s:portable=false,k8s:rook-version=v1.10.11,k8s:rook.io/operator-namespace=rook-ceph,k8s:rook_cluster=rook-ceph,k8s:topology-location-host=worker3-yolokube-de,k8s:topology-location-root=default" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.596496457+01:00 stderr F level=info msg="regenerating all endpoints" reason= subsys=endpoint-manager
2023-02-17T01:24:06.596525167+01:00 stderr F level=info msg="Invalid state transition skipped" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3159 endpointState.from=waiting-for-identity endpointState.to=waiting-to-regenerate file=/go/src/github.com/cilium/cilium/pkg/endpoint/policy.go ipv4= ipv6= k8sPodName=/ line=526 subsys=endpoint
2023-02-17T01:24:06.596529388+01:00 stderr F level=info msg="Invalid state transition skipped" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3691 endpointState.from=waiting-for-identity endpointState.to=waiting-to-regenerate file=/go/src/github.com/cilium/cilium/pkg/endpoint/policy.go ipv4= ipv6= k8sPodName=/ line=526 subsys=endpoint
2023-02-17T01:24:06.596545272+01:00 stderr F level=info msg="Invalid state transition skipped" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=1562 endpointState.from=waiting-for-identity endpointState.to=waiting-to-regenerate file=/go/src/github.com/cilium/cilium/pkg/endpoint/policy.go ipv4= ipv6= k8sPodName=/ line=526 subsys=endpoint
2023-02-17T01:24:06.919627867+01:00 stderr F level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3222 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.919647743+01:00 stderr F level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3222 identityLabels="reserved:health" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:06.919671766+01:00 stderr F level=info msg="Identity of endpoint changed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3222 identity=4 identityLabels="reserved:health" ipv4= ipv6= k8sPodName=/ oldIdentity="no identity" subsys=endpoint
2023-02-17T01:24:07.092942506+01:00 stderr F level=info msg="Compiled new BPF template" BPFCompilationTime=1.220855645s file-path=/var/run/cilium/state/templates/85ad1b04990cfd0d50413cdd49afe7e2cb2e394824d5b5fa728b0058df8f728b/bpf_host.o subsys=datapath-loader
2023-02-17T01:24:07.229459759+01:00 stderr F level=info msg="Rewrote endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=8 endpointID=440 identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:11.566057278+01:00 stderr F level=info msg="Compiled new BPF template" BPFCompilationTime=4.641350347s file-path=/var/run/cilium/state/templates/214339f189962bc3104f2a2c7ecf1817b52e50461364ad2e6c93a4480bf1df42/bpf_lxc.o subsys=datapath-loader
2023-02-17T01:24:11.715277214+01:00 stderr F level=info msg="Rewrote endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=8 endpointID=3222 identity=4 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2023-02-17T01:24:20.879974645+01:00 stderr F level=warning msg="No response from probe within 15 seconds" probe=kubernetes subsys=status
2023-02-17T01:24:49.599196901+01:00 stderr F level=warning msg="github.com/cilium/cilium/pkg/k8s/watchers/namespace.go:65: watch of *v1.Namespace ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding" subsys=klog
2023-02-17T01:24:49.599302987+01:00 stderr F level=warning msg="github.com/cilium/cilium/pkg/k8s/watchers/service.go:71: watch of *v1.Service ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding" subsys=klog

Logs of the cilium-agent around the time of the loss of communication. Communication via WireGuard was lost at about 01:24:06 in this log.
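For reference, the WireGuard peer state can be inspected directly on an affected node. This is a generic diagnostic sketch (interface name `wg0` as used in this setup), not output from this incident:

```shell
# Show when each peer last completed a handshake; a stale or missing
# handshake means the tunnel itself is down, not just routing on top of it.
wg show wg0 latest-handshakes

# Show per-peer transfer counters; tx growing while rx stays at zero
# suggests packets leave the node but replies never arrive (e.g. filtered).
wg show wg0 transfer
```

These commands need root and a live `wg0` interface, so they only make sense on the node itself.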

Author
Owner
2023-02-17T01:23:59.862129919+01:00 stderr F I0217 00:23:59.861945       1 server.go:655] "Version info" version="v1.26.1"
2023-02-17T01:23:59.862168678+01:00 stderr F I0217 00:23:59.861997       1 server.go:657] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
2023-02-17T01:23:59.871997622+01:00 stderr F I0217 00:23:59.871808       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=262144
2023-02-17T01:23:59.872312323+01:00 stderr F I0217 00:23:59.872137       1 conntrack.go:100] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_close_wait" value=3600
2023-02-17T01:23:59.873023071+01:00 stderr F I0217 00:23:59.872809       1 config.go:317] "Starting service config controller"
2023-02-17T01:23:59.873076019+01:00 stderr F I0217 00:23:59.872830       1 config.go:226] "Starting endpoint slice config controller"
2023-02-17T01:23:59.873239596+01:00 stderr F I0217 00:23:59.873019       1 config.go:444] "Starting node config controller"
2023-02-17T01:23:59.873279508+01:00 stderr F I0217 00:23:59.873133       1 shared_informer.go:273] Waiting for caches to sync for node config
2023-02-17T01:23:59.873299784+01:00 stderr F I0217 00:23:59.873137       1 shared_informer.go:273] Waiting for caches to sync for service config
2023-02-17T01:23:59.874158916+01:00 stderr F I0217 00:23:59.873602       1 shared_informer.go:273] Waiting for caches to sync for endpoint slice config
2023-02-17T01:23:59.974079536+01:00 stderr F I0217 00:23:59.973798       1 shared_informer.go:280] Caches are synced for node config
2023-02-17T01:23:59.974136566+01:00 stderr F I0217 00:23:59.973880       1 shared_informer.go:280] Caches are synced for service config
2023-02-17T01:23:59.975501816+01:00 stderr F I0217 00:23:59.975322       1 shared_informer.go:280] Caches are synced for endpoint slice config
2023-02-17T01:24:44.889417768+01:00 stderr F W0217 00:24:44.889114       1 reflector.go:347] vendor/k8s.io/client-go/informers/factory.go:150: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2023-02-17T01:24:44.88948203+01:00 stderr F W0217 00:24:44.889162       1 reflector.go:347] vendor/k8s.io/client-go/informers/factory.go:150: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2023-02-17T01:24:44.889652849+01:00 stderr F W0217 00:24:44.889449       1 reflector.go:347] vendor/k8s.io/client-go/informers/factory.go:150: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2023-02-17T01:25:15.708466592+01:00 stderr F W0217 00:25:15.708129       1 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://10.10.0.1:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=762444": dial tcp 10.10.0.1:6443: i/o timeout

The kube-proxy logs do not show any entries at the time of the crash; only the loss of communication with the API server is logged, about 38 s later.

Author
Owner
Feb 17 01:23:55 worker3 kernel: [   24.233761] [UFW BLOCK] IN=enp0s31f6 OUT=wg0 MAC=90:1b:0e:9e:eb:96:00:31:46:0d:3e:e3:08:00 SRC=151.106.39.209 DST=138.201.131.202 LEN=675 TOS=0x00 PREC=0x00 TTL=116 ID=16575 PROTO=UDP SPT=17071 DPT=8855 LEN=655 
Feb 17 01:23:56 worker3 kernel: [   24.593510] [UFW BLOCK] IN=enp0s31f6 OUT=wg0 MAC=90:1b:0e:9e:eb:96:00:31:46:0d:3e:e3:08:00 SRC=51.158.164.2 DST=138.201.131.202 LEN=40 TOS=0x00 PREC=0x00 TTL=245 ID=28806 PROTO=TCP SPT=53845 DPT=36216 WINDOW=1024 RES=0x00 SYN URGP=0 
Feb 17 01:24:02 worker3 kernel: [   30.373260] [UFW BLOCK] IN=enp0s31f6 OUT=wg0 MAC=90:1b:0e:9e:eb:96:00:31:46:0d:3e:e3:08:00 SRC=185.225.74.55 DST=138.201.131.202 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=54321 PROTO=TCP SPT=50344 DPT=80 WINDOW=65535 RES=0x00 SYN URGP=0 
Feb 17 01:24:05 worker3 kernel: [   33.491773] [UFW BLOCK] IN=cilium_host OUT=wg0 MAC=ca:50:47:c8:15:34:e2:05:a0:da:92:df:08:00 SRC=10.0.1.219 DST=10.0.2.89 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=62506 DF PROTO=TCP SPT=51634 DPT=4240 WINDOW=65170 RES=0x00 SYN URGP=0 
Feb 17 01:24:56 worker3 kernel: [   84.170883] [UFW BLOCK] IN=enp0s31f6 OUT=wg0 MAC=90:1b:0e:9e:eb:96:00:31:46:0d:3e:e3:08:00 SRC=88.202.190.134 DST=138.201.131.202 LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=54321 PROTO=TCP SPT=1400 DPT=1400 WINDOW=65535 RES=0x00 SYN URGP=0 
Feb 17 01:25:03 worker3 kernel: [   91.106091] [UFW BLOCK] IN=enp0s31f6 OUT=wg0 MAC=90:1b:0e:9e:eb:96:00:31:46:0d:3e:e3:08:00 SRC=151.106.39.209 DST=138.201.131.202 LEN=675 TOS=0x00 PREC=0x00 TTL=116 ID=48734 PROTO=UDP SPT=34545 DPT=8856 LEN=655 

The ufw logs are interesting: there is an entry showing Cilium traffic getting blocked, namely a packet from cilium_host towards wg0 (10.0.1.219 → 10.0.2.89, port 4240, the cilium-health probe).
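If that UFW drop of cilium_host→wg0 traffic is indeed the cause, a possible workaround would be to trust the two interfaces in ufw. This is only a sketch with interface names taken from the logs above; the rules have not been verified against this cluster:

```shell
# Allow incoming traffic on the Cilium and WireGuard interfaces.
ufw allow in on cilium_host
ufw allow in on wg0

# Forwarded traffic (pod traffic routed between interfaces, like the
# blocked 10.0.1.219 -> 10.0.2.89:4240 health probe above) is matched
# by ufw's route rules, not by the plain allow rules:
ufw route allow in on cilium_host out on wg0
ufw route allow in on wg0 out on cilium_host

ufw reload
```

Since `ufw disable` reportedly did not help either, this may only address part of the problem.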

Author
Owner
Feb 17 01:23:58 worker3 kernel: [   27.016023] Key type ceph registered
Feb 17 01:23:58 worker3 kernel: [   27.016360] libceph: loaded (mon/osd proto 15/24)
Feb 17 01:23:58 worker3 kernel: [   27.026889] rbd: loaded (major 253)
Feb 17 01:24:02 worker3 kernel: [   30.373260] [UFW BLOCK] IN=enp0s31f6 OUT=wg0 MAC=90:1b:0e:9e:eb:96:00:31:46:0d:3e:e3:08:00 SRC=185.225.74.55 DST=138.201.131.202 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=54321 PROTO=TCP SPT=50344 DPT=80 WINDOW=65535 RES=0x00 SYN URGP=0 
Feb 17 01:24:03 worker3 kernel: [   31.615304] Initializing XFRM netlink socket
Feb 17 01:24:04 worker3 kernel: [   32.923193] IPv6: ADDRCONF(NETDEV_CHANGE): cilium_host: link becomes ready
Feb 17 01:24:05 worker3 kernel: [   33.430584] NET: Registered protocol family 38
Feb 17 01:24:05 worker3 kernel: [   33.491773] [UFW BLOCK] IN=cilium_host OUT=wg0 MAC=ca:50:47:c8:15:34:e2:05:a0:da:92:df:08:00 SRC=10.0.1.219 DST=10.0.2.89 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=62506 DF PROTO=TCP SPT=51634 DPT=4240 WINDOW=65170 RES=0x00 SYN URGP=0 
Feb 17 01:24:05 worker3 kernel: [   34.246965] IPv6: ADDRCONF(NETDEV_CHANGE): cilium: link becomes ready
Feb 17 01:24:05 worker3 kernel: [   34.247069] IPv6: ADDRCONF(NETDEV_CHANGE): lxc_health: link becomes ready
Feb 17 01:24:06 worker3 kernel: [   34.582907] eth0: renamed from tmpae735
Feb 17 01:24:06 worker3 kernel: [   34.610967] IPv6: ADDRCONF(NETDEV_CHANGE): lxc2a57c286befd: link becomes ready
Feb 17 01:24:06 worker3 kernel: [   34.670761] eth0: renamed from tmpbcba4
Feb 17 01:24:06 worker3 kernel: [   34.694947] IPv6: ADDRCONF(NETDEV_CHANGE): lxcb5b483665644: link becomes ready
Feb 17 01:24:06 worker3 kernel: [   34.742891] eth0: renamed from tmp94cd9
Feb 17 01:24:06 worker3 kernel: [   34.778985] IPv6: ADDRCONF(NETDEV_CHANGE): lxc8ac4189b401d: link becomes ready

Kernel log for the same time window.

Author
Owner
```
Feb 17 01:24:02 worker3 containerd[812]: time="2023-02-17T01:24:02.853475012+01:00" level=error msg="failed to reload cni configuration after receiving fs change event(\"/etc/cni/net.d/05-cilium.conf\": REMOVE)" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Feb 17 01:24:04 worker3 systemd-udevd[1404]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:04 worker3 systemd-udevd[1404]: Using default interface naming scheme 'v247'.
Feb 17 01:24:04 worker3 systemd-udevd[1357]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:04 worker3 systemd-udevd[1357]: Using default interface naming scheme 'v247'.
Feb 17 01:24:04 worker3 systemd-udevd[1404]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:05 worker3 systemd-udevd[1357]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:05 worker3 systemd-udevd[1404]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:05 worker3 systemd-udevd[1357]: ethtool: could not get ethtool features for cilium
Feb 17 01:24:05 worker3 systemd-udevd[1357]: cilium: Could not set offload features, ignoring: No such device
Feb 17 01:24:06 worker3 systemd-udevd[1404]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:06 worker3 systemd-udevd[1357]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:06 worker3 systemd-udevd[1357]: ethtool: could not get ethtool features for tmpae735
Feb 17 01:24:06 worker3 systemd-udevd[1357]: tmpae735: Could not set offload features, ignoring: No such device
Feb 17 01:24:06 worker3 systemd-udevd[1357]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:06 worker3 systemd-udevd[1404]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:06 worker3 systemd-udevd[1357]: ethtool: could not get ethtool features for tmpbcba4
Feb 17 01:24:06 worker3 systemd-udevd[1357]: tmpbcba4: Could not set offload features, ignoring: No such device
Feb 17 01:24:06 worker3 systemd-udevd[1424]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:06 worker3 systemd-udevd[1460]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 17 01:24:06 worker3 systemd-udevd[1424]: ethtool: could not get ethtool features for tmp94cd9
Feb 17 01:24:06 worker3 systemd-udevd[1424]: tmp94cd9: Could not set offload features, ignoring: No such device
Feb 17 01:24:06 worker3 systemd-udevd[1460]: Using default interface naming scheme 'v247'.
Feb 17 01:24:06 worker3 systemd-udevd[1424]: Using default interface naming scheme 'v247'.
```

daemon.log

Owner

Update: I used my test cluster to debug this a bit. The problem seems to be caused by Cilium's iptables rules, and it only occurs when IPv6 is enabled; with an IPv4-only configuration it does not appear.
That makes sense: WireGuard communicates over the main interface (public IPv6), and Cilium only creates those iptables rules when IPv6 is enabled. 🚧
Unfortunately, the node has to be reinstalled to recover. Redeploying the CNI and deleting the iptables chains does not fix it. 🙈💩
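On a live node, the Cilium-managed chains can be inspected with `ip6tables-save | grep -i cilium` (as root). As a sketch of what that filter picks out, here is the same grep over a tiny illustrative dump; the two sample rules are made up for illustration and are not real output from this node, though `CILIUM_INPUT`/`CILIUM_FORWARD` are names Cilium actually uses:

```shell
# Filter Cilium-managed chains/rules out of an ip6tables-save style dump.
# Real usage on a node: ip6tables-save | grep -i cilium
dump='*filter
:INPUT ACCEPT [0:0]
:CILIUM_INPUT - [0:0]
:CILIUM_FORWARD - [0:0]
-A INPUT -j CILIUM_INPUT
-A FORWARD -j CILIUM_FORWARD
COMMIT'

echo "$dump" | grep -i cilium
# -> :CILIUM_INPUT - [0:0]
# -> :CILIUM_FORWARD - [0:0]
# -> -A INPUT -j CILIUM_INPUT
# -> -A FORWARD -j CILIUM_FORWARD
```

Note that flushing these chains by hand is exactly what did not recover the node here; the filter is only useful for seeing what Cilium left behind.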

Owner

> Update: I used my test cluster to debug this a bit. The problem seems to be caused by Cilium's iptables rules, and it only occurs when IPv6 is enabled; with an IPv4-only configuration it does not appear.
> That makes sense: WireGuard communicates over the main interface (public IPv6), and Cilium only creates those iptables rules when IPv6 is enabled. 🚧
> Unfortunately, the node has to be reinstalled to recover. Redeploying the CNI and deleting the iptables chains does not fix it. 🙈💩

I have rechecked Cilium on the test cluster. Running Cilium only on the WireGuard interface did not solve the problem either. Maybe Cilium's host policies could be used to allow the WireGuard port. 🤔🤞
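Host policies in Cilium are expressed as `CiliumClusterwideNetworkPolicy` objects with a `nodeSelector` (they require the host firewall to be enabled in the Cilium config). A minimal sketch of what such a policy might look like for the WireGuard port; the policy name is made up, and 51820/UDP is WireGuard's default port, which may differ from what `wg0` actually uses here:

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-wireguard          # hypothetical name
spec:
  nodeSelector: {}               # match all nodes (host endpoints)
  ingress:
    - fromEntities:
        - all
      toPorts:
        - ports:
            - port: "51820"      # WireGuard default; adjust to the port wg0 listens on
              protocol: UDP
```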

Owner

A weird workaround would be to drop Cilium and switch back to Flannel after all... 🤔

Author
Owner

With Flannel it works fine.

aaron closed this issue 2023-03-12 23:23:31 +01:00