Hello,
Since Friday evening, all machines from Duffy have been somewhat unstable: they time out randomly, can't connect to various sites over the network, or are misconfigured (e.g. dnf is missing on C8S nodes):
2022-05-08 04:38:33,775 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n46.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
ssh: connect to host n46.pufty port 22: Connection timed out

2022-05-08 05:54:33,424 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n47.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n47.pufty,172.19.3.111' (ECDSA) to the list of known hosts.
bash: dnf: command not found

2022-05-07 15:56:42,820 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n16.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n16.pufty,172.19.3.80' (ECDSA) to the list of known hosts.
27 files removed
CentOS Stream 8 - AppStream 51 MB/s | 22 MB 00:00
CentOS Stream 8 - BaseOS 51 MB/s | 22 MB 00:00
CentOS Stream 8 - Extras 1.6 MB/s | 18 kB 00:00
CentOS Stream 8 - Extras common packages 377 kB/s | 4.1 kB 00:00
Metadata cache created.
Last metadata expiration check: 0:00:06 ago on Sun 08 May 2022 12:02:26 AM BST.
Package bash-4.4.20-4.el8.x86_64 is already installed.
Package git-2.31.1-2.el8.x86_64 is already installed.
Package rsync-3.1.3-14.el8.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
Cloning into 'systemd-centos-ci'...
fatal: unable to access 'https://github.com/systemd/systemd-centos-ci/': Failed to connect to github.com port 443: No route to host

2022-05-07 14:17:34,782 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n11.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n11.pufty,172.19.3.75' (ECDSA) to the list of known hosts.
43 files removed
CentOS-8-stream - Gluster 10 0.0 B/s | 0 B 00:15
Errors during downloading metadata for repository 'centos-gluster10':
  - Curl error (7): Couldn't connect to server for http://mirror2.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror2.ci.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror.ci.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror3.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror3.ci.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror4.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror4.ci.centos.org port 80: No route to host]
Error: Failed to download metadata for repo 'centos-gluster10': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
This doesn't always happen, but it affects a significant portion of the CI runs (~50%).
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra, high-gain, medium-trouble
After some investigation, we confirmed that there were internal network issues reaching some chassis/network components (and therefore the routers used to reach the internet, etc.), but today's tests all seem to work fine (as confirmed by @mrc0mmand on IRC in #centos-ci). Closing for now.
Metadata Update from @arrfab: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Oh well, the issue seems to be back:
2022-05-09 18:15:44,909 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n60.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n60.pufty,172.19.3.124' (ECDSA) to the list of known hosts.
bash: dnf: command not found

ServerAliveInterval=2 -l root n62.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n62.pufty,172.19.3.126' (ECDSA) to the list of known hosts.
bash: dnf: command not found
~7 occurrences in the past 5 hours. It looks like it appears only during non-working hours :-)
Metadata Update from @mrc0mmand: - Issue status updated to: Open (was: Closed)
Going through the logs, it seems the issue affects only the pufty chassis, since on other chassis everything runs and works as expected.
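For anyone who wants to repeat this kind of log triage, here is a rough Python sketch of how the failures could be grouped by chassis. It is only an illustration, not the tooling actually used: the function names (`classify`, `tally`) are made up, the failure signatures are simply the strings visible in the excerpts in this ticket, and it assumes the chassis is the part of the hostname after the first dot (e.g. `pufty` in `n46.pufty`).

```python
import re
from collections import Counter

# Failure signatures copied from the log excerpts in this ticket.
SIGNATURES = {
    "ssh_timeout": "Connection timed out",
    "missing_dnf": "dnf: command not found",
    "no_route": "No route to host",
}

# Matches the target host in the ssh invocation, e.g. "-l root n46.pufty".
HOST_RE = re.compile(r"-l root (\S+)")


def classify(line):
    """Return (chassis, failure_kind) for one log entry, or None if nothing matches."""
    host_match = HOST_RE.search(line)
    if host_match is None:
        return None
    host = host_match.group(1)
    # Assumption: the part after the first dot names the chassis.
    chassis = host.split(".", 1)[1] if "." in host else "unknown"
    for kind, needle in SIGNATURES.items():
        if needle in line:
            return chassis, kind
    return None


def tally(lines):
    """Count failures per (chassis, failure_kind) over an iterable of log entries."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result] += 1
    return counts
```

Feeding it one log entry per string (command plus its output joined together) gives a counter keyed by chassis and failure kind, which makes it easy to see whether a single chassis dominates.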
Thanks to @arrfab, who did some of his magic on the pufty chassis, the issue is no longer present, hence closing.
Metadata Update from @mrc0mmand: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)