#771 Unstable Duffy machines
Closed: Fixed by mrc0mmand. Opened by mrc0mmand.

Hello,

Since Friday evening, all machines from Duffy have been somewhat unstable, i.e. they time out randomly, can't connect to various sites over the network, or are misconfigured (e.g. missing dnf on C8S nodes):

2022-05-08 04:38:33,775 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n46.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
ssh: connect to host n46.pufty port 22: Connection timed out
2022-05-08 05:54:33,424 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n47.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n47.pufty,172.19.3.111' (ECDSA) to the list of known hosts.
bash: dnf: command not found
2022-05-07 15:56:42,820 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n16.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n16.pufty,172.19.3.80' (ECDSA) to the list of known hosts.
27 files removed
CentOS Stream 8 - AppStream                      51 MB/s |  22 MB     00:00    
CentOS Stream 8 - BaseOS                         51 MB/s |  22 MB     00:00    
CentOS Stream 8 - Extras                        1.6 MB/s |  18 kB     00:00    
CentOS Stream 8 - Extras common packages        377 kB/s | 4.1 kB     00:00    
Metadata cache created.
Last metadata expiration check: 0:00:06 ago on Sun 08 May 2022 12:02:26 AM BST.
Package bash-4.4.20-4.el8.x86_64 is already installed.
Package git-2.31.1-2.el8.x86_64 is already installed.
Package rsync-3.1.3-14.el8.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
Cloning into 'systemd-centos-ci'...
fatal: unable to access 'https://github.com/systemd/systemd-centos-ci/': Failed to connect to github.com port 443: No route to host
2022-05-07 14:17:34,782 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n11.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n11.pufty,172.19.3.75' (ECDSA) to the list of known hosts.
43 files removed
CentOS-8-stream - Gluster 10                    0.0  B/s |   0  B     00:15    
Errors during downloading metadata for repository 'centos-gluster10':
  - Curl error (7): Couldn't connect to server for http://mirror2.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror2.ci.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror.ci.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror3.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror3.ci.centos.org port 80: No route to host]
  - Curl error (7): Couldn't connect to server for http://mirror4.ci.centos.org/centos/8-stream/storage/x86_64/gluster-10/repodata/repomd.xml [Failed to connect to mirror4.ci.centos.org port 80: No route to host]
Error: Failed to download metadata for repo 'centos-gluster10': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

This doesn't always happen, but it affects a significant portion of the CI runs (~50%).
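For anyone who wants to quantify the failure modes in their own CI logs, here is a minimal Python sketch that counts the three error signatures seen in the excerpts above. The category names (`ssh_timeout`, `missing_dnf`, `no_route`) and the `classify` helper are made up for illustration and are not part of agent-control or Duffy.

```python
import re
from collections import Counter

# Error strings copied from the log excerpts in this ticket.
# The category names on the left are illustrative labels only.
SIGNATURES = [
    ("ssh_timeout", re.compile(r"ssh: connect to host \S+ port 22: Connection timed out")),
    ("missing_dnf", re.compile(r"bash: dnf: command not found")),
    ("no_route", re.compile(r"No route to host")),
]


def classify(log_text):
    """Count failure categories in log_text (at most one category per line)."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES:
            if pattern.search(line):
                counts[name] += 1
                break  # first matching signature wins for this line
    return counts


sample = """\
ssh: connect to host n46.pufty port 22: Connection timed out
bash: dnf: command not found
fatal: unable to access 'https://github.com/systemd/systemd-centos-ci/': Failed to connect to github.com port 443: No route to host
Complete!
"""

print(classify(sample))
```

Note that a single failed dnf run can log several "No route to host" lines (one per mirror), so the counts are per line, not per CI job.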


Metadata Update from @arrfab:
- Issue assigned to arrfab

Metadata Update from @arrfab:
- Issue tagged with: centos-ci-infra, high-gain, medium-trouble

After some investigation we confirmed that there were internal network issues affecting reachability of some chassis/network components (and thus the routers used to reach the internet, etc.), but tests today all seem to work fine (as confirmed by @mrc0mmand on IRC in #centos-ci).
So closing for now.

Metadata Update from @arrfab:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

Oh well, the issue seems to be back:

2022-05-09 18:15:44,909 [agent-control/execute_local_command] INFO: Executing a LOCAL command: /usr/bin/ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=180 -o TCPKeepAlive=yes -o ServerAliveInterval=2 -l root n60.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n60.pufty,172.19.3.124' (ECDSA) to the list of known hosts.
bash: dnf: command not found
ServerAliveInterval=2 -l root n62.pufty dnf clean all && dnf makecache && dnf -y install bash git rsync && rm -fr systemd-centos-ci && git clone https://github.com/systemd/systemd-centos-ci
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'n62.pufty,172.19.3.126' (ECDSA) to the list of known hosts.
bash: dnf: command not found

~7 occurrences in the past 5 hours. It looks like it appears only during non-working hours :-)

Metadata Update from @mrc0mmand:
- Issue status updated to: Open (was: Closed)

Going through the logs, the issue seems to affect only the pufty chassis; on other chassis everything runs and works as expected.
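A rough sketch of how such a per-chassis breakdown could be pulled out of the logs, assuming node names follow the `nNN.<chassis>` pattern visible in the ssh commands above. The `failures_by_chassis` helper and the `n5.example` host in the sample are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict

# Node and chassis as they appear in the logged ssh command ("-l root n60.pufty ...").
HOST_RE = re.compile(r"-l root (n\d+)\.(\w+) ")
# Error strings taken from the log excerpts in this ticket.
FAILURE_RE = re.compile(r"Connection timed out|command not found|No route to host")


def failures_by_chassis(log_text):
    """Map chassis name -> set of nodes that logged at least one known failure."""
    result = defaultdict(set)
    current = None  # (node, chassis) from the most recent ssh command line
    for line in log_text.splitlines():
        match = HOST_RE.search(line)
        if match:
            current = match.groups()
            continue
        if current and FAILURE_RE.search(line):
            node, chassis = current
            result[chassis].add(node)
    return dict(result)


# "n5.example" is a made-up host standing in for a node on another chassis.
sample = """\
/usr/bin/ssh -t -o ConnectTimeout=180 -l root n60.pufty dnf clean all
bash: dnf: command not found
/usr/bin/ssh -t -o ConnectTimeout=180 -l root n62.pufty dnf clean all
bash: dnf: command not found
/usr/bin/ssh -t -o ConnectTimeout=180 -l root n5.example dnf clean all
Complete!
"""

print(failures_by_chassis(sample))
```

Lines that appear before any ssh command (such as a truncated log fragment) are ignored, because there is no node to attribute them to.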

Thanks to @arrfab, who did some of his magic on the pufty chassis, the issue is no longer present, hence closing.

Metadata Update from @mrc0mmand:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
