I noticed that for the past ~4 hours all our CI jobs have been stuck because the OCP pods can't reach Duffy:
sh-4.4$ hostname
cico-workspace-swgxj
sh-4.4$ ip a s dev eth0
3: eth0@if48076: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UP group default
    link/ether 0a:58:0a:83:01:60 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.131.1.96/23 brd 10.131.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::c033:cbff:fec1:b812/64 scope link
       valid_lft forever preferred_lft forever
sh-4.4$ ping duffy.ci.centos.org -c 10
PING duffy.ci.centos.org (172.19.0.18) 56(84) bytes of data.
--- duffy.ci.centos.org ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9236ms
This seems to be limited to that specific network, since I can reach Duffy just fine from my machines.
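In case it helps anyone who wants to detect this condition from inside a pod before a job hangs, here is a minimal sketch that pulls the packet-loss figure out of a ping summary like the one above. The helper name `packet_loss_percent` is hypothetical and not part of any CentOS CI or Duffy tooling; it only assumes the iputils-style summary line shown in the output.

```python
import re
from typing import Optional

# Hypothetical helper (an assumption for illustration, not existing CI tooling).
# Matches the "<N>% packet loss" part of an iputils-style ping summary line.
_LOSS = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss_percent(ping_output: str) -> Optional[float]:
    """Return the packet loss percentage, or None if no summary line is found."""
    match = _LOSS.search(ping_output)
    if match is None:
        return None
    return float(match.group(1))


# Summary copied from the ping output in this ticket.
sample = (
    "PING duffy.ci.centos.org (172.19.0.18) 56(84) bytes of data.\n"
    "--- duffy.ci.centos.org ping statistics ---\n"
    "10 packets transmitted, 0 received, 100% packet loss, time 9236ms\n"
)

loss = packet_loss_percent(sample)
if loss is not None and loss >= 100.0:
    print("Duffy unreachable from this pod")
```

A job could run this against the output of `ping -c 10 duffy.ci.centos.org` and fail fast instead of waiting on a node request that will never complete. Note that ICMP can be filtered on some networks, so total loss alone does not prove the API is down.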
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra, medium-gain, medium-trouble
Yes, there was a fiber cut (I read the discussion today on the internal community-cage list). There should have been a redundant path through another provider/link, but it seems traffic wasn't routed correctly over it, so the connection was "flapping". AFAICS it has all been resolved, and Zabbix is also happy now, with all nodes back under control. Can you confirm, and if so close the ticket?
Metadata Update from @arrfab: - Issue priority set to: Waiting on Reporter (was: Needs Review)
Yup, it looks like everything is running again, thanks!
Metadata Update from @mrc0mmand: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)