We recently discovered many machines in EC2 that are leftovers from failed Duffy provisioning attempts.
When Duffy runs the Ansible playbook to create a machine, the run can fail, and Duffy then retries. However, it seems that in some cases the host is actually created before something else goes wrong (e.g. a communication error), so Duffy retries without deleting the first host.
We should look into some form of reporting / cleanup mechanism (even just a Zabbix notification) that keeps these hosts from building up again.
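As a starting point for discussion, here is a minimal sketch of the reporting half: compare what exists in EC2 against what Duffy knows about, and flag anything unaccounted for. Everything here is an assumption for illustration: the function name `find_orphans`, the record shape (`id`, `state`, `launch_time`), and the one-hour grace period are hypothetical, and fetching the two inputs (from the EC2 API and from Duffy's own records) is left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: none of these names come from Duffy itself.
# States in which an instance still exists and may cost money.
LIVE_STATES = ("pending", "running", "stopped")


def find_orphans(ec2_instances, duffy_known_ids, now, grace=timedelta(hours=1)):
    """Return EC2 instance records that Duffy does not track.

    ec2_instances:   iterable of dicts with "id", "state", "launch_time"
                     (assumed shape, built by whatever lists the instances)
    duffy_known_ids: set of instance IDs Duffy currently has on record
    now:             timezone-aware datetime used as the reference time
    grace:           instances younger than this are skipped, since a
                     provision may still be in progress
    """
    orphans = []
    for inst in ec2_instances:
        if inst["state"] not in LIVE_STATES:
            continue  # already terminated or shutting down
        if inst["id"] in duffy_known_ids:
            continue  # Duffy knows about this host
        if now - inst["launch_time"] < grace:
            continue  # too young to judge, may still be mid-provision
        orphans.append(inst)
    return orphans
```

The size of the returned list could be sent to Zabbix as a simple item so that a trigger fires when it is non-zero; actual deletion could be added later once the report is trusted.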
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra
Due to the CentOS infra tracker migration (https://lists.centos.org/hyperkitty/list/devel@lists.centos.org/thread/V3ZLBYFHMWSZFXOVGVU7R6P2X6ELGY5V/), this ticket is now closed, but the corresponding ticket is open at https://gitlab.com/CentOS/infra/tracker/-/issues/1648. Metadata for this ticket has been imported into the new GitLab tracker, but ownership has not, so feel free to visit the migrated ticket to subscribe and get status updates.
Metadata Update from @arrfab: - Issue close_status updated to: Ticket moved to Gitlab tracker - Issue status updated to: Closed (was: Open)