#1316 Duffy nodes stuck in provisioning/contextualizing loop #3
Closed: Fixed with Explanation by nphilipp. Opened by mrc0mmand.

It's that time of week again, so it looks like nodes are getting stuck in provisioning. I've been watching the virt-ec2-t2-centos-8s-x86_64 and metal-ec2-c5n-centos-8s-x86_64 pools for 30+ minutes, and the # of nodes in provisioning state hasn't changed:

{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 3,
    "levels": {
      "provisioning": 3,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 0,
      "deprovisioning": 0
    }
  }
}
{
  "action": "get",
  "pool": {
    "name": "virt-ec2-t2-centos-8s-x86_64",
    "fill_level": 10,
    "levels": {
      "provisioning": 2,
      "ready": 8,
      "contextualizing": 0,
      "deployed": 1,
      "deprovisioning": 0
    }
  }
}
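The symptom above is that the "provisioning" count stays non-zero and unchanged across snapshots taken 30+ minutes apart. As a rough illustration (not part of Duffy itself — the function, snapshot structure, and pool dicts below just mirror the JSON output in this ticket), a stuck pool can be detected by comparing two such snapshots:

```python
# Illustrative sketch: flag pools whose "provisioning" level is non-zero
# and has not changed between two snapshots of the pool levels shown above.
# The snapshot layout mimics the ticket's JSON; it is not a Duffy API.

def stuck_pools(earlier: dict, later: dict) -> list[str]:
    """Return pool names where nodes appear stuck in provisioning."""
    stuck = []
    for name, levels in later.items():
        prev = earlier.get(name)
        if prev is None:
            continue
        if levels["provisioning"] > 0 and levels["provisioning"] == prev["provisioning"]:
            stuck.append(name)
    return stuck


# Levels taken from the two JSON blobs above, ~30 minutes apart.
snapshot_t0 = {
    "metal-ec2-c5n-centos-8s-x86_64": {"provisioning": 3, "ready": 0},
    "virt-ec2-t2-centos-8s-x86_64": {"provisioning": 2, "ready": 8},
}
snapshot_t30 = {
    "metal-ec2-c5n-centos-8s-x86_64": {"provisioning": 3, "ready": 0},
    "virt-ec2-t2-centos-8s-x86_64": {"provisioning": 2, "ready": 8},
}

print(stuck_pools(snapshot_t0, snapshot_t30))
# → ['metal-ec2-c5n-centos-8s-x86_64', 'virt-ec2-t2-centos-8s-x86_64']
```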

@nphilipp @dkirwan can someone please take a look?

I’ve just done that and unstuck things; a summary follows.

Here’s what I found and did:

  • I looked at a recent provisioning problem: Duffy wanted to register the new node under a hostname already in use by another node in the database, one which had been stuck in deprovisioning since the end of June.
  • Altogether, there were 29 nodes in the database which were stuck like that, with creation times between June 2022 and June 2023.
  • I set them all to failed and put the reason in their metadata.
  • I also set the nodes that had failed to provision to failed. The task backend then replenished the starved pools.
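The cleanup steps above amount to: find nodes stuck in deprovisioning since before a cutoff, mark them failed, and record the reason in their metadata so their hostnames no longer block new registrations. A minimal sketch of that logic, assuming a hypothetical nodes table with state, creation-date, and JSON metadata columns (the real Duffy schema and column names are not shown in this ticket):

```python
import json
import sqlite3

# Hypothetical schema standing in for Duffy's real one: each node has a
# hostname, a lifecycle state, a creation date, and a JSON metadata blob.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (hostname TEXT, state TEXT, created TEXT, data TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        ("node-a", "deprovisioning", "2022-06-30", "{}"),  # stuck since June
        ("node-b", "deployed", "2023-06-01", "{}"),        # healthy
    ],
)

# Mark long-stuck deprovisioning nodes as failed and note why in metadata.
cutoff = "2023-07-01"
rows = conn.execute(
    "SELECT hostname, data FROM nodes WHERE state = 'deprovisioning' AND created < ?",
    (cutoff,),
).fetchall()
for hostname, data in rows:
    meta = json.loads(data)
    meta["error"] = "stuck in deprovisioning, manually set to failed"
    conn.execute(
        "UPDATE nodes SET state = 'failed', data = ? WHERE hostname = ?",
        (json.dumps(meta), hostname),
    )
conn.commit()

print(conn.execute("SELECT hostname, state FROM nodes ORDER BY hostname").fetchall())
# → [('node-a', 'failed'), ('node-b', 'deployed')]
```

With the stuck rows out of the way, the fill logic can register fresh nodes under the freed hostnames, which matches the observed effect of the pools replenishing afterwards.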

Metadata Update from @nphilipp:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

Metadata Update from @nphilipp:
- Issue assigned to nphilipp
