#92 Hardware disk issue in one server (colo-move/IAD2)
Closed: Fixed by dkirwan. Opened by smooge.

Duplicating from Fedora infra ticket 9074

https://pagure.io/fedora-infrastructure/issue/9074

Creating this ticket here, but it was already discussed with @smooge during the colo-move meetings.
One of the Dell nodes came with a DOA SSD, so this ticket is just for tracking purposes.


Have worked out the steps to get a drive replaced in the IAD2 location.

  1. Get into the hardware's remote management (iDRAC in this case) and make sure the BIOS, PERC controller firmware and iDRAC are up to date. Some issues actually get fixed by this (we had a couple of R630s with "good" drives finally fail, and vice versa).

  2. Gather the appropriate info:
     - serial number
     - hardware troubleshooting data (usually an XML tarball)

  3. Call Dell support and get an initial ticket opened.
  4. Work with the phone support tech to get a hardware replacement order sent.
  5. Give them the IAD2 location and let them know that the tech needs to call 3-4 hours in advance so you can get a ticket in for them.
  6. Remind them that if the hardware is sent to the datacenter you need its tracking info, or the hardware may get refused at the site. [Dell seems to do the right thing and ship the hardware to their contractors (Unisys and others) via FedEx instead.]
  7. Open an internal Red Hat helpdesk datacenter ticket to let them know that a tech will need access to the site and that hardware needs to be ready to be received. Look up the location and rack space (101 for CentOS hardware).
  8. Get an email and/or phone call from the tech the next business day.
  9. Get their info, let IT know, and then get the access ticket. Give this to the tech.
  10. The tech will get escorted to the cage and do the hardware change as needed. They may call to see if you can fix/test things.
  11. Close out the tickets.
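
To keep the details straight between the Dell call and the datacenter ticket, the facts gathered above could be held in one small record. This is only a rough sketch, not tooling that exists in this infra: the class name `DriveReplacement`, its fields, and the sample values `ABC1234` and `troubleshoot-data.tar` are all made up for illustration. Only the IAD2 site, the rack space 101 and the 3-4 hour notice come from the steps above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DriveReplacement:
    """Facts to have on hand before calling Dell and opening the datacenter ticket.

    Hypothetical helper: names and layout are illustrative only.
    """
    serial_number: str = ""          # serial number gathered in step 2
    troubleshoot_archive: str = ""   # path to the exported troubleshooting data
    site: str = "IAD2"
    rack_space: str = "101"          # value the steps give for CentOS hardware
    notice_hours: str = "3-4"        # lead time the tech must give before arriving
    tracking_numbers: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "serial_number": self.serial_number,
            "troubleshoot_archive": self.troubleshoot_archive,
        }
        return [name for name, value in required.items() if not value.strip()]

    def dell_call_notes(self) -> str:
        """Talking points for the phone call with Dell support."""
        return "\n".join([
            f"Serial number: {self.serial_number}",
            f"Troubleshooting data: {self.troubleshoot_archive}",
            f"Site: {self.site}",
            f"Tech must call {self.notice_hours} hours in advance of the visit.",
            "Send tracking info for anything shipped directly to the datacenter.",
        ])

    def datacenter_ticket(self) -> str:
        """Body text for the internal datacenter access ticket."""
        tracking = ", ".join(self.tracking_numbers) or "none known yet"
        return "\n".join([
            f"A Dell tech needs access to {self.site}, rack space {self.rack_space}.",
            f"Hardware serial number: {self.serial_number}",
            f"Inbound shipment tracking: {tracking}",
        ])


if __name__ == "__main__":
    # Placeholder values, not real hardware details.
    req = DriveReplacement(serial_number="ABC1234",
                           troubleshoot_archive="troubleshoot-data.tar")
    if req.missing():
        print("Still need:", ", ".join(req.missing()))
    else:
        print(req.dell_call_notes())
        print()
        print(req.datacenter_ticket())
```

Calling `missing()` before picking up the phone is the point of the sketch: it flags a forgotten serial number or troubleshooting export before the support tech asks for it.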

Metadata Update from @arrfab:
- Issue tagged with: centos-common-infra, medium-gain, medium-trouble

Metadata Update from @dkirwan:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
