Issue #834: Frequent jenkins pod restart since last rollout on 28th June 2022 - centos-infra

Closed: Fixed Jul 18, 2022 by anoopcs. Opened Jul 4, 2022 by anoopcs.

We have been observing very frequent Jenkins pod restarts since the last rollout, which happened on 28th June 2022. So far there have been 48 restarts, i.e. ~8 restarts per day (approximately one every 3 hours). The following are the last few lines from the output of oc logs jenkins-xxxxx --previous:

2022-07-01 04:17:15 INFO    winstone.Logger logInternal JVM is terminating. Shutting down Jetty
2022/07/01 04:17:15 [go-init] Main command failed
2022/07/01 04:17:15 [go-init] exit status 143
2022/07/01 04:17:15 [go-init] No post-stop command defined, skip

As for events, liveness and readiness probe failures are reported a large number of times:

Liveness probe failed: HTTP probe failed with statuscode: 500
Readiness probe failed: HTTP probe failed with statuscode: 500
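For context, the "exit status 143" in the log above follows the common convention of reporting a process killed by signal N as exit status 128 + N. So 143 - 128 = 15, which is SIGTERM. That is consistent with, though not proof of, the kubelet terminating the container after repeated liveness probe failures. A minimal Python sketch of the decoding (the helper name describe_exit_status is hypothetical, not part of any tool used here):

```python
import signal

# Convention used by shells and container runtimes: a process killed by
# signal N is reported with exit status 128 + N.
SIGNAL_EXIT_OFFSET = 128


def describe_exit_status(status: int) -> str:
    """Return a human-readable description of a container exit status."""
    if status > SIGNAL_EXIT_OFFSET:
        signum = status - SIGNAL_EXIT_OFFSET
        try:
            # signal.Signals is an IntEnum; an invalid number raises ValueError.
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    # 143 - 128 = 15, i.e. SIGTERM.
    print(describe_exit_status(143))
```

The same helper maps 137 to SIGKILL, which is what one would typically see if a container ignored SIGTERM and was killed after its termination grace period.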

Despite these frequent restarts, all our configured jobs (scheduled and otherwise triggered) are running without any issues.

anoopcs commented Jul 4, 2022

Every restart removes the System Admin e-mail address configured for Jenkins, which then prevents notifications from being sent.

Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: centos-ci-infra, low-gain, medium-trouble

Jul 4, 2022

anoopcs commented Jul 4, 2022

FYI: https://jenkins-samba.apps.ocp.ci.centos.org/

anoopcs commented Jul 5, 2022

This has started affecting scheduled jobs, which get terminated midway due to Jenkins restarts.

anoopcs commented Jul 9, 2022

And now the web UI doesn't even come up. This has become a blocker for our CI workflow. Can someone please look into this ASAP?

anoopcs commented Jul 14, 2022

I did 2-3 rollouts in the last couple of days and, despite a good number of restarts in the beginning, things have started to settle down. There hasn't been any (auto)restart of the Jenkins pod in the last 3 days. I don't know what magic was done to get it fixed. I'll keep this open for a few more days.

mobrien commented Jul 14, 2022

@anoopcs Sorry for the delayed response here. There was an issue with a node acting up and a cluster operator not working correctly. I'm not sure how it relates to your issue, but that was fixed on Monday, so it may have been the cause.

anoopcs commented Jul 18, 2022

> There was an issue with a node acting up and a cluster operator not working correctly. I'm not sure how it relates to your issue but that was fixed on Monday so may have been the cause

A week has passed without any restart of the Jenkins pod. Looking at the timings, this more or less aligns with the node issue, which got fixed a week back. Therefore, closing the issue.

Metadata Update from @anoopcs:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

Jul 18, 2022

Metadata

Assignee: None
Tags: centos-ci-infra, low-gain, medium-trouble
Blocking: None
Depending on: None
Priority: Low
