#834 Frequent Jenkins pod restarts since last rollout on 28th June 2022
Closed: Fixed by anoopcs. Opened by anoopcs.

We have been observing very frequent Jenkins pod restarts since the last rollout, which happened on 28th June 2022. So far there have been 48 restarts, i.e. ~8 restarts per day (approximately one every 3 hours). The following are the last few lines from the output of oc logs jenkins-xxxxx --previous:

2022-07-01 04:17:15 INFO    winstone.Logger logInternal JVM is terminating. Shutting down Jetty
2022/07/01 04:17:15 [go-init] Main command failed
2022/07/01 04:17:15 [go-init] exit status 143
2022/07/01 04:17:15 [go-init] No post-stop command defined, skip
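A note on reading "exit status 143": by the common Unix convention, a status above 128 means 128 plus the number of the signal that stopped the process, so 143 = 128 + 15 (SIGTERM). The JVM follows this convention when it exits on SIGTERM, which fits the "JVM is terminating" line: Jenkins was asked to stop from outside rather than crashing on its own. The helper below is a hypothetical sketch for decoding such values; the function name is illustrative and not part of any Jenkins or OpenShift tooling.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell/JVM-style exit status (hypothetical helper).

    Statuses in 129..255 conventionally mean "killed by signal (status - 128)".
    """
    if 128 < status < 256:
        try:
            sig = signal.Signals(status - 128)
        except ValueError:
            # Not a signal number known on this platform
            return f"exit status {status} (unknown signal {status - 128})"
        return f"terminated by {sig.name} (signal {sig.value})"
    return f"exited with code {status}"


print(describe_exit_status(143))  # terminated by SIGTERM (signal 15)
print(describe_exit_status(137))  # terminated by SIGKILL (signal 9)
```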

As far as events are concerned, liveness and readiness probe failures show up a large number of times:

Liveness probe failed: HTTP probe failed with statuscode: 500

Readiness probe failed: HTTP probe failed with statuscode: 500
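For context on how these events relate to the restarts: in Kubernetes, an HTTP probe counts a response code from 200 up to (but not including) 400 as success. A failing readiness probe only marks the pod as not ready, while a liveness probe that fails failureThreshold times in a row (default 3) makes the kubelet restart the container, which would send the SIGTERM seen in the logs. The probe settings actually used for this Jenkins deployment are not shown in the ticket, so the sketch below assumes the defaults and is a simplified model (it ignores timings such as periodSeconds and initialDelaySeconds); ProbeState and simulate_liveness are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Simplified liveness-probe failure counter (assumes Kubernetes defaults)."""

    failure_threshold: int = 3  # Kubernetes default failureThreshold
    consecutive_failures: int = 0

    def record(self, status_code: int) -> bool:
        """Record one HTTP probe result; return True when a restart would be triggered."""
        if 200 <= status_code < 400:  # HTTP probe success range
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


def simulate_liveness(codes, failure_threshold=3):
    """Return the indices of probe results that would cause a container restart."""
    probe = ProbeState(failure_threshold=failure_threshold)
    restarts = []
    for i, code in enumerate(codes):
        if probe.record(code):
            restarts.append(i)
            probe.consecutive_failures = 0  # container restarted; counter starts over
    return restarts


# Intermittent 500s only cause a restart once three arrive in a row.
print(simulate_liveness([200, 500, 500, 200, 500, 500, 500]))  # [6]
```

This is why sporadic 500 responses can appear many times in the events while restarts happen less often: only an unbroken run of failures reaches the threshold.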

Despite these frequent restarts, all our configured jobs (scheduled and other triggers) are running without any issues.

Every restart removes the System Admin e-mail address configured for Jenkins, which then prevents notifications from being sent.

Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: centos-ci-infra, low-gain, medium-trouble

FYI: https://jenkins-samba.apps.ocp.ci.centos.org/

This has started affecting scheduled jobs, which get terminated midway due to Jenkins restarts.

And now the web UI doesn't even come up. This has become a blocker in our CI workflow. Can someone please look into this ASAP?

I did 2-3 rollouts in the last couple of days and, despite a fair number of restarts in the beginning, things have started to settle down. There hasn't been any (auto)restart of the Jenkins pod in the last 3 days. I don't know what was done to get it fixed. I'll keep this open for a few more days.

@anoopcs sorry for the delay in responding here. There was an issue with a node acting up and a cluster operator not working correctly. I'm not sure how it relates to your issue, but that was fixed on Monday, so it may have been the cause.

A week has passed without any restart of the Jenkins pod. Looking at the timing, this more or less aligns with the node issue that was fixed a week back. Therefore, closing the issue.

Metadata Update from @anoopcs:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
