We have been observing very frequent Jenkins pod restarts since the last rollout on 28 June 2022. So far there have been 48 restarts, i.e. ~8 restarts per day (approximately one every 3 hours). The following are the last few lines from the output of oc logs jenkins-xxxxx --previous:
oc logs jenkins-xxxxx --previous
2022-07-01 04:17:15 INFO winstone.Logger logInternal JVM is terminating. Shutting down Jetty
2022/07/01 04:17:15 [go-init] Main command failed
2022/07/01 04:17:15 [go-init] exit status 143
2022/07/01 04:17:15 [go-init] No post-stop command defined, skip
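For anyone reading the log above: exit status 143 follows the usual shell convention of 128 plus a signal number, so it corresponds to signal 15 (SIGTERM). That suggests the Jenkins process was asked to terminate from outside rather than crashing on its own, which would be consistent with the container being restarted after failed liveness probes, though the log alone does not prove that. A minimal sketch for decoding such a status is below; decode_exit_status is a hypothetical helper name used only for illustration and is not part of oc or go-init.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or go-init): map an exit status to the
# signal it encodes, using the common "128 + signal number" convention.
decode_exit_status() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    # Statuses above 128 conventionally mean "terminated by signal (status - 128)".
    kill -l "$((status - 128))"
  else
    # At or below 128 the value is an ordinary exit code, not a signal.
    echo "exit code $status (no signal)"
  fi
}

decode_exit_status 143   # prints TERM
```

The sketch assumes the status really does encode a signal; a value above 128 that does not map to a valid signal number will make kill -l report an error.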
As for events, liveness and readiness probe failures are reported a large number of times:
Liveness probe failed: HTTP probe failed with statuscode: 500
Readiness probe failed: HTTP probe failed with statuscode: 500
Despite these frequent restarts, all our configured jobs (scheduled and otherwise triggered) are running without any issues.
Every restart removes the System Admin e-mail address configured for Jenkins, which then prevents notifications from being sent.
Metadata Update from @zlopez: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: centos-ci-infra, low-gain, medium-trouble
FYI: https://jenkins-samba.apps.ocp.ci.centos.org/
This has started affecting scheduled jobs, which get terminated midway due to Jenkins restarts.
And now the web UI doesn't even come up. This has become a blocker in our CI workflow. Can someone please look into this asap?
I did 2-3 rollouts in the last couple of days and, despite a good number of restarts in the beginning, things have started to settle down. There hasn't been any (auto) restart of the Jenkins pod in the last 3 days. I don't know what magic was done to get it fixed. I'll keep this open for a few more days.
@anoopcs sorry for the delay in responding here. There was an issue with a node acting up and a cluster operator not working correctly. I'm not sure how it relates to your issue, but that was fixed on Monday, so it may have been the cause.
A week has passed without any restart of the Jenkins pod. Looking at the timings, this more or less aligns with the node issue which got fixed a week back. Therefore closing the issue.
Metadata Update from @anoopcs: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)