This morning the Jenkins service in the ceph-csi project seems unavailable:
$ oc get pods -l name=jenkins
NAME               READY   STATUS             RESTARTS      AGE
jenkins-42-hmztr   0/1     CrashLoopBackOff   9 (43s ago)   22m
The logs suggest that the container image is not fully functional:
$ oc logs jenkins-42-hmztr
2022/04/12 06:13:05 [go-init] No pre-start command defined, skip
2022/04/12 06:13:05 [go-init] Main command launched : /usr/libexec/s2i/run
alternatives version 1.13 - Copyright (C) 2001 Red Hat, Inc.
This may be freely redistributed under the terms of the GNU Public License.
usage: alternatives --install <link> <name> <path> <priority>
                    [--initscript <service>]
                    [--family <family>]
                    [--slave <slave_link> <slave_name> <slave_path>]*
       alternatives --remove <name> <path>
       alternatives --auto <name>
       alternatives --config <name>
       alternatives --display <name>
       alternatives --set <name> <path>
       alternatives --list
       alternatives --remove-all <name>
       alternatives --add-slave <name> <path> <slave_link> <slave_name> <slave_path>
       alternatives --remove-slave <name> <path> <slave_name>
common options: --verbose --test --help --usage --version --keep-missing
                --altdir <directory> --admindir <directory>
alternatives version 1.13 - Copyright (C) 2001 Red Hat, Inc.
This may be freely redistributed under the terms of the GNU Public License.
usage: alternatives --install <link> <name> <path> <priority>
                    [--initscript <service>]
                    [--family <family>]
                    [--slave <slave_link> <slave_name> <slave_path>]*
       alternatives --remove <name> <path>
       alternatives --auto <name>
       alternatives --config <name>
       alternatives --display <name>
       alternatives --set <name> <path>
       alternatives --list
       alternatives --remove-all <name>
       alternatives --add-slave <name> <path> <slave_link> <slave_name> <slave_path>
       alternatives --remove-slave <name> <path> <slave_name>
common options: --verbose --test --help --usage --version --keep-missing
                --altdir <directory> --admindir <directory>
CONTAINER_MEMORY_IN_MB='3072', using /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el8_4.x86_64/jre/bin/java and /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el8_4.x86_64/bin/javac
Administrative monitors that contact the update center will remain active
/usr/local/bin/jenkins-common.sh: line 31: java: command not found
Migrating slave image configuration to current version tag ...
Using JENKINS_SERVICE_NAME=jenkins
Generating jenkins.model.JenkinsLocationConfiguration.xml using (/var/lib/jenkins/jenkins.model.JenkinsLocationConfiguration.xml.tpl) ...
Jenkins URL set to: https://jenkins-ceph-csi.apps.ocp.ci.centos.org in file: /var/lib/jenkins/jenkins.model.JenkinsLocationConfiguration.xml
+ exec java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xmxm -Dfile.encoding=UTF8 -Djavamelody.displayed-counters=log,error -Djava.util.logging.config.file=/var/lib/jenkins/logging.properties -Djdk.http.auth.tunneling.disabledSchemes= -Djdk.http.auth.proxying.disabledSchemes= -Duser.home=/var/lib/jenkins -Djavamelody.application-name=jenkins -Dhudson.security.csrf.GlobalCrumbIssuerConfiguration.DISABLE_CSRF_PROTECTION=true -Djenkins.install.runSetupWizard=false -jar /usr/lib/jenkins/jenkins.war
/usr/libexec/s2i/run: line 628: exec: java: not found
2022/04/12 06:13:08 [go-init] Main command failed
2022/04/12 06:13:08 [go-init] exit status 127
2022/04/12 06:13:08 [go-init] No post-stop command defined, skip
Especially the
/usr/local/bin/jenkins-common.sh: line 31: java: command not found
and similar errors are concerning.
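To spot such errors without reading the whole log, something like the following could work. It is only a sketch: the function name missing_commands is made up for this example, and the pattern only covers the two message shapes seen in the log above ("<script>: line N: <cmd>: command not found" and "<script>: line N: exec: <cmd>: not found").

```shell
# Hypothetical helper: read container logs on stdin and print the name of
# every command that the shell reported as missing, one per line, deduplicated.
# Only the two message formats seen in the log above are recognised.
missing_commands() {
    sed -nE 's/.*: line [0-9]+: (exec: )?([^:]+): (command )?not found.*/\2/p' | sort -u
}
```

It would be used as `oc logs jenkins-42-hmztr | missing_commands`, which for the log above should print only java.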
Details about the Pod:
$ oc describe pod/jenkins-42-hmztr
Name:         jenkins-42-hmztr
Namespace:    ceph-csi
Priority:     0
Node:         kempty-n11.ci.centos.org/172.19.0.139
Start Time:   Tue, 12 Apr 2022 07:51:07 +0200
Labels:       deployment=jenkins-42
              deploymentconfig=jenkins
              name=jenkins
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.130.3.126"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.130.3.126"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/deployment-config.latest-version: 42
              openshift.io/deployment-config.name: jenkins
              openshift.io/deployment.name: jenkins-42
              openshift.io/scc: restricted
Status:       Running
IP:           10.130.3.126
IPs:
  IP:           10.130.3.126
Controlled By:  ReplicationController/jenkins-42
Containers:
  jenkins:
    Container ID:   cri-o://1092e81f8fc5a7d02925d1665ef8883b6784ea8df3340a8e9d74653e0968c0ce
    Image:          image-registry.openshift-image-registry.svc:5000/openshift/jenkins@sha256:daf3954ab992a99f1d2599fb4552dbd2fb66a7e8b1f3e5405bb11e1e3f1cb44b
    Image ID:       image-registry.openshift-image-registry.svc:5000/openshift/jenkins@sha256:daf3954ab992a99f1d2599fb4552dbd2fb66a7e8b1f3e5405bb11e1e3f1cb44b
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 12 Apr 2022 08:13:05 +0200
      Finished:     Tue, 12 Apr 2022 08:13:09 +0200
    Ready:          False
    Restart Count:  9
    Limits:
      memory:  3Gi
    Requests:
      memory:   3Gi
    Liveness:   http-get http://:8080/login delay=420s timeout=240s period=360s #success=1 #failure=2
    Readiness:  http-get http://:8080/login delay=3s timeout=240s period=10s #success=1 #failure=3
    Environment:
      OPENSHIFT_ENABLE_OAUTH:            true
      OPENSHIFT_ENABLE_REDIRECT_PROMPT:  true
      DISABLE_ADMINISTRATIVE_MONITORS:   True
      KUBERNETES_MASTER:                 https://kubernetes.default:443
      KUBERNETES_TRUST_CERTIFICATES:     true
      JENKINS_SERVICE_NAME:              jenkins
      JNLP_SERVICE_NAME:                 jenkins-jnlp
    Mounts:
      /var/lib/jenkins from jenkins-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jn66r (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  jenkins-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jenkins
    ReadOnly:   false
  kube-api-access-jn66r:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       27m                   default-scheduler  Successfully assigned ceph-csi/jenkins-42-hmztr to kempty-n11.ci.centos.org
  Normal   AddedInterface  27m                   multus             Add eth0 [10.130.3.126/23] from openshift-sdn
  Normal   Pulling         27m                   kubelet            Pulling image "image-registry.openshift-image-registry.svc:5000/openshift/jenkins@sha256:daf3954ab992a99f1d2599fb4552dbd2fb66a7e8b1f3e5405bb11e1e3f1cb44b"
  Normal   Pulled          26m                   kubelet            Successfully pulled image "image-registry.openshift-image-registry.svc:5000/openshift/jenkins@sha256:daf3954ab992a99f1d2599fb4552dbd2fb66a7e8b1f3e5405bb11e1e3f1cb44b" in 17.617534467s
  Normal   Pulled          25m (x4 over 26m)     kubelet            Container image "image-registry.openshift-image-registry.svc:5000/openshift/jenkins@sha256:daf3954ab992a99f1d2599fb4552dbd2fb66a7e8b1f3e5405bb11e1e3f1cb44b" already present on machine
  Normal   Created         25m (x5 over 26m)     kubelet            Created container jenkins
  Normal   Started         25m (x5 over 26m)     kubelet            Started container jenkins
  Warning  BackOff         2m8s (x116 over 26m)  kubelet            Back-off restarting failed container
Rolling back the deployment did not help. The rollback picked the same image (sha), so the identical failure is probably expected.
Resetting the triggers somehow caused the Jenkins pod to come up again:
$ oc set triggers dc/jenkins --auto
deploymentconfig.apps.openshift.io/jenkins triggers updated
$ oc get pods -l name=jenkins
NAME               READY   STATUS    RESTARTS   AGE
jenkins-45-fb7qp   1/1     Running   0          7m3s
$ oc describe pod/jenkins-45-fb7qp | grep -m1 image
    Image:          image-registry.openshift-image-registry.svc:5000/openshift/jenkins@sha256:080ef5f3fe4e00cbf79ad765fb6adda32f69af540a9ad1f88198e49b23647999
It seems a new image was picked up now, and the service is running again.
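For anyone who wants to confirm that two pods really run different images, comparing the sha256 digests from saved `oc describe pod` output is one way to do it. This is a rough sketch, not something that was run as part of this ticket; the helper names image_digest and same_image and the file names old.txt and new.txt are made up for the example.

```shell
# Hypothetical helper: print the first sha256 image digest found in
# `oc describe pod` output read from stdin.
image_digest() {
    grep -o 'sha256:[0-9a-f]\{64\}' | head -n1
}

# Hypothetical helper: succeed (exit 0) when the two files, each holding saved
# `oc describe pod` output, reference the same image digest.
same_image() {
    [ "$(image_digest < "$1")" = "$(image_digest < "$2")" ]
}
```

For example, after `oc describe pod/jenkins-42-hmztr > old.txt` and `oc describe pod/jenkins-45-fb7qp > new.txt`, running `same_image old.txt new.txt || echo "image changed"` should report a change for the two digests shown above.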
It is unclear what the cause was, but I guess there is no need to keep this issue open.
Metadata Update from @devos: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)