#106 very slow performance on https://osci-jenkins-2.ci.fedoraproject.org
Closed: Fixed by dkirwan. Opened by bgoncalv.

Just loading the Jenkins page sometimes gives me "504 Gateway Time-out".

Also, jobs are taking so long to run each step that they time out...

https://osci-jenkins-2.ci.fedoraproject.org/job/fedora-scratch-build-pipeline/

instance: https://osci-jenkins-2.ci.fedoraproject.org


Metadata Update from @siddharthvipul1:
- Issue assigned to dkirwan
- Issue tagged with: centos-ci-infra, groomed, high-gain

I couldn't find anything in the Jenkins log that could explain the problem.

Updating the ticket with the information from the IRC chat.

(16:40:18) siddharthvipul: hmm, so from what I understand, that's a NFS export issue (we have a fix but that requires an outage that we are planning)

(09:36:21) siddharthvipul: jbair, bgoncalv hey, sorry nowadays I have started closing my laptop at hard stop of 10pm :) I was away
(09:36:40) siddharthvipul: and re: the outage.. fabian is in discussion with the person in datacenter and they need to sync
(09:37:05) siddharthvipul: It would be somewhere in the last week of September (graceful shutdown of OCP4 would be needed)

(15:09:43) siddharthvipul: bgoncalv, hey, sorry I was away for some time.. update would be that we need to upgrade hardware and for that there would be a need of outage.. It's likely to be scheduled at the end of this month.. regarding the issue itself, it's a limitation of network band. From the monitoring it doesn't look too bad but it's a collective of all other jobs as well and I am not sure if there is something passing the openshift monitoring
(15:10:23) siddharthvipul: on Monday "out openshift expert" will be back and will be assigned to work on it (while I work on #4) :)

Now jenkins seems to be completely down:

Application is not available

On the OpenShift pod I see events like:

Pod: osci-jenkins-2-b894b756-7jd74, Namespace: fedora-ci-jenkins-prod
4 minutes ago
Generated from kubelet on kempty-n9.ci.centos.org
1478 times in the last 6 days
Readiness probe failed: Get http://10.128.2.189:8080/login: dial tcp 10.128.2.189:8080: connect: connection refused

Pod: osci-jenkins-2-b894b756-7jd74, Namespace: fedora-ci-jenkins-prod
Sep 15, 4:14 pm
Generated from kubelet on kempty-n9.ci.centos.org
15 times in the last 2 days
Readiness probe failed: HTTP probe failed with statuscode: 503

Pod: osci-jenkins-2-b894b756-7jd74, Namespace: fedora-ci-jenkins-prod
Sep 15, 4:09 pm
Generated from kubelet on kempty-n9.ci.centos.org
2104 times in the last 2 days
Back-off restarting failed container
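
For anyone who wants to reproduce what the readiness probe sees, here is a minimal sketch in Python. It is not the kubelet's implementation, just a single HTTP GET that separates "the server answered with an error status" (like the 503 above) from "no HTTP answer at all" (like the connection refused above). The function name `check_readiness` and the 5 second timeout are assumptions made for illustration, not values taken from the pod configuration.

```python
import urllib.error
import urllib.request


def check_readiness(url: str, timeout: float = 5.0) -> str:
    """Issue one HTTP GET, roughly like an HTTP readiness probe, and describe the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"ready: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503 or 504).
        # HTTPError must be caught before URLError because it is a subclass of it.
        return f"not ready: HTTP {err.code}"
    except urllib.error.URLError as err:
        # No HTTP answer at all: connection refused, DNS failure, connect timeout.
        return f"not ready: {err.reason}"
    except OSError as err:
        # Lower-level failures that are not wrapped, such as a timeout while reading.
        return f"not ready: {err}"
```

Calling `check_readiness("http://10.128.2.189:8080/login")` from inside the cluster would hit the same URL as the probe in the first event. From outside, pointing it at the `/login` path of the instance URL should be a rough equivalent, assuming the route forwards to the same endpoint.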

We think we understand the issue; see https://pagure.io/centos-infra/issue/53#comment-686574 for more information on when this will hopefully be resolved.

Metadata Update from @dkirwan:
- Issue marked as depending on: #53

Metadata Update from @dkirwan:
- Issue untagged with: groomed
- Issue priority set to: None (was: Needs Review)
- Issue tagged with: medium-trouble

Should be resolved now.

Metadata Update from @dkirwan:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
