#303 Transition to NEED_GUIDANCE on certread subprocess failure
Opened by mrd000.
mrd000/certmonger fix-for-missing-nss  into  master

Summary

  • Error handling: check the exit status in the CM_READING_CERT handler and
    transition to NEED_GUIDANCE on failure (e.g. the cert cannot be read or a
    library fails to load) instead of silently proceeding to MONITORING
    • Add finished_reading vtable method to certread API to check
      subprocess exit status after completion (mirrors keyiread pattern)
    • Implement in both NSS (certread-n) and OpenSSL (certread-o)
      backends
  • Fix certread-o bug: _exit(0) always reported success
    regardless of actual status
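
For reviewers, the intended control flow can be sketched roughly as below. This is a simplified illustration, not the code in the patch: the `sketch_` names, the reduced state enum, the one-method ops table, and the assumption that the exit status has already been decoded to a plain integer are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of states; the real state machine has many more. */
enum sketch_state {
    SKETCH_READING_CERT,
    SKETCH_MONITORING,
    SKETCH_NEED_GUIDANCE
};

/* Hypothetical per-backend operations table with a single method. */
struct sketch_certread_ops {
    /* Returns 0 if the helper subprocess exited successfully, -1 otherwise. */
    int (*finished_reading)(int exit_status);
};

/* Assumed convention: an exit status of 0 means the certificate was read. */
static int sketch_finished_reading(int exit_status)
{
    return exit_status == 0 ? 0 : -1;
}

static const struct sketch_certread_ops sketch_ops = {
    sketch_finished_reading
};

/* Pick the next state once the helper has finished: only a successful
 * read may proceed to monitoring; anything else asks for guidance. */
static enum sketch_state sketch_next_state(const struct sketch_certread_ops *ops,
                                           int exit_status)
{
    assert(ops != NULL && ops->finished_reading != NULL);
    if (ops->finished_reading(exit_status) == 0)
        return SKETCH_MONITORING;
    return SKETCH_NEED_GUIDANCE;
}
```

With this shape, a helper that always calls _exit(0) would defeat the check, which is why the certread-o fix is needed alongside the new method.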

Note: the test induces the failure through NSS, but the fix covers NSS problems in general, including failure to load the library in the subprocess

Problem

When NSS_InitContext() fails (e.g. missing/corrupt database),
certread-n calls _exit(CM_SUB_STATUS_ERROR_INITIALIZING) without
writing data to the pipe. The parent process ignores the exit status
and proceeds to MONITORING with stale certificate data, causing
a renewal loop that repeats every second. I saw 600k duplicate
certificates created in IPA; most of them cannot be removed with the
existing IPA tools, and they cause other issues.
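
The failure mode can be reproduced in miniature with a plain fork/pipe/waitpid sequence. The sketch below is only an illustration under assumed names: the `sketch_` identifiers and the placeholder status value 2 are hypothetical and do not come from certmonger. The child either writes data and exits with 0, or exits early with an error status and writes nothing, in which case the parent sees an empty pipe and can only detect the failure through the exit status.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_STATUS_OK 0
#define SKETCH_STATUS_ERROR_INITIALIZING 2 /* placeholder value */

struct sketch_result {
    ssize_t bytes_read; /* bytes the parent read from the pipe, -1 on error */
    int exit_code;      /* child's exit code, -1 if it did not exit normally */
};

/* Fork a helper; if init_fails is non-zero, the helper exits with an
 * error status before writing anything, as in the NSS failure case. */
static struct sketch_result sketch_run_reader(int init_fails)
{
    struct sketch_result res = { -1, -1 };
    int fds[2];
    int status = 0;
    char buf[64];
    pid_t pid;

    if (pipe(fds) != 0)
        return res;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return res;
    }
    if (pid == 0) {
        /* Child: the helper subprocess. */
        close(fds[0]);
        if (init_fails)
            _exit(SKETCH_STATUS_ERROR_INITIALIZING);
        if (write(fds[1], "cert-data", 9) != 9)
            _exit(1);
        _exit(SKETCH_STATUS_OK);
    }
    /* Parent: read whatever the helper wrote, then collect its status. */
    close(fds[1]);
    res.bytes_read = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        res.exit_code = WEXITSTATUS(status);
    return res;
}

/* The read counts as successful only if the helper exited cleanly
 * and actually produced data. */
static int sketch_read_succeeded(const struct sketch_result *res)
{
    assert(res != NULL);
    return res->exit_code == SKETCH_STATUS_OK && res->bytes_read > 0;
}
```

A parent that looks only at the pipe contents cannot tell "nothing to report" from "helper failed to initialize"; checking the exit status as well is what lets it stop instead of retrying.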

Test plan

  • [x] make check passes (010-iterate updated to expect NEED_GUIDANCE)
  • [ ] Verify with an actual corrupt/missing NSS database that the
    entry reaches NEED_GUIDANCE and stops retrying

Fixes: https://pagure.io/certmonger/issue/302

Thanks for the patch and apologies for the delay in responding. It may take me some time to do a thorough review.

@rcritten Thanks. Please let me know if I can help in any way; I appreciate your attention to this issue. I am really nervous about how to delete those 600k certs from the system now, but hopefully this will help others avoid getting into a similar situation.