#26 Support ELN composes
Closed by jcline. Opened by adamwill.

I caught a discussion between @ngompa and @sgallagh about publishing ELN cloud images. ELN is a bit special in two ways:

  1. It's an 'odd release'; I think we can treat it about the same as we do Rawhide
  2. ~~It's composed by ODCS, not by scripts in pungi-fedora, so we don't get the same kind of messages as we do for other composes~~ edit: ODCS is going away soon, so we'll wait for the composes to be converted to run from a script like the others

To handle 1) I think we just need to extend anywhere where we treat "rawhide" as special to treat "eln" as special in the same way (after downcasing, natch).
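
A minimal sketch of that idea, assuming a hypothetical helper named `is_rolling_release` (the helper name and the `ROLLING_RELEASES` set are illustrative, not taken from the uploader's code):

```python
# Hypothetical set of release names that get Rawhide-style handling.
ROLLING_RELEASES = frozenset({"rawhide", "eln"})


def is_rolling_release(release: str) -> bool:
    """Return True if the release should be treated the way Rawhide is."""
    # Downcase first so "ELN", "Eln" and "eln" all compare equal.
    return release.strip().lower() in ROLLING_RELEASES
```

Keeping the names in one set means each existing "if rawhide" check could call the helper instead of comparing strings inline.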


The actual topic to subscribe to for ODCS is org.fedoraproject.prod.odcs.compose.state-changed, and a sample message is this one, for example.

tagging @davdunc who'd probably be interested.

Neal reminded me that ELN might be moving off ODCS - https://pagure.io/pungi-fedora/pull-request/1304 . We could wait for that and not have to bother implementing an ODCS consumer.

I'm aiming to be done with the ODCS transition within the next week, so yeah; let's not bother with ODCS.

Cool. Let me know when it's all done and I'll take care of this. I think treating it as we treat Rawhide is reasonable, if that sounds good to you.

Right now we don't have a retention policy (implemented here, anyway) for AWS, but for Azure we keep the last week of Rawhide images. Is that enough or do you want a slightly longer history?

The last week is probably fine. I'll note that we generally run composes multiple times per day though; currently this is every four hours. If that's going to be too frequent for image retention, we can talk about only pushing once a day, perhaps?

While we obviously can't keep all images for all time, if there's value to keeping (for example) a month's worth of images at 4 images a day I think that won't cause a problem. Or, if you want just the one image per day I can do that too. Just let me know what's most useful for you.

I think one image per day is likely plenty.
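
For illustration, a retention rule along those lines could be sketched as below. This is not the uploader's actual policy code: the function name `images_to_keep`, the seven-day window, and the choice to keep the newest build of each day are all assumptions.

```python
from datetime import datetime, timedelta


def images_to_keep(build_times, now, days=7):
    """Pick the newest build per calendar day from the last `days` days."""
    cutoff = now - timedelta(days=days)
    newest_per_day = {}
    for built in build_times:
        if built < cutoff:
            continue  # older than the retention window
        day = built.date()
        # Keep only the latest build seen so far for this day.
        if day not in newest_per_day or built > newest_per_day[day]:
            newest_per_day[day] = built
    return sorted(newest_per_day.values())
```

With composes every four hours, this keeps at most one image per day and drops everything older than the window.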

The https://pagure.io/pungi-fedora/pull-request/1304 merge request is in pretty good shape now and I'm hopeful it will land in the next couple days. What can I do to help with this ticket?

I don't anticipate a ton of work, really, just a few changes from "if rawhide" to "if rawhide or eln".

It looks like Adam already released fedfind with eln support so if you could just drop a datagrepper link to a completed eln compose message when it's merged I can add the special-casing for it and test in stage by replaying that message for the consumer to make sure it's all good.

It should be basically the same as https://apps.fedoraproject.org/datagrepper/v2/id?id=eb8a70e9-cd69-4d40-8ea6-d29e554cde06&is_raw=true&size=extra-large (except that the status will be FINISHED or FINISHED_INCOMPLETE).
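
As a rough sketch of the check a consumer might apply to such a message when it is replayed, assuming the message body is a dict that carries the compose state under a `status` key (the function name `compose_is_done` is hypothetical):

```python
# Statuses that mean the compose is done and its images can be processed.
DONE_STATUSES = frozenset({"FINISHED", "FINISHED_INCOMPLETE"})


def compose_is_done(body: dict) -> bool:
    """Return True if a compose status message reports a completed compose."""
    # Assumption: the state lives under "status"; a missing key counts as not done.
    return body.get("status") in DONE_STATUSES
```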

Images are being uploaded to Azure and AWS now. It does seem that they don't have sudo available, which is a little problematic since I didn't provision with a password:

$ az vm create --location eastus2 --name eln-x64 --resource-group jcline-eln-testing --image /CommunityGalleries/Fedora-5e266ba4-2250-406d-adad-5d73860d958f/Images/Fedora-Cloud-ELN-x64/Versions/latest --size Standard_D2ls_v5 --generate-ssh-key --accept-term
$ ssh jeremycline@...
[jeremycline@eln-x64 ~]$ hostnamectl
Failed to query product UUID, ignoring: Access denied
Failed to query hardware serial, ignoring: Access denied
     Static hostname: eln-x64
           Icon name: computer-vm
             Chassis: vm 🖴
          Machine ID: b9ea32ab4abd4610b5c62a0d6dc763ca
             Boot ID: e56e2de188b54f7daaa1d6cf4d77ca1f
      Virtualization: microsoft
    Operating System: Fedora ELN
         CPE OS Name: cpe:/o:fedoraproject:fedora:42
      OS Support End: Tue 2025-05-13
OS Support Remaining: 7month 3w 4d
              Kernel: Linux 6.11.0-63.eln142.x86_64
        Architecture: x86-64
     Hardware Vendor: Microsoft Corporation
      Hardware Model: Virtual Machine
    Firmware Version: Hyper-V UEFI Release v4.1
       Firmware Date: Mon 2024-05-13
        Firmware Age: 4month 4d
[jeremycline@eln-x64 ~]$ sudo
-bash: sudo: command not found

One last question: do you want it to handle container images as well? I need to tweak that job if so.

Huh... they definitely have sudo available in the repositories. I'm not actually certain why it would be missing from the images. Maybe there's something missing in our kiwi config. I'll look into that.

Yes, we definitely need container images as well. Thanks!

@jcline I'm not sure why sudo would be missing for you. I've just checked the Generic cloud image by loading it into KVM locally using Cockpit's importer (I'm not sure if it uses cloud-init or ignition under the hood) and sudo was available there.

Both the Generic and AmazonEC2 images are built from the same config (with AmazonEC2 having a couple extra packages on top), so I don't know why it would be missing.

Also, can we move ahead on the "hacky" approach to getting ELN container images uploading? A side effect of the compose changes was that our old approach to uploading to quay.io/fedoraci/fedora:eln broke, and I'd rather just get the new ones uploading than try to figure out how to fix that ancient hack.

So I was gonna say it should 'just work' when we deploy this tool for container image uploading, but looking at it I found two reasons that's wrong:

  1. the code needs a minor tweak to use a sensible tag for ELN
  2. ELN composes have incorrect metadata. The subvariant for every image is its variant. That's not what the subvariant should be, and it will cause this tool not to recognize any container images, because it identifies them by their subvariant and type (which is how you're supposed to do it). See https://pagure.io/cloud-image-uploader/blob/main/f/fedora-image-uploader/fedora_image_uploader/handler.py#_891 . We need the subvariant for the cloud base images to be Cloud_Base in order for the tool to find them. I'll see if I can figure out why this is.

As a footnote, the 'type' for GCE cloud images is "docker", which seems wrong. Not sure if that's wrong in non-ELN composes too; I'll look.

OK, so after looking into this and talking about it with @sgallagh , the subvariant being "BaseOS" isn't necessarily wrong. RHEL has more variants than Fedora and fewer images, so often the variant alone does kinda identify the 'payload' of the image. This is a BaseOS container image. Assuming RHEL won't want to produce multiple different container images within the BaseOS variant, that's actually kinda fine. It looks like real RHEL composes also have the subvariant identical to the variant for most images, with just a few cloud images using specific subvariants since otherwise they'd fail the 'unique metadata' check in productmd.

So, I've sent https://pagure.io/cloud-image-uploader/pull-request/30 which accepts "BaseOS" as a valid subvariant for a container image, and should handle ELN container image upload correctly as per @sgallagh 's suggestion of how it should work: repository 'eln', and only one tag, 'latest'. This means the images will show up at https://quay.io/repository/fedora/eln with the 'latest' tag in Quay, and at https://registry.fedoraproject.org/repo/eln/tags/ with the tag "latest" on the Fedora registry.
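
The mapping described above could be sketched roughly as follows. This is an illustration, not the code from the pull request: the function name `eln_container_destination` is hypothetical, and it assumes container images carry the type "docker" in the compose metadata.

```python
def eln_container_destination(image: dict):
    """Return (repository, tag) for an ELN container image, else None."""
    # Assumption: container images are identified by type "docker";
    # "BaseOS" is the subvariant ELN composes use for them.
    if image.get("type") == "docker" and image.get("subvariant") == "BaseOS":
        # One repository and a single tag, as suggested in the thread.
        return ("eln", "latest")
    return None
```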

https://pagure.io/pungi/pull-request/1788 should fix the type of the GCE images.

I think this is all wrapped up at this point, so I'm going to close this as done. Let me know if there's anything missing!

Metadata Update from @jcline:
- Issue status updated to: Closed (was: Open)
