#36 Handle waiting on AWS images gracefully
Closed by jcline. Opened by jcline.

Prod crashed with:

[2024-11-14 22:15:14,113 fedora_messaging.twisted.consumer ERROR] Received unexpected exception from consumer Consumer(queue=cloud-image-uploader, callback=<fedora_image_uploader.handler.Uploader object at 0x7f30189e1fd0>)
Traceback (most recent call last):
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/fedora_messaging/twisted/consumer.py", line 220, in _read_one
    yield d
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/twisted/python/threadpool.py", line 269, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
                                ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ctx, func, *args, **kw
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/fedora_image_uploader/handler.py", line 160, in __call__
    handler(image, ffrel)
    ~~~~~~~^^^^^^^^^^^^^^
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/fedora_image_uploader/aws.py", line 90, in __call__
    regions_to_amis = self.aws_copy_image_to_regions(image, ffrel, image_id, ami_name)
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/fedora_image_uploader/aws.py", line 326, in aws_copy_image_to_regions
    waiter.wait(ImageIds=[image_id])
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/botocore/waiter.py", line 55, in wait
    Waiter.wait(self, **kwargs)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/srv/image-uploader/venv/lib64/python3.13/site-packages/botocore/waiter.py", line 387, in wait
    raise WaiterError(
    ...<3 lines>...
    )
botocore.exceptions.WaiterError: Waiter ImageAvailable failed: Max attempts exceeded
[2024-11-14 22:15:14,214 fedora_messaging.cli ERROR] Unexpected error occurred in consumer Consumer(queue=cloud-image-uploader, callback=<fedora_image_uploader.handler.Uploader object at 0x7f30189e1fd0>): <twisted.python.failure.Failure botocore.exceptions.WaiterError: Waiter ImageAvailable failed: Max attempts exceeded>

This happened when it gave up waiting for an AMI to replicate to a region. We probably aren't catching the right exception. We should handle the `WaiterError` by nacking the message and trying again later (logging at error level, maybe, just to ensure it doesn't go unnoticed forever).
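
A rough sketch of what that handling could look like, not the actual patch. `wait_for_image` is a hypothetical helper name; `WaiterError` comes from `botocore.exceptions` and `Nack` from `fedora_messaging.exceptions`. The fallback classes exist only so the snippet runs where those libraries aren't installed.

```python
import logging

try:
    from botocore.exceptions import WaiterError
    from fedora_messaging.exceptions import Nack
except ImportError:
    # Stand-ins (assumption: only for running this sketch without the
    # real libraries). They mirror the real constructors loosely.
    class WaiterError(Exception):
        def __init__(self, name, reason, last_response):
            super().__init__(f"Waiter {name} failed: {reason}")
            self.last_response = last_response

    class Nack(Exception):
        pass


_log = logging.getLogger(__name__)


def wait_for_image(waiter, image_id):
    """Wait for an AMI; turn a waiter timeout into a Nack.

    `waiter` is expected to behave like the boto3 EC2 `image_available`
    waiter, i.e. expose `wait(ImageIds=[...])`.
    """
    try:
        waiter.wait(ImageIds=[image_id])
    except WaiterError as e:
        # Log at error level so repeated timeouts don't go unnoticed.
        _log.error("Gave up waiting for AMI %s to become available: %s", image_id, e)
        # Nack so the message is returned to the queue instead of
        # crashing the consumer.
        raise Nack() from e
```

One thing to check before relying on this: as far as I know, a nacked message is redelivered without any built-in delay, so "trying again later" may need some backoff on our side to avoid a tight retry loop.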

We might also want to increase the wait time a bit.
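
For the wait time, boto3 waiters accept a `WaiterConfig` argument with `Delay` (seconds between polls) and `MaxAttempts`. A minimal sketch, assuming we keep using the `image_available` waiter; `wait_with_budget`, `total_wait_seconds`, and the numbers below are made up for illustration, not tuned values.

```python
def total_wait_seconds(delay, max_attempts):
    """Upper bound on how long the waiter polls before giving up."""
    return delay * max_attempts


def wait_with_budget(waiter, image_id, delay=15, max_attempts=120):
    """Call a boto3-style waiter with an explicit polling budget.

    The defaults here (15 s x 120 attempts = 30 minutes) are placeholder
    values chosen for the example, not a recommendation.
    """
    waiter.wait(
        ImageIds=[image_id],
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
```

Usage would be roughly `wait_with_budget(ec2_client.get_waiter("image_available"), image_id)`, where `ec2_client` is whatever regional EC2 client the copy loop already has.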


Metadata Update from @jcline:
- Issue status updated to: Closed (was: Open)
