You have S3 event notifications pushing to SQS. Something went wrong in your pipeline and you need to reprocess a bunch of files. You don’t want to download and re-upload them - that’s slow and wastes bandwidth.
The trick is --metadata-directive REPLACE. When you copy a file to itself with this flag, S3 treats it as a new write and fires the event notification again.
aws s3 cp s3://my-bucket/events/ s3://my-bucket/events/ \
--recursive \
--exclude "*" \
--include "*.jsonl.gz" \
--metadata-directive REPLACE
This copies files in place, triggering S3 events for each one. Your SQS queue fills up, your pipeline reprocesses everything.
A few things to keep in mind:
- This copies files to themselves, so there’s no data transfer cost between regions
- It does count as PUT requests, so you’ll pay for those
- Use
--includeand--excludeto target specific files - Add
--dryrunfirst to see what would be affected - Consider rate limiting with
--quietif you’re processing millions of files
Works great for backfills and reprocessing after pipeline fixes.