In my previous article I detailed sending datasets to non-AWS S3 object storage. While working on that project I ran into another issue with the aws_s3 Ansible module: the lack of a HEAD operation to pull metadata from a bucket or object.
There are various workarounds for the missing metadata in a bucket listing, including third-party modules or calling the awscli via shell. Since the dataset was time series related, I opted to add the metadata to the object name itself. This proved to be a very quick solution, as a GET operation against a bucket (which should return object creation time but does not) is a single operation, versus issuing a HEAD for each object (which could number in the millions or billions).
Below is an example of the solution adapted for daily backups:
1. When uploading files to an S3 bucket, insert a date string into the object name/path
- name: PUT operation with Date
  aws_s3:
    bucket: mybucket
    object: /backups/{{ ansible_date_time.date }}/key.txt
    src: /usr/local/myfile.txt
    mode: put
By adding {{ ansible_date_time.date }} to the key (the path / object name), each object carries metadata that can be operated on. This is important because, unlike the Ansible find module, the aws_s3 module can only list a bucket; it cannot filter the results by age.
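For example, after a few days of daily backups the bucket listing would contain keys along these lines (the dates shown are hypothetical):

/backups/2021-09-04/key.txt
/backups/2021-09-05/key.txt
/backups/2021-09-06/key.txt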
2. List objects in the backups prefix (folder / path)
- name: Find backup objects
  aws_s3:
    bucket: mybucket
    mode: list
    prefix: /backups/
  register: backups_in_aws
This task is the S3 equivalent of the following find task against a file system:
- name: Find backup files
  find:
    paths: /backups
    recurse: yes
    file_type: any
  register: files_in_nas
3. Delete old files
- name: Delete old objects
  aws_s3:
    bucket: mybucket
    object: "{{ item }}"
    mode: delobj
  loop: "{{ backups_in_aws.s3_keys }}"
  loop_control:
    label: "{{ item }}"
  when: "( (ansible_date_time.date | to_datetime('%Y-%m-%d')) - (item.split('/')[2] | to_datetime('%Y-%m-%d')) ).days > backup_days"
  register: deleted_objects
As noted, this example is for deleting old objects, but it could be adapted to other operations. The logic is in the when clause. Here's a rundown of the code:
The task in step 2 registered its results in a variable called backups_in_aws. That dictionary contains a list called s3_keys. The loop iterates through those keys, and the when clause subtracts the date embedded in each key from the current date.
Here's a breakdown of the when clause:
(ansible_date_time.date | to_datetime('%Y-%m-%d'))
This parses the current date (a string in YYYY-MM-DD format) into a datetime object.
- (item.split('/')[2] | to_datetime('%Y-%m-%d'))
This splits the object name on '/', takes the element at index [2] as the date, and converts it to a datetime object in the same format as above.
Why is the date at index [2]? Look at the object path defined in step 1:
/backups/{{ ansible_date_time.date }}/key.txt
The first named part is backups, the second is the date, and the third is the key, but the leading slash also produces an empty element at the start of the split, which shifts the date to index [2]. Adjust the number in brackets as needed if your path/object naming convention is different.
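As a quick illustration, splitting a hypothetical key from this naming scheme looks like this:

# "/backups/2021-09-06/key.txt".split('/')
#   -> ['', 'backups', '2021-09-06', 'key.txt']
#       [0]   [1]        [2]           [3]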
.days > backup_days
Once the datetimes are subtracted, this takes the whole number of days (with no rounding) and compares it to a variable called backup_days. If the elapsed time in days is greater than the defined variable, an action is taken; in this case it's mode: delobj, which deletes the object.
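Before wiring this into delobj, the same expression can be sanity-checked with a debug task. This is a minimal sketch; the task name, sample key, and backup_days value are made up for illustration, and it assumes fact gathering is enabled so ansible_date_time is populated:

# Requires gather_facts: yes so that ansible_date_time is defined.
- name: Show computed object age in days
  vars:
    item: "/backups/2021-09-01/key.txt"   # hypothetical key standing in for a looped s3_keys entry
    backup_days: 30                       # hypothetical retention window in days
  debug:
    msg: "{{ ( (ansible_date_time.date | to_datetime('%Y-%m-%d')) - (item.split('/')[2] | to_datetime('%Y-%m-%d')) ).days }} days old"

If the printed number is greater than backup_days, the when clause in the delete task evaluates to true and the object is removed.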
This is just an example, but it simplifies working with object storage and time series data.