Posted on July 17, 2019 by Douglas Soltesz

Ansible: Time series data in S3 API without HEADing metadata

In my previous article I detailed sending datasets to non-AWS S3 object storage. While working on that project I ran into another issue with the aws_s3 Ansible module: the lack of a HEAD operation to pull metadata from a bucket or object.

There are various workarounds for the lack of metadata in a bucket listing, including third-party modules or calling awscli via the shell. Since the dataset was time-series data, I opted to embed the metadata in the object name instead. This proved to be a very quick solution: a GET operation against a bucket (which should return each object's creation time, but does not) is a single request, versus HEADing each object individually, which could mean millions or billions of requests.
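
For reference, here is a minimal sketch of the awscli workaround, assuming the aws CLI is installed on the target host; the bucket and key are placeholders, and a non-AWS endpoint would also need --endpoint-url:

    - name: HEAD a single object via awscli
      command: aws s3api head-object --bucket mybucket --key backups/key.txt
      register: head_result
      changed_when: false

    - name: Show the returned metadata (LastModified, ContentLength, etc.)
      debug:
        var: head_result.stdout

The catch is that this costs one API call per object, which is exactly what becomes impractical at scale.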

Below is an example of the solution adapted for daily backups:

1. When uploading files to an S3 bucket, insert a date string into the object name/path

    - name: PUT operation with Date
      aws_s3:
        bucket: mybucket
        object: /backups/{{ ansible_date_time.date }}/key.txt
        src: /usr/local/myfile.txt
        mode: put

By adding {{ ansible_date_time.date }} to the key (the path/object name), every object carries date metadata that can be operated on later. This is important because, unlike the Ansible find module, the aws_s3 module can only list a bucket; it cannot filter the results by age.
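
For example, after three daily runs the bucket would contain keys along these lines (the dates are illustrative):

    /backups/2019-07-15/key.txt
    /backups/2019-07-16/key.txt
    /backups/2019-07-17/key.txt

Each object's creation date can now be recovered from the key alone, with no HEAD calls.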

2. List objects in the backups prefix (folder/path)

    - name: Find backup objects
      aws_s3:
        bucket: mybucket
        mode: list
        prefix: /backups/
      register: backups_in_aws
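
To inspect what came back, a quick debug task prints the registered list of keys; this s3_keys list is what the delete step below loops over:

    - name: Show the keys returned by the listing
      debug:
        var: backups_in_aws.s3_keys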

This task is the S3 equivalent of the following find task against a file system:

    - name: Find backup files
      find:
        paths: /backups
        recurse: yes
        file_type: any
      register: files_in_nas

3. Delete old objects

    - name: Delete old objects
      aws_s3:
        bucket: mybucket
        object: "{{ item }}"
        mode: delobj
      loop: "{{ backups_in_aws.s3_keys }}"
      loop_control:
        label: "{{ item }}"
      when: ((ansible_date_time.date | to_datetime('%Y-%m-%d')) - (item.split('/')[2] | to_datetime('%Y-%m-%d'))).days > backup_days
      register: deleted_objects
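
Because this task deletes data, it is worth previewing which keys would match before running it for real. Here is a sketch that reuses the same when clause against a harmless debug task:

    - name: Preview objects that would be deleted
      debug:
        msg: "Would delete {{ item }}"
      loop: "{{ backups_in_aws.s3_keys }}"
      when: ((ansible_date_time.date | to_datetime('%Y-%m-%d')) - (item.split('/')[2] | to_datetime('%Y-%m-%d'))).days > backup_days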

As noted, this example deletes old objects, but the pattern could be adapted for almost anything. The logic lives in the when clause. Here's a rundown of the code:

The task in step 2 registered its results in a variable called backups_in_aws. That dictionary contains a list called s3_keys. The loop iterates through the keys, and the when clause subtracts the date embedded in each key from the current date.

Here's a breakdown of the when clause:

(ansible_date_time.date | to_datetime('%Y-%m-%d'))
This parses the current date, a string in YYYY-MM-DD format, into a datetime object so that date arithmetic is possible.

- (item.split('/')[2] | to_datetime('%Y-%m-%d'))
This takes the date segment of the object name and parses it into a datetime object the same way.
Why is the date at index 2? Look at the key from step 1:
/backups/{{ ansible_date_time.date }}/key.txt
Because the key begins with a slash, split('/') yields an empty string at index 0, backups at index 1, the date at index 2, and key.txt at index 3. Adjust the number in brackets if your path/object naming convention is different.

.days > backup_days
Subtracting the two datetimes yields a timedelta. The .days attribute takes the whole number of days, with no rounding, and compares it to a variable called backup_days. If the elapsed days are greater than the variable, the action is taken; in this case mode: delobj deletes the object.
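
To make the arithmetic concrete, here is a standalone debug task with hard-coded illustrative values. With a "current" date of 2019-07-17 and a week-old key it prints 7, so with backup_days set to 5 the object would be deleted:

    - name: Worked example of the age calculation
      debug:
        msg: "{{ (('2019-07-17' | to_datetime('%Y-%m-%d')) - ('/backups/2019-07-10/key.txt'.split('/')[2] | to_datetime('%Y-%m-%d'))).days }}"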

This is just one example, but the pattern greatly simplifies working with time-series data in object storage.
