Storage Made Easy Blog

Posted on October 11, 2019 by Dan Stone

Meta-data Sync and Re-Sync Optimizations in v1906 of the Enterprise File Fabric

The File Fabric has two main modes of operation with regard to knowing what files are on a storage provider: cached mode and real-time mode.  In cached mode the File Fabric maintains, in its database, metadata about each file on the underlying storage.  As files are created, updated and deleted, the changes flow through the File Fabric and the File Fabric updates its metadata accordingly.

 

Cached mode is often used when:

  • Companies are working directly through the File Fabric with no access to the underlying storage.
  • There are particularly large amounts of data on the remote data store, making it impractical to pull back file listings in real time. Millions of files on Amazon S3 are a good example of this. With cached mode the files can be accessed and searched very quickly compared with real-time mode.

Cached mode presents two obvious challenges.

  1. If a storage provider that already contains files is added to the File Fabric, how does the File Fabric initialize the metadata for that provider?
  2. If the storage is being used in a bi-modal way, i.e. changes are also made directly to the storage, how does the File Fabric learn about those changes?

The answers to the two questions are somewhat similar. When a storage provider is added, the File Fabric runs an initial sync process that discovers the content of the storage and records the appropriate metadata in the File Fabric’s metadata database. It is worth pointing out that the File Fabric does not copy or move the files that exist on the remote storage. The files remain in a single location, so their integrity is maintained.

If changes have been made directly to the storage, the user can initiate the File Fabric’s re-sync process, which compares file information retrieved from the storage with the metadata in the File Fabric’s metadata database and updates the database as needed.
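The comparison step can be pictured with a minimal Python sketch. This is not the File Fabric's implementation: the `FileInfo` record, the `resync` function and the in-memory dictionaries standing in for the metadata database and the storage listing are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical per-file metadata record (illustration only)."""
    size: int
    modified: float  # POSIX timestamp reported by the storage


class ResyncResult(NamedTuple):
    added: List[str]
    updated: List[str]
    deleted: List[str]


def resync(index: Dict[str, FileInfo], listing: Dict[str, FileInfo]) -> ResyncResult:
    """Bring the cached `index` in line with `listing` and report what changed.

    `index` stands in for the metadata database, `listing` for the file
    information retrieved from the storage. No file content is touched.
    """
    added = sorted(p for p in listing if p not in index)
    updated = sorted(p for p in listing if p in index and index[p] != listing[p])
    deleted = sorted(p for p in index if p not in listing)

    # Apply the differences to the cached metadata.
    for path in added + updated:
        index[path] = listing[path]
    for path in deleted:
        del index[path]
    return ResyncResult(added, updated, deleted)


# Cached metadata before the re-sync.
index = {
    "a.txt": FileInfo(10, 1.0),
    "b.txt": FileInfo(20, 2.0),
    "c.txt": FileInfo(30, 3.0),
}
# What the storage reports now: b.txt changed, c.txt removed, d.txt added.
listing = {
    "a.txt": FileInfo(10, 1.0),
    "b.txt": FileInfo(25, 5.0),
    "d.txt": FileInfo(40, 6.0),
}
result = resync(index, listing)
print(result)
```

Only the differences are written back, so in a sketch like this the cost of the update step grows with the number of changes rather than with the total number of files, although the storage still has to be listed in full.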

There are other options to the above that are also worth mentioning:

  • The File Fabric can operate in real-time mode, which negates the need for a re-synchronization process because new data added directly to the storage is discovered in real time as users browse directories.
  • The File Fabric can be scheduled to spider the underlying storage at set intervals and update the metadata. This is particularly useful when the underlying data set is very large.

These options are not mutually exclusive and can be used in combination.

Initial sync and re-sync are bread-and-butter operations for the File Fabric and have been available since its inception. With digital transformation, and with a company’s digital assets doubling every year, companies are now dealing with very large unstructured datasets that can require billions of files and objects to be indexed. It had become clear to the engineering team that these meta-sync operations would benefit from being optimized.

In the latest major File Fabric release, v1906, both types of sync operation have become much faster. How much faster? The actual performance depends on a host of variables, such as CPU speed, network capacity, storage speed (because the storage has to tell the File Fabric about its contents), number of changes (re-sync only) and background load, so there is no single answer. Here, though, are two hard facts:

  1. In some situations we have seen throughput (measured as number of files sync’d per second) increase by more than 50x.
  2. We have also seen sustained throughput of more than 1,000 files per second.
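To put these figures in context, here is a rough back-of-the-envelope calculation in Python. The one-billion-file dataset is an assumption chosen for illustration, and so is combining the two figures: the post does not say that the 50x increase and the 1,000 files per second were observed in the same scenario.

```python
SECONDS_PER_DAY = 86_400


def sync_duration_days(file_count: int, files_per_second: float) -> float:
    """Days needed to sync `file_count` files at a constant throughput."""
    return file_count / files_per_second / SECONDS_PER_DAY


FILES = 1_000_000_000  # assumed dataset size, for illustration only

# At the sustained throughput quoted above.
fast = sync_duration_days(FILES, 1_000)
# At a hypothetical throughput 50 times lower (20 files per second).
slow = sync_duration_days(FILES, 1_000 / 50)

print(f"1,000 files/s: {fast:.1f} days")
print(f"   20 files/s: {slow:.1f} days")
```

Under these assumptions, indexing a billion files takes about 11.6 days at 1,000 files per second, against roughly 579 days at the hypothetical lower rate.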

How did we optimise this? Through a combination of code refactoring and updated algorithms for synchronising metadata.

These performance improvements enable the File Fabric to deal with extremely large datasets with ease.

If you are planning to evaluate the File Fabric for use by your organization, be sure to include these operations in your evaluation.  We think you will be impressed.

Dan Stone

Dan is COO at Storage Made Easy. He has been working with the founders since the company was launched.
