Azure Storage – select the best storage type for the job

Summary

In this post, I want to share some basic points to consider before selecting an Azure storage type. When you provision Azure storage you’re presented with several options, the most important being ‘Performance’, ‘Account Kind’, ‘Replication’ and ‘Blob access tier’.

It’s important to select the right storage for the job, otherwise you’ll end up paying for performance or redundancy you don’t need.

Performance

General-purpose storage accounts may be configured for either of the following performance tiers:

  • A standard performance tier for storing blobs, files, tables, queues, and Azure virtual machine disks. For more information about scalability targets for standard storage accounts, see Scalability targets for standard storage accounts.
  • A premium performance tier for storing unmanaged virtual machine disks. Microsoft recommends using managed disks with Azure virtual machines instead of unmanaged disks. For more information about scalability targets for the premium performance tier, see Scalability targets for premium page blob storage accounts.

Account Kind

The following table describes the types of accounts available.

| Storage account type | Supported services | Supported performance tiers | Supported access tiers | Replication options | Deployment model | Encryption |
| --- | --- | --- | --- | --- | --- | --- |
| General-purpose V2 | Blob, File, Queue, Table, Disk, and Data Lake Gen2 | Standard, Premium | Hot, Cool, Archive | LRS, GRS, RA-GRS, ZRS, GZRS (preview), RA-GZRS (preview) | Resource Manager | Encrypted |
| General-purpose V1 | Blob, File, Queue, Table, and Disk | Standard, Premium | N/A | LRS, GRS, RA-GRS | Resource Manager, Classic | Encrypted |
| BlockBlobStorage | Blob (block blobs and append blobs only) | Premium | N/A | LRS, ZRS | Resource Manager | Encrypted |
| FileStorage | File only | Premium | N/A | LRS, ZRS | Resource Manager | Encrypted |
| BlobStorage | Blob (block blobs and append blobs only) | Standard | Hot, Cool, Archive | LRS, GRS, RA-GRS | Resource Manager | Encrypted |

Microsoft recommends General-purpose V2 accounts for most scenarios, and this account type supports access tiers.

The other important point to note here is that General-purpose V2 is the only account type in the table that supports Data Lake Gen2. You will not be able to use the Data Lake unless you provision this type of account.

To provision Azure Storage with the Data Lake, make sure you select ‘Hierarchical namespace’.
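To make the account-kind table easier to query, here is a minimal Python sketch that encodes it as data and filters by requirement. The `AccountKind` class and the `kinds_supporting` helper are hypothetical names introduced for illustration; the values are copied from the table in this post and may not reflect what Azure offers today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountKind:
    name: str
    services: tuple
    performance: tuple
    access_tiers: tuple  # empty tuple means "N/A" in the table
    replication: tuple


# Values transcribed from the table in this post (preview labels omitted).
ACCOUNT_KINDS = (
    AccountKind("General-purpose V2",
                ("Blob", "File", "Queue", "Table", "Disk", "Data Lake Gen2"),
                ("Standard", "Premium"),
                ("Hot", "Cool", "Archive"),
                ("LRS", "GRS", "RA-GRS", "ZRS", "GZRS", "RA-GZRS")),
    AccountKind("General-purpose V1",
                ("Blob", "File", "Queue", "Table", "Disk"),
                ("Standard", "Premium"),
                (),
                ("LRS", "GRS", "RA-GRS")),
    AccountKind("BlockBlobStorage", ("Blob",), ("Premium",), (), ("LRS", "ZRS")),
    AccountKind("FileStorage", ("File",), ("Premium",), (), ("LRS", "ZRS")),
    AccountKind("BlobStorage", ("Blob",), ("Standard",),
                ("Hot", "Cool", "Archive"), ("LRS", "GRS", "RA-GRS")),
)


def kinds_supporting(service=None, needs_access_tiers=False):
    """Return the names of account kinds that satisfy the given requirements."""
    result = []
    for kind in ACCOUNT_KINDS:
        if service is not None and service not in kind.services:
            continue
        if needs_access_tiers and not kind.access_tiers:
            continue
        result.append(kind.name)
    return result
```

For example, `kinds_supporting(service="Data Lake Gen2")` returns only General-purpose V2, which matches the point made above about the Data Lake.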

Replication

Under Replication, you have several high-availability configurations that you can select from.

  • Locally redundant storage (LRS): A simple, low-cost redundancy strategy. Data is copied synchronously three times within the primary region.
  • Zone-redundant storage (ZRS): Redundancy for scenarios requiring high availability. Data is copied synchronously across three Azure availability zones in the primary region.
  • Geo-redundant storage (GRS): Cross-regional redundancy to protect against regional outages. Data is copied synchronously three times in the primary region, then copied asynchronously to the secondary region. For read access to data in the secondary region, enable read-access geo-redundant storage (RA-GRS).
  • Geo-zone-redundant storage (GZRS) (currently preview): Redundancy for scenarios requiring both high availability and maximum durability. Data is copied synchronously across three Azure availability zones in the primary region, then copied asynchronously to the secondary region. For read access to data in the secondary region, enable read-access geo-zone-redundant storage (RA-GZRS).
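The four options above boil down to two questions: do you need to survive a zone failure, and do you need to survive a regional outage? The sketch below turns that into a small decision helper. The function name `choose_replication` and its parameters are hypothetical and only illustrate the logic described in the list; it is not an Azure API.

```python
def choose_replication(zone_redundant: bool, geo_redundant: bool,
                       read_secondary: bool = False) -> str:
    """Map redundancy requirements to one of the replication options above."""
    if read_secondary and not geo_redundant:
        # Only the geo-redundant options have a secondary region to read from.
        raise ValueError("read access to a secondary region requires geo-redundancy")
    if geo_redundant:
        base = "GZRS" if zone_redundant else "GRS"
        # The RA- prefix enables read access to the secondary region.
        return f"RA-{base}" if read_secondary else base
    return "ZRS" if zone_redundant else "LRS"
```

With neither requirement set, the helper falls back to LRS, the simplest and cheapest option in the list.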

Blob access tier

The available access tiers are:

  • The Hot access tier. This tier is optimized for frequent access of objects in the storage account. Accessing data in the hot tier is most cost-effective, while storage costs are higher. New storage accounts are created in the hot tier by default.
  • The Cool access tier. This tier is optimized for storing large amounts of data that is infrequently accessed and stored for at least 30 days. Storing data in the cool tier is more cost-effective, but accessing that data may be more expensive than accessing data in the hot tier.
  • The Archive tier. This tier is available only for individual block blobs. The archive tier is optimized for data that can tolerate several hours of retrieval latency and that will remain in the archive tier for at least 180 days. The archive tier is the most cost-effective option for storing data. However, accessing that data is more expensive than accessing data in the hot or cool tiers.
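As a rough rule of thumb, the tier descriptions above can be sketched as a small helper. The function `suggest_access_tier` and its parameters are hypothetical; the 30-day and 180-day thresholds come from the descriptions above, and the rest is a simplification that ignores actual pricing.

```python
def suggest_access_tier(retention_days: int, frequently_accessed: bool,
                        tolerates_hours_of_latency: bool = False) -> str:
    """Suggest Hot, Cool or Archive from expected retention and access pattern."""
    if frequently_accessed:
        return "Hot"
    # Archive suits data kept at least 180 days that can wait hours to be read.
    if retention_days >= 180 and tolerates_hours_of_latency:
        return "Archive"
    # Cool suits infrequently accessed data stored for at least 30 days.
    if retention_days >= 30:
        return "Cool"
    # Short-lived data stays in Hot, the default tier for new accounts.
    return "Hot"
```

For instance, yearly backups that can wait several hours for retrieval would land in Archive, while a dataset read every day stays in Hot regardless of how long it is kept.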

Which storage type to select?

For simple testing, my suggestion is:

  1. Performance: Standard
  2. Account Kind: General-purpose V2
  3. Replication: Locally redundant storage (LRS)
  4. Blob access tier: Cool

The above configuration will help keep costs as low as possible, with the caveat that the final cost is determined by actual usage.
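If you provision storage from a template rather than the portal, the four suggestions above could be expressed roughly as follows. This is a sketch, not a complete template: the helper `storage_account_settings` is a hypothetical name, the account name, location and API version are left out, and the field names (`kind`, `sku.name`, `accessTier`, `isHnsEnabled`) are my understanding of the Resource Manager schema for storage accounts, so check them against the current reference before relying on them.

```python
# Portal labels mapped to the kind identifiers assumed for Resource Manager.
KIND_IDS = {
    "General-purpose V2": "StorageV2",
    "General-purpose V1": "Storage",
    "BlockBlobStorage": "BlockBlobStorage",
    "FileStorage": "FileStorage",
    "BlobStorage": "BlobStorage",
}


def storage_account_settings(performance="Standard",
                             kind="General-purpose V2",
                             replication="LRS",
                             access_tier="Cool",
                             hierarchical_namespace=False):
    """Build the settings fragment for the suggested test configuration."""
    return {
        "kind": KIND_IDS[kind],
        # SKU names combine tier and replication, e.g. Standard_LRS, Standard_RAGRS.
        "sku": {"name": f"{performance}_{replication.replace('-', '')}"},
        "properties": {
            "accessTier": access_tier,
            # Set to True to get the Data Lake (hierarchical namespace).
            "isHnsEnabled": hierarchical_namespace,
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(storage_account_settings(), indent=2))
```

Calling the function with no arguments yields the test configuration suggested above; passing `hierarchical_namespace=True` corresponds to ticking ‘Hierarchical namespace’ in the portal.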