Azure Data Factory Partition Performance

Summary

Several optimisation techniques are available at the Dataflow Sink stage. In this post, I present performance test results for three partitioning schemes: Round Robin, Hash and Dynamic Range.
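To make the three schemes concrete, here is a minimal, purely conceptual sketch in Python of how each one might assign rows to partitions. It is not how Data Factory implements them internally. The function names, the sample rows and the use of `zlib.crc32` as the hash are all assumptions for illustration; only the column name `LocalAuthorityCode` is borrowed from the tests in this post.

```python
import zlib
from bisect import bisect_right


def round_robin(rows, n):
    """Deal rows out in turn, so partition sizes differ by at most one row."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Send each row to the partition chosen by a hash of its key column."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is an illustrative, deterministic hash (an assumption).
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts


def range_partition(rows, n, key):
    """Split on boundaries taken from the sorted key values, so each
    partition holds a contiguous range of keys."""
    values = sorted(row[key] for row in rows)
    # n - 1 boundaries at evenly spaced positions in the sorted values.
    bounds = [values[len(values) * i // n] for i in range(1, n)]
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[bisect_right(bounds, row[key])].append(row)
    return parts


# Hypothetical sample data: 20 rows spread over 5 key values.
rows = [{"LocalAuthorityCode": f"E0{i % 5}", "id": i} for i in range(20)]
for name, parts in [
    ("round robin", round_robin(rows, 2)),
    ("hash", hash_partition(rows, 2, "LocalAuthorityCode")),
    ("range", range_partition(rows, 2, "LocalAuthorityCode")),
]:
    print(name, [len(p) for p in parts])
```

Round robin produces evenly sized partitions regardless of the data. Hash and range partitioning both keep rows with the same key together, so how evenly they balance depends on how the key values are distributed.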

Performance Baseline

  • Settings: Output to Single File.
  • Partition: Single.

With a single partition in place, two manually triggered runs of the pipeline each took a little over five minutes:

| Pipeline Name | Run Start | Run End | Duration | Triggered By | Status | Size |
|---|---|---|---|---|---|---|
| Pipeline 1 | Jun 17, 2022, 11:56:10 am | Jun 17, 2022, 12:01:36 pm | 00:05:26 | Manual trigger | Succeeded | 280 MiB |
| Pipeline 1 | Jun 17, 2022, 12:14:07 pm | Jun 17, 2022, 12:19:17 pm | 00:05:09 | Manual trigger | Succeeded | 280 MiB |
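
The baseline average can be worked out directly from the Duration column. A small Python sketch, with the two durations copied from the baseline runs (the helper name `parse_duration` is my own, not part of any Data Factory tooling):

```python
from datetime import timedelta


def parse_duration(text):
    """Convert an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Durations of the two single-partition runs.
baseline_runs = ["00:05:26", "00:05:09"]
durations = [parse_duration(d) for d in baseline_runs]

# sum() needs a timedelta start value; dividing by an int gives a timedelta.
average = sum(durations, timedelta()) / len(durations)
print(average.total_seconds())  # 317.5
```

That puts the baseline at roughly 5 minutes 18 seconds, which is the figure the partitioned runs should be compared against.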

Round Robin Partition

  1. Settings: Default
  2. Partition: Set Partitioning (Round Robin)

| Pipeline Name | Partitions | Run Start | Run End | Duration | Triggered By | Status | Size |
|---|---|---|---|---|---|---|---|
| Pipeline 1 | 2 | Jun 17, 2022, 12:32:18 pm | Jun 17, 2022, 12:36:51 pm | 00:04:33 | Manual trigger | Succeeded | Two files: 140.04 MiB |
| Pipeline 1 | 2 | Jun 17, 2022, 12:50:14 pm | Jun 17, 2022, 12:54:49 pm | 00:04:35 | Manual trigger | Succeeded | Two files: 140.05 MiB |
| Pipeline 1 | 4 | Jun 17, 2022, 1:41:26 pm | Jun 17, 2022, 1:47:52 pm | 00:06:25 | Manual trigger | Succeeded | Four files: 70.03 MiB |
| Pipeline 1 | 4 | Jun 17, 2022, 1:59:00 pm | Jun 17, 2022, 2:03:23 pm | 00:04:23 | Manual trigger | Succeeded | Four files: 70.03 MiB |
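
To put the Round Robin runs in context, the sketch below averages the durations per partition count and expresses each average as a percentage change against the single-partition baseline. The numbers are transcribed from the Duration columns in this post. The variable names are my own, and this is only one rough way of comparing the runs.

```python
from statistics import mean

# Durations in seconds, transcribed from the Duration columns.
baseline = [326, 309]          # 00:05:26, 00:05:09 (single partition)
round_robin_runs = {
    2: [273, 275],             # 00:04:33, 00:04:35
    4: [385, 263],             # 00:06:25, 00:04:23
}

baseline_avg = mean(baseline)
for partitions, runs in round_robin_runs.items():
    avg = mean(runs)
    # Negative change means faster than the baseline.
    change = (avg - baseline_avg) / baseline_avg * 100
    print(f"{partitions} partitions: {avg:.1f}s ({change:+.1f}% vs baseline)")
```

With only two runs per configuration, the single 00:06:25 run dominates the four-partition average, so these figures are indicative at best.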

Hash Partition

  1. Settings: Default
  2. Partition: Set Partitioning (Hash)

| Pipeline Name | Partitions | Partition Column | Run Start | Run End | Duration | Triggered By | Status | Size |
|---|---|---|---|---|---|---|---|---|
| Pipeline 1 | 2 | LocalAuthorityCode | Jun 17, 2022, 3:11:14 pm | Jun 17, 2022, 3:15:23 pm | 00:04:09 | Manual trigger | Succeeded | Two files: 140.07 MiB |
| Pipeline 1 | 2 | Postcode | Jun 17, 2022, 3:22:25 pm | Jun 17, 2022, 3:26:34 pm | 00:04:09 | Manual trigger | Succeeded | Two files: 140.07 MiB |
| Pipeline 1 | 4 | LocalAuthorityCode | Jun 17, 2022, 3:59:05 pm | Jun 17, 2022, 4:03:37 pm | 00:04:32 | Manual trigger | Succeeded | Four files: 70.03 MiB |
| Pipeline 1 | 4 | Postcode | Jun 17, 2022, 3:30:13 pm | Jun 17, 2022, 3:34:53 pm | 00:04:39 | Manual trigger | Succeeded | Four files: 70.03 MiB |

Dynamic Range Partition

  1. Settings: Default
  2. Partition: Set Partitioning (Dynamic Range)

| Pipeline Name | Partitions | Partition Column | Run Start | Run End | Duration | Triggered By | Status | Size |
|---|---|---|---|---|---|---|---|---|
| Pipeline 1 | 2 | LocalAuthorityCode | Jun 17, 2022, 4:23:35 pm | Jun 17, 2022, 4:27:55 pm | 00:04:20 | Manual trigger | Succeeded | Two files: 140.07 MiB |
| Pipeline 1 | 2 | Postcode | Jun 17, 2022, 4:08:50 pm | Jun 17, 2022, 4:13:30 pm | 00:04:40 | Manual trigger | Succeeded | Two files: 140.07 MiB |
| Pipeline 1 | 4 | LocalAuthorityCode | Jun 17, 2022, 3:59:05 pm | Jun 17, 2022, 4:03:37 pm | | Manual trigger | Succeeded | Four files: 70.03 MiB |
| Pipeline 1 | 4 | Postcode | Jun 17, 2022, 3:30:13 pm | Jun 17, 2022, 3:34:53 pm | | Manual trigger | Succeeded | Four files: 70.03 MiB |