Azure Data Factory Partition Performance
Post by: syed hussain in All Azure Data Factory Reference Architecture
Summary
Several optimisation techniques are available at the data flow Sink stage. In this post, I present performance test results for three partitioning models: Round Robin, Hash and Dynamic Range.
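To make the three models concrete, here is a minimal Python sketch of how each one could assign rows to partitions. It is an illustration only, not how Azure Data Factory or Spark implements them: the function names are hypothetical, `crc32` stands in for whatever hash the engine uses, and the range version simply sorts and slices into equal chunks, whereas a real range partitioner picks boundaries by sampling and keeps equal keys together.

```python
from zlib import crc32


def round_robin(rows, n):
    """Deal rows out in turn, so partitions end up (almost) equal in size."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Rows with the same key value always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is a stand-in hash, chosen because it is stable across runs.
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts


def range_partition(rows, n, key):
    """Simplified range split: sort by key, then cut into n contiguous chunks."""
    ordered = sorted(rows, key=lambda r: r[key])
    size = -(-len(ordered) // n)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(n)]


if __name__ == "__main__":
    # Made-up sample rows; column names borrowed from the tests in this post.
    sample = [
        {"Postcode": f"AB{i:02d}", "LocalAuthorityCode": f"E{i % 3}"}
        for i in range(8)
    ]
    print([len(p) for p in round_robin(sample, 2)])
    print([len(p) for p in hash_partition(sample, 4, "LocalAuthorityCode")])
    print([len(p) for p in range_partition(sample, 2, "Postcode")])
```

Round Robin needs no key column, which is why the Hash and Dynamic Range tests below are run once per column (LocalAuthorityCode and Postcode).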
Performance Baseline
- Settings: Output to Single File.
- Partition: Single.
With a single partition in place, the runtimes for two manually triggered pipeline runs are:
Pipeline Name | Run Start | Run End | Duration | Triggered By | Status | Size |
---|---|---|---|---|---|---|
Pipeline 1 | Jun 17, 2022, 11:56:10 am | Jun 17, 2022, 12:01:36 pm | 00:05:26 | Manual trigger | Succeeded | 280 MiB |
Pipeline 1 | Jun 17, 2022, 12:14:07 pm | Jun 17, 2022, 12:19:17 pm | 00:05:09 | Manual trigger | Succeeded | 280 MiB |
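For comparison with the later tests, the baseline can be reduced to a single average. The sketch below does that arithmetic on the two Duration values from the table above; the helper names are hypothetical.

```python
from datetime import timedelta


def parse_duration(text):
    """Turn an 'HH:MM:SS' string from the Duration column into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def mean_duration(durations):
    """Average a list of 'HH:MM:SS' strings."""
    total = sum((parse_duration(d) for d in durations), timedelta())
    return total / len(durations)


baseline = ["00:05:26", "00:05:09"]  # the two single-partition runs
print(mean_duration(baseline))  # 0:05:17.500000
```

The same helper can be pointed at the Duration values of any later table to compare a partition model against this baseline.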
Round Robin Partition
- Settings: Default
- Partition: Set Partitioning (Round Robin)
Pipeline Name | Partitions | Run Start | Run End | Duration | Triggered By | Status | Size |
---|---|---|---|---|---|---|---|
Pipeline 1 | 2 | Jun 17, 2022, 12:32:18 pm | Jun 17, 2022, 12:36:51 pm | 00:04:33 | Manual trigger | Succeeded | Two files: 140.04 MiB |
Pipeline 1 | 2 | Jun 17, 2022, 12:50:14 pm | Jun 17, 2022, 12:54:49 pm | 00:04:35 | Manual trigger | Succeeded | Two files: 140.05 MiB |
Pipeline 1 | 4 | Jun 17, 2022, 1:41:26 pm | Jun 17, 2022, 1:47:52 pm | 00:06:25 | Manual trigger | Succeeded | Four files: 70.03 MiB |
Pipeline 1 | 4 | Jun 17, 2022, 1:59:00 pm | Jun 17, 2022, 2:03:23 pm | 00:04:23 | Manual trigger | Succeeded | Four files: 70.03 MiB |
Hash Partition
- Settings: Default
- Partition: Set Partitioning (Hash)
Pipeline Name | Partitions | Partition Column | Run Start | Run End | Duration | Triggered By | Status | Size |
---|---|---|---|---|---|---|---|---|
Pipeline 1 | 2 | LocalAuthorityCode | Jun 17, 2022, 3:11:14 pm | Jun 17, 2022, 3:15:23 pm | 00:04:09 | Manual trigger | Succeeded | Two files: 140.07 MiB |
Pipeline 1 | 2 | Postcode | Jun 17, 2022, 3:22:25 pm | Jun 17, 2022, 3:26:34 pm | 00:04:09 | Manual trigger | Succeeded | Two files: 140.07 MiB |
Pipeline 1 | 4 | LocalAuthorityCode | Jun 17, 2022, 3:59:05 pm | Jun 17, 2022, 4:03:37 pm | 00:04:32 | Manual trigger | Succeeded | Four files: 70.03 MiB |
Pipeline 1 | 4 | Postcode | Jun 17, 2022, 3:30:13 pm | Jun 17, 2022, 3:34:53 pm | 00:04:39 | Manual trigger | Succeeded | Four files: 70.03 MiB |
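The Duration column can be cross-checked against Run Start and Run End. The sketch below assumes the timestamps are copied exactly as they appear in these tables and that Python runs under an English locale (for the month name and am/pm); `run_duration` is a hypothetical helper. Because the timestamps are shown to whole seconds, a derived value may differ from the reported Duration by a second.

```python
from datetime import datetime

# Format of the Run Start / Run End cells, e.g. "Jun 17, 2022, 3:11:14 pm".
TIMESTAMP_FORMAT = "%b %d, %Y, %I:%M:%S %p"


def run_duration(run_start, run_end):
    """Return Run End minus Run Start as a timedelta."""
    start = datetime.strptime(run_start, TIMESTAMP_FORMAT)
    end = datetime.strptime(run_end, TIMESTAMP_FORMAT)
    return end - start


# First Hash run above (2 partitions, LocalAuthorityCode).
print(run_duration("Jun 17, 2022, 3:11:14 pm", "Jun 17, 2022, 3:15:23 pm"))  # 0:04:09
```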
Dynamic Range Partition
- Settings: Default
- Partition: Set Partitioning (Dynamic Range)
Pipeline Name | Partitions | Partition Column | Run Start | Run End | Duration | Triggered By | Status | Size |
---|---|---|---|---|---|---|---|---|
Pipeline 1 | 2 | LocalAuthorityCode | Jun 17, 2022, 4:23:35 pm | Jun 17, 2022, 4:27:55 pm | 00:04:20 | Manual trigger | Succeeded | Two files: 140.07 MiB |
Pipeline 1 | 2 | Postcode | Jun 17, 2022, 4:08:50 pm | Jun 17, 2022, 4:13:30 pm | 00:04:40 | Manual trigger | Succeeded | Two files: 140.07 MiB |
Pipeline 1 | 4 | LocalAuthorityCode | Jun 17, 2022, 3:59:05 pm | Jun 17, 2022, 4:03:37 pm | | Manual trigger | Succeeded | Four files: 70.03 MiB |
Pipeline 1 | 4 | Postcode | Jun 17, 2022, 3:30:13 pm | Jun 17, 2022, 3:34:53 pm | | Manual trigger | Succeeded | Four files: 70.03 MiB |