While working on large database/tables we come across several issues – may be queries running slow i.e. data performance issues, time to perform certain maintenance operations and replicating/deleting data from these tables.
And to deal with these tailbacks, we think of data partitioning as the easiest and best solution. It provides the means to effectively manage and scale your data at a time when tables are growing exponentially.
But to implement this into real scenarios, we not only need to give good amount of thought around like which tables we should be applying this to, granularity of the data to be partitioned, whether indexes to be partitioned or not etc., but also need to take into consideration the fact whether it would be any harm doing so.
In this article we gonna discuss about its basic needs, high level implementation and talk about the scenarios it would really be beneficial to implement the same into.
Purpose/Benefits of data partitioning:
- As stated above, the basic necessity of data partitioning arises when size of the data tables becomes quite large and it’s being experienced that queries on these tables running very slow due to the same. And then by means of the partitions we can think of splitting the whole table into a few smaller ones & make the queries run faster and performs data sorting for I/O operations much faster.
- Further in case of the larger tables, other maintenance activities like index rebuild, compression, statistics update etc. also becomes significantly slower and hence performing the same on partitions make it easier.
- Helps with lower locks if different partitions being inserted, updated, deleted or selected in different transactions.
- It also help to transfer or access subsets of data quickly and efficiently, while maintaining the integrity of a data collection.
When we create a simple data table into SQL Server, a partition is automatically created for the same i.e. until & unless defined, whole data is stored into one partition & hence onto one FileGroup. When partitions are explicitly created, the data is partitioned horizontally, so that groups of rows are mapped into individual partitions accordingly. The best part is that table or index is treated as a single logical entity when queries or updates are performed on the data i.e. it’s completely transparent to applications (as long as you don’t have to change primary & foreign keys): they don’t have to know the table is even partitioned.
Partitioned tables and indexes support all the properties and features associated with designing and querying standard tables and indexes, including constraints, defaults, identity and timestamp values, and triggers.
All partitions of a single index or table must reside in the same database.
Deciding factors to implement or not to partitioning
- Biggest reason to partition the data table is if table contains or is expected to contain lots of data that are used in different ways, probably in different queries. And more importantly we have some field on the basis of which we can partition the data into different partitions e.g. in a large fact table if we have date which can be partition upon for different years to make the queries run faster being executed for different years.
- Queries or updates against the table are not performing as intended, or maintenance costs exceed predefined maintenance periods.
- Frequent lock escalation issue at table level.
- Data Archival: In scenarios like loading latest data into data warehouse where table being accessed heavily at the same time and also we need to switch off the oldest data from that table, partitioning becomes very useful.
- A big question to ask yourself if your table is really big enough to be partitioned? If not, then it would be an overhead of partition management rather an advantage. Also, we would need the Enterprise edition of the SQL Server to be able to implement the same.
Basic Implementation of partitioning
There can be different scenarios/ requirements and approaches:
- Partitioning a new table being created
- Partitioning the existing table
- Transfer / switch partition of a partitioned table.
Partitioning a new table being created
There are a few steps to create a partitioned table as follows:
Partition Function: When we say we want to partition a particular table, we actually mean to store the whole dataset into smaller chunks. And hence, we need to define how to make those data chunks. There comes Partition Function into the picture.
So, Partition Function, is a SQL object that defines how the rows of a table or index are mapped to a set of partitions based on the values of certain column, called a partitioning column (e.g. a datetime column) and how the boundaries of the partitions are defined. And this essentially defines how many partition are going to be there for a particular table.
Computed columns that participate in a partition function must be explicitly marked PERSISTED. All data types that are valid for use as index columns can be used as a partitioning column, except timestamp. The ntext, text, image, xml, varchar(max), nvarchar(max), or varbinary(max) data types cannot be specified
Partition Scheme: As mentioned earlier, different partitions can be stored differently across various filegroups, a Partition Scheme allows to do that. So, it is a database object that maps the partitions of a partition function to a set of filegroups. The primary reason for placing your partitions on separate filegroups is to make sure that you can independently perform backup operations on partitions.
Now, let’s take an example to define all these objects and apply them onto a data table to make it partitioned table.
e.g. A fact table ‘dbo.Fact_Orders’ where we receive lots of order transactions (OrderId, Amount, SalespersonID, OrderDate) and this is being queried based upon orderdate and hence we need to partition it upon OrderDate.
Create Partition Function:
This partition function partitions a table into 4 partitions, one for each quarter of a year 2014 worth of values in a datetime column.
‘RIGHT’ keyword defines the boundaries that partitions will have records on the basis of OrderDate (or any column this partition function will be applied upon) like
Partition1 -> OrderDate < ‘01Jan2014’
Partition2 -> OrderDate >= ‘01Jan2014’ AND OrderDate < ‘01Apr2014’
Partition3 -> OrderDate >= ‘01Apr2014’ AND OrderDate < ‘01Jul2014’
Partition4 -> OrderDate >= ‘01Jul2014’ AND OrderDate < ‘01Oct2014’
Partition5 -> OrderDate > ‘01Oct2014’
As shown right most partition ‘Partition5′ will contain all the values >’01Oct2014’. LEFT is the default value unless specified.
Now if we want to check which partition a particular record (a date) resides into:
Create Partition Scheme:
This will make it to store all the partitions on a single filegroup i.e. the primary filegroup.
If we want different partitions to be stored into different filegroups, we need to specify them explicitly as below:
These filegroups need to be pre-defined. If extra filegroup is specified then that will be used for next used partition. We will talk about later how to add new partitions to the existing function.
Now while creating the table we can assign this partition scheme (& thus the partition function) to the table by means of the partitioning column and hence table structure will be inline to this portioning strategy.
As we have defined certain partitions for our fact table based upon the current requirement i.e. for the year 2014. Now, in the next year, let’s say we need to add some more partitions. It sounds like what we want to do is to create a new boundary for our existing partitioning implementation e.g. add another partition for Q1 of 2015. Also, we will later talk about how to remove an existing obsolete partition which is no more required.
And this can be implemented via ALTER PARTITION FUNCTION statement as follows:
SPLIT RANGE: This adds a new partition boundary to the existing partition function by splitting one the existing ranges into two.
This will work absolutely fine because we have already got an additional filegroup left unassigned to accommodate a new partition. So, when we add filegroups to a partition scheme and mention some extra filegroup at the end which will automatically be marked as NEXT USED i.e. to be used for the next partition to be added and hence nothing else needs to be done.
But, if we haven’t defined any extra filegroup to the partition scheme, then before splitting the range of a partition function, we would need to alter the partition scheme as below:
Or in case Primary filegroup to be used: