This article will show you two options for storing and generating hierarchical data in SQL server. There are many business cases where you have a parent-child relationship between data and you need to store and retrieve it to depict a hierarchy. In the past, your options would be to write a stored procedure that called itself recursively or handling the recursion in your code. Here are a couple of options where SQL Server is handling some of the work for you.
Common Table Expression (CTE)
Common Table Expression was introduced in SQL Server 2005. A CTE is a temporary result set and can be used in place of temporary or derived tables and also to perform recursive queries.
A CTE has two parts
A recursive CTE expands the definition of the table expression and contains two parts:
An anchor query which is the source of the recursion along with a UNION ALL statement and a second query which recurses across the anchor query
An outer query which references the routine and specifies the number of recursion levels
Scripts to Generate Sample Data
For the purposes of this demonstration, let’s create a table and some data that will be used by our queries.
CREATE TABLE Category(
Id int IDENTITY(1,1) NOT NULL,
CategoryName nvarchar(50) NOT NULL,
ParentCategory int NULL,
CONSTRAINT PK_Category PRIMARY KEY CLUSTERED
(
Id ASC
)
)
GO
ALTER TABLE Category WITH CHECK ADD CONSTRAINT FK_Category_Category FOREIGN KEY(ParentCategory)
REFERENCES Category (Id)
GO
Insert into Category (CategoryName, ParentCategory)
Values
('Education',null),
('Entertainment',null),
('Employment',null),
('Books',2),
('Music',2),
('Movies',2),
('Non-fiction',4),
('Fiction',4),
('Mystery',4),
('Romance',4),
('Biography',4),
('Reference',4),
('Rock N Roll',5),
('Blues',5),
('Jazz',5),
('Progressive',5),
('Rap',5),
('Pop',5),
('Comedy',6),
('Action',6),
('Documentary',6),
('Science Fiction',6),
('Family',6),
('Drama',6),
('Job Search',3),
('Salary',3),
('Unemployment',3)
Expected Result
This is the result we are looking for:
Education
Employment
Job Search
Salary
Unemployment
Entertainment
Books
Biography
Fiction
Mystery
Non-fiction
Reference
Romance
Movies
Action
Comedy
Documentary
Drama
Family
Science Fiction
Music
Blues
Jazz
Pop
Progressive
Rap
Rock N Roll
CTE Recursive Query Performing Alpha Sorting of Categories
Here is the recursive CTE query that will create the output above.
With CTE_CategoryList( Id, CategoryName, ParentCategory, [Level], Sort )
as
(
select Id, CAST(Catego
ryName as varchar(500)) as Sort
fr
om Category
CategoryName, ParentCategory, 0 as Level,
where ParentCategory is null
UNION ALL
select c.Id, c.CategoryName, c.ParentCategory, p.Level +1,
CAST(p.Sort + c.CategoryName as
varchar(500)) as Sort
from Category c
INNER JOIN CTE_CategoryList p on p.Id = c.ParentCategory
)
select SPACE(Level *4) + CategoryName from CTE_CATEGORYLIST
order by Sort
As mentioned in the beginning of this article, the recursive CTE is made up of a WITH clause containing an anchor query and a recursive query joined by the UNION ALL clause as well as the outer query. Notice that the anchor query in our case is only pulling data where ParentCategory is null. This insures we get the top dogs in our category list. From there the recursive query iterates over the remaining items and sets up our levels and sorting correctly.
Since you cannot use the ORDER BY clause in a CTE, I am forcing the order of the result by creating a sort field which is simply grabbing the first 3-5 characters of the CategoryName field. My outer query is using the sort field to order the results. One thing to note if you are not familiar with CTEs, the outer query must follow directly after the CTE with nothing else in between. However, you can concatenate CTE queries by separating each with a comma if you need to chain several queries together. In addition, a recursive CTE may specify the max level of recursion by adding the clause OPTION (MAXRECURSION 100) to the end of the outer query.
HierarchyId data type
In SQL Server 2008, a new datatype, hierarchyid, was introduced to be used to represent hierarchical data. Hierarchyid can be used to create tables with a hierarchical structure or to reference the hierarchical structure of data in another location. The hierarchyid data type represents the position of a row within a hierarchical structure and can utilize depth-first and breadth-first indexes. It is extremely compact and given two hierarchyid values a and b, a < b means a comes before b in a depth-first traversal.
A column of type hierarchyid does not automatically represent a tree. It is up to the user to generate and assign hierarchyid values in a way that creates the desired relationship between rows. There is no guarantee that hierarchyid values are unique unless a unique key or constraint is used. Relationships are not enforced in the same manner as a foreign key relationship. A parent can be deleted leaving a child without a parent relationship. The user would need to enforce these types of relationships.
We will use the HierarchyId datatype for the primary key in our NewCatalog table and demonstrate some of the functions available for inserting and retrieving data. We no longer need the Id or ParentCategory columns but will leave them for the purposes of this demonstration.
There can only be one root in a table containing the hierarchyid, therefore, we will create a separate row to represent the root of our new tree and our previous 3 roots, i.e., Entertainment, Education and Employment will be the descendants of this root.
Scripts to generate sample data
CREATE TABLE NewCategory(
CategoryNode hierarchyid,
Id int NULL,
CategoryName nvarchar(50) NOT NULL,
ParentCategory int NULL,
CONSTRAINT PK_NewCategory PRIMARY KEY CLUSTERED(CategoryNode)
)
GO
insert into NewCategory (CategoryNode, CategoryName)
values (HierarchyId::GetRoot(), 'Root')
This will insert a row into the table with a CategoryNode of ‘/’ and CategoryName ‘Root’.
DECLARE @rootId as Hierarchyid
SET @rootId = (select CategoryNode from NewCategory where CategoryName = 'Root')
insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)
values
(@rootId.GetDescendant(null,null),'Education', 1, NULL)
This will insert a row into the table with a CategoryNode of ‘/1/’ and CategoryName ‘Education’.
DECLARE @rootId as Hierarchyid
DECLARE @educationId as HierarchyId
SET @rootId = (select CategoryNode from NewCategory where CategoryName = 'Root')
SET @educationId = (select CategoryNode from NewCategory where CategoryName = 'Education')
insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)
values
(@rootId.GetDescendant(@educationId,null),'Entertainment', 1, NULL)
As you can see, this is not the way you would perform these inserts in a real world situation. Instead, you would write a stored procedure that performs the lookup and does the insert for you using the hierarchyid functions.
CREATE PROCEDURE insertNewCategory
(@CategoryName varchar(50),
@ParentName varchar(50),
@LegacyId int,
@LegacyParentId int
)
AS
BEGIN
DECLARE @parentId HIERARCHYID
DECLARE @lastChild HIERARCHYID
IF @ParentName IS NULL OR @CategoryName IS NULL
BEGIN
Print 'ParentName and CategoryName are required!'
RETURN
END
---- Get the hierarchyid of the Parent Category
SELECT @parentId = CategoryNode
FROM NewCategory
WHERE CategoryName = @ParentName
---- Get the MAX child node of this parent
SELECT @lastChild = MAX(CategoryNode)
FROM NewCategory
WHERE CategoryNode.GetAncestor(1) = @parentId
---- Insert this item as the next item in the Parent Category
insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)
values
(@parentId.GetDescendant(@lastChild,null),@CategoryName, @LegacyId, @LegacyParentId)
END
And now, your inserts are performed by calling the stored procedure
exec insertNewCategory 'Employment', 'Root', 2, NULL
exec insertNewCategory 'Books', 'Entertainment', 4, 2
exec insertNewCategory 'Music','Entertainment', 5, 2
exec insertNewCategory 'Movies','Entertainment', 6, 2
exec insertNewCategory 'Non-fiction','Books', 7, 4
exec insertNewCategory 'Fiction','Books', 8, 4
exec insertNewCategory 'Mystery','Books', 9, 4
exec insertNewCategory 'Romance','Books', 10, 4
exec insertNewCategory 'Biography','Books', 11, 4
exec insertNewCategory 'Reference','Books', 12, 4
exec insertNewCategory 'Rock N Roll','Music', 13, 5
exec insertNewCategory 'Blues','Music', 14, 5
exec insertNewCategory 'Jazz','Music', 15, 5
exec insertNewCategory 'Progressive','Music', 16, 5
exec insertNewCategory 'Rap','Music', 17, 5
exec insertNewCategory 'Pop','Music', 18, 5
exec insertNewCategory 'Comedy','Movies', 19, 6
exec insertNewCategory 'Action','Movies', 20, 6
exec insertNewCategory 'Documentary','Movies', 21, 6
exec insertNewCategory 'Science Fiction','Movies', 22, 6
exec insertNewCategory 'Family','Movies', 23, 6
exec insertNewCategory 'Drama','Movies', 24, 6
exec insertNewCategory 'Job Search','Employment', 25, 3
exec insertNewCategory 'Salary','Employment', 26, 3
exec insertNewCategory 'Unemployment','Employment', 27, 3
Let’s have a look at our data now …
select CategoryNode.ToString() as CategoryNodeString, * from NewCategory
where CategoryName != 'Root'
Notice that calling CategoryNode.ToString() converted our hierarchyid into a string representation. You can also use the Parse method, i.e, HIERARCHYID.Parse(‘/1/’) to convert from a string to a hierarchyid.
HierarchyId Query to Retrieve Results Based on HierarchyId Sorting
To retrieve the category data in a tree structure similar to our CTE example, we can utilize the following query:
select SPACE(CategoryNode.GetLevel()*4) + CategoryName
fromwhere Cate
goryName != 'Root'
NewCategory
order by CategoryNode
Since we are ordering this data by the CategoryNode value, it’s being returned in the order it was entered into the table. To return it in alphabetical order, we can write a recursive routine to query the data.
CTE Recursive Query and HierarchyId to Retrieve Results Based on Alpha Sorting of Categories
With CTE_CategoryList( CategoryName, CategoryNode, [Level], Sort )
Or you can insert the values in alphabetical order by modifying the stored procedure that performs the inserts
as
(
select CategoryName, CategoryNode, 0 as Level,
CAST(CategoryName as varchar(100)) as Sort
from NewCategory
where CategoryNode.GetAncestor(1).ToString() = '/'
UNION ALL
select c.CategoryName, c.CategoryNode, p.Level +1,
CAST(p.Sort + c.CategoryName as
varchar(100)) as Sort
from NewCategory c
INNER JOIN CTE_CategoryList p on
c.CategoryNode.GetAncestor(1) = p.CategoryNode
)
select SPACE(Level *4) + CategoryName from CTE_CATEGORYLIST
order by Sort
ALTER PROCEDURE insertNewCategory
And let’s see how the data looks after using this new stored procedure.
(@CategoryName varchar(50),
@ParentName varchar(50),
@LegacyId int,
@LegacyParentId int
)
AS
BEGIN
DECLARE @parentId HIERARCHYID
DECLARE @insertAfterChild HIERARCHYID
DECLARE @insertBeforeChild HIERARCHYID
IF @ParentName IS NULL OR @CategoryName IS NULL
BEGIN
Print 'ParentName and CategoryName are required!'
RETURN
END
---- Get the hierarchyid of the Parent Category
SELECT @parentId = CategoryNode
FROM NewCategory
WHERE CategoryName = @ParentName
---- Get the child node of this parent
---- to insert after
SELECT Top 1 @insertAfterChild = CategoryNode
FROM NewCategory
WHERE CategoryNode.GetAncestor(1) = @parentId
and CategoryName < @CategoryName
Order by CategoryName desc
---- see if there is one to insert before
SELECT Top 1 @insertBeforeChild =CategoryNode
FROM NewCategory
WHERE CategoryNode.GetAncestor(1) = @parentId
and CategoryName >= @CategoryName
Order by CategoryName asc
---- Insert this item as the next item in the Parent Category
insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)
values
(@parentId.GetDescendant(@insertAfterChild,@insertBeforeChild),@CategoryName, @LegacyId, @LegacyParentId)
END
And let’s see how the data looks after using this new stored procedure.
Notice the use of negative and decimal values in our CategoryNodeString in order to accommodate inserts between two existing values and still maintain the hierarchy.
Now, if we execute the same query as above, we will get the results back in the order they were stored in the tree which is based on the alphanumeric value of the category rather than the random order they were inserted. Since the hierarchyid, CategoryNode, is our primary key and therefore stored using a clustered index, retrieving data sorted on the CategoryNode will be extremely efficient.