Skip Ribbon Commands
Skip to main content

Working with Hierarchical Data in SQL Server

Date: 4/22/2011 | Contributor: Mike Curtis | Keywords: Development; SQL; Stored Procedures; Data; Hierarchies

This article will show you two options for storing and generating hierarchical data in SQL server.  There are many business cases where you have a parent-child relationship between data and you need to store and retrieve it to depict a hierarchy.   In the past, your options would be to write a stored procedure that called itself recursively or handling the recursion in your code.  Here are a couple of options where SQL Server is handling some of the work for you.

Common Table Expression (CTE)

Common Table Expression was introduced in SQL Server 2005.  A CTE is a temporary result set and can be used in place of temporary or derived tables and also to perform recursive queries.    

A CTE has two parts

  • AWITH clause containing a select statement that generates a valid table
  • An outer SELECT statement that references the table expression

A recursive CTE expands the definition of the table expression and contains two parts:

  • An anchor query which is the source of the recursion along with a UNION ALL statement and a second query which recurses across the anchor query
  • An outer query which references the routine and specifies the number of recursion levels

Scripts to Generate Sample Data

 For the purposes of this demonstration, let’s create a table and some data that will be used by our queries.

CREATE TABLE Category(
Id int IDENTITY(1,1) NOT NULL,
CategoryName nvarchar(50) NOT NULL,
ParentCategory int NULL,
CONSTRAINT PK_Category PRIMARY KEY CLUSTERED
(
Id ASC
)
)
GO
ALTER TABLE Category WITH CHECK ADD CONSTRAINT FK_Category_Category FOREIGN KEY(ParentCategory)
REFERENCES Category (Id)
GO
Insert into Category (CategoryName, ParentCategory)
Values
('Education',null),
('Entertainment',null),
('Employment',null),
('Books',2),
('Music',2),
('Movies',2),
('Non-fiction',4),
('Fiction',4),
('Mystery',4),
('Romance',4),
('Biography',4),
('Reference',4),
('Rock N Roll',5),
('Blues',5),
('Jazz',5),
('Progressive',5),
('Rap',5),
('Pop',5),
('Comedy',6),
('Action',6),
('Documentary',6),
('Science Fiction',6),
('Family',6),
('Drama',6),
('Job Search',3),
('Salary',3),
('Unemployment',3)

 

Expected Result

This is the result we are looking for:

Education

Employment

    Job Search

    Salary

    Unemployment

Entertainment

    Books

        Biography

        Fiction

        Mystery

        Non-fiction

        Reference

        Romance

    Movies

        Action

        Comedy

        Documentary

        Drama

        Family

        Science Fiction

    Music

        Blues

        Jazz

        Pop

        Progressive

        Rap

        Rock N Roll

 

CTE Recursive Query Performing Alpha Sorting of Categories 

Here is the recursive CTE query that will create the output above.

With CTE_CategoryList( Id, CategoryName, ParentCategory, [Level], Sort )

as

(

select Id,

CAST(Catego
ryName as varchar(500)) as Sort
fr
om Category
CategoryName, ParentCategory, 0 as Level,

 

 

 

 

 

where ParentCategory is null

UNION ALL

select c.Id, c.CategoryName, c.ParentCategory, p.Level +1,
CAST(p.Sort + c.CategoryName as
varchar(500)) as Sort
from Category c
INNER JOIN CTE_CategoryList p on p.Id = c.ParentCategory
)
select SPACE(Level *4) + CategoryName from CTE_CATEGORYLIST
order by Sort
 
As mentioned in the beginning of this article, the recursive CTE is made up of a WITH clause containing an anchor query and a recursive query joined by the UNION ALL clause as well as the outer query.  Notice that the anchor query in our case is only pulling data where ParentCategory is null.  This insures we get the top dogs in our category list.  From there the recursive query iterates over the remaining items and sets up our levels and sorting correctly. 
 
Since you cannot use the ORDER BY clause in a CTE, I am forcing the order of the result by creating a sort field which is simply grabbing the first 3-5 characters of the CategoryName field.    My outer query is using the sort field to order the results.    One thing to note if you are not familiar with CTEs, the outer query must follow directly after the CTE with nothing else in between.  However, you can concatenate CTE queries by separating each with a comma if you need to chain several queries together.    In addition, a recursive CTE may specify the max level of recursion by adding the clause OPTION (MAXRECURSION 100) to the end of the outer query.
 

HierarchyId data type

In SQL Server 2008, a new datatype, hierarchyid, was introduced to be used to represent hierarchical data.    Hierarchyid can be used to create tables with a hierarchical structure or to reference the hierarchical structure of data in another location.    The hierarchyid data type represents the position of a row within a hierarchical structure and can utilize depth-first and breadth-first indexes.    It is extremely compact and given two hierarchyid values a and b, a < b means a comes before b in a depth-first traversal.

 
A column of type hierarchyid does not automatically represent a tree.  It is up to the user to generate and assign hierarchyid values in a way that creates the desired relationship between rows.  There is no guarantee that hierarchyid values are unique unless a unique key or constraint is used.  Relationships are not enforced in the same manner as a foreign key relationship.  A parent can be deleted leaving a child without a parent relationship.  The user would need to enforce these types of relationships.
 
We will use the HierarchyId datatype for the primary key in our NewCatalog table and demonstrate some of the functions available for inserting and retrieving data.  We no longer need the Id or ParentCategory columns but will leave them for the purposes of this demonstration.
 
There can only be one root in a table containing the hierarchyid, therefore, we will create a separate row to represent the root of our new tree and our previous 3 roots, i.e., Entertainment, Education and Employment will be the descendants of this root.
 

Scripts to generate sample data

 
CREATE TABLE NewCategory(
CategoryNode hierarchyid,
Id int NULL,
CategoryName nvarchar(50) NOT NULL,
ParentCategory int NULL,
CONSTRAINT PK_NewCategory PRIMARY KEY CLUSTERED(CategoryNode)
)
GO
insert into NewCategory (CategoryNode, CategoryName)
values (HierarchyId::GetRoot(), 'Root')
This will insert a row into the table with a CategoryNode of ‘/’ and CategoryName ‘Root’.
DECLARE @rootId as Hierarchyid
SET @rootId = (select CategoryNode from NewCategory where CategoryName = 'Root')
insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)
values
(@rootId.GetDescendant(null,null),'Education', 1, NULL)
This will insert a row into the table with a CategoryNode of ‘/1/’ and CategoryName ‘Education’.
DECLARE @rootId as Hierarchyid
DECLARE @educationId as HierarchyId
SET @rootId = (select CategoryNode from NewCategory where CategoryName = 'Root')
SET @educationId = (select CategoryNode from NewCategory where CategoryName = 'Education')
insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)
values
(@rootId.GetDescendant(@educationId,null),'Entertainment', 1, NULL)
As you can see, this is not the way you would perform these inserts in a real world situation. Instead, you would write a stored procedure that performs the lookup and does the insert for you using the hierarchyid functions.
CREATE PROCEDURE insertNewCategory
(@CategoryName varchar(50),
@ParentName varchar(50),
@LegacyId int,
@LegacyParentId int
)
AS
BEGIN
DECLARE @parentId HIERARCHYID
DECLARE @lastChild HIERARCHYID
IF @ParentName IS NULL OR @CategoryName IS NULL
BEGIN
Print 'ParentName and CategoryName are required!'
RETURN
END
---- Get the hierarchyid of the Parent Category
SELECT @parentId = CategoryNode
FROM NewCategory
WHERE CategoryName = @ParentName
---- Get the MAX child node of this parent
SELECT @lastChild = MAX(CategoryNode)
FROM NewCategory
WHERE CategoryNode.GetAncestor(1) = @parentId
---- Insert this item as the next item in the Parent Category
insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)
values
(@parentId.GetDescendant(@lastChild,null),@CategoryName, @LegacyId, @LegacyParentId)
END
And now, your inserts are performed by calling the stored procedure
exec insertNewCategory 'Employment', 'Root', 2, NULL
exec insertNewCategory 'Books', 'Entertainment', 4, 2
exec insertNewCategory 'Music','Entertainment', 5, 2
exec insertNewCategory 'Movies','Entertainment', 6, 2
exec insertNewCategory 'Non-fiction','Books', 7, 4
exec insertNewCategory 'Fiction','Books', 8, 4
exec insertNewCategory 'Mystery','Books', 9, 4
exec insertNewCategory 'Romance','Books', 10, 4
exec insertNewCategory 'Biography','Books', 11, 4
exec insertNewCategory 'Reference','Books', 12, 4
exec insertNewCategory 'Rock N Roll','Music', 13, 5
exec insertNewCategory 'Blues','Music', 14, 5
exec insertNewCategory 'Jazz','Music', 15, 5
exec insertNewCategory 'Progressive','Music', 16, 5
exec insertNewCategory 'Rap','Music', 17, 5
exec insertNewCategory 'Pop','Music', 18, 5
exec insertNewCategory 'Comedy','Movies', 19, 6
exec insertNewCategory 'Action','Movies', 20, 6
exec insertNewCategory 'Documentary','Movies', 21, 6
exec insertNewCategory 'Science Fiction','Movies', 22, 6
exec insertNewCategory 'Family','Movies', 23, 6
exec insertNewCategory 'Drama','Movies', 24, 6
exec insertNewCategory 'Job Search','Employment', 25, 3
exec insertNewCategory 'Salary','Employment', 26, 3
exec insertNewCategory 'Unemployment','Employment', 27, 3
Let’s have a look at our data now …
select CategoryNode.ToString() as CategoryNodeString, * from NewCategory
where CategoryName != 'Root'
Notice that calling CategoryNode.ToString() converted our hierarchyid into a string representation.  You can also use the Parse method, i.e, HIERARCHYID.Parse(‘/1/’) to convert from a string to a hierarchyid.
 

 

HierarchyId Query to Retrieve Results Based on HierarchyId Sorting

To retrieve the category data in a tree structure similar to our CTE example, we can utilize the following query:

select SPACE(CategoryNode.GetLevel()*4) + CategoryName

from

where Cate
goryName != 'Root'
NewCategory

 

 

 

 

 

order by CategoryNode

Since we are ordering this data by the CategoryNode value, it’s being returned in the order it was entered into the table.     To return it in alphabetical order, we can write a recursive routine to query the data.  

CTE Recursive Query and HierarchyId to Retrieve Results Based on Alpha Sorting of Categories

 With CTE_CategoryList( CategoryName, CategoryNode, [Level], Sort )

Or you can insert the values in alphabetical order by modifying the stored procedure that performs the inserts

as

(

select CategoryName, CategoryNode, 0 as Level,

CAST(CategoryName as varchar(100)) as Sort

from NewCategory

where CategoryNode.GetAncestor(1).ToString() = '/'

UNION ALL

select c.CategoryName, c.CategoryNode, p.Level +1,

CAST(p.Sort + c.CategoryName as

varchar(100)) as Sort

from NewCategory c

INNER JOIN CTE_CategoryList p on

c.CategoryNode.GetAncestor(1) = p.CategoryNode

)

select SPACE(Level *4) + CategoryName from CTE_CATEGORYLIST

order by Sort

ALTER PROCEDURE insertNewCategory

And let’s see how the data looks after using this new stored procedure.

(@CategoryName varchar(50),

@ParentName varchar(50),

@LegacyId int,

@LegacyParentId int

)

AS

BEGIN

DECLARE @parentId HIERARCHYID

DECLARE @insertAfterChild HIERARCHYID

DECLARE @insertBeforeChild HIERARCHYID

IF @ParentName IS NULL OR @CategoryName IS NULL

BEGIN

Print 'ParentName and CategoryName are required!'

RETURN

END

---- Get the hierarchyid of the Parent Category

SELECT @parentId = CategoryNode

FROM NewCategory

WHERE CategoryName = @ParentName

---- Get the child node of this parent

---- to insert after

SELECT Top 1 @insertAfterChild = CategoryNode

FROM NewCategory

WHERE CategoryNode.GetAncestor(1) = @parentId

and CategoryName < @CategoryName

Order by CategoryName desc

---- see if there is one to insert before

SELECT Top 1 @insertBeforeChild =CategoryNode

FROM NewCategory

WHERE CategoryNode.GetAncestor(1) = @parentId

and CategoryName >= @CategoryName

Order by CategoryName asc

---- Insert this item as the next item in the Parent Category

insert into NewCategory (CategoryNode, CategoryName, Id, ParentCategory)

values

(@parentId.GetDescendant(@insertAfterChild,@insertBeforeChild),@CategoryName, @LegacyId, @LegacyParentId)

END

And let’s see how the data looks after using this new stored procedure.

Notice the use of negative and decimal values in our CategoryNodeString in order to accommodate inserts between two existing values and still maintain the hierarchy.

Now, if we execute the same query as above, we will get the results back in the order they were stored in the tree which is based on the alphanumeric value of the category rather than the random order they were inserted. Since the hierarchyid, CategoryNode, is our primary key and therefore stored using a clustered index, retrieving data sorted on the CategoryNode will be extremely efficient.


About the authors: