T-SQL GRAPH Table Tips : Demystifying SHORTEST_PATH function

Introduction

This article is the last one in the series of blogs that I’ve written on GRAPH tables in SQLServer.
The previous two articles were published in Technet Wiki and can be checked from the below link.

Introduction to Graph tables
https://social.technet.microsoft.com/wiki/contents/articles/37993.t-sql-graph-based-tables-in-sql-2017.aspx

Graph table enhancements : Edge constraints
https://social.technet.microsoft.com/wiki/contents/articles/53162.graph-table-enhancements-edge-constraints-system-utility-functions.aspx

In this article, we shall see the latest enhancements that have been introduced in GRAPH tables in SQL 2019 version, i.e SHORTEST_PATH function.

SHORTEST_PATH allows us to traverse the GRAPH table using a given search condition and return the matching nodes based on it. As such this function is always used in conjunction with the MATCH function as an option inside. The search can be done repeatedly in a recursive manner until last level is reached or a predefined number of iterations is done as specified by an argument.

The traversed result shall include multiple nodes within the graph. Hence most often it would require the use of an aggregate function like STRING_AGG to present the result as a single column inside our SELECT query.

Illustration

For illustrating the functionality, lets consider the below sample data

-- Create a graph demo database
IF NOT EXISTS (SELECT * FROM sys.databases WHERE NAME = 'graphdemo')
	CREATE DATABASE GraphDemo;
GO

USE GraphDemo;
GO

-- Create NODE tables
CREATE TABLE Person (
  ID INTEGER PRIMARY KEY,
  name VARCHAR(100)
) AS NODE;

CREATE TABLE Restaurant (
  ID INTEGER NOT NULL,
  name VARCHAR(100),
  city VARCHAR(100)
) AS NODE;

CREATE TABLE City (
  ID INTEGER PRIMARY KEY,
  name VARCHAR(100),
  stateName VARCHAR(100)
) AS NODE;

-- Create EDGE tables. 
CREATE TABLE likes (rating INTEGER) AS EDGE;
CREATE TABLE friendOf AS EDGE;
CREATE TABLE livesIn AS EDGE;
CREATE TABLE locatedIn AS EDGE;
CREATE TABLE siblingOf AS EDGE;

-- Insert data into node tables. Inserting into a node table is same as inserting into a regular table
INSERT INTO Person (Id, name)
	VALUES (1, 'John')
		 , (2, 'Mary')
		 , (3, 'Alice')
		 , (4, 'Jacob')
		 , (5, 'Julie');

INSERT INTO Person (Id, name)
	VALUES (6,'Patrick'),
	(7,'Janet'),
	(8,'Divya'),
	(9,'Marc'),
	(10,'Peter')

INSERT INTO Person (Id, name)
VALUES (11,'Henry')

INSERT INTO Person (Id, name)
VALUES (12,'Micheal'),
(13,'Leone')

INSERT INTO Restaurant (Id, name, city)
	VALUES (1, 'Taco Dell','Bellevue')
		 , (2, 'Ginger and Spice','Seattle')
		 , (3, 'Noodle Land', 'Redmond');

INSERT INTO City (Id, name, stateName)
	VALUES (1,'Bellevue','wa')
		 , (2,'Seattle','wa')
		 , (3,'Redmond','wa');

-- Insert into edge table. While inserting into an edge table,
-- you need to provide the $node_id from $from_id and $to_id columns.
/* Insert which restaurants each person likes */
INSERT INTO likes 
	VALUES ((SELECT $node_id FROM Person WHERE ID = 1), (SELECT $node_id FROM Restaurant WHERE ID = 1), 9)
		 , ((SELECT $node_id FROM Person WHERE ID = 2), (SELECT $node_id FROM Restaurant WHERE ID = 2), 9)
		 , ((SELECT $node_id FROM Person WHERE ID = 3), (SELECT $node_id FROM Restaurant WHERE ID = 3), 9)
		 , ((SELECT $node_id FROM Person WHERE ID = 4), (SELECT $node_id FROM Restaurant WHERE ID = 3), 9)
		 , ((SELECT $node_id FROM Person WHERE ID = 5), (SELECT $node_id FROM Restaurant WHERE ID = 3), 9);

/* Associate in which city live each person*/
INSERT INTO livesIn 
	VALUES ((SELECT $node_id FROM Person WHERE ID = 1), (SELECT $node_id FROM City WHERE ID = 1))
		 , ((SELECT $node_id FROM Person WHERE ID = 2), (SELECT $node_id FROM City WHERE ID = 2))
		 , ((SELECT $node_id FROM Person WHERE ID = 3), (SELECT $node_id FROM City WHERE ID = 3))
		 , ((SELECT $node_id FROM Person WHERE ID = 4), (SELECT $node_id FROM City WHERE ID = 3))
		 , ((SELECT $node_id FROM Person WHERE ID = 5), (SELECT $node_id FROM City WHERE ID = 1));

/* Insert data where the restaurants are located */
INSERT INTO locatedIn 
	VALUES ((SELECT $node_id FROM Restaurant WHERE ID = 1), (SELECT $node_id FROM City WHERE ID =1))
		 , ((SELECT $node_id FROM Restaurant WHERE ID = 2), (SELECT $node_id FROM City WHERE ID =2))
		 , ((SELECT $node_id FROM Restaurant WHERE ID = 3), (SELECT $node_id FROM City WHERE ID =3));

/* Insert data into the friendOf edge */
INSERT INTO friendOf 
	VALUES ((SELECT $NODE_ID FROM Person WHERE ID = 1), (SELECT $NODE_ID FROM Person WHERE ID = 2))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 2), (SELECT $NODE_ID FROM Person WHERE ID = 3))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 3), (SELECT $NODE_ID FROM Person WHERE ID = 1))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 4), (SELECT $NODE_ID FROM Person WHERE ID = 2))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 5), (SELECT $NODE_ID FROM Person WHERE ID = 4));
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 6), (SELECT $NODE_ID FROM Person WHERE ID = 3))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 7), (SELECT $NODE_ID FROM Person WHERE ID = 6))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 9), (SELECT $NODE_ID FROM Person WHERE ID = 7))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 10), (SELECT $NODE_ID FROM Person WHERE ID = 9))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 8), (SELECT $NODE_ID FROM Person WHERE ID = 5))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 11), (SELECT $NODE_ID FROM Person WHERE ID = 1))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 10), (SELECT $NODE_ID FROM Person WHERE ID = 11))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 7), (SELECT $NODE_ID FROM Person WHERE ID = 11))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 3), (SELECT $NODE_ID FROM Person WHERE ID = 13))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 11), (SELECT $NODE_ID FROM Person WHERE ID = 12))
		 , ((SELECT $NODE_ID FROM Person WHERE ID = 12), (SELECT $NODE_ID FROM Person WHERE ID = 13))

INSERT INTO siblingOf
	VALUES
	((SELECT $NODE_ID FROM Person WHERE ID = 11), (SELECT $NODE_ID FROM Person WHERE ID = 12)),
	((SELECT $NODE_ID FROM Person WHERE ID = 12), (SELECT $NODE_ID FROM Person WHERE ID = 13))

The above data can be visualized into a graph like below

SHORTEST PATH Scenario

For simplicity only the major nodes inside the graph are included. We shall see how SHORTEST_PATH function can be applied in the above case for traversing through the friends as well as sibling nodes from a given node or by means of a search condition.

Simple Scenario – MATCH function

First lets try a simple scenario of getting every direct friend of John. Looking at the graph, we can see that for this requirement we need to traverse just a single level of node using the relationship friendof. Since this doesnt require any repeated or recursive logic, this can be directly achieved using a MATCH function as shown below

SELECT p1.name AS Person,p2.name AS friend
FROM Person AS p1, friendOf ,Person AS p2
WHERE MATCH(p1-(friendOf)->p2)
AND p1.name = 'John'

The result of the code will include the direct friends of John which is Mary as per the graph

Repetitive Traversal – SHORTEST_PATH function

Now lets go one step further and gets next level friends (friends of friends) for John. In this case we need to repeat the graph traversal to one more level.

This is made possible by the SHORTEST_PATH function which accepts an arbitrary length pattern as parameter and traverses the path based on it. Since we need only up to second level we can set to repeat the pattern up to 2 times. Also since the path includes multiple nodes we would require using string concatenation function like STRING_AGG to return the nodes as a delimited list. Also the edge table and node table which has to be traversed repeatedly should be included with FOR PATH clause to enable the recursive processing.

The modified query will look like below

SELECT p1.name AS Person,
STRING_AGG(p2.name, '->') WITHIN GROUP (GRAPH PATH) AS FriendPath,
LAST_VALUE(p2.name)  WITHIN GROUP (GRAPH PATH) AS Friend
FROM Person AS p1, friendOf FOR PATH,Person FOR PATH AS p2
WHERE MATCH(SHORTEST_PATH(p1(-(friendOf)->p2){1,2}))
AND p1.name = 'John'

The result of the above query will be as below

Person	FriendPath	Friend
--------------------------
John	Mary	Mary
John	Mary->Alice	Alice

The results shows two paths one direct friend of John i.e Mary and next level will have Mary’s friends which is Alice. The FriendsPath gives entire path of nodes traversed and LAST_VALUE function is used to return the last friend node at the current traversed level.

Extending this to one more level, the query looks like this. An additional column based on COUNT function is also included to count the level being traversed each time

SELECT p1.name AS Person,
STRING_AGG(p2.name, '->') WITHIN GROUP (GRAPH PATH) AS FriendPath,
LAST_VALUE(p2.name)  WITHIN GROUP (GRAPH PATH) AS Friend,
COUNT(p2.Name) WITHIN GROUP(GRAPH PATH) AS TraversedLevel
FROM Person AS p1, friendOf FOR PATH,Person FOR PATH AS p2
WHERE MATCH(SHORTEST_PATH(p1(-(friendOf)->p2){1,3}))
AND p1.name = 'John'

And the result looks like the below

Person	FriendPath	Friend	TraversedLevel
-------------------------------------------
John	Mary	Mary	1
John	Mary->Alice	Alice	2
John	Mary->Alice->John	John	3
John	Mary->Alice->Leone	Leone	3

Till now we were considering scenarios involving a single relationship. Now lets consider a case where two relationships are involved.

The problem statement is as follows
Get the details of all people along with restaurants they liked and location of the restaurant, for those who are friends of the people who liked a restaurant located in Seattle.
For the above problem statement we would require three sets of relationships based on FriendOf, liked and LocatedIn
Since this doesnt require any recursive or repetitive logic, we dont need to use SHORTEST_PATH function in this case

The corresponding query can be given as below

SELECT p1.name AS LikedPerson,r2.name AS LikedRestaurant,c2.name as LocatedIn,p2.name as FriendOf,r1.name AS LikedRestaurant_Seattle,c1.name AS LocatedIn_Seattle
FROM Person AS p1, likes as l1, Restaurant AS r1,locatedIn AS ln1,City AS c1,friendOf,Person AS p2,Restaurant AS r2,likes as l2,City AS c2,locatedIn AS ln2
WHERE MATCH (c2<-(ln2)-r2<-(l2)-p1-(friendOf)->p2-(l1)->r1-(ln1)->c1)
AND c1.name = 'Seattle'

The result looks as below

LikedPerson	LikedRestaurant	LocatedIn	FriendOf	LikedRestaurant_Seattle	LocatedIn_Seattle
--------------------------------------------------------------------------------------------------
John	Taco Dell	Bellevue	Mary	Ginger and Spice	Seattle
Jacob	Noodle Land	Redmond	Mary	Ginger and Spice	Seattle

The results shows that there are two people John and Jacob who are friends of Mary who liked the restaurant Ginger and Spice which is located in Seattle. The results also returns the restaurants liked by John and Jacob and gives their location as well.

Multiple Repetitive Traversal – SHORTEST_PATH with LAST_NODE

Now lets try a recursive (repeating) relationships example using SHORTEST_PATH function

Consider the scenario where we need to get details of all persons who are friends with person who is a sibling of Leone

The persons can be at any level in the friends network with the person who is in the sibling network of Leone. As such, this would require usage of SHORTEST_PATH function on both the relationships for multi level traversal.

Also for the above scenario we would require first traversing friends of person till the last level and then associate the last level node to other nodes using sibling relationship until we reach Leone. So its like one relationship recursively traversed till it reaches the end level and from the last node, start the traversal of the second relationship until we reach the intended node Leone. This is made possible by the use of LAST_NODE within the SHORTEST_PATH function call based on sibling relationship.
When SHORTEST_PATH sees the LAST_NODE, it takes the latest node of the last relationship traversed and starts from there by applying the current specified relationship (siblingOf in this case)

Based on the above approach, the query can be written as per below

  SELECT *
	FROM
	(
	SELECT
		Person1.name AS PersonName, 
		STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends,
		STRING_AGG(Person3.name, '->') WITHIN GROUP (GRAPH PATH) AS Siblings,
		LAST_VALUE(Person2.name) WITHIN GROUP (GRAPH PATH) AS Sibling,
		LAST_VALUE(Person3.name) WITHIN GROUP (GRAPH PATH) AS FinalPerson
	FROM
		Person AS Person1,
		friendOf FOR PATH AS fo,
		Person FOR PATH  AS Person2,
		Person FOR PATH  AS Person3,
		SiblingOf FOR PATH AS so
	WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2)+) AND SHORTEST_PATH(LAST_NODE(Person2)(-(so)->Person3)+))
	)t
	WHERE FinalPerson = 'Leone'

The query will give you a result as below

PersonName	Friends	Siblings	Sibling	FinalPerson
---------------------------------------------------------
Janet	Henry	Micheal->Leone	Henry	Leone
Peter	Henry	Micheal->Leone	Henry	Leone
Marc	Janet->Henry	Micheal->Leone	Henry	Leone

So there are three persons returned in the resultset. Janet who is a friend of Henry will be returned as Henry is a sibling to Leone through relationship traversed through Micheal. Similarly Peter will also be returned who has a friendOf relationship with Henry. Another addition to the result would be Marc who has friendOf relationship with Henry through Janet.
Thus Janet, Peter and Marc are the persons who are friends with a person (Henry) who is a sibling of Leone.

Mixing Repetitive and Non-repetitive Relationships

Before concluding our findings, lets see one more example where we shall traverse the graph in opposite direction and use a non-repetitive relationship in addition to the repetitive ones

The scenario requires finding all siblings who has in their friends network a friend who has liked a restaurant which is situated at Bellevue

The above case involves traversing two repetitive relationships, sibilingOf and friendOf and also determining the liked restaurant based on the simple likes relationship from the LAST_NODE of the friends collection. Hence this shall be regarded as an example for hybrid traversal which includes recursive and simple paths interelated

Accordingly, the query can be written as below

SELECT *
	FROM
	(
	SELECT
		Person1.name AS PersonName, 
		STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Siblings,
		STRING_AGG(Person3.name, '->') WITHIN GROUP (GRAPH PATH) AS FriendsOfSiblings,
		LAST_VALUE(Person2.name) WITHIN GROUP (GRAPH PATH) AS Sibling,
		LAST_VALUE(Person3.name) WITHIN GROUP (GRAPH PATH) AS FinalFriend,
		r.Name as LikedRestaurant,
		r.city as RestaurantCity
	FROM
		Person AS Person1,
		friendOf FOR PATH AS fo,
		Person FOR PATH  AS Person2,
		Person FOR PATH  AS Person3,
		SiblingOf FOR PATH AS so,
		likes AS l,
		Restaurant AS r
	WHERE MATCH(SHORTEST_PATH(Person1(-(so)->Person2)+) AND SHORTEST_PATH(LAST_NODE(Person2)(<-(fo)-Person3)+) AND LAST_NODE(Person3)-(l)->r)
	) AS r
	WHERE RestaurantCity = 'Bellevue'

If you check the above query, it includes two recursive traversals of the graph using siblingOf and friendOf relationships and includes a simple traversal at the end based on likes relationship.

The result would be obtained as per the below

PersonName	Siblings	FriendsOfSiblings	Sibling	FinalFriend	LikedRestaurant	RestaurantCity
--------------------------------------------------------------------------------------------------
Micheal	Leone	Alice->Mary->John	Leone	John	Taco Dell	Bellevue
Henry	Micheal->Leone	Alice->Mary->John	Leone	John	Taco Dell	Bellevue

On analyzing the result we can see that there are two persons Micheal and Henry who has friends in the network of their siblings who has liked the restaurant in Bellevue city. In both of the cases, their sibling Leone has friend John who has liked Taco Dell restaurant in Bellevue which is what you will getting in your result.

Conclusion

As seen from the above illustrations, SHORTEST_PATH is a very useful addition to SQL Server GRAPH table functions which helps in repetitive or recursive traversal of the graph based on a defined relationship.