Parent Child Relationships - Recursive Hierarchy (In Development)
This document explains how to use recursive joins in a database to model hierarchical data, specifically focusing on user hierarchies and chart aggregation in Panintelligence.
The traditional method of unpacking hierarchical data using multiple self-joins is limited by the need to guess maximum depth and struggles with aggregating values up the hierarchy.
A new method using drill-to-chart functionality in Panintelligence allows traversal of any depth hierarchy and exposes details at the lowest levels.
This new method requires three charts/tables: a top-level chart, a recursive drill chart, and a detail table. Chart filters and recursive drill steps are used to implement the hierarchy.
For aggregating data up the hierarchy (e.g., summing chart counts for a user and their descendants), a recursive CTE (Common Table Expression) is used in the data connection, overcoming limitations of the traditional method.
What is a recursive join?
Self join, Stubby join, pigs ear….. This concept can be known by many names. It is however a very common practise in a database to allow you to model a hierarchy. It is so useful it exists on four tables in the panintelligence repository database.
In panintelligence we have multiple places where you can have a hierarchy.
Users (MIS_USERS)
Categories (MIS_CATEGORIES)
Organisations (MIS_ORGANISATIONS)
Data Connections - Objects (Folders) (MIS_COLUMNS)
If you have a tree structure in your application, you probably have structure in your database that is almost identical (Unless it’s held as unstructured JSON - but that’s a different story).
The benefits of this RECURSIVE approach are, that it creates a dynamic structure where you do not need to specify the depth or even the meaning of the hierarchy.
For the rest of this document we are going to focus on the table MIS_USERS in the repository.
If we look at the table structure of MIS_USERS in the database - we will see that it contains (alongside all the other columns) a SERIAL - This is the unique identifier of a user (The Primary Key) and PARENT_ID, the is the recursive identifier which joins one user to their parent (a foreign key). THESE SHOULD BE INDEXED IN YOUR DB. IF NOT PUT AN INDEX ON THEM PLEASE AND THANK YOU.
Now the thing we need to understand is that a table can be joined to itself. What we really mean is that a record in a table can be joined to another record in that table.
An example - The old way
This may still be appropriate in some use cases.
So Let’s consider the Hierarchy we have in our dashboard;
Their is a root node called Welcome Administrator (there will always be some kind of root node. in the panintelligence system we can tell it’s a root node because it’s PARENT_ID = 0, and there are never any other users with an ID of 0. It is also common in other systems for the PARENT_ID or equivalent to be NULL.
Greta and Neil are direct descendants (children) of the Welcome Administrator user. George and Ronan are the direct descendants (children) of Greta, and descendants of Welcome Administrator.
NOTE: There can be (not in panintelligence) but in your own structures a very dangerous data quality issue. If the system does not validate correctly and would ever allow Greta to also become a child of Ronan, then an infinite loop is created! - This can be very bad. You will see later a max depth (eject from loop clause to save the day) - Actually panintelligence would still exit after a timeout or max rows, but we do have to be careful. If you find such a data quality feel free to tut very loudly at you system architect.
Everything so far has sounded wonderful. But now we hit the problem. Business Intelligence tools - like panintelligence really really struggle with this concept.
Traditionally - they force you to unpack this relationship. This means that you end up creating a structure which looks like the following;
Where the top table alias (mis_users_1) has a WHERE clause to limit it to the root node.
And each join - has the recursive PARENT_ID = SERIAL
Notice how the join is specified as a LEFT outer join as we do not know how deep the hierarchy will be.
This structure allows us to now show the users at the different depths.
Or even drill through them;
But it has these issues;
I have to build the depth (the number of table aliases) based on guessing what the max depth could be.
I can only ever then use it if the charts always start drilling from the top node.
I will normally end up unpacking (flattening) this structure in my Data Warehouse.
** At high volume this may still be appropriate at high volume and deeply nested hierarchies for perfromance.
And probably most significantly;
It can’t sum (AVG/MIN/MAX - any aggregation) a value up the hierarchy tree.
Let’s look at point 4 in more detail.
So in the repository there is a table called MIS_DEFINED_CHARTS, this holds the definition of all the charts created and crucially there is a column named USER_ID, this links the chart to the user who created it.
Now I want to show the users in the hierarchy and be able to aggregate up the list to show how many charts where built not only by Greta, but also all her descendants - which in this case is George and Ronan.
So if we do a simple fetch of the data.
Greta created 1 chart, George created 1 chart and Ronan created 1 chart (I’ve kept the numbers very simple).
When we look at Greta then we want a chart count of 3, compared to 1 when we drill down to, or view George or Ronan.
(This is a silly example but imagine Greta is the Sales team lead, when we view her numbers we want to see the combined sales of the team, and when we look at George or Ronan, just their individual sales).
We can accomplish this with the structure above, but only if we know the depth + that every branch of the tree has an identical depth.
So this will only partly work for our specific example;
And only if the items are always only linked to the users at level 3. We fail to count the chart that Greta created. Her number shows as 2, not 3.
If this was reporting sales - and she was on commission we now have one very angry Greta!, and a very disappointing sales value.
The new method
This involves 2 stages - and depending on your use case may only require the first.
If you require values to aggregate up the hierarchy as in point 4 above then you will also need to consider the second stage.
Stage 1 - Drill to chart
In this stage we use drill to chart to automatically traverse the hierarchy. To Do this you will need 3 Charts / Tables.
I this example we are going to have a bar chart that starts at the top level. Then another bar chart that is used recursively. Then finaly a table that show the details of the charts at the lowest level. The data connection to support this is as follows.
In the structure above we add MIS_USERS twice.
The first alias we add our objects including the serial and the Parent ID
The Second we have a count of user (SERIALs) and join it from serial = parent_id this is then a count of the direct descendants (Children) that each user has.
So now if I test them in the table I get the following data.
The child count is important as it will tell us when to exit the recursive drill (i.e. When there are no more children.
Now for the 2 charts an a table.
Top Level Chart
I Add the user name, and any Measure - in this case Child Count
This shows all users - so I add a filter to identify the root item (user), in this case Parent_ID = 0
I save this Chart.
Now I am going to create the table that I will finally drill to.
I am going to add some additional columns to the users table for the details of the user.
The table does not require any filters etc.
Now we will build the recursive drill portion;
It starts out just like the top chart
But in the filters - we apply chart filters manually, and set Serial = {{Parent ID}}
At this moment it will return no data as it has no value to drill from.
Save it and re-edit it.
Now we will put in the recursive drill.
Now we create the following drill step.
Where Child Count < 1 (i.e. no more children) then show the lowest level.
OR drill back to self.
Finally We need to go back to edit the top level chart.
We give it a drill to chart node.
This has a default action of drilling to the recursive step.
The final thing we need to do is add the serial to be passed as a filter. We add this to the additional data.
Note: in the example - I have used Child Count as the measure in the chart itself. If my chart contained a different measure then I would have included it also in the additional data tab.
If you wanted to start the drill at another level, say from Greta, then you can just do away with the top level chart and drill from any other item passing the users you are interested in.
Stage 2 - Aggregating data
Stage 1 gets us the recursive chart so we can now drill through any hierachy to any depth and expose the detail at those bottom nodes. But what if we want to see an accumulation of a value.
Let’s go back and add defined charts and chart count to our structure.
At this point if I drill to Greta, I’d get only 1 as the chart count but I want to count the charts from her descendants.
In This case I will now use a recursive CTE - See below for DB Support
A CTE (Common Table Expression) is a temporary, named result set in SQL that is defined within a query and can be referenced later in that query. It improves query readability, simplifies complex queries, and allows for recursive queries in cases such as hierarchical data processing.
The code I require for my Recursive CTE is
WITH RECURSIVE Hierarchy AS (
SELECT
SERIAL AS ID,
SERIAL AS RootID,
0 AS Depth
FROM
MIS_USERS
UNION ALL
SELECT
child.SERIAL AS ID,
parent.RootID AS RootID,
parent.Depth + 1 AS Depth
FROM
MIS_USERS child
INNER JOIN
Hierarchy parent ON child.PARENT_ID = parent.ID
WHERE
parent.Depth < 10 -- Required break point in case of infinite loop, where descendants become parents, data quality issue
)
SELECT
RootID,
ID,
Depth
FROM
Hierarchy
WHERE
RootID = 4 -- The entry point (Greta = 4) , this does not need to start at the top node
ORDER BY
RootID, ID
You can re-write this for your own scenario.
In the data connection we are going to add this Table With the CTE .
We can add an object to count the children of all records (Users)
Then we use this as a join between MIS_USERS and the table we want to aggregate over MIS_DEFINED_CHARTS
Explanation:
The recursive CTE unpacks data, so If I run the SQL for user serial 4 (Greta)
The query runs 3 rows. One for Greta, and a row for each descendant. Ronan + George.
If I add more nested users under Greta.
The return now becomes
As it includes all descendants no matter their depth (there is an exit @10 to protect an infinite loop in case of poor data quality - that is optional / up to you)
We are using this as a link table in our structure as it means that as Greta - I get all items (charts) linked to all my descendants.
It also allows us to count the number of ROOT_ID - 1, as if this > 0 Then there are descendants, so we can use this to exit our recursive drill path.
Limitatations;
Not All databases support CTES' and recurusive CTE’s - though this is a common pattern that continues to be adopted by more and more technologies. As of Jan 2025 this is the current state of play;
Databases that support CTE’s - and Recursive CTE
Databases / Extent of CTE support
CUSTOMER NEWS - Our November 24 Release Is Now Available. Check Out Our Webinar Series - Up Next - Organisations - 13th Feb 25... Register HERE