For those of you familiar with recursive common table expressions in SQL, iterating through a parent-child hierarchy of data is a fairly straight forward process. There are several examples available which demonstrate how one could approach this problem in Power Query M. In this post, we’ll use recursion and dynamic field retrieval to loop through and dynamically flatten a parent-child hierarchy using Power Query M.
Before we begin, let’s take a quick look at an example of the function being invoked in Power BI. This should hopefully give you some idea of where we’re heading with this.
Dynamically Flatten a Parent-Child Hierarchy using Power Query M
Let’s look at some sample data to get a feel for the type of data the function requires as input and the resultant dataset the function will output.
ParentNodeID | ParentNodeName | ChildNodeID | ChildNodeName |
---|---|---|---|
100 | Stringer | 2 | Shamrock |
200 | Avon | 201 | Levy |
200 | Avon | 202 | Brianna |
200 | Avon | 203 | Wee-Bey |
2 | Shamrock | 3 | Slim Charles |
3 | Slim Charles | 51 | Bodie |
3 | Slim Charles | 52 | Poot |
3 | Slim Charles | 53 | Bernard |
51 | Bodie | 61 | Sterling |
51 | Bodie | 62 | Pudding |
52 | Poot | 61 | Sterling |
52 | Poot | 62 | Pudding |
ParentNodeID | ChildNodeID1 | ChildNodeID2 | ChildNodeID3 | ChildNodeID4 | ParentNodeName | ChildNodeName1 | ChildNodeName2 | ChildNodeName3 | ChildNodeName4 | HierarchyLevel | HierarchyPath | IsLeafLevel | HierarchyNodeID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100 | Stringer | 1 | 100 | false | 100 | ||||||||
100 | 2 | Stringer | Shamrock | 2 | 100|2 | false | 2 | ||||||
100 | 2 | 3 | Stringer | Shamrock | Slim Charles | 3 | 100|2|3 | false | 3 | ||||
100 | 2 | 3 | 51 | Stringer | Shamrock | Slim Charles | Bodie | 4 | 100|2|3|51 | false | 51 | ||
100 | 2 | 3 | 51 | 61 | Stringer | Shamrock | Slim Charles | Bodie | Sterling | 5 | 100|2|3|51|61 | true | 61 |
100 | 2 | 3 | 51 | 62 | Stringer | Shamrock | Slim Charles | Bodie | Pudding | 5 | 100|2|3|51|62 | true | 62 |
100 | 2 | 3 | 52 | Stringer | Shamrock | Slim Charles | Poot | 4 | 100|2|3|52 | false | 52 | ||
100 | 2 | 3 | 52 | 61 | Stringer | Shamrock | Slim Charles | Poot | Sterling | 5 | 100|2|3|52|61 | true | 61 |
100 | 2 | 3 | 52 | 62 | Stringer | Shamrock | Slim Charles | Poot | Pudding | 5 | 100|2|3|52|62 | true | 62 |
100 | 2 | 3 | 53 | Stringer | Shamrock | Slim Charles | Bernard | 4 | 100|2|3|53 | true | 53 | ||
200 | Avon | 1 | 200 | false | 200 | ||||||||
200 | 202 | Avon | Brianna | 2 | 200|202 | true | 202 | ||||||
200 | 201 | Avon | Levy | 2 | 200|201 | true | 201 | ||||||
200 | 203 | Avon | Wee-Bey | 2 | 200|203 | true | 203 |
The fFlattenHiearchy function consists of an outer function and a recursive inner function.
The outer function will:
The inner function will:
The below code is used to build the basic hierarchy structure:
( hierarchyTable as table ,parentKeyColumnIdentifier as text ,parentNameColumnIdentifier as text ,childKeyColumnIdentifier as text ,childNameColumnIdentifier as text ) as table => let #"Get Root Parents" = Table.Distinct( Table.SelectColumns( Table.NestedJoin(hierarchyTable ,parentKeyColumnIdentifier ,hierarchyTable ,childKeyColumnIdentifier ,"ROOT.PARENTS" ,JoinKind.LeftAnti ) ,{ parentKeyColumnIdentifier ,parentNameColumnIdentifier } ) ), #"Generate Hierarchy" = fGetNextHierarchyLevel( #"Get Root Parents" ,parentKeyColumnIdentifier ,1 ), fGetNextHierarchyLevel = ( parentsTable as table ,nextParentKeyColumnIdentifier as text ,hierarchyLevel as number ) => let vNextParentKey = childKeyColumnIdentifier & Text.From(hierarchyLevel), vNextParentName = childNameColumnIdentifier & Text.From(hierarchyLevel), #"Left Join - hierarchyTable (Get Children)" = Table.NestedJoin(parentsTable ,nextParentKeyColumnIdentifier ,hierarchyTable ,parentKeyColumnIdentifier ,"NODE.CHILDREN" ,JoinKind.LeftOuter ), #"Expand Column - NODE.CHILDREN" = Table.ExpandTableColumn(#"Left Join - hierarchyTable (Get Children)" ,"NODE.CHILDREN" ,{ childKeyColumnIdentifier ,childNameColumnIdentifier },{ vNextParentKey ,vNextParentName } ), #"Filter Rows - Parents with Children" = Table.SelectRows(#"Expand Column - NODE.CHILDREN" ,each Record.Field(_,vNextParentKey) <> null and Record.Field(_,vNextParentKey) <> Record.Field(_,nextParentKeyColumnIdentifier) ), #"Generate Next Hierarchy Level" = if Table.IsEmpty(#"Filter Rows - Parents with Children") then parentsTable else Table.Combine( { parentsTable ,@fGetNextHierarchyLevel( #"Filter Rows - Parents with Children" ,vNextParentKey ,hierarchyLevel + 1 ) } ) in #"Generate Next Hierarchy Level" in #"Generate Hierarchy"
Additional metadata columns can be added to the hierarchy by using the code below. My original approach was to update these columns in each iteration of the hierarchy and although the code was slightly more digestible in comparison to the below, I did find it came at the expense of performance.
#"Add Column - HierarchyPath" = Table.AddColumn(#"Generate Hierarchy", "HierarchyPath" ,each Text.Combine( List.Transform( Record.FieldValues( Record.SelectFields(_, List.Select(Table.ColumnNames(#"Generate Hierarchy") ,each Text.StartsWith(_,childKeyColumnIdentifier) or Text.StartsWith(_,parentKeyColumnIdentifier) ) ) ) ,each Text.From(_) ) ,"|" ) ,type text ), #"Add Column - HierarchyNodeID" = Table.AddColumn(#"Add Column - HierarchyPath", "HierarchyNodeID" ,each List.Last(Text.Split([HierarchyPath],"|")) ,type text ), #"Add Column - HierarchyLevel" = Table.AddColumn(#"Add Column - HierarchyNodeID", "HierarchyLevel" ,each List.Count(Text.Split([HierarchyPath],"|")) ,Int64.Type ), #"Add Column - IsLeafLevel" = Table.AddColumn(#"Add Column - HierarchyLevel", "IsLeafLevel" ,each List.Contains( List.Transform( Table.Column( Table.NestedJoin(hierarchyTable ,childKeyColumnIdentifier ,hierarchyTable ,parentKeyColumnIdentifier ,"LEAFLEVEL.CHILDREN" ,JoinKind.LeftAnti ) ,childKeyColumnIdentifier ) ,each Text.From(_) ) ,List.Last(Text.Split([HierarchyPath],"|")) ) ,type logical )
Below you will find the full code for the function along with some documentation towards the end. You can plug the below code straight into a blank query in Power BI and reference it from your hierarchy query to flatten it.
let fFlattenHierarchy = ( hierarchyTable as table ,parentKeyColumnIdentifier as text ,parentNameColumnIdentifier as text ,childKeyColumnIdentifier as text ,childNameColumnIdentifier as text ) as table => let #"Get Root Parents" = Table.Distinct( Table.SelectColumns( Table.NestedJoin(hierarchyTable ,parentKeyColumnIdentifier ,hierarchyTable ,childKeyColumnIdentifier ,"ROOT.PARENTS" ,JoinKind.LeftAnti ) ,{ parentKeyColumnIdentifier ,parentNameColumnIdentifier } ) ), #"Generate Hierarchy" = fGetNextHierarchyLevel( #"Get Root Parents" ,parentKeyColumnIdentifier ,1 ), fGetNextHierarchyLevel = ( parentsTable as table ,nextParentKeyColumnIdentifier as text ,hierarchyLevel as number ) => let vNextParentKey = childKeyColumnIdentifier & Text.From(hierarchyLevel), vNextParentName = childNameColumnIdentifier & Text.From(hierarchyLevel), #"Left Join - hierarchyTable (Get Children)" = Table.NestedJoin(parentsTable ,nextParentKeyColumnIdentifier ,hierarchyTable ,parentKeyColumnIdentifier ,"NODE.CHILDREN" ,JoinKind.LeftOuter ), #"Expand Column - NODE.CHILDREN" = Table.ExpandTableColumn(#"Left Join - hierarchyTable (Get Children)" ,"NODE.CHILDREN" ,{ childKeyColumnIdentifier ,childNameColumnIdentifier },{ vNextParentKey ,vNextParentName } ), #"Filter Rows - Parents with Children" = Table.SelectRows(#"Expand Column - NODE.CHILDREN" ,each Record.Field(_,vNextParentKey) <> null and Record.Field(_,vNextParentKey) <> Record.Field(_,nextParentKeyColumnIdentifier) ), #"Generate Next Hierarchy Level" = if Table.IsEmpty(#"Filter Rows - Parents with Children") then parentsTable else Table.Combine( { parentsTable ,@fGetNextHierarchyLevel( #"Filter Rows - Parents with Children" ,vNextParentKey ,hierarchyLevel + 1 ) } ) in #"Generate Next Hierarchy Level", #"Add Column - HierarchyPath" = Table.AddColumn(#"Generate Hierarchy", "HierarchyPath" ,each Text.Combine( List.Transform( Record.FieldValues( Record.SelectFields(_, List.Select(Table.ColumnNames(#"Generate Hierarchy") ,each Text.StartsWith(_,childKeyColumnIdentifier) or Text.StartsWith(_,parentKeyColumnIdentifier) ) ) ) ,each Text.From(_) ) ,"|" ) ,type text ), #"Add Column - HierarchyNodeID" = Table.AddColumn(#"Add Column - HierarchyPath", "HierarchyNodeID" ,each List.Last(Text.Split([HierarchyPath],"|")) ,type text ), #"Add Column - HierarchyLevel" = Table.AddColumn(#"Add Column - HierarchyNodeID", "HierarchyLevel" ,each List.Count(Text.Split([HierarchyPath],"|")) ,Int64.Type ), #"Add Column - IsLeafLevel" = Table.AddColumn(#"Add Column - HierarchyLevel", "IsLeafLevel" ,each List.Contains( List.Transform( Table.Column( Table.NestedJoin(hierarchyTable ,childKeyColumnIdentifier ,hierarchyTable ,parentKeyColumnIdentifier ,"LEAFLEVEL.CHILDREN" ,JoinKind.LeftAnti ) ,childKeyColumnIdentifier ) ,each Text.From(_) ) ,List.Last(Text.Split([HierarchyPath],"|")) ) ,type logical ) in #"Add Column - IsLeafLevel", //Documentation fFlattenHierarchyType = type function ( hierarchyTable as (type table meta [ Documentation.FieldCaption = "Hierarchy" ,Documentation.LongDescription = "A table containing a parent-child hierarchy" ] ) ,parentKeyColumnIdentifier as (type text meta [ Documentation.FieldCaption = "Parent Key Column Identifier" ,Documentation.LongDescription = "The name of the column used to identify the key of the parent node in the hierarchy" ,Documentation.SampleValues = { "ParentID" } ] ) ,parentNameColumnIdentifier as (type text meta [ Documentation.FieldCaption = "Parent Name Column Identifier" ,Documentation.LongDescription = "The name of the column used to identify the name of the parent node in the hierarchy" ,Documentation.SampleValues = { "ParentName" } ] ) ,childKeyColumnIdentifier as (type text meta [ Documentation.FieldCaption = "Child Key Column Identifier" ,Documentation.LongDescription = "The name of the column used to identify the key of the child node in the hierarchy" ,Documentation.SampleValues = { "ChildID" } ] ) ,childNameColumnIdentifier as (type text meta [ Documentation.FieldCaption = "Child Name Column Identifier" ,Documentation.LongDescription = "The name of the column used to identify the name of the child node in the hierarchy" ,Documentation.SampleValues = { "ChildName" } ] ) ) as list meta [ Documentation.Name = "fFlattenHierarchy" ,Documentation.LongDescription = "Returns a flattened hierarchy table from a parent-child hierarchy table input." & "The number of columns returned is based on the depth of the hierarchy. Each child node will be prefixed" & "with the value specified for the childNameColumnIdentifier parameter" ,Documentation.Examples = { [ Description = "Returns a flattened hierarchy table from a parent-child hierarchy table" ,Code = "fFlattenHierarchy(barksdaleOrganisation, ""ParentNodeID"", ""ParentNodeName"", ""ChildNodeID"", ""ChildNodeName"")" ,Result = "{100,2,3,51,62,""Stringer"",""Shamrock"",""Slim Charles"",""Bodie"",""Pudding"",5,""100|2|3|51|62"",TRUE,62}" & ",{100,2,3,51,""Stringer"",""Shamrock"",""Slim Charles"",""Bodie"",4,""100|2|3|51"",FALSE,51}" ] } ] in Value.ReplaceType(fFlattenHierarchy, fFlattenHierarchyType)
It looks fairly verbose but in reality, it’s just 10 steps to flatten a hierarchy and bolt on the metadata columns, you could probably blame my code formatting for thinking otherwise. Anyone who is used to seeing common table expressions in SQL will hopefully find the logic defined in the function familiar.
In terms of performance, I’ve not pitted this against other methods nor done any performance testing but I’ve executed it against a ragged hierarchy with 11 levels spanning several thousand rows and it spat out results fairly instantaneously.