Monthly Archive February 2019

ByJames Pretorius

Dynamically Flatten a Parent-Child Hierarchy using Power Query M

Introduction

For those of you familiar with recursive common table expressions in SQL, iterating through a parent-child hierarchy of data is a fairly straight forward process. There are several examples available which demonstrate how one could approach this problem in Power Query M. In this post, we’ll use recursion and dynamic field retrieval to loop through and dynamically flatten a parent-child hierarchy using Power Query M.

Show and Tell

Before we begin, let’s take a quick look at an example of the function being invoked in Power BI. This should hopefully give you some idea of where we’re heading with this.

Dynamically Flatten a Parent-Child Hierarchy using Power Query M

Dynamically Flatten a Parent-Child Hierarchy using Power Query M

Sample Data

Let’s look at some sample data to get a feel for the type of data the function requires as input and the resultant dataset the function will output.

Input

ParentNodeIDParentNodeNameChildNodeIDChildNodeName
100Stringer2Shamrock
200Avon201Levy
200Avon202Brianna
200Avon203Wee-Bey
2Shamrock3Slim Charles
3Slim Charles51Bodie
3Slim Charles52Poot
3Slim Charles53Bernard
51Bodie61Sterling
51Bodie62Pudding
52Poot61Sterling
52Poot62Pudding

Output

ParentNodeIDChildNodeID1ChildNodeID2ChildNodeID3ChildNodeID4ParentNodeNameChildNodeName1ChildNodeName2ChildNodeName3ChildNodeName4HierarchyLevelHierarchyPathIsLeafLevelHierarchyNodeID
100Stringer1100false100
1002StringerShamrock2100|2false2
10023StringerShamrockSlim Charles3100|2|3false3
1002351StringerShamrockSlim CharlesBodie4100|2|3|51false51
100235161StringerShamrockSlim CharlesBodieSterling5100|2|3|51|61true61
100235162StringerShamrockSlim CharlesBodiePudding5100|2|3|51|62true62
1002352StringerShamrockSlim CharlesPoot4100|2|3|52false52
100235261StringerShamrockSlim CharlesPootSterling5100|2|3|52|61true61
100235262StringerShamrockSlim CharlesPootPudding5100|2|3|52|62true62
1002353StringerShamrockSlim CharlesBernard4100|2|3|53true53
200Avon1200false200
200202AvonBrianna2200|202true202
200201AvonLevy2200|201true201
200203AvonWee-Bey2200|203true203

Power Query M Function

The fFlattenHiearchy function consists of an outer function and a recursive inner function.

The outer function will:

  1. Identify the top level parents
  2. Trigger the flattening
  3. Bolt on a few metadata columns

The inner function will:

  1. Self join onto the input hierarchy table to identify the next level of children within the hierarchy
  2. Filter out rows where the parent = child, to avoid infinite recursion

Flatten the hierarchy

The below code is used to build the basic hierarchy structure:

(
  hierarchyTable as table
  ,parentKeyColumnIdentifier as text
  ,parentNameColumnIdentifier as text
  ,childKeyColumnIdentifier as text
  ,childNameColumnIdentifier as text
) as table =>
  let
    #"Get Root Parents" = Table.Distinct(
      Table.SelectColumns(
        Table.NestedJoin(hierarchyTable
          ,parentKeyColumnIdentifier
          ,hierarchyTable
          ,childKeyColumnIdentifier
          ,"ROOT.PARENTS"
          ,JoinKind.LeftAnti
        )
        ,{
          parentKeyColumnIdentifier
          ,parentNameColumnIdentifier
        }
      )
    ),
    #"Generate Hierarchy" = fGetNextHierarchyLevel(
      #"Get Root Parents"
      ,parentKeyColumnIdentifier
      ,1
    ),
    fGetNextHierarchyLevel = (
      parentsTable as table
      ,nextParentKeyColumnIdentifier as text
      ,hierarchyLevel as number
    ) =>
      let 
        vNextParentKey = childKeyColumnIdentifier & Text.From(hierarchyLevel),
        vNextParentName = childNameColumnIdentifier & Text.From(hierarchyLevel),
        #"Left Join - hierarchyTable (Get Children)" = Table.NestedJoin(parentsTable
          ,nextParentKeyColumnIdentifier
          ,hierarchyTable
          ,parentKeyColumnIdentifier
          ,"NODE.CHILDREN"
          ,JoinKind.LeftOuter
        ),
        #"Expand Column - NODE.CHILDREN" = Table.ExpandTableColumn(#"Left Join - hierarchyTable (Get Children)"
          ,"NODE.CHILDREN"
          ,{
              childKeyColumnIdentifier
              ,childNameColumnIdentifier
          },{
              vNextParentKey
              ,vNextParentName
          }
        ),
        #"Filter Rows - Parents with Children" = Table.SelectRows(#"Expand Column - NODE.CHILDREN"
          ,each Record.Field(_,vNextParentKey) <> null 
              and Record.Field(_,vNextParentKey) <> Record.Field(_,nextParentKeyColumnIdentifier)
        ),
        #"Generate Next Hierarchy Level" = if Table.IsEmpty(#"Filter Rows - Parents with Children")
          then parentsTable 
          else Table.Combine(
            {
              parentsTable
              ,@fGetNextHierarchyLevel(
                #"Filter Rows - Parents with Children"
                ,vNextParentKey
                ,hierarchyLevel + 1
              )
            }
        )
      in 
        #"Generate Next Hierarchy Level"
  in
     #"Generate Hierarchy"

Add additional metadata

Additional metadata columns can be added to the hierarchy by using the code below. My original approach was to update these columns in each iteration of the hierarchy and although the code was slightly more digestible in comparison to the below, I did find it came at the expense of performance.

#"Add Column - HierarchyPath" = Table.AddColumn(#"Generate Hierarchy",
      "HierarchyPath"
      ,each Text.Combine(
        List.Transform(
          Record.FieldValues(
            Record.SelectFields(_,
              List.Select(Table.ColumnNames(#"Generate Hierarchy")
                ,each Text.StartsWith(_,childKeyColumnIdentifier) 
                  or Text.StartsWith(_,parentKeyColumnIdentifier)
              )
            )
          )
          ,each Text.From(_)
        )
        ,"|"
      )
      ,type text
    ),
    #"Add Column - HierarchyNodeID" = Table.AddColumn(#"Add Column - HierarchyPath",
      "HierarchyNodeID"
      ,each List.Last(Text.Split([HierarchyPath],"|"))
      ,type text
    ),
    #"Add Column - HierarchyLevel" = Table.AddColumn(#"Add Column - HierarchyNodeID",
      "HierarchyLevel"
      ,each List.Count(Text.Split([HierarchyPath],"|"))
      ,Int64.Type
    ),
    #"Add Column - IsLeafLevel" = Table.AddColumn(#"Add Column - HierarchyLevel",
      "IsLeafLevel"
      ,each List.Contains(
          List.Transform(
            Table.Column(
              Table.NestedJoin(hierarchyTable
              ,childKeyColumnIdentifier
              ,hierarchyTable
              ,parentKeyColumnIdentifier
              ,"LEAFLEVEL.CHILDREN"
              ,JoinKind.LeftAnti
              )
              ,childKeyColumnIdentifier
            )
            ,each Text.From(_)
          )
        ,List.Last(Text.Split([HierarchyPath],"|"))
      )
      ,type logical
    )

The whole caboodle

Below you will find the full code for the function along with some documentation towards the end. You can plug the below code straight into a blank query in Power BI and reference it from your hierarchy query to flatten it.

let 
  fFlattenHierarchy = (
    hierarchyTable as table
    ,parentKeyColumnIdentifier as text
    ,parentNameColumnIdentifier as text
    ,childKeyColumnIdentifier as text
    ,childNameColumnIdentifier as text
  ) as table =>
  let
    #"Get Root Parents" = Table.Distinct(
      Table.SelectColumns(
        Table.NestedJoin(hierarchyTable
          ,parentKeyColumnIdentifier
          ,hierarchyTable
          ,childKeyColumnIdentifier
          ,"ROOT.PARENTS"
          ,JoinKind.LeftAnti
        )
        ,{
          parentKeyColumnIdentifier
          ,parentNameColumnIdentifier
        }
      )
    ),
    #"Generate Hierarchy" = fGetNextHierarchyLevel(
      #"Get Root Parents"
      ,parentKeyColumnIdentifier
      ,1
    ),
    fGetNextHierarchyLevel = (
      parentsTable as table
      ,nextParentKeyColumnIdentifier as text
      ,hierarchyLevel as number
    ) =>
      let 
        vNextParentKey = childKeyColumnIdentifier & Text.From(hierarchyLevel),
        vNextParentName = childNameColumnIdentifier & Text.From(hierarchyLevel),
        #"Left Join - hierarchyTable (Get Children)" = Table.NestedJoin(parentsTable
          ,nextParentKeyColumnIdentifier
          ,hierarchyTable
          ,parentKeyColumnIdentifier
          ,"NODE.CHILDREN"
          ,JoinKind.LeftOuter
        ),
        #"Expand Column - NODE.CHILDREN" = Table.ExpandTableColumn(#"Left Join - hierarchyTable (Get Children)"
          ,"NODE.CHILDREN"
          ,{
              childKeyColumnIdentifier
              ,childNameColumnIdentifier
          },{
              vNextParentKey
              ,vNextParentName
          }
        ),
        #"Filter Rows - Parents with Children" = Table.SelectRows(#"Expand Column - NODE.CHILDREN"
          ,each Record.Field(_,vNextParentKey) <> null 
              and Record.Field(_,vNextParentKey) <> Record.Field(_,nextParentKeyColumnIdentifier)
        ),
        #"Generate Next Hierarchy Level" = if Table.IsEmpty(#"Filter Rows - Parents with Children")
          then parentsTable 
          else Table.Combine(
            {
              parentsTable
              ,@fGetNextHierarchyLevel(
                #"Filter Rows - Parents with Children"
                ,vNextParentKey
                ,hierarchyLevel + 1
              )
            }
        )
      in 
        #"Generate Next Hierarchy Level",
    #"Add Column - HierarchyPath" = Table.AddColumn(#"Generate Hierarchy",
      "HierarchyPath"
      ,each Text.Combine(
        List.Transform(
          Record.FieldValues(
            Record.SelectFields(_,
              List.Select(Table.ColumnNames(#"Generate Hierarchy")
                ,each Text.StartsWith(_,childKeyColumnIdentifier) 
                  or Text.StartsWith(_,parentKeyColumnIdentifier)
              )
            )
          )
          ,each Text.From(_)
        )
        ,"|"
      )
      ,type text
    ),
    #"Add Column - HierarchyNodeID" = Table.AddColumn(#"Add Column - HierarchyPath",
      "HierarchyNodeID"
      ,each List.Last(Text.Split([HierarchyPath],"|"))
      ,type text
    ),
    #"Add Column - HierarchyLevel" = Table.AddColumn(#"Add Column - HierarchyNodeID",
      "HierarchyLevel"
      ,each List.Count(Text.Split([HierarchyPath],"|"))
      ,Int64.Type
    ),
    #"Add Column - IsLeafLevel" = Table.AddColumn(#"Add Column - HierarchyLevel",
      "IsLeafLevel"
      ,each List.Contains(
          List.Transform(
            Table.Column(
              Table.NestedJoin(hierarchyTable
              ,childKeyColumnIdentifier
              ,hierarchyTable
              ,parentKeyColumnIdentifier
              ,"LEAFLEVEL.CHILDREN"
              ,JoinKind.LeftAnti
              )
              ,childKeyColumnIdentifier
            )
            ,each Text.From(_)
          )
        ,List.Last(Text.Split([HierarchyPath],"|"))
      )
      ,type logical
    )
  in
    #"Add Column - IsLeafLevel",
  //Documentation
  fFlattenHierarchyType = type function (
    hierarchyTable as (type table meta [
      Documentation.FieldCaption = "Hierarchy"
      ,Documentation.LongDescription = "A table containing a parent-child hierarchy"
      ]
    )
    ,parentKeyColumnIdentifier as (type text meta [
      Documentation.FieldCaption = "Parent Key Column Identifier"
      ,Documentation.LongDescription = "The name of the column used to identify the key of the parent node in the hierarchy"
      ,Documentation.SampleValues = { "ParentID" }
      ]
    )
    ,parentNameColumnIdentifier as (type text meta [
      Documentation.FieldCaption = "Parent Name Column Identifier"
      ,Documentation.LongDescription = "The name of the column used to identify the name of the parent node in the hierarchy"
      ,Documentation.SampleValues = { "ParentName" }
      ]
    )
    ,childKeyColumnIdentifier as (type text meta [
      Documentation.FieldCaption = "Child Key Column Identifier"
      ,Documentation.LongDescription = "The name of the column used to identify the key of the child node in the hierarchy"
      ,Documentation.SampleValues = { "ChildID" }
      ]
    )
    ,childNameColumnIdentifier as (type text meta [
      Documentation.FieldCaption = "Child Name Column Identifier"
      ,Documentation.LongDescription = "The name of the column used to identify the name of the child node in the hierarchy"
      ,Documentation.SampleValues = { "ChildName" }
      ]
    )
  ) as list meta [
    Documentation.Name = "fFlattenHierarchy"
    ,Documentation.LongDescription = "Returns a flattened hierarchy table from a parent-child hierarchy table input."
      & "The number of columns returned is based on the depth of the hierarchy. Each child node will be prefixed"
      & "with the value specified for the childNameColumnIdentifier parameter"
    ,Documentation.Examples = {
      [
      Description = "Returns a flattened hierarchy table from a parent-child hierarchy table"
      ,Code = "fFlattenHierarchy(barksdaleOrganisation, ""ParentNodeID"", ""ParentNodeName"", ""ChildNodeID"", ""ChildNodeName"")"
      ,Result = "{100,2,3,51,62,""Stringer"",""Shamrock"",""Slim Charles"",""Bodie"",""Pudding"",5,""100|2|3|51|62"",TRUE,62}"
        & ",{100,2,3,51,""Stringer"",""Shamrock"",""Slim Charles"",""Bodie"",4,""100|2|3|51"",FALSE,51}"
      ]
    }
  ]
in
  Value.ReplaceType(fFlattenHierarchy, fFlattenHierarchyType)

Final Thoughts

It looks fairly verbose but in reality, it’s just 10 steps to flatten a hierarchy and bolt on the metadata columns, you could probably blame my code formatting for thinking otherwise. Anyone who is used to seeing common table expressions in SQL will hopefully find the logic defined in the function familiar.

In terms of performance, I’ve not pitted this against other methods nor done any performance testing but I’ve executed it against a ragged hierarchy with 11 levels spanning several thousand rows and it spat out results fairly instantaneously.