Hierarchy Scores

This module contains a Python implementation of the Krackhardt hierarchy score.

Introduction

Biological signaling can be modeled as a directed network, where nodes represent genes/proteins and edges represent signaling interactions.

The hierarchy of such a network can be quantified using various metrics, including the Krackhardt hierarchy score. This score measures the degree to which the network exhibits a perfect hierarchy, with higher scores indicating greater hierarchy.

In R the sna package presents methods to compute graph hierarchy including Krackhardt’s score (defined as \(A/MaxV\) where \(A\) is the number of unordered pairs of points in Dr that are asymmetrically linked), and there are other hierarchy scores implemented in Python such as Flow Hierarchy Score Luo and Magee 2011. However, despite its utility, there is currently no native implementation of the Krackhardt hierarchy score in Python.

Krackhardt hierarchy score

As defined in Krackhardt, David. (1994). Graph Theoretical Dimensions of Informal Organization. Computational Organization Theory. 89.

The graph hierarchy condition states that in a digraph D, for each pair of points where one (Pi) can reach another (Pj), the second (Pj) can't reach the first (Pi). 
For example, in a formal organization chart a high lvl employee can reach through the chain of command her subordinate's subordinate. If the formal organization is working "properly", this lower lvl employee can't simultaneously reach the high lvl employee.
To measure the degree of hierarchy of digraph D, a new digraph Dr must be created. Dr is defined as the reachability digraph of D. Each point in D exists in Dr; moreover, the line (Pi,Pj) exists in Dr if and only if Pi can reach Pj in D. If D is graph hierarchic, then Dr will have no symmetric lines in it (i.e. if the line (Pi,Pj) exists in Dr then the line (Pj,Pi) does not).

The degree of hierarchy then is defined as:

\[ Graph Hierarchy = 1 - [V/MaxV] \] Where \(V\) is the number of unordered pairs of points in Dr that are symmetrically linked , and \(MaxV\) the number of unordered pairs of points in Dr where Pi is linked to Pj or viceversa.

for i in rpy2.situation.iter_info(): # Print Rpy2 info
    print(i)

pandas2ri.activate()

rpy2 version:
3.5.11
Python version:
3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:39:40) [Clang 15.0.7 ]
Looking for R's HOME:
    Environment variable R_HOME: /Users/ferran/miniconda3/envs/pykrack/lib/R
    Calling `R RHOME`: /Users/ferran/miniconda3/envs/pykrack/lib/R
    Environment variable R_LIBS_USER: None
R's additions to LD_LIBRARY_PATH:
/usr/local/lib/R/library/stats/libs/:/usr/local/lib/R/library/stats/libs/
R version:
    In the PATH: R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
    Loading R library from rpy2: OK
Additional directories to load R packages from:
None
C extension compilation:
  include:
  ['/Users/ferran/miniconda3/envs/pykrack/lib/R/include']
  libraries:
  ['R', 'pcre2-8', 'lzma', 'bz2', 'z', 'dl', 'm', 'iconv', 'icuuc', 'icui18n']
  library_dirs:
  ['/Users/ferran/miniconda3/envs/pykrack/lib', '/Users/ferran/miniconda3/envs/pykrack/lib/R/lib', '/Users/ferran/miniconda3/envs/pykrack/lib']
  extra_compile_args:
  ['-std=c99']
  extra_link_args:
  ['-fopenmp', '-Wl,-dead_strip_dylibs', '-Wl,-pie', '-Wl,-headerpad_max_install_names', '-Wl,-dead_strip_dylibs', '-Wl,-rpath,/Users/ferran/miniconda3/envs/pykrack/lib']
Directory for the R shared library:
lib
CFFI extension type
  Environment variable: RPY2_CFFI_MODE
  Value: CFFI_MODE.ANY
  ABI: PRESENT
  API: PRESENT

source

compute_hierarchy

 compute_hierarchy (G, metric='pykrack')

Compute one of the possible hierarchy scores

	Type	Default	Details
G			Directed NetworkX graph
metric	str	pykrack	Type of hierarchy metric to compute. Accepted types are: ‘pykrack’ for this module’s implementation of the Krackhardt score. ‘rsnakrack’ for the sna implementation in R. ‘hierarchy_flow’ for the Luo and Magee 2011 as implemented in the NetworkX package.
Returns	float		One of the possible hierarchy scores

G1 = nx.DiGraph([(1, 2), (2, 3), (1, 5), (2, 4), (4, 6), (5, 6), (3, 1)])

G2 = nx.DiGraph([(1, 2), (2, 3), (1, 5), (2, 4), (4, 6), (5, 6)])

G3 = nx.DiGraph([(1, 2), (4, 6)])

#Graph from OmniPath
#using the following link: https://omnipathdb.org/interactions?genesymbols=yes&fields=type&license=academic&directed=0
#Gets us all interactions, no matter type nor undirected/vs directed
omnidata = pd.read_csv("https://omnipathdb.org/interactions?genesymbols=yes&fields=type&license=academic&directed=1", sep="\t")

G4 = nx.from_pandas_edgelist(omnidata, source="source_genesymbol", target="target_genesymbol",edge_attr="type", create_using=nx.DiGraph)

G = G1

print(graph_properties(G))

Edges:7, Nodes:6, Avg Degree:2.3333333333333335, Density:0.23333333333333334

Using product from itertools is around 4 times faster when iterating through the nodes of the Omnipath graph.

# %%time
# countlist = []
# for i in G.nodes():
#     for j in G.nodes():
#         pair = [i,j]
#         countlist.append(pair)

# print(len(countlist))

# %%time
# countlist = []
# for pair in product(G.nodes(), G.nodes()):
#     countlist.append(pair)

# print(len(countlist))

compute_hierarchy(G, metric="pykrack")

CPU times: user 232 µs, sys: 22 µs, total: 254 µs
Wall time: 257 µs

0.7857142857142857

compute_hierarchy(G, metric="rsnakrack")

CPU times: user 972 ms, sys: 103 ms, total: 1.08 s
Wall time: 1.16 s

0.7857142857142857

compute_hierarchy(G, metric="hierarchy_flow")

CPU times: user 323 µs, sys: 19 µs, total: 342 µs
Wall time: 349 µs

0.5714285714285714