Graphs - Features

Importance-based features: Useful for predicting influential nodes in a graph

Example: predicting celebrity users in a social network
- Node degree
- Different node centrality measures
Structure-based features: Useful for predicting a particular role a node plays in a graph:

Example: Predicting protein functionality in a protein-protein interaction network.
- Node degree
- Clustering coefficient
- Graphlet count vector

Goal: Characterize the structure and position of a node in the network

Node Degrees#

Node degree counts the neighboring nodes without capturing their importance.

$k_i$ : the number of edges adjacent to node $i$

Undirected

\bar{k}=\frac{1}{N}\sum_{i=1}^{N}k_i=\frac{2E}{N}

Directed

\bar{k} = \frac{E}{N},\ \bar{k}^{in} = \bar{k}^{out}

Bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets

Authors-to-Papers (they authored)

Node centrality#

$c_v$ , takes the node importance in a graph into account.

Eigenvector centrality#

c_v = \frac{1}{\lambda}\sum_{u \in N(v)}c_u

$\lambda$ is normalization const. (largest eigenvalue of $\bf A$ ).

the above equation models centrality in a recursive manner.

Eigenvector and Eigenvalue
 $A\bf{u} = \lambda \bf{u}$ 
$A$ : matrix, $\bf{u}$ : Eigenvector, $\lambda$ : Eigenvalue

$|A - \lambda I| = 0$

The prefix eigen- is adopted from the German word eigen(cognate with the English word own) for ‘proper’, ‘characteristic’, ‘own’.

Eigenvectors and eigenvalues - 3Blue1Brown (YouTube)

Eigenvalue - mathsisfun.com

Rewrite the recursive equation in the matrix form.

\lambda \bf{c} = \bf {Ac}

$\bf A$ : Adjacency matrix

$\bf c$ : Centrality vector

The largest eigenvalue $\lambda_{max}$ is always positive and unique (by Perron-Frobenius Theorem).

The eigenvector $c_{max}$ corresponding to $\lambda_{max}$ is used for centrality.

Betweenness centrality#

A node is important if it lies on many shortest paths between other nodes.

c_v = \sum_{s\neq v\neq t} \frac{\text{\#(shortest paths between {\it s} and {\it t} that contain {\it v})}} {\text{\#(shortest paths between {\it s} and {\it t})}}

Closeness centrality#

A node is important if it has small shortest path lengths to all other nodes.

c_v = \frac{1}{\sum_{u\neq v} \text{shortest path length between {\it u} and {\it v}}}

Some more.. centrality

Clustering coefficient#

Measures how connected $v$ ‘s neighboring nodes.

해당 노드와 연결된 노드의 연결성이 어떤지.

$\text{frac} = \frac{\text{Numerator}}{\text{Denominator}}$

e_v = \frac{\text{\#(edges among neighboring nodes)}}{\binom{k_v}{2}} \small \in [0, 1]

Graphlet#

Observation: Clustering coefficient counts the #(triangles) in the ego-network

Induced subgraph is another graph, formed from a subset of vertices and all of the edges connecting the vertices in the subset.

Graph Isomorphism: Two graphs which contain the same number of nodes connected in the same way are said to be isomorphic.

We can generalize the above by counting #(pre-specified subgraphs, i.e., graphlets).

https://doi.org/10.1038/s41598-020-69795-1

Goal: Describe network structure around node $u$

Graphlets are small subgraphs that describe the structure of node $u$ ’s network neighborhood.

Graphlet Degree Vector (GDV): Graphlet-base features for nodes

if possible graphlets are a, b, c, d.
- to count the induced subgraph of each graphlet in a graph.
- result vector: [2, 1, 0, 2]

Analogy:

Degree counts #(edges) that a node touches
Clustering coefficient counts #(triangles) that a node touches.
GDV counts #(graphlets) that a node $t$

Graphlet degree vector provides a measure of a node’s local network topology:

a more detailed measure of local topological similarity than node degrees or clustering coefficient.

But! Graphlet is a expensive method.

Link-Level Features#

Two formulations of the link prediction task:

Links missing at random: Remove a random set of links and then aim to predict them
Links over time: Given $G[t_0, t_0']$ a graph defined by edges up to time $t_0'$ , output a ranked list $L$ of edges (not in $G[t_0, t_0']$ ) that are predicted to appear in time $G[t_1, t_1']$

Predict connectivity via Proximity (CS224W lecture slide)

Distance-based feature#

Local neighborhood overlap#

Common neighbors: $|N(v_1) \cap N(v_2)|$
Jaccard’s coefficient: $\frac{|N(v_1) \cap N(v_2)|}{|N(v_1) \cup N(v_2)|}$
Adamic-Adar index: $\sum_{u\in N(v_1)\cap N(v_2)}\frac{1}{\log{(k_u)}}$

Limitation: Metric is always zero if the two nodes do not have any neighbors in common.

Global neighborhood overlap#

Katz index#

count the number of walks of all lengths between a given pair of nodes.

$P_{uv}^{(K)}$ : # walks of length $K$ between $u$ and $v$

$P^{(K)} = A^k$ , Use adjacency matrix powers!

S_{v_1 v_2} = \sum_{l=1}^{\infty}\beta^l A^l_{v_1 v_2}

$\beta \in (0, 1)$ : discount factor

$\sum_{i=0}^{\infty}\beta^i A^i = (I - \beta A)^{-1}$

Katz index matrix is computed in closed-form:

S = \sum_{i=1}^{\infty}\beta^i A^i = (I - \beta A)^{-1} - I

Graph-Level Features#

Goal: We want features that characterize the structure of an entire graph.

cs224w - lecture 2. slide

Graph Kernels#

Measure similarity between two graphs:

Graphlet Kernel, Shervashidze, Nino, et al. “Efficient graphlet kernels for large graph comparison.” Artificial Intelligence and Statistics. 2009.
Weisfeiler-Lehman Kernel, Shervashidze, Nino, et al. “Weisfeiler-lehman graph kernels.” Journal of Machine Learning Research 12.9 (2011).
Random-walk kernel
Shortest-path graph kernel

Goal: Design graph feature vector $\phi (G)$

Key idea: Bag-of-Words (BoW) for a graph

Recall: BoW simply uses the word counts as features for documents (no ordering considered)

단순히 노드를 단어로 칭하고 진행하면,

cs224w - lecture 2. slide

위 처럼 같다고 판단하게 된다.

cs224w - lecture 2. slide

Both Graphlet Kernel and Weisfeihler-Lehman(WL) Kernel use Bag-of-* representation of graph, where * is more sophisticated than node degrees!

Graphlet Kernel#

Key idea: Count the number of different graphlets in a graph.

Definition of graphlets here is slightly different from the node-level features.

The two differences are:

Nodes in graphlets here do not need to be connected (allows for isolated nodes)
The graphlets here are not rooted.

Computational efficiency is not good.

Weisfeiler-Lehman Kernel#

cs224w - lecture 2. slide

Process

Assign an initial color $c^{(0)}(v)$ to each node $v$ .

Iteratively refine node colors by

c^{k+1}(v) = \text{HASH}({c^{(k)}(v),{c^{(k)}(u)}_{u\in N(v)}})

After $K$ steps of color refinement, $c^{(K)}(v)$ summarizes the structure of $K$ -hop neighborhood

Different colors capture different 𝐾-hop neighborhood structures

Graph is represented as Bag-of-colors

Computationally efficient

Closely related to Graph Neural Networks

previous Graphs - Overview
next Graphs - Embedding

Graphs - Features

Contents

Node-Level Features#

Node Degrees#

Node centrality#

Eigenvector centrality#

Betweenness centrality#

Closeness centrality#

Clustering coefficient#

Graphlet#

Link-Level Features#

Distance-based feature#

Local neighborhood overlap#

Global neighborhood overlap#

Katz index#

Graph-Level Features#

Graph Kernels#

Graphlet Kernel#

Weisfeiler-Lehman Kernel#