Graphs - Features

Summary of GNN lectures

## Node-Level Features

  • Importance-based features: Useful for predicting influential nodes in a graph

    Example: predicting celebrity users in a social network

    • Node degree
    • Different node centrality measures
  • Structure-based features: Useful for predicting a particular role a node plays in a graph:

    Example: Predicting protein functionality in a protein-protein interaction network.

    • Node degree
    • Clustering coefficient
    • Graphlet count vector

Goal: Characterize the structure and position of a node in the network

### Node Degrees

Node degree counts the neighboring nodes without capturing their importance.

$k_i$: the number of edges adjacent to node $i$

  • Undirected

    $\bar{k}=\frac{1}{N}\sum_{i=1}^{N}k_i=\frac{2E}{N}$
  • Directed

    $\bar{k} = \frac{E}{N},\ \bar{k}^{in} = \bar{k}^{out}$
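As a quick sanity check, the undirected average-degree formula can be verified on a toy graph (the graph and variable names here are illustrative, not from the lecture):

```python
# Average degree of a small undirected graph (toy data, not from the lecture).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # E = 4 edges over N = 4 nodes

degree = {}
for u, v in edges:                          # each edge contributes to both endpoints
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

N, E = len(degree), len(edges)
avg_degree = sum(degree.values()) / N       # mean of k_i
assert avg_degree == 2 * E / N              # matches the 2E / N formula
```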

A bipartite graph is a graph whose nodes can be divided into two disjoint sets $U$ and $V$ such that every link connects a node in $U$ to one in $V$; that is, $U$ and $V$ are independent sets.

Example: an Authors-to-Papers network (authors linked to the papers they authored)

### Node centrality

Node centrality $c_v$ takes a node's importance in the graph into account.

#### Eigenvector centrality

$c_v = \frac{1}{\lambda}\sum_{u \in N(v)}c_u$

$\lambda$ is a normalization constant (the largest eigenvalue of $\mathbf{A}$).

The above equation models centrality recursively: a node is important if it is surrounded by important nodes.

Eigenvector and Eigenvalue

$A\mathbf{u} = \lambda \mathbf{u}$

$A$: matrix, $\mathbf{u}$: eigenvector, $\lambda$: eigenvalue

$|A - \lambda I| = 0$

The prefix eigen- is adopted from the German word eigen (cognate with the English word own), meaning 'proper', 'characteristic', 'own'.

Rewriting the recursive equation in matrix form:

$\lambda \mathbf{c} = \mathbf{A}\mathbf{c}$

$\mathbf{A}$: adjacency matrix

$\mathbf{c}$: centrality vector

The largest eigenvalue $\lambda_{max}$ is always positive and unique (by the Perron-Frobenius theorem).

The eigenvector $\mathbf{c}_{max}$ corresponding to $\lambda_{max}$ is used for centrality.
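Since $\mathbf{c}_{max}$ is the leading eigenvector of $\mathbf{A}$, it can be approximated by power iteration: repeatedly apply $\mathbf{A}$ and renormalize. A minimal sketch on a hypothetical toy graph:

```python
# Power-iteration sketch for eigenvector centrality (toy graph: a triangle
# 0-1-2 plus a pendant node 3 attached to node 2).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

c = {v: 1.0 for v in adj}                   # uniform starting vector
for _ in range(100):
    new = {v: sum(c[u] for u in adj[v]) for v in adj}   # apply A
    norm = max(new.values())
    c = {v: x / norm for v, x in new.items()}           # renormalize

ranked = sorted(c, key=c.get, reverse=True)
```

Node 2 touches every other node and comes out on top; by symmetry, nodes 0 and 1 tie, and the pendant node 3 ranks last.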

#### Betweenness centrality

A node is important if it lies on many shortest paths between other nodes.

$c_v = \sum_{s\neq v\neq t}\frac{\#(\text{shortest paths between } s \text{ and } t \text{ that contain } v)}{\#(\text{shortest paths between } s \text{ and } t)}$
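The definition above can be computed directly on a small graph: BFS from every node gives shortest-path distances and shortest-path counts, and $v$ lies on an $s$-$t$ shortest path exactly when $d(s,v)+d(v,t)=d(s,t)$. A brute-force sketch on an illustrative toy graph (Brandes' algorithm is the efficient choice in practice):

```python
from collections import deque

# Brute-force betweenness on a tiny graph: edges 0-1, 0-2, 1-3, 2-3, 3-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

def bfs_counts(s):
    """Shortest-path distance and number of shortest paths from s to every node."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:           # first visit: record distance
                dist[w], sigma[w] = dist[u] + 1, 0
                q.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

info = {s: bfs_counts(s) for s in adj}

def betweenness(v):
    total = 0.0
    for s in adj:
        for t in adj:
            if len({s, t, v}) < 3:      # require s, v, t pairwise distinct
                continue
            dist_s, sig_s = info[s]
            dist_t, sig_t = info[t]
            # v lies on an s-t shortest path iff the distances add up; the
            # fraction of s-t shortest paths through v is then
            # sigma(s,v) * sigma(v,t) / sigma(s,t).
            if dist_s[v] + dist_t[v] == dist_s[t]:
                total += sig_s[v] * sig_t[v] / sig_s[t]
    return total / 2  # each unordered pair (s, t) was counted twice
```

Node 3 bridges node 4 to the rest of the graph, so `betweenness(3)` is the largest.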

#### Closeness centrality

A node is important if it has small shortest-path lengths to all other nodes.

$c_v = \frac{1}{\sum_{u\neq v}\text{shortest path length between } u \text{ and } v}$

  • There are more centrality measures beyond these.
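On an unweighted graph, all shortest-path lengths in the denominator come from a single BFS. A minimal sketch on an assumed path graph:

```python
from collections import deque

# Closeness sketch: one BFS from v yields every shortest-path length needed.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # toy path graph 0-1-2-3

def closeness(v):
    dist = {v: 0}
    q = deque([v])
    while q:                       # standard BFS over the unweighted graph
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return 1 / sum(d for u, d in dist.items() if u != v)
```

Interior nodes of the path are closer to everyone: `closeness(1)` is 1/4 versus 1/6 for the endpoint `closeness(0)`.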

### Clustering coefficient

Measures how connected $v$'s neighboring nodes are, i.e., how well the neighbors of a node are linked to each other.

$e_v = \frac{\#(\text{edges among neighboring nodes})}{\binom{k_v}{2}} \in [0, 1]$
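The formula is straightforward to compute: count the linked pairs among $v$'s neighbors and divide by $\binom{k_v}{2}$. A sketch on a toy graph:

```python
# Clustering coefficient sketch: fraction of pairs of v's neighbors that are
# themselves connected (toy graph, neighbor sets as Python sets).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

def clustering(v):
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0          # no neighbor pairs exist for degree < 2
    # each edge among neighbors is seen from both endpoints, hence // 2
    links = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
    return links / (k * (k - 1) / 2)
```

Here `clustering(0)` is 1/3: of the three pairs among node 0's neighbors {1, 2, 3}, only (1, 2) is connected.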

### Graphlet

Observation: Clustering coefficient counts the #(triangles) in the ego-network

An induced subgraph is a graph formed from a subset of the vertices and all of the edges connecting the vertices in that subset.

Graph Isomorphism: Two graphs that contain the same number of nodes connected in the same way are said to be isomorphic.

We can generalize the above by counting #(pre-specified subgraphs, i.e., graphlets).

https://doi.org/10.1038/s41598-020-69795-1

Goal: Describe network structure around node uu

Graphlets are small subgraphs that describe the structure of node uu’s network neighborhood.

Graphlet Degree Vector (GDV): Graphlet-based features for nodes

  • Example: if the possible graphlets are a, b, c, d:
    • count the induced instances of each graphlet that touch the node;
    • the result is a vector such as [2, 1, 0, 2].

Analogy:

  • Degree counts #(edges) that a node touches

  • Clustering coefficient counts #(triangles) that a node touches.

  • GDV counts #(graphlets) that a node touches.

Graphlet degree vector provides a measure of a node’s local network topology:

a more detailed measure of local topological similarity than node degrees or clustering coefficient.

However, counting graphlets is computationally expensive.


## Link-Level Features

Two formulations of the link prediction task:

  • Links missing at random: Remove a random set of links and then aim to predict them
  • Links over time: Given $G[t_0, t_0']$, a graph defined by edges up to time $t_0'$, output a ranked list $L$ of edges (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$
Predict connectivity via Proximity (CS224W lecture slide)

### Distance-based feature

Shortest-path distance between the two nodes. This captures how far apart they are, but not the degree of neighborhood overlap.

### Local neighborhood overlap

  • Common neighbors: $|N(v_1) \cap N(v_2)|$

  • Jaccard's coefficient: $\frac{|N(v_1) \cap N(v_2)|}{|N(v_1) \cup N(v_2)|}$

  • Adamic-Adar index: $\sum_{u\in N(v_1)\cap N(v_2)}\frac{1}{\log{(k_u)}}$

Limitation: the metric is always zero if the two nodes have no neighbors in common.
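All three local-overlap scores need only the neighbor sets of the two endpoints. A sketch on a hypothetical graph (note that the Adamic-Adar term assumes every common neighbor has degree greater than 1, since $\log 1 = 0$):

```python
import math

# Local neighborhood overlap scores for a candidate link (v1, v2); the graph
# and node names are illustrative.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}

def overlap_scores(v1, v2):
    common = adj[v1] & adj[v2]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(adj[v1] | adj[v2]),
        # Adamic-Adar down-weights high-degree common neighbors; it assumes
        # each common neighbor u has degree k_u > 1 (log 1 = 0 would divide by zero).
        "adamic_adar": sum(1 / math.log(len(adj[u])) for u in common),
    }

scores = overlap_scores("B", "D")  # B and D share neighbors A and C
```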

### Global neighborhood overlap

#### Katz index

Count the number of walks of all lengths between a given pair of nodes.

$P_{uv}^{(K)}$: the number of walks of length $K$ between $u$ and $v$

$P^{(K)} = A^K$. Use adjacency matrix powers!

$S_{v_1 v_2} = \sum_{l=1}^{\infty}\beta^l A^l_{v_1 v_2}$

$\beta \in (0, 1)$: discount factor

$\sum_{i=0}^{\infty}\beta^i A^i = (I - \beta A)^{-1}$

The Katz index matrix is computed in closed form:

$S = \sum_{i=1}^{\infty}\beta^i A^i = (I - \beta A)^{-1} - I$
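The closed form requires a matrix inverse; for illustration, the sum can instead be truncated at some maximum walk length, which converges to the same values for small $\beta$. A pure-Python sketch on a triangle graph ($\beta$ and the cutoff are assumptions):

```python
# Truncated Katz sum on the triangle graph K3; (I - beta*A)^{-1} - I would
# give the exact S, but summing discounted walk counts is enough to illustrate.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz(A, beta=0.1, L=50):
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    P = [row[:] for row in A]   # P holds A^l, starting at l = 1
    factor = beta               # beta^l
    for _ in range(L):
        for i in range(n):
            for j in range(n):
                S[i][j] += factor * P[i][j]
        P = matmul(P, A)
        factor *= beta
    return S

S = katz(A)
```

For the triangle with $\beta = 0.1$ this converges to $S_{01} \approx 0.1136$, matching the closed form $(I - \beta A)^{-1} - I$.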

## Graph-Level Features

Goal: We want features that characterize the structure of an entire graph.

cs224w - lecture 2. slide

### Graph Kernels

Measure similarity between two graphs:

  • Graphlet Kernel, Shervashidze, Nino, et al. “Efficient graphlet kernels for large graph comparison.” Artificial Intelligence and Statistics. 2009.
  • Weisfeiler-Lehman Kernel, Shervashidze, Nino, et al. “Weisfeiler-Lehman graph kernels.” Journal of Machine Learning Research 12.9 (2011).
  • Random-walk kernel
  • Shortest-path graph kernel

Goal: Design a graph feature vector $\phi(G)$

Key idea: Bag-of-Words (BoW) for a graph

  • Recall: BoW simply uses the word counts as features for documents (no ordering considered)

If we naively treat nodes as words,

cs224w - lecture 2. slide

then two different graphs are judged to be identical, as above.

cs224w - lecture 2. slide

Both the Graphlet Kernel and the Weisfeiler-Lehman (WL) Kernel use a Bag-of-* representation of a graph, where * is more sophisticated than node degrees!

### Graphlet Kernel

Key idea: Count the number of different graphlets in a graph.

The definition of graphlets here differs slightly from the one used for node-level features.

The two differences are:

  • Nodes in graphlets here do not need to be connected (allows for isolated nodes)
  • The graphlets here are not rooted.

Limitation: counting graphlets is expensive; counting size-$k$ graphlets in a graph with $n$ nodes by enumeration takes $O(n^k)$ time.

### Weisfeiler-Lehman Kernel

cs224w - lecture 2. slide

Process

  1. Assign an initial color $c^{(0)}(v)$ to each node $v$.
  2. Iteratively refine node colors by
     $c^{(k+1)}(v) = \text{HASH}\left(c^{(k)}(v), \{c^{(k)}(u)\}_{u\in N(v)}\right)$
  3. After $K$ steps of color refinement, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood.
  • Different colors capture different $K$-hop neighborhood structures.

Graph is represented as Bag-of-colors
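The refinement loop above can be sketched in a few lines. Here the HASH step is replaced by canonically relabeling each (own color, sorted neighbor colors) signature; this is an illustrative stand-in, not the lecture's exact hash, and the toy graph is an assumption:

```python
from collections import Counter

# WL color refinement sketch on a toy graph (triangle 0-1-2 plus pendant 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def wl_colors(adj, K=2):
    colors = {v: 1 for v in adj}            # step 1: uniform initial colors
    for _ in range(K):                      # step 2: refine K times
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in adj}
        # give each distinct signature a fresh small-integer color ("HASH")
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())), 1)}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

colors = wl_colors(adj)
bag = Counter(colors.values())  # step 3: the graph feature is a bag of colors
```

After two rounds, the structurally interchangeable nodes 0 and 1 share a color, while nodes 2 and 3 each get their own.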

Computationally efficient

Closely related to Graph Neural Networks