Summary of GNN lectures
Importance-based features: Useful for predicting influential nodes in a graph
Example: predicting celebrity users in a social network
Structure-based features: Useful for predicting a particular role a node plays in a graph:
Example: Predicting protein functionality in a protein-protein interaction network.
Goal: Characterize the structure and position of a node in the network
Node degree counts the neighboring nodes without capturing their importance.
: the number of edges adjacent to node
Undirected
Directed
Bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets
Authors-to-Papers (they authored)
, takes the node importance in a graph into account.
is normalization const. (largest eigenvalue of ).
the above equation models centrality in a recursive manner.
Eigenvector and Eigenvalue
: matrix, : Eigenvector, : Eigenvalue
The prefix eigen- is adopted from the German word eigen(cognate with the English word own) for ‘proper’, ‘characteristic’, ‘own’.
- Eigenvectors and eigenvalues - 3Blue1Brown (YouTube)
- Eigenvalue - mathsisfun.com
Rewrite the recursive equation in the matrix form.
: Adjacency matrix
: Centrality vector
The largest eigenvalue is always positive and unique (by Perron-Frobenius Theorem).
The eigenvector corresponding to is used for centrality.
A node is important if it lies on many shortest paths between other nodes.
A node is important if it has small shortest path lengths to all other nodes.
Measures how connected ‘s neighboring nodes.
해당 노드와 연결된 노드의 연결성이 어떤지.
Observation: Clustering coefficient counts the #(triangles) in the ego-network
Induced subgraph is another graph, formed from a subset of vertices and all of the edges connecting the vertices in the subset.
Graph Isomorphism: Two graphs which contain the same number of nodes connected in the same way are said to be isomorphic.
We can generalize the above by counting #(pre-specified subgraphs, i.e., graphlets).
https://doi.org/10.1038/s41598-020-69795-1
Goal: Describe network structure around node
Graphlets are small subgraphs that describe the structure of node ’s network neighborhood.
Graphlet Degree Vector (GDV): Graphlet-base features for nodes
Analogy:
Degree counts #(edges) that a node touches
Clustering coefficient counts #(triangles) that a node touches.
GDV counts #(graphlets) that a node
Graphlet degree vector provides a measure of a node’s local network topology:
a more detailed measure of local topological similarity than node degrees or clustering coefficient.
But! Graphlet is a expensive method.
Two formulations of the link prediction task:
Predict connectivity via Proximity (CS224W lecture slide)
Common neighbors:
Jaccard’s coefficient:
Adamic-Adar index:
Limitation: Metric is always zero if the two nodes do not have any neighbors in common.
count the number of walks of all lengths between a given pair of nodes.
: # walks of length between and
, Use adjacency matrix powers!
: discount factor
Katz index matrix is computed in closed-form:
Goal: We want features that characterize the structure of an entire graph.
cs224w - lecture 2. slide
Measure similarity between two graphs:
Goal: Design graph feature vector
Key idea: Bag-of-Words (BoW) for a graph
단순히 노드를 단어로 칭하고 진행하면,
cs224w - lecture 2. slide
위 처럼 같다고 판단하게 된다.
cs224w - lecture 2. slide
Both Graphlet Kernel and Weisfeihler-Lehman(WL) Kernel use Bag-of-* representation of graph, where * is more sophisticated than node degrees!
Key idea: Count the number of different graphlets in a graph.
Definition of graphlets here is slightly different from the node-level features.
The two differences are:
Computational efficiency is not good.
cs224w - lecture 2. slide
Process
Graph is represented as Bag-of-colors
Computationally efficient
Closely related to Graph Neural Networks