Notation:
- $f(\cdot)$: a neural-network-like differentiable function
- $\lambda$: a weight decay coefficient
- $X$: a matrix of node feature vectors
- $\Delta = D - A$: the unnormalized graph Laplacian of an undirected graph
- $G = (V, E)$: an undirected graph
- $v_i \in V$: nodes; $(v_i, v_j) \in E$: edges
- $A \in \mathbb{R}^{N \times N}$: an adjacency matrix (binary or weighted)
- $D_{ii} = \sum_j A_{ij}$: a degree matrix
The normalized graph Laplacian: $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^\top$
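As a concrete anchor for the notation, here is a minimal NumPy sketch; the 4-node path graph is a hypothetical example, chosen so that no node is isolated and $D^{-1/2}$ exists:

```python
import numpy as np

# Hypothetical 4-node path graph: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)                            # D_ii = sum_j A_ij
D_inv_sqrt = np.diag(deg ** -0.5)              # D^{-1/2}
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt    # L = I_N - D^{-1/2} A D^{-1/2}

eigvals, U = np.linalg.eigh(L)                 # L = U Lambda U^T, ascending eigenvalues
```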
$$g_\theta \star x = U g_\theta U^\top x$$
- $x \in \mathbb{R}^N$: a signal (a scalar for every node)
- $g_\theta = \operatorname{diag}(\theta)$: a filter parameterized by $\theta \in \mathbb{R}^N$; it can be understood as a function of the eigenvalues $\Lambda$ of $L$, i.e. $g_\theta(\Lambda)$
- $U$: the matrix of eigenvectors of the normalized graph Laplacian
- $\Lambda$: a diagonal matrix of its eigenvalues
- $U^\top x$: the graph Fourier transform of $x$
Multiplication with the eigenvector matrix $U$ is $O(N^2)$, and computing the eigendecomposition of $L$ in the first place may be prohibitively expensive for large graphs, so evaluating this filter directly is computationally expensive.
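Continuing the sketch above, spectral filtering transforms the signal, scales it per eigenvalue, and transforms back; the low-pass filter `g` below is an arbitrary illustrative choice, not one from the paper. Both multiplications with $U$ cost $O(N^2)$:

```python
x = np.random.randn(len(A))   # one scalar per node

x_hat = U.T @ x               # graph Fourier transform U^T x      -- O(N^2)
g = np.exp(-2.0 * eigvals)    # an illustrative low-pass g_theta(Lambda)
x_filt = U @ (g * x_hat)      # filtered signal U g(Lambda) U^T x  -- O(N^2)
```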
To circumvent this problem, it was suggested in Hammond et al. (2011) that $g_\theta(\Lambda)$ can be well-approximated by a truncated expansion in terms of Chebyshev polynomials $T_k(x)$ up to $K$th order:
$$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda})$$
- $\tilde{\Lambda} = \frac{2}{\lambda_{\max}} \Lambda - I_N$
- $\lambda_{\max}$: the largest eigenvalue of $L$
- $\theta' \in \mathbb{R}^K$: a vector of Chebyshev coefficients
$$g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{L})\, x$$
$\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N$. This approximation can easily be verified by noticing that $(U \Lambda U^\top)^k = U \Lambda^k U^\top$.
This expression is now $K$-localized since it is a $K$th-order polynomial in the Laplacian: it depends only on nodes that are at most $K$ steps away from the central node (the $K$th-order neighborhood). Evaluating it costs $O(|E|)$, i.e. linear in the number of edges.
By using Chebyshev polynomials, the time complexity drops from quadratic ($O(N^2)$) to linear ($O(|E|)$), so computational efficiency is preserved. The flexibility of an explicit parameterization is also retained.
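Below is a minimal sketch of this filtering step in Python/SciPy, assuming a precomputed sparse Laplacian `L` and a hypothetical coefficient vector `theta` of length $K+1$; it exploits the recurrence $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$ with $T_0 = 1$, $T_1 = x$:

```python
import scipy.sparse as sp

def chebyshev_filter(L, x, theta, lmax=2.0):
    """Approximate g_{theta'} * x ~= sum_k theta'_k T_k(L~) x.

    Each order adds one sparse matrix-vector product, O(K|E|) overall.
    Assumes a scipy-sparse L and at least two coefficients (K >= 1).
    """
    L_t = (2.0 / lmax) * L - sp.eye(L.shape[0])  # L~ = (2/lmax) L - I_N
    Tx_prev, Tx = x, L_t @ x                     # T_0(L~) x and T_1(L~) x
    out = theta[0] * Tx_prev + theta[1] * Tx
    for k in range(2, len(theta)):
        # T_k(L~) x = 2 L~ T_{k-1}(L~) x - T_{k-2}(L~) x
        Tx_prev, Tx = Tx, 2.0 * (L_t @ Tx) - Tx_prev
        out = out + theta[k] * Tx
    return out
```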
Limiting the expansion to $K = 1$ and further approximating $\lambda_{\max} \approx 2$ gives a linear function of the Laplacian:
$$g_{\theta'} \star x \approx \theta'_0 x + \theta'_1 (L - I_N) x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$$
The filter parameters $\theta'_0, \theta'_1$ can be shared over the whole graph, and successively applying filters of this form effectively convolves the $k$th-order neighborhood of a node, where $k$ is the number of successive filtering operations or layers. It can further be beneficial to constrain the number of parameters to address overfitting and to minimize the number of operations (such as matrix multiplications) per layer.
Collapsing the filter parameters into a single parameter, $\theta = \theta'_0 = -\theta'_1$, yields:
$$g_\theta \star x \approx \theta (I_N + D^{-1/2} A D^{-1/2}) x$$
Per the equation above, $I_N + D^{-1/2} A D^{-1/2}$ has eigenvalues in the range $[0, 2]$, so as layers are stacked, the $I_N$ term can make the signals at some nodes explode while those at other nodes comparatively vanish. To remedy this, the authors introduced the renormalization trick, which normalizes the identity matrix (the self-connections) as well:
$$I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \quad \tilde{A} = A + I_N, \quad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$$
By normalizing the identity matrix too, the propagation through the graph structure becomes more stable overall.
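A small numerical check of this stability claim (my own sketch, not from the paper), comparing the spectra of the operator before and after renormalization on a hypothetical 4-node path graph:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = len(A)

d = A.sum(axis=1)
S = np.eye(n) + np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)   # I + D^{-1/2} A D^{-1/2}

A_t = A + np.eye(n)                                           # A~ = A + I_N
d_t = A_t.sum(axis=1)
S_renorm = np.diag(d_t ** -0.5) @ A_t @ np.diag(d_t ** -0.5)  # D~^{-1/2} A~ D~^{-1/2}

print(np.linalg.eigvalsh(S))         # eigenvalues in [0, 2]
print(np.linalg.eigvalsh(S_renorm))  # eigenvalues in (-1, 1]
```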
$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$$
- $X \in \mathbb{R}^{N \times C}$: a signal with $C$ input channels, i.e. a $C$-dimensional feature vector per node
- $\Theta \in \mathbb{R}^{C \times F}$: a matrix of filter parameters, where $F$ is the number of filters (feature maps)
- $Z \in \mathbb{R}^{N \times F}$: the convolved signal matrix (the activations)
This filtering operation has time complexity $O(|E|FC)$, since $\tilde{A}X$ can be implemented efficiently as a product of a sparse matrix with a dense matrix.
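A minimal sketch of one such layer in Python/SciPy; the sizes are hypothetical, and a real layer would add a bias and a nonlinearity:

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer(A_hat, X, Theta):
    """One graph convolution Z = A_hat X Theta; the sparse-dense product
    keeps the cost at O(|E|FC) as stated above."""
    return A_hat @ (X @ Theta)

# Hypothetical sizes: N = 4 nodes, C = 3 input channels, F = 2 filters.
A = sp.csr_matrix(np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
A_tilde = A + sp.eye(4)                           # A~ = A + I_N
d_tilde = np.asarray(A_tilde.sum(axis=1)).ravel()
D_inv_sqrt = sp.diags(d_tilde ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt         # D~^{-1/2} A~ D~^{-1/2}

X = np.random.randn(4, 3)
Theta = np.random.randn(3, 2)
Z = gcn_layer(A_hat, X, Theta)                    # Z has shape (4, 2)
```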
GCN thus works directly with the adjacency matrix, i.e. with the graph structure itself, through the renormalized propagation rule above. The rest of this section reviews the linear-algebra background behind it: the Moore-Penrose inverse, the graph Laplacian, and the graph spectrum.
The Moore-Penrose inverse, also known as the pseudoinverse, is a generalization of the matrix inverse that can be applied to non-square matrices and singular matrices.
The Moore-Penrose inverse of a matrix $M$ is defined as the unique matrix $M^+$ that satisfies four properties:
- $M M^+ M = M$
- $M^+ M M^+ = M^+$
- $(M M^+)^\top = M M^+$ (i.e. $M M^+$ is symmetric)
- $(M^+ M)^\top = M^+ M$
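A quick numerical check of these four conditions, using NumPy's `np.linalg.pinv` on an arbitrary singular, non-square matrix:

```python
import numpy as np

M = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]])  # rank 1, non-square
M_pinv = np.linalg.pinv(M)

assert np.allclose(M @ M_pinv @ M, M)               # 1. M M+ M = M
assert np.allclose(M_pinv @ M @ M_pinv, M_pinv)     # 2. M+ M M+ = M+
assert np.allclose((M @ M_pinv).T, M @ M_pinv)      # 3. (M M+)^T = M M+
assert np.allclose((M_pinv @ M).T, M_pinv @ M)      # 4. (M+ M)^T = M+ M
```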
In the context of graph theory, the Laplacian matrix is a square matrix that captures the topology of a graph, and it is useful for several reasons.
First, it encodes the structure of the graph in a compact, matrix form that can be easily manipulated using linear algebra.
For example, the Laplacian matrix can be used to compute various graph properties, such as the number of connected components, the size of the largest connected component, and the graph conductance.
Second, the Laplacian matrix can be used as a tool for graph-based data analysis and machine learning.
For example, it can be used to define graph Laplacian regularization terms in optimization problems, which encourage smoothness of functions defined on the graph. It can also be used to define graph Laplacian eigenmaps, which are low-dimensional embeddings of the nodes in the graph that preserve the graph structure.
The Laplacian matrix is a real, symmetric $n \times n$ matrix, and it has $n$ nonnegative eigenvalues, denoted $\lambda_1, \lambda_2, \ldots, \lambda_n$. Its eigenvectors are $n$ linearly independent vectors $v_1, v_2, \ldots, v_n$ satisfying the eigenvalue equation $L v_i = \lambda_i v_i$ for $i = 1, 2, \ldots, n$.
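For instance, the connected-components claim above can be checked numerically: the multiplicity of the zero eigenvalue of $L$ equals the number of connected components. A minimal sketch on a hypothetical 5-node graph (a triangle plus a disjoint edge):

```python
import numpy as np

A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:   # triangle {0,1,2}, edge {3,4}
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A                  # unnormalized Laplacian D - A
eigvals = np.linalg.eigvalsh(L)
n_components = int(np.sum(np.isclose(eigvals, 0.0)))
print(n_components)                             # -> 2
```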
In mathematics, the spectrum of a graph refers to the set of eigenvalues of its adjacency matrix, Laplacian matrix, or other related matrices. The eigenvalues of these matrices provide information about the structure and properties of the graph, such as its connectivity, number of connected components, and expansion properties.
The spectrum of a graph is a fundamental concept in graph theory and has important applications in various fields, including computer science, physics, and social network analysis. For example, the spectral properties of a graph can be used to analyze its robustness to attacks, the diffusion of information or influence through the network, and the performance of graph-based machine learning algorithms.
The symmetric normalized Laplacian is a symmetric, positive semidefinite matrix, which means that it has real, nonnegative eigenvalues and orthogonal eigenvectors.
The eigenvalues of the symmetric normalized Laplacian matrix can be used to analyze the spectral properties of the graph.
The eigenvectors can be used for dimensionality reduction and clustering.
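One classic example is spectral bisection: embed the nodes with the eigenvector of the second-smallest eigenvalue (the Fiedler vector) and split by its sign. A minimal sketch on a hypothetical graph of two triangles joined by a single edge:

```python
import numpy as np

A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0                  # two triangles bridged by (2, 3)

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)         # ascending eigenvalues
fiedler = eigvecs[:, 1]                      # second-smallest eigenvector
cluster = (fiedler > 0).astype(int)          # sign split -> two clusters
print(cluster)                               # e.g. [0 0 0 1 1 1]
```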
The degree matrix $D$ is a diagonal matrix with the degree of each node on the diagonal. However, it relates to a special case of the Laplacian matrix: if the adjacency matrix is taken to be the identity, then $L = D - I$ is also diagonal, so its eigenvectors coincide with those of $D$ (the standard basis vectors) and its eigenvalues are the degrees shifted by one, $\deg(v_i) - 1$.
The adjacency matrix $A$ encodes the topology of the graph: $A_{ij}$ is nonzero if and only if $(v_i, v_j) \in E$.
$$L^{\mathrm{sym}} := (D^+)^{1/2} L (D^+)^{1/2} = I - (D^+)^{1/2} A (D^+)^{1/2}$$
where $D^+$ denotes the Moore-Penrose inverse of $D$, so nodes with degree zero are handled gracefully.
$$L^{\mathrm{sym}}_{i,j} := \begin{cases} 1 & \text{if } i = j \text{ and } \deg(v_i) \neq 0 \\ -\dfrac{1}{\sqrt{\deg(v_i)\deg(v_j)}} & \text{if } i \neq j \text{ and } v_i \text{ is adjacent to } v_j \\ 0 & \text{otherwise.} \end{cases}$$
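A minimal sketch of this definition, using `np.linalg.pinv` for $(D^+)^{1/2}$ so that an isolated (degree-0) node is handled per the case analysis above; the 4-node graph with one isolated node is hypothetical:

```python
import numpy as np

# Triangle on nodes 0-2; node 3 is isolated (degree 0).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)

D = np.diag(A.sum(axis=1))
L = D - A                                    # unnormalized Laplacian
D_pinv_sqrt = np.sqrt(np.linalg.pinv(D))     # (D^+)^{1/2}; zero entries stay zero
L_sym = D_pinv_sqrt @ L @ D_pinv_sqrt

print(np.round(L_sym, 3))
# Diagonal entries are 1 where deg != 0 and 0 for the isolated node;
# off-diagonal entries are -1/sqrt(deg_i * deg_j) for adjacent pairs.
```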