GCN

Graph Convolutional Networks

Summary

  1. The adjacency matrix is used directly as an input, which enables learning from both labeled and unlabeled data.
    • “the model learn smooth representations of the unlabeled data”
  2. Spectral graph convolutions: the Chebyshev polynomial expansion is simplified while trying to preserve efficiency and expressive power.
  3. The re-normalization trick is used to improve numerical stability.

Notation

$f(\cdot)$: a differentiable, neural-network-like function
$\lambda$: a weight decay coefficient
$X$: a matrix of node feature vectors
$\Delta = D - A$: the unnormalized graph Laplacian of an undirected graph
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$: an undirected graph with $N$ nodes
$v_i \in \mathcal{V}$: nodes
$(v_i, v_j) \in \mathcal{E}$: edges
$A \in \mathbb{R}^{N \times N}$: an adjacency matrix (binary or weighted)
$D_{ii} = \sum_j A_{ij}$: a degree matrix

Introduction

Earlier approaches rely on the ground-truth labels and assume that connected nodes are likely to share the same label, so they tend to overfit to the labeled nodes. Graph Laplacian regularization was used to reduce this overfitting.

The authors feed the graph structure $A$ directly into the model $f(X, A)$, so the model no longer has to rely only on the labeled nodes, and the graph Laplacian regularization used in earlier work becomes unnecessary. The layer-wise propagation rule is:

$$H^{(l+1)} = \sigma\left({\tilde{D}}^{-\frac{1}{2}}\tilde{A}{\tilde{D}}^{-\frac{1}{2}}H^{(l)}W^{(l)} \right)$$

$\tilde{A} = A + I_N$: the adjacency matrix with added self-connections
$W^{(l)}$: a layer-specific trainable weight matrix; $\sigma(\cdot)$: an activation function
$H^{(l)} \in {\Bbb R}^{N\times D}$: the matrix of activations in the $l^\text{th}$ layer; $H^{(0)} = X$
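
Below is a minimal NumPy sketch of this propagation rule (illustrative only: the toy graph, the feature and weight matrices, the helper name `gcn_layer`, and the choice of ReLU for $\sigma$ are my own assumptions, not taken from the paper):

import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: relu(D_tilde^{-1/2} A_tilde D_tilde^{-1/2} H W)."""
    A_tilde = A + np.eye(A.shape[0])                  # A_tilde = A + I_N (add self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))   # diagonal of D_tilde^{-1/2}
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)             # sigma taken as ReLU here

# toy example: 3-node path graph, 2 input features, 4 hidden units
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H0 = np.random.randn(3, 2)    # H^(0) = X
W0 = np.random.randn(2, 4)    # W^(0)
H1 = gcn_layer(A, H0, W0)     # H^(1), shape (3, 4)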


Spectral Graph Convolutions

The normalized graph Laplacian: $L = I_N - D^{-\frac{1}{2}}AD^{-\frac{1}{2}} = U\Lambda U^\top$

$$g_\theta \star x = U g_\theta U^\top x$$

$x \in {\Bbb R}^N$: a signal (one scalar value per node)
$\theta \in {\Bbb R}^N$: filter parameters
$g_\theta = \text{diag}(\theta)$: a filter; it can be understood as a function of the eigenvalues of $L$, i.e. $g_\theta(\Lambda)$
$U$: the matrix of eigenvectors of the normalized graph Laplacian
$\Lambda$: the diagonal matrix of its eigenvalues
$U^\top x$: the graph Fourier transform of $x$

Multiplication with the eigenvector matrix $U$ is ${\cal O}(N^2)$, and computing the eigendecomposition of $L$ in the first place can be prohibitively expensive for large graphs.

The normalized Laplacian also behaves well on irregular graphs (graphs whose node degrees vary widely).
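
For intuition, here is a small NumPy sketch of the spectral convolution above via an explicit eigendecomposition, i.e. exactly the expensive ${\cal O}(N^2)$ route that the Chebyshev approximation below avoids (the 4-node graph, the signal, and the filter values are arbitrary illustrations):

import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized graph Laplacian

Lam, U = np.linalg.eigh(L)                    # L = U diag(Lam) U^T
x = np.random.randn(4)                        # a signal: one scalar per node
theta = np.random.randn(4)                    # free spectral filter parameters

x_hat = U.T @ x                               # graph Fourier transform of x
g_x = U @ (theta * x_hat)                     # g_theta * x = U diag(theta) U^T x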

Chebyshev Polynomials

To circumvent this problem, Hammond et al. (2011) suggested that $g_\theta(\Lambda)$ can be well-approximated by a truncated expansion in terms of Chebyshev polynomials $T_k(x)$ up to $K^\text{th}$ order:

$$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_{k}T_k(\tilde\Lambda)$$

$\tilde\Lambda = \frac{2}{\lambda_\text{max}}\Lambda - I_N$
$\lambda_\text{max}$: the largest eigenvalue of $L$
$\theta' \in {\Bbb R}^K$: a vector of Chebyshev coefficients

$$g_{\theta'}\star x \approx \sum_{k=0}^{K} \theta'_{k}T_k(\tilde{L})x$$

$\tilde L = \frac{2}{\lambda_\text{max}} L - I_N$
This can easily be verified by noticing that ${(U\Lambda U^\top)}^k = U\Lambda^k U^\top$.

This expression is now $K$-localized since it is a $K^\text{th}$-order polynomial in the Laplacian, i.e. it depends only on nodes that are at most $K$ steps away from the central node ($K^\text{th}$-order neighborhood). Evaluating it costs ${\cal O}(|E|)$, i.e. linear in the number of edges.

By using Chebyshev polynomials, the time complexity drops from quadratic to linear, so computational efficiency is preserved, and the flexibility of the explicit parameterization is retained as well.
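
A sketch of this truncated expansion using the Chebyshev recurrence $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$, which only requires repeated matrix-vector products with $\tilde{L}$. It reuses `L`, `x`, and `Lam` from the sketch above; the helper name `cheb_filter`, the order $K = 2$, and the random coefficients are my own illustrative choices:

import numpy as np

def cheb_filter(L, x, theta, lam_max):
    """Approximate g_theta * x = sum_k theta_k T_k(L_tilde) x via the Chebyshev recurrence."""
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)      # rescales eigenvalues into [-1, 1]
    Tx_prev, Tx = x, L_tilde @ x                   # T_0(L_tilde)x = x, T_1(L_tilde)x = L_tilde x
    out = theta[0] * Tx_prev + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_next = 2.0 * (L_tilde @ Tx) - Tx_prev   # T_k = 2 L_tilde T_{k-1} - T_{k-2}
        out = out + theta[k] * Tx_next
        Tx_prev, Tx = Tx, Tx_next
    return out

theta_cheb = np.random.randn(3)                    # K = 2: coefficients theta'_0 .. theta'_2
y = cheb_filter(L, x, theta_cheb, lam_max=Lam.max())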



Setting $K = 1$ and approximating $\lambda_\text{max} \approx 2$ (so that $\tilde{L} = L - I_N$), the convolution becomes a linear function of the Laplacian:

$$g_{\theta'}\star x \approx \theta'_0 x + \theta'_1 (L - I_N)x = \theta'_0 x - \theta'_1 D^{-\frac{1}{2}}AD^{-\frac{1}{2}}x$$

The filter parameters $\theta'_0, \theta'_1$ are shared over the whole graph, and stacking such layers effectively convolves the $K^\text{th}$-order neighborhood of a node, where $K$ is the number of layers. It can be beneficial to constrain the number of parameters further to address overfitting and to minimize the number of operations (such as matrix multiplications) per layer.

Treating the two filter parameters as a single shared parameter, $\theta = \theta'_0 = -\theta'_1$, gives:

$$g_\theta \star x \approx \theta\left(I_N + D^{-\frac{1}{2}}AD^{-\frac{1}{2}}\right)x$$

The operator $I_N + D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$ has eigenvalues in the range $[0, 2]$, so applying it repeatedly in a deep model can make the signal at some nodes explode while the signal at others effectively vanishes. To alleviate this, the authors introduce the re-normalization trick, which normalizes the identity (self-loop) term as well.

Re-normalization Trick

$$I_N + D^{-\frac{1}{2}}AD^{-\frac{1}{2}} \rightarrow \tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$$

$\tilde{A} = A + I_N$
$\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$

Because the identity (self-loop) matrix is normalized as well, information flowing along the graph structure stays numerically more stable overall.

$$Z = \tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}X\Theta$$

$X \in {\Bbb R}^{N\times C}$: an input signal with $C$ feature dimensions per node
$\Theta \in {\Bbb R}^{C\times F}$: a matrix of filter parameters, where $F$ is the number of filters (feature maps)
$Z \in {\Bbb R}^{N\times F}$: the convolved signal matrix

The time complexity of evaluating this expression is ${\cal O}(|E|FC)$.

The adjacency matrix, i.e. the graph structure itself, is used directly as input.

  • Applying the re-normalization trick of the GCN paper, as follows:
import networkx as nx
import numpy as np

G = nx.complete_graph(4)  # example graph (K4), chosen to be consistent with the printed output below
A = nx.adjacency_matrix(G).toarray()

tilde_A = A + np.eye(len(A))                          # MAIN DIFF: +I (add self-loops)
tilde_D = np.diag(np.sum(tilde_A, axis=1))            # degree matrix of tilde_A
tilde_D_sqrt_inv = np.sqrt(np.linalg.inv(tilde_D))    # D_tilde^{-1/2} (diagonal)
L_renorm_sym = tilde_D_sqrt_inv @ tilde_A @ tilde_D_sqrt_inv  # MAIN DIFF: no leading "I -" (unlike L_sym)
# array([[0.25, 0.25, 0.25, 0.25],
#        [0.25, 0.25, 0.25, 0.25],
#        [0.25, 0.25, 0.25, 0.25],
#        [0.25, 0.25, 0.25, 0.25]])

The normalization scales down the self-loop contribution, which helps avoid overfitting.

Semi-supervised Node Classification

As a simple example, consider a model composed of two GCN layers, given by the equation below.

$\hat{A} = \tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$ is computed in a pre-processing step and fed in as an input (as in the authors’ code).

$$Z = f(X, A) = \text{softmax}\left(\hat{A}\ \sigma(\hat{A}XW^{(0)})W^{(1)}\right)$$

$W^{(0)} \in {\Bbb R}^{C\times H}$: an input-to-hidden weight matrix with $H$ hidden units; $W^{(1)} \in {\Bbb R}^{H\times F}$: a hidden-to-output weight matrix
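
A minimal NumPy forward pass of this two-layer model (a sketch with a random toy graph and random weights; following the paper, $\sigma$ is ReLU and the softmax is applied row-wise; the helper names `normalize_adj`, `row_softmax`, and `two_layer_gcn` are mine):

import numpy as np

def normalize_adj(A):
    """A_hat = D_tilde^{-1/2} A_tilde D_tilde^{-1/2}, precomputed once."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def row_softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def two_layer_gcn(X, A, W0, W1):
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)            # ReLU(A_hat X W^(0))
    return row_softmax(A_hat @ H @ W1)             # softmax(A_hat H W^(1))

N, C, H_dim, F = 5, 8, 16, 3                       # nodes, input dims, hidden units, classes
A = (np.random.rand(N, N) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random symmetric adjacency, no self-loops
X = np.random.randn(N, C)
W0, W1 = np.random.randn(C, H_dim), np.random.randn(H_dim, F)
Z = two_layer_gcn(X, A, W0, W1)                    # (N, F): class probabilities per node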


Appendix

Moore-Penrose

The Moore-Penrose inverse, also known as the pseudoinverse, is a generalization of the matrix inverse (the matrix $B$ satisfying $AB = I_n = BA$) that can be applied to non-square matrices and singular matrices.

The Moore-Penrose inverse is defined as the unique matrix $A^{+}$ that satisfies four properties:

  1. $AA^{+}A = A$
  2. $A^{+}AA^{+} = A^{+}$
  3. ${(AA^{+})}^{\top} = AA^{+}$
  4. ${(A^{+}A)}^{\top} = A^{+}A$
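
These four properties can be checked numerically with NumPy's np.linalg.pinv; the matrix below is an arbitrary non-square example:

import numpy as np

A = np.random.randn(4, 3)           # non-square, so no ordinary inverse exists
A_pinv = np.linalg.pinv(A)          # Moore-Penrose pseudoinverse

print(np.allclose(A @ A_pinv @ A, A))              # 1. A A+ A  = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))    # 2. A+ A A+ = A+
print(np.allclose((A @ A_pinv).T, A @ A_pinv))     # 3. (A A+)^T = A A+
print(np.allclose((A_pinv @ A).T, A_pinv @ A))     # 4. (A+ A)^T = A+ A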

Laplacian matrix

In the context of graph theory, the Laplacian matrix is a square matrix that captures the topology of a graph.

  1. First, it encodes the structure of the graph in a compact, matrix form that can be easily manipulated using linear algebra.
    • For example, the Laplacian matrix can be used to compute various graph properties, such as the number of connected components, the size of the largest connected component, and the graph conductance.
  2. Second, the Laplacian matrix can be used as a tool for graph-based data analysis and machine learning.
    • For example, it can be used to define graph Laplacian regularization terms in optimization problems, which encourage smoothness of functions defined on the graph. It can also be used to define graph Laplacian eigenmaps, which are low-dimensional embeddings of the nodes in the graph that preserve the graph structure.

The Laplacian matrix is a real, symmetric $n \times n$ matrix with $n$ nonnegative eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Its eigenvectors are $n$ linearly independent vectors $v_1, v_2, \ldots, v_n$, which satisfy the eigenvalue equation $L v_i = \lambda_i v_i$ for $i = 1, 2, \ldots, n$.
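
For example, the multiplicity of the zero eigenvalue of the Laplacian equals the number of connected components. A small check with networkx and NumPy, using two disjoint triangles as a made-up example:

import networkx as nx
import numpy as np

G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])  # two disjoint triangles
L = nx.laplacian_matrix(G).toarray().astype(float)              # unnormalized L = D - A

eigvals = np.linalg.eigvalsh(L)                         # real, nonnegative eigenvalues
num_components = int(np.sum(np.isclose(eigvals, 0.0)))  # multiplicity of eigenvalue 0
print(eigvals.round(3), num_components)                 # num_components == 2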

Spectrum of a graph

In mathematics, the spectrum of a graph refers to the set of eigenvalues of its adjacency matrix, Laplacian matrix, or other related matrices. The eigenvalues of these matrices provide information about the structure and properties of the graph, such as its connectivity, number of connected components, and expansion properties.

  • The spectrum of a graph is a fundamental concept in graph theory and has important applications in various fields, including computer science, physics, and social network analysis. For example, the spectral properties of a graph can be used to analyze its robustness to attacks, the diffusion of information or influence through the network, and the performance of graph-based machine learning algorithms.

Symmetric normalized Laplacian

It is a symmetric and positive semidefinite matrix, which means that it has real eigenvalues and orthogonal eigenvectors.

  • The eigenvalues of the symmetric normalized Laplacian matrix can be used to analyze the spectral properties of the graph.
  • The eigenvectors can be used for dimensionality reduction and clustering.

The degree matrix $D$ is a diagonal matrix with the degree of each node on the diagonal. Together with the adjacency matrix it defines the unnormalized Laplacian $L = D - A$; in the degenerate case of a graph with no edges ($A = 0$), $L$ reduces to $D$, so their eigenvalues and eigenvectors coincide.

The adjacency matrix $A$ is a matrix that encodes the topology of the graph.

$$L_\text{sym} := {(D^{+})}^{1/2}L{(D^{+})}^{1/2} = I - {(D^{+})}^{1/2}A{(D^{+})}^{1/2}$$

$$L^\text{sym}_{i,j} := \begin{cases}
1 &\text{if } i=j \text{ and } \deg(v_i) \neq 0\\
-\frac{1}{\sqrt{\deg(v_i)\deg(v_j)}} &\text{if } i\neq j \text{ and } v_i \text{ is adjacent to } v_j\\
0 &\text{otherwise.}
\end{cases}$$
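
As a sanity check, the matrix built from this element-wise definition matches networkx's normalized_laplacian_matrix; the path graph below is an arbitrary example whose node degrees are all nonzero, so ${(D^{+})}^{1/2}$ is simply $D^{-1/2}$:

import networkx as nx
import numpy as np

G = nx.path_graph(4)                                   # toy graph: 0 - 1 - 2 - 3
A = nx.adjacency_matrix(G).toarray().astype(float)
deg = A.sum(axis=1)

D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))               # (D^+)^{1/2} = D^{-1/2} since all deg > 0
L_sym = np.eye(len(deg)) - D_inv_sqrt @ A @ D_inv_sqrt

print(np.allclose(L_sym, nx.normalized_laplacian_matrix(G).toarray()))  # True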
