Expressive GNNs should map rooted subtrees to node embeddings injectively.
Key observation: Subtrees of the same depth can be characterized recursively, from the leaf nodes up to the root node.
If each step of a GNN's aggregation fully retains the neighboring information, the generated node embeddings can distinguish different rooted subtrees.
In other words, the most expressive GNN uses an injective neighbor aggregation function at each step.
Computational graph = Rooted subtree
Key observation: Expressive power of GNNs can be characterized by that of neighbor aggregation functions they use.
A more expressive aggregation function leads to a more expressive GNN.
GCN (mean-pool), Kipf & Welling, ICLR 2017
Element-wise mean pooling + Linear + ReLU non-linearity
Theorem [Xu et al., ICLR 2019]: GCN's aggregation function cannot distinguish different multi-sets with the same color proportion → not injective.
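To make the failure concrete, here is a minimal sketch; the two-color multi-sets are hypothetical examples, not from the original. Two different multi-sets with the same 1:1 color proportion collapse to the same vector under element-wise mean pooling.

```python
import numpy as np

# Hypothetical two-color example; colors encoded as one-hot vectors.
yellow, blue = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Two different multi-sets with the same color proportion (1:1).
multiset_a = [yellow, blue]
multiset_b = [yellow, yellow, blue, blue]

# Element-wise mean pooling maps both to the same vector -> not injective.
print(np.mean(multiset_a, axis=0))  # [0.5 0.5]
print(np.mean(multiset_b, axis=0))  # [0.5 0.5]
```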
GraphSAGE (max-pool), Hamilton et al., NeurIPS 2017
MLP + element-wise max-pooling
Theorem [Xu et al., ICLR 2019]: GraphSAGE's aggregation function cannot distinguish different multi-sets with the same set of distinct colors → not injective.
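An analogous sketch for max pooling, again with hypothetical two-color multi-sets: two multi-sets sharing the same set of distinct colors collapse to the same vector.

```python
import numpy as np

yellow, blue = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Two different multi-sets with the same *set* of distinct colors.
multiset_a = [yellow, blue]
multiset_b = [yellow, blue, blue]

# Element-wise max pooling maps both to the same vector -> not injective.
print(np.max(multiset_a, axis=0))  # [1. 1.]
print(np.max(multiset_b, axis=0))  # [1. 1.]
```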
If f produces one-hot encodings of colors, summation of the one-hot encodings retains all the information about the input multi-set: the count of each color is preserved.
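The same hypothetical multi-sets from the sketches above illustrate why sum pooling over one-hot encodings is injective here: the summed vector recovers the multiplicity of each color.

```python
import numpy as np

yellow, blue = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# The same pairs of multi-sets that mean- and max-pooling confuse:
print(np.sum([yellow, blue], axis=0))                # [1. 1.]
print(np.sum([yellow, yellow, blue, blue], axis=0))  # [2. 2.]
print(np.sum([yellow, blue, blue], axis=0))          # [1. 2.]
# Sum pooling yields three distinct outputs: per-color counts are retained.
```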
Universal Approximation Theorem [Hornik et al., 1989]
A 1-hidden-layer MLP with sufficiently large hidden dimensionality and an appropriate non-linearity σ(⋅) (including ReLU and sigmoid) can approximate any continuous function to arbitrary accuracy.
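A minimal sketch of the theorem's content: a 1-hidden-layer ReLU MLP fit to one continuous target, sin(x) on [−π, π]. The hidden width (128) and the training setup are illustrative choices, not part of the theorem.

```python
import math
import torch
import torch.nn as nn

# Continuous target function sampled on [-pi, pi].
x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)
y = torch.sin(x)

# 1-hidden-layer MLP with ReLU non-linearity.
mlp = nn.Sequential(nn.Linear(1, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mlp(x), y)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.6f}")  # approaches 0 as training proceeds
```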
Thus, any injective multi-set function can be modeled as $\mathrm{MLP}_{\Phi}\left(\sum_{x \in S} \mathrm{MLP}_{f}(x)\right)$, where $S$ is the input multi-set.
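A minimal PyTorch sketch of this composition: a per-element $\mathrm{MLP}_f$, sum pooling, then $\mathrm{MLP}_\Phi$. All dimensions and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

in_dim, hid_dim, out_dim = 8, 64, 16  # illustrative dimensions

# MLP_f embeds each element of the multi-set.
mlp_f = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                      nn.Linear(hid_dim, hid_dim))
# MLP_Phi transforms the pooled representation.
mlp_phi = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.ReLU(),
                        nn.Linear(hid_dim, out_dim))

def multiset_embed(S: torch.Tensor) -> torch.Tensor:
    """S: (num_elements, in_dim) -- one multi-set element per row."""
    # Sum pooling (not mean/max) preserves element multiplicities.
    return mlp_phi(mlp_f(S).sum(dim=0))

S = torch.randn(5, in_dim)        # a multi-set of 5 elements
print(multiset_embed(S).shape)    # torch.Size([16])
```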
We now describe the full model of GIN by relating it to the WL graph kernel (a traditional way of obtaining graph-level features).
GIN's neighbor aggregation models an injective function over the tuple $\left(c^{(k)}(v),\ \{c^{(k)}(u)\}_{u \in N(v)}\right)$, i.e., the current color of node $v$ together with the multi-set of its neighbors' colors.
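As a sketch of how such an injective function is realized in practice, below is one GIN-style update in the spirit of Xu et al. (ICLR 2019): the node's own embedding is weighted by (1 + ε) and added to the sum of its neighbors' embeddings before an MLP. The dense adjacency representation and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum_{u in N(v)} h_u)."""

    def __init__(self, dim: int, eps: float = 0.0):
        super().__init__()
        self.eps = nn.Parameter(torch.tensor(eps))  # learnable in the paper
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_nodes, dim) node embeddings; adj: (num_nodes, num_nodes) 0/1.
        neighbor_sum = adj @ h  # sum over neighbors: injective multi-set pooling
        return self.mlp((1 + self.eps) * h + neighbor_sum)

# Toy 4-node graph with symmetric 0/1 adjacency (hypothetical example).
h = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 1, 0],
                    [1, 0, 0, 1],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0]], dtype=torch.float)
print(GINLayer(8)(h, adj).shape)  # torch.Size([4, 8])
```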