Graphs - KG Embedding

Knowledge Graph Embedding

This post is based on Lecture 10, “Knowledge Graph Embeddings,” of CS224W

Knowledge Graphs#

Knowledge in graph form:

  • Capture entities, types, and relationships
  • Nodes are entities
  • Nodes are labeled with their types
  • Edges between two nodes capture relationships between entities

A Knowledge Graph (KG) is an example of a heterogeneous graph:

  • Bibliographic Networks
    • Node types: paper, title, author, conference, year
    • Relation types: pubWhere, pubYear, hasTitle, hasAuthor, cite
  • Bio Knowledge Graphs
    • Node types: drug, disease, adverse event, protein, pathways
    • Relation types: has_func, causes, assoc, treats, is_a

Examples of knowledge graphs:

  • Google Knowledge Graph
  • Amazon Product Graph
  • Facebook Graph API
  • IBM Watson
  • Microsoft Satori
  • Project Hanover/Literome
  • LinkedIn Knowledge Graph
  • Yandex Object Answer

Building a Knowledge Graph-based Dialogue System at the 2nd ConvAI Summer School

Datasets#

Publicly available KGs:

  • FreeBase, Wikidata, Dbpedia, YAGO, NELL, etc.

Common characteristics:

  • Massive: millions of nodes and edges
  • Incomplete: many true edges are missing

Enumerating all the possible facts is intractable!


Given an enormous KG, can we complete the KG?

  • For a given (head, relation), we predict missing tails.
    • (Note this is slightly different from link prediction task)

KG Representation#

Edges in a KG are represented as triples $(h, r, t)$:

  • head ($h$) has relation ($r$) with tail ($t$)

Key Idea:

  • Model entities and relations in the embedding/vector space $\mathbb{R}^d$.
    • Associate entities and relations with shallow embeddings
    • Note we do not learn a GNN here!
  • Given a true triple $(h, r, t)$, the goal is that the embedding of $(h, r)$ should be close to the embedding of $t$.
    • How to embed $(h, r)$?
    • How to define closeness?

We are going to learn about different KG embedding models (shallow/transductive embeddings):

  • Different models are…
    • …based on different geometric intuitions
    • …capture different types of relations (have different expressivity)

Relations:#

  • Symmetric Relation
  • Antisymmetric Relation
  • Inverse Relation
  • Composition (Transitive) Relation
  • 1-to-N Relation

TransE#

https://papers.nips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html

Translation Intuition:

For a triple $(h, r, t)$ with $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$: $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ if the given fact is true, else $\mathbf{h} + \mathbf{r} \neq \mathbf{t}$.

  • Scoring function: $f_r(h, t) = -\| \mathbf{h} + \mathbf{r} - \mathbf{t} \|$
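As a concrete sketch, the TransE score can be computed in a few lines of NumPy (the function name and vectors here are illustrative, not from the original paper):

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE score: negative L1 (or L2) distance between h + r and t."""
    return -np.linalg.norm(h + r - t, ord=norm)

h = np.array([0.5, -0.2, 0.1])
r = np.array([0.1, 0.4, -0.3])
t_true = h + r                        # a "perfect" tail for this (h, r)
t_false = np.array([1.0, 1.0, 1.0])   # an unrelated entity

# True triples score higher (closer to 0) than false ones.
assert transe_score(h, r, t_true) == 0.0
assert transe_score(h, r, t_false) < transe_score(h, r, t_true)
```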

Relations in a heterogeneous KG have different properties:

Example:

  • Symmetry: If the edge ($h$, “Roommate”, $t$) exists in the KG, then the edge ($t$, “Roommate”, $h$) should also exist.
  • Inverse relation: If the edge ($h$, “Advisor”, $t$) exists in the KG, then the edge ($t$, “Advisee”, $h$) should also exist.

Can we categorize these relation patterns?

Are KG embedding methods (e.g., TransE) expressive enough to model these patterns?

Antisymmetric Relations:#

$r(h, t) \Rightarrow \neg r(t, h),\ \forall h, t$

  • Antisymmetric example: Hypernym
    • TransE can model antisymmetric relations: $\mathbf{h} + \mathbf{r} = \mathbf{t}$, but $\mathbf{t} + \mathbf{r} \neq \mathbf{h}$

Inverse Relations:#

$r_2(h, t) \Rightarrow r_1(t, h)$

  • Example: (Advisor, Advisee)
    • TransE can model inverse relations: given $\mathbf{h} + \mathbf{r}_2 = \mathbf{t}$, we can set $\mathbf{r}_1 = -\mathbf{r}_2$

Composition (Transitive) Relations:#

$r_1(x, y) \land r_2(y, z) \Rightarrow r_3(x, z),\ \forall x, y, z$

  • Example: My mother’s husband is my father.
    • TransE can model composition relations: $\mathbf{r}_3 = \mathbf{r}_1 + \mathbf{r}_2$

1-to-N relations:#

$r(h, t_1), r(h, t_2), \dots, r(h, t_n)$ are all true.

  • Example: $(h, r, t_1)$ and $(h, r, t_2)$ both exist in the knowledge graph, e.g., $r$ is “StudentsOf”
    • TransE cannot model 1-to-N relations: $\mathbf{t}_1$ and $\mathbf{t}_2$ will map to the same vector, although they are different entities
    • $\mathbf{t}_1 = \mathbf{h} + \mathbf{r} = \mathbf{t}_2$, contradicting $\mathbf{t}_1 \neq \mathbf{t}_2$
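A tiny numeric check of this collapse, with illustrative vectors: if TransE is to score both triples perfectly, the two tail embeddings are forced to coincide.

```python
import numpy as np

h = np.array([0.2, 0.3])
r = np.array([0.1, -0.1])  # e.g., "StudentsOf"

# For both (h, r, t1) and (h, r, t2) to score perfectly, each tail
# must equal h + r, so two different students get one embedding.
t1 = h + r
t2 = h + r
assert np.allclose(t1, t2)
```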

Symmetric relations#

r(h,t)r(t,h), h,tr(h,t) \rArr r(t, h),\ \forall h,t

  • Example: Family, Roommate
    • TransE cannot model symmetric relations
    • $\mathbf{h} + \mathbf{r} = \mathbf{t}$ and $\mathbf{t} + \mathbf{r} = \mathbf{h}$ hold only if $\mathbf{r} = 0$ and $\mathbf{h} = \mathbf{t}$

TransR#

Learning Entity and Relation Embeddings for Knowledge Graph Completion

  • TransE models translation of any relation in the same embedding space.
  • Can we design a new space for each relation and do translation in relation-specific space?

TransR: model entities as vectors in the entity space $\mathbb{R}^d$ and model each relation as a vector in relation space, $\mathbf{r} \in \mathbb{R}^k$, with $\mathbf{M}_r \in \mathbb{R}^{k \times d}$ as the projection matrix.

$\mathbf{h}_\perp = \mathbf{M}_r \mathbf{h}$, $\mathbf{t}_\perp = \mathbf{M}_r \mathbf{t}$

Score function: $f_r(h, t) = -\| \mathbf{h}_\perp + \mathbf{r} - \mathbf{t}_\perp \|$
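A minimal NumPy sketch of TransR scoring, assuming illustrative dimensions $d = 4$ and $k = 2$ (the names are not from the paper):

```python
import numpy as np

def transr_score(h, t, r, M_r, norm=1):
    """TransR score: project entities into relation space, then translate."""
    h_perp = M_r @ h   # R^d -> R^k
    t_perp = M_r @ t
    return -np.linalg.norm(h_perp + r - t_perp, ord=norm)

d, k = 4, 2
rng = np.random.default_rng(0)
h, t = rng.normal(size=d), rng.normal(size=d)
M_r = rng.normal(size=(k, d))   # relation-specific projection matrix
r = M_r @ t - M_r @ h           # pick r so that (h, r, t) scores perfectly

assert abs(transr_score(h, t, r, M_r)) < 1e-9
```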

Note that different symmetric relations may have different $\mathbf{M}_r$.

Symmetric relations:#

r(h,t)r(t,h), h,tr(h,t) \rArr r(t, h),\ \forall h,t

  • Example: Family, Roommate
  • TransR can model symmetric relations
    • $\mathbf{r} = 0,\ \mathbf{h}_\perp = \mathbf{M}_r \mathbf{h} = \mathbf{M}_r \mathbf{t} = \mathbf{t}_\perp$

Antisymmetric relations:#

  • TransR can model antisymmetric relations
    • $\mathbf{r} \neq 0$ and $\mathbf{M}_r \mathbf{h} + \mathbf{r} = \mathbf{M}_r \mathbf{t}$; then $\mathbf{M}_r \mathbf{t} + \mathbf{r} \neq \mathbf{M}_r \mathbf{h}$

1-to-N Relations:#

$r(h, t_1), r(h, t_2), \dots, r(h, t_n)$ are all true.

  • TransR can model 1-to-N relations
    • We can learn $\mathbf{M}_r$ so that $\mathbf{t}_\perp = \mathbf{M}_r \mathbf{t}_1 = \mathbf{M}_r \mathbf{t}_2$
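This can be seen with a projection matrix whose kernel is nontrivial (illustrative numbers): distinct tails that differ only along the kernel project to the same point.

```python
import numpy as np

# Projection with kernel = span{e3}: distinct tails differing only
# along the kernel map to the same point in relation space.
M_r = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
t1 = np.array([0.5, -0.3, 1.0])
t2 = np.array([0.5, -0.3, -2.0])

assert not np.allclose(t1, t2)          # different entities...
assert np.allclose(M_r @ t1, M_r @ t2)  # ...same t_perp, hence equal scores
```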

Inverse Relations:#

  • TransR can model inverse relations

    $\mathbf{r}_2 = -\mathbf{r}_1$, $\mathbf{M}_{r_1} = \mathbf{M}_{r_2}$. Then $\mathbf{M}_{r_1} \mathbf{t} + \mathbf{r}_1 = \mathbf{M}_{r_1} \mathbf{h}$ and $\mathbf{M}_{r_2} \mathbf{h} + \mathbf{r}_2 = \mathbf{M}_{r_2} \mathbf{t}$

Composition Relations:#

  • TransR can model composition relations

    High-level intuition: TransR models a triple with linear functions, which are chainable.

    • Background:
      Kernel space of a matrix $\mathbf{M}$: if $\mathbf{h} \in \text{Ker}(\mathbf{M})$, then $\mathbf{M}\mathbf{h} = \mathbf{0}$.

Assume $\mathbf{M}_{r_1} \mathbf{g}_1 = \mathbf{r}_1$ and $\mathbf{M}_{r_2} \mathbf{g}_2 = \mathbf{r}_2$ (i.e., $\mathbf{g}_i$ is a preimage of $\mathbf{r}_i$).

  • For $r_1(x, y)$:

    $$\begin{aligned}
    &\mathbf{M}_{r_1} \mathbf{x} + \mathbf{r}_1 = \mathbf{M}_{r_1} \mathbf{y} \Rightarrow\\
    &\mathbf{y} - \mathbf{x} \in \mathbf{g}_1 + \text{Ker}(\mathbf{M}_{r_1}) \Rightarrow\\
    &\mathbf{y} \in \mathbf{x} + \mathbf{g}_1 + \text{Ker}(\mathbf{M}_{r_1})
    \end{aligned}$$
  • For $r_2(y, z)$:

    $$\begin{aligned}
    &\mathbf{M}_{r_2} \mathbf{y} + \mathbf{r}_2 = \mathbf{M}_{r_2} \mathbf{z} \Rightarrow\\
    &\mathbf{z} - \mathbf{y} \in \mathbf{g}_2 + \text{Ker}(\mathbf{M}_{r_2}) \Rightarrow\\
    &\mathbf{z} \in \mathbf{y} + \mathbf{g}_2 + \text{Ker}(\mathbf{M}_{r_2})
    \end{aligned}$$
  • Then,

    $$\mathbf{z} \in \mathbf{x} + \mathbf{g}_1 + \mathbf{g}_2 + \text{Ker}(\mathbf{M}_{r_1}) + \text{Ker}(\mathbf{M}_{r_2})$$
  • Construct $\mathbf{M}_{r_3}$ s.t. $\text{Ker}(\mathbf{M}_{r_3}) = \text{Ker}(\mathbf{M}_{r_1}) + \text{Ker}(\mathbf{M}_{r_2})$

Since
- $\dim\left(\text{Ker}(\mathbf{M}_{r_3})\right) \ge \dim\left(\text{Ker}(\mathbf{M}_{r_1})\right)$
- $\mathbf{M}_{r_3}$ has the same shape as $\mathbf{M}_{r_1}$

We know $\mathbf{M}_{r_3}$ exists!

Set $\mathbf{r}_3 = \mathbf{M}_{r_3}(\mathbf{g}_1 + \mathbf{g}_2)$.

We are done! We have $\mathbf{M}_{r_3} \mathbf{x} + \mathbf{r}_3 = \mathbf{M}_{r_3} \mathbf{z}$.


DistMult#

Embedding Entities and Relations for Learning and Inference in Knowledge Bases

https://arxiv.org/abs/1412.6575

The scoring function $f_r(h, t)$ is the negative of an L1/L2 distance in TransE and TransR.

DistMult models entities and relations using vectors in $\mathbb{R}^k$.

  • Score function: $f_r(h, t) = \langle \mathbf{h}, \mathbf{r}, \mathbf{t} \rangle = \sum_i \mathbf{h}_i \cdot \mathbf{r}_i \cdot \mathbf{t}_i$, where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^k$
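A sketch of the trilinear score in NumPy (illustrative names); note that it is symmetric in $\mathbf{h}$ and $\mathbf{t}$, which matters for the relation patterns discussed below:

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult score: <h, r, t> = sum_i h_i * r_i * t_i."""
    return np.sum(h * r * t)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)

# Swapping h and t never changes the score.
assert np.isclose(distmult_score(h, r, t), distmult_score(t, r, h))
```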

Bilinear modeling

Intuition of the score function: it can be viewed as a cosine similarity between $\mathbf{h} \cdot \mathbf{r}$ (elementwise product) and $\mathbf{t}$.

1-to-N Relations:#

$r(h, t_1), r(h, t_2), \dots, r(h, t_n)$ are all true.

DistMult can model 1-to-N relations: $\langle \mathbf{h}, \mathbf{r}, \mathbf{t}_1 \rangle = \langle \mathbf{h}, \mathbf{r}, \mathbf{t}_2 \rangle$ can hold even when $\mathbf{t}_1 \neq \mathbf{t}_2$.

Symmetric Relations:#

r(h,t)r(t,h), h,tr(h,t) \rArr r(t, h),\ \forall h,t

$f_r(h, t) = \langle \mathbf{h}, \mathbf{r}, \mathbf{t} \rangle = \sum_i \mathbf{h}_i \cdot \mathbf{r}_i \cdot \mathbf{t}_i = \langle \mathbf{t}, \mathbf{r}, \mathbf{h} \rangle = f_r(t, h)$

Then $r(h, t)$ and $r(t, h)$ always have the same score! → DistMult cannot model antisymmetric relations.

DistMult also cannot model inverse relations:

$f_{r_2}(h, t) = \langle \mathbf{h}, \mathbf{r}_2, \mathbf{t} \rangle = \langle \mathbf{t}, \mathbf{r}_1, \mathbf{h} \rangle = f_{r_1}(t, h)$

  • This means $\mathbf{r}_2 = \mathbf{r}_1$.
    • But semantically this does not make sense: the embedding of “Advisor” should not be the same as that of “Advisee”.

Composition relations:#

$r_1(x, y) \land r_2(y, z) \Rightarrow r_3(x, z),\ \forall x, y, z$

DistMult cannot model composition relations either.

ComplEx#

Trouillon et al, Complex Embeddings for Simple Link Prediction, ICML 2016

Building on DistMult, ComplEx embeds entities and relations in a complex vector space.

ComplEx: model entities and relations using vectors in $\mathbb{C}^k$


  • Score function: $f_r(h, t) = \text{Re}\left(\sum_i \mathbf{h}_i \cdot \mathbf{r}_i \cdot \bar{\mathbf{t}}_i\right)$

Antisymmetric Relations:#

  • The model is expressive enough to learn:
    • a high $f_r(h, t) = \text{Re}\left(\sum_i \mathbf{h}_i \cdot \mathbf{r}_i \cdot \bar{\mathbf{t}}_i\right)$, and
    • a low $f_r(t, h) = \text{Re}\left(\sum_i \mathbf{t}_i \cdot \mathbf{r}_i \cdot \bar{\mathbf{h}}_i\right)$

This is due to the asymmetric modeling using the complex conjugate.
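A NumPy sketch (illustrative names): with a generic complex $\mathbf{r}$ the score is asymmetric in $h$ and $t$, while with $\text{Im}(\mathbf{r}) = 0$ it becomes symmetric.

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx score: Re(sum_i h_i * r_i * conj(t_i))."""
    return np.real(np.sum(h * r * np.conj(t)))

rng = np.random.default_rng(0)
h = rng.normal(size=4) + 1j * rng.normal(size=4)
r = rng.normal(size=4) + 1j * rng.normal(size=4)  # Im(r) != 0
t = rng.normal(size=4) + 1j * rng.normal(size=4)

# Asymmetric when r has a nonzero imaginary part (antisymmetric relations).
assert not np.isclose(complex_score(h, r, t), complex_score(t, r, h))

# Symmetric when Im(r) = 0 (symmetric relations).
r_real = np.real(r).astype(complex)
assert np.isclose(complex_score(h, r_real, t), complex_score(t, r_real, h))
```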

Symmetric Relations:#

r(h,t)r(t,h), h,tr(h,t) \rArr r(t, h),\ \forall h,t

When $\text{Im}(\mathbf{r}) = 0$, we have

$$\begin{aligned}
f_r(h, t) &= \text{Re}\left(\sum_i \mathbf{h}_i \cdot \mathbf{r}_i \cdot \bar{\mathbf{t}}_i\right)\\
&= \sum_i \text{Re}(\mathbf{r}_i \cdot \mathbf{h}_i \cdot \bar{\mathbf{t}}_i) \\
&= \sum_i \mathbf{r}_i \cdot \text{Re}(\mathbf{h}_i \cdot \bar{\mathbf{t}}_i) \\
&= \sum_i \mathbf{r}_i \cdot \text{Re}(\bar{\mathbf{h}}_i \cdot \mathbf{t}_i) \\
&= \sum_i \text{Re}(\mathbf{r}_i \cdot \bar{\mathbf{h}}_i \cdot \mathbf{t}_i) = f_r(t, h)
\end{aligned}$$

Inverse Relations:#

  • ComplEx can model inverse relations: set $\mathbf{r}_1 = \bar{\mathbf{r}}_2$
  • If $\mathbf{r}_2 = \arg\max_{\mathbf{r}} \text{Re}(\langle \mathbf{h}, \mathbf{r}, \bar{\mathbf{t}} \rangle)$, then $\mathbf{r}_1 = \arg\max_{\mathbf{r}} \text{Re}(\langle \mathbf{t}, \mathbf{r}, \bar{\mathbf{h}} \rangle) = \bar{\mathbf{r}}_2$
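The conjugate trick can be checked numerically (illustrative vectors): with $\mathbf{r}_1 = \bar{\mathbf{r}}_2$, the triple $(t, r_1, h)$ scores exactly as high as $(h, r_2, t)$.

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx score: Re(sum_i h_i * r_i * conj(t_i))."""
    return np.real(np.sum(h * r * np.conj(t)))

rng = np.random.default_rng(1)
h = rng.normal(size=4) + 1j * rng.normal(size=4)
t = rng.normal(size=4) + 1j * rng.normal(size=4)
r2 = rng.normal(size=4) + 1j * rng.normal(size=4)
r1 = np.conj(r2)  # embedding of the inverse relation

# Whenever (h, r2, t) scores high, so does (t, r1, h).
assert np.isclose(complex_score(h, r2, t), complex_score(t, r1, h))
```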

ComplEx shares the remaining properties with DistMult:

  • Cannot model composition relations
  • Can model 1-to-N relations

KG Embeddings in Practice#

  1. Different KGs may have drastically different relation patterns!

  2. There is no single embedding model that works for all KGs; use each model's expressivity (summarized below from the sections above) to select one:

     | Model    | Symmetric | Antisymmetric | Inverse | Composition | 1-to-N |
     |----------|-----------|---------------|---------|-------------|--------|
     | TransE   | ✗         | ✓             | ✓       | ✓           | ✗      |
     | TransR   | ✓         | ✓             | ✓       | ✓           | ✓      |
     | DistMult | ✓         | ✗             | ✗       | ✗           | ✓      |
     | ComplEx  | ✓         | ✓             | ✓       | ✗           | ✓      |

  3. Try TransE for a quick run if the target KG does not have many symmetric relations.

  4. Then use more expressive models, e.g., ComplEx or RotatE (TransE in complex space).