Key Facets in Modern Knowledge Graph Representation Learning (KeyKGRL)

Half-day Tutorial
International Semantic Web Conference (ISWC 2025)
Nara Prefectural Convention Center, Nara, Japan
November 2nd or 3rd, 9:00 AM - 12:50 PM

Abstract

Knowledge graph representation learning (KGRL) aims to convert entities and relations into feature vectors, thereby facilitating their effective integration into contemporary AI models. This half-day tutorial explores the key facets and ongoing research directions in recent KGRL research. In particular, the tutorial reviews seminal works in KGRL from four different perspectives: (1) KGRL methods that incorporate multimodal data when generating knowledge representations, (2) inductive KGRL methods that allow inference on new KGs without retraining the KGRL models, (3) KG foundation models that pretrain a KGRL model on various KGs and apply it to different KGs, and (4) representation learning methods on hyper-relational KGs, which extend vanilla KGs to represent enriched information. Along with the four lecture sessions, two hands-on exercise sessions will be provided, where attendees can run KGRL methods and analyze the results. All materials will be easy to follow for participants with a basic background in machine learning and reasonable programming skills.

Presenter

Joyce Jiyoung Whang

Joyce Jiyoung Whang is an associate professor at the School of Computing at KAIST, where she has led the Big Data Intelligence Lab since Jul. 2020. Before joining KAIST, she was an assistant professor of Computer Science and Engineering at Sungkyunkwan University (SKKU) from Mar. 2016 to Jun. 2020. She received her Ph.D. degree in Computer Science from the University of Texas at Austin in Dec. 2015 under the supervision of Professor Inderjit Dhillon.

Schedule (Tentative)

Time Slot        Tutorial Time    Program
9:00 - 10:40     9:00 - 9:10      Opening & Introduction to Knowledge Graphs
                 9:10 - 9:45      [Lecture 1] KG Embedding with Multimodal Data
                 9:45 - 10:20     [Lecture 2] Inductive Reasoning on KGs
                 10:20 - 10:40    [Exercise 1] Hands-on Practice of Inductive KGRL
10:40 - 11:10                     Break Time
11:10 - 12:50    11:10 - 11:45    [Lecture 3] KG Foundation Models
                 11:45 - 12:20    [Lecture 4] Representation Learning on HKGs
                 12:20 - 12:40    [Exercise 2] Hands-on Practice of HKGRL
                 12:40 - 12:50    Discussion & Closing

Program

Lecture 1: Knowledge Graph Embedding with Multimodal Data

While traditional KG embedding methods assume that only triplets are provided and that these triplets are the only source for computing KG embedding vectors, entities and relations can be accompanied by textual descriptions. Additionally, entities can be discrete symbols or numeric values in real-world KGs, and entities or triplets are sometimes associated with visual features. This session discusses multimodal KG embedding methods that integrate such visual, textual, or numerical data into KG embedding vectors, enabling the embedding vectors to encode not only the structural information of KGs but also the additional multimodal features.
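
As a rough illustration of the general idea (not any specific model covered in the lecture), the sketch below fuses a learned structural embedding with a projected feature vector from a pre-trained text or image encoder; the class name, fusion scheme, and random features are all hypothetical.

```python
import torch
import torch.nn as nn

# Minimal sketch of multimodal KG embedding (illustrative only, not a specific
# published model): each entity representation fuses a learned structural vector
# with a projected feature vector, e.g., from a frozen text or image encoder.
class MultimodalKGE(nn.Module):
    def __init__(self, num_entities, num_relations, feat_dim, dim=64):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)    # structural entity embeddings
        self.rel = nn.Embedding(num_relations, dim)   # relation embeddings
        self.proj = nn.Linear(feat_dim, dim)          # maps multimodal features to KG space

    def entity_repr(self, idx, feats):
        # Fuse structure and modality; here a simple sum, gating or attention is also common.
        return self.ent(idx) + self.proj(feats[idx])

    def score(self, h, r, t, feats):
        # TransE-style score: higher (less negative) means a more plausible triplet.
        return -torch.norm(self.entity_repr(h, feats) + self.rel(r)
                           - self.entity_repr(t, feats), p=1, dim=-1)

# Toy usage with random pre-extracted features (hypothetical data).
model = MultimodalKGE(num_entities=5, num_relations=3, feat_dim=128)
feats = torch.randn(5, 128)
h, r, t = torch.tensor([0]), torch.tensor([1]), torch.tensor([2])
print(model.score(h, r, t, feats))
```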

Lecture 2: Inductive Reasoning on Knowledge Graphs

Inductive KGRL methods allow a model to compute representations on a new inference KG that differs from the training KG, enabling the prediction of missing triplets on an inference KG consisting of new entities and relations that appear only at inference time. This is distinguished from the transductive learning setting, where entity and relation representation vectors are learned on a fixed KG, and only new combinations of them are considered for KG completion at inference time. This session discusses seminal works on inductive KGRL methods, with a particular emphasis on the core ideas and mechanisms they propose for enabling inductive reasoning over KGs.
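
To make the distinction concrete, here is a minimal, hypothetical sketch of one way an inductive model can avoid per-entity lookup tables: an entity is represented by aggregating the embeddings of its incident relations, so entities never seen during training can still be encoded as long as the relation vocabulary is shared. This is a simplification for illustration, not the mechanism of any particular method covered in the lecture.

```python
import torch
import torch.nn as nn

# Illustrative only: entities have no learned lookup embeddings; an entity's
# representation is aggregated from the embeddings of the relations it
# participates in, so unseen entities in a new inference KG can still be encoded.
class RelationContextEncoder(nn.Module):
    def __init__(self, num_relations, dim=64):
        super().__init__()
        self.rel = nn.Embedding(num_relations, dim)

    def entity_repr(self, entity, edges):
        # edges: list of (head, relation_id, tail) triplets of the (possibly new) KG
        incident = [r for h, r, t in edges if h == entity or t == entity]
        if not incident:
            return torch.zeros(self.rel.embedding_dim)
        return self.rel(torch.tensor(incident)).mean(dim=0)

# Toy inference KG whose entities were never seen during training (hypothetical).
encoder = RelationContextEncoder(num_relations=3)
inference_edges = [("e_new_1", 0, "e_new_2"), ("e_new_2", 2, "e_new_3")]
print(encoder.entity_repr("e_new_2", inference_edges))
```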

Exercise 1: Hands-on Practice of an Inductive KGRL Method

This session provides the source code and scripts for running an inductive KGRL method on Google Colab. Participants will be able to examine the results both quantitatively and qualitatively by computing MRR and Hits@K scores and by visualizing a subset of the results. All the code required for this analysis, along with step-by-step instructions, will be provided.
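
For reference, the metrics mentioned above can be computed as follows; the ranks used here are hypothetical and stand in for the rank of the correct entity among all candidates for each test query (rank 1 is best), not the tutorial's actual evaluation script.

```python
# MRR is the mean reciprocal rank; Hits@K is the fraction of queries whose
# correct answer is ranked within the top K.
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 2, 10, 1]  # hypothetical ranks for five test queries
print(f"MRR = {mrr(ranks):.3f}, Hits@1 = {hits_at_k(ranks, 1):.3f}, "
      f"Hits@10 = {hits_at_k(ranks, 10):.3f}")
```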

Lecture 3: Foundation Models for Knowledge Graph Reasoning

The inductive learning paradigm in KGRL naturally leads to its extension to foundation models. While inductive KGRL methods and KG foundation models share the objective of generalizing to unseen KGs, KG foundation models push these capabilities further, toward arbitrary KGs. KG foundation models are pre-trained on diverse KGs and accommodate the varying underlying distributions across different KGs, allowing a single model to generalize to any KG. This session will cover recently proposed KG foundation models in the literature.

Lecture 4: Representation Learning on Hyper-Relational KGs

Although KGs represent facts as triplets, this format often oversimplifies complex information, motivating the extension to hyper-relational KGs (HKGs). In an HKG, a triplet is extended to a hyper-relational fact, where a set of qualifiers is attached to the triplet to represent auxiliary information, and each qualifier is a pair of a relation and an entity. HKGs thus allow a more flexible representation of facts, with varying numbers of qualifiers attached as needed to enrich information. At the same time, learning representations of entities and relations on HKGs becomes more challenging due to the intricate structure between triplets and qualifiers, as well as the entity-level, relation-level, and fact-level connectivities. This session introduces state-of-the-art HKG representation learning methods.
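
The following snippet illustrates the shape of a hyper-relational fact described above; the concrete fact is a hypothetical example, not data used in the tutorial.

```python
# Illustrative encoding of a hyper-relational fact: a base triplet
# (head, relation, tail) plus a set of qualifiers, each being a
# (qualifier relation, qualifier entity) pair carrying auxiliary information.
fact = {
    "head": "Albert_Einstein",
    "relation": "educated_at",
    "tail": "ETH_Zurich",
    "qualifiers": [
        ("academic_degree", "Bachelor_of_Science"),
        ("end_time", "1900"),
    ],
}

# Dropping the qualifiers collapses the fact to a vanilla KG triplet.
triplet = (fact["head"], fact["relation"], fact["tail"])
print(triplet, fact["qualifiers"])
```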

Exercise 2: Hands-on Practice of a Hyper-Relational KGRL Method

Coding materials will be provided to run and examine a recently proposed HKG representation learning method, allowing participants to conduct simple case studies, such as analyzing the model's prediction results for specific link prediction problems. The exercise problems include both transductive and inductive inference on HKGs. All the experiments will be conducted on Google Colab.

Prerequisites

This tutorial ranges from introductory to specialized: it provides the foundational concepts of KGRL while also covering specialized topics, including recent research directions such as multimodal, inductive, foundation-model, and hyper-relational methods. The level is beginner to intermediate, designed for participants with a basic background in machine learning.

The content is suitable for participants who have a basic understanding of machine learning concepts and programming skills; no prior experience with KGs or representation learning is required. Attendees who wish to participate in the hands-on exercises should bring their own laptops.