Project: High Dimensional Relational Learning

This page provides material (PDF, code and data) related to the paper “A latent factor model for highly multi-relational data” presented by R. Jenatton et al. at NIPS 2012 [2].

Abstract

Many data such as social networks, movie preferences or knowledge bases are multi-relational, in that they describe multiple relations between entities. While there is a large body of work focused on modeling these data, modeling these multiple types of relations jointly remains challenging. Further, existing approaches tend to break down when the number of these types grows. In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures various orders of interaction of the data, and also shares sparse latent factors across different relations. We illustrate the performance of our approach on standard tensor-factorization datasets where we attain, or outperform, state-of-the-art results. Finally, an NLP application demonstrates our scalability and the ability of our model to learn efficient and semantically meaningful verb representations.
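
To make the idea of a bilinear score with shared latent factors concrete, here is a minimal sketch. It is not the Matlab code from the archive. It assumes each relation matrix is a sparse weighted sum of rank-one factors shared by all relations, and it covers only the highest-order (subject, relation, object) term. The function names, the dimension and the toy values are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions only): a bilinear score s^T R_j o in
# which each relation matrix R_j is a sparse combination of rank-one factors
# shared across relations. Pure standard-library Python, toy dimension p = 2.

def outer(u, v):
    """Return the rank-one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def relation_matrix(alpha, factors):
    """Return sum_r alpha[r] * factors[r]; zero weights are skipped (sparsity)."""
    p = len(factors[0])
    R = [[0.0] * p for _ in range(p)]
    for weight, theta in zip(alpha, factors):
        if weight == 0.0:
            continue  # this relation does not use this shared factor
        for i in range(p):
            for k in range(p):
                R[i][k] += weight * theta[i][k]
    return R

def bilinear_score(s, R, o):
    """Return s^T R o for subject embedding s and object embedding o."""
    return sum(s[i] * R[i][k] * o[k]
               for i in range(len(s)) for k in range(len(o)))

# Three latent factors shared by every relation (hypothetical values).
factors = [
    outer([1.0, 0.0], [0.0, 1.0]),
    outer([0.0, 1.0], [1.0, 0.0]),
    outer([1.0, 1.0], [1.0, 1.0]),
]

# Each relation only stores a sparse weight vector over the shared factors.
R_a = relation_matrix([2.0, 0.0, 0.0], factors)
R_b = relation_matrix([0.0, 1.0, 0.5], factors)

subject = [1.0, 0.0]
obj = [0.0, 1.0]
print(bilinear_score(subject, R_a, obj))  # 2.0
print(bilinear_score(subject, R_b, obj))  # 0.5
```

In this sketch each additional relation costs one weight vector rather than a full matrix, which is how sharing factors helps when the number of relations grows.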

Papers

Code

The Matlab code used to run the experiments in [2] is available for download: (code).
See the included README and the comments in the code for details. The code does not require any additional libraries.

Data

Multi-relational benchmarks

The Kinships [1], UMLS [3] and Nations [4] datasets are included in the code archive in .mat format.

Subject-Verb-Object Tensor Data

The new NLP data introduced in [2] is available for download (ASCII format): (svo-data).
See the paper and the included README for details.
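
As a starting point for reading triple data of this kind, here is a minimal sketch of a parser. The README documents the actual file layout. The sketch assumes, purely for illustration, that each non-empty line holds three whitespace-separated integer indices (subject, verb, object). The function names and the sample content are hypothetical.

```python
# Minimal sketch (assumed format, see the README for the real one): each
# non-empty line is "subject_index verb_index object_index".
import io
from collections import Counter

def read_triples(lines):
    """Parse an iterable of text lines into a list of (subject, verb, object)."""
    triples = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(fields)}")
        subject, verb, obj = (int(field) for field in fields)
        triples.append((subject, verb, obj))
    return triples

def verb_counts(triples):
    """Count how many triples use each verb index."""
    return Counter(verb for _, verb, _ in triples)

# In-memory stand-in for a data file; with a real file one would pass
# open(path, encoding="ascii") instead.
sample = io.StringIO("1 7 3\n2 7 5\n\n4 9 1\n")
triples = read_triples(sample)
print(triples)                              # [(1, 7, 3), (2, 7, 5), (4, 9, 1)]
print(verb_counts(triples).most_common(1))  # [(7, 2)]
```

If the real files use a different column order or extra fields, only read_triples needs to change.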

Contacts

Rodolphe Jenatton: CMAP, UMR CNRS 7641, Ecole Polytechnique, France.
Nicolas Le Roux: Criteo, France.
Antoine Bordes: Heudiasyc, UMR CNRS 7253, Université de Technologie de Compiègne, France.
Guillaume Obozinski: INRIA - SIERRA Project Team, Ecole Normale Supérieure, France.

Note: This work was done while Nicolas Le Roux was with INRIA - SIERRA Project Team, Ecole Normale Supérieure, France.

References

[1] W. Denham. The detection of patterns in Alyawarra nonverbal behavior. PhD thesis, 1973.
[2] R. Jenatton, N. Le Roux, A. Bordes and G. Obozinski. A Latent Factor Model for Highly Multi-relational Data. Advances in Neural Information Processing Systems 25, 2012.
[3] A. T. McCray. An upper level ontology for the biomedical domain. Comparative and Functional Genomics, 4:80–88, 2003.
[4] R. J. Rummel. Dimensionality of nations project: Attributes of nations and behavior of nation dyads, 1950–1965. ICPSR data file, 1999.