UMR CNRS 7253

Everest


Project: Semantic Matching Energy Function

This page provides material (PDF, code and data) related to the paper “A Semantic Matching Energy Function for Learning with Multi-relational Data” published by A. Bordes et al. in Machine Learning [1].

Abstract

Large-scale relational learning becomes crucial for handling the huge amounts of structured data generated daily in many application domains ranging from computational biology or information retrieval, to natural language processing. In this paper, we present a new neural network designed to embed multi-relational graphs into a flexible continuous vector space in which the original data is kept and enhanced. The network is trained to encode the semantics of these graphs in order to assign low energy values to plausible components. We demonstrate that it can scale up to tens of thousands of nodes and thousands of types of relation while reaching competitive performance on benchmark tasks such as link prediction. This is assessed on standard datasets from the literature as well as on data from a real-world knowledge base (WordNet). Besides, we present how our method can be applied to perform word-sense disambiguation in a context of open-text semantic parsing, where the goal is to learn to assign a structured meaning representation to almost any sentence of free text.

Papers

  • Journal version (Machine Learning): (link)
  • Workshop version (ICLR 13): (openreview)
  • Conference version (AISTATS 12): (pdf)
  • Related paper on Structured Embeddings (AAAI 11): (pdf) (erratum)

Code

The Python code used to run the experiments in [1] is available from GitHub: (code).
See the included README and the comments in the code for details. The code requires the Theano library.
The repository contains the core library implementing the algorithms of the paper and the experimental scripts needed to reproduce the experiments (data below). It also includes the code for the Structured Embeddings paper [2].
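
As an illustration of the kind of energy function the abstract describes, here is a minimal NumPy sketch of a linear matching energy for a triple (lhs, rel, rhs), in the spirit of the linear variant presented in [1]. It is not the released Theano code: the function name, the parameter dictionary and its keys, and the toy dimensions are all assumptions made for this example.

```python
import numpy as np


def linear_matching_energy(e_lhs, e_rel, e_rhs, params):
    """Energy of a triple (lhs, rel, rhs); lower means more plausible.

    `params` is a hypothetical dict of weight matrices and biases,
    not a structure taken from the released code.
    """
    # Combine the relation embedding with each argument separately.
    g_left = params["W_l1"] @ e_lhs + params["W_l2"] @ e_rel + params["b_l"]
    g_right = params["W_r1"] @ e_rhs + params["W_r2"] @ e_rel + params["b_r"]
    # The energy is the negated dot product of the two combinations.
    return -float(np.dot(g_left, g_right))


# Toy setup: identity weights and zero biases in dimension 2 (illustrative only).
d = 2
toy_params = {
    "W_l1": np.eye(d), "W_l2": np.eye(d), "b_l": np.zeros(d),
    "W_r1": np.eye(d), "W_r2": np.eye(d), "b_r": np.zeros(d),
}
e_lhs = np.array([1.0, 0.0])
e_rel = np.array([0.0, 1.0])
e_rhs = np.array([1.0, 1.0])

energy = linear_matching_energy(e_lhs, e_rel, e_rhs, toy_params)
print(energy)
```

In actual training the embeddings and parameters would be learned so that observed triples receive lower energy than corrupted ones; that step is omitted here.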

Data

  • UMLS [4] + Nations [5] + Kinships [3]. Python cPickle format: (data). See [1] or the references therein for more details.
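
Because cPickle is the Python 2 pickle module, reading these files from Python 3 typically needs an explicit encoding. The snippet below is a minimal sketch under that assumption; the function name is hypothetical, and the structure of the loaded object is not specified here (see the README). If the file holds NumPy or SciPy objects, those libraries must be installed for unpickling to succeed.

```python
import pickle


def load_legacy_pickle(path):
    """Load a pickle file that may have been written by Python 2's cPickle."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 decode Python 2 byte strings,
        # which is commonly needed for pickled NumPy arrays.
        return pickle.load(f, encoding="latin1")
```

Only unpickle files obtained from a source you trust, since loading a pickle can execute arbitrary code.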

Entity ranking

  • WordNet. ASCII format: (data). See [1] or the included README for more details.
  • Freebase. ASCII format: (data). See [2] or the included README for more details.
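
To make the entity ranking task concrete, the sketch below computes the rank of the correct right-hand entity of a triple among a set of candidates, ordered by increasing energy. It is a simplified illustration, not the evaluation script from the repository: the function name, the `energy_fn` callback and the toy energies are assumptions made for this example.

```python
def rank_of_true_entity(energy_fn, lhs, rel, true_rhs, candidates):
    """Return the 1-based rank of `true_rhs` when candidates are sorted
    by increasing energy (lower energy = more plausible)."""
    true_energy = energy_fn(lhs, rel, true_rhs)
    # Count the candidates that score strictly better than the true entity.
    better = sum(
        1 for c in candidates
        if c != true_rhs and energy_fn(lhs, rel, c) < true_energy
    )
    return better + 1


# Toy energies that depend only on the right-hand entity (illustrative only).
toy_energies = {"cat": -2.0, "dog": -3.5, "car": 0.7}


def toy_energy(lhs, rel, rhs):
    return toy_energies[rhs]


rank = rank_of_true_entity(
    toy_energy, "kitten", "_hypernym", "cat", list(toy_energies)
)
print(rank)
```

Averaging such ranks over a test set, or counting how often the rank falls within the top few positions, gives summary measures of ranking quality.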

Contacts

Antoine Bordes: Heudiasyc, UMR CNRS 7253, Université de Technologie de Compiègne, France.
Xavier Glorot: LISA, DIRO, Université de Montréal, Canada.
Jason Weston: Google, New York, USA.
Yoshua Bengio: LISA, DIRO, Université de Montréal, Canada.

References

[1] A. Bordes, X. Glorot, J. Weston and Y. Bengio. A Semantic Matching Energy Function for Learning with Multi-relational Data. Machine Learning Journal - Special Issue on Learning Semantics. DOI: 10.1007/s10994-013-5363-6. 2013.
[2] A. Bordes, J. Weston, R. Collobert and Y. Bengio. Learning Structured Embeddings of Knowledge Bases. Proceedings of the 25th Conference on Artificial Intelligence (AAAI), 2011.
[3] W. Denham. The detection of patterns in Alyawarra nonverbal behavior. PhD thesis, 1973.
[4] A. T. McCray. An upper level ontology for the biomedical domain. Comparative and Functional Genomics, 4:80–88, 2003.
[5] R. J. Rummel. Dimensionality of nations project: Attributes of nations and behavior of nation dyads. In ICPSR data file, pages 1950–1965. 1999.