====== Project: Translating Embeddings for Modeling Multi-relational Data ======

This page provides material (pdf, code and data) related to the paper "Translating Embeddings for Modeling Multi-relational Data", published by A. Bordes et al. in the Proceedings of NIPS 2013 [1].

===== Abstract =====

We consider the problem of embedding entities and relationships of multi-relational data in low-dimensional vector spaces. Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters and can scale up to very large databases. Hence, we propose TransE, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. Despite its simplicity, this assumption proves to be powerful since extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. Besides, it can be successfully trained on a large-scale data set with 1M entities, 25k relationships and more than 17M training samples.

===== Papers =====

  * Conference version (NIPS 13): {{en:cr_paper_nips13.pdf|(pdf)}} [[http://nips.cc/Conferences/2013/Program/event.php?ID=4044|(nips)]] {{en:transe_nips13_poster.pdf|(poster)}}\\
  * Related paper on using this method for improving relation extraction (EMNLP 13): [[http://aclweb.org/anthology//D/D13/D13-1136.pdf|(pdf)]]

===== Code =====

The Python code used to run the experiments in [1] **is now available** from Github as part of the SME library [3]: {{https://github.com/glorotxa/SME|(code)}}. It allows one to reproduce the main experiments of the paper (Table 3) using the Raw metric (results can differ slightly because of the stochastic gradient training -- for FB15k, we can even reach better results). See the included README and the comments in the code for details on how to use it. The code requires the [[http://deeplearning.net/software/theano/ | Theano library]]. An illustrative sketch of the translation-based scoring idea is given at the bottom of this page.

===== Data =====

  * **Freebase (FB15k)**. ASCII format: {{en:fb15k.tgz|(data)}}. See [1] or the included README for more details. (FB1M, also used in [1], will not be released.)
  * **WordNet**. ASCII format: {{en:wordnet-mlj12.tar.gz|(data)}}. See [3] or the included README for more details.

===== Contacts =====

[[https://www.hds.utc.fr/~bordesan/ | Antoine Bordes]]: Heudiasyc, UMR CNRS 7253, Université de Technologie de Compiègne, France.\\
[[https://www.hds.utc.fr/~nusunier/ | Nicolas Usunier]]: Heudiasyc, UMR CNRS 7253, Université de Technologie de Compiègne, France.\\
[[http://www.thespermwhale.com/jaseweston/ | Jason Weston]]: Google, New York, USA.\\

===== References =====

[1] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston and O. Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems (NIPS 2013).\\
[2] J. Weston, A. Bordes, O. Yakhnenko and N. Usunier. Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, USA, 2013.\\
[3] A. Bordes, X. Glorot, J. Weston and Y. Bengio. A Semantic Matching Energy Function for Learning with Multi-relational Data. Machine Learning Journal - Special Issue on Learning Semantics. 2012.
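
===== Illustration: the TransE scoring idea =====

The sketch below is only meant to illustrate the translation assumption described in the abstract and in [1]: a triple (head, relation, tail) is considered plausible when the head embedding translated by the relation embedding lands close to the tail embedding, and training ranks true triples above corrupted ones with a margin. It is a minimal NumPy sketch, not the SME library code linked above; all names, toy dimensions and example triples are assumptions made for illustration.

<code python>
# Minimal sketch of the TransE scoring idea from [1]: a triple (h, r, t)
# is scored by the distance d(h + r, t) between the translated head
# embedding and the tail embedding; lower scores mean more plausible.
# Toy sizes and triples below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 5, 2, 4   # toy sizes (assumption)

# Embedding tables, one row per entity / relation.
E = rng.normal(size=(n_entities, dim))
R = rng.normal(size=(n_relations, dim))

def score(h, r, t, norm=1):
    """Dissimilarity d(h + r, t) under the L1 (or L2) norm."""
    return np.linalg.norm(E[h] + R[r] - E[t], ord=norm)

def margin_loss(pos, neg, margin=1.0):
    """Margin-based ranking loss between a true and a corrupted triple."""
    return max(0.0, margin + score(*pos) - score(*neg))

# A "true" triple (h, r, t) and a corrupted one with the tail replaced.
positive = (0, 1, 2)
negative = (0, 1, 3)
print("score(pos) =", score(*positive))
print("loss       =", margin_loss(positive, negative))
</code>

In the actual experiments of [1], the embeddings are learned with stochastic gradient descent on this ranking loss over the training triples; see the SME code linked in the Code section for the full implementation.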