UMR CNRS 7253

The main objective of this project is to make a leap forward in the manipulation of large sparse tensors by proposing new ways to represent them in a high-level manner. By high-level, we mean that these representations should make it possible to condense the original tensors, to complete them by filling in missing values, and to ease their matching and merging. Our goal is to develop new methods able to do so on large-scale tensors encoding real-world Knowledge Bases (KBs).
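To make "condensing" and "filling in missing values" concrete, here is a minimal sketch that uses a rank-R CP (CANDECOMP/PARAFAC) factorization as a stand-in for the models the project will develop; it is not the project's method. It assumes a KB stored as a binary tensor X[s, r, o] (1 if the fact (subject, relation, object) holds) and a closed-world reading in which every entry with mask == 1 is observed. The function names `cp_reconstruct` and `fit_cp`, the toy KB and all hyperparameters are hypothetical.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Dense scores X_hat[s, r, o] = sum_k A[s, k] * B[r, k] * C[o, k]."""
    return np.einsum("sk,rk,ok->sro", A, B, C)


def fit_cp(X, mask, rank=2, lr=0.02, epochs=3000, seed=0):
    """Hypothetical helper: fit rank-`rank` CP factors to entries where mask == 1.

    Minimises 0.5 * sum(mask * (X_hat - X)**2) by plain gradient descent.
    Entries with mask == 0 are ignored during fitting, so their reconstructed
    scores act as predictions for the missing values.
    """
    rng = np.random.default_rng(seed)
    n_s, n_r, n_o = X.shape
    A = 0.3 * rng.standard_normal((n_s, rank))  # subject-entity factors
    B = 0.3 * rng.standard_normal((n_r, rank))  # relation factors
    C = 0.3 * rng.standard_normal((n_o, rank))  # object-entity factors
    for _ in range(epochs):
        R = mask * (cp_reconstruct(A, B, C) - X)  # residual on observed entries only
        gA = np.einsum("sro,rk,ok->sk", R, B, C)
        gB = np.einsum("sro,sk,ok->rk", R, A, C)
        gC = np.einsum("sro,sk,rk->ok", R, A, B)
        A, B, C = A - lr * gA, B - lr * gB, C - lr * gC  # simultaneous update
    return A, B, C


# Toy KB with 4 entities and 2 relations (hypothetical data).
X = np.zeros((4, 2, 4))
X[np.ix_([0, 1], [0], [2, 3])] = 1.0  # relation 0 links entities {0, 1} to {2, 3}
X[np.ix_([2, 3], [1], [0, 1])] = 1.0  # relation 1 links entities {2, 3} to {0, 1}
mask = np.ones_like(X)
mask[1, 0, 3] = 0.0  # pretend the true fact (1, 0, 3) is missing from the KB

A, B, C = fit_cp(X, mask)
print(cp_reconstruct(A, B, C)[1, 0, 3])  # predicted score for the held-out fact
```

The factors hold R(n_s + n_r + n_o) numbers instead of the n_s n_r n_o tensor entries, which is the sense in which such a representation condenses the tensor; real KBs would need sparse storage and a loss that handles unobserved triples more carefully than the closed-world assumption used here.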

This is an ambitious program, but we strongly believe that we have the assets to carry it out. Learning to summarize high-dimensional sparse tensors is a recent research topic with important implications for many applications. Despite this, only a few projects make this problem the focal point of their program. We deliberately make this innovative choice because of the impact that progress in this direction could have in many domains. Moreover, this difficult task is exciting because it raises many new and original scientific challenges for Machine Learning.

The most prominent technical novelties of our project come from the Machine Learning techniques we bring into this field and from their combination in a single line of work. We will adapt and combine techniques from statistical relational learning, Deep Learning (DL) and evidence theory. With its ability to learn non-linear representations of complex data, DL is appealing for our project. Moreover, DL could be fruitfully combined with statistical relational learning models so that tensor dependencies are directly integrated within the neural network architecture. This could lead to non-linear models that explicitly take into account the structure of KBs. To date, almost no work has been proposed in this direction. Evidence theory is also highly relevant to the problem of KB matching, where multiple data sources need to be aggregated. Because it is very effective at combining classifiers or information sources with varying degrees of uncertainty, evidence theory might bring a major improvement in tensor matching. Still, it has rarely been used in this context, mostly because of the large scale of the data. Indeed, high dimensionality is the main difficulty of this project.
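As an illustration of how evidence theory combines sources with different degrees of uncertainty, the sketch below implements Dempster's rule of combination, one standard operator of Dempster-Shafer theory; the text does not say which combination rule the project will use. The frame {match, nomatch} for deciding whether two KB entities should be matched, the two sources and their masses are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:  # compatible focal sets: mass goes to their intersection
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:  # disjoint focal sets: mass counts as conflict
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise so that the combined masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


THETA = frozenset({"match", "nomatch"})  # whole frame = "don't know"
MATCH, NOMATCH = frozenset({"match"}), frozenset({"nomatch"})

m_lexical = {MATCH: 0.6, THETA: 0.4}                   # hypothetical source 1
m_structural = {MATCH: 0.3, NOMATCH: 0.2, THETA: 0.5}  # hypothetical source 2
m12 = dempster_combine(m_lexical, m_structural)
print(round(m12[MATCH], 3))  # 0.682
```

Mass assigned to THETA lets each source state explicitly how much it does not know, which a single probability per hypothesis cannot express. The loop runs over every pair of focal sets, and a frame of n hypotheses admits up to 2^n of them, which is one reason why evidence-theoretic methods become expensive on large raw data.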

We will have to create methods with low complexity and good scaling properties. Deep Learning can offer this, but it involves non-convex optimization problems that are non-trivial and must be tuned carefully. As mentioned above, evidence theory methods usually suffer from high complexity. However, in our case, these methods will not be applied directly to the large raw data but to a condensed high-level version obtained with DL, which should reduce the data dimensions by several orders of magnitude. Other difficulties in our proposed directions involve model selection and initialization. Our team, however, has experience with this kind of issue and the ability to devise solutions. With these methods and our expertise, we expect to outperform the state of the art in link prediction and tensor matching.
