Ljubljana, 6-8 September
IDA2007 CONFERENCE
The 7th International Symposium on Intelligent Data Analysis
Plenaries

Entropy Properties of a Decision Rule Class in Connection with Machine Learning Abilities

Abstract

Many methods of machine learning are based on the idea of empirical risk minimisation: find the decision rule or model, from some given set, that best fits the data in the training set. This idea rests on the law of large numbers: empirical risk converges to real risk if the training set is large enough. But if the class of decision rules or models is too large (in some sense), one meets the problem of overfitting: the model corresponds perfectly to the data in the training set but shows large errors on new data. This is because only uniform convergence of empirical risk to real risk guarantees that the optimal model behaves similarly on the training set and on new data.

We introduce the notion of the entropy of a decision rule class over a fixed sample sequence as the logarithm of the number of possible classifications of the sequence by the rules of the class. The maximum entropy over sequences of a fixed length l determines a sufficient condition for uniform convergence and the corresponding estimates. But only the behaviour of the average entropy H(l) determines a necessary and sufficient condition for uniform convergence. The condition is that H(l) / l (the average entropy per symbol) should go to zero as the sequence length l goes to infinity.

If the condition does not hold, then there exists a set of objects with nonzero probability measure such that almost all sequences of arbitrary finite length from this set may be divided in all possible ways by the rules of the class. One can easily see that in this case overfitting is inevitable.
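What it means for a sequence to be "divided in all possible ways" can be checked by brute force on a toy class. The sketch below assumes interval rules x -> (a <= x <= b) on the real line, a class picked only for illustration, and tests whether a sample of l points receives all 2^l labelings.

```python
def interval_labelings(points):
    """Distinct labelings of `points` by rules x -> (a <= x <= b)."""
    xs = sorted(set(points))
    # An interval that misses every point labels the whole sample 0.
    labelings = {tuple(0 for _ in points)}
    # Any other interval selects a contiguous run of the sorted points,
    # so endpoints placed at sample values cover every case.
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            a, b = xs[i], xs[j]
            labelings.add(tuple(int(a <= x <= b) for x in points))
    return labelings

def is_divided_in_all_ways(points):
    """True if the class realises all 2^l labelings of the sample."""
    return len(interval_labelings(points)) == 2 ** len(points)

print(is_divided_in_all_ways([1.0, 2.0]))       # two points
print(is_divided_in_all_ways([1.0, 2.0, 3.0]))  # three points
```

Two distinct points can be labeled in all four ways, but three cannot, because no interval contains the outer points while excluding the middle one. A class for which samples of every length can be divided in all ways is able to fit any labeling of the training set, including a purely random one.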

Similar results are found for real dependencies instead of decision rules.

Alexey Chervonenkis


Ground Facts, Rules and Probabilistic Inference for Cyc

Abstract

One aspect of Cyc is a very large, logic-based knowledge base that includes, inter alia, large amounts of background knowledge over a wide variety of domains. But it is more than that: the Cyc project is an attempt to move towards general artificial intelligence by supporting automated reasoning about a very wide variety of real-world concerns. To support that goal, Cyc also encompasses, obviously enough, an inference engine able to reason over a large, contextual knowledge base, but it also includes components for interpreting and producing natural language, for acquiring knowledge and responding to user queries, and for interfacing with other software.

Applying logic to the representation of general knowledge, /at scale/, and using it in the production of intelligent behaviors has been difficult enough; unfortunately, it is becoming clear that doing so using traditional logics is probably not sufficient, either for satisfying the long-term goal of supporting general intelligence or even for shorter-term goals, like recognizing, interpreting, and elaborating descriptions of piracy events.

In this talk, I'll briefly describe what Cyc is and has been, and how it is growing; touch on an early approach to abductive reasoning and classification in a traditional logical framework, and some difficulties with that approach; and then describe recent, very initial work on training Markov Logic Networks based on ground facts and rules within the millions of axioms of the Cyc KB. Finally, I'll sketch a vision for a system that truly integrates both sound, deductive reasoning and the bounded unsoundness of probabilistic classification, induction, abduction and deduction.

Michael Witbrock, Cycorp Europe (cycorp.eu)


All rights reserved © Authors