In case M is probabilistic, there is a straightforward way of doing this: one generally uses the Shannon-Fano code, coding data D using -log P(D|M) bits. This choice has some optimality properties; also, it enables one to re-interpret (parts of) MDL as a form of Bayesian statistics. In case M is not probabilistic (for example, a class of functions together with some error measure) it is not so clear how to associate codes with models.
In this talk, we present a simple formal recipe for associating codes with models. Our method works for arbitrary model classes (they can but need not be probabilistic) and error measures, thereby substantially extending the applicability of MDL. The recipe we give satisfies several optimality properties.
We present some theoretical results concerning our recipe, which seem
promising for practical applications. First of all, it removes a certain
arbitrariness that has been inherent in previous proposals. Second, it
allows one to optimize the behaviour of MDL against (almost) any error
function one wishes. For example, if one uses probabilistic model classes
(like Naive Bayes) for classification, where the (non-probabilistic) zero/one-loss
function is used, our recipe gives somewhat different codelengths than
'classical' MDL. Our theorems imply that, under fairly general circumstances
(even if the data generating machinery is quite unrelated to any of the
models under consideration), our recipe should improve on 'classical' MDL.
Third, if one intends to use the model one inferred from data to predict
future data against a different error functions than the one
one used during learning (this often happens in practice!), then using
our recipe will still often lead to reasonable predictions.
In this talk, we present our recipe, show how it works and discuss
its practical and theoretical aspects. We also discuss connections to Maximum
Entropy. Finally, we show that our recipe has several other applications
outside of optimizing the trade-off between complexity and goodness-of-fit.
The talk assumes no prior knowledge of information theory.
| Date: Thurs., Feb. 4 |
|
Place: Cordura 100
|
Return to the seminar schedule