Bayesian learning is a widely used class of techniques in the online machine learning community, especially in large-scale domains, due to its conceptual simplicity and intuitive philosophical grounding. In particular, it expresses expert knowledge and the current belief about the world through the choice of a prior distribution, which is subsequently updated and refined by future observations. This concept is theoretically backed (in parametric models) by the celebrated Bernstein-von Mises theorem, which guarantees that, under some mild conditions, the sequence of belief updates (i.e., the posterior) asymptotically concentrates (within the parameter space) around the parameters of the true distribution from which the observations are in fact generated. This implies that given sufficiently many updates (a number that typically tends to infinity), the solution derived from a Bayesian learning technique also converges to the true solution of the underlying problem (again under some non-restrictive conditions).
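As a minimal illustration of this updating process, consider a hypothetical Beta-Bernoulli model (chosen here purely for illustration, since conjugacy gives closed-form updates): as observations accumulate, the posterior mean concentrates around the parameter that actually generates the data.

```python
import random

# A minimal sketch of Bayesian updating in a Beta-Bernoulli model
# (hypothetical parameter values). The Beta prior is conjugate to the
# Bernoulli likelihood, so each observation updates the posterior in
# closed form: a success increments alpha, a failure increments beta.
random.seed(0)
true_p = 0.7            # unknown parameter generating the observations
alpha, beta = 1.0, 1.0  # Beta(1, 1): a uniform prior belief

for _ in range(1000):
    x = 1 if random.random() < true_p else 0  # draw one observation
    alpha += x        # posterior success count
    beta += 1 - x     # posterior failure count

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # close to true_p after 1000 updates
```

After 1000 updates the posterior mean lies close to `true_p`; the finite-time question discussed below is precisely how much posterior mass can still sit far from the true parameter when the number of updates is small.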
However, in many real-world applications we do not have access to an arbitrarily large number of updates, due to various physical constraints. For example, it might be difficult to carry out technically demanding and costly experiments in order to obtain new observations, as in medical trials or sparse event detection. In such cases, there is no theoretical underpinning for the correctness of Bayesian learning in general. To date, while the behaviour of Bayesian methods is well understood from the asymptotic point of view, there are only a limited number of results in the machine learning literature on the finite-time analysis (i.e., performance analysis with finitely many samples) of Bayesian learning algorithms. Furthermore, these results are typically restricted to special classes of conjugate distributions, and thus cannot be extended to the general setting. Given this, it is essential to provide generic mechanisms that can guarantee the (near-)optimal performance of Bayesian online learning for finite sample sizes, and that can be applied to a variety of machine learning domains (i.e., are not tailored to special cases).
To tackle this issue, our recent research work (Long Tran-Thanh, Botond Szabo and Nick R. Jennings, 2014) proposes generic theoretical tools that are suitable for the finite-time analysis of Bayesian online learning methods. In particular, we derive two concentration inequalities, which can be described as follows. The first provides an upper bound on the amount of mass that the posterior can place outside a small ball, centred at the true parameter of interest, after a finite number n of updates. The second provides an upper bound on the probability that the distance between the expected value of the posterior and the true parameter of interest exceeds a certain small value. These inequalities only require mild assumptions on the prior and likelihood functions that hold in many practical statistical models. By using these inequalities, we can indeed provide generic finite-time performance analyses for many Bayesian online learning methods, such as Value of Perfect Information based learning or Thompson Sampling. Note that our results for the former are the first of their kind, while our results for the latter are significantly more general than the state of the art, as they require much milder assumptions on the prior and likelihood functions.
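To make the Thompson Sampling setting concrete, the following is a minimal sketch of the standard algorithm on a Bernoulli multi-armed bandit with hypothetical arm probabilities (this illustrates the algorithm being analysed, not the analysis itself): each round, sample a parameter from every arm's posterior, play the arm with the highest sample, and update that arm's posterior with the observed reward.

```python
import random

# A minimal sketch of Thompson Sampling on a 3-armed Bernoulli bandit
# (hypothetical arm probabilities). Each arm maintains a Beta posterior
# over its unknown reward probability.
random.seed(1)
true_means = [0.3, 0.5, 0.8]  # unknown reward probability of each arm
alphas = [1.0, 1.0, 1.0]      # Beta(1, 1) prior per arm
betas = [1.0, 1.0, 1.0]
counts = [0, 0, 0]            # how often each arm is played

for _ in range(2000):
    # Draw one sample from each arm's posterior and play the argmax.
    samples = [random.betavariate(a, b) for a, b in zip(alphas, betas)]
    arm = samples.index(max(samples))
    reward = 1 if random.random() < true_means[arm] else 0
    alphas[arm] += reward     # posterior update for the played arm
    betas[arm] += 1 - reward
    counts[arm] += 1
```

Because the posterior of the best arm concentrates around its true mean, that arm is played most often; finite-time guarantees of the kind described above bound how quickly this concentration, and hence the algorithm's performance, kicks in.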