Deviance Information Criteria
Deviation information criteria (DIC) is a metric used to compare Bayesian models. It is closely related to the Akaike information criteria (AIC) which is defined as \(2k - 2 \ln \hat{\mathcal{L}}\), where k is the number of parameters in a model and \(\hat{\mathcal{L}}\) is the maximised log-likelihood. The DIC makes some changes to this formula. Firstly by replacing a maximised log-likelihood with the log-likelihood evaluated at the Bayes estimate \(\hat{\theta}\) and by replacing \(k\) with an alternative correction
\[\begin{aligned} \text{DIC} & = -2 \log p(y \mid \hat{\theta}) + 2 p _{\text{DIC}} ,\\ p_{\text{DIC}} & = 2 \left( \log p(y \mid \hat{\theta}) - \mathbb{E} _{\text{post}} \log p(y \mid \theta) \right). \end{aligned}\]These changes make it more suitable for a Bayesian model, but beware, it isn’t a fully Bayesian metric in the philosophical sense. You are reducing probability distributions down to point estimates, so it looses some of the Bayesian credibility. It’s also DIC and not DIC criteria as that would be an example of RAS syndrome.
Enjoy these types of posts? Then you should sign up for my newsletter.
A DIC Example
To demonstrate how we can calculate DIC, I simulate some data and draw from its posterior distribution. For simplicity, I use the Poisson distribution with a conjugate gamma distribution. This lets me easily draw from the posterior distribution.
This gives us a DIC value of \(\sim 513\). Which is useless on its own, but given two models we can compare the DIC values and favour the model with lowest DIC.
Model Comparison with DIC
To demonstrate this, we simulate some data from a gamma distribution and fit two models; a gamma and a lognormal model using Stan.
The Stan code for the models is simply:
Replacing the gamma distribution for a lognormal in the other model.
I simulate 1000 datapoints and sample from Stan before forming a 2 column matrix with of the posterior samples.
To construct a function for the DIC score, you need to be able to pass in
the data, likelihood function and posterior samples. Thankfully, R has
a fairly standard way of using its probability distributions, we can
rely on both dgamma
and dlnorm
to take in the same type of
arguments. But the dic
function is
not flexible enough that any distribution can be passed through.
By applying the function to the sample data and calculating the values we find that
Model | DIC |
---|---|
Gamma | 375.2491 |
Lognormal | 478.6547 |
The true model (gamma) has the lower DIC as expected. So everything is working as expected!
You can also asses the DIC on some out-of-sample data. This is achieved
by simply simulating from the same distribution and passing it through
the dic
function.
Model | DIC |
---|---|
Gamma | 12.42242 |
Lognormal | 30.78499 |
Again, the gamma model has the lower DIC, more evidence that this is the correct model.
Overall, the DIC is a useful metric to asses your model correctness and easy to calculate using your posterior samples.
Submitted to RWeekly