The adaptation process is built on the assumption that each conditional distribution follows a Dirichlet distribution with the conditional probabilities as the mean vector and with the variances implicitly specified using an "experience count" (also called an "equivalent sample size"). If adaptation is performed using an incomplete case, then the true updated distribution is a mixture of Dirichlet distributions. If we tried to continue with these mixtures, each further incomplete case would produce new mixtures with more terms. This soon becomes unmanageable.
Because of this, we approximate each mixture of Dirichlet distributions with a single Dirichlet distribution. The approximating distribution is chosen to have the same mean vector and the same sum of variances as the true distribution. The mean vector gives the updated conditional probabilities, and the sum-of-variances constraint determines the experience count.
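To make the matching step concrete, here is a minimal Python sketch of how it could be computed. It is only an illustration under stated assumptions, not the code the adaptation algorithm actually uses: the function names `dirichlet_moments` and `match_mixture` and the example numbers are made up for this post. It relies on the standard Dirichlet moments: with parameters a_1, ..., a_n and experience count s = a_1 + ... + a_n, the mean is m_i = a_i / s and the variance is m_i (1 - m_i) / (s + 1), so the sum of variances is (1 - sum of m_i^2) / (s + 1), which can be solved for s.

```python
def dirichlet_moments(alpha):
    """Return (mean, variance) vectors of a Dirichlet with parameters alpha."""
    s = sum(alpha)  # experience count of this component
    mean = [a / s for a in alpha]
    var = [m * (1.0 - m) / (s + 1.0) for m in mean]
    return mean, var


def match_mixture(weights, alphas):
    """Approximate a mixture of Dirichlets by a single Dirichlet.

    The result has the same mean vector and the same sum of variances
    as the mixture. Returns (alpha, experience_count).
    Assumes the weights sum to 1 and the mixture is not degenerate.
    """
    n = len(alphas[0])
    mean = [0.0] * n
    second = [0.0] * n  # E[p_i^2] under the mixture
    for w, alpha in zip(weights, alphas):
        m_k, v_k = dirichlet_moments(alpha)
        for i in range(n):
            mean[i] += w * m_k[i]
            second[i] += w * (v_k[i] + m_k[i] ** 2)
    sum_var = sum(second[i] - mean[i] ** 2 for i in range(n))
    # Solve sum_var = (1 - sum m_i^2) / (s + 1) for the experience count s.
    s = (1.0 - sum(m * m for m in mean)) / sum_var - 1.0
    return [s * m for m in mean], s


# Hypothetical example: two equally weighted components.
alpha, count = match_mixture([0.5, 0.5], [[3.0, 2.0], [2.0, 3.0]])
print(alpha, count)
```

In this made-up example both components have experience count 5, but the matched Dirichlet gets a smaller count, because the disagreement between the components adds variance that a single Dirichlet can only express through a lower experience count.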
I have attached an old note that explains the calculations.