Author Topic: Model validation  (Read 17057 times)

Offline Van-Anh Phan

Model validation
« on: July 16, 2008, 15:46:50 »
I am interested in how to validate a network model. I would like to ask you:

- How to perform cross-validation in Hugin?
- Can I do it with Hugin 6.6?
- What is the appropriate ratio of training and testing data sets?

- I have tried to get the BIC and AIC scores of the model, but I do not see them in the Network Log. I have been told that these scores are also computed by the Analysis Wizard. Is this a limitation of Hugin 6.6, or did I misunderstand something?

Thank you very much for your time
Van Anh

Offline Anders L Madsen

Re: Model validation
« Reply #1 on: July 18, 2008, 13:09:58 »
Quote
I am interested in how to validate a network model. I would like to ask you:
- How to perform cross-validation in Hugin?
- Can I do it with Hugin 6.6?
- What is the appropriate ratio of training and testing data sets?
Cross-validation is part of the HUGIN Analysis Wizard. The Analysis Wizard was introduced with HUGIN Graphical User Interface (GUI) v6.8.

You may consider different ratios between the training and testing data sets, such as 3, 5 or 10 (i.e., holding out 1/3, 1/5, or 1/10 of the cases for testing). You may also want to take a look at the Patterns & Predictions tool, which also supports n-fold cross validation: http://www.poulinhugin.com/
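As a minimal illustration of holding out a fraction of the cases — this is a plain Python sketch of the general idea, not HUGIN functionality:

Code:
import random

def holdout_split(cases, k, seed=0):
    # Hold out 1/k of the cases for testing; train on the remainder.
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // k
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)

# e.g. train on 4/5 of the cases, test on the remaining 1/5:
# training, testing = holdout_split(cases, k=5)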

Quote
- I have tried to get the BIC and AIC scores of the model, but I do not see them in the Network Log. I have been told that these scores are also computed by the Analysis Wizard. Is this a limitation of Hugin 6.6, or did I misunderstand something?
Since HUGIN GUI v6.7, the EM learning algorithm has reported the AIC and BIC scores to the Network Log.

In previous email exchanges, I wrongly assumed that you had the latest version of HUGIN GUI, which is version 6.9. I strongly recommend that you consider upgrading to the latest version of HUGIN GUI (i.e., v6.9).
HUGIN EXPERT A/S

Offline Van-Anh Phan

Re: Model validation
« Reply #2 on: July 18, 2008, 15:34:23 »
Thank you very much, Anders. Now I have a clear reason to ask for Hugin 6.9.

My next question is the following:

In learning a Bayesian network, there are two components:
- Learning the structure
- Learning the parameters given the structure

(1) Does model validation mean validating the structure, the parameters, or both?

I ask this question because a friend of mine explained to me that learning parameters given a structure, for example by EM learning, does not (in most cases) require dividing the case data into training and testing sets.

(2) How do the goals of modeling (prediction, explanation, or exploration) and the learning method influence model validation?

Thank you for your time

Offline Anders L Madsen

Re: Model validation
« Reply #3 on: August 20, 2008, 10:09:20 »
Quote
(1) Does model validation mean validating the structure, the parameters, or both?

I ask this question because a friend of mine explained to me that learning parameters given a structure, for example by EM learning, does not (in most cases) require dividing the case data into training and testing sets.

A model is a fully specified Bayesian network, i.e., both the structure and the parameters.

The EM algorithm estimates the parameters of the Bayesian network using all data entered into the domain. The EM algorithm alternates between two steps, Expectation and Maximization. It terminates when one of two stopping criteria is met: either an upper limit on the number of iterations is reached, or the improvement in log-likelihood falls below a specified threshold. The EM algorithm uses the log-likelihood as a quality measure. From the log-likelihood it is easy to compute both the AIC and the BIC scores. When the EM algorithm stops, it reports the log-likelihood, AIC, and BIC to the Network Log.
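To illustrate, here is a minimal sketch of computing the two scores from the log-likelihood. It assumes the penalized log-likelihood convention in which higher is better (consistent with selecting the model with the highest score, as described below); the exact scaling HUGIN reports is not stated in this thread.

Code:
import math

def aic(log_likelihood, kappa):
    # kappa: number of free parameters in the network.
    # Penalized log-likelihood form: higher is better.
    return log_likelihood - kappa

def bic(log_likelihood, kappa, n_cases):
    # BIC penalizes model complexity more heavily as the data set grows.
    return log_likelihood - 0.5 * kappa * math.log(n_cases)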

These scores indicate the quality of the model. Since the EM algorithm may get stuck in a local optimum, it is common to run the EM algorithm a number of times with different initial parameter settings. The model with the highest score is selected. This will be the model which best represents the data (among the models you have constructed).
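A sketch of the restart loop just described, reusing the bic() helper above; learn_em() and the n_free_params attribute are hypothetical placeholders for whatever your tool provides, not HUGIN API calls.

Code:
import random

def best_of_restarts(structure, data, n_restarts=10, seed=0):
    rng = random.Random(seed)
    best_model, best_score = None, float("-inf")
    for _ in range(n_restarts):
        # learn_em() is a hypothetical placeholder: run EM from a random
        # initial parameterization, returning the model and its log-likelihood.
        model, log_lik = learn_em(structure, data, init_seed=rng.random())
        score = bic(log_lik, model.n_free_params, len(data))
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score  # the best-scoring model among the runs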

Cross validation is used to measure the performance of models for classification tasks. If you are building a model for email classification, then you are not interested in how well the model represents the data, but in how well the model can classify emails. To measure this, you test the model on a testing set which has not been used as part of the training. This will give you an indication of how good a classifier you have built. If data is sparse, then you may consider k-fold cross validation.

If your data is sparse, you will use k-fold cross validation to estimate the quality of the model, but use the entire data set to train the model you actually use for the classification.
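A minimal sketch of the k-fold procedure described above, in plain Python; train_fn and accuracy_fn are hypothetical placeholders for your own learning and scoring routines.

Code:
import random

def k_fold_accuracy(cases, k, train_fn, accuracy_fn, seed=0):
    # Shuffle once, then deal the case indices into k folds.
    idx = list(range(len(cases)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = [cases[j] for j in folds[i]]
        train = [cases[j] for f in folds[:i] + folds[i + 1:] for j in f]
        model = train_fn(train)                   # learn on k-1 folds
        scores.append(accuracy_fn(model, test))   # test on the held-out fold
    return sum(scores) / k

# As noted above, the final classifier is then trained on the entire data
# set; the k-fold average only estimates how well that model will perform.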

Quote
(2) How do the goals of modeling (prediction, explanation, or exploration) and the learning method influence model validation?

It depends on the type of task you are considering. If it is a classification task, then you would use cross validation. If it is a task where there is no special target node and where the interactions between nodes are important, then you would focus on the AIC and BIC scores.

If you are not building the model only from data, then you may consider how the model performs on selected scenarios, and go through the model with your experts (if any) to validate the independence and dependence properties of the model.

I apologize for the delay in replying to your questions. :-[
HUGIN EXPERT A/S