Author Topic: How much data do I need to learn a network ?  (Read 15103 times)

Offline Anders L Madsen

  • HUGIN Expert
  • Hero Member
  • *****
  • Posts: 2295
How much data do I need to learn a network ?
« on: January 27, 2007, 10:48:52 »
It is difficult to give an exact figure, as this depends on the domain that the data represents. The more complex the model is, the more data is required to learn it.

A rule of thumb says that you should have at least 5 cases for each cell in the largest (conditional probability) table in the network. That is, to learn the distribution of a node for any given parent configuration, you should have (5 * number of states of that node) cases.
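As a rough illustration of the arithmetic behind this rule of thumb, here is a minimal Python sketch. The function names and the example node are hypothetical and not part of any HUGIN API. It also assumes that the cases are spread evenly over the parent configurations, which real data rarely guarantees, so the result is a lower bound at best.

```python
from math import prod

CASES_PER_CELL = 5  # the rule-of-thumb constant quoted above


def table_cells(node_states, parent_states):
    """Cells in a node's table: its own states times all parent configurations."""
    return node_states * prod(parent_states)  # prod([]) == 1 for a root node


def min_cases(node_states, parent_states, cases_per_cell=CASES_PER_CELL):
    """Minimum number of cases suggested by the rule of thumb for one table."""
    return cases_per_cell * table_cells(node_states, parent_states)


# Hypothetical node with 3 states and two parents with 3 states each.
print(table_cells(3, [3, 3]))  # 27 cells
print(min_cases(3, [3, 3]))    # 135 cases
```

Applied to the largest table in the network, this gives the minimum dataset size the rule suggests.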

HUGIN EXPERT A/S

Offline nadjet

  • Newbie
  • *
  • Posts: 5
Re: How much data do I need to learn a network ?
« Reply #1 on: February 02, 2007, 16:26:32 »
Dear Anders,
How about the case where you do not know how complex your model will be (i.e. when learning a model from data)?
Let's say we have 40 variables with 3 states each (the states are the same across variables). What would be a "good" dataset size?
Many thanks

Offline Anders L Madsen

Re: How much data do I need to learn a network ?
« Reply #2 on: February 03, 2007, 20:31:16 »
Dear Nadjet,

It is more or less impossible to determine whether or not you have sufficient data before the learning takes place (unless you have a very large amount of data, in which case learning may be resource-demanding).

Once you have a model, you can use the above-mentioned rule of thumb to determine whether you had sufficient data to learn the resulting model.
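A minimal sketch of such an after-the-fact check, in Python. The network structure, the dataset size, and the helper name `required_cases` are all made up for illustration and do not come from HUGIN. The structure is given as plain dictionaries, and 5 cases per cell is the rule-of-thumb constant from the first post.

```python
from math import prod


def required_cases(states, parents, cases_per_cell=5):
    """Return (node, cells, cases) for the largest table in the structure."""
    def cells(node):
        # Own states times the number of parent configurations.
        return states[node] * prod(states[p] for p in parents.get(node, []))

    largest = max(states, key=cells)
    n_cells = cells(largest)
    return largest, n_cells, cases_per_cell * n_cells


# Hypothetical learned structure: four 3-state variables.
states = {"A": 3, "B": 3, "C": 3, "D": 3}
parents = {"C": ["A", "B"], "D": ["A", "B", "C"]}

node, n_cells, needed = required_cases(states, parents)
n_cases = 300  # hypothetical size of the dataset used for learning

print(node, n_cells, needed)  # D 81 405
print(n_cases >= needed)      # False
```

In this made-up example the largest table belongs to D (3 states, 27 parent configurations), so the rule suggests at least 405 cases and a dataset of 300 cases would fall short.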

Hope this is useful,
-Anders
HUGIN EXPERT A/S