Topic: General Model question.

Offline jeff_porter

General Model question.
« on: June 26, 2007, 10:14:38 »

Just a quick question about best practice for the number of links to a node.

For example, most of my models have 2 or 3 links into each node (max).

One of my models though has ended up with a node being linked to by 25 other nodes.

Now, the Hugin Researcher tool can cope with this, but if I add another few links, i.e. 28 in total, the tool grinds to a halt, and if I try to save the model, the program just keeps eating up memory (1 GB plus) and Windows hangs.

So what is the best practice / max recommended number of links between nodes?

Notes:
Version 6.6

Offline Anders L Madsen

Re: General Model question.
« Reply #1 on: June 27, 2007, 16:14:27 »
In general, probabilistic networks, i.e., Bayesian networks and influence diagrams, are well suited for reasoning and decision making under uncertainty when the dependence and independence relations of a problem domain can be described as an acyclic, directed graph (DAG) over variables representing entities in the problem domain.

A Bayesian network, for instance, is an intuitive graphical representation of a joint probability distribution over its variables. The joint distribution is encoded as a product of conditional probability tables, one per node, as defined by the DAG. For the network to be a compact representation of the joint distribution, the DAG structure must be sparse, i.e., it must have few edges. In addition, inference in a sparse DAG is generally computationally cheaper than in a dense DAG.
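To make the compactness point concrete, here is a minimal Python sketch that compares the number of entries in one full joint table with the total number of CPT entries in a sparse DAG. The 10-node chain, the binary state counts and the helper names are assumptions chosen for illustration; nothing here is taken from the HUGIN tools.

```python
from math import prod

def joint_table_size(states):
    """Entries in a single table over all variables."""
    return prod(states.values())

def factored_size(states, parents):
    """Total CPT entries when each node stores a table over itself and its parents."""
    return sum(
        states[node] * prod(states[p] for p in parents.get(node, []))
        for node in states
    )

# Hypothetical example: a chain X0 -> X1 -> ... -> X9 of binary variables.
states = {f"X{i}": 2 for i in range(10)}
chain_parents = {f"X{i}": [f"X{i - 1}"] for i in range(1, 10)}

full = joint_table_size(states)                 # 2^10 entries
sparse = factored_size(states, chain_parents)   # 2 + 9 * 4 entries
print(full, sparse)
```

For this assumed chain the full joint table has 1024 entries, while the factored representation needs only 38.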

The complexity of a DAG is, in part, determined by the size of the largest parent set in the graph.
There are a number of reasons why it is advantageous to keep the number of parents as low as possible. One reason is to reduce the size of the conditional probability table (CPT) of a node. The size of a CPT is computed as:
||X|| * ||Y_1|| * ... * ||Y_n||

where X is the child node, Y_1, ..., Y_n are the parents of X, and ||Z|| specifies the number of states of variable Z. Hence, the number of parameters (i.e., entries) in a CPT grows exponentially with the number of parents of the node. It may be difficult to obtain a value for all entries when the CPT is large. Basically, each parameter can be specified based on expert knowledge, generated using an expression, or estimated from data.

Acquiring the entries of a CPT from a domain expert may be impossible when the CPT is very large. The expert will often not be able to differentiate between the large number of different parent configurations. Also, estimating a CPT from data will require a large amount of data, which may not be available. In the case of a CPT specified using an expression, it may be possible to construct the CPT, but inference may be intractable due to the large CPT.
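As a rough illustration of the size formula above applied to the numbers in the question, the following Python sketch computes the CPT size for a node with 3, 25 and 28 parents. It assumes every node is binary and that each entry takes 8 bytes (one double-precision number); both are assumptions, since the actual state counts of the model and the tool's internal storage are not stated in this thread.

```python
from math import prod

def cpt_size(child_states, parent_states):
    """||X|| * ||Y_1|| * ... * ||Y_n||"""
    return child_states * prod(parent_states)

BYTES_PER_ENTRY = 8  # assumption: one double-precision number per entry

for n_parents in (3, 25, 28):
    entries = cpt_size(2, [2] * n_parents)  # assumption: all nodes binary
    mib = entries * BYTES_PER_ENTRY / 2**20
    print(f"{n_parents} parents: {entries} entries, about {mib:g} MiB")
```

Under these assumptions, 25 binary parents already give a table of about 67 million entries (512 MiB), and 28 parents give about 537 million entries (4 GiB). With more than two states per node the numbers grow even faster.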

Thus, in conclusion, you should try to keep the number of edges to a minimum.



HUGIN EXPERT A/S