Topic: EM and ExceptionCaseStateTooLarge

dechouxb_AAU
EM and ExceptionCaseStateTooLarge
« on: April 18, 2009, 09:43:52 »
Hi,

I have not found any explicit limit on the size of the state space of a discrete variable, either in the manual or on the forum. However, when I tried to run EM on a network containing variables with large state spaces, I got the following exception:

Code:
COM.hugin.HAPI.ExceptionCaseStateTooLarge: A case state index larger than 32767 has been specified.
at COM.hugin.HAPI.ExceptionHugin.throwException(ExceptionHugin.java:33)
at COM.hugin.HAPI.Domain.parseCases(Domain.java:1516)

The Java documentation does not really explain what this exception is for. Does it mean that a state space cannot contain more than 32767 states (the upper bound of a signed 16-bit integer)?

Or is there another reason for this exception?

Best regards

Bertrand

Frank Jensen (HUGIN Expert)
Re: EM and ExceptionCaseStateTooLarge
« Reply #1 on: April 20, 2009, 15:57:46 »
The upper limit on the number of states applies only to case data. It is mentioned in Section 11.1 of the Hugin API Reference Manual (see the description of h_node_set_case_state).

Do you really need more than 32767 different states?
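
Because the limit applies to state indices in case data, one way to catch the problem before calling parseCases is to scan the data first. The sketch below assumes the training data is held as an int matrix of state indices (one row per case, one column per node, a negative value meaning "missing"); the class and method names are hypothetical, no HUGIN calls are used, and the bound 32767 is taken from the exception message (it equals Short.MAX_VALUE in Java).

```java
import java.util.ArrayList;
import java.util.List;

class CaseStateCheck {
    // Largest case state index accepted, per the exception message
    // "A case state index larger than 32767 has been specified."
    static final int MAX_CASE_STATE_INDEX = Short.MAX_VALUE; // 32767

    // Hypothetical helper: returns the {row, column} position of every
    // state index in 'cases' that exceeds the limit.
    // cases[row][column] is the state index of one node in one case;
    // negative values (assumed to mean "missing") are never flagged.
    static List<int[]> findOversizedIndices(int[][] cases) {
        List<int[]> bad = new ArrayList<>();
        for (int row = 0; row < cases.length; row++) {
            for (int col = 0; col < cases[row].length; col++) {
                if (cases[row][col] > MAX_CASE_STATE_INDEX) {
                    bad.add(new int[] {row, col});
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        int[][] cases = {
            {0, 12, 32767},   // all indices within the limit
            {1, -1, 40000},   // 40000 is too large
        };
        for (int[] pos : findOversizedIndices(cases)) {
            System.out.println("case " + pos[0] + ", node " + pos[1] + ": index too large");
        }
    }
}
```

Running main prints `case 1, node 2: index too large`.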

dechouxb_AAU
Re: EM and ExceptionCaseStateTooLarge
« Reply #2 on: April 23, 2009, 17:15:15 »
I would agree that in most cases this limit should be sufficient. But it just happens that my training set has more than 32767 cases.

Is that the only way to provide the training data to the EM algorithm? Or can it be provided through some kind of iterator?


Frank Jensen (HUGIN Expert)
Re: EM and ExceptionCaseStateTooLarge
« Reply #3 on: April 24, 2009, 17:44:56 »
There can be more than 32767 cases, but there cannot be more than 32767 states for a given node (if it should be learned by the EM algorithm).

In order to get good results from the EM algorithm, there should be "sufficiently many" cases for each parent state configuration in the CPT: if the node has N states, then a good number of cases for a given parent state configuration would be 10*N.

So with this many states, you will need a lot of data.
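
The 10*N rule of thumb applies per parent state configuration, so a rough total is 10 * N times the number of parent configurations (the product of the parents' state counts). This back-of-the-envelope sketch assumes the cases are spread evenly over the parent configurations, which real data rarely is; the method name and the example numbers are hypothetical.

```java
class CaseBudget {
    // Rule of thumb from the thread: about 10 * N cases for each parent
    // state configuration of a node with N states. The number of parent
    // configurations is the product of the parents' state counts
    // (1 for a node without parents). multiplyExact throws on overflow.
    static long suggestedCases(int nodeStates, int... parentStates) {
        long configurations = 1;
        for (int p : parentStates) {
            configurations = Math.multiplyExact(configurations, p);
        }
        return Math.multiplyExact(10L * nodeStates, configurations);
    }

    public static void main(String[] args) {
        // A node with 30000 states and two parents with 5 and 4 states:
        // 10 * 30000 * (5 * 4) = 6,000,000 cases.
        System.out.println(suggestedCases(30000, 5, 4));
    }
}
```

Even a modest pair of parents pushes a node with tens of thousands of states into the millions of cases.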

dechouxb_AAU
Re: EM and ExceptionCaseStateTooLarge
« Reply #4 on: May 12, 2009, 12:10:05 »
I have sufficient data, and this is indeed a lot of data.

I implemented EM for my particular network, but I would like to speed it up. (Given the size of my training set, it obviously takes a while.)

Does Hugin offer any option, such as caching, that could help with this?

I just found the domain.saveToMemory() function, which I am using as follows, though I am not sure this is the right way:

Code:
// creation of a Domain domain
domain.compile();
domain.saveToMemory();
while EM is not finished
     // Expectation step
     for every case in my training set
          domain.retractFindings();
          // enter the evidence using the node.selectState(...) function
          domain.propagate(Domain.H_EQUILIBRIUM_SUM, Domain.H_EVIDENCE_MODE_NORMAL);
          // get the expected counts by using the domain.getMarginal(....) function
[...]

Thanks in advance
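
The loop in the post follows the usual EM structure: enter and propagate the evidence for each case, read marginals to accumulate expected counts, then normalize the counts into new tables. As a self-contained illustration of that structure only (not HUGIN code, and not the poster's actual implementation), here is EM for a hypothetical two-node network A -> B with binary variables, where A is missing in some cases. For a network this small, Bayes' rule stands in for propagation.

```java
// Minimal EM for A -> B (both binary); A is sometimes unobserved.
// Tables are plain arrays; "propagation" is done by hand.
class TinyEm {
    static final int MISSING = -1;

    double[] pA = {0.5, 0.5};                       // P(A)
    double[][] pBgivenA = {{0.6, 0.4}, {0.4, 0.6}}; // P(B | A), row = A

    // One EM iteration; returns the log-likelihood of the data under the
    // parameters used in the E-step (before they are updated).
    double iterate(int[][] cases) {
        double[] nA = new double[2];
        double[][] nAB = new double[2][2];
        double logLik = 0.0;
        // Expectation step: accumulate expected counts case by case.
        for (int[] c : cases) {
            int a = c[0], b = c[1];
            if (a != MISSING) {
                nA[a] += 1.0;
                nAB[a][b] += 1.0;
                logLik += Math.log(pA[a] * pBgivenA[a][b]);
            } else {
                // Posterior P(A | B = b) by Bayes' rule; z = P(B = b).
                double w0 = pA[0] * pBgivenA[0][b];
                double w1 = pA[1] * pBgivenA[1][b];
                double z = w0 + w1;
                nA[0] += w0 / z;
                nAB[0][b] += w0 / z;
                nA[1] += w1 / z;
                nAB[1][b] += w1 / z;
                logLik += Math.log(z);
            }
        }
        // Maximization step: normalize the expected counts.
        double total = nA[0] + nA[1];
        for (int a = 0; a < 2; a++) {
            pA[a] = nA[a] / total;
            for (int b = 0; b < 2; b++) {
                pBgivenA[a][b] = nAB[a][b] / nA[a];
            }
        }
        return logLik;
    }

    public static void main(String[] args) {
        int[][] cases = {
            {0, 0}, {0, 0}, {0, 1}, {1, 1}, {1, 1}, {1, 0},
            {MISSING, 0}, {MISSING, 1}, {MISSING, 1}
        };
        TinyEm em = new TinyEm();
        double previous = Double.NEGATIVE_INFINITY;
        for (int it = 0; it < 100; it++) {
            double logLik = em.iterate(cases);
            if (logLik - previous < 1e-9) break; // no longer improving
            previous = logLik;
        }
        System.out.printf("P(A=0) = %.3f%n", em.pA[0]);
    }
}
```

Because EM never decreases the log-likelihood, the change between iterations is a convenient stopping test; the 1e-9 threshold and the 100-iteration cap are arbitrary choices for the sketch.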

Frank Jensen (HUGIN Expert)
Re: EM and ExceptionCaseStateTooLarge
« Reply #5 on: May 12, 2009, 14:24:06 »
The Domain.saveToMemory operation is used correctly.

See also Section 9.6 in the Hugin API Reference Manual (api-manual.pdf).
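
The reply confirms the usage but not the mechanism. As a rough mental model only, saving to memory trades memory for time: a copy of the tables is kept once, so resetting between cases becomes a flat copy rather than a rebuild. The class below is a hypothetical, self-contained illustration of that snapshot-and-restore pattern; it makes no claim about how HUGIN actually stores or restores its tables.

```java
import java.util.Arrays;

// Hypothetical snapshot-and-restore illustration; not HUGIN code.
class SnapshotTables {
    private final double[] tables; // working tables, modified by "evidence"
    private double[] backup;       // copy taken once, after initialization

    SnapshotTables(double[] initial) {
        this.tables = initial.clone();
    }

    // Pay for one copy up front.
    void save() {
        backup = tables.clone();
    }

    // Reset by a flat array copy instead of rebuilding the tables.
    void restore() {
        if (backup == null) {
            throw new IllegalStateException("save() was not called");
        }
        System.arraycopy(backup, 0, tables, 0, tables.length);
    }

    // Stand-in for entering evidence: zero an entry inconsistent with it.
    void zeroEntry(int index) {
        tables[index] = 0.0;
    }

    double[] view() {
        return tables.clone();
    }

    public static void main(String[] args) {
        SnapshotTables t = new SnapshotTables(new double[] {0.2, 0.3, 0.5});
        t.save();
        t.zeroEntry(1);
        t.restore();
        System.out.println(Arrays.toString(t.view())); // [0.2, 0.3, 0.5]
    }
}
```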