Hi,
I have a set of data that represents information on dams (see attached txt). I am interested in modelling how the dam `Status_ID` and `Raise_ID` affect probabilities of failure, and the type of failure mode (`FM`) if it does fail. I use the EM learning algorithm to add experience data from the csv to all the nodes (see attached network).
I also know that my dataset is unbalanced (failures are overrepresented) and that the true overall failure rate (across all values of `Status_ID`, `Raise_ID` and `FM`) is 2E-4. How do I best incorporate this knowledge into the network? So far I've tried (1) entering it as a likelihood in 'Run' mode and (2) entering it as a prior before adding the experience data.
This is an example of how I would apply Bayes' rule by hand:
P(Failure="Yes" | Status_ID="Active" & Raise_ID="Upstream") = P(Status_ID="Active" & Raise_ID="Upstream" | Failure="Yes") * P(Failure="Yes") / P(Status_ID="Active" & Raise_ID="Upstream")
where P(Failure="Yes") = 2E-4, and other probabilities come from the data
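To make the hand calculation concrete, here is a minimal Python sketch of it. The counts are made-up placeholders, not values from the attached file, and the function name `posterior_failure` is just for illustration. One assumption is worth flagging: because failures are overrepresented in the sample, the denominator P(Status_ID, Raise_ID) is rebuilt from the two class-conditional likelihoods and the 2E-4 prior (law of total probability) instead of being read directly off the unbalanced data.

```python
def posterior_failure(p_e_given_fail, p_e_given_ok, prior_fail=2e-4):
    """Bayes' rule for P(Failure="Yes" | evidence) with a known prior.

    p_e_given_fail: P(evidence | Failure="Yes"), estimated from failed dams
    p_e_given_ok:   P(evidence | Failure="No"), estimated from non-failed dams
    prior_fail:     true overall failure rate (2E-4 in the post)
    """
    numerator = p_e_given_fail * prior_fail
    # Law of total probability: P(evidence) under the true prior,
    # not under the failure-heavy sample.
    denominator = numerator + p_e_given_ok * (1.0 - prior_fail)
    return numerator / denominator


# Hypothetical counts, for illustration only.
n_fail = 50          # failed dams in the sample
n_fail_match = 20    # ...of which Status_ID="Active" and Raise_ID="Upstream"
n_ok = 150           # non-failed dams in the sample
n_ok_match = 30      # ...of which Status_ID="Active" and Raise_ID="Upstream"

p_e_given_fail = n_fail_match / n_fail
p_e_given_ok = n_ok_match / n_ok

post = posterior_failure(p_e_given_fail, p_e_given_ok)
print(f"P(Failure=Yes | Active, Upstream) = {post:.3e}")
```

With these placeholder counts the evidence is twice as likely among failed dams as among non-failed ones, so the posterior ends up at roughly twice the 2E-4 prior. The class-conditional likelihoods are not distorted by over-sampling failures (as long as sampling within each class is representative), which is why only the prior needs to be supplied from outside the data.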
Appreciate any help!
Thanks