|
|
|
|
|
Model Inference
|
The joint distribution is naturally viewed as a specific class of graphical models, namely factor graphs (see, e.g., Kschischang et al. 2001). A factor graph is simply a probability model, where the joint distribution is represented as a product of potential functions called factors. Factor graphs are visualized as undirected graphs, where the variables appear as circles and factors as squares. An edge is drawn between a variable and a factor whenever the factor depends on the variable. This representation is convenient for guiding inference calculations.
To arrive at the most likely physical model, we derived settings of the variables in the joint distribution (MAP configuration). As the joint distribution involves a large number of inter-dependent variables, we settled for finding an approximate MAP configuration using the max-product algorithm (Kschischang et al. 2001). This local propagation algorithm evaluates approximate max-marginals for each variable where each factor or potential function in the joint distribution ties together the values of the variables involved. This determines, for each potential, what information needs to be shared between the variables in order to evaluate their max-marginals, assuming that the associated variables are independent of each other in the absence of the potential in question. The algorithm finds the correct max-marginals whenever the factor graph has a tree-like structure.
The resulting approximate max-marginals sometimes failed to identify a unique MAP configuration, e.g. some of the max-marginals did not have unique maximizing arguments. In this case, we applied a recursive procedure. At each iteration, we ran the max-product algorithm conditioned on the fixed values obtained from previous steps; variables are fixed if their inferred max-marginal probabilities yield a unique max argument. If there are still undetermined variables, then we chose one such variable, fixed it to one of the degenerate values, and continued the iteration - the algorithm stops when all variables have been set.
The problem of enumerating all MAP configurations becomes intractable as the number of variables greatly exceeds the number of available constraints, as often occurs in genome-wide analysis. Instead of generating all MAP configurations, we revised the recursive algorithm to decompose the physical network into subnetworks; variables in different subnetworks did not interfere with one another given the constraints from the available data. The MAP configurations are therefore expressed as a product of subconfigurations in subnetworks.
|
|
|
|
|
|
|
|