Supplemental Results
Yeang CH*, Mak HC*, McCuine S, Workman C, Jaakkola T, Ideker TI
Genome Biology (2005) 6:R62


Computer Science and Artificial Intelligence Laboratory / Laboratory for Integrative Network Biology
MIT / UC San Diego

Prioritizing New Experiments For Model Discrimination

We measured the discriminative power of a new knockout experiment e by the mutual information between the physical model configurations M and the predicted responses Ye under that experiment:

$$I(M; Y_e) = H(Y_e) - H(Y_e \mid M) = \sum_{m} \sum_{y_e} P(m, y_e) \log \frac{P(m, y_e)}{P(m)\,P(y_e)}$$

where m indexes the model configurations, ye is the vector of predicted responses, and H(Ye) and H(Ye|M) are the entropy and the conditional entropy of Ye. The score can be interpreted as the maximal amount of "information" about M that can be extracted from Ye. Equivalently, it is the reduction in uncertainty about M obtained by knowing Ye.
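As an illustration of this definition, the following minimal sketch computes the mutual information from a joint distribution over (model index, predicted response). The function name `mutual_information` and the four-model toy distribution are hypothetical and chosen only for illustration; they are not taken from the paper's software.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Mutual information (in bits) between M and Y_e.

    `joint` maps (model_index, response_vector) -> P(m, y_e).
    """
    p_m = defaultdict(float)  # marginal P(m)
    p_y = defaultdict(float)  # marginal P(y_e)
    for (m, y), p in joint.items():
        p_m[m] += p
        p_y[y] += p
    return sum(
        p * math.log2(p / (p_m[m] * p_y[y]))
        for (m, y), p in joint.items()
        if p > 0
    )

# Toy example (assumed): four equally likely model configurations,
# each predicting one response vector over two genes.
toy_joint = {
    (0, (-1, 0)): 0.25,
    (1, (-1, 0)): 0.25,
    (2, (1, 0)): 0.25,
    (3, (0, 0)): 0.25,
}
toy_mi = mutual_information(toy_joint)
print(toy_mi)
```

In this toy case each model predicts a single response vector, so H(Ye|M) = 0 and the score equals H(Ye) = 1.5 bits, out of the 2 bits of uncertainty about M.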

However, using mutual information to gauge new experiments presumes that both positive evidence (significant responses) and negative evidence (insignificant responses) are used to discriminate degenerate model configurations. This is not the case, since we constructed the potential functions from significant knockout effects only. For example, suppose two classes of model configurations predict two distinct response vectors Ye and Ye'. They differ only on the first gene: the first component of Ye is -1, while the first component of Ye' is 0. If the actual response under experiment e is Ye, then the two model classes are discriminated because they yield different likelihood values. If the actual response is Ye' instead, then the likelihood values of the two model classes are identical, because the observed response of gene 1 is 0 and evidence pertaining to such insignificant responses is ignored when constructing the likelihood function.
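The asymmetry described above can be sketched with a toy likelihood. This is not the potential function used in the paper; `toy_likelihood` and its `mismatch` penalty are hypothetical, and the only property carried over from the text is that genes with an insignificant observed response contribute no evidence.

```python
def toy_likelihood(predicted, observed, mismatch=0.1):
    """Toy likelihood built from significant observed responses only.

    Assumption for illustration: a gene whose observed response is 0 is
    skipped; otherwise a wrong prediction is penalized by `mismatch`.
    """
    value = 1.0
    for pred, obs in zip(predicted, observed):
        if obs == 0:
            continue  # insignificant observation: no evidence either way
        value *= 1.0 if pred == obs else mismatch
    return value

# Two model classes whose predictions differ only on the first gene.
y_e = (-1, 1, 0)
y_e_prime = (0, 1, 0)

# Actual response equals y_e: the two classes get different likelihoods.
case_a = (toy_likelihood(y_e, y_e), toy_likelihood(y_e_prime, y_e))

# Actual response equals y_e_prime: gene 1 is ignored, likelihoods coincide.
case_b = (toy_likelihood(y_e, y_e_prime), toy_likelihood(y_e_prime, y_e_prime))

print(case_a, case_b)
```

Under this sketch the experiment discriminates the two classes only in the first case, which is why a score that also counts zero responses as evidence overstates the discriminative power.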

To be consistent with the model discrimination procedure we used, the mutual information score should capture only the information about the significant aspect (up- or down-regulation) of the predicted response. To revise the score, let Pe denote the predicted pattern of change/no change under experiment e. For instance, Pe=(0 1 1 1 0) denotes that genes 2, 3 and 4 are changed (up- or down-regulated) and genes 1 and 5 remain unchanged under experiment e.

In addition, let YeP be the predicted response restricted to the genes that have significant predicted responses according to Pe. If Pe=(0 1 1 1 0), then YeP is a three-component vector with entries +/-1, giving the predicted responses of genes 2, 3 and 4. A proper revision of the mutual information score is to condition on the change/no change pattern Pe and then compute the entropy reduction given the predicted response YeP restricted to the significantly changed genes. The information gain is the expected reduction of model entropy over the change/no change patterns. To be precise,

$$I(M; Y_e^P \mid P_e) = \sum_{p} P(P_e = p)\,\bigl[\, H(M \mid P_e = p) - H(M \mid Y_e^P, P_e = p) \,\bigr]$$

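where YeP denotes the predicted response on significantly changed genes consistent with pattern Pe=p. This quantity can be further simplified as

$$I(M; Y_e^P \mid P_e) = H(Y_e) - H(P_e)$$

The derivation is possible since P(Pe|M) and P(Pe,Ye|M) are deterministic: each model configuration fixes both the pattern and the response, so the conditional entropy of the response given the model vanishes, and because Pe is a function of Ye, the joint entropy H(Ye, Pe) equals H(Ye).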
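The revised score can be checked numerically on a small example. The sketch below computes the expected reduction of model entropy over change/no change patterns directly, and separately computes H(Ye) - H(Pe), assuming (as stated above) that each model configuration predicts exactly one response vector. All function names and the toy predictions are hypothetical.

```python
import math
from collections import defaultdict

def entropy(weights):
    """Shannon entropy in bits of a distribution given as unnormalized weights."""
    weights = list(weights)
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

def change_pattern(y):
    """P_e: 1 where the predicted response is significant, 0 otherwise."""
    return tuple(int(v != 0) for v in y)

def restricted(y):
    """Y_e^P: the response restricted to significantly changed genes."""
    return tuple(v for v in y if v != 0)

def expected_entropy_reduction(model_probs, predictions):
    """Sum over p of P(p) * [H(M | p) - H(M | Y_e^P, p)]."""
    by_pattern = defaultdict(list)
    for p_m, y in zip(model_probs, predictions):
        by_pattern[change_pattern(y)].append((p_m, restricted(y)))
    gain = 0.0
    for members in by_pattern.values():
        p_pattern = sum(p for p, _ in members)
        h_m = entropy(p for p, _ in members)  # H(M | P_e = p)
        by_response = defaultdict(list)
        for p, yp in members:
            by_response[yp].append(p)
        # H(M | Y_e^P, P_e = p): average over restricted responses
        h_m_given_y = sum(
            (sum(ps) / p_pattern) * entropy(ps) for ps in by_response.values()
        )
        gain += p_pattern * (h_m - h_m_given_y)
    return gain

def simplified_score(model_probs, predictions):
    """H(Y_e) - H(P_e) under deterministic predictions."""
    p_y, p_p = defaultdict(float), defaultdict(float)
    for p_m, y in zip(model_probs, predictions):
        p_y[tuple(y)] += p_m
        p_p[change_pattern(y)] += p_m
    return entropy(p_y.values()) - entropy(p_p.values())

# Toy example (assumed): four equally likely models, three genes.
probs = [0.25, 0.25, 0.25, 0.25]
preds = [(-1, 1, 0), (1, 1, 0), (0, 1, 0), (0, 1, 0)]
direct = expected_entropy_reduction(probs, preds)
simplified = simplified_score(probs, preds)
print(direct, simplified)
```

Both quantities evaluate to 0.5 bits here, whereas the unrevised score H(Ye) is 1.5 bits; the 1-bit difference is the information carried by the change/no change pattern alone.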

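Ye and Pe are high-dimensional random variables, so their entropies are difficult to evaluate. To simplify the task, we approximated each joint entropy by the sum of the marginal entropies of its single-gene components, where i indexes genes:

$$H(Y_e) \approx \sum_{i} H(Y_{e,i}), \qquad H(P_e) \approx \sum_{i} H(P_{e,i})$$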

To calculate the marginal probabilities of the predicted responses, we augmented the original model with a potential function specifying the conditions of predicting the response of a gene from functional annotations. The marginal probabilities of the predicted responses can then be approximated by running sum-product, another message-passing algorithm, on this augmented model.
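Once per-gene marginals are available, however they were obtained, the marginal-entropy approximation reduces to a sum of per-gene terms. The sketch below assumes each marginal is given as a dictionary over the responses -1, 0 and +1; the function name `approximate_score` and the example marginals are hypothetical, and the sum-product step itself is not reproduced.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a normalized probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def approximate_score(marginals):
    """Approximate H(Y_e) - H(P_e) by a sum of per-gene marginal entropies.

    `marginals` holds one dict per gene mapping -1, 0, +1 to the
    marginal probability of that predicted response.
    """
    score = 0.0
    for m in marginals:
        down, none, up = m.get(-1, 0.0), m.get(0, 0.0), m.get(1, 0.0)
        h_response = entropy_bits([down, none, up])  # H(Y_{e,i})
        h_pattern = entropy_bits([down + up, none])  # H(P_{e,i})
        score += h_response - h_pattern
    return score

# Assumed marginals for three genes (illustrative values only).
example_marginals = [
    {-1: 0.25, 0: 0.5, 1: 0.25},  # gene 1: direction is uncertain
    {1: 1.0},                     # gene 2: always up-regulated
    {0: 1.0},                     # gene 3: never changes
]
example_score = approximate_score(example_marginals)
print(example_score)
```

Only gene 1 contributes in this example (0.5 bits), since a gene adds to the score only when the direction of its change, given that it changes, is uncertain across model configurations.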