Supplemental Results
Yeang CH*, Mak HC*, McCuine S, Workman C, Jaakkola T, Ideker TI
Genome Biology (2005) 6:R62


Computer Science and Artificial Intelligence Laboratory / Laboratory for Integrative Network Biology, MIT / UC San Diego

Observations of the Reproducibility of Knockout Data

We assessed the reproducibility of gene expression experiments by comparing the swi4Δ and gcn4Δ data of Hughes et al. (Rosetta) with the same knockouts profiled under our experimental settings. We measured the agreement between the two data sets with several metrics: the Pearson correlation coefficient between the two expression profiles and its permutation p-value, the rank correlation coefficient between the two profiles and its permutation p-value, and the hyper-geometric p-value of the overlap of up- and down-regulated genes. The rank correlation coefficient was computed by replacing each expression profile (a vector of expression log10 ratios) with the vector of its ranks. The permutation p-value of a correlation coefficient was computed by fixing one profile and randomly permuting the entries of the other.

The hyper-geometric p-value evaluated the significance of the overlap of significant changes between two data sets. Suppose that N1 of N genes are up- or down-regulated in the Rosetta swi4Δ data, N2 genes are up- or down-regulated in our new swi4Δ data, and n genes are significantly changed in both data sets (in the same direction). The p-value is the probability that, when N2 genes are drawn at random from a pool of N genes of which N1 are significantly changed, at least n of the drawn genes are significantly changed. N1 and N2 were then swapped to obtain a second p-value, and the larger of the two values was reported.
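The metrics above can be sketched in Python as follows. This is a minimal illustration, not the code used for the study, and it rests on several assumptions: profiles are NumPy arrays of log10 ratios over the same genes in the same order, rank ties are broken by position rather than averaged, the permutation test is one-sided (the fraction of shuffles whose statistic is at least the observed value), and 10,000 permutations are used by default. The hyper-geometric tail is the standard sum P(X ≥ n) = Σ C(N1, k) C(N − N1, N2 − k) / C(N, N2) over k from n to min(N1, N2), evaluated with exact integers via `math.comb`. All function names are hypothetical.

```python
import math

import numpy as np


def pearson(x, y):
    """Pearson correlation between two expression profiles (log10 ratios)."""
    return float(np.corrcoef(x, y)[0, 1])


def rank_correlation(x, y):
    """Pearson correlation of the rank vectors of two profiles.

    Assumption: ties are broken by position (argsort of argsort) rather than
    assigned averaged ranks.
    """
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)


def permutation_pvalue(x, y, stat=pearson, n_perm=10_000, seed=0):
    """Permutation p-value: keep x fixed, shuffle y, recompute the statistic.

    Assumption: one-sided test, i.e. the fraction of shuffles whose statistic
    is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = stat(x, y)
    hits = sum(stat(x, rng.permutation(y)) >= observed for _ in range(n_perm))
    return float(hits / n_perm)


def hypergeom_tail(N, N1, N2, n):
    """P(at least n of N2 genes drawn from N fall among the N1 changed genes)."""
    favourable = sum(
        math.comb(N1, k) * math.comb(N - N1, N2 - k)
        for k in range(n, min(N1, N2) + 1)
    )
    return favourable / math.comb(N, N2)  # exact integers, one final division


def overlap_pvalue(N, N1, N2, n):
    """Evaluate the tail with N1 and N2 in both roles and report the larger."""
    return max(hypergeom_tail(N, N1, N2, n), hypergeom_tail(N, N2, N1, n))
```

In this sketch the two orderings inside `overlap_pvalue` give the same value, because the hyper-geometric tail is symmetric in N1 and N2; the function therefore mainly mirrors the reporting convention described above.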



Table S2: Correlations between swi4Δ and gcn4Δ data from Rosetta and the new experiments.

Genome-wide comparison     swi4Δ total     gcn4Δ total
correlation                -0.013          0.34
correlation p-value        0.84            < 10⁻⁴
rank correlation           -0.058          0.145
rank correlation p-value   1.0             < 10⁻⁴
hyper-geometric p-value    0.188           4.88×10⁻²¹

Restricted subsets         swi4Δ subset    gcn4Δ subset
correlation                0.344           0.831
correlation p-value        8×10⁻⁵          < 10⁻⁵
rank correlation           0.353           0.705
rank correlation p-value   9×10⁻⁵          < 10⁻⁵
hyper-geometric p-value    1.05×10⁻⁵       < 1.8×10⁻¹³




Genome-wide, the two swi4Δ profiles were essentially uncorrelated, whereas the two gcn4Δ profiles were significantly correlated (upper part of Table S2). The lack of correlation for the swi4Δ experiments could partly reflect uncontrollable differences between our array procedure and Rosetta's, such as the microarray fabrication process. However, we suspected that comparing expression profiles across the entire genome was also problematic, because most genes are unaffected by a single deletion. For those genes, the log10 ratios relative to wild type are essentially background noise, so a genome-wide correlation largely measures the correlation between two noise vectors and is expected to be low. Because Gcn4 is a master regulator of many genes involved in amino acid synthesis, the gcn4Δ deletion induced many changes that rise above the random fluctuations, yielding a higher genome-wide correlation than for swi4Δ.
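To illustrate the background-noise argument, the following hypothetical simulation (every parameter is invented for illustration and is not fitted to the Rosetta or new data) generates two replicate knockout profiles in which only a small fraction of genes respond to the deletion. Each gene also receives independent measurement noise in each experiment, so thousands of unaffected genes dilute the genome-wide correlation, while the correlation over the affected genes alone stays high. The exact values depend on the random seed.

```python
import numpy as np


def simulate_knockout_pair(n_genes=6000, n_affected=100, effect=0.5,
                           noise=0.3, seed=0):
    """Two hypothetical replicate profiles of log10 ratios versus wild type.

    The first n_affected genes share a true response of +/- effect; every
    gene also receives independent Gaussian noise in each experiment.
    """
    rng = np.random.default_rng(seed)
    truth = np.zeros(n_genes)
    truth[:n_affected] = rng.choice([-effect, effect], size=n_affected)
    exp_a = truth + rng.normal(scale=noise, size=n_genes)
    exp_b = truth + rng.normal(scale=noise, size=n_genes)
    return exp_a, exp_b, n_affected


a, b, k = simulate_knockout_pair()
r_genome = np.corrcoef(a, b)[0, 1]          # mostly noise against noise
r_subset = np.corrcoef(a[:k], b[:k])[0, 1]  # only the genes that truly respond
print(f"genome-wide r = {r_genome:.3f}; affected-gene r = {r_subset:.3f}")
```

Raising `n_affected` toward a master-regulator scale (many responding genes) pushes the genome-wide coefficient up, which is the contrast drawn above between swi4Δ and gcn4Δ.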

We then restricted the expression profiles of each experiment to the genes putatively regulated by Swi4 or Gcn4. For Swi4, we chose the genes that are directly bound by Swi4 according to the ChIP-chip data (Lee et al. 2002) or that appear in model #1; for Gcn4, we chose the genes that are bound by Gcn4 and significantly down-regulated in the Rosetta gcn4Δ data. The selected gene sets are listed in Table S3, and the lower part of Table S2 shows the comparison results on these restricted sets. On the restricted sets, both correlation coefficients were substantially higher, and their permutation p-values much smaller, than in the genome-wide comparisons.
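The two selection rules differ: the Swi4 set is a union (bound in ChIP-chip data or present in model #1), whereas the Gcn4 set is an intersection (bound and significantly down-regulated in the Rosetta gcn4Δ data). The sketch below expresses that set logic and the restricted correlation. It assumes hypothetical inputs given as collections of gene names and dictionaries mapping gene name to log10 ratio; the function names are illustrative only, and skipping genes missing from either profile is an assumption of this sketch.

```python
import numpy as np


def swi4_subset(chip_bound, model_genes):
    """Union: genes bound by Swi4 in ChIP-chip data or appearing in the model."""
    return set(chip_bound) | set(model_genes)


def gcn4_subset(chip_bound, down_regulated):
    """Intersection: genes bound by Gcn4 and significantly down-regulated."""
    return set(chip_bound) & set(down_regulated)


def restricted_correlation(profile_a, profile_b, subset):
    """Pearson correlation of two profiles (gene -> log10 ratio) over a subset.

    Genes absent from either profile are skipped (assumption of this sketch).
    """
    genes = sorted(set(subset) & set(profile_a) & set(profile_b))
    a = np.array([profile_a[g] for g in genes])
    b = np.array([profile_b[g] for g in genes])
    return float(np.corrcoef(a, b)[0, 1])
```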



Table S3: Restricted gene subsets used to evaluate reproducibility.

Swi4 subset

AAP1
ATR1
BAT2
CCP1
CLB1
CLB2
CLB6
COX9
CPA2
CSI2
CUP9
ECM13
ECM33
ERG6
EXG1
HAP1
HAP4
HSC82
HTA1
HTB1
ISU2
LYS20
MNN1
MSN4
NCE102
PCL7
PDR12
PRY2
QCR2
RNR1
RPI1
RPL19B
SAT2
SCW10
SGA1
SNQ2
SOK2
SRL1
SVS1
SWI4
TRP4
UTR2
YAP6
YBL029W
YBL112C
YBL113C
YDR451C
YDR545W
YEL045C
YEL047C
YEL077C
YER045C
YER189W
YER190W
YFR006W
YGR086C
YGR153W
YGR296W
YHL049C
YHR048W
YIL056W
YIL177C
YJL217W
YJL225C
YJR078W
YKL051W
YLR084C
YLR463C
YLR465C
YLR466W
YLR467W
YML133C
YNL339C
YNR067C
YOL011W
YOR248W
YOR315W
YOX1
YPL267W
YPL283C
YPR203W
YPS4


Gcn4 subset

ADH5
APG1
ARG1
ARG11
ARG3
ARG7
ARG8
ARO1
ARO3
CPA2
GCN4
HAD1
HAL2
HIS1
HIS4
HOM3
LEU4
MET16
PCL5
RIB5
TRP2
TRP3
UGA3
YER069W
YER073W
YGL184C
YHM1
YHR122W
YHR162W
YJL200C
YLR152C
YMC1
YOL119C