Cytoscape 2.6.2 (c) 2006,2007 ISB, MSKCC, UCSD

cytoscape.data
Class ExpressionData

java.lang.Object
  extended by cytoscape.data.ExpressionData
All Implemented Interfaces:
Serializable

public class ExpressionData
extends Object
implements Serializable

This class provides a reader for the common file format for expression data and an interface to access the data.

The file format has several variants; this reader assumes the following. A file that violates any of these assumptions is not guaranteed to load correctly.

1. A token is a consecutive sequence of alphanumeric characters separated by whitespace.
2. The file consists of an arbitrary number of lines, each of which contains the same number of tokens (except for possibly the first line) and has a total length less than 8193 characters.
3. The first line of the file is a header line with one of the following three formats:

cond1 cond2 ... condN cond1 cond2 ... condN NumSigConds

cond1 cond2 ... condN

<\t><\t>RATIOS<\t><\t>...LAMBDAS

Here cond1 through condN are the names of the conditions. In the first case, the two sequences of condition names must match exactly in order and lexicographically; each name among cond1 ... condN must be unique. In the second case, each name must be unique, but need only appear once. The last label, NumSigConds, is optional.
The third case is the standard header for an MTX file. The number of '\t' characters between the words "RATIOS" and "LAMBDAS" is equal to the number of ratio columns in the file (which must be equal to the number of lambda columns).

4. Each successive line represents the measurements for a particular gene, and has one of the following two formats, depending on the header:

<FNAME> <CNAME> E E ... E S S ... S I

<FNAME> <CNAME> E E ... E

where <FNAME> is the formal name of the gene, <CNAME> is the common name, the E's are tokens, parsable as doubles, representing the expression level change for each condition, the S's are tokens, parsable as doubles, representing the statistical significance of the expression level change, and I is an optional integer giving the number of conditions in which the expression level change was significant for this gene.

The first format is used in conjunction with the first or third header formats. The second format is used in conjunction with the second header format.

5. An optional last line can be included with the following form:

NumSigGenes: I I ... I

where there are N I's, each an integer representing the number of significant genes in that condition.
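
The data-line layout in rule 4 can be sketched in plain Java. This is a minimal illustration, not the parser used by ExpressionData: the class DataLineSketch, its parse method, and the Row holder are hypothetical names. It assumes that each data line begins with two name tokens (formal name, then common name) and that the number of conditions N is already known from the header.

```java
/** Hypothetical sketch of the rule 4 data-line layout; not ExpressionData's parser. */
final class DataLineSketch {

    /** Illustrative holder for one parsed gene row. */
    static final class Row {
        final String formalName;
        final String commonName;
        final double[] expression;    // the E tokens, one per condition
        final double[] significance;  // the S tokens; empty in the short format
        final Integer numSigConds;    // the optional trailing I; null when absent

        Row(String formalName, String commonName, double[] expression,
            double[] significance, Integer numSigConds) {
            this.formalName = formalName;
            this.commonName = commonName;
            this.expression = expression;
            this.significance = significance;
            this.numSigConds = numSigConds;
        }
    }

    /**
     * Parses one data line for a file with n conditions (n >= 1).
     * Assumption: the line starts with two name tokens.
     */
    static Row parse(String line, int n) {
        String[] t = line.trim().split("\\s+");
        int shortForm = 2 + n;       // names + E's
        int longForm = 2 + 2 * n;    // names + E's + S's
        boolean hasSig = t.length == longForm || t.length == longForm + 1;
        if (t.length != shortForm && !hasSig) {
            throw new IllegalArgumentException("Unexpected token count: " + t.length);
        }
        double[] e = new double[n];
        for (int i = 0; i < n; i++) {
            e[i] = Double.parseDouble(t[2 + i]);
        }
        double[] s = new double[hasSig ? n : 0];
        for (int i = 0; i < s.length; i++) {
            s[i] = Double.parseDouble(t[2 + n + i]);
        }
        // The trailing integer is present only when there is one extra token.
        Integer count = (t.length == longForm + 1) ? Integer.valueOf(t[longForm]) : null;
        return new Row(t[0], t[1], e, s, count);
    }
}
```

For example, DataLineSketch.parse("G0001 abc 0.5 -1.25 0.01 0.2 1", 2) yields two expression values, two significance values, and a significant-condition count of 1, while a line with only the two E tokens yields an empty significance array and a null count.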

See Also:
Serialized Form

Field Summary
static int LAMBDA
          Significance value: LAMBDA.
static int MAX_LINE_SIZE
           
static int NONE
          Significance value: NONE.
static int PVAL
          Significance value: PVAL.
static int UNKNOWN
          Significance value: UNKNOWN.
 
Constructor Summary
ExpressionData()
          Constructor.
ExpressionData(String filename)
          Constructor.
ExpressionData(String filename, String keyAttributeName)
          Constructor.
ExpressionData(String filename, String keyAttributeName, TaskMonitor taskMonitor)
          Constructor.
ExpressionData(String filename, TaskMonitor taskMonitor)
          Constructor.
 
Method Summary
 void convertLambdasToPvals()
          Converts all lambdas to p-values.
 void copyToAttribs(CyAttributes nodeAttribs, TaskMonitor taskMonitor)
          Copies ExpressionData data structure into CyAttributes data structure.
 Vector getAllMeasurements()
          Gets all Measurements.
 int getConditionIndex(String condition)
          Gets the index value of the specified experimental condition.
 String[] getConditionNames()
          Gets an Array of All Experimental Conditions.
 String getDescription()
          Returns a text description of this data object.
 double[][] getExtremeValues()
          Returns a 2D Matrix of Extreme Values.
 String getFileName()
          Gets the Name of the Expression Data File.
 File getFullPath()
          Gets the File representation of the Expression Data Object.
 String getGeneDescriptor(String gene)
          Gets the Gene Descriptor for the specified gene.
 String[] getGeneDescriptors()
          Gets an Array of GeneDescriptors.
 Vector getGeneDescriptorsVector()
          Gets a Vector of Gene Descriptors.
 String[] getGeneNames()
          Gets a List of All Gene Names.
 Vector getGeneNamesVector()
          Gets a List of All Gene Names.
 mRNAMeasurement getMeasurement(String gene, String condition)
          Gets Single Measurement Value for the specified gene at the specified condition.
 Vector getMeasurements(String gene)
          Gets a Vector of all Measurements associated with the specified gene.
 int getNumberOfConditions()
          Gets Total Number of Experimental Conditions.
 int getNumberOfGenes()
          Gets Total Number of Genes.
static double getPvalueFromLambda(double lambda)
          Gets the p-value corresponding to the specified lambda value.
 int getSignificanceType()
          Gets the Significance Type.
 boolean hasSignificanceValues()
          Indicates whether the expression data has significance values.
 boolean loadData(String filename, String keyAttributeName)
          Loads the Specified File into memory.
 void setGeneDescriptors(Vector newDescripts)
          Sets a List of Gene Descriptors.
 void setGeneNames(Vector newNames)
          Sets a List of Gene Names.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_LINE_SIZE

public static final int MAX_LINE_SIZE
See Also:
Constant Field Values

PVAL

public static final int PVAL
Significance value: PVAL.

See Also:
Constant Field Values

LAMBDA

public static final int LAMBDA
Significance value: LAMBDA.

See Also:
Constant Field Values

NONE

public static final int NONE
Significance value: NONE.

See Also:
Constant Field Values

UNKNOWN

public static final int UNKNOWN
Significance value: UNKNOWN.

See Also:
Constant Field Values

Constructor Detail

ExpressionData

public ExpressionData()
Constructor. Creates an empty Expression Data object with no data.


ExpressionData

public ExpressionData(String filename)
               throws IOException
Constructor. Loads the specified filename into memory.

Parameters:
filename - Name of Expression Data File.
Throws:
IOException - Error opening/parsing the expression data file.

ExpressionData

public ExpressionData(String filename,
                      String keyAttributeName)
               throws IOException
Constructor. Loads the specified filename into memory.

Parameters:
filename - Name of Expression Data File.
keyAttributeName - Identifies an attribute to use in mapping the data to the nodes.
Throws:
IOException - Error opening/parsing the expression data file.

ExpressionData

public ExpressionData(String filename,
                      TaskMonitor taskMonitor)
               throws IOException
Constructor. Loads the specified file into memory, and reports its progress to the specified TaskMonitor Object. This option is useful for displaying a progress bar to the end-user, while expression data is being parsed.

Parameters:
filename - Name of Expression Data File.
taskMonitor - TaskMonitor for reporting/monitoring progress.
Throws:
IOException - Error opening/parsing the expression data file.

ExpressionData

public ExpressionData(String filename,
                      String keyAttributeName,
                      TaskMonitor taskMonitor)
               throws IOException
Constructor. Loads the specified file into memory, and reports its progress to the specified TaskMonitor Object. This option is useful for displaying a progress bar to the end-user, while expression data is being parsed.

Parameters:
filename - Name of Expression Data File.
keyAttributeName - Identifies an attribute to use in mapping the data to the nodes.
taskMonitor - TaskMonitor for reporting/monitoring progress.
Throws:
IOException - Error opening/parsing the expression data file.

Method Detail

getFileName

public String getFileName()
Gets the Name of the Expression Data File.

Returns:
File String, as it was originally passed to the constructor, or null, if no filename is available.

getFullPath

public File getFullPath()
Gets the File representation of the Expression Data Object. This File object can be queried for a full path to the file, etc.

Returns:
File Object.

loadData

public boolean loadData(String filename,
                        String keyAttributeName)
                 throws IOException
Loads the Specified File into memory.

Parameters:
filename - Name of Expression Data File.
keyAttributeName - Identifies an attribute to use in mapping the data to the nodes.
Returns:
always returns true, indicating a successful load.
Throws:
IOException - Error loading / parsing the Expression Data File.

convertLambdasToPvals

public void convertLambdasToPvals()
Converts all lambdas to p-values. Lambdas are lost after this call.


getPvalueFromLambda

public static double getPvalueFromLambda(double lambda)
Gets the p-value corresponding to the specified lambda value.

Returns:
a very close approximation of the pvalue that corresponds to the given lambda value

getSignificanceType

public int getSignificanceType()
Gets the Significance Type.

Returns:
one of NONE, UNKNOWN, PVAL, LAMBDA

getDescription

public String getDescription()
Returns a text description of this data object.

Returns:
Text Description of this Data Object.

setGeneNames

public void setGeneNames(Vector newNames)
Sets a List of Gene Names. This clobbers the old list of gene names, if it exists.

Parameters:
newNames - Vector of String Objects.

getGeneDescriptors

public String[] getGeneDescriptors()
Gets an Array of GeneDescriptors.

Returns:
Array of String Objects.

getGeneDescriptorsVector

public Vector getGeneDescriptorsVector()
Gets a Vector of Gene Descriptors.

Returns:
Vector of String Objects.

setGeneDescriptors

public void setGeneDescriptors(Vector newDescripts)
Sets a List of Gene Descriptors. This clobbers the old list of gene descriptors, if it exists.

Parameters:
newDescripts - Vector of String Objects.

getConditionNames

public String[] getConditionNames()
Gets an Array of All Experimental Conditions.

Returns:
Array of String Objects.

getConditionIndex

public int getConditionIndex(String condition)
Gets the index value of the specified experimental condition.

Parameters:
condition - Name of experimental condition.
Returns:
index value of the specified experimental condition.

getGeneDescriptor

public String getGeneDescriptor(String gene)
Gets the Gene Descriptor for the specified gene.

Parameters:
gene - Gene Name.
Returns:
Gene Descriptor String.

hasSignificanceValues

public boolean hasSignificanceValues()
Indicates whether the expression data has significance values.

Returns:
true or false.

getAllMeasurements

public Vector getAllMeasurements()
Gets all Measurements.

Returns:
A Vector of Vectors. The embedded Vector contains mRNAMeasurement Objects.

getGeneNames

public String[] getGeneNames()
Gets a List of All Gene Names. Same as getGeneNamesVector(), except this method returns an Array of String Objects.

Returns:
Array of Strings.

getGeneNamesVector

public Vector getGeneNamesVector()
Gets a List of All Gene Names. Same as getGeneNames(), except this method returns a Vector of String Objects.

Returns:
Vector of String Objects.
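
The relationship between the array view and the Vector view can be illustrated with a small standalone helper. This is only a sketch: GeneNameViewsSketch and its toArray method are hypothetical names, and the code does not reflect how ExpressionData stores or converts its names internally. It takes a raw Vector, matching the documented return type, and assumes every element is a String.

```java
import java.util.Vector;

/** Hypothetical helper; illustrates copying a raw Vector of Strings into a String[]. */
final class GeneNameViewsSketch {

    /**
     * Copies the contents of a raw Vector into a new String array.
     * Assumption: every element is a String; otherwise an
     * ArrayStoreException is thrown by Vector.toArray.
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String[] toArray(Vector names) {
        // With a raw Vector the call returns Object[], so a cast is needed;
        // the runtime array is a String[] because of the argument passed in.
        return (String[]) names.toArray(new String[0]);
    }
}
```

The returned array is a copy, so later changes to the Vector do not affect it.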

getNumberOfConditions

public int getNumberOfConditions()
Gets Total Number of Experimental Conditions. This corresponds to the number of condition columns in the original expression data file.

Returns:
total number of experimental conditions.

getNumberOfGenes

public int getNumberOfGenes()
Gets Total Number of Genes. This corresponds to the number of rows of data in the original expression data file.

Returns:
total number of genes.

getExtremeValues

public double[][] getExtremeValues()
Returns a 2D Matrix of Extreme Values. The matrix is set to the following:

 

maxVals[0][0] = minExp;
maxVals[0][1] = maxExp;
maxVals[1][0] = minSig;
maxVals[1][1] = maxSig;

Returns:
a 2D Matrix of double values.
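
As a rough illustration of this matrix layout, the sketch below builds a 2x2 array from two plain arrays of values. It assumes the intended layout is row 0 for expression (min, max) and row 1 for significance (min, max). ExtremeValuesSketch and its methods are hypothetical names, and this is not the code ExpressionData uses to track its extremes.

```java
/** Hypothetical sketch of a 2x2 extremes matrix: row 0 = expression, row 1 = significance. */
final class ExtremeValuesSketch {

    /** Returns {min, max} of the values; stays at +/- infinity for an empty array. */
    private static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new double[] {min, max};
    }

    /** Builds the matrix: [0][0]=minExp, [0][1]=maxExp, [1][0]=minSig, [1][1]=maxSig. */
    static double[][] extremes(double[] expression, double[] significance) {
        return new double[][] {minMax(expression), minMax(significance)};
    }
}
```

A caller could then read the expression range as m[0][0] to m[0][1] and the significance range as m[1][0] to m[1][1].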

getMeasurements

public Vector getMeasurements(String gene)
Gets a Vector of all Measurements associated with the specified gene.

Parameters:
gene - Gene Name.
Returns:
Vector of mRNAMeasurement Objects.

getMeasurement

public mRNAMeasurement getMeasurement(String gene,
                                      String condition)
Gets Single Measurement Value for the specified gene at the specified condition.

Parameters:
gene - Gene Name.
condition - Condition Name (corresponds to a column heading in the original expression data file).
Returns:
an mRNAMeasurement Object.

copyToAttribs

public void copyToAttribs(CyAttributes nodeAttribs,
                          TaskMonitor taskMonitor)
Copies ExpressionData data structure into CyAttributes data structure.

Parameters:
nodeAttribs - Node Attributes Object.
taskMonitor - Task Monitor. Can be null.

www.cytoscape.org