Methods for Computational Gene Prediction
 



COURSE MATERIALS





Slides: PDF / PowerPoint

IMPORTANT: You will need some extra fonts if you download the PowerPoint slides.


PowerPoint
PDF
Overview of gene prediction
HMM's (part I)
HMM's (part II)
Feature Sensing
GHMM's
Comparative gene prediction
PhyloHMM's
SCFG's for noncoding gene prediction
Machine learning
Conditional Random Fields (CRF's) for gene prediction
Multiple Sequence Alignment
Higher-order PhyloHMM's

coming soon


coming soon










Additional exercises


[coming soon]



click here to suggest additional exercises








Data sets


Synthetic data:

G. simplicans data from chapter 5:

NOTE TO INSTRUCTORS: You can generate your own synthetic data using this script. It generates separate training and test sets that share the same codon frequencies, signal weight matrices, and GC% (these biases are randomly generated anew on each run of the program). Exon, intron, and intergenic length distributions will be similar to those of the data sets used in the book (G. simplicans, above).
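For readers who want a feel for what "same biases, separate training and test sets" means, here is a minimal Python sketch. It is not the script linked above, and it is much simpler: it draws a random GC% and random codon frequencies once, then uses them for both sets. It leaves out introns, signal weight matrices, and the book's length distributions. All function names, the GC range, and the length ranges are assumptions made for illustration only.

```python
import random

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def random_biases(rng):
    """Draw one set of biases: a GC fraction and sense-codon frequencies."""
    gc = rng.uniform(0.35, 0.65)  # hypothetical range, not taken from the book
    codons = [a + b + c for a in BASES for b in BASES for c in BASES
              if a + b + c not in STOP_CODONS]
    weights = [rng.random() for _ in codons]
    total = sum(weights)
    return gc, codons, [w / total for w in weights]

def random_noncoding(rng, gc, length):
    """Emit noncoding sequence with the requested GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices(BASES, weights=weights, k=length))

def random_cds(rng, codons, freqs, n_codons):
    """Emit ATG + codons drawn from the shared frequencies + a stop codon."""
    body = rng.choices(codons, weights=freqs, k=n_codons)
    return "ATG" + "".join(body) + rng.choice(sorted(STOP_CODONS))

def make_set(rng, biases, n_seqs):
    """Build (sequence, cds_start, cds_end) records, one single-exon gene each."""
    gc, codons, freqs = biases
    records = []
    for _ in range(n_seqs):
        left = random_noncoding(rng, gc, rng.randint(50, 200))   # assumed lengths
        cds = random_cds(rng, codons, freqs, rng.randint(30, 100))
        right = random_noncoding(rng, gc, rng.randint(50, 200))
        records.append((left + cds + right, len(left), len(left) + len(cds)))
    return records

rng = random.Random(0)            # fixed seed only so the sketch is repeatable
biases = random_biases(rng)       # drawn once per run ...
train = make_set(rng, biases, 20) # ... and shared by the training set
test = make_set(rng, biases, 5)   # ... and the test set
```

Because the biases are drawn once and passed to both calls of `make_set`, a predictor trained on `train` sees the same codon usage and GC% it will later meet in `test`, while the individual sequences are independent.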


Real data:

FASTA and GFF files from various organisms (human, mouse, mosquito, rice, and others) can be found here






