Saturday, January 20, 2007
Learning the language of gene expression
Researchers have taken a major step towards understanding the language of gene regulation in the fruit fly Drosophila, and they expect the technique to be rapidly applicable to understanding the effects of genome variation in humans.
The new research, published today in PLoS Computational Biology, is a major advance in using computers to detect the regions in DNA that control the activity of genes. Studies on single genes have shown that variation in gene regulation can be important in disease. The new program, called NestedMICA, allows researchers to find many regulatory regions, which will become a new focus for disease understanding.
The team, from the Wellcome Trust Sanger Institute and The University of Manchester, took slices of genome sequence from next to each Drosophila gene - where the highest concentration of regulatory signals is thought to lie - and fed them into the new computer program, which looks for patterns shared between the sequences. The search process is similar to looking for words in a sentence where the vocabulary of the language is unknown.
"Most words in the language of gene regulation can be spelled more than one way," explained Dr Thomas Down, first author on the report. "In English, you might see people writing either 'analyse' or 'analyze'. In genomes, such variation - or even bigger differences - seems to be normal.
"So we can't just count words, we need to recognize alternative spellings."
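The "alternative spellings" idea can be made concrete with a position weight matrix, a standard way of describing a motif as per-position base preferences rather than one exact word. The sketch below is only an illustration of that general idea and is not NestedMICA's algorithm; the example sites, the function names and the pseudocount are all assumptions made up for this post.

```python
import math
from collections import Counter


def build_pwm(sites, pseudocount=1.0, alphabet="ACGT"):
    """Build per-position log-odds scores (vs. a uniform background)
    from a list of equal-length example 'spellings' of one motif."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        column = {}
        for base in alphabet:
            prob = (counts[base] + pseudocount) / total
            column[base] = math.log2(prob / background)
        pwm.append(column)
    return pwm


def score(pwm, word):
    """Sum the per-position log-odds for one candidate word."""
    return sum(column[base] for column, base in zip(pwm, word))


def best_hit(pwm, sequence):
    """Slide the matrix along a sequence; return (best score, start index)."""
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max((score(pwm, sequence[i:i + width]), i) for i in windows)


# Hypothetical variant spellings of one regulatory 'word'.
example_sites = ["TATAAA", "TATATA", "TATAAG", "TATAAA"]
pwm = build_pwm(example_sites)
```

With a matrix like this, `score(pwm, "TATATA")` is still clearly positive even though it differs from the most common spelling, while an unrelated word such as `"GCGCGC"` scores negative. Counting exact words would have treated the two spellings as unrelated.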
The team, which includes Dr Casey Bergman from Manchester's Faculty of Life Sciences, has so far found 120 'words' - distinct examples of regions that might regulate genes. About 30 of these were known from many years of studying how individual Drosophila genes are controlled, but most are novel. This is a major step towards understanding the language of gene regulation in an important model organism, and proof of principle of a new technology that will speed the study of regulatory elements in the human genome. Drosophila is a well-studied organism and shares 48% of its 14,000 genes with humans.
Research emerging in the past few months suggests that variation in the sequence of regulatory regions will affect susceptibility to many diseases. A few cases are already known - one form of thalassaemia is caused by a regulatory sequence variant - but knowledge of regulatory elements in the human genome is limited: scientists have only scratched the surface.
Systematic annotation of regulatory regions in the human genome will be very important if researchers are going to understand the effects of all sequence variation.
Dr Tim Hubbard, senior author on the report, explained: "While others have tried to identify these control regions before, they have had to try to align lots of sequences. Our new method doesn't depend on alignment, an advantage because the new program is robust to rapidly evolving sequences.
"The new method also doesn't require prior knowledge from, say, looking at known examples, and can search for hundreds of different motifs at once."
As science should, the work makes predictions that the team is testing. Using a set of excellent, publicly available data on gene activity from the University of California-Berkeley and Lawrence Berkeley National Laboratory, they have predicted what some of the newly discovered sequences might mean in the language of gene regulation.
Computer analysis can accelerate the search for important regions in genomes, but the authors emphasize that computer predictions must always be examined experimentally. The new program's findings in Drosophila have been validated by comparing them against results from experimental imaging.
The results of the research, a set of Drosophila sequence motifs, are freely available from a database at the Sanger Institute. Like many tools developed at the Sanger Institute, NestedMICA is open source software, freely available for anyone to download, run and modify. (Source: University of Manchester)
Based on the open access/free paper:
Large-Scale Discovery of Promoter Motifs in Drosophila melanogaster
Thomas A. Down, Casey M. Bergman, Jing Su, Tim J. P. Hubbard
A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.
...Functional binding sites are likely to be subject to purifying selection and thus should exhibit a reduced rate of sequence evolution. This is based both on the observation of increased levels of conservation in known TFBSs relative to their background sequences and the intuition that losing elements responsible for gene regulation may often be deleterious. Of course this does not mean that all regulatory elements are under strict purifying selection, and indeed there are good examples of divergence in regulatory element function, as well as conservation of regulatory function with underlying binding site turnover at the sequence level. Nevertheless, increased conservation of predicted TFBSs provides evidence for functional constraint.
To test whether motifs in our set show signatures of evolutionary constraint among Drosophila species, we studied patterns of motif conservation in a large set of orthologous non protein-coding alignments...
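As a rough illustration of the conservation argument in the excerpt, the sketch below compares how often two aligned sequences agree inside predicted motif sites versus everywhere else. It is a toy under stated assumptions, not the paper's actual test: the two aligned strings, the site coordinates and the function name are invented for the example, and gaps are simply counted as mismatches.

```python
def identity_inside_and_outside(aln_a, aln_b, sites):
    """Return (identity within sites, identity outside sites) for a
    pairwise alignment given as two equal-length gapped strings.
    `sites` is a list of (start, end) half-open column ranges."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    in_site = [False] * len(aln_a)
    for start, end in sites:
        for i in range(start, end):
            in_site[i] = True

    same = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for a, b, flag in zip(aln_a, aln_b, in_site):
        total[flag] += 1
        if a == b and a != "-":  # a gap in either sequence is a mismatch
            same[flag] += 1
    if total[True] == 0 or total[False] == 0:
        raise ValueError("need at least one site and one background column")
    return same[True] / total[True], same[False] / total[False]


# Hypothetical aligned promoter fragments from two species,
# with one predicted motif site at columns 4..9.
species_a = "ACGTTATAAAGGCTA"
species_b = "ATGATATAAAGCCT-"
site_id, background_id = identity_inside_and_outside(
    species_a, species_b, [(4, 10)]
)
```

If predicted sites are under functional constraint, the first number should tend to be higher than the second across many genes. A single toy alignment like this one proves nothing on its own.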
1) From the journal Molecular Biology and Evolution:
Common Pattern of Evolution of Gene Expression Level and Protein Sequence in Drosophila
Sequence divergence scaled by variation within species has been used to infer the action of selection upon individual genes. Applying this approach to expression, we compared whole-genome whole-body RNA levels in 10 heterozygous Drosophila simulans genotypes and a pooled sample of 10 D. melanogaster lines using Affymetrix Genechip. For 972 genes expressed in D. melanogaster, the transcript level was below detection threshold in D. simulans, which may be explained either by sequence divergence between the primers on the chip and the mRNA transcripts or by down-regulation of these genes. Out of 6,707 genes that were expressed in both species, transcript level was significantly different between species for 534 genes (at P < 0.001). Genes whose expression is under stabilizing selection should exhibit reduced genetic variation within species and reduced divergence between species. Expression of genes under directional selection in D. simulans should be highly divergent from D. melanogaster, while showing low genetic variation in D. simulans. Finally, the genes with large variation within species but modest divergence between species are candidates for balancing selection. Rapidly diverging, low-polymorphism genes included those involved in reproduction (e.g., Mst 3Ba, 98Cb; Acps 26Aa, 63F; and sperm-specific dynein). Genes with high variation in transcript abundance within species included metallothionein and hairless, both hypothesized to be segregating in nature because of gene-by-environment interactions. Further, we compared expression divergence and DNA substitution rate in 195 genes. Synonymous substitution rate and expression divergences were uncorrelated, whereas there was a significant positive correlation between nonsynonymous substitution rate and expression divergence. We hypothesize that as a substantial fraction of nonsynonymous divergence has been shown to be adaptive, much of the observed expression divergence is likewise adaptive.
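The three-way reasoning in this abstract (stabilizing, directional or balancing selection, depending on variation within a species and divergence between species) can be written down as a small decision rule. The sketch below is only a reading aid: the cut-off values, the expression numbers and the function name are hypothetical, and the paper itself relies on statistical tests rather than fixed thresholds.

```python
from statistics import mean, variance


def classify_expression(sim_levels, mel_level, var_cut, div_cut):
    """Label one gene from within-species variance (D. simulans lines)
    and between-species divergence (vs. a pooled D. melanogaster level).
    var_cut and div_cut are arbitrary, user-chosen thresholds."""
    within = variance(sim_levels)               # sample variance across lines
    between = abs(mean(sim_levels) - mel_level)
    low_variation = within < var_cut
    low_divergence = between < div_cut
    if low_variation and low_divergence:
        return "stabilizing"
    if low_variation and not low_divergence:
        return "directional"
    if not low_variation and low_divergence:
        return "balancing"
    return "unclassified"  # high variation and high divergence


# Hypothetical log-scale expression levels for three genes.
gene_a = classify_expression([5.0, 5.1, 4.9, 5.0], 5.05, 0.1, 1.0)
gene_b = classify_expression([8.0, 8.1, 7.9, 8.0], 5.0, 0.1, 1.0)
gene_c = classify_expression([3.0, 7.0, 4.0, 6.0], 5.2, 0.1, 1.0)
```

Here `gene_a` varies little and barely differs between species, `gene_b` varies little but has shifted strongly, and `gene_c` varies a lot within the species while the species averages stay close.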
2) From the journal Science:
Sex-Dependent Gene Expression and Evolution of the Drosophila Transcriptome
Comparison of the gene-expression profiles between adults of Drosophila melanogaster and Drosophila simulans has uncovered the evolution of genes that exhibit sex-dependent regulation. Approximately half the genes showed differences in expression between the species, and among these, approximately 83% involved a gain, loss, increase, decrease, or reversal of sex-biased expression. Most of the interspecific differences in messenger RNA abundance affect male-biased genes. Genes that differ in expression between the species showed functional clustering only if they were sex-biased. Our results suggest that sex-dependent selection may drive changes in expression of many of the most rapidly evolving genes in the Drosophila transcriptome.