CLC number: TP391.1; R394.3
On-line Access: 2011-04-11
Received: 2010-04-11
Revision Accepted: 2010-07-05
Crosschecked: 2011-01-31
Xiao-hong Mao, Jing-hua Fu, Wei Chen, Qian You, Shiao-fen Fang, Qun-sheng Peng. Structural visualization of sequential DNA data[J]. Journal of Zhejiang University Science C, 2011, 12(4): 263-272.
@article{Mao2011StructuralVisualization,
title="Structural visualization of sequential DNA data",
author="Xiao-hong Mao and Jing-hua Fu and Wei Chen and Qian You and Shiao-fen Fang and Qun-sheng Peng",
journal="Journal of Zhejiang University Science C",
volume="12",
number="4",
pages="263-272",
year="2011",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1000091"
}
%0 Journal Article
%T Structural visualization of sequential DNA data
%A Xiao-hong Mao
%A Jing-hua Fu
%A Wei Chen
%A Qian You
%A Shiao-fen Fang
%A Qun-sheng Peng
%J Journal of Zhejiang University SCIENCE C
%V 12
%N 4
%P 263-272
%@ 1869-1951
%D 2011
%I Zhejiang University Press & Springer
%R 10.1631/jzus.C1000091
TY  - JOUR
T1  - Structural visualization of sequential DNA data
A1  - Xiao-hong Mao
A1  - Jing-hua Fu
A1  - Wei Chen
A1  - Qian You
A1  - Shiao-fen Fang
A1  - Qun-sheng Peng
JO  - Journal of Zhejiang University Science C
VL  - 12
IS  - 4
SP  - 263
EP  - 272
SN  - 1869-1951
Y1  - 2011
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.C1000091
ER  -
Abstract: To date, comparing and visualizing genome sequences remains challenging because of the large size of genomes. Existing approaches exploit the stable properties of oligonucleotides to display the main characteristics of a whole genome, but they generally fail to show, at an adjustable scale, how patterns progress along the genome. This paper presents a novel visual encoding technique that supports not only the binning process (phylogenetic analysis) but also sequential analysis of the genome. The key idea is to treat each k-nucleotide combined with its reverse complement as a visual word, and to represent a long genome sequence as a list of local statistical feature vectors derived from the local frequencies of these visual words. Experimental results on a variety of examples demonstrate that the approach can visualize DNA sequences quickly and intuitively, and can help users identify regions of difference among multiple datasets.
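The encoding summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a k-mer and its reverse complement are merged by keeping the lexicographically smaller string. It also assumes that "local frequency" means the normalized count of visual words inside a fixed-size sliding window. The function names and the `window`/`step` parameters are hypothetical, and the mapping of feature vectors to a visual layout is not covered.

```python
from itertools import product

# Complement table for the four unambiguous bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def visual_word(kmer: str) -> str:
    """Merge a k-mer with its reverse complement into one 'visual word'.

    Assumption: the pair is represented by the lexicographically smaller string.
    """
    return min(kmer, reverse_complement(kmer))


def vocabulary(k: int) -> list[str]:
    """All distinct visual words for a given k, in sorted order."""
    return sorted({visual_word("".join(p)) for p in product("ACGT", repeat=k)})


def local_feature_vectors(seq: str, k: int, window: int, step: int) -> list[list[float]]:
    """Slide a window along seq; return one normalized word-frequency vector per window.

    Hypothetical parameters: `window` and `step` are measured in bases.
    k-mers containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    vocab = vocabulary(k)
    index = {word: i for i, word in enumerate(vocab)}
    seq = seq.upper()
    vectors = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = [0] * len(vocab)
        total = 0
        for i in range(len(chunk) - k + 1):
            kmer = chunk[i:i + k]
            if set(kmer) <= _BASES:
                counts[index[visual_word(kmer)]] += 1
                total += 1
        # Relative frequencies; a window with no valid k-mer yields a zero vector.
        vectors.append([c / total if total else 0.0 for c in counts])
    return vectors


if __name__ == "__main__":
    vecs = local_feature_vectors("ACGTACGTNNACGGT", k=2, window=8, step=4)
    print(len(vocabulary(2)), len(vecs))  # prints: 10 2
```

Because a k-mer and its reverse complement map to the same word, a sequence and its reverse complement produce identical vectors, so strand orientation does not affect the encoding. Each vector describes one position along the genome. Plotting the vectors in order gives a sequential view. Averaging them could serve as a whole-genome summary for comparison.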