Day 1 :
Biodiversity Research Center, Academia Sinica, Taipei, 115 Taiwan
Time : 09:30-10:15
Dr. Chun-Ping Yu received his Ph.D. degree in Physics from National Central University, Taiwan. He is currently a postdoctoral fellow at the Biodiversity Research Center, Academia Sinica, Taiwan. His research interests include gene regulation, evolutionary genetics, and systems biology. His current work focuses on developing NGS functional genomics applications for the large-scale determination of transcription factor binding sites using bioinformatics techniques, machine learning, and artificial intelligence methods.
Transcriptomes obtained from the same tissue under different conditions can provide massive data for identifying genes that are differentially expressed between conditions. Moreover, transcriptomes from time-series experiments provide dynamic information for profiling gene expression over time. Such three-dimensional (3-D) data (gene expression, condition, and time) are very useful for studying dynamic gene regulatory networks and biological processes. (Note that "conditions" can be replaced by "species" or "strains", and "time series" can be replaced by "tissues" or "sources".) However, three issues make the data difficult to analyze: heterogeneity of samples, unequal numbers of time points, and uneven time period lengths between studies. The first issue affects the determination of gene expression differences between conditions. The second and third issues require transformation of the original time-series transcriptomes for cross-condition comparisons. Although methods have been developed for analyzing 3-D data, there is still no method that deals with all of these issues. In this study, we developed a comparative gene coexpression network (GCN) method to analyze 3-D data. To illustrate our method, we applied it to two sets of time-series transcriptomes of maize embryonic leaf development, one under the normal light/dark (LD) cycle and one under total darkness (TD). Maize is a C4 plant, and its leaves exhibit Kranz anatomy, which is crucial for C4 photosynthesis. Since Kranz anatomy develops under both LD and TD, we applied our method to compare the two types of transcriptomes and obtain a time-ordered light-independent GCN. This GCN should include all regulators of Kranz anatomy development. Indeed, from this GCN we inferred and experimentally validated a number of upstream regulators of a key Kranz anatomy regulator, SHR (SHORTROOT). In addition, we obtained a light-specific GCN and a darkness-specific GCN.
From these three GCNs, we inferred light-independent, light-preferred, and darkness-preferred genes. Moreover, from the darkness-specific GCN, we could also explain why embryonic leaf cells first divide faster but then more slowly under TD than under LD. As will be explained, our method can be applied to other types of data.
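The abstract does not spell out how the networks are built, so the following is only a minimal sketch of the general idea of comparing coexpression networks across two conditions sampled at different time points. Everything in it is an assumption made for illustration and not the authors' method: linear interpolation onto a common time grid, Pearson correlation with a fixed threshold, the function names `resample`, `pearson`, and `coexpression_edges`, and the toy genes `g1` to `g3` with made-up expression values.

```python
from itertools import combinations
from math import sqrt

def resample(times, values, n_points=5):
    """Linearly interpolate one series onto n_points evenly spaced time points."""
    t0, t1 = times[0], times[-1]
    out = []
    for k in range(n_points):
        t = min(t0 + (t1 - t0) * k / (n_points - 1), t1)
        hi = next(i for i, x in enumerate(times) if x >= t)
        if hi == 0:
            out.append(float(values[0]))
            continue
        lo = hi - 1
        w = (t - times[lo]) / (times[hi] - times[lo])
        out.append(values[lo] + w * (values[hi] - values[lo]))
    return out

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def coexpression_edges(expr, times, threshold=0.9, n_points=5):
    """Edges between gene pairs whose resampled profiles correlate >= threshold."""
    series = {g: resample(times, v, n_points) for g, v in expr.items()}
    return {frozenset((a, b)) for a, b in combinations(sorted(series), 2)
            if pearson(series[a], series[b]) >= threshold}

# Toy data (hypothetical): LD has four time points, TD only three.
ld = coexpression_edges({"g1": [1, 2, 3, 5], "g2": [2, 4, 6, 10], "g3": [5, 3, 2, 1]},
                        times=[0, 6, 12, 24])
td = coexpression_edges({"g1": [1, 3, 5], "g2": [2, 6, 10], "g3": [1, 2, 6]},
                        times=[0, 12, 24])

shared = ld & td    # edges present under both conditions
ld_only = ld - td   # edges seen only under LD
td_only = td - ld   # edges seen only under TD
```

Resampling both conditions onto the same grid is one simple way to handle unequal numbers of time points; it does not address sample heterogeneity, which the abstract lists as a separate issue.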
Academia Sinica, Taiwan
Time : 10:15-11:00
Dr. Wen-Lian Hsu's earlier contributions were in the design of graph algorithms, and he has applied similar techniques to computational problems in biology and natural language. In 1993, he developed a Chinese input software, GOING, which has since revolutionized Chinese input on computers. Dr. Hsu is particularly interested in applying natural language processing techniques to understanding DNA sequences as well as protein sequences, structures, and functions, and also to biological literature mining. Alignment algorithms are often used to compare sequence similarity for both biological sequences and natural language texts. Since biological sequences are exceptionally long and abundant, speed is the major concern. For natural language text, on the other hand, discovering similar phrases, sentences, and paragraphs is of utmost importance. Dr. Hsu has designed ultra-efficient alignment algorithms for biological sequences, as well as flexible approximate matching and clustering algorithms for natural language text.
We present an ultra-efficient global alignment algorithm for comparing similar genomes and for read mapping in next generation sequencing (NGS). It can process long reads as fast as short reads and can tolerate much higher error rates. Our parallel read mapping algorithm, KART, is 3 to 10 times faster than the well-known Bowtie2 and BWA-MEM algorithms. On pairwise alignment of human genome sequences, the extended KART is 260 times faster than current methods. The same idea has also been applied to RNA-seq, producing DART, a quick and accurate mapping algorithm. These two results are published in Bioinformatics, and the source code can be downloaded: (1) Kart: a divide-and-conquer algorithm for NGS read alignment; (2) DART: a fast and accurate RNA-seq mapper with a partitioning strategy. Besides producing high-quality alignments efficiently, our algorithm can simultaneously perform variant calling in about the same amount of time.
To achieve the abovementioned objectives, we designed a divide-and-conquer alignment strategy. Given a query sequence P and a reference sequence Q, the algorithm (1) identifies all locally maximal exact matches between P and Q as simple region pairs (simple pairs); (2) clusters the simple pairs according to their coordinates in the database to form the basis of a global alignment; and (3) fixes the overlaps between adjacent simple pairs and fills the gaps between them by inserting normal region pairs (normal pairs) to produce a complete alignment. The crux of the algorithm is that simple pairs can be aligned in linear time, and all simple pairs and normal pairs can be aligned independently and in parallel. After the query sequence P has been divided sufficiently, the pairs that still require gapped alignment have an average length of only 21.
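The strategy described above can be illustrated with a small sketch. This is not the KART implementation: the k-mer index used to find exact matches, the quadratic chaining step, the Needleman-Wunsch gap filler, and the names `exact_seeds`, `chain`, `needleman_wunsch`, and `align` are all assumptions chosen to keep the example short. It also sidesteps overlap fixing by chaining only non-overlapping simple pairs.

```python
from collections import defaultdict

def exact_seeds(p, q, k=4):
    """Maximal exact matches of length >= k, as (p_start, q_start, length)."""
    index = defaultdict(list)
    for j in range(len(q) - k + 1):
        index[q[j:j + k]].append(j)
    seeds = set()
    for i in range(len(p) - k + 1):
        for j in index.get(p[i:i + k], ()):
            if i > 0 and j > 0 and p[i - 1] == q[j - 1]:
                continue  # not left-maximal; covered by an earlier seed
            n = k
            while i + n < len(p) and j + n < len(q) and p[i + n] == q[j + n]:
                n += 1
            seeds.add((i, j, n))
    return sorted(seeds)

def chain(seeds):
    """Colinear, non-overlapping subset of seeds with the most matched bases."""
    if not seeds:
        return []
    best = [n for _, _, n in seeds]
    prev = [-1] * len(seeds)
    for i, (pi, qi, ni) in enumerate(seeds):
        for j in range(i):
            pj, qj, nj = seeds[j]
            if pj + nj <= pi and qj + nj <= qi and best[j] + ni > best[i]:
                best[i], prev[i] = best[j] + ni, j
    i = max(range(len(seeds)), key=best.__getitem__)
    out = []
    while i != -1:
        out.append(seeds[i])
        i = prev[i]
    return out[::-1]

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Gapped global alignment of two short segments (a 'normal pair')."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + d, s[i - 1][j] + gap, s[i][j - 1] + gap)
    ra, rb, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        d = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + d:
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return "".join(reversed(ra)), "".join(reversed(rb))

def align(p, q, k=4):
    """Global alignment: copy simple pairs as-is, gap-align what lies between."""
    row_p, row_q, i, j = [], [], 0, 0
    for ps, qs, n in chain(exact_seeds(p, q, k)):
        gp, gq = needleman_wunsch(p[i:ps], q[j:qs])  # normal pair before the seed
        row_p += [gp, p[ps:ps + n]]                  # simple pair: no DP needed
        row_q += [gq, q[qs:qs + n]]
        i, j = ps + n, qs + n
    gp, gq = needleman_wunsch(p[i:], q[j:])          # trailing normal pair
    return "".join(row_p + [gp]), "".join(row_q + [gq])
```

Calling `align(p, q)` returns the two gapped rows of the alignment. Each `needleman_wunsch` call depends only on its own pair of segments, so in this sketch the calls could be dispatched to separate workers, which mirrors the point that simple pairs and normal pairs can be aligned independently and in parallel.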