Genomics and Bioinformatics Forum
Designing Your Course

Sample Curriculum

Proposal for an Undergraduate Bioinformatics Curriculum for Computer Scientists

Travis E. Doom, Michael L. Raymer, Dan Krane
Department of Computer Science and Engineering
Department of Biological Sciences
Wright State University,
Dayton, OH 45435-0001



This research was supported in part by Wright State University under grant #241656 and in part by the National Science Foundation under grant #EIA-0122582.

Abstract

Bioinformatics is a new and rapidly evolving discipline that has emerged from the fields of experimental molecular biology and biochemistry, and from the artificial intelligence, database, and algorithms disciplines of computer science. Largely because of the inherently interdisciplinary nature of bioinformatics research, academia has been slow to respond to strong industry and government demand for scientists trained to develop and apply novel bioinformatics techniques to the rapidly growing, freely available repositories of genetic and proteomic data. While some institutions are responding to this demand by establishing graduate programs in bioinformatics, the entrance barriers for these programs are high. This is largely because of the significant prerequisite knowledge, in the disparate fields of biochemistry and computer science, required to develop sophisticated new approaches to the analysis of bioinformatics data. We present a proposal for an undergraduate-level bioinformatics curriculum.

1. Introduction

Bioinformatics explores the functional relationships between the composition of genes, within the context of the genome, and the structure and function of the proteins those genes encode. Because the interactions of proteins within an organism determine its metabolism, reproduction, form, and health, the implications of bioinformatics studies are far-reaching. Recent advances in the experimental techniques of molecular biology have led to explosive growth in the availability of molecular data. As a result, current bioinformatics research generally focuses on the representation, analysis, annotation, and mining of large databases of genome sequence information. In the future, the focus will shift to functional analysis of the proteins produced by these genes. Bioinformatics techniques promise to yield information with broad impact in areas ranging from disease diagnosis and treatment to evolution, agriculture, and environmental science.

There is high demand for professionals with a background in bioinformatics. The sequencing and analysis of the human genome is one of the most complex computational problems currently being studied on a worldwide scale. Computer scientists are needed to analyze, index, represent, model, display, process, mine, and search large biological databases. This need is already extensive and will continue to grow: the genomic information available at the National Center for Biotechnology Information (NCBI) currently doubles every 14 months. Industry analysts forecast that the market for genomic information alone (and the technology to use it) will reach US $2 billion annually by 2005 [4]. The January 2001 issue of The Scientist reports that the National Institute of General Medical Sciences (NIGMS) is already having difficulty finding people from other disciplines to perform the kind of modeling and data analysis that researchers in the biological sciences now require.

The educational opportunities available to undergraduate students wishing to participate in this exciting enterprise are currently limited [3]. Developing an undergraduate curriculum in bioinformatics is essential to meeting the nation's future needs. That work must begin now, so that students can take part in the basic research of this emerging field and be ready to meet the nation's workforce needs as soon as they graduate.

2. Graduate program barriers

Graduate programs in bioinformatics are beginning to emerge at several universities, including Wright State University. These programs, however, require entering students to have completed a specific program of prerequisite undergraduate study that is rarely offered as an organized program. Graduate bioinformatics programs must therefore accept students with undergraduate degrees in either computer science or biology, and must offer sequences of remedial or prerequisite courses designed to complement the knowledge the students acquired as undergraduates. Students holding an undergraduate degree in computer science generally need to spend most of their first year of graduate study in focused remedial courses in basic biochemistry, molecular biology, and genetics. Students holding an undergraduate degree in biology generally spend most of their first year of graduate study in coursework covering introductory computer science programming and basic data structures. Only then can either group begin the course sequence on contemporary algorithms and research techniques in bioinformatics. It is unlikely that this amount of material can be accommodated in a two-year course of study without significant preparation at the undergraduate level.

3. An undergraduate program solution

Because of their demanding entrance requirements, and the amount of remedial coursework those requirements entail, graduate programs alone may prove inadequate to supply the number of bioinformatics specialists that industry will require. New undergraduate programs must be developed that combine a more specific (and shorter) biology sequence with a more focused computer science foundation. It may be necessary to redesignate some of the traditional core courses in computer science, such as formal language theory, as electives, to make room for a broader base of knowledge in contemporary areas of information technology (such as artificial intelligence, distributed problem solving, and data mining).

It falls to four-year institutions to give students the opportunities and direction they need to meet the market demand for bioinformatics professionals, and to better prepare them for entry into graduate-level bioinformatics programs. Unfortunately, implementing an academic program of study in bioinformatics is complicated by the field's inherently interdisciplinary nature. Programs accredited by the Computer Science Accreditation Board (CSAB) are required to include at least a two-year (24-quarter-hour) sequence of fundamental "core" computer science material, as well as at least one year of mathematics and one year of a laboratory science (typically physics) [2]. Biology programs typically require at least one year of study in organic chemistry, and these sophomore-level courses are usually taken only after a year of study in inorganic chemistry. While an appreciation of basic chemistry concepts such as valency and electronegativity is useful in the study of bioinformatics, we believe that accelerated training in chemistry is sufficient and would better accommodate the demands of an integrated computer science and biology curriculum. At the same time, a streamlined exposure to introductory programming, calculus, and biology (in addition to core freshman coursework) in the first two years of study is also appropriate. Because bioinformaticians must be equally versed in the languages of biology and computer science, this effort will require a fundamentally interdisciplinary approach. Furthermore, basic research in bioinformatics is progressing rapidly. Professionals in the field must not only have a strong grasp of computer science fundamentals but must also be equally comfortable with the fundamentals of biology and biochemistry, so that they can recognize and appreciate the results of their analyses.


3.1 Integration of computer science core material

Classically, computer science has focused on the study of computer hardware and software. A more contemporary view of information technology, however, must recognize that the storage, transmission, and presentation of data will make up a significant portion of future demand on the discipline and on computer professionals. This calls for a program of study emphasizing contemporary topics in databases and networking.

From the discipline of computer science, a bioinformatics professional should have knowledge of: introductory programming; data structures; AI algorithms (search, optimization, list processing, pattern recognition, etc.); databases; formal and comparative languages; complexity and specialized algorithm topics such as those explained in [1]; modeling and simulation; probability and statistics; the WWW; visualization; and human-computer interaction (HCI) issues.

3.2 Integration of biology core material

From the discipline of biology, a bioinformatics professional should have working knowledge of at least one of several life science fields, including genome analysis, environmental modeling, and protein structure and function, among others. Of these many possibilities, we propose to focus on the area of molecular bioinformatics. A professional in this field of study should understand genetics; molecular and cellular biology; the chemical and physical aspects of the flow of genetic information from DNA to proteins; gene expression; DNA replication, recombination, and repair; and the experimental tools of molecular biology.

How much practical laboratory experience an undergraduate bioinformatician needs is a point of debate. The results of DNA sequencing technology (and other in vitro and in vivo laboratory technologies) are published, annotated, and made available for analysis worldwide. The real problem lies in extracting meaning from the glut of available data, and computationally generated results (in silico technologies) are becoming more prevalent in the field.

4. A bioinformatics curriculum

We now present a curriculum proposal that conforms to CSAB standards [2] yet combines specific (and short) sequences in chemistry and biology with a more focused computer science foundation. To meet our objectives, it was necessary to remove several traditional but nonessential topics from the computer science curriculum for this option. Knowledge of calculus-based physics, for instance, is not as important for students preparing for careers in bioinformatics as it is for those interested in digital signal processing. Furthermore, many traditional areas of computer science (such as formal language theory) that are not required by CSAB standards have been made optional to allow for a broader base of knowledge in contemporary areas of information technology.



Computer Science - Bachelor of Science
Wright State University (Total Quarter Credit Hours: 200)

I General Education Courses (42 hours)
Area One: Communication and Mathematical Skills (8 hours)

- ENG 101-4 Composition I
- ENG 102-4 Composition II (also see required math/stat below)

Area Two: The Western Experience (15 hours)

- HST 101-3 Ancient and Medieval Eras
- HST 102-3 Western World in Transition
- HST 103-3 Modern Western World
- One Great Books of the Western World Course
- One Fine and Performing Arts Course

Area Three: The Non-Western World (6 hours)

- One Comparative Studies Course
- One Regional Studies Course

Area Four: Understanding the Contemporary World (13 hours)

- PSY 105-4 Psychology: Studies of Behavior
- SOC 200-3 Social Life
- PLS 200-3 Political Life
- EC 200-3 Economic Life (also see required biology and chemistry below)

II Departmental Requirements (86 hours)
A. Required Computer Science and Engineering Courses (47 hours)

- CS 240-4 Computer Science I
- CS 241-4 Computer Science II
- CS 242-4 Computer Science III
- CEG 260-4 Digital Computer Hardware
- CEG 320-4 Computer Organization
- CEG 360-4 Digital Systems Design
- CS 400-4 Data Structures and Software Design
- CS 405-4 Intro to Database Management Systems
- CS 409-4 Principles of Artificial Intelligence
- CS 415-3 Social Implications of Computing
- CEG 433-4 Operating Systems
- CS 480-4 Comparative Languages

B. Required Biology Courses (28 hours)

- BIO 112-4 Principles of Biology: Cell Biology and Genetics
- BIO 114-4 Organismic Biology
- BIO 115-4 Principles of Biology: Diversity and Ecology
- BIO 210-4 Molecular Biology I
- BIO 211-4 Molecular Genetics I
- BIO 302-4 Genetics and Change
- BIO 410-4 Cell-Molecular Biology Laboratory

C. Required Bioinformatics Courses (8 hours)

- BIO/CS 2xx-4 Intro to Bioinformatics
- BIO/CS 4xx-4 Computational Molecular Biology

D. Technical Communications (3 hours)
Choose one:

- EGR 335-3 Technical Communications
- BIO 310-3 Issues in Science

III Required Supporting Courses (56 hours)
Area A: Chemistry (33 hours)

- CHM 121-5 Submicroscopic Chemistry
- CHM 122-5 Macroscopic Chemistry
- CHM 123-5 Reaction Dynamics
- CHM 211/215-6 Organic Chemistry I
- CHM 212/216-6 Organic Chemistry II
- CHM 213/217-6 Organic Chemistry III

Area B: Mathematics (23 hours)

- MTH 229-5 Calculus I
- MTH 230-5 Calculus II
- MTH 253-3 Elementary Matrix Algebra
- MTH 257-3 Discrete Mathematics for Computing
- HFE 301-4 Statistics I
- MTH 407-3 Optimization Techniques

IV CS/Bio/MTH Electives (16 hours, at least 8 of which must be 400-level CS/CEG)
Choose from:

- BIO 212-4 Cell Biology I
- BIO 252-5 Microbiology
- BIO 403-5 Developmental Biology
- BIO 406-3 Evolutionary Biology
- BIO 426-4 Human Genetics
- BIO 437-6 Recombinant DNA Methods
- BIO 461-3 Molecular Evolution
- BIO 469-3 Population Genetics
- BMB 421-4 Biochemistry I
- BMB 423-4 Biochemistry II
- BMB 427-4 Biochemistry III
- CEG 255-4 Intro to the Design of Information Tech. Systems
- CEG 416-4 Matrix Computations
- CEG 434-4 Concurrent Software Design
- CEG 465-4 Interactive Systems Modeling, Analysis, and Design
- CEG 466-4 Formal Languages
- CEG 476-4 Computer Graphics I
- CEG 477-4 Computer Graphics II
- CS 470-4 Systems Simulation
- CS 458-3 Applied Graph Theory
- CS 459-3 Combinatorial Tools for Computer Science
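As a consistency check on the listing above, the hours in each category sum to the stated subtotals, and the subtotals, together with the 16 elective hours, sum to the stated 200 quarter credit hours. This tally assumes that the number after the hyphen in each course code (for example, the 3 in CS 415-3) is that course's credit hours. The listing does not say so explicitly, but this reading matches every stated subtotal:

```latex
% Assumption: the digit after the hyphen in each course code is its quarter credit hours.
\[
\begin{aligned}
\text{I. General Education}     &= 8 + 15 + 6 + 13 = 42\\
\text{II.A. Required CS/CEG}    &= 11 \times 4 + 3 = 47 \quad \text{(CS 415 carries 3 hours)}\\
\text{II.B. Required Biology}   &= 7 \times 4 = 28\\
\text{II. Departmental}         &= 47 + 28 + 8 + 3 = 86\\
\text{III.A. Chemistry}         &= 3 \times 5 + 3 \times 6 = 33\\
\text{III.B. Mathematics}       &= 5 + 5 + 3 + 3 + 4 + 3 = 23\\
\text{III. Supporting}          &= 33 + 23 = 56\\
\text{Total}                    &= 42 + 86 + 56 + 16 = 200
\end{aligned}
\]
```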



5. Conclusion

Computer science is a path to understanding genomes, just as biology helps us understand living organisms. It is hard to imagine an area in which honing our methods of inquiry matters more than in bioinformatics. The competitive pressure and the rewards for progress in bioinformatics are high, and students who prepare now can join this sought-after workforce. The creation of an undergraduate bioinformatics option in computer science and engineering is of the utmost importance for global health, for the economic development of the nations that undertake this path, and for the success of our students.

The central argument that we present for an undergraduate bioinformatics option within a Computer Science BS degree can be summarized as follows. (1) Whether a student enters graduate study with a computer science or a biology degree, the chain of prerequisites that must be satisfied requires about two years of coursework, because course dependencies prevent those courses from being taken in parallel. (2) Adding these two years of prerequisites to a four-year BS and the two years needed to obtain the MS degree implies that it could take eight years of preparation for a student to obtain an MS degree in bioinformatics. (3) The alternative that we propose would lead to a BS degree in four years and an MS degree in the standard six-year time frame.

Our proposed curriculum includes, in addition to traditional computer science, biochemistry, and molecular biology components, several courses tailored specifically to the needs of an integrated interdisciplinary program. One such course is an undergraduate introduction to bioinformatics algorithms and methods. Because this course will serve as a unifying element for the rest of the bioinformatics program, Drs. Krane and Raymer have formalized the proposed course content and are currently preparing an undergraduate bioinformatics textbook to be published by Benjamin Cummings in December 2002.



©2003 Addison-Wesley & Benjamin Cummings