Software Tools :: Information Engineering in Biology
Programming Languages
The following platform-independent languages are particularly useful
for bioinformatics work:
- Perl ("Practical Extraction
and Report Language") is a scripting language originally designed to
combine the capabilities of UNIX shell scripts and UNIX utilities such
as grep, sed, and awk in a single language. Perl is particularly
strong at text processing with regular expressions. Its ubiquity as a
web scripting language earned it the nickname "duct tape of the
Internet." The standard distribution includes many useful modules; if
you need functionality beyond these, browse the Comprehensive Perl
Archive Network (CPAN) repository for additional modules.
- Python is an interpreted,
object-oriented language. Its simple syntax makes it easy to learn
and makes large programs easy to maintain. Like Perl, it owes much of
its power to a large standard library, and it can be used for
scripting and web applications. Its quick development cycle also makes
it useful for prototyping. Several add-on graphics libraries allow it
to be used to build GUI-based applications.
- Java is an object-oriented
language that has gained popularity largely because of its platform
independence. Instead of compiling programs into a specific machine
language, the Java compiler compiles them into "bytecode," which any
computer with a Java Virtual Machine (JVM) can run.
Perl and Python are free and available for all commonly used
platforms, including the Macintosh (see the MacPerl and MacPython
webpages). Perl is distributed under either the GNU General Public
License (GPL) or the Artistic License. The Python license is
GPL-compatible.
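Perl's reputation rests on pattern-driven text extraction. The sketch below shows the same idea in Python, the single language used for the examples on this page: a regular expression pulls the identifier and description out of FASTA-style header lines. The function name `parse_headers` and the sample records are hypothetical, and the assumed header layout (`>` followed by an identifier and optional free text) is a simplification rather than a formal specification.

```python
import re

# Assumed layout: '>' then a non-space identifier, then optional free text.
HEADER = re.compile(r"^>(\S+)\s*(.*)$")

def parse_headers(lines):
    """Return (identifier, description) pairs for every header line (hypothetical helper)."""
    records = []
    for line in lines:
        match = HEADER.match(line.rstrip("\n"))
        if match:  # sequence lines do not start with '>' and are skipped
            records.append((match.group(1), match.group(2)))
    return records

# Made-up sample input mixing header and sequence lines.
example = [
    ">seq1 hypothetical protein A",
    "MKTAYIAKQR",
    ">seq2",
    "GAVLIPFMW",
]

for ident, desc in parse_headers(example):
    print(f"{ident}\t{desc or '(no description)'}")
```

Compiling the pattern once and reusing it keeps the loop cheap; in Perl the equivalent would typically be an inline `=~` match with capture groups.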
Bio Projects
- The BioPerl, BioPython, and BioJava projects were
established to create bioinformatics libraries in these
languages. They provide code for parsing database formats and
database search output, manipulating sequences, and more.
- BioCORBA aims to provide
an object-oriented, language-neutral, platform-independent method
for describing and solving bioinformatics problems. It strives to
make the code of the other Bio projects available in a simple,
easy-to-use way. For example, its language-neutral environment allows
users to write programs using BioPython and access BioPerl modules
through the CORBA server.
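To give a feel for the kind of routine these libraries package up, here is a minimal, dependency-free Python sketch of two common tasks: reading FASTA records and reverse-complementing a DNA sequence. It is not the BioPerl, BioPython, or BioJava API; `read_fasta` and `reverse_complement` are hypothetical names, and the code assumes unambiguous A/C/G/T bases only.

```python
import io

# Complement table for unambiguous DNA bases (upper- and lower-case only).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    ident, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)  # emit the previous record
            fields = line[1:].split()
            ident, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)  # sequences may span several lines
    if ident is not None:
        yield ident, "".join(chunks)  # emit the final record

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

# Made-up two-record input; the first record spans two lines.
data = io.StringIO(">frag1 example\nATGC\nGGTA\n>frag2\nTTAA\n")
for name, seq in read_fasta(data):
    print(name, seq, reverse_complement(seq))
```

Writing the reader as a generator means large files are processed one record at a time rather than loaded whole, which is the same design the Bio libraries generally favor for sequence input.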
Data Annotation Formats
A number of groups are working to establish common data annotation formats for biological data:
- BioDAS is a project that
aims to develop an open source Distributed Annotation System for
exchanging annotations on genomic sequence data in a distributed
computing environment.
- BioXML's goal is to gather
XML (eXtensible Markup Language) documentation, DTDs and tools for
biology in one central location. It overlaps in interest and in
tools with several other open source projects, including BioPerl,
BioJava, BioPython, BioCORBA and BioDAS.
- Among the XML formats for managing, visualizing and sharing
annotations of genomic sequences are AGAVE (Architecture
for Genomic Annotation, Visualization and Exchange), BSML
(Bioinformatic Sequence Markup Language), and GAME (Genome Annotation
Markup Elements).
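The formats above differ in their schemas, but consuming any XML annotation file follows a similar pattern. The Python sketch below uses the standard library's `xml.etree.ElementTree` to read feature records from a small document. The element and attribute names (`annotations`, `feature`, `start`, `end`, `strand`, `note`), the helper `load_features`, and the sample data are invented for illustration and do not follow AGAVE, BSML, or GAME.

```python
import xml.etree.ElementTree as ET

# A made-up annotation document; the vocabulary is hypothetical and
# does not follow the AGAVE, BSML, or GAME schemas.
DOCUMENT = """
<annotations sequence="contig42">
  <feature type="gene" start="120" end="980" strand="+">
    <note>hypothetical gene</note>
  </feature>
  <feature type="repeat" start="1500" end="1620" strand="-"/>
</annotations>
"""

def load_features(xml_text):
    """Return one dict per <feature> element, with coordinates as integers."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for elem in root.iter("feature"):
        features.append({
            "sequence": root.get("sequence"),
            "type": elem.get("type"),
            "start": int(elem.get("start")),
            "end": int(elem.get("end")),
            "strand": elem.get("strand"),
            # findtext returns the default when the child element is absent
            "note": elem.findtext("note", default=""),
        })
    return features

for feature in load_features(DOCUMENT):
    print(feature["type"], feature["start"], feature["end"], feature["strand"])
```

Against a real format, only the element and attribute names in `load_features` would change; a DTD or schema for that format would define which ones are valid.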
Other Projects
- Gene Ontology
Consortium. The goal of this project is to produce a dynamic
controlled vocabulary that can be applied to all eukaryotes.