Software Tools :: Information Engineering in Biology

Programming Languages

The following platform-independent languages are particularly useful for bioinformatics work:

  • Perl ("Practical Extraction and Report Language") is a scripting language that was originally designed to combine in a single language the capabilities of UNIX shell scripts and UNIX utilities such as grep, sed, and awk. Perl is particularly powerful at processing text with regular expressions. Its ubiquity as a web scripting language has earned it the nickname "duct tape of the Internet." The standard distribution includes many useful modules; if you need functionality beyond these, browse the Comprehensive Perl Archive Network (CPAN) repository for additional modules.
  • Python is an interpreted, object-oriented language. Its simple syntax makes it easy to learn and makes large programs easier to maintain. Like Perl, it owes much of its power to a large set of standard libraries, and it can be used for scripting and web applications. Its quick development cycle also makes it useful for prototyping. A number of add-on graphics libraries allow it to be used for creating GUI-based applications.
  • Java is an object-oriented language that is becoming popular due to its degree of platform-independence. Instead of compiling programs into a specific machine language, the Java compiler compiles programs into "byte-code," which any computer possessing a Java Virtual Machine (JVM) can run.
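As a rough illustration of the regular-expression text processing mentioned above, the following minimal sketch (in Python, standard library only) pulls the identifier and description out of FASTA-style header lines. The sample records, the pattern, and the function name `parse_header` are assumptions made for this example, not part of any of the projects listed on this page.

```python
import re

# Assumed header layout: ">" + identifier + optional free-text description.
HEADER_PATTERN = re.compile(r"^>(?P<id>\S+)\s*(?P<description>.*)$")


def parse_header(line):
    """Return (identifier, description) for a FASTA-style header, else None."""
    match = HEADER_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("id"), match.group("description")


# Made-up input lines, for illustration only.
records = [
    ">seq1 hypothetical protein",
    "ATGGCGTAA",
    ">seq2",
]

# Keep only the lines that were recognized as headers.
headers = [parsed for parsed in map(parse_header, records) if parsed is not None]
print(headers)
```

Sequence lines do not match the pattern and are skipped; a header without a description yields an empty description string.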

Perl and Python are free and available for all commonly used platforms, including the Macintosh (see the MacPerl and MacPython webpages). Perl is distributed under either the Artistic License or the GNU General Public License (GPL), at the user's choice. The Python license is GPL-compatible.

Bio Projects

  • BioPerl, BioPython, and BioJava projects have been established to create bioinformatics libraries in these languages. Code is available for parsing database formats and database outputs, manipulating sequences, etc.
  • BioCORBA aims to provide an object-oriented, language-neutral, platform-independent method for describing and solving bioinformatics problems. It strives to leverage the code of the other Bio projects in a simple, easy-to-use fashion. For example, a language-neutral environment allows users to write programs using BioPython and access BioPerl modules through the CORBA server.
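To give a sense of the kind of task these libraries cover (parsing a sequence format and manipulating sequences), here is a small plain-Python sketch. It deliberately does not use the BioPython, BioPerl, or BioJava APIs; the function names `read_fasta`, `reverse_complement`, and `gc_content` are hypothetical helpers written for this example, and only unambiguous DNA bases are handled.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def gc_content(sequence):
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, for illustration only.
example = [">a", "AT", "GC", ">b", "TT"]
for name, seq in read_fasta(example):
    print(name, seq, reverse_complement(seq), gc_content(seq))
```

The Bio project libraries provide far more complete and better tested versions of these operations, so a sketch like this is mainly useful for understanding what they do.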

Data Annotation Formats

A number of groups are working to establish common data annotation formats for biological data:

  • BioDAS is a project that aims to develop an open source Distributed Annotation System for exchanging annotations on genomic sequence data in a distributed computing environment.
  • BioXML's goal is to gather XML (eXtensible Markup Language) documentation, DTDs and tools for biology in one central location. It overlaps in interest and in tools with several other open source projects, including BioPerl, BioJava, BioPython, BioCORBA and BioDAS.
  • Among the XML formats for managing, visualizing and sharing annotations of genomic sequences are AGAVE (Architecture for Genomic Annotation, Visualization and Exchange), BSML (Bioinformatic Sequence Markup Language), and GAME (Genome Annotation Markup Elements).
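The general idea of exchanging sequence annotations as XML can be sketched with Python's standard `xml.etree.ElementTree` module. The document below is hypothetical: its element and attribute names (`annotations`, `feature`, `start`, `end`, `strand`, `name`) are invented for this example and do not follow the AGAVE, BSML, or GAME schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation document; not a real AGAVE, BSML, or GAME file.
ANNOTATION_XML = """
<annotations sequence="chr_example">
  <feature type="gene" start="100" end="900" strand="+">
    <name>geneA</name>
  </feature>
  <feature type="exon" start="100" end="250" strand="+">
    <name>geneA_exon1</name>
  </feature>
</annotations>
"""


def load_features(xml_text):
    """Parse the assumed layout into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for element in root.findall("feature"):
        features.append({
            "type": element.get("type"),
            "start": int(element.get("start")),
            "end": int(element.get("end")),
            "strand": element.get("strand"),
            "name": element.findtext("name", default=""),
        })
    return features


features = load_features(ANNOTATION_XML)
for feature in features:
    print(feature["type"], feature["name"], feature["start"], feature["end"])
```

A real format would be read against its published DTD or schema; the point here is only that coordinates and feature types stored as XML can be recovered with a few lines of generic parsing code.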

Other Projects

  • Gene Ontology Consortium. The goal of this project is to produce a dynamic controlled vocabulary that can be applied to all eukaryotes.
