AnnoTrack: Rails System

January 31st, 2011

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

For general Ruby or Rails questions, please read elsewhere; there are blog entries about Ruby & Rails terminology and the Rails application layout.

The AnnoTrack Ruby-on-Rails code can be found in svn/gencode/tracking_system/rails/. Most AnnoTrack-specific code is stored as "plugin" code in the Redmine directory. This means that when looking for a specific piece of code, you have to check not only the default application directory app, but also the plugin directory vendor/plugins/redmine_annotrack/app.

The language files defining the terminology and browser links used on the web pages are

svn/gencode/tracking_system/rails/lang/en.yml and


In these files an entry like


label_project_new: New Gene

means "if you come across the term label_project_new, display it as New Gene in the browser".
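As an illustration of how such a label entry is resolved, here is a minimal Ruby sketch (not Redmine's own localization code) that loads a language file with the standard YAML library and looks up a key. It assumes flat key: value entries as in the example above and unwraps a top-level en: key if the file uses one; the helper name translate is hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: resolve a label key against the text of a language
# file such as lang/en.yml. Flat "key: value" entries are assumed; if all
# entries are nested under a top-level "en:" key, that level is unwrapped.
def translate(yaml_text, key)
  table = YAML.safe_load(yaml_text) || {}
  table = table['en'] if table['en'].is_a?(Hash)
  # Fall back to the raw key when no translation exists.
  table.fetch(key.to_s, key.to_s)
end

lang = <<~YAML
  label_project_new: New Gene
YAML

puts translate(lang, :label_project_new)   # prints New Gene
puts translate(lang, :label_unknown)       # prints label_unknown
```

Falling back to the key itself makes a missing entry visible in the browser instead of silently showing nothing.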

To understand the code underlying specific web pages, it is helpful to check the routing entries in config/routes.rb and vendor/plugins/redmine_annotrack/routes.rb. Specific paths in the browser are mapped to specific functions in the Rails code, e.g.:


map.connect 'flags/show_tecs', :controller => 'flags', :action => 'show_tecs'

maps the URL path flags/show_tecs to the show_tecs function (action) in the file app/controllers/flags_controller.rb.
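The mapping can be pictured with the toy Ruby sketch below. It is a hypothetical stand-in, not Rails' router: the RouteTable class only matches literal paths, ignores dynamic segments and HTTP verbs, and simply applies the Rails naming convention that controller 'flags' lives in flags_controller.rb (in AnnoTrack the file may also sit under the plugin directory).

```ruby
# Toy illustration of map.connect-style routing (NOT Rails' router):
# each entry maps a literal path to a controller and an action.
Route = Struct.new(:path, :controller, :action)

class RouteTable
  def initialize
    @routes = []
  end

  # Mirrors the shape of: map.connect 'path', :controller => ..., :action => ...
  def connect(path, options)
    @routes << Route.new(path, options[:controller], options[:action])
  end

  # Returns [controller file, action] for a request path, or nil if unknown.
  def recognize(request_path)
    route = @routes.find { |r| r.path == request_path.sub(%r{\A/}, '') }
    return nil unless route
    ["app/controllers/#{route.controller}_controller.rb", route.action]
  end
end

map = RouteTable.new
map.connect 'flags/show_tecs', :controller => 'flags', :action => 'show_tecs'
p map.recognize('/flags/show_tecs')
# prints ["app/controllers/flags_controller.rb", "show_tecs"]
```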

The list of chromosomes used as well as the different priority values are set on this page.

Some options for links on the transcript pages etc. can be changed through the administration interface.

These actions require administrator rights in the AnnoTrack system. The list of user rights for all groups is shown here.

The documentation pages can be edited with a wiki-style syntax by clicking on the edit pencil on each page.

AnnoTrack: General Documentation

January 31st, 2011

Setting up a new system & adjusting it to your needs

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

The system is flexible enough to be useful for other groups and projects performing genome annotation in a collaborative effort and is therefore provided here. These are notes on how to start a new annotation project with AnnoTrack.

General Redmine installation notes for troubleshooting are here, but all the source code required for AnnoTrack is available here.

Most of the AnnoTrack code is written as a plugin for the Redmine system (rails/vendor/plugins/redmine_annotrack), but since some additional changes override Redmine's default code, you will need the complete package from this site.

General notes

You will need:

  1. a database server (e.g. MySQL 5)
  2. a Ruby on Rails installation

    source and help are on the official Rails page; see the documentation for running on Mac OS X (where it is usually pre-installed)

  3. a web server (e.g. Apache) when running in production mode; for testing, the WEBrick server supplied with Rails is fine.
  4. get the AnnoTrack source code and database from this page



    tar xzvf annotrack.version.tgz


create your database


mysql -u<user> -p<password> -h<host> -P<port> -e"create database annotrack"
mysql -u<user> -p<password> -h<host> -P<port> -Dannotrack < annotrack/database.sql

The main tables of the database are outlined in this diagram.

Rails server

  • we have frozen the additional external Rails modules used by the application (gems) into the AnnoTrack rails code (rails/vendor/rails/) so you don't necessarily need to install all of them separately.
  • set your environment variables GEM_PATH and RAILS_ENV in your shell or in the file annotrack/rails/config/environment.rb
  • adjust the database configuration file annotrack/rails/config/database.yml with your settings (production and, if desired, development)

    additional environments can be created (e.g. for multiple organisms) by adding an entry (e.g. "production_mouse") to database.yml and a matching file in config/environments (e.g. config/environments/production_mouse.rb)

  • start the server e.g. on port 6223:


    cd annotrack/rails
    ruby script/server -e development -p 6223   # to use the development setup
  • In a web browser your application will usually be at http://localhost:6223/. Log in as administrator ("admin"/"admin") to set up some initial values.

    The Redmine admin interface is at DEFAULT_URL/admin; in particular, modifications should be made on these pages:

    1. Settings: "Application title", "Welcome text", "Host name"
    2. AnnoTrack settings: "Menu links", "Browsers links", "other settings"

      vendor/plugins/redmine_annotrack/lang/en.yml holds the URL patterns used for browser links.

    3. Flags: define new flags to highlight errors
    4. Users: create & adjust user accounts
  • we have stored a gene with two transcripts and two flags for demonstration;

    you can see these by clicking on "Transcripts" at the top of the page and then selecting "View all transcripts".

  • you can create a new gene-level entry manually at DEFAULT_URL/projects/add for testing; in general, these entries are created by scripts writing directly to the database.
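To illustrate how RAILS_ENV selects a block from config/database.yml, and why an extra environment such as a mouse instance needs both a database.yml entry and a file in config/environments, here is a minimal Ruby sketch. The database names, hosts and helper functions (db_config, environment_file) are placeholders, not AnnoTrack settings.

```ruby
require 'yaml'

# Placeholder database.yml content; names and hosts are illustrative only.
DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: annotrack_dev
    host: localhost
  production:
    adapter: mysql
    database: annotrack
    host: dbhost
  production_mouse:
    adapter: mysql
    database: annotrack_mouse
    host: dbhost
YAML

# Picks the block for the given environment (default: RAILS_ENV or development).
def db_config(yaml_text, env = ENV.fetch('RAILS_ENV', 'development'))
  configs = YAML.safe_load(yaml_text)
  configs.fetch(env) { raise KeyError, "no database entry for environment '#{env}'" }
end

# Rails also loads config/environments/<env>.rb, so the names must match.
def environment_file(env)
  File.join('config', 'environments', "#{env}.rb")
end

cfg = db_config(DATABASE_YML, 'production_mouse')
puts cfg['database']                       # prints annotrack_mouse
puts environment_file('production_mouse')  # prints config/environments/production_mouse.rb
```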

Perl API/scripts

  • You can adjust the settings for your system in the central file.
  • We use the scripts/ file to run automatic updates of the core annotation and to update the stats shown on the front page (issue and flag counts); please adjust this to your needs.

    Some Perl programming knowledge is required to adjust / write parsers to handle the specific data you will be using.

  • The following additional perl modules (many of which are part of a standard installation) are required to use the AnnoTrack perl API:

    • Bio::Das::Lite
    • MIME::Lite
    • DBI
    • Getopt::Long
    • UNIVERSAL::require
    • Bio::EnsEMBL::DBSQL::DBAdaptor (when accessing Ensembl-style databases)
  • most probably you will have to adjust the source-specific scripts used for data loading and analysis stored in annotrack/perl/modules/annotrack

further hints

  • New genes/transcripts, categories and flags would usually be created by script access. The API provides functions for all of this, documented here (/human/docs/core).
  • This (/documents/show/8) is a basic *source adaptor* reading data from a tab-delimited file to demonstrate how the modules work.
  • This (/documents/show/10) is an example *source adaptor* to demonstrate a module reading from a database with DBI.
  • Here (/human/docs/core) is the Perl-doc of the AnnoTrack core module.

Further adjustments

To customize the system for your own set-up, there are a number of files you can modify:

  • rails/app/views/layouts/base.rhtml: Start page layout
  • rails/vendor/plugins/redmine_annotrack/lang/en.yml: Names and paths to browsers and project-related links
  • we are using a Lucene-based search engine for AnnoTrack; you can switch between this and the Redmine-internal search engine on the Administration/Settings/Annotrack page
  • the scripts annotrack/perl/scripts/ and annotrack/perl/scripts/ should be adjusted to your environment and run regularly (nightly) to update annotation data, update counts and optimize tables.
  • many "helper variables" are stored in the tmp_values table. Have a look there if stats etc. are not displayed as expected.


General notes on upgrading existing Redmine installations are here.

AnnoTrack: Web-Server

January 27th, 2011

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

AnnoTrack is a Ruby-on-Rails application which is executed by an Apache2 server with the mod_rails (Passenger) module. It runs on virtual machines (VMs) that host no other services, as Rails does not coexist well with other web services.

James Smith (webteam) knows most about this, Tim Cutts & Dave Holland (infrastructure management) can help with the VMs.

Access restrictions apply to all of the following services and to the superuser rights.

There is a test environment on the VM web-annotrack; the production servers run on two VM clones, web-annotrack1 and web-annotrack2. All can be accessed directly with SSH:


ssh web-annotrack
cd /var/www/annotrack-app

The different species have their own AnnoTrack/Redmine code installations, as there does not seem to be another way to run them in parallel:

annotrack-app == human

annotrack-app-mouse == mouse

annotrack-app-zfish == zebrafish

Rails/Passenger requires symbolic links from the root-level to the public folder:

human -> annotrack-app/public/

The test system is visible at

The port and other specific server settings are set in the apache2/sites-available/default file.

Re-starting Rails server:


ssh web-annotrack[1,2]
sudo touch tmp/restart.txt

Re-starting entire web server:


ssh web-annotrack[1,2]
sudo apache2ctl -k graceful

Service monitoring

The VMs are monitored with vSphere (web access, Windows client available as well) and Nagios (web-annotrack 1 / 2).

The website is also checked by the Montastic monitoring service.

Submitting to EMBLdb

January 24th, 2011

To submit DNA sequences from capillary (Sanger) sequencing to the public EMBL database, these steps can be taken:

The strategy is to create one submission on the European Nucleotide Archive (ENA) Webin submission page at the EBI and attach a FASTA file with all sequences.

  1. remove low-quality sequences. In my case the filter criteria were:

    • max 5 consecutive Ns
    • max 10% Ns
    • min 80bp length
  2. screen for vector contamination:

    • Use the NCBI web interface for small sets
    • Use BioPerl for large sets: get the EMVEC file in EMBL format and convert it to a FASTA file with BioPerl:


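      use Bio::SeqIO;
      # read the EMBL-format file and write every record out as FASTA
      my $inseq = Bio::SeqIO->new(
            -file   => "<file.dat",
            -format => "embl" );
      my $outseq = Bio::SeqIO->new(
            -file   => ">file.fa",
            -format => "fasta" );
      while (my $seq = $inseq->next_seq) {
          $outseq->write_seq($seq);
      }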
    • index with formatdb

      To extract sequences from a BLAST database you need an index file (for protein-dbs these files end with the extension: ".pin", for DNA dbs: ".nin"), a sequence file (".psq", ".nsq") and a header file (".phr" and ".nhr"). formatdb turns FASTA files into BLAST databases.


      formatdb -i emvec.fa -p F -o F

    • run BioPerl Blast with the sequences to be submitted against the EMVEC db:


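      use Bio::Tools::Run::StandAloneBlast;
      # the database name matches the formatdb output (emvec.fa.nin, .nsq, .nhr)
      my @blast_params = (program => 'blastn', database => 'emvec.fa');
      my $factory = Bio::Tools::Run::StandAloneBlast->new(@blast_params);
      # $seq is a Bio::Seq object, e.g. from Bio::SeqIO->next_seq
      my $blast_report = $factory->blastall($seq);

      and filter out sequences with vector hits that have low e-values (< 0.1) and cover long stretches of the sequence.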

  3. In my case these were submitted as ESTs. Log in to Webin, create a new submission, choose the molecule type (e.g. "EST"), add a reference publication, specify the number of sequences, describe the header (at least one field, e.g. a clone identifier, must be specified to be read from the FASTA header), enter the values common to all entries in the small table provided (e.g. organism "Homo sapiens"), and upload your FASTA file.
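The quality filter from step 1 can be expressed as a short Ruby sketch. It assumes the thresholds listed there (at most 5 consecutive Ns, at most 10% Ns, at least 80 bp) and uses a deliberately simple, hypothetical FASTA parser rather than BioPerl; it is meant to show the logic, not to replace a tested tool.

```ruby
# Minimal sketch of the step 1 quality filter; thresholds from the list above.
MAX_N_RUN  = 5     # max consecutive Ns
MAX_N_FRAC = 0.10  # max fraction of Ns
MIN_LENGTH = 80    # min length in bp

# Simplified parser: returns [header, sequence] pairs (header without '>').
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      records << [line[1..-1], ''.dup]
    elsif !records.empty?
      records.last[1] << line
    end
  end
  records
end

# True if the sequence meets all three criteria.
def passes_quality?(seq)
  s = seq.upcase
  return false if s.length < MIN_LENGTH
  longest_n_run = s.scan(/N+/).map(&:length).max || 0
  return false if longest_n_run > MAX_N_RUN
  s.count('N').fdiv(s.length) <= MAX_N_FRAC
end

# Keeps only the records that pass the filter.
def filter_fasta(text)
  parse_fasta(text).select { |_header, seq| passes_quality?(seq) }
end

good        = 'ACGT' * 25                          # 100 bp, no Ns
too_many_ns = 'ACGTN' * 20                         # 100 bp, 20% Ns
long_n_run  = ('A' * 50) + ('N' * 6) + ('C' * 50)  # 6 consecutive Ns
fasta = ">ok\n#{good}\n>ns\n#{too_many_ns}\n>run\n#{long_n_run}\n>short\nACGT\n"
p filter_fasta(fasta).map(&:first)   # prints ["ok"]
```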


Sequence Contaminations

January 20th, 2011

When analysing sequences from public databases or from your own sequencer, you have to be aware of potential contamination.

A contaminated sequence is one that does not faithfully represent the genetic information from the biological source organism/organelle because it contains one or more sequence segments of foreign origin. [NCBI]

The primary approach to screening nucleic acid sequences for vector contamination is to run a sequence similarity search against a database of vector sequences. The preferred tool for conducting such a search is NCBI's VecScreen. VecScreen detects contamination by running a BLAST sequence similarity search against the UniVec vector sequence database.
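To show how BLAST results against a vector database can be turned into a list of suspect sequences, here is a minimal Ruby sketch that reads BLAST tabular output (blastall -m 8, or BLAST+ -outfmt 6, where the e-value is the 11th of 12 tab-separated columns). The e-value cutoff of 0.1 follows the EMBL submission notes; the minimum hit length of 30 bp is a hypothetical value, and this is not VecScreen's own classification scheme.

```ruby
# Minimal sketch: list query sequences with significant vector hits in
# BLAST tabular output (12 tab-separated columns; e-value in column 11).
MAX_EVALUE     = 0.1  # cutoff from the EMBL submission notes
MIN_HIT_LENGTH = 30   # hypothetical minimum alignment length in bp

Hit = Struct.new(:query, :subject, :identity, :aln_length, :qstart, :qend, :evalue)

# Parses tabular lines into Hit structs, skipping comments and blank lines.
def parse_tabular(text)
  hits = text.each_line.map do |line|
    next nil if line.start_with?('#') || line.strip.empty?
    f = line.chomp.split("\t")
    Hit.new(f[0], f[1], f[2].to_f, f[3].to_i, f[6].to_i, f[7].to_i, f[10].to_f)
  end
  hits.compact
end

# Returns the unique query IDs whose hits pass both thresholds.
def contaminated_queries(text)
  parse_tabular(text)
    .select { |h| h.evalue < MAX_EVALUE && h.aln_length >= MIN_HIT_LENGTH }
    .map(&:query)
    .uniq
end

report = [
  "read1\tvec_a\t98.5\t120\t2\t0\t1\t120\t400\t519\t1e-55\t220",
  "read2\tvec_b\t90.0\t20\t2\t0\t5\t24\t10\t29\t2.5\t30.2"
].join("\n")
p contaminated_queries(report)   # prints ["read1"]
```

Flagged reads can then be trimmed or removed before submission.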

EMVEC Database BLAST is an interactive web service for scanning sequences for vector contamination.

Help is available for interpreting the results of BLAST2 EVEC.

See also this post about submitting to EMBL db and this post about screening NGS reads locally.