PDL: The Perl Data Language

March 1st, 2011

It doesn't always have to be R!

The Perl Data Language is a Perl extension for numerical manipulation that provides the convenience of Perl with the speed of compiled C.

It also contains plotting modules.

Install with cpan install PDL or check these descriptions.

Code example for getting basic stats from a few values:

Code

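use strict;
use warnings;
use PDL;

my @numbers = (1,4,6,8,10);
my $piddle = pdl(@numbers);

# statsover returns, in this order: mean, population RMS deviation (N-1),
# median, min, max, average absolute deviation, RMS deviation (N)
my ($mean,$prms,$median,$min,$max,$adev,$rms) = statsover($piddle);

print "Mean=$mean\n".
      "Population RMS deviation=$prms\n".
      "Median=$median\n".
      "Min=$min\n".
      "Max=$max\n".
      "Average absolute deviation=$adev\n".
      "RMS deviation=$rms\n\n";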

Running CronJobs

February 4th, 2011

cron is an extremely useful Unix utility that runs tasks automatically in the background at regular intervals.

You need the script or command you want to run and the time at which it should run. You can then use the crontab command to edit your schedule:

  1. crontab -e Edit your crontab file, or create one if it doesn't already exist.
  2. crontab -l Display your crontab file.
  3. crontab -r Remove your crontab file.

Format of entries:


*     *     *   *    *        command to be executed
-     -     -   -    -
|     |     |   |    |
|     |     |   |    +----- day of week (0 - 6) (Sunday=0)
|     |     |   +------- month (1 - 12)
|     |     +--------- day of month (1 - 31)
|     +----------- hour (0 - 23)
+------------- min (0 - 59)

Example:

00 03 * * * bash /users/fsk/backup_db.sh

This runs my backup script at 03:00 every day.

*/10 * * * * echo "job done"

This runs an echo every 10 minutes of every hour of every day.
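As a rough illustration of the five-field layout, the following shell sketch splits an entry into its time fields and its command. The function name describe_cron_entry is made up for this example; cron does its own parsing, so this is only a reading aid.

```shell
# Hypothetical helper: split one crontab entry into its five time fields
# and the command, then print them with labels.
describe_cron_entry() {
    printf '%s\n' "$1" | {
        # read assigns the first five whitespace-separated words to the time
        # fields; everything left over ends up in the last variable (command).
        read -r minute hour dom month dow command
        printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s command=%s\n' \
            "$minute" "$hour" "$dom" "$month" "$dow" "$command"
    }
}

describe_cron_entry '00 03 * * * bash /users/fsk/backup_db.sh'
```

For the backup example, the minute field is 00, the hour field is 03, and the three remaining time fields are left as * (every day, every month, every weekday).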

To receive an email with any result from the jobs, add

Code

MAILTO=yourmail@home.com

to the top of the crontab. To discard any output add

Code

>/dev/null 2>&1

to the end of the job line. To stop cron from mailing output for all jobs, set MAILTO="" at the top of the crontab instead.
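The order of the two redirections matters. A small shell sketch shows the difference, using a made-up function noisy_job as a stand-in for a real cron job:

```shell
# Stand-in for a cron job that writes to both stdout and stderr (hypothetical).
noisy_job() {
    echo "normal output"
    echo "error output" >&2
}

# stdout goes to /dev/null first, then stderr is pointed at the same place:
# nothing is left over for cron to mail.
both_discarded=$(noisy_job >/dev/null 2>&1)

# Reversed order: stderr is attached to the original stdout before stdout is
# sent to /dev/null, so the error text still gets through.
stderr_survives=$(noisy_job 2>&1 >/dev/null)

echo "discarded: [$both_discarded]"
echo "reversed:  [$stderr_survives]"
```

Written the other way round (2>&1 >/dev/null), a job would still produce mail whenever it writes to stderr.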

Source

AnnoTrack: Rails System

January 31st, 2011

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

For general Ruby or Rails questions, please read elsewhere; there are blog entries about Ruby & Rails terminology and the Rails application layout.

The AnnoTrack Ruby-on-Rails code can be found in svn/gencode/tracking_system/rails/. Most AnnoTrack-specific code is stored as "plugin" code in the Redmine directory. This means that when looking for a specific piece of code, you have to check both the default application directory app and the plugin directory vendor/plugins/redmine_annotrack/app.

The language files defining the terminology and browser links used on the websites are svn/gencode/tracking_system/rails/lang/en.yml and svn/gencode/tracking_system/rails/vendor/plugins/redmine_annotrack/lang/en.yml.

In these files an entry like

Code

label_project_new: New Gene

means "if you come across the term label_project_new, display it as New Gene in the browser".
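To find out how a given label is displayed, the language files can simply be searched. The sketch below assumes flat "key: value" lines like the entry above; lookup_label and the temporary demo file are made up for illustration.

```shell
# Hypothetical helper: print the display text for a label key.
# $1 = label key, $2 = language file with flat "key: value" lines
lookup_label() {
    sed -n "s/^ *$1: *//p" "$2"
}

# Demo on a throwaway file; on a real installation you would point this at
# lang/en.yml and vendor/plugins/redmine_annotrack/lang/en.yml instead.
demo_file=$(mktemp)
printf '%s\n' 'label_project_new: New Gene' > "$demo_file"
lookup_label label_project_new "$demo_file"
rm -f "$demo_file"
```

Since both the default and the plugin language file can define labels, it is worth checking both.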

To understand the code underlying specific web pages, it helps to check the routing entries in config/routes.rb and vendor/plugins/redmine_annotrack/routes.rb. Specific paths in the browser are mapped to specific functions in the Rails code, e.g.:

Code

map.connect 'flags/show_tecs', :controller => 'flags', :action => 'show_tecs'

maps the URL http://annotrack.sanger.ac.uk/human/flags/show_tecs to the show_tecs function in the file app/controllers/flags_controller.rb.
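Getting from a routing entry to the file that holds the code follows the naming pattern in the example (controller 'flags' lives in flags_controller.rb). As a sketch, with route_to_controller_file being a made-up helper name:

```shell
# Hypothetical helper: read routing entries on stdin and print the controller
# file each one points to, following the <name>_controller.rb convention.
route_to_controller_file() {
    sed -n "s|.*:controller *=> *'\([^']*\)'.*|app/controllers/\1_controller.rb|p"
}

echo "map.connect 'flags/show_tecs', :controller => 'flags', :action => 'show_tecs'" \
    | route_to_controller_file
```

The printed path is relative to the application directory. For controllers that belong to the plugin, look under vendor/plugins/redmine_annotrack/app instead.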

The list of chromosomes used as well as the different priority values are set on this page.

Some options for links on the transcript pages etc. can be changed through the administration interface.

The actions above require administrator rights in the AnnoTrack system. The list of user rights for all groups is shown here.

The documentation pages can be edited with a wiki-style syntax by clicking on the edit pencil on each page.

AnnoTrack: General Documentation

January 31st, 2011

Setting up a new system & adjusting it to your needs

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

The system is flexible enough to be of use to other groups and projects performing genome annotation in a collaborative effort and is therefore provided here. These are notes on how to start a new annotation project with AnnoTrack.

General Redmine installation notes for troubleshooting are here, but all the source code required for AnnoTrack is available here.

Most of the AnnoTrack code is written as a plugin for the Redmine system (rails/vendor/plugins/redmine_annotrack), but since there are some other changes required, which override Redmine's default code, you will need the complete package from this site.

General notes

You will need:

  1. a database server (e.g. MySQL 5)
  2. a Ruby on Rails installation

    source and help: the official Rails page; documentation for running on Mac OS X (usually pre-installed)

  3. a web server (e.g. Apache) when running in production mode; for testing, the WEBrick server supplied with Rails is fine
  4. the AnnoTrack source code and database from this page

    unpack:

    Code

    tar xzvf annotrack.version.tgz

Database

Create your database and load the supplied database dump:

Code

mysql -u<user> -p<password> -h<host> -P<port> -e"create database annotrack"
mysql -u<user> -p<password> -h<host> -P<port> -Dannotrack < annotrack/database.sql

The main tables of the database are outlined in this diagram.

Rails server

  • we have frozen the additional external Rails modules used by the application (gems) into the AnnoTrack rails code (rails/vendor/rails/) so you don't necessarily need to install all of them separately.
  • set your environment variables GEM_PATH and RAILS_ENV in your shell or in the file annotrack/rails/config/environment.rb
  • adjust the database configuration file annotrack/rails/config/database.yml with your settings (production and, if desired, development)

    additional environments can be created (e.g. for multiple organisms) by adding an entry (e.g. "production_mouse") and a matching file in environments (e.g. environments/production_mouse.rb)

  • start the server e.g. on port 6223:

    Code

    cd annotrack/rails
    ruby script/server -e development -p 6223   # use the development setup
  • In a web browser your application will usually be at http://localhost:6223/. Log in as administrator ("admin"/"admin") to set up some initial values.

    The Redmine admin interface is at DEFAULT_URL/admin. In particular, review these pages:

    1. Settings: "Application title", "Welcome text", "Host name"
    2. AnnoTrack settings: "Menu links", "Browsers links", "other settings"

      vendor/plugins/redmine_annotrack/lang/en.yml holds the URL patterns used for browser links.

    3. Flags: define new flags to highlight errors
    4. Users: create & adjust user accounts
  • we have stored a gene with two transcripts and two flags for demonstration;

    you can see these by clicking on "Transcripts" at the top of the page and then selecting "View all transcripts".

  • you can create a new gene-level entry manually at DEFAULT_URL/projects/add for testing; in general these will be created by scripts writing directly to the database.
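For the database.yml step above, each additional environment needs its own entry. The sketch below prints one using the standard Rails keys; the function name database_yml_entry and all values (database name, host, port, user, password) are placeholders, not AnnoTrack defaults.

```shell
# Hypothetical helper: print a database.yml entry for one environment.
# $1 = environment name, $2 = database, $3 = host, $4 = port,
# $5 = username, $6 = password
database_yml_entry() {
    cat <<EOF
$1:
  adapter: mysql
  database: $2
  host: $3
  port: $4
  username: $5
  password: $6
EOF
}

# Placeholder values for a hypothetical mouse environment.
database_yml_entry production_mouse annotrack_mouse localhost 3306 annotrack_user secret
```

The output can be appended to annotrack/rails/config/database.yml. The file in environments/ and the value of RAILS_ENV then need to use the same environment name.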

Perl API/scripts

  • You can adjust the settings for your system in the central config.pm file.
  • We use the scripts/cron_jobs.pl file to run automatic updates of the core annotation and to update the stats given on the front page (issue and flag counts). Please adjust this to your needs.

    Some Perl programming knowledge is required to adjust / write parsers to handle the specific data you will be using.

  • The following additional perl modules (many of which are part of a standard installation) are required to use the AnnoTrack perl API:

    • Bio::Das::Lite
    • MIME::Lite
    • DBI
    • Getopt::Long
    • UNIVERSAL::require
    • Bio::EnsEMBL::DBSQL::DBAdaptor (when accessing Ensembl-style databases)
  • most probably you will have to adjust the source-specific scripts used for data loading and analysis stored in annotrack/perl/modules/annotrack
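Whether the modules listed above are installed can be checked by asking perl to load each one. This is a minimal sketch; check_perl_module is a made-up helper name, and Bio::EnsEMBL::DBSQL::DBAdaptor is left out of the loop because it is only needed for Ensembl-style databases.

```shell
# Hypothetical helper: report whether perl can load a given module.
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "missing $1"
    fi
}

for module in Bio::Das::Lite MIME::Lite DBI Getopt::Long UNIVERSAL::require; do
    check_perl_module "$module"
done
```

Anything reported as missing can be installed from CPAN before running the AnnoTrack scripts.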

further hints

  • New genes/transcripts, categories and flags would usually be created by script access. There are functions for all of this, documented here (/human/docs/core).
  • This (/documents/show/8) is a basic source adaptor reading data from a tab-delimited file to demonstrate how the modules work.
  • This (/documents/show/10) is an example source adaptor to demonstrate a module reading from a database with DBI.
  • Here (/human/docs/core) is the Perl-doc of the AnnoTrack core module.

Further adjustments

To customize the system for your own set-up, there are a number of files you can modify:

  • rails/app/views/layouts/base.rhtml: Start page layout
  • rails/vendor/plugins/redmine_annotrack/lang/en.yml: Names and paths to browsers and project-related links
  • we are using a Lucene-based search engine for AnnoTrack; you can switch between this and the Redmine-internal search engine on the Administration/Settings/Annotrack page
  • the scripts annotrack/perl/scripts/cron_jobs.pl.example and annotrack/perl/scripts/cron_queries.sh.example should be adjusted to your environment and run regularly (nightly) to update annotation data, update counts and optimize tables.
  • many "helper variables" are stored in the tmp_values table. Have a look there if stats etc. are not displayed as expected.

Upgrading

General notes on upgrading existing Redmine installations are here.

AnnoTrack: Web-Server

January 27th, 2011

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

AnnoTrack is a Ruby-on-Rails application which is executed by an Apache2 server with the mod_rails (Passenger) plugin. It runs on virtual machines (VMs) where we don't run any other services, as Rails does not play nicely with other web services.

James Smith (webteam) knows most about this, Tim Cutts & Dave Holland (infrastructure management) can help with the VMs.

Access restrictions apply to all of the following services and to superuser rights.

There is a test environment on the VM web-annotrack, the production servers are running on two VM clones web-annotrack1 and web-annotrack2. All can be accessed directly with SSH:

Code

ssh web-annotrack
cd /var/www/annotrack-app

Each species has its own AnnoTrack/Redmine code installation, as there does not seem to be another way to run them in parallel:

annotrack-app == human

annotrack-app-mouse == mouse

annotrack-app-zfish == zebrafish

Rails/Passenger requires symbolic links from the root-level to the public folder:

human -> annotrack-app/public/
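Such a link can be created with ln -s. The sketch below works in a throwaway directory so it is safe to run anywhere; link_species is a made-up helper, and the real web root depends on your Apache configuration.

```shell
# Hypothetical helper: create <web root>/<link name> -> <app dir>/public
# $1 = web root, $2 = link name, $3 = application directory name
link_species() {
    ( cd "$1" && ln -s "$3/public" "$2" )
}

# Demo in a temporary directory standing in for the web root.
web_root=$(mktemp -d)
mkdir -p "$web_root/annotrack-app/public"
link_species "$web_root" human annotrack-app
ls -l "$web_root/human"
rm -rf "$web_root"
```

The same call with mouse and annotrack-app-mouse, or a zebrafish link name and annotrack-app-zfish, would set up the other two installations.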

The test system is visible at http://web-annotrack.internal.sanger.ac.uk:8000

The port and other specific server settings are set in the apache2/sites-available/default file.

Re-starting Rails server:

Code

ssh web-annotrack[1,2]
sudo touch tmp/restart.txt   # relative to the application directory, e.g. /var/www/annotrack-app

Re-starting entire web server:

Code

ssh web-annotrack[1,2]
sudo apache2ctl -k graceful

Service monitoring

The VMs are monitored with vSphere (web access, Windows client available as well) and Nagios (web-annotrack 1 / 2).

The website is also checked by the Montastic monitoring service.