PAR regions

April 6th, 2011

The pseudo-autosomal regions are homologous DNA sequences on the (human) X and Y chromosomes (see wikipedia for more). They allow the pairing and crossing-over of these sex chromosomes the same way the autosomal chromosomes do during meiosis. As these genomic regions are identical between X and Y, they are oftentimes only stored once.

To pull out the coordinates of the pseudo-autosomal regions (PAR) from the Ensembl database, you can perform the following query on the Ensembl core database:

Code

select (select sr.name from seq_region sr where sr.seq_region_id=ae.seq_region_id) as chrom_1, ae.seq_region_start as start_1, ae.seq_region_end as end_1, (select sr.name from seq_region sr where sr.seq_region_id=ae.exc_seq_region_id) as chrom_2, ae.exc_seq_region_start as start_2, ae.exc_seq_region_end as end_2 from assembly_exception ae where ae.exc_type="PAR";

For the human database schema 61 (assembly GRCh37/hg19) you will get where the corresponding region is located:

+---------+----------+----------+---------+-----------+-----------+
| chrom_1 | start_1  | end_1    | chrom_2 | start_2   | end_2     |
+---------+----------+----------+---------+-----------+-----------+
| Y       |    10001 |  2649520 | X       |     60001 |   2699520 |
| Y       | 59034050 | 59373566 | X       | 154931044 | 155270560 |
+---------+----------+----------+---------+-----------+-----------+

For the old assembly (NCBI36/hg18) you will get:

+---------+----------+----------+---------+-----------+-----------+
| chrom_1 | start_1  | end_1    | chrom_2 | start_2   | end_2     |
+---------+----------+----------+---------+-----------+-----------+
| Y       |        1 |  2709520 | X       |         1 |   2709520 |
| Y       | 57443438 | 57772954 | X       | 154584238 | 154913754 |
+---------+----------+----------+---------+-----------+-----------+

You can alternatively use the API:

Code

my $aefa = $db->get_AssemblyExceptionFeatureAdaptor();
my $sa   = $db->get_SliceAdaptor;
my $slice = $sa->fetch_by_region("chromosome", "Y");
my @aefs = @{$aefa->fetch_all_by_Slice($slice)};
foreach my $ae (@aefs){
  print $ae->display_id."\t".$ae->start."\t".$ae->end."\n";
}
X	10001	2649520
X	59034050	59373566

or for X:

Y	60001	2699520
Y	154931044	155270560

So to translate from Y to X PAR locations you can use the following for GRCh37 / hg19:

Y 10001 - 2649520      <->  X 60001 - 2699520, band Xp22.33
Y 59034050 - 59373566  <->  X 154931044 - 155270560, band Xq28

and for NCBI36 / hg18:

Y 1 - 2709520          <-> X  1 - 2709520, band Xp22.33
Y 57443438 - 57772954  <-> X  154584238 - 154913754, band Xq28

Please note that these coordinates do not agree with the definitions at the GRC and NCBI. This difference of the PAR-2 end coordinates (chrX:155.260.560 / 155.270.560 or chrY:59.363.566 / 59.373.566) is caused by the 10kb telomeric (gap) region which needs to be included in the PAR-2 definition to correctly represent this arrangement.

See also the telomere & centromer definition notes.

A nice list of official HGNC genes that are located in the pseudo-autosomal regions can be found here.

PDL: The Perl Data Language

March 1st, 2011

It doesn't always have to be R!

The Perl Data Language is a Perl extension for numerical manipulation that provides the convenience of Perl with the speed of compiled C.

It also contains plotting modules.

Install with cpan install PDL or check these descriptions.

Code example for getting basic stats from a few values:

Code

use PDL;
 
 
 
my @numbers = (1,4,6,8,10);
 
my $piddle = pdl(@numbers);
 
my ($mean,$prms,$median,$min,$max,$adev,$rms) = statsover($piddle);
 
 
 
print "Mean=$mean\n".
 
      "Root-mean-square deviation=$prms\n".
 
      "Median=$median\n".
 
      "Min=$min\n".
 
      "Max=$max\n".
 
      "StdDev=$adev\n".
 
      "Population-Deviation=$rms\n\n";

Running CronJobs

February 4th, 2011

cron is a extremely useful unix utility that allows tasks to be automatically run in the background at regular intervals.

You need the script / command you want to run and the time it should run. You can the use the crontab command to edit the service:

  1. crontab -e Edit your crontab file, or create one if it doesn't already exist.
  2. crontab -l Display your crontab file.
  3. crontab -r Remove your crontab file.

Format of entries:


*     *     *   *    *        command to be executed

-     -     -   -    -

|     |     |   |    |

|     |     |   |    +----- day of week (0 - 6) (Sunday=0)

|     |     |   +------- month (1 - 12)

|     |     +--------- day of        month (1 - 31)

|     +----------- hour (0 - 23)

+------------- min (0 - 59)

Example:

00 03 * * * bash /users/fsk/backup_db.sh

This runs my backup script at 03:00 every day.

*/10 * * * * echo "job done"

This runs an echo every 10 minutes of every hour of every day.

To receive an email with any result from the jobs, add

Code

MAILTO=yourmail@home.com

to the top of the crontab. To discard any output add

Code

>/dev/null 2>&1

to the end of the job line or as the very first line (for all jobs).

Source

AnnoTrack: Rails System

January 31st, 2011

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

Please read elsewhere about general Ruby or Rails questions, there are blog entries about Ruby & Rails Terminology, Rails application layout

The AnnoTrack Ruby-on-Rails code can be found in svn/gencode/tracking_system/rails/. Most AnnoTrack-specific code is stored as "plugin" code in the Redmine directory. This means when trying to find a specific piece of code, you have to check the default application directory app, but also the plugin directory

vendor/plugins/redmine_annotrack/app. The language files defining the terminology and browser links used on the websites are

svn/gencode/tracking_system/rails/lang/en.yml and

svn/gencode/tracking_system/rails/vendor/plugins/redmine_annotrack/lang/en.yml/.

In these files an entry like

Code

label_project_new: New Gene

means "if you come across the term label_project_new, display it as New Gene in the browser".

To understand the code underlying specific web pages it is helpful to check the routing entries in

config/routes.rb and vendor/plugins/redmine_annotrack/routes.rb. Specific paths in the browser are mapped to specific functions in the rails code. E.g.:

Code

map.connect 'flags/show_tecs', :controller => 'flags', :action => 'show_tecs'

maps the URL http://annotrack.sanger.ac.uk/human/flags/show_tecs to

the show_tecs function in the file app/controllers/flags_controller.rb.

The list of chromosomes used as well as the different priority values are set on this page.

Some options for links on the transcript pages etc. can be changed through the administration interface.

These previous actions require administrator user rights in the AnnoTrack system. The list of different user right for all groups is shown here.

The documentation pages can be edited with a wiki-style syntax by clicking on the edit pencil on each page.

AnnoTrack: General Documentation

January 31st, 2011

Setting up a new system & adjusting it to your needs

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

The system is flexible enough to be of use for other groups and projects performing genome annotation in a collaborative effort and is therefor provided here. These are notes on how to start a new annotation project with AnnoTrack.

General Redmine installation notes for troubleshooting are here, but all the sourcecode required for AnnoTrack is available here.

Most of the AnnoTrack code is written as a plugin for the Redmine system (rails/vendor/plugins/redmine_annotrack), but since there are some other changes required, which override Redmine's default code, you will need the complete package from this site.

General notes

you will need

  1. a database server (e.g. mysql 5)
  2. ruby on rails installation

    source and help on the official rails page documention for running on Mac OsX (usually pre-installed)

  3. a web server (e.g. Apache) when running in production mode, for testing, the Webbrick server supplied with Rails is fine.
  4. get the AnnoTrack source code and database from this page

    unpack:

    Code

    tar xzvf annotrack.version.tgz

trong>Database

create your database

Code

mysql -u<user> -p<password> -h<host> -P<port> -e"create database annotrack"
 
  mysql -u<user> -p<password> -h<host> -P<port> -Dannotrack < annotrack/database.sql

The main tables fo the database are outlined in this diagram.

Rails server

  • we have frozen the additional external Rails modules used by the application (gems) into the AnnoTrack rails code (rails/vendor/rails/) so you don't necessarily need to install all of them separately.
  • set your environment variables GEM_PATH and RAILS_ENV in your shell or in the file annotrack/rails/config/environment.rb
  • adjust the database configurations file in annotrack/rails/config/database.yml with your settings (production and development if desired)

    additional environments can be created (e.g. for multiple organisms) by adding an entry (e.g. "production_housemouse") and a file in environments (e.g. environments/production_mouse.rb)

  • start the server e.g. on port 6223:

    Code

    cd annotrack/rails
     
    ruby scripts/server -edevelopment -p6223 #(to use the development setup)
  • In a web browser your application will usually be at http://localhost:6223/. Log in as administrator ("admin"/"admin") to set up some initial values.

    The admin interface from Redmine is at DEFAULT_URL/admin, modifications should in particular be made on these pages:

    1. Settings: "Application title", "Welcome text", "Host name"
    2. AnnoTrack settings: "Menu links", "Browsers links", "other settings"

      vendor/plugins/redmine_annotrack/lang/en.yml holds the URL patterns used for browser links.

    3. Flags: define new flags to highlight errors
    4. Users: create & adjust user accounts
  • we have stored a gene with two transcript with two flags for demonstration;

    you can see these by clicking on "Transcripts" at the top of the page and then selecting "View all transcripts".

  • you can create a new gene-level entry manually at DEFAULT_URL/projects/add for testing, in general these will be created by scripts writing directly to the database.

Perl API/scripts

  • You can adjust the settings for your system in the central config.pm file.
  • We use the scripts/cron_jobs.pl file the run automatic updates of the core annotation, to update the stats given on the front page (issue and flag counts), please adjust this to your needs

    Some Perl programming knowledge is required to adjust / write parsers to handle the specific data you will be using.

  • The following additional perl modules (many of which are part of a standard installation) are required to use the AnnoTrack perl API:

    • Bio::Das::Lite
    • MIME::Lite
    • DBI
    • Getopt::Long
    • UNIVERSAL::require
    • Bio::EnsEMBL::DBSQL::DBAdaptor (when accessing Ensembl-style databases)
  • most probably you will have to adjust the source-specific scripts used for data loading and analysis stored in annotrack/perl/modules/annotrack

further hints

  • New genes/transcripts, categories and flags would usually be created by script access. There are functions for all this functionality which is documented "here":/human/docs/core
  • This (/documents/show/8) is a basic *source adaptor* reading data from a tab-delimited file to demonstrate how the modules work.
  • This (/documents/show/10) is an example *source adaptor* to demonstrate a module reading from a database with DBI.
  • Here (/human/docs/core) is the Perl-doc of the AnnoTrack core module.

Further adjustments

to customize the system for your own set-up there are a number of files you can modify:

  • rails/app/views/layouts/base.rhtml: Start page layout
  • rails/vendor/plugins/redmine_annotrack/lang/en.yml: Names and paths to browsers and project-related links
  • we are using a Lucene-based search engine for AnnoTrack, there is a switch option between this and the Redmine-internal search enginge on the Administration/Settings/Annotrack page
  • the scripts annotrack/perl/scripts/cron_jobs.pl.example and annotrack/perl/scripts/cron_queries.sh.example should be adjusted with your environment and run regulary (nightly) to update annotation data, update counts and optimize tables.
  • many "helper variables" are stored in the tmp_values table. Have a look there if stats etc. are not displayed as expected.

Upgrading

General notes on upgrading existing Redmine installations are here.