Subversion Notes

August 6th, 2008

Subversion (svn) is an open-source version control system like the Concurrent Versions System (cvs).

Its home is here:

These are notes from the creation and maintenance of the Gencode code repository at the Sanger Institute.

Checking out a repository

svn co svn+ssh://

Adding an external subversion code directory to my repository:

1. Remove existing svn files recursively:

find tracking_system/ -type d -name ".svn" -prune -print0 | xargs -0 rm -rf

2. Tell svn to ignore certain types of files:

vi ~/.subversion/config

Add this line to the [miscellany] section: global-ignores = *~ ._* .svn *.svn *#*

3. Do the import:

svn add tracking_system

We're NOT using svn import, because svn add can be reverted and because the server/directory specification for import didn't work:

svn import tracking_system svn+ssh:// -m'import of files'

If things go wrong:

svn revert --recursive tracking_system

4. Commit changes:

svn commit -m'import of files' tracking_system
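Step 1 above (stripping the old .svn directories before re-adding the tree) can also be sketched in Python. This is only an illustration of the same clean-up, not part of the original workflow; the function name remove_svn_dirs is made up for this example.

```python
import os
import shutil


def remove_svn_dirs(root):
    """Delete every directory named .svn below root and return the removed paths."""
    removed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".svn" in dirnames:
            target = os.path.join(dirpath, ".svn")
            shutil.rmtree(target)
            # Drop the entry so os.walk does not try to descend into it.
            dirnames.remove(".svn")
            removed.append(target)
    return removed
```

Calling remove_svn_dirs("tracking_system") would have the same effect as the find command in step 1.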

MySQL User Permissions

May 28th, 2008

Give a user some rights on MySQL database tables:


mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON firstdb.*
    -> TO 'firstuser'@'%' IDENTIFIED BY 'passwd';

Important Privileges:

  • ALTER: Modify tables with ALTER TABLE
  • CREATE: Make new database, table, or index
  • DELETE: Remove rows from tables
  • DROP: Remove databases or tables
  • INDEX: Create or remove indexes for tables
  • INSERT: Add rows to tables
  • SELECT: Select records from tables
  • UPDATE: Modify records in tables

Complete list @

Ruby on Rails (intro level)

April 23rd, 2008

General and Mac OS X Leopard-specific notes about the Ruby language and the Rails framework

Ruby is an object-oriented, interpreted language implemented in C. It is dynamic, reflective and supports multiple programming paradigms. [Wiki]

Ruby is said to follow the principle of least surprise.


RubyGems: package/distribution system for Ruby [Doc]

Rake: make utility for Ruby [Doc]

Locations on the Mac:

pre-installed gems: /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8

user-installed gems: /Library/Ruby/Gems/1.8

Update the Rails framework on Mac OS X:

sudo gem update --system

sudo gem install rails

sudo gem update rake

sudo gem update sqlite3-ruby

RubyGems installed the following executables:


This and a nice introduction:

Also see his.

Adding a table

-create table & fields in MySQL

naming convention: lowercase, using underscore if necessary, plural form, more on naming conventions

-run ruby script/generate model {table name}

this will create files in the following places (the model file uses the singular form of the table name):

  • -app/models/{table name}.rb
  • -db/migrate/{#}_create_{table name}.rb

-run ruby script/generate controller {table name}

this will create files in the following places:

  • -app/controllers/{table name}_controller.rb

OR: run ruby script/generate scaffold {table name} {db field definitions}

this will create controller, view, model, test, migrate files

Ref: RubyOnRails on windows

Rails API docu

Start the interactive Ruby shell on Mac OS X: irb

More docu

Ruby book: Programming Ruby: The Pragmatic Programmers' Guide

Rails book: Agile Web Development with Rails: A Pragmatic Guide

Ruby book: Ruby for Rails

Ruby reference


April 21st, 2008

The Consensus CoDing Sequence (CCDS) project is a collaborative effort to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations.

Participating institutes:

  • European Bioinformatics Institute (EBI)
  • National Center for Biotechnology Information (NCBI)
  • Wellcome Trust Sanger Institute (WTSI)
  • University of California, Santa Cruz (UCSC)

Project page at NCBI

Job-specific notes:

There is now a copy of the human and mouse CCDS database on the Ensembl livemirror server. It is used by the Otterlace client for manual genome annotation and by the Ensembl web site via the DAS server.

Coding Phases / Frames

April 17th, 2008

The phase (sometimes called frame) describes how to translate the individual coding exons of a gene.

Phases 1 and 2 are defined differently in the GFF and Ensembl formats!

In Ensembl, the phase is defined for exon objects like this:

The Ensembl phase convention can be thought of as "the number of bases of the first codon which are on the previous exon". It is therefore 0, 1 or 2 (-1 means the exon is non-coding).

In ascii art, with alternate codons represented by ### and +++:

       Previous Exon   Intron   This Exon

    ...-------------            -------------...

    5'                  Phase                3'

    ...#+++###+++###     0      +++###+++###+...

    ...+++###+++###+     1      ++###+++###++...

    ...++###+++###++     2      +###+++###+++...
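Under this convention, an exon's phase and its length determine the phase of the following exon. A minimal Python sketch, assuming the exon is coding over its whole length; the function name next_exon_phase is hypothetical and not part of the Ensembl API:

```python
def next_exon_phase(phase, coding_length):
    """Ensembl-style phase of the exon that follows a fully coding exon.

    phase is the number of bases of this exon's first codon that lie on
    the previous exon; coding_length is the number of bases in this exon.
    """
    if phase not in (0, 1, 2):
        raise ValueError("expected a coding phase of 0, 1 or 2")
    # Bases left over after the last complete codon spill into the next exon.
    return (phase + coding_length) % 3
```

For example, a 13-base exon with phase 0 ends with one base of an unfinished codon, so next_exon_phase(0, 13) gives 1, matching the "Phase 1" row of the diagram.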


In GFF format, the 8th column gives phase information for CDS features.

The definition of phases is here:

For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon. In other words, a phase of "0" indicates that the next codon begins at the first base of the region described by the current line, a phase of "1" indicates that the next codon begins at the second base of this region, and a phase of "2" indicates that the codon begins at the third base of this region. This is NOT to be confused with the frame, which is simply start modulo 3. If there is no phase, put a "." (a period) in this field.

For forward strand features, phase is counted from the start field. For reverse strand features, phase is counted from the end field.
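To make the GFF definition concrete, here is a small Python sketch that skips the number of bases given by the phase and then reads complete codons. It assumes the sequence is already in coding orientation (reverse-complemented for reverse-strand features); the function name cds_codons is made up for this example.

```python
def cds_codons(sequence, phase):
    """Split a CDS segment into complete codons, skipping `phase` leading bases.

    `sequence` must be in coding orientation. Trailing bases that do not
    form a complete codon are ignored.
    """
    if phase not in (0, 1, 2):
        raise ValueError("GFF phase must be 0, 1 or 2")
    trimmed = sequence[phase:]
    usable = len(trimmed) - len(trimmed) % 3
    return [trimmed[i:i + 3] for i in range(0, usable, 3)]
```

With phase 1, cds_codons("GATGGCCTAA", 1) drops the leading G and returns the codons ATG, GCC and TAA.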


In effect, you can usually translate the phase from Ensembl to GFF style like this:

0 to 0

1 to 2, the first two bases of this exon complete the previous exon's last codon

2 to 1, the first base of this exon completes the previous exon's last codon
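The mapping above is the same as computing (3 - phase) mod 3. A minimal Python sketch follows; the function name is hypothetical, and mapping the Ensembl value -1 (non-coding) to the GFF "." placeholder is an assumption made for this example.

```python
def ensembl_to_gff_phase(ensembl_phase):
    """Convert an Ensembl exon phase to the value of the GFF phase column."""
    if ensembl_phase == -1:
        # Assumption: a non-coding exon has no phase, written as "." in GFF.
        return "."
    if ensembl_phase not in (0, 1, 2):
        raise ValueError("Ensembl phase must be -1, 0, 1 or 2")
    # Ensembl counts the bases of the split codon on the previous exon;
    # GFF counts the bases to skip on this one, so the two add up to 0 or 3.
    return str((3 - ensembl_phase) % 3)
```

The result is returned as a string so that it can be written directly into the 8th column of a GFF line.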

The DAS protocol defines the phase in the same way as the GFF format:

The tag indicates the position of the feature relative to open reading frame, if any. It may be one of the integers 0, 1 or 2, corresponding to each of the three reading frames, or - if the feature is unrelated to a reading frame.


[More information on different formats]