Categories: "EnsEMBL" or "GeneBuild"

Ensembl Core Database Schema Diagram

November 26th, 2010
To understand the concept of Ensembl and learn how to query the tables, I find it extremely useful to have a schema diagram of the database in front of me. This can be generated by using the schema.sql and foreign_keys.sql files from the sql directory… more »
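
As a rough illustration of how the foreign key file can feed a diagram, here is a minimal Python sketch that turns foreign key statements into Graphviz DOT edges. It assumes the statements have the shape `ALTER TABLE a ADD FOREIGN KEY (col) REFERENCES b(col);`. The function name `foreign_keys_to_dot` and the two sample statements are made up for illustration and are not taken from the actual file.

```python
import re

# Assumed statement shape; the real foreign_keys.sql may differ in detail.
FK_PATTERN = re.compile(
    r"ALTER\s+TABLE\s+(\w+)\s+ADD\s+FOREIGN\s+KEY\s*\((\w+)\)\s*"
    r"REFERENCES\s+(\w+)\s*\((\w+)\)",
    re.IGNORECASE,
)


def foreign_keys_to_dot(sql_text):
    """Return a Graphviz DOT graph with one edge per foreign key found."""
    lines = ["digraph schema {"]
    for table, column, ref_table, _ref_column in FK_PATTERN.findall(sql_text):
        # Edge points from the referencing table to the referenced table.
        lines.append(f'  "{table}" -> "{ref_table}" [label="{column}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample input, written in the assumed statement shape.
sample = """
ALTER TABLE exon_transcript ADD FOREIGN KEY (exon_id) REFERENCES exon(exon_id);
ALTER TABLE transcript ADD FOREIGN KEY (gene_id) REFERENCES gene(gene_id);
"""

if __name__ == "__main__":
    print(foreign_keys_to_dot(sample))
```

The resulting text can be saved to a file and rendered with any Graphviz-compatible tool. Columns and data types from schema.sql are left out here to keep the sketch short.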

Repeat Finding and Masking

October 13th, 2010
What are genomic interspersed repeats? [from the RepeatMasker documentation] In the mid-1960s scientists discovered that many genomes contain stretches of highly repetitive DNA sequences (see Reassociation Kinetics Experiments, and C-Value Paradox). Thes… more »

Caching in ENSEMBL

November 11th, 2009
How to avoid falling in the cache... Caching is a powerful way to speed up queries to the Ensembl database. It can become problematic, however, for example if you repeat a query multiple times but have updated the data set in between. It is importa… more »


April 21st, 2008
The Consensus CoDing Sequence (CCDS) project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long term goal is to support convergence towards a standard… more »

Coding Phases / Frames

April 17th, 2008
The phase (sometimes called the frame) gives information on how to translate the individual parts of a gene, the coding exons. Phases 1 & 2 have a different definition in GFF and EnsEMBL format! In EnsEMBL, the phase is defined for exon objects like this:… more »
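
To make the difference between the two conventions concrete, here is a small Python sketch of the conversion. It assumes the commonly documented definitions: in EnsEMBL, the phase is the position within the codon of the first base of the exon (0 = first base, 1 = second, 2 = third, -1 = non-coding), while in GFF it is the number of bases to skip at the start of the feature to reach the first base of the next codon. The function names are made up for illustration and are not part of the EnsEMBL API.

```python
def ensembl_to_gff_phase(ensembl_phase):
    """Convert an EnsEMBL exon phase to a GFF phase.

    Returns None for non-coding exons (EnsEMBL phase -1), which GFF
    writes as '.' in the phase column.
    """
    if ensembl_phase == -1:
        return None
    if ensembl_phase not in (0, 1, 2):
        raise ValueError(f"unexpected EnsEMBL phase: {ensembl_phase}")
    # If the exon starts at codon position p, then (3 - p) % 3 bases
    # must be skipped to reach the start of the next complete codon.
    return (3 - ensembl_phase) % 3


def gff_to_ensembl_phase(gff_phase):
    """Convert a GFF phase back to an EnsEMBL phase (None becomes -1)."""
    if gff_phase is None:
        return -1
    if gff_phase not in (0, 1, 2):
        raise ValueError(f"unexpected GFF phase: {gff_phase}")
    return (3 - gff_phase) % 3


if __name__ == "__main__":
    for phase in (-1, 0, 1, 2):
        print(phase, "->", ensembl_to_gff_phase(phase))
```

Under these assumptions phase 0 is the same in both formats, while 1 and 2 swap, which is exactly where mistakes tend to creep in when converting annotation between the two.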