Ruby Sorting

May 9th, 2012

Sorting (elements in an array) is a very common task in many scripts. A lot of research has gone into finding the most efficient way to sort.
In Ruby the "sort" method performs a standard comparison according to the data type of the elements, but as in most other languages you can define your own ordering.

   open_orders.sort

is equivalent to

   open_orders.sort { |x, y| x <=> y }

The sort algorithm expects the comparison block to return a value according to the following logic (the same as the <=> operator):

    return -1 if x < y
    return  0 if x == y
    return  1 if x > y
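
As a minimal sketch of this contract (the array and variable names below are only illustrative), the <=> operator can be called directly, and swapping its operands inside the block reverses the sort order:

```ruby
# <=> returns -1, 0 or 1 for comparable values and nil otherwise
less    = 1 <=> 2      # -1
equal   = 2 <=> 2      #  0
greater = 3 <=> 2      #  1
mixed   = 'a' <=> 1    # nil, strings and integers are not comparable

numbers    = [3, 1, 2]                          # illustrative sample data
ascending  = numbers.sort { |x, y| x <=> y }    # same result as numbers.sort
descending = numbers.sort { |x, y| y <=> x }    # operands swapped
```

A block that returns nil (as in the mixed case) makes sort raise an ArgumentError, so a custom comparison should always return an integer.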

Using this logic I can define a custom function to compare the elements that need sorting and then call it from the sort block. In my simple example I need to sort order numbers by two criteria: by a string first ("UK" before "ORD") and then by ascending number.

Code

def custom_order_sorting(x_ord, y_ord)
    if x_ord.match('UK') && y_ord.match('ORD')
       # use UK first
       return -1
    elsif x_ord.match('ORD') && y_ord.match('UK')
       # use UK first
       return 1
    else
      # use smaller number first; convert to integers so that
      # 9 sorts before 10 (as strings, "10" would sort before "9")
      x_num = x_ord.match(/(\d+)$/)[1].to_i
      y_num = y_ord.match(/(\d+)$/)[1].to_i
      return x_num <=> y_num
    end
end

open_orders.sort! { |x, y| custom_order_sorting(x, y) }
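
An alternative way to express the same two criteria is sort_by with an array as the sort key, since Ruby compares arrays element by element. This is only a sketch: the sample order numbers and the helper name order_sort_key are made up for illustration, and it assumes every order number contains either "UK" or "ORD" and ends in digits.

```ruby
# Hypothetical sample data
open_orders = ['ORD12', 'UK7', 'ORD3', 'UK10']

# Builds a two-part key: prefix rank first (UK before everything else),
# then the trailing number as an integer.
def order_sort_key(order_id)
  prefix_rank = order_id.include?('UK') ? 0 : 1
  number = order_id[/(\d+)$/, 1].to_i
  [prefix_rank, number]
end

sorted_orders = open_orders.sort_by { |id| order_sort_key(id) }
# sorted_orders is ["UK7", "UK10", "ORD3", "ORD12"]
```

sort_by computes each key only once per element, which can help when extracting the key is expensive.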

Source: stackoverflow.com

Genometastasis

May 9th, 2012

The hypothesis of genometastasis was suggested by García-Olmo et al. more than a decade ago (1) and states (simplified) that normal cells can be turned into cancer cells through contact with (dying) cancer cells. In particular, "metastases might develop as a result of transfection of susceptible cells in distant target organs with dominant oncogenes that circulate in the plasma and are derived from the primary tumor." It can therefore be considered a form of horizontal gene / DNA transfer. The uptake of the genomic material was explained through apoptotic bodies from cancer cells, as described by Holmgren et al. (2). The underlying ideas had in fact already been described a century ago (6,7).
An alternative could be the involvement of a virus as a transmitter, as described by zur Hausen (8).

In a later study (3), the same group showed that plasma from colorectal cancer patients can oncogenically transform cultured cells (fig. 1):

[Figure 1: Genometastasis]

Further research by the group was published recently (4), describing the transformation of cells cultured from healthy individuals through particles from cultured colon cancer cells. Goldenberg et al. (5) were able to stably transform cells across species through cell fusion, resulting in hamster cells that express human oncogenes.

The evidence for horizontal gene transfer, in particular that cancer cells, dying parts of cells or even cell-free cancer DNA can induce malignancy, is worrying. It is probably only possible under very specific conditions and with certain (aggressive) cancer types, but it is certainly an interesting research area to watch. If confirmed, it could have dramatic effects on treatment strategies and could open up new methodological possibilities for molecular research.

References:

  1. García-Olmo D, et al. (1999) Tumor DNA circulating in the plasma might play a role in metastasis. The hypothesis of the genometastasis. Histol Histopathol. 14(4):1159-64.
  2. Holmgren L, et al. (1999) Horizontal transfer of DNA by the uptake of apoptotic bodies. Blood. 93:3956-3963.
  3. García-Olmo D, García-Olmo DC (2001) Functionality of circulating DNA: the hypothesis of genometastasis. Ann N Y Acad Sci. 945:265-75.
  4. García-Olmo D, et al. (2010) Cell-Free Nucleic Acids Circulating in the Plasma of Colorectal Cancer Patients Induce the Oncogenic Transformation of Susceptible Cultured Cells. Cancer Res. 70(2):560-7.
  5. Goldenberg DM, et al. (2011) Horizontal transmission and retention of malignancy, as well as functional human genes, after spontaneous fusion of human glioblastoma and hamster host cells in vivo. International Journal of Cancer 131,1.
  6. Goldenberg DM (1968) Über die Progression der Malignität: Eine Hypothese [On the progression of malignancy: A hypothesis]. Klin Wochenschr. 46:898-99.
  7. Aichel O (1911) Über Zellverschmelzung mit qualitativ abnormer Chromosomenverteilung als Ursache der Geschwulstbildung [On cell fusion with qualitatively abnormal chromosome distribution as the cause of tumor formation]. In: Roux W, ed. Vorträge und Aufsätze über Entwicklungsmechanik der Organismen, Vol. 13.
  8. zur Hausen H (2000) Papillomaviruses Causing Cancer: Evasion From Host-Cell Control in Early Events in Carcinogenesis. J Natl Cancer Inst. 92(9).

Uniparental Disomy

May 4th, 2012

When a cell contains two copies of a chromosome, or of part of a chromosome, from one parent and no copy from the other parent, this is called uniparental disomy (UPD). While all DNA information is present, the development of the cell (and the organism) is hindered because of missing / wrong epigenetic markers. The basic mechanism by which this faulty distribution of chromosomes can occur is shown in fig. 1.

[Figure 1: Uniparental Disomy]

Sources:

  • Wikipedia
  • Eggermann and Kotzot (2010) Uniparental disomy, Onset mechanisms and their relevance in clinical genetics [German], Medizinische Genetik

Version Control with Perforce on the Command-line

April 19th, 2012

Besides the visual client, the version control system Perforce can be operated through the command line (Unix prompt or Windows DOS window) and can therefore be controlled from other programs such as MATLAB:

[status, result] = dos(p4command);
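
The same pattern works from other scripting languages. As a rough sketch in Ruby (the helper names run_command and p4_argv are made up for this example, and it assumes the p4 executable is on the PATH), the standard Open3 library returns the output and exit status much like the MATLAB call above:

```ruby
require 'open3'

# Runs an external command and returns [exit_status, combined_output],
# mirroring the [status, result] pair of the MATLAB call above.
def run_command(*argv)
  output, status = Open3.capture2e(*argv)
  [status.exitstatus, output]
end

# Hypothetical helper: builds the argument list for a p4 call.
def p4_argv(*args)
  ['p4', *args]
end

# Example call (needs p4 on the PATH and a configured workspace):
# status, result = run_command(*p4_argv('edit', 'filename.txt'))
```

Passing the arguments as a list instead of one command string avoids shell quoting problems with file names that contain spaces.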

A reference manual is available; here are a few hints.
Check the environment settings:

p4 set
  P4CHARSET=winansi
  P4CLIENT=try1 (set)
  P4EDITOR=C:\Windows\SysWOW64\notepad.exe (set)
  P4PORT=perforce:1666
  P4USER=Felix_Kokocinski

and edit if necessary with

set P4CHARSET=winansi

P4EDITOR is optional; P4CLIENT is the checkout / workspace name.
The settings can also be set permanently in the visual client under
Edit / Preferences / Connection / Change Settings
If these are wrong you will get messages like "file(s) not on client".

Most common commands:
synchronize repository:

p4 sync

check out file:

p4 edit filename.txt
  or
p4 edit //depot/path/in/perforce/filename.txt

submit changes:

p4 submit -d "description of changes" filename.txt

revert to version in repository:

p4 revert filename.txt

add new file:

p4 add filename.txt

get help:

p4 help

Here are some useful one-liners for various tasks.

OMIM Symbols

April 16th, 2012

Online Mendelian Inheritance in Man (OMIM) is a manually reviewed catalog of human genes and regions involved in genetic disorders and traits. Each entry has a name and a number, e.g. "#154780 MARSHALL SYNDROME". According to the OMIM FAQs, these are the meanings of the symbols preceding a MIM number:

  1. An asterisk (*) before an entry number indicates a gene.
  2. A number symbol (#) before an entry number indicates that it is a descriptive entry, usually of a phenotype, and does not represent a unique locus. The reason for the use of the number symbol is given in the first paragraph of the entry. Discussion of any gene(s) related to the phenotype resides in another entry(ies) as described in the first paragraph.
  3. A plus sign (+) before an entry number indicates that the entry contains the description of a gene of known sequence and a phenotype.
  4. A percent sign (%) before an entry number indicates that the entry describes a confirmed mendelian phenotype or phenotypic locus for which the underlying molecular basis is not known.
  5. No symbol before an entry number generally indicates a description of a phenotype for which the mendelian basis, although suspected, has not been clearly established or that the separateness of this phenotype from that in another entry is unclear.
  6. A caret (^) before an entry number means the entry no longer exists because it was removed from the database or moved to another entry as indicated.
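
When processing entry labels in a script, the list above can be kept as a small lookup table. The following Ruby sketch is illustrative only: the constant MIM_PREFIXES, the function parse_mim_entry and the shortened meanings are my own, not part of OMIM or any API.

```ruby
# Meanings shortened from the list above; nil stands for "no symbol".
MIM_PREFIXES = {
  '*' => 'gene',
  '#' => 'descriptive entry (usually a phenotype), not a unique locus',
  '+' => 'gene of known sequence and a phenotype',
  '%' => 'confirmed mendelian phenotype or locus, molecular basis unknown',
  '^' => 'entry removed or moved to another entry',
  nil => 'phenotype with suspected but not established mendelian basis'
}.freeze

# Splits a label such as "#154780 MARSHALL SYNDROME" into its parts.
# Returns nil if the label does not look like "[symbol]number name".
def parse_mim_entry(label)
  m = label.match(/\A([*#+%^])?(\d+)\s+(.+)\z/)
  return nil unless m
  { prefix: m[1], number: m[2].to_i, name: m[3], meaning: MIM_PREFIXES[m[1]] }
end

entry = parse_mim_entry('#154780 MARSHALL SYNDROME')
```

Here entry[:prefix] is "#" and entry[:number] is 154780; a label without a leading symbol yields a nil prefix and the "no symbol" meaning.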

To fetch a non-redundant list of OMIM annotations through the Ensembl Perl API you can look at the external references (xrefs/dblinks):

Code

# $gene is assumed to be a Bio::EnsEMBL::Gene object fetched earlier
my $att = "MIM_GENE";
# or: my $att = "MIM_MORBID";
my $attribs = $gene->get_all_DBLinks($att);
my (%ids, %descriptions);
foreach my $attrib (@{ $attribs }){
  # keep only the first xref seen for each primary ID
  if (not exists $ids{$attrib->primary_id}){
    $ids{$attrib->primary_id} = $attrib->display_id;
    $descriptions{$attrib->description} = $attrib->display_id;
  }
}

Ref:
OMIM publication, http://omim.org/