Simple Website monitor

November 2nd, 2012

There are many sophisticated services and scripts to monitor the accessibility of your website or various aspects of your web server. Check Stack Overflow and look at Monastic for examples. Here is a very simple solution I needed to monitor the availability of a specific server using its IP address within the internal network. It was necessary after the server's IP address, which is used in third-party software to provide specific services, was "stolen" by other machines. In this case the DHCP server assigned the IP address that should have been reserved to mobile devices that connected to the wireless network.

This approach simply fetches a page from a specific URL using the "reserved" IP and looks for a word/pattern you know should be there. The script runs on a second machine (host name "ubuntu64"), an Ubuntu VM. (It does not use any of the additional security measures you will want if you expose the machine externally.)

Prepare the second machine to send notification emails:
Install sendmail, sendemail, mailutils, sensible-mda (to have the whole set).
Add/modify entry in /etc/hosts: ubuntu64

Run "sudo sendmailconfig".
Test with:


sendemail -q -f -t -u "mailtest" -m "mail works!"

Write a bash script to fetch and check the page and send alert emails:


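# define address and pattern to expect (placeholders, set your own values)
address="http://RESERVED_IP/"
searchword="PATTERN"
# define alert email (placeholders, set your own addresses)
sender="SENDER_ADDRESS"
receiver="RECEIVER_ADDRESS"
body="system on machine at risk"
subj="Important server unresponsive"
# fetch page and count the lines that contain the pattern
resp=$(wget -q -O - "$address" | grep -c "$searchword")
# no match (or no response at all) triggers the alert
if [ "$resp" -lt 1 ]; then
  sendemail -q -f "$sender" -t "$receiver" -u "$subj" -m "$body"
fi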

Add a crontab entry to automatically run this script every 10 minutes:


*/10 * * * * sh /home/user/

Additional improvements could include stopping the alerts after a specific number of notifications or checking the response time.
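
The first of these improvements could be sketched as follows. This is a minimal sketch and not part of the original script: the function name should_alert, the counter file STATE_FILE and the limit MAX_ALERTS are assumptions made for illustration. It counts consecutive failed checks in a small file and only allows an alert for the first few of them; a successful check resets the counter.

```sh
# Hypothetical settings, not part of the original script.
STATE_FILE="${STATE_FILE:-/tmp/webmonitor.failcount}"
MAX_ALERTS="${MAX_ALERTS:-3}"

# Record the result of one check ("ok" or "fail").
# Returns 0 (send an alert) only for the first MAX_ALERTS consecutive failures.
should_alert() {
  status=$1
  if [ "$status" = "ok" ]; then
    rm -f "$STATE_FILE"          # server is back: reset the counter
    return 1
  fi
  count=0
  if [ -f "$STATE_FILE" ]; then
    count=$(cat "$STATE_FILE")
  fi
  count=$((count + 1))
  echo "$count" > "$STATE_FILE"
  [ "$count" -le "$MAX_ALERTS" ]
}
```

In the monitoring script the alert branch would then call "should_alert fail" before sendemail, and a successful check would call "should_alert ok || true" to reset the counter.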

Alternatively, you can just look up the MAC address associated with the "reserved" IP, compare it to the known physical address of your server, and wrap this up into a little script:

>arp -a

Interface: --- 0xb
  Internet Address      Physical Address      Type
                        00-11-18-2c-2e-6d     dynamic
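
The comparison could be wrapped up roughly like this. It is a minimal sketch under assumptions: the function names normalise_mac and check_mac are hypothetical, the ARP table is read as text from standard input (for example piped from arp -a), and 192.0.2.10 in the usage line is a placeholder address, not the real one.

```sh
# Lower-case a MAC address and use colons as separators,
# so that 00-11-18-2C-2E-6D and 00:11:18:2c:2e:6d compare equal.
normalise_mac() {
  printf '%s' "$1" | tr 'A-F' 'a-f' | sed 's/-/:/g'
}

# Read ARP table text on stdin and check the entry for IP $1 against MAC $2.
# Prints "ok", "mismatch <mac found>" or "missing".
check_mac() {
  ip=$1
  expected=$(normalise_mac "$2")
  # pick the first line that has the IP as a whole token, bare or in parentheses
  line=$(awk -v ip="$ip" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == ip || $i == "(" ip ")") { print; exit }
      }
    }')
  found=$(printf '%s\n' "$line" | grep -o -i -E '([0-9a-f]{2}[:-]){5}[0-9a-f]{2}' | head -n 1)
  if [ -z "$found" ]; then
    echo "missing"
    return 2
  fi
  found=$(normalise_mac "$found")
  if [ "$found" = "$expected" ]; then
    echo "ok"
  else
    echo "mismatch $found"
    return 1
  fi
}
```

A call such as "arp -a | check_mac 192.0.2.10 00-11-18-2c-2e-6d" could then run from cron in the same way as the page check, sending an alert whenever the result is not "ok".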

ENCODE publication interview

October 1st, 2012

Following on from the publication of the main papers of the ENCODE (Encyclopedia Of DNA Elements) scale-up phase, I gave an interview to BlueGnome's marketing team for the Newstrack customer newsletter in 2012.

These are my personal opinions, not my employer's (past or present). They might be of interest to researchers considering joining a large-scale project like this.

Q. What was it like to be part of the ENCODE project?
It was a great experience to work on a project of this scale with more than 400 scientists from 32 groups spread across the globe. Many of them are the leaders in their field, but at consortium meetings and the many phone conferences everyone could contribute. The amount of data and different technologies was overwhelming at times, so I think it’s an impressive achievement how this project was run and that the findings have now been published.

Q. What are the main outcomes of the project?
There has been a very lively discussion about the outcome and how it was presented. In my opinion, the most important result is the data itself. ENCODE has created an enormous repository of measurements across the human genome that has been compiled in a systematic and standardised way. The data will be the basis of future research trying to understand genomic processes involved in basic cellular processes as well as in various diseases.
ENCODE has pushed the development of standards and new applications to interrogate the genome, in particular using sequencing technologies.
The results also remind us that there is a lot of activity in the genome that we currently do not fully understand. Up to 80% of the human genome is biochemically active, there are thousands of additional (non-coding) genes in introns and in the intergenic space, and up to 75% of the genome is transcribed at some point. These observations paint a very dynamic genomic landscape, with overlapping active zones and signals of different complexity, indicating that we have to keep the concept of genes and genome regulation pretty flexible in our minds.

Q. What are potential implications for BlueGnome and its customers?
I’m afraid the interpretation of CNV regions is getting even more complex as regulatory regions far away from the actual disease genes might be relevant for cases the clinical customers might come across. This is especially true for the interpretation of cancer profiles – which is highly complex already. We won’t be able to use these new interconnections directly in most cases, but we are looking through the data and have started to incorporate the knowledge by providing new genome-wide annotation data sets as optional BED files on the BlueGnome website, e.g. with GWAS results and regulatory element locations.

Q. Where do you see the human genome in 5 years’ time?
ENCODE is entering its next phase now to extend the catalogue to many additional cell lines as well as the mouse genome. With the recent publications scientists around the world are now more aware of this data and how to use it, so my hope is that we will see an acceleration in algorithm development, data mining and scientific findings. In 5 years we still won’t understand the genome entirely, but we should have a complete parts list and more connections between the parts. Some of these will be clinically relevant to allow progress in understanding and fighting today’s ‘big killers’ like certain types of cancer.

Q. Would you personally be interested in having your genome sequenced?
As a data exploration exercise I would find this really interesting, but the definitive answers you can get from it are still limited today. I would certainly want to make sure this data is kept private and under my control. With BlueGnome now being part of Illumina we can actually help to develop these ideas further.

Further information: Nature's Encode portal, "An integrated encyclopedia of DNA elements in the human genome" publication, Guardian Interview with Ewan Birney

SAM format summary

August 30th, 2012

The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences. It is a text format that stores sequence data in a series of tab-delimited ASCII columns and is commonly used in next-generation sequencing data processing. It is the (non-binary) human-readable version of the BAM format and contains information about the read and the aligned position in the genome. It was developed by Heng Li and others in Richard Durbin's group; their paper is here.

After a header section the alignment section describes all results of the aligned read data. The format is best explained with an example line:


1:497:R:-272+13M17D24M  113  1  497  37  37M  15  100338662  0  CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG  0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>  XT:A:U  NM:i:0  SM:i:37  AM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:37
Field name	Description	Example data
QNAME	read name	1:497:R:-272+13M17D24M
FLAG	alignment flag	113
RNAME	alignment chromosome	1
POS	alignment start position	497
MAPQ	overall mapping quality	37
CIGAR	alignment CIGAR string	37M
MRNM/RNEXT	name of next alignment in group (mate)	15
MPOS/PNEXT	position of next alignment in group (mate)	100338662
ISIZE/TLEN	observed template length	0
SEQ	read sequence	CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG
QUAL	quality per base	0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>
TAGs	further tags with alignment info	XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37
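
The FLAG value packs several yes/no properties of the alignment into one number, one bit each. A small sketch of decoding it, assuming the bit meanings given in the SAM specification; the function name decode_flag and the short labels are made up for this example.

```sh
# Print the properties encoded in a SAM FLAG value as a comma-separated list.
# The labels are listed in bit order, starting with the lowest bit (0x1).
decode_flag() {
  flag=$1
  out=""
  i=0
  for name in paired proper_pair unmapped mate_unmapped reverse mate_reverse \
              first_in_pair second_in_pair secondary qc_fail duplicate supplementary; do
    # test bit i of the flag
    if [ $(( (flag >> i) & 1 )) -eq 1 ]; then
      out="${out:+$out,}$name"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

For the example line, "decode_flag 113" prints "paired,reverse,mate_reverse,first_in_pair", since 113 = 1 + 16 + 32 + 64.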

The tags are optional and vary between alignment programs; the examples shown are from BWA. The tags usually most important for filtering are X0:i (number of best genome alignments of this read) and XM:i (number of mismatches in the alignment).

       Tag	Meaning
       NM	Edit distance
       MD	Mismatching positions/bases
       AS	Alignment score
       BC	Barcode sequence
       X0	Number of best hits
       X1	Number of suboptimal hits found by BWA
       XN	Number of ambiguous bases in the reference
       XM	Number of mismatches in the alignment
       XO	Number of gap opens
       XG	Number of gap extensions
       XT	Type: Unique/Repeat/N/Mate-sw
       XA	Alternative hits; format: (chr,pos,CIGAR,NM;)*
       XS	Suboptimal alignment score
       XF	Support from forward/reverse alignment
       XE	Number of supporting seeds

Read names (at least from Illumina machines) are constructed as:

[instrument-name]:[run ID]:[flowcell ID]:[lane-number]:[tile-number]:[x-pos]:[y-pos] [read number]:[is filtered]:[control number]:[barcode sequence]


@M01117:25:000000000-A37B9:1:1101:14984:1386 1:N:0:4
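
Splitting such a read name into its parts could be sketched like this. The function name parse_read_name and the short key names are assumptions for the example; the field order follows the layout given above.

```sh
# Split an Illumina read name into key=value lines.
# A leading "@" (as found in FASTQ headers) is removed first.
parse_read_name() {
  printf '%s\n' "${1#@}" | awk -F '[: ]' '{
    n = split("instrument run flowcell lane tile x y read filtered control barcode", key, " ")
    for (i = 1; i <= n; i++) printf "%s=%s\n", key[i], $i
  }'
}
```

Called with the example name above, the third line of the output is "flowcell=000000000-A37B9" and the last line is "barcode=4".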

genome.sph.umich.edu with further useful details, full specs.

Male infertility genetics

August 17th, 2012

10-15% of couples in the western world are faced with some kind of infertility issue; in almost half of the cases there are (co-)factors on the male side.
Male infertility factors are often based on sperm abnormalities which can be categorized into:

  • Azoospermic: no sperm in the semen
  • Oligozoospermic: a low sperm count
  • Asthenozoospermic: poor sperm motility
  • Teratozoospermic: abnormal sperm morphology

The genetic region responsible for spermatogenesis and most of these abnormalities is the azoospermia factor (AZF) region on Yq11. It contains the sub-regions AZFa, AZFb and AZFc. Microdeletions in these regions are responsible for many genetic causes of male infertility. Alterations in the AZFc region (which contains the genes PRY2, BPY2, DAZ and CDY1) are believed to be the most frequent molecularly defined cause of spermatogenic failure. This is caused by high genomic variability; in fact, AZFc is one of the most genetically dynamic regions in the human genome. This property may serve as a counter against the genetic degeneracy associated with the lack of a meiotic partner, meaning that no exchange of genetic material with a counterpart chromosomal region from the mother can happen.
Intracytoplasmic sperm injection (ICSI) can result in pregnancies, but passes on the genetic infertility to any sons born.

It has been reported that the average sperm count for men in the western world has declined by up to 50% in the past 50 years. These findings are not conclusive, however, as different studies found different trends around the world. It seems clear, however, that exposure to chemical compounds in our environment will influence the hormone balance, have an adverse effect on male fertility and promote diseases like testicular cancer.

Sources: Page et al. (1999), Navarro-Costa et al. (2010).

Display today's date with JavaScript

August 10th, 2012

To display the current date, day of the week and time on a web page, you don't want to refresh the entire page every second or minute. Instead you will want to use JavaScript to dynamically update just this date/clock display element. Here is the code for a display in the format

Friday, 10.8.2012    9:41:49


<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
function startTime(){
  var today=new Date();
  var h=today.getHours();
  var m=today.getMinutes();
  var s=today.getSeconds();
  var month = today.getMonth() + 1;   // getMonth() counts from 0
  var day = today.getDate();
  var myDays= ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"];
  var weekday = today.getDay();       // 0 = Sunday ... 6 = Saturday
  var wday = myDays[weekday];
  var year = today.getFullYear();
  // add a zero in front of numbers<10
  m=checkTime(m);
  s=checkTime(s);
  document.getElementById('txt').innerHTML=wday + ", " + day + "." + month + "." + year + "&nbsp;&nbsp;&nbsp;&nbsp;" + h+":"+m+":"+s;
  // update the display again shortly (the 500 ms interval is an assumption)
  setTimeout(startTime, 500);
}
function checkTime(i){
  if (i<10){
    i="0" + i;
  }
  return i;
}
</script>
</head>
<body onload="startTime()">
<div id="txt"></div>
</body>
</html>