Linux Firewall

July 4th, 2013

To block a specific IP address from network access to your (Ubuntu Linux) system, you can add it to your firewall settings:
sudo iptables -A INPUT -s <IP-address> -j DROP
To remove this entry:
sudo iptables -D INPUT -s <IP-address> -j DROP
To just list current firewall rules:
sudo iptables -L
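
As a cautious sketch, the blocking command can be wrapped in a small helper that checks the address first and only prints the resulting command, so nothing changes until you run the printed line yourself. The function names is_ipv4 and block_cmd are made up for this example, and 192.0.2.10 is a documentation-only address, not a real offender.

```shell
# is_ipv4 and block_cmd are hypothetical helper names, not part of iptables
is_ipv4() {
  # accept exactly four dot-separated numbers, each between 0 and 255
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

block_cmd() {
  # print (do not run) the DROP rule for one source address
  is_ipv4 "$1" || { echo "not an IPv4 address: $1" >&2; return 1; }
  echo "sudo iptables -A INPUT -s $1 -j DROP"
}

block_cmd 192.0.2.10   # example address from a documentation range
```

Printing instead of executing keeps a typo from locking you out of a remote machine.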


GC content of human chromosomes

April 16th, 2013

The GC content is the fraction of guanine and cytosine bases among all bases in a stretch of DNA. The human genome is a mosaic of GC-rich and GC-poor regions, each around 300 kb long, called isochores. GC content is an important factor in many experiments and bioinformatic analyses. This is especially true for next-generation sequencing, where the DNA being sequenced has gone through multiple rounds of PCR amplification.

Average GC content per chromosome:

1   0.417439
2   0.402438
3   0.396943
4   0.382479
5   0.395163
6   0.396109
7   0.407513
8   0.401757
9   0.413168
10  0.415849
11  0.415657
12  0.40812
13  0.385265
14  0.408872
15  0.42201
16  0.447894
17  0.455405
18  0.39785
19  0.483603
20  0.441257
21  0.408325
22  0.479881
X   0.394963
Y   0.391288
MT  0.443626

A common way to reduce the GC bias in data analysis is to:

  1. calculate the GC ratio (number of G/C bases / number of bases) in the region of interest (ROI) being measured
  2. find the average measured value (a) across all regions of the genome with this ratio
  3. normalize the value measured in the ROI (m) with this average: m/a
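
The three steps above can be sketched with awk. This is only an illustration under my own assumptions: the input is a tab-separated file with region name, GC ratio and measured value, regions count as having "this ratio" when their GC ratio rounds to the same two decimals, and the function name gc_normalize and the sample numbers are made up.

```shell
# made-up sample data: region name, GC ratio, measured value (e.g. a read count)
regions=$(mktemp)
printf 'r1\t0.40\t100\nr2\t0.40\t300\nr3\t0.60\t50\nr4\t0.60\t150\n' > "$regions"

gc_normalize() {
  # The file is read twice:
  #   pass 1 sums and counts the measured values per GC bin (ratio rounded to 2 decimals)
  #   pass 2 divides each region's value m by the mean a of its bin
  awk -F'\t' '
    { bin = sprintf("%.2f", $2) }
    NR == FNR { sum[bin] += $3; n[bin]++; next }
    { a = sum[bin] / n[bin]; if (a > 0) printf "%s\t%.3f\n", $1, $3 / a }
  ' "$1" "$1"
}

gc_normalize "$regions"
# r1  0.500
# r2  1.500
# r3  0.500
# r4  1.500
```

Regions in a bin whose mean is zero are skipped to avoid dividing by zero. The bin width is a judgment call: narrower bins follow the bias curve more closely but need more regions per bin.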

More details on the GC bias in next-gen sequencing are described by Benjamini and Speed: "The bias is not consistent between samples; and there is no consensus as to the best methods to remove it in a single sample. (...) It is the GC content of the full DNA fragment, not only the sequenced read, that most influences fragment count. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are underrepresented in the sequencing results. This empirical evidence strengthens the hypothesis that PCR is the most important cause of the GC bias."

Correcting the bias can follow a "read model", "fragment model" or a "global model".

Sources: PubMed, PubMed

See also: Chromosome length table

Windows Task Scheduler Error

April 16th, 2013

A scheduled task on Microsoft Windows Server 2008 that had been running without problems suddenly failed "due to a time trigger condition", with an error message that included "Data: Error Value 2147943726."
The reason was that the network-wide password of the user account assigned to run the task had been changed after the task was set up.
Re-opening the task properties (double-click the task in the "Active Tasks" list and select "Options" from the right-hand menu) and saving with the new password fixed the problem.
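
The error value itself points to the cause once it is decoded. Written in hexadecimal it is 0x8007052E, and the low 16 bits give Win32 error 1326 (ERROR_LOGON_FAILURE: wrong user name or password), which fits the changed password. A minimal sketch of the arithmetic in the shell (the variable name code is just for illustration):

```shell
code=2147943726

# the full value in hexadecimal
printf '0x%X\n' "$code"        # prints 0x8007052E

# the low 16 bits hold the underlying Win32 error code
echo $(( code & 0xFFFF ))      # prints 1326
```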

Chromosome lengths

March 20th, 2013

Here is a quick list of the sizes of human chromosomes in assembly GRCh37 as defined by Ensembl:

chrom	 length [bp]
 1	 249,250,621 
 2	 243,199,373 
 3	 198,022,430 
 4	 191,154,276 
 5	 180,915,260 
 6	 171,115,067 
 7	 159,138,663 
 8	 146,364,022 
 9	 141,213,431 
10	 135,534,747 
11	 135,006,516 
12	 133,851,895 
13	 115,169,878 
14	 107,349,540 
15	 102,531,392 
16	  90,354,753 
17	  81,195,210 
18	  78,077,248 
19	  59,128,983 
20	  63,025,520 
21	  48,129,895 
22	  51,304,566 
X	 155,270,560 
Y	  59,373,566 
Mt	      16,569

These sizes are useful for calculations of percent coverage of genomic features or sequencing reads.
They are often required when working with BED files.
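
As a rough sketch of such a coverage calculation, the following sums the interval lengths of a BED file per chromosome and divides by the chromosome length. It assumes tab-separated files, non-overlapping intervals, and chromosome names that match between the two files (many BED files use "chr21" rather than "21"). The function name percent_coverage and the intervals are made up; the two lengths come from the table above.

```shell
# made-up inputs: a sizes file (name, length) and a BED file (chrom, start, end)
sizes=$(mktemp)
bed=$(mktemp)
printf '21\t48129895\n22\t51304566\n' > "$sizes"
printf '21\t0\t4812990\n22\t1000\t2000\n22\t5000\t6000\n' > "$bed"

percent_coverage() {
  # $1: sizes file, $2: BED file
  # BED coordinates are 0-based and half-open, so an interval covers end - start bases
  awk -F'\t' '
    NR == FNR { len[$1] = $2; next }
    ($1 in len) { covered[$1] += $3 - $2 }
    END { for (c in covered) printf "%s\t%.4f\n", c, 100 * covered[c] / len[c] }
  ' "$1" "$2" | sort
}

percent_coverage "$sizes" "$bed"
# 21  10.0000
# 22  0.0039
```

If the intervals can overlap, they would need to be merged first, otherwise bases are counted more than once.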

Related: Chromosome ideograms and nomenclature, chromosome GC content

Simple Website monitor

November 2nd, 2012

There are many sophisticated services and scripts to monitor the accessibility of your website or various aspects of your web server. Check Stack Overflow and look at Monastic for examples. Here is a very simple solution I needed to monitor the availability of a specific server using its IP address within the internal network. It was necessary after the server's IP address, which is used in third-party software to provide specific services, was "stolen" by other machines. In this case the DHCP server assigned the IP address that should have been reserved to mobile devices that connected to the wireless network.

This approach simply fetches a web page from a specific URL using the "reserved" IP and looks for a word/pattern you know should be there. The script runs on a second machine (host name "ubuntu64"), an Ubuntu VM. (It does not use any of the additional security measures you will want if you expose the machine externally.)

Prepare the second machine to send notification emails:
Install sendmail, sendemail, mailutils, sensible-mda (to have the whole set).
Add/modify entry in /etc/hosts: ubuntu64

Run "sudo sendmailconfig".
Test with:

sendemail -q -f <sender-address> -t <recipient-address> -u "mailtest" -m "mail works!"

Write a bash script to fetch and check the web page and send alert emails:


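# define address and pattern to expect (placeholders, set your own values)
address="<url-to-check>"
searchword="<expected-pattern>"
# define alert email (placeholders, set your own values)
sender="<sender-address>"
receiver="<recipient-address>"
body="system on machine at risk"
subj="Important server unresponsive"
# fetch page and count the lines containing the pattern
resp=$(wget -q -O - "$address" | grep -c -- "$searchword")
# zero matches (pattern missing or server unreachable) triggers the alert
if [ "$resp" -lt 1 ]; then
  sendemail -q -f "$sender" -t "$receiver" -u "$subj" -m "$body"
fi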

Add a crontab entry to automatically run this script every 10 minutes:


*/10 * * * * sh /home/user/

Possible improvements include stopping the alerts after a set number of notifications, or checking the response time.
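
Limiting the number of alerts could look roughly like this. It is only a sketch: the helper names should_alert and reset_alerts, the counter file and the limit are all my own invention, and the counter lives in a plain file so that it survives between cron runs.

```shell
# hypothetical helpers: allow at most a fixed number of alerts per outage
should_alert() {
  # $1: counter file, $2: maximum number of alerts
  count=$(cat "$1" 2>/dev/null)
  count=${count:-0}                      # missing or empty file counts as zero
  [ "$count" -lt "$2" ] || return 1      # limit reached: stay quiet
  echo $(( count + 1 )) > "$1"           # remember that one more alert went out
}

reset_alerts() {
  # $1: counter file; call this when the check succeeds again
  rm -f "$1"
}
```

In the monitoring script, the sendemail call would then only run if should_alert succeeds, and reset_alerts would go into an else branch so that the next outage starts counting from zero.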

Alternatively you can just look up the MAC address associated with the "reserved" IP and compare it to the known physical address of your server and wrap this up into a little script:

>arp -a

Interface: --- 0xb
  Internet Address      Physical Address      Type
                        00-11-18-2c-2e-6d     dynamic
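
Such a script might look like the sketch below, written for the Ubuntu machine rather than Windows. It assumes the neighbour table is read in the format printed by "ip neigh" (address, device, then "lladdr" followed by the MAC). The helper names and the sample line are made up, the IP 192.0.2.10 is a documentation-only address, and the MAC is the one from the listing above.

```shell
normalize_mac() {
  # lower case and colons, so Windows (00-11-...) and Linux (00:11:...) notations compare equal
  printf '%s\n' "$1" | sed 'y/ABCDEF-/abcdef:/'
}

mac_for_ip() {
  # $1: IP address; stdin: neighbour table in "ip neigh" format
  awk -v ip="$1" '$1 == ip { for (i = 2; i < NF; i++) if ($i == "lladdr") print $(i + 1) }'
}

# made-up sample line standing in for real "ip neigh" output
sample='192.0.2.10 dev eth0 lladdr 00:11:18:2c:2e:6d REACHABLE'
expected=$(normalize_mac "00-11-18-2C-2E-6D")
found=$(printf '%s\n' "$sample" | mac_for_ip 192.0.2.10)

if [ "$(normalize_mac "$found")" = "$expected" ]; then
  echo "MAC matches"
else
  echo "MAC mismatch"
fi
```

On a live system the sample line would be replaced by the output of "ip neigh show", and the mismatch branch could send the same alert email as the script above.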