Male infertility genetics

August 17th, 2012

10-15% of couples in the Western world face some kind of infertility issue; in almost half of these cases, factors (or co-factors) on the male side are involved.
Male infertility factors are often based on sperm abnormalities, which can be categorized as:

  • Azoospermia: no sperm in the semen
  • Oligozoospermia: a low sperm count
  • Asthenozoospermia: poor sperm motility
  • Teratozoospermia: abnormal sperm morphology

The genetic region responsible for spermatogenesis and most of these abnormalities is the azoospermia factor (AZF) region on Yq11. It contains the sub-regions AZFa, AZFb and AZFc. Microdeletions in these regions are responsible for many genetic causes of male infertility. Alterations in the AZFc region (which contains the genes PRY2, BPY2, DAZ and CDY1) are believed to be the most frequent molecularly defined cause of spermatogenic failure. This is due to high genomic variability: AZFc is one of the most genetically dynamic regions in the human genome. This property may serve as a counter to the genetic degeneracy associated with the lack of a meiotic partner, meaning that no genetic material can be exchanged with a counterpart chromosomal region from the mother.
Intracytoplasmic sperm injection (ICSI) can result in pregnancies, but it passes the genetic infertility on to any sons born.

It has been reported that the average sperm count for men in the Western world has declined by up to 50% in the past 50 years. These findings are not conclusive, however, as different studies have found different trends around the world. It nevertheless seems clear that exposure to chemical compounds in our environment influences the hormone balance, has an adverse effect on male fertility and promotes diseases like testicular cancer.

Sources: srlworld.com, endotext.org, Page et al. (1999), Navarro-Costa et al. (2010).

Display Today's Date with JavaScript

August 10th, 2012

To display the current date, day of the week and time on a web page, you don't want to refresh the entire page every second or minute. Instead, you will want to use JavaScript to dynamically update just this date/clock display element. Here is the code for a display in the format

Friday, 10.8.2012    9:41:49

Code

<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
function startTime(){
  var today = new Date();
  var h = today.getHours();
  // add a zero in front of numbers < 10
  var m = checkTime(today.getMinutes());
  var s = checkTime(today.getSeconds());
  var month = today.getMonth() + 1; // getMonth() is zero-based
  var day = today.getDate();
  // getDay() returns 0 (Sunday) to 6 (Saturday)
  var myDays = ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"];
  var wday = myDays[today.getDay()];
  var year = today.getFullYear();
  document.getElementById('txt').innerHTML = wday + ", " + day + "." + month + "." + year + "&nbsp;&nbsp;&nbsp;&nbsp;" + h + ":" + m + ":" + s;
  // refresh the display every 500 ms
  setTimeout(startTime, 500);
}
 
function checkTime(i){
  if (i<10){
    i="0" + i;
  }
  return i;
}
</script>
</head>
 
<body onload="startTime()">
<div id="txt"></div>
</body>
</html>
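
For testing or reuse, the formatting step can be separated from the page update. The following is only a minimal sketch, not part of the original page: the names formatDateTime, pad2 and DAY_NAMES and the separator parameter are assumptions made for illustration.

```javascript
// Hypothetical helper (not in the original page): builds the display
// string from a Date object without touching the DOM.
const DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
                   "Thursday", "Friday", "Saturday"];

// Pad minutes and seconds to two digits, like checkTime() above.
function pad2(n) {
  return n < 10 ? "0" + n : String(n);
}

function formatDateTime(date, separator = "    ") {
  const datePart = date.getDate() + "." + (date.getMonth() + 1) + "." +
                   date.getFullYear();
  const timePart = date.getHours() + ":" + pad2(date.getMinutes()) + ":" +
                   pad2(date.getSeconds());
  return DAY_NAMES[date.getDay()] + ", " + datePart + separator + timePart;
}

// Months are zero-based in the Date constructor, so 7 means August.
console.log(formatDateTime(new Date(2012, 7, 10, 9, 41, 49)));
// prints "Friday, 10.8.2012    9:41:49"
```

In a page, the separator could be passed as "&nbsp;&nbsp;&nbsp;&nbsp;" and the result assigned to innerHTML, as in the code above.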

Sources: trans4mind.com, w3schools.com

Data Processing with Biopieces

August 2nd, 2012

Biopieces is a fine set of scripts that form an orderly pipeline (or framework) for processing bioinformatics data on the Unix command line. You can, e.g., process sequencing (NGS) data like this:

Code

biopieces>
./read_fastq -n 1000 -i data/reads.fastq | ./plot_scores -t png -o data/scores.png --no_stream

to read the first 1000 sequences from a FASTQ file and plot the scores to an image file.
The result might look like this:
[Image: plot of the scores written to data/scores.png]
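
To make the two steps of this command more concrete, here is a rough sketch in plain JavaScript of reading the first records of a FASTQ file and averaging the quality scores per position. It is not the Biopieces implementation: the function names readFastq and meanScores are hypothetical, and the Phred+33 quality encoding is an assumption that does not hold for every FASTQ file.

```javascript
// Hypothetical sketch, not Biopieces code.
// Yield up to maxRecords records from FASTQ text (4 lines per record:
// "@name", sequence, "+", quality string).
function* readFastq(text, maxRecords) {
  const lines = text.split(/\r?\n/);
  let count = 0;
  for (let i = 0; i + 3 < lines.length && count < maxRecords; i += 4) {
    if (!lines[i].startsWith("@")) {
      throw new Error("Malformed FASTQ record at line " + (i + 1));
    }
    yield { name: lines[i].slice(1), seq: lines[i + 1], qual: lines[i + 3] };
    count++;
  }
}

// Mean quality score per read position.
// Assumption: Phred+33 encoding (ASCII code minus 33).
function meanScores(records, offset = 33) {
  const sums = [];
  const counts = [];
  for (const rec of records) {
    for (let pos = 0; pos < rec.qual.length; pos++) {
      sums[pos] = (sums[pos] || 0) + rec.qual.charCodeAt(pos) - offset;
      counts[pos] = (counts[pos] || 0) + 1;
    }
  }
  return sums.map((sum, pos) => sum / counts[pos]);
}

// Two made-up reads: "I" encodes score 40, "+" encodes score 10.
const sampleFastq = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n+++\n";
console.log(meanScores(readFastq(sampleFastq, 1000)));
// mean scores per position: 25, 25, 25, 40
```

The generator hands records on one at a time, which loosely mirrors how each step in the pipeline passes its data to the next.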

The general logic is
        read_data | calculate_something | write_results
with the data being passed through as a "stream" and all modules sharing the same interface. The project provides installation instructions; on my Ubuntu VM I had to follow these steps:

  1. we need Perl, Ruby, Python, SVN. Install as needed.

    Code

    sudo apt-get install subversion
  2. get biopieces code:

    Code

    svn checkout http://biopieces.googlecode.com/svn/trunk/ biopieces
    cd biopieces
    svn checkout http://biopieces.googlecode.com/svn/wiki bp_usage
  3. check pre-requisites with the project's installer script

    Code

    bash biopieces_installer.sh
  4. missing Perl modules were listed nicely and could be installed as suggested.
  5. missing Ruby gems could not be installed due to incompatibilities, eg:

    Code

    sudo gem install RubyInline
    ERROR: Error installing RubyInline: ZenTest requires RubyGems version > 1.8.

    However, the project supplies an excellent Ruby installer on the downloads page that creates a separate Ruby 1.9 installation: the default 1.8 version is too old for biopieces, and the newer one is not officially supported on Ubuntu.
  6. modify your ~/.bashrc file to include:

    Code

    export BP_DIR="$HOME/bin/biopieces"
    export BP_DATA="$HOME/bin/biopieces/BP_DATA"
    export BP_TMP="$HOME/bin/biopieces/tmp"
    export BP_LOG="$HOME/bin/biopieces/BP_LOG"
    export PATH="$HOME/bin/biopieces/ruby_install/bin:$HOME/bin/biopieces/biopieces/bp_bin:$PATH"
    export RUBYLIB="$HOME/bin/biopieces/biopieces/code_ruby/lib:$RUBYLIB"
    export PERL5LIB="$HOME/bin/biopieces/biopieces/code_perl:$PERL5LIB"

    Code

    source ~/.bashrc
    mkdir $BP_DATA $BP_TMP $BP_LOG

    The Ruby and Perl lib definitions are necessary to avoid errors like

    Code

    cannot load such file -- maasha/biopieces (LoadError)
    ----
    Can't locate Maasha/Fasta.pm in @INC

Some of the almost 200 methods that are implemented in biopieces at this time include:

  • read and write various formats like bed, tab, gff, fasta, fastq
  • blast sequences against each other or against a genome
  • calculate the N50 value for a set of sequences
  • create statistics about the exon, intron, etc. content of a (12-column) BED file
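
As an illustration of one of the methods listed above, the N50 value is the sequence length at which sequences of that length or longer cover at least half of the total length. The following is only a sketch in plain JavaScript, not the Biopieces code; the function name n50 is hypothetical.

```javascript
// Hypothetical sketch, not Biopieces code.
// N50: walk through the lengths from longest to shortest and return the
// length at which the running total reaches half of the overall total.
function n50(lengths) {
  if (lengths.length === 0) {
    return 0; // convention chosen here for empty input
  }
  const sorted = [...lengths].sort((a, b) => b - a);
  const total = sorted.reduce((sum, len) => sum + len, 0);
  let running = 0;
  for (const len of sorted) {
    running += len;
    if (running * 2 >= total) {
      return len;
    }
  }
  return 0; // only reached if all lengths are zero or invalid
}

// Total length is 54; 10 + 9 + 8 = 27 reaches half of it.
console.log(n50([2, 3, 4, 5, 6, 7, 8, 9, 10])); // prints 8
```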

Building Config Files from a Skeleton

July 12th, 2012

To run programs or pipelines automatically, it is often necessary to create or adjust configuration files. Ideally this should be done dynamically by a script from a skeleton (layout) file, replacing placeholders with the adjusted values. This can be done with a Unix shell script that even contains the skeleton within:

Code

#!/bin/sh
# pass in variables from command-line arguments
prog=$1
var1=$2
var2=$3
outputfile=output.txt

# do other required tasks
# ...

# config skeleton; the single quotes keep $var1 and $var2
# unexpanded until the eval below
template='#config file for pipeline
parameter_1=$var1
parameter_2=$var2'

# Generate file output.txt from variable
# $template using placeholders above.
# Note: eval runs any shell syntax found in the template
# or the values, so use trusted input only.
eval "echo \"$template\"" > "$outputfile"

# run the specified program
# with the new config file
"./${prog}" -conf "${outputfile}"

Save as script.sh and call with parameters:
sh script.sh program_name par1 par2
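
The same skeleton idea can be sketched without eval, which avoids executing anything contained in the values. This is only an illustration in JavaScript, not the approach from the source; the function name fillTemplate and the rule that a placeholder is a "$" followed by word characters are assumptions.

```javascript
// Hypothetical sketch: replace $name placeholders in a skeleton string
// with values from an object, failing loudly on unknown placeholders.
function fillTemplate(template, values) {
  return template.replace(/\$(\w+)/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error("No value for placeholder " + match);
    }
    return String(values[name]);
  });
}

const skeleton = "#config file for pipeline\n" +
                 "parameter_1=$var1\n" +
                 "parameter_2=$var2";

console.log(fillTemplate(skeleton, { var1: "par1", var2: "par2" }));
```

Throwing on a missing value is a design choice made here so that an incomplete config file is never written silently.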

Source: stackoverflow

Analysing Variation with Ensembl and PolyPhen

May 28th, 2012

The Ensembl variation resources provide information about structural variants and sequence variants (including Single Nucleotide Polymorphisms (SNPs), insertions, deletions and somatic mutations) in the human genome. Details and references are described on the web site, in Chen et al. (2010) Ensembl Variation Resources, BMC Genomics, and in other publications listed on the site.

Sources and Descriptions currently included in Ensembl variation resources (v67):

  • dbSNP - Variants (including SNPs and indels) imported from dbSNP
  • DGVa - Database of Genomic Variants Archive
  • NHGRI_GWAS_catalog - Variants associated with phenotype data from the NHGRI GWAS catalog
  • COSMIC - Somatic mutations found in human cancers from the COSMIC project
  • EGA - Variants imported from the European Genome-phenome Archive with phenotype association
  • Uniprot - Variants with protein annotation imported from Uniprot
  • HGMD-PUBLIC - Variants from HGMD-PUBLIC dataset March 2012
  • OMIM - Variations linked to entries in the Online Mendelian Inheritance in Man (OMIM) database
  • Open Access GWAS Database - Johnson & O'Donnell 'An Open Access Database of Genome-wide Association Results' PMID:19161620
  • LSDB_LEPRE1 - LEPRE1 homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database
  • LSDB_PPIB - PPIB homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database
  • LSDB_CRTAP - CRTAP homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database
  • LSDB_FKBP10 - FKBP10 homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database

Ensembl offers the possibility to run the underlying code on your own data and predict the functional consequences of known and unknown variants using the Variant Effect Predictor (VEP).

Internally, the VEP uses PolyPhen, which is explained further below:

For a given amino acid substitution in a protein, PolyPhen-2 extracts various sequence and structure-based features of the substitution site and feeds them to a probabilistic classifier that predicts how damaging the substitution is likely to be.

Sequence-based features include binding or linking sites, transmembrane regions, regulatory modification sites. Profile matrices are calculated to assess the likelihood of the occurrence of this amino acid at the given position.
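
The idea of a profile score can be illustrated with a heavily simplified sketch. This is not PolyPhen-2's actual profile calculation: the function names, the pseudocount of 1 and the plain log ratio are all assumptions chosen for illustration. It counts how often each amino acid occurs in one column of an alignment of related sequences and compares the wild-type residue with the variant.

```javascript
// Hypothetical, simplified sketch; not the PolyPhen-2 algorithm.
const AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";

// Frequency of each amino acid in one alignment column. A pseudocount
// (assumed value: 1) keeps unseen residues from getting frequency zero.
function columnFrequencies(alignment, position, pseudocount = 1) {
  const counts = {};
  for (const aa of AMINO_ACIDS) {
    counts[aa] = pseudocount;
  }
  let total = pseudocount * AMINO_ACIDS.length;
  for (const seq of alignment) {
    const aa = seq[position];
    if (aa in counts) { // gaps and unknown symbols are skipped
      counts[aa] += 1;
      total += 1;
    }
  }
  const freqs = {};
  for (const aa of AMINO_ACIDS) {
    freqs[aa] = counts[aa] / total;
  }
  return freqs;
}

// Positive score: the variant residue is rarer at this position than
// the wild-type residue.
function substitutionScore(alignment, position, wildType, variant) {
  const freqs = columnFrequencies(alignment, position);
  return Math.log(freqs[wildType] / freqs[variant]);
}

// Made-up alignment: position 1 holds K three times and R once.
const exampleAlignment = ["MKL", "MKV", "MRL", "MKL"];
console.log(substitutionScore(exampleAlignment, 1, "K", "R")); // log(4/2), about 0.69
```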

Structural features include the comparison to known protein 3D structures in PDB, using DSSP (Dictionary of Secondary Structure in Proteins), accessible surface area and properties.

PolyPhen-2 also looks at functional significance of an allele replacement using the UniProtKB database. It uses the "HumDiv" classifier to find disease-related changes and "HumVar" for variations in the "normal" population.

Ensembl have now added a nice blog entry about this with some more details.