Move of blog

October 3rd, 2017

Please note that I am moving most of these posts to my page.

New articles will appear only there.


Create BigWig files with DeepTools

September 18th, 2017

To display data as bar graphs along the genome, e.g. in the UCSC genome browser, you can create BigWig files. The underlying Wig format is described in more detail here. BigWig is the indexed binary version (described here), which allows the data to be compressed and streamed on demand from a remote location to the machine running the display (i.e. the genome browser).
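To illustrate the text format behind this, here is a minimal Python sketch that writes per-bin values as a fixedStep Wig track, the Wig variant for data sampled at a constant step. The function name and the values are hypothetical. Each value becomes one bar of width `step`, and the `start` coordinate on the fixedStep line is 1-based. A file like this can then be converted to BigWig, e.g. with UCSC's `wigToBigWig` tool and a chromosome sizes file.

```python
def to_fixed_step_wig(chrom, values, step, start=1):
    """Render per-bin values as fixedStep Wig text (sketch).

    start is 1-based (Wig convention); each value covers `step` bases,
    so span is set equal to step.
    """
    header = f"fixedStep chrom={chrom} start={start} step={step} span={step}"
    body = [f"{v:g}" for v in values]  # compact number formatting, one value per line
    return "\n".join([header, *body]) + "\n"


if __name__ == "__main__":
    # Hypothetical values for four consecutive 50 bp bins on chr1
    print(to_fixed_step_wig("chr1", [0, 3, 5.5, 2], step=50), end="")
```

This prints the fixedStep line followed by 0, 3, 5.5 and 2, one value per line, covering chr1:1-50, 51-100, 101-150 and 151-200.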

Various software options are available to create and work with these files; my current recommendation is:

DeepTools, a "suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data". The documentation pages show more options; to get started, install it with either conda or pip:

conda install -c bioconda deeptools
pip install --user deepTools

and create a BigWig file from a BAM file (the BAM file needs to be sorted and indexed):

bamCoverage -b Seqdata.bam -o Seqdata.bigWig
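At its core, bamCoverage counts reads per genomic bin (50 bp by default, changeable with `--binSize`). The following simplified pure-Python sketch shows only that core idea for a single chromosome. It is not deepTools' implementation, which additionally handles read filtering, fragment extension, normalisation and more. The function name and the `(start, end)` input format (0-based, half-open intervals, as in BED) are assumptions for illustration.

```python
def bin_read_counts(reads, chrom_length, bin_size=50):
    """Count reads overlapping each bin of one chromosome (simplified sketch).

    reads: iterable of (start, end) alignment intervals, 0-based and half-open.
    No normalisation, read extension or filtering is applied.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for start, end in reads:
        if end <= start:
            continue  # skip empty or invalid intervals
        first = max(start, 0) // bin_size
        last = min(end - 1, chrom_length - 1) // bin_size  # bin of the last covered base
        for b in range(first, last + 1):  # a read spanning a boundary counts in each bin
            counts[b] += 1
    return counts


if __name__ == "__main__":
    # Three hypothetical reads on a 200 bp chromosome
    print(bin_read_counts([(0, 60), (40, 100), (120, 130)], chrom_length=200))
```

For these three reads the result is [2, 2, 1, 0]: the first two reads both span the boundary between bins 1 and 2, and the last bin is empty.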

Originally a Galaxy workflow, DeepTools2 can also be run on the command line or used as an API. It was published by Fidel Ramírez et al.

You can display the data in the UCSC browser by adding a custom track (more details) via the form on the track control page. Use a line like the following to point to your file, which needs to be internet-accessible:

track type=bigWig name="My Big Wig" description="Seqdata BigWig file" bigDataUrl=
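If you write track lines for several files, a small helper avoids quoting mistakes. This is a minimal Python sketch: the function name is hypothetical, and treating only http, https and ftp URLs as "internet-accessible" is this sketch's own assumption, not a rule taken from the UCSC documentation.

```python
from urllib.parse import urlparse


def bigwig_track_line(url, name, description):
    """Build a UCSC custom-track line for a remote BigWig file (sketch)."""
    # Assumption: the browser fetches the file itself, so local paths are rejected
    if urlparse(url).scheme not in ("http", "https", "ftp"):
        raise ValueError(f"not an internet-accessible URL: {url!r}")
    # name and description are quoted because they may contain spaces
    return (f'track type=bigWig name="{name}" '
            f'description="{description}" bigDataUrl={url}')
```

Calling it with a (placeholder) URL of your hosted file and the name and description from the example above reproduces that track line with the URL filled in.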

GAL file format

July 20th, 2017

GenePix Array List (GAL) files are text files with specific information about the location, size, and name of each DNA spot on a microarray. They are therefore of vital importance for the analysis of scanned microarray images. The format defines a specific header, which is followed by the list of data columns (abridged example):


ATF	1
9	5
Type=GenePix ArrayList V1.0
"Block1=10000, 38780, 150, 20, 200, 18, 200"
ArrayerSoftwareName=TAS Application Suite (MicroGrid II)
Block	Column	Row	ID	Name
1	1	1	RP11-163J21	Clone 1
1	1	2	RP11-163J21	Clone 2


ATF -> File conforms to the Axon Text File format
1 -> Version number of ATF
9 -> Number of header lines before the "Block, Column, Row, ..." line
5 -> Number of data columns (Block, Column, Row, Name, ID)
Type=GenePix ArrayList V1.0 -> Type of file, same for all GAL files
BlockCount=1 -> Number of blocks described in the file
BlockType=0 -> Type of block, 0 = rectangular block
BlockX=A, B, C, D, E, F, G -> Position and dimensions of block number X (e.g. Block1 in the example above)
A -> xOrigin
B -> yOrigin
C -> Feature diameter
D -> xFeatures
E -> xSpacing
F -> yFeatures
G -> ySpacing
ScanResolution -> Optional parameter to scale the positions for higher-resolution images

Block arrangement (example with 16 blocks, numbered row by row):

1	2	3	4
5	6	7	8
9	10	11	12
13	14	15	16
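The seven values of a BlockX record are enough to work out where each spot is expected. A minimal Python sketch, assuming that xOrigin/yOrigin give the position of the first (top-left) spot of the block, that columns run along x and rows along y, and that all values share one length unit; the function names are hypothetical:

```python
def parse_block_record(value):
    """Parse 'xOrigin, yOrigin, diameter, xFeatures, xSpacing, yFeatures, ySpacing'."""
    fields = [float(v) for v in value.split(",")]  # float() tolerates the spaces
    if len(fields) != 7:
        raise ValueError(f"expected 7 values, got {len(fields)}")
    keys = ("x_origin", "y_origin", "diameter",
            "x_features", "x_spacing", "y_features", "y_spacing")
    block = dict(zip(keys, fields))
    # Feature counts are integers
    block["x_features"] = int(block["x_features"])
    block["y_features"] = int(block["y_features"])
    return block


def feature_position(block, column, row):
    """Nominal (x, y) of a spot; column and row are 1-based as in the data columns.

    Assumes (x_origin, y_origin) is the first (top-left) spot of the block.
    """
    if not (1 <= column <= block["x_features"] and 1 <= row <= block["y_features"]):
        raise ValueError("column/row outside the block")
    x = block["x_origin"] + (column - 1) * block["x_spacing"]
    y = block["y_origin"] + (row - 1) * block["y_spacing"]
    return x, y


if __name__ == "__main__":
    # Block1 record from the example header
    block = parse_block_record("10000, 38780, 150, 20, 200, 18, 200")
    print(feature_position(block, 1, 2))  # Column 1, Row 2 ("Clone 2")
```

Under these assumptions, Clone 2 (Column 1, Row 2) ends up at (10000.0, 38980.0), one ySpacing below Clone 1.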

The data columns are:

  • Block
  • Column
  • Row
  • Name
  • ID
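A minimal Python sketch of reading a GAL file with the standard library, based only on the layout described above (line 1: ATF and version; line 2: number of header records and number of data columns; then the header records, the column-name line and the data rows). The function name is hypothetical, and blank lines are simply skipped.

```python
import csv
import io


def parse_gal(text):
    """Parse GAL text into (header_records, column_names, rows) (sketch)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Drop empty lines and lines that contain only tabs
    lines = [row for row in reader if any(cell.strip() for cell in row)]
    if len(lines) < 3 or lines[0][0].strip() != "ATF":
        raise ValueError("not an Axon Text File")
    n_header, n_columns = (int(v) for v in lines[1][:2])
    if len(lines) < 3 + n_header:
        raise ValueError("file is shorter than the declared header")
    header = {}
    for row in lines[2:2 + n_header]:
        # csv.reader has already removed optional surrounding double quotes
        key, _, value = row[0].partition("=")
        header[key.strip()] = value.strip()
    columns = [c.strip() for c in lines[2 + n_header][:n_columns]]
    rows = [dict(zip(columns, (c.strip() for c in r[:n_columns])))
            for r in lines[3 + n_header:]]
    return header, columns, rows
```

The sketch trusts the header count on line 2, so it only works on complete files: the abridged example above, which declares nine header records but shows three, is rejected with a ValueError.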


aCGH array QC measures

July 20th, 2017

The within-array quality of (genomic) microarrays is often measured using the following metrics:

  1. Standard Deviation Autosome / Robust (SD autosome, SD robust): a measure of the dispersion of the log2 ratios of all clones on the array, giving an overall picture of the noise in the array. It is calculated on the normalised but unsmoothed data. The SD robust is calculated on the middle 58%/66% of the data only; because outliers are excluded, large changes such as trisomies do not change this number significantly. (The SD robust is the number we use when we say “3 SDs away from the noise” in the calling algorithm.) Both measures are given after all data processing, but excluding any smoothing. For BlueFuse Multi processed data, the values should be 0.07-0.15 for SD autosome and 0.05-0.11 for SD robust.
  2. Signal to Background Ratio (SBR): the brightness of the mean signal (after background subtraction) divided by the raw (global) background signal.
  3. Derivative Log2 Ratio raw / fused (DLR): a measure of the probe-to-probe variability. Ideally, probes within a region have essentially the same ratio; in a noisy array, adjacent probes can show a very large ratio difference. DLR raw is calculated before any data processing. DLR fused is calculated after normalisation and data correction, but always on unsmoothed data, so it is independent of user settings and cannot be adjusted by the user; this makes it a consistent array-to-array measure of noise. BlueFuse results should be < 0.2.
  4. % included clones: the percentage of all clones on a BAC array that were not excluded because of inconsistencies between clone replicates. For BlueFuse results this should be > 95%.
  5. Mean Spot Amplitude: the mean fluorescent signal intensity of each of the two channels; channel 1 = sample (typically Cy3; excitation 550 nm, emission 570 nm) and channel 2 = reference (typically Cy5; excitation 650 nm, emission 670 nm). This metric varies between scanners. It gives an indication of how well the DNA has been labelled with the fluorescent dyes; more importantly, very high values can indicate over-scanning of the microarray image, or poor washing that leaves a lot of non-specific signal. The balance between the channels can also be assessed: the Cy5 signal tends to be more intense than Cy3, but major differences between the channels may indicate a labelling or scanner problem.
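The noise metrics above are summary statistics of the log2 ratios. Below is a minimal Python sketch under stated assumptions: BlueFuse's exact formulas are not given here, so SD robust is modelled as the SD of the central 66% of the values (trimming 17% from each tail), and DLR as the standard deviation of the differences between neighbouring probes (in genome order) divided by sqrt(2), one common formulation. The limit check uses the BlueFuse values quoted above; all function names are hypothetical.

```python
import math
import statistics


def sd_autosome(log2_ratios):
    """Plain SD of the normalised, unsmoothed log2 ratios."""
    return statistics.stdev(log2_ratios)


def sd_robust(log2_ratios, keep_fraction=0.66):
    """SD of the central keep_fraction of the values (assumed trimming scheme)."""
    values = sorted(log2_ratios)
    n_drop = round(len(values) * (1 - keep_fraction) / 2)  # values dropped per tail
    core = values[n_drop:len(values) - n_drop]
    return statistics.stdev(core)


def dlr_spread(log2_ratios_in_genome_order):
    """Probe-to-probe noise: SD of neighbour differences divided by sqrt(2)."""
    x = list(log2_ratios_in_genome_order)
    diffs = [b - a for a, b in zip(x, x[1:])]
    return statistics.stdev(diffs) / math.sqrt(2)


def signal_to_background(mean_signal_minus_bg, raw_background):
    """SBR: background-subtracted mean signal / raw (global) background."""
    return mean_signal_minus_bg / raw_background


def percent_included(n_included, n_total):
    """Share of clones not excluded for replicate inconsistencies, in %."""
    return 100.0 * n_included / n_total


def bluefuse_checks(sd_auto, sd_rob, dlr, pct_included):
    """Compare metrics with the BlueFuse guide values quoted in the text."""
    return {
        "sd_autosome": 0.07 <= sd_auto <= 0.15,
        "sd_robust": 0.05 <= sd_rob <= 0.11,
        "dlr": dlr < 0.2,
        "included_clones": pct_included > 95.0,
    }


if __name__ == "__main__":
    # Hypothetical noise-free profile: flat, then a gain of about log2(3/2)
    profile = [0.0] * 50 + [0.58] * 50
    print(round(sd_autosome(profile), 3), round(dlr_spread(profile), 3))
```

For this profile the plain SD is about 0.29 while the DLR stays near 0.04. Dividing by sqrt(2) compensates for the variance doubling when two independent noisy values are subtracted, and because only neighbouring probes are compared, a genuine step such as a trisomy barely moves the DLR while it inflates the plain SD. Note that a trimmed SD is smaller than the plain SD even for well-behaved data, so it is only comparable with values computed the same way.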

Source: BlueGnome user docs

See also: Microarray Scanners and PGS consulting in the UK & Ireland

Laboratory Tests under CLIA

April 28th, 2017

In 1988, Congress passed the Clinical Laboratory Improvement Amendments (CLIA) to establish quality standards for all non-research laboratory testing that is:

  1. Performed on specimens derived from humans; and
  2. Performed to provide information for the diagnosis, prevention, and treatment of disease or impairment, or for the assessment of health.

The objective of the CLIA program is to ensure quality in laboratory testing procedures, specifically by establishing quality standards that ensure the accuracy, reliability, and timeliness of patient test results. The CLIA Quality System Regulations became effective on April 24, 2003. Since then, the laboratory has been required to check (verify) the manufacturer's performance specifications provided in the package insert for:

  • Accuracy: If the results for previously tested samples fall within the stated acceptable limits, accuracy is verified.
  • Precision: Can the results be reproduced multiple times on the same day and on different days, by different operators?
  • Reportable range: Use known samples to confirm the upper and lower limits of the test.
  • Also the reference range (interval): Do the reference ranges provided by the test system's manufacturer fit your patient population?

The number of samples needs to be established for every test; 20 samples are seen as a "rule of thumb".
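Once the data are collected, the verification checks above come down to simple comparisons. A minimal Python sketch under assumptions: the manufacturer's acceptable limits are modelled as a symmetric allowed error around the expected value, precision is summarised as a coefficient of variation (CV), and the 20-sample rule of thumb is a plain count check. The function names are hypothetical; the actual acceptance criteria come from the package insert.

```python
import statistics

RULE_OF_THUMB_N = 20  # sample count quoted in the text as a rule of thumb


def accuracy_verified(measured, expected, allowed_error):
    """All results on previously tested samples lie within +/- allowed_error."""
    if len(measured) != len(expected):
        raise ValueError("measured and expected differ in length")
    return all(abs(m - e) <= allowed_error for m, e in zip(measured, expected))


def cv_percent(replicates):
    """Precision summarised as the coefficient of variation (%) of replicates."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def reportable_range_verified(low_result, high_result,
                              claimed_low, claimed_high, allowed_error):
    """Known samples at the claimed lower/upper limits are recovered within limits."""
    return (abs(low_result - claimed_low) <= allowed_error
            and abs(high_result - claimed_high) <= allowed_error)


def fraction_outside(results, ref_low, ref_high):
    """Share of your own population's results outside the manufacturer's reference range."""
    return sum(not (ref_low <= r <= ref_high) for r in results) / len(results)


def enough_samples(n):
    """Rule-of-thumb check on the number of verification samples."""
    return n >= RULE_OF_THUMB_N
```

Splitting the checks into separate functions mirrors the list above, so each performance specification can be documented and signed off on its own.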

The FDA defines a Laboratory Developed Test (LDT) as an in vitro diagnostic test that is manufactured by and used within a single laboratory (i.e. a laboratory with a single CLIA certificate). LDTs are also sometimes called in-house developed tests or “home brew” tests. Like other in vitro diagnostic tests, LDTs are considered “devices” as defined by the Federal Food, Drug, and Cosmetic Act (FFDCA), and are therefore subject to regulatory oversight by the FDA.

Sources: Centers for Medicare & Medicaid Services, Genohub