Bioinformatics work notes http://blog.kokocinski.net/index.php?blog=2&tempskin=_atom b2evolution 2021-02-26T21:02:27Z Move of blog admin http://blog.kokocinski.net/blogs/ http://blog.kokocinski.net/index.php/move-of-blog?blog=2 2017-10-03T14:52:00Z 2019-08-21T07:10:29Z

Please note that I am moving most of these posts to my page blog.gene-test.com. New articles will appear only there.

 


]]>
Create BigWig files with DeepTools Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/create-bigwig-files?blog=2 2017-09-18T13:25:00Z 2017-09-18T15:34:47Z

To display data as bar graphs along the genome, e.g. in the UCSC Genome Browser, you can create BigWig files. The underlying Wig format is a plain-text format described in the UCSC documentation. BigWig is its indexed binary version, which compresses the data and lets the machine running the display (i.e. the genome browser) stream the data from a remote location on demand.

Various software options are available to create and work with the data. My current recommendation is:

DeepTools, a "suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data". The documentation pages show more options, but to get started, install it with:
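To make the relation between the two formats more concrete, here is a minimal sketch of what the text-based Wig format looks like, using its fixedStep variant. The function name and the example values are made up for illustration; only the `fixedStep chrom=... start=... step=... span=...` declaration line follows the Wig format itself.

```python
def to_fixed_step_wig(chrom, start, values, step=1, span=1):
    """Return fixedStep Wig text for a run of evenly spaced values.

    `start` is the 1-based position of the first value, as the Wig
    format expects. The function name is hypothetical.
    """
    header = f"fixedStep chrom={chrom} start={start} step={step} span={span}"
    # One data value per line follows the declaration line.
    return "\n".join([header, *(f"{v:g}" for v in values)]) + "\n"


if __name__ == "__main__":
    print(to_fixed_step_wig("chr1", 1001, [1.5, 2, 0]), end="")
```

A text file like this would still need to be converted to the binary BigWig form (for example with the UCSC `wigToBigWig` utility) before it can be streamed on demand.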

conda install -c bioconda deeptools
      or
pip install --user deepTools

Then create a BigWig file from a (sorted and indexed) BAM file:

bamCoverage -b Seqdata.bam -o Seqdata.bigWig
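If you want to call the same command from a script, a small wrapper can help. This is only a sketch: the helper name `bam_coverage_command` is hypothetical, and the optional `--binSize` flag is the only option added beyond the `-b` and `-o` arguments shown above.

```python
import shlex


def bam_coverage_command(bam_path, bigwig_path, bin_size=None):
    """Build the argument list for deepTools' bamCoverage (helper name is hypothetical)."""
    command = ["bamCoverage", "-b", bam_path, "-o", bigwig_path]
    if bin_size is not None:
        # --binSize sets the width (in bases) of each coverage bin.
        command += ["--binSize", str(bin_size)]
    return command


if __name__ == "__main__":
    command = bam_coverage_command("Seqdata.bam", "Seqdata.bigWig", bin_size=10)
    print(shlex.join(command))
    # To actually run it (needs deepTools installed and an indexed BAM file):
    # import subprocess; subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids quoting problems with file names that contain spaces.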

Originally a Galaxy workflow, DeepTools2 can run on the command line or be used as an API. It was published by Fidel Ramírez et al.

To display the data in the UCSC browser, add a custom track using the form on the track control page. Use a line like the following to point to your file, which needs to be accessible over the internet:

track type=bigWig name="My Big Wig" description="Seqdata BigWig file" bigDataUrl=https://your.server.com/Seqdata.bigWig
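When you have several files, the track line can be generated instead of typed. The sketch below assumes a hypothetical helper name and simply reproduces the line shown above; the scheme check is an added convenience, not something the browser requires in this exact form.

```python
from urllib.parse import urlparse


def bigwig_track_line(name, description, url):
    """Return a UCSC custom-track line for a remote BigWig file (helper name is hypothetical)."""
    # The file has to be reachable by the browser, so reject local paths early.
    if urlparse(url).scheme not in ("http", "https", "ftp"):
        raise ValueError(f"not a remote URL: {url!r}")
    return (f'track type=bigWig name="{name}" '
            f'description="{description}" bigDataUrl={url}')


if __name__ == "__main__":
    print(bigwig_track_line("My Big Wig", "Seqdata BigWig file",
                            "https://your.server.com/Seqdata.bigWig"))
```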


]]>
GAL file format Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/gal-file-format-1?blog=2 2017-07-20T09:42:00Z 2017-07-20T09:42:17Z

GenePix Array List (GAL) files are text files with specific information about the location, size, and name of each DNA spot on a microarray. They are therefore of vital importance for the analysis of scanned microarray images. The format defines a specific header, followed by the list of data columns:

Example:

ATF	1
9	5
Type=GenePix ArrayList V1.0
BlockCount=1
BlockType=0
"Block1=10000, 38780, 150, 20, 200, 18, 200"
Supplier=BioRobotics
ArrayerSoftwareName=TAS Application Suite (MicroGrid II)
ArrayerSoftwareVersion=2.7.1.18
ScanResolution=10
Block	Column	Row	ID	Name
1	1	1	RP11-163J21	Clone 1
1	1	2	RP11-163J21	Clone 2

Explanations:

ATF -> File conforms to the Axon Text File format
1 -> Version number of ATF
9 -> Number of header lines before the "Block, Column, Row, ..." line
5 -> Number of data columns (Block, Column, Row, Name, ID)
Type=GenePix ArrayList V1.0 -> Type of file, same for all GAL files
BlockCount=1 -> Number of blocks described in the file
BlockType=0 -> Type of block, 0 = rectangular block
BlockX=A, B, C, D, E, F, G -> The position and dimensions of block X
A -> xOrigin
B -> yOrigin
C -> Feature diameter
D -> xFeatures
E -> xSpacing
F -> yFeatures
G -> ySpacing
ScanResolution -> Optional parameter to scale the position on higher-resolution images

Block arrangement:

1	2	3	4
5	6	7	8
9	10	11	12
13	14	15	16

The data columns are:

  • Block
  • Column
  • Row
  • Name
  • ID
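To show how the header counts and the block definition fit together, here is a minimal parsing sketch. The function name, the field names in `BLOCK_FIELDS`, and the shortened sample (with only four header lines) are assumptions made for illustration; real GAL files may contain quoting or optional fields that this sketch does not handle.

```python
# Names for the seven values of a "BlockN=" line, in the order A..G above.
BLOCK_FIELDS = ("x_origin", "y_origin", "feature_diameter",
                "x_features", "x_spacing", "y_features", "y_spacing")


def parse_gal(text):
    """Parse GAL text into (header dict, block dict, list of row dicts)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines[0].startswith("ATF"):
        raise ValueError("not an Axon Text File")
    # Second line: number of header lines and number of data columns.
    n_header, n_columns = (int(v) for v in lines[1].split())
    header, blocks = {}, {}
    for raw in lines[2:2 + n_header]:
        key, _, value = raw.strip().strip('"').partition("=")
        if key.startswith("Block") and key[5:].isdigit():
            numbers = [int(v) for v in value.split(",")]
            blocks[int(key[5:])] = dict(zip(BLOCK_FIELDS, numbers))
        else:
            header[key] = value
    table = lines[2 + n_header:]
    columns = table[0].strip().split("\t")
    if len(columns) != n_columns:
        raise ValueError("column count does not match the ATF header")
    rows = [dict(zip(columns, ln.split("\t"))) for ln in table[1:]]
    return header, blocks, rows


# Shortened, hypothetical sample based on the example above.
SAMPLE = "\n".join([
    "ATF\t1",
    "4\t5",
    "Type=GenePix ArrayList V1.0",
    "BlockCount=1",
    "BlockType=0",
    '"Block1=10000, 38780, 150, 20, 200, 18, 200"',
    "Block\tColumn\tRow\tID\tName",
    "1\t1\t1\tRP11-163J21\tClone 1",
    "1\t1\t2\tRP11-163J21\tClone 2",
])

if __name__ == "__main__":
    header, blocks, rows = parse_gal(SAMPLE)
    print(header["Type"], blocks[1], rows[0])
```

Because the data columns are identified by their header names, the rows are returned as dictionaries rather than by fixed position.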

Further reading and sources:


]]>
aCGH array QC measures Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/acgh-array-qc-measures?blog=2 2017-07-20T09:28:00Z 2017-07-20T09:28:19Z

The within-array quality for (genomic) microarrays is often measured using the following metrics:

  1. Standard Deviation Autosome / Robust (SD autosome / SD robust): A measure of the dispersion of the log2 ratios of all clones on the array, giving an overall picture of the noise in the array. It is calculated on the normalised but unsmoothed data. The SD robust is calculated on the middle 58%/66% of the data only. Because outliers are excluded, large changes such as trisomies will not change this number significantly. (The SD robust is the number we use when we say “3 SDs away from the noise” in the calling algorithm.) Both measures are given after all data processing but excluding any smoothing. For BlueFuse Multi processed data, the values should be 0.07-0.15 and 0.05-0.11 for the autosome and robust measure, respectively.
  2. Signal to Background Ratio (SBR): Brightness of the mean signal (after the background has been subtracted) divided by the raw background signal (global signal).
  3. Derivative Log2 Ratio / Fused (DLR): A measure of the probe-to-probe variability. In an ideal world, probes within a region will have essentially the same ratio. In a noisy array, adjacent probes can have a very large ratio difference. The DLR raw is calculated before any data processing. The DLR fused is calculated after normalisation and data correction, but always on unsmoothed data. It is therefore independent of user settings and cannot be adjusted by the user, which gives a consistent array-to-array measure of noise. BlueFuse results should be < 0.2.
  4. % included clones: Percentage of all clones that were not excluded on a BAC array due to inconsistencies between clone replicates. For BlueFuse results this should be > 95 %.
  5. Mean Spot Amplitude: The mean fluorescent signal intensities for the two channels; channel 1 = sample (usually Cy3; excitation 550 nm, emission 570 nm) and channel 2 = reference (usually Cy5; excitation 650 nm, emission 670 nm). This metric varies with the scanner used. The mean spot amplitude can indicate how well the DNA has been labelled with fluorescent dyes. More importantly, very high values can indicate over-scanning of the microarray image or poor washing that leaves a lot of non-specific signal. The balance between channels can be assessed, but the Cy5 signal tends to give a higher intensity than Cy3. Major differences between the channels may indicate a labelling or scanner problem.
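The first three metrics can be sketched in a few lines. This is an illustration under stated assumptions, not the BlueFuse implementation: the robust SD is read here as the standard deviation of the central 66% of the sorted values, and the DLR uses one commonly cited spread definition (interquartile range of adjacent differences, scaled by 1.349 and the square root of 2). All function names are hypothetical.

```python
import math
import statistics


def sd_autosome(log2_ratios):
    """Standard deviation of all log2 ratios."""
    return statistics.pstdev(log2_ratios)


def sd_robust(log2_ratios, keep=0.66):
    """SD of the central `keep` fraction of the sorted values (assumed reading)."""
    values = sorted(log2_ratios)
    drop = int(len(values) * (1 - keep) / 2)  # values trimmed from each end
    return statistics.pstdev(values[drop:len(values) - drop])


def dlr_spread(log2_ratios):
    """Spread of differences between adjacent probes (assumed definition)."""
    diffs = [b - a for a, b in zip(log2_ratios, log2_ratios[1:])]
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    return (q3 - q1) / (1.349 * math.sqrt(2))


def signal_to_background(signals, background):
    """Mean background-subtracted signal divided by the raw background."""
    return (statistics.mean(signals) - background) / background


if __name__ == "__main__":
    ratios = [0.0] * 8 + [1.0, -1.0]  # two outliers, e.g. a large aberration
    print(sd_autosome(ratios), sd_robust(ratios))
    print(signal_to_background([1200, 800], 100))
```

The toy data shows why the robust measure is useful: the two outliers inflate the plain SD but are trimmed away before the robust SD is computed.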

Source: BlueGnome user docs

See also: Microarray Scanners and PGS consulting in the UK & Ireland


]]>
Laboratory Tests under CLIA Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/laboratory-test-under-clia?blog=2 2017-04-28T10:24:00Z 2017-09-18T15:02:02Z

Congress passed the Clinical Laboratory Improvement Amendments (CLIA) act in 1988 to establish quality standards for all non-research laboratory testing:

  1. Performed on specimens derived from humans; and
  2. For providing information for the diagnosis, prevention, and treatment of disease or impairment, or assessment of health.

The objective of the CLIA program is to ensure quality in laboratory testing procedures and specifically to establish quality standards to ensure the accuracy, reliability, and timeliness of the patient’s test results. The CLIA Quality System Regulations became effective on April 24, 2003. Since then, the laboratory is required to check (verify) the manufacturer's performance specifications provided in the package insert for:

  • Accuracy: If test results for previously tested samples fall within the stated acceptable limits, accuracy is verified.
  • Precision: Can the results be repeated multiple times on the same day and on different days by different operators?
  • Reportable range: Use known samples to confirm the upper and lower limits of the test.
  • Also: Reference range or interval: Do the reference ranges provided by the test system's manufacturer fit your patient population?

The number of samples needs to be established for every test; 20 samples are seen as a "rule of thumb".
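As a rough illustration of the accuracy and precision checks, the sketch below tests whether results fall within stated limits and summarises repeatability. The function names and the numbers are hypothetical, and using the coefficient of variation as the precision summary is an assumption, not something CLIA prescribes in this form.

```python
import statistics


def accuracy_verified(results, lower_limit, upper_limit):
    """True if every result for previously tested samples lies within the limits."""
    return all(lower_limit <= r <= upper_limit for r in results)


def coefficient_of_variation(results):
    """Sample SD relative to the mean, in percent (assumed precision summary)."""
    return statistics.stdev(results) / statistics.mean(results) * 100


if __name__ == "__main__":
    repeats = [9.8, 10.1, 10.0, 10.2, 9.9]  # hypothetical repeated measurements
    print(accuracy_verified(repeats, 9.5, 10.5))
    print(round(coefficient_of_variation(repeats), 2))
```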

The FDA defines a Laboratory Developed Test (LDT) as an in vitro diagnostic test that is manufactured by and used within a single laboratory (i.e. a laboratory with a single CLIA certificate). LDTs are also sometimes called in-house developed tests, or “home brew” tests. Similar to other in vitro diagnostic tests, LDTs are considered “devices,” as defined by the FFDCA, and are therefore subject to regulatory oversight by FDA.

Sources: Centers for Medicare & Medicaid Services, Genohub


]]>
Vaccination of newborns Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/vaccination-of-newborns?blog=2 2017-04-28T09:37:00Z 2017-06-13T10:05:47Z

Most of us take vaccinations for granted and rely on them from our very first days. Whooping cough, for example, can be deadly, especially for babies who are too young to be protected by their own vaccination. Since 2010, the Centers for Disease Control and Prevention (CDC) has recorded between 10,000 and 50,000 cases each year in the United States, with up to 20 babies dying. One recent study showed that many whooping cough deaths among babies could be prevented if all babies received the first dose of the vaccine on time at 2 months old, when they are old enough to get vaccinated (CDC). Still, some parents believe they know better and risk their children's lives by not vaccinating them at all.

 

For the US the CDC recommends vaccination of newborns / babies against the following diseases:

For Germany the situation is almost the same, and the following vaccinations are recommended for babies under 2 years:

  • Hib (Haemophilus influenzae type b)
  • Diphtheria
  • Hepatitis B
  • Measles
  • Mumps
  • Pertussis (whooping cough)
  • Pneumococcus
  • Poliomyelitis (polio)
  • Rubella
  • Tetanus
  • Rotavirus
  • Varicella (chickenpox)
  • Meningococcus C

Sources: CDC, Robert-Koch-Institut


]]>
Genetic Conditions Screened in Newborns Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/genetic-conditions-screened-in-newborns?blog=2 2017-04-13T14:08:00Z 2017-04-13T14:23:43Z

As part of the health assessment of newborn babies, a test for common genetic conditions is done by drawing a few drops of blood from the baby's heel and sending them off for analysis. Any positive result is then followed up by a confirmatory test, and treatment can be initiated if required. Most of the conditions are life-threatening or disabling for the child if left undiagnosed or untreated.

Below is a list of conditions that are screened as part of the current standard panel of core conditions and secondary conditions in the US health system. Secondary conditions are those that are additionally (unintentionally) revealed when testing for the core conditions. If desired, there are even more options for testing (supplemental screening). Which tests are offered or paid for depends on the state and the insurance. This information is taken from babysfirsttest.org.

 

1. Metabolic Disorders

ORGANIC ACID CONDITIONS

  • 2-Methyl-3-Hydroxybutyric Acidemia (2M3HBA)
  • 2-Methylbutyrylglycinuria (2MBG)
  • 3-Hydroxy-3-Methylglutaric Aciduria (HMG) *
  • 3-Methylcrotonyl-CoA Carboxylase Deficiency (3-MCC) *
  • 3-Methylglutaconic Aciduria (3MGA)
  • Beta-Ketothiolase Deficiency (BKT) *
  • Ethylmalonic Encephalopathy (EME)
  • Glutaric Acidemia, Type I (GA-1) *
  • Holocarboxylase Synthetase Deficiency (MCD)
  • Isobutyrylglycinuria (IBG)
  • Isovaleric Acidemia (IVA) *
  • Malonic Acidemia (MAL)
  • Methylmalonic Acidemia (Cobalamin Disorders) (Cbl A,B) *
  • Methylmalonic Acidemia (Methylmalonyl-CoA Mutase Deficiency) (MUT) *
  • Methylmalonic Acidemia with Homocystinuria (Cbl C, D, F)
  • Propionic Acidemia (PROP) *

FATTY ACID OXIDATION DISORDERS

  • 2,4 Dienoyl-CoA Reductase Deficiency (DE RED)
  • Carnitine Acylcarnitine Translocase Deficiency (CACT)
  • Carnitine Palmitoyltransferase I Deficiency (CPT-IA)
  • Carnitine Palmitoyltransferase Type II Deficiency (CPT-II)
  • Carnitine Uptake Defect (CUD) *
  • Glutaric Acidemia, Type II (GA-2)
  • Long-Chain L-3 Hydroxyacyl-CoA Dehydrogenase Deficiency (LCHAD) *
  • Medium-Chain Acyl-CoA Dehydrogenase Deficiency (MCAD) *
  • Medium-Chain Ketoacyl-CoA Thiolase Deficiency (MCAT)
  • Medium/Short-Chain L-3 Hydroxyacyl-CoA Dehydrogenase Deficiency (M/SCHAD)
  • Short-Chain Acyl-CoA Dehydrogenase Deficiency (SCAD)
  • Trifunctional Protein Deficiency (TFP) *
  • Very Long-Chain Acyl-CoA Dehydrogenase Deficiency (VLCAD) *

AMINO ACID DISORDERS

  • Argininemia (ARG)
  • Argininosuccinic Aciduria (ASA) *
  • Benign Hyperphenylalaninemia (H-PHE)
  • Biopterin Defect in Cofactor Biosynthesis (BIOPT-BS)
  • Biopterin Defect in Cofactor Regeneration (BIOPT-REG)
  • Carbamoyl Phosphate Synthetase I Deficiency (CPS)
  • Citrullinemia, Type I (CIT) *
  • Citrullinemia, Type II (CIT II)
  • Classic Phenylketonuria (PKU) *
  • Homocystinuria (HCY) *
  • Hypermethioninemia (MET)
  • Hyperornithine with Gyrate Deficiency (Hyper ORN)
  • Maple Syrup Urine Disease (MSUD) *
  • Nonketotic Hyperglycinemia (NKH)
  • Ornithine Transcarbamylase Deficiency (OTC)
  • Prolinemia (PRO)
  • Tyrosinemia, Type I (TYR I) *
  • Tyrosinemia, Type II (TYR II)
  • Tyrosinemia, Type III (TYR III)

 

2. Endocrine Disorders

  • Congenital Adrenal Hyperplasia (CAH) *
  • Primary Congenital Hypothyroidism (CH) *

 

3. Hemoglobin Disorders

  • Glucose-6-Phosphate Dehydrogenase Deficiency (G6PD)
  • Hemoglobinopathies (Var Hb)
  • S, Beta-Thalassemia (Hb S/βTh) *
  • S, C Disease (Hb S/C) *
  • Sickle Cell Anemia (Hb SS) *

 

4. Other Disorders

  • Adrenoleukodystrophy (ALD)
  • Biotinidase Deficiency (BIOT) *
  • Classic Galactosemia (GALT) *
  • Congenital Toxoplasmosis (TOXO)
  • Critical Congenital Heart Disease (CCHD) *
  • Cystic Fibrosis (CF) *
  • Formiminoglutamic Acidemia (FIGLU)
  • Galactoepimerase Deficiency (GALE)
  • Galactokinase Deficiency (GALK)
  • Hearing Loss (HEAR)
  • Human Immunodeficiency Virus (HIV)
  • Hyperornithinemia-Hyperammonemia-Homocitrullinuria Syndrome (HHH)
  • Pyroglutamic Acidemia (5-OXO)
  • Severe Combined Immunodeficiency (SCID) *
  • T-cell Related Lymphocyte Deficiencies

 

5. Lysosomal Storage Disorders

  • Fabry (FABRY)
  • Gaucher (GBA)
  • Krabbe
  • Mucopolysaccharidosis Type-I (MPS I)
  • Mucopolysaccharidosis Type-II (MPS II)
  • Niemann-Pick Disease (NPD)
  • Pompe (POMPE)

 

See more at: www.babysfirsttest.org


]]>
Comparing instance prices on the Amazon cloud Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/comparing-instance-prices-on-the?blog=2 2017-04-13T11:20:00Z 2017-04-20T09:23:30Z


As the largest cloud computing company, Amazon Web Services (AWS) offers various options for using compute power on an "as-needed" basis. You can choose the size, type and number of machines - and you can choose a price model in which you are "bidding" for the resource. This means you might have to wait longer to get it, but you will get an impressive discount! You can choose your machines from the AWS dashboard.

 

Here is a comparison of the current prices for "General Purpose - Current Generation" AWS machines in the EU (Frankfurt) region (as of 13/04/2017):

Instance     vCPU  ECU    Memory (GiB)  Instance Storage (GB)  On-Demand Price per Hour (Linux/UNIX)  Spot Price per Hour  Saving %
m4.large     2     6.5    8             EBS Only               $0.129                                 $0.0336              74
m4.xlarge    4     13     16            EBS Only               $0.257                                 $0.0375              85
m4.2xlarge   8     26     32            EBS Only               $0.513                                 $0.1199              77
m4.4xlarge   16    53.5   64            EBS Only               $1.026                                 $0.3536              66
m4.10xlarge  40    124.5  160           EBS Only               $2.565                                 $1.1214              56
m4.16xlarge  64    188    256           EBS Only               $4.104                                 $0.503               88
m3.medium    1     3      3.75          1x4 SSD                $0.079                                 $0.0114              86
m3.large     2     6.5    7.5           1x32 SSD               $0.158                                 $0.0227              86
m3.xlarge    4     13     15            2x40 SSD               $0.315                                 $0.047               85
m3.2xlarge   8     26     30            2x80 SSD               $0.632                                 $0.1504              76

This shows only a selection of machine options, and the prices change over time - but the message should be clear: on that day, spot prices were between 56% and 88% below the on-demand prices.
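The "Saving %" column can be reproduced from the two price columns. The following is a minimal sketch; the function name `spot_saving_percent` is made up for this example, and the prices are a few rows copied from the table above:

```python
def spot_saving_percent(on_demand: float, spot: float) -> int:
    """Return the spot discount relative to the on-demand price, in whole percent."""
    if on_demand <= 0:
        raise ValueError("on-demand price must be positive")
    return round((1 - spot / on_demand) * 100)


# Hourly prices in USD, copied from the table: (on-demand, spot)
prices = {
    "m4.large": (0.129, 0.0336),
    "m4.16xlarge": (4.104, 0.503),
    "m3.medium": (0.079, 0.0114),
}

for name, (on_demand, spot) in prices.items():
    saving = spot_saving_percent(on_demand, spot)
    print(f"{name}: {saving}% cheaper on the spot market")
```

For the three rows used here this gives 74, 88 and 86, matching the table.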

 


]]>
Software Requirements Specification Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/software-requirements-specification-1?blog=2 2017-03-02T17:23:00Z 2017-03-02T17:23:33Z

For any large software project (i.e. one that requires more than a few scripts performing a one-off task) and for every project that was initiated by a customer request, it is useful to precisely define the requirements before starting to write any code. This might be painful at times and slow down the coding fun, but it should avoid a lot of frustration on either side in the end.

Here is a short summary of what a Software Requirements Specification (SRS, IEEE 830) is, how to write one, and what it is good for.

An SRS is a complete description of the behavior of a system to be developed, including use cases.

The benefits of writing specifications when planning a software project are:

  • Establish the basis for agreement between the customers and the suppliers on what the software product is to do.
  • Reduce the development effort by avoiding redesign, recoding, and retesting, and by revealing omissions, misunderstandings, and inconsistencies early in the development cycle.
  • Provide a basis for estimating costs and schedules.
  • Provide a baseline for validation (comparison against what the customer needs) and verification (comparison with the formal specifications).
  • Facilitate transfer to new users or new machines.
  • Serve as a basis for enhancement.

Key points to address:

  • Required functionality.
  • External interfaces.
  • Performance.
  • Attributes.
  • Design constraints imposed on an implementation.

Avoid design and coding details in the specs. Hardware requirements and similar items go into the general System Specifications, not the SRS. The content and language of the document should match the following key words:

Complete, Consistent, Accurate, Modifiable, Ranked, Testable, Traceable, Unambiguous, Valid, Verifiable

Descriptions of "use cases", mock-up GUI components and other visual aids are extremely useful to communicate with the parties involved.

Sources:
Wikipedia
www.microtoolsinc.com
www.techwr-l.com


]]>
BCL files Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/bcl-files?blog=2 2016-12-30T13:58:00Z 2016-12-30T13:58:46Z

As part of the primary analysis, Illumina sequencing machines measure the intensity of the channels used for encoding the different bases and identify the most likely base at a given position of a sequencing read (tag). The Real Time Analysis (RTA) software writes the base and the confidence in the call, expressed as a quality score, to base call (.bcl) files. As the name implies, this is done in real time: for every cycle of the sequencing run, a call is added for every location identified on the flow cell (tiles and lanes). BCL files are stored in binary format and represent the raw data output of a sequencing run. The format is described here. Software such as Casava/BclToFastq, Eland or the iSAAC aligner can make use of these files.
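To make the "base plus quality score" idea concrete, here is a minimal sketch of a decoder for one uncompressed .bcl file. It assumes the layout as I understand it from Illumina's documentation: a 4-byte little-endian cluster count, followed by one byte per cluster with the base in the lowest two bits (A, C, G, T) and the quality score in the upper six bits, where an all-zero byte is a no call. Please check this against the format description before relying on it; the function name `decode_bcl` is made up for this example.

```python
import struct

BASES = "ACGT"  # assumed encoding of the two lowest bits: 0=A, 1=C, 2=G, 3=T


def decode_bcl(data: bytes) -> list[tuple[str, int]]:
    """Decode the content of one uncompressed .bcl file into (base, quality) pairs."""
    # Header: number of clusters as an unsigned 32-bit little-endian integer.
    (n_clusters,) = struct.unpack_from("<I", data, 0)
    calls = data[4:4 + n_clusters]
    if len(calls) != n_clusters:
        raise ValueError("file is shorter than its header claims")

    result = []
    for byte in calls:
        if byte == 0:
            # An all-zero byte is treated as "no call".
            result.append(("N", 0))
        else:
            result.append((BASES[byte & 0b11], byte >> 2))
    return result


# Three hand-made clusters: G with Q30, a no call, T with Q12.
example = struct.pack("<I", 3) + bytes([(30 << 2) | 2, 0, (12 << 2) | 3])
print(decode_bcl(example))
```

For a real file you would pass in `open(path, "rb").read()`. Each file holds the calls for one tile and one cycle, so a full read is assembled across the per-cycle files.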

The *.bcl files are stored in the BaseCalls directory:

<run directory>/Data/Intensities/BaseCalls/L<lane>/C<cycle>.1

They are named in the format:

s_<lane>_<tile>.bcl

If you want to overcome errors during downstream processing from missing calls, software such as iSAAC and configureBclToFastq have an "--ignore-missing-bcl" command line option. This will interpret missing *.bcl files as no call (N) at that position.
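Before deciding whether you need that option, it can help to check which files are actually missing. The sketch below builds the expected paths from the directory and file name patterns shown above. The helper names are made up, and zero-padding the lane to three digits (L001) is an assumption based on typical run folders; adjust it if your run directory looks different.

```python
from pathlib import Path


def bcl_path(run_dir: str, lane: int, cycle: int, tile: int) -> Path:
    """Build the expected location of one .bcl file inside a run directory."""
    return (
        Path(run_dir) / "Data" / "Intensities" / "BaseCalls"
        / f"L{lane:03d}"          # assumption: lane directory is zero-padded
        / f"C{cycle}.1"
        / f"s_{lane}_{tile}.bcl"
    )


def missing_bcls(run_dir: str, lane: int, cycles, tile: int) -> list[int]:
    """Return the cycles for which the .bcl file of this lane and tile is absent."""
    return [c for c in cycles if not bcl_path(run_dir, lane, c, tile).is_file()]


print(bcl_path("/data/my_run", lane=1, cycle=5, tile=1101).as_posix())
```

Every cycle reported by `missing_bcls` is one that would end up as an N at that position of the reads for that tile.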

Sources: Illumina, SeqAnswers


]]>
Embryo Morphology Assessment Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/embryo-morphology-assessment?blog=2 2015-07-08T12:14:00Z 2017-03-01T11:23:28Z

Some researchers and clinicians believe embryo morphology and development characteristics can be used to assess the viability of IVF embryos to increase chances of a successful pregnancy.

Healthy embryos, i.e. the most viable zygotes that will develop into blastocysts and beyond, seem to follow a specific growth pattern between development day 3 and re-implantation on day 5:
Growth from 2 to 3 cells should be seen in 9 - 11 hours, and from 3 to 4 cells in under 2 hours. Reaching day 5 is critical, as the embryo will then be re-implanted into the uterus and attach to the endometrium. The normal development process is shown in figure 1 (source: CMFT NHS):

[Figure 1: Normal embryo development process]
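The two timing windows mentioned above can be written down as a simple check. This is only an illustration of the rule as stated in the text, not how any real system scores embryos; the function name, the parameter names and the choice of inclusive bounds for the 9 - 11 hour window are assumptions.

```python
def follows_expected_timing(t2: float, t3: float, t4: float) -> bool:
    """Check the division intervals against the windows quoted in the text.

    t2, t3, t4 are the times (in hours) at which the embryo is first
    seen with 2, 3 and 4 cells.
    """
    two_to_three = t3 - t2
    three_to_four = t4 - t3
    # 2 -> 3 cells within 9 to 11 hours, 3 -> 4 cells in under 2 hours.
    return 9 <= two_to_three <= 11 and 0 <= three_to_four < 2


print(follows_expected_timing(26.0, 36.0, 37.0))  # intervals of 10 h and 1 h
print(follows_expected_timing(26.0, 40.0, 41.0))  # 2 -> 3 cells took 14 h
```

The time points would come from the time-lapse images, annotated either by hand or by tracking software.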

Embryo morphology is graded on a scale of 1 to 5, as shown in figure 2 (source: CMFT NHS):

[Figure 2: Embryo morphology grading scale]

Embryo cell division can be monitored with an "embryoscope", an incubator with an integrated camera. Time-lapse pictures are analysed by an embryologist to help select viable embryos. One system that supports the monitoring process is the "Early Embryo Viability Assessment" (Eeva) software by Auxogyn.


Cell tracking and embryo assessment with Eeva (YouTube)

Further reading:

* http://www.ivf.com/morphology.html


]]>
Cystic Fibrosis and its Analysis Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/cystic-fibrosis?blog=2 2015-06-17T13:03:00Z 2015-08-25T08:18:41Z

Cystic Fibrosis (CF), also called mucoviscidosis, is a hereditary disease (autosomal recessive) in which exocrine (secretory) glands produce abnormally thick mucus. This mucus can cause problems with digestion, breathing, and body cooling. It affects up to one in 3,000 newborns with northern European ancestry. There are well over a hundred genetic changes linked to CF. It is an area in which companies like Illumina are very active, with a special assay cleared by the FDA as an in-vitro diagnostic test for the detection of most of the genetic variants known to cause the disease.

Here are notes from a presentation Dr. Carlos Bustamante gave at a recent ClinGen conference:

Background for CF and CFTR

CFTR:

  • Cystic Fibrosis Transmembrane Conductance Regulator
  • ABC transporter (ATP-binding cassette) that functions as an ion channel
  • cAMP-regulated through R domain phosphorylation
  • Transports chloride and thiocyanate across epithelial cell membranes
  • 1,480 amino acids 

CF disease:

  • Most common autosomal recessive disorder among Caucasians (1/3,300)
  • Dysregulation of epithelial fluid transport in lung, pancreas, and other organs
  • ~ 2,000 identified gene mutations
  • Phe508del – most common, in 70% of cases
  • Wide range of severity; most patients die of pulmonary disease at a mean age of 37
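For an autosomal recessive disorder, the incidence quoted above also gives a rough idea of how common carriers are. The sketch below assumes Hardy-Weinberg equilibrium (random mating, a single pooled disease allele), which is a textbook simplification and not something stated in the presentation; the function name is made up.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Estimate the carrier frequency 2pq from the disease incidence q^2."""
    q = math.sqrt(incidence)  # frequency of the disease allele
    p = 1 - q                 # frequency of the normal allele
    return 2 * p * q


freq = carrier_frequency(1 / 3300)
print(f"carrier frequency: {freq:.3f} (about 1 in {round(1 / freq)})")
```

With an incidence of 1/3,300 this comes out at about 0.034, i.e. roughly one carrier in 29 people in that population.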

Variants:

  • ~70% of variants currently classed as VUS (variant of unknown significance)
  • ~65% are missense mutations, 24% frameshift & stop-gained, 9% synonymous

Testing Machine Learning Approaches for CF classification

  • Machine learning algorithms show higher performance when compared with separate predictors
  • Tree-based methods perform the best (GBM & RF AUC is 6% higher than the best predictor, MutPred)
  • Top features: MutPred, AF, SIFT, CADD, POSE
  • Predicted pathogenicity probability (RF.pred) correlates with available experimental data for Cl- conductance and sweat Cl-
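The comparison above is expressed as AUC (area under the ROC curve): the probability that a randomly chosen pathogenic variant gets a higher score than a randomly chosen benign one. Here is a minimal sketch of that calculation in plain Python. The labels and scores are invented toy values for illustration only; they are not data from the presentation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (pathogenic, benign) pairs ranked correctly.

    labels: 1 for pathogenic, 0 for benign; ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy example: six variants, two hypothetical predictors.
labels = [1, 1, 1, 0, 0, 0]
single_predictor = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
combined_model = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]

print(f"single predictor AUC: {roc_auc(labels, single_predictor):.2f}")
print(f"combined model AUC:   {roc_auc(labels, combined_model):.2f}")
```

In the toy data the single predictor ranks one pathogenic variant below a benign one, so its AUC is 8/9, while the combined model separates the two groups completely.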

 Other sources used: PubMedHealth, Wikipedia


]]>
CRAM format Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/cram-format?blog=2 2015-01-07T10:12:00Z 2015-05-05T13:45:40Z

CRAM files are compressed versions of BAM files containing (aligned) sequencing reads. They represent a further file size reduction for this type of data, which is generated in ever-increasing quantities. Where SAM files are human-readable text files optimized for short read storage, BAM files are their binary equivalent, and CRAM files are a restructured, column-oriented binary container format for even more efficient storage.

The key components of the approach are that positions are encoded in a relative way (i.e., the difference between successive positions is stored rather than the absolute value) and stored as a Golomb code. Also, only differences from the reference genome are listed instead of the full sequence.
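The position encoding can be sketched in a few lines. The example uses a Rice code, the special case of a Golomb code where the divisor is a power of two (2**k), because it is the simplest variant to write down. It illustrates the idea only and is not the bit layout used by actual CRAM files; the function names and the choice k = 3 are assumptions for this example.

```python
def delta_encode(positions):
    """Replace each sorted position by its distance from the previous one."""
    previous = 0
    deltas = []
    for pos in positions:
        deltas.append(pos - previous)
        previous = pos
    return deltas


def rice_encode(value: int, k: int) -> str:
    """Golomb code with divisor 2**k: unary quotient, then k-bit remainder."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    unary = "1" * quotient + "0"
    binary = format(remainder, f"0{k}b") if k else ""
    return unary + binary


# Alignment start positions of four reads, sorted by coordinate.
positions = [10000, 10003, 10010, 10042]
deltas = delta_encode(positions)
print(deltas)

# The first value is still an absolute position, so only the gaps are coded here.
print([rice_encode(d, 3) for d in deltas[1:]])
```

Small gaps, which dominate in deep sequencing data, turn into very short codes: a gap of 3 needs four bits here, while the absolute position 10003 would need fourteen.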

The compression rates achieved are shown in the graph below generated by Uppsala University:

File size comparisons of SAM, BAM, CRAM

Comparing speed: using the C implementation of CRAM (James K. Bonfield), decoding is 1.5–1.7× slower than for BAM files, but encoding is 1.8–2.6× faster. (File size savings are reported at 34–55%.)

Additional compression can be achieved by reducing the granularity of the quality values, although this makes the compression lossy. Illumina suggested a binning of Q-scores without significant loss of calling performance.

Binning of similar Q-scores (Illumina):

qscore binning

Compression achieved by Q-score binning (Illumina):

qscore compression
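Q-score binning itself is a simple lookup. In the sketch below, the bin edges and the value stored for each bin are an assumption modelled on the 8-level scheme described in the Illumina white paper (2-9 stored as 6, 10-19 as 15, 20-24 as 22, 25-29 as 27, 30-34 as 33, 35-39 as 37, 40 and above as 40); please check them against the white paper before using them.

```python
import bisect

# Assumed bins: lowest Q-score of each bin and the value stored for it.
BIN_STARTS = [2, 10, 20, 25, 30, 35, 40]
BIN_VALUES = [6, 15, 22, 27, 33, 37, 40]


def bin_qscore(q: int) -> int:
    """Map a Q-score to the representative value of its bin."""
    if q < 2:
        # Values below 2 (e.g. no calls) are passed through unchanged.
        return q
    index = bisect.bisect_right(BIN_STARTS, q) - 1
    return BIN_VALUES[index]


qualities = [0, 2, 9, 10, 24, 38, 41]
print([bin_qscore(q) for q in qualities])
```

Because far fewer distinct values remain after binning, the quality strings become more repetitive and compress better, at the price of losing the original scores.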

Sources and further reading:

  1. Format definition and usage
  2. cram-toolkit
  3. Detailed report at the Uppsala University
  4. SAMtools with CRAM support
  5. Original article from Markus Hsi-Yang Fritz, Rasko Leinonen, Guy Cochrane and Ewan Birney
  6. Article about the implementation in C
  7. Illumina white paper on Q-score compression

]]>
Barcode Balancing for Illumina Sequencing Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/barcode-balancing-for-illumina-sequencing?blog=2 2014-11-04T21:42:00Z 2015-05-05T13:41:58Z

HiSeq & MiSeq
The HiSeq and MiSeq use a green laser to sequence G/T and a red laser to sequence A/C. At each cycle, at least one of the two nucleotides for each color channel must be read to ensure proper registration. It is important to maintain color balance for each base of the index read being sequenced; otherwise index read sequencing could fail due to registration failure. For example, if the sample contains only T and C in the first four cycles, image registration will fail. (If possible, spike in PhiX sequence to add diversity to low-plex sequencing libraries.)
If one or more bases are not present in the first 11 cycles, the quality of the run will be negatively impacted, because the color matrix is calculated from the color signals of these cycles.
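The per-cycle rule above (at least one of G/T and at least one of A/C among the pooled indexes) can be checked with a few lines of Python. This is only a sketch: the function name and the example index sequences are made up, and all indexes are assumed to have the same length.

```python
GREEN_CHANNEL = set("GT")  # bases read with the green laser
RED_CHANNEL = set("AC")    # bases read with the red laser


def unbalanced_cycles(indexes):
    """Return the 1-based cycles in which one color channel has no signal."""
    problem_cycles = []
    # zip(*indexes) yields, for each cycle, the bases of all pooled indexes
    for cycle, bases in enumerate(zip(*indexes), start=1):
        present = set(base.upper() for base in bases)
        if not (present & GREEN_CHANNEL) or not (present & RED_CHANNEL):
            problem_cycles.append(cycle)
    return problem_cycles


if __name__ == "__main__":
    pool = ["ATCACG", "TTAGGC"]     # hypothetical index sequences
    print(unbalanced_cycles(pool))  # cycles that lack one of the two colors
```

An empty result means every cycle of the pooled index read has signal in both channels.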

NextSeq 500
The NextSeq 500 uses two-channel sequencing, which requires only two images to encode the data for four DNA bases, one red channel and one green channel. The NextSeq also uses a new implementation of real-time analysis (RTA) called RTA2.0, which includes important architecture differences from RTA on other Illumina sequencers. For any index sequences, RTA2.0 requires that there is at least one base other than G in the first two cycles. This requirement for index diversity allows the use of any Illumina index selection for single-plex indexing except index 1 (i7) 705, which uses the sequence GGACTCCT. Use the combinations in the table below for proper color balancing on the NextSeq 500.
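The RTA2.0 requirement described above is easy to test for a given index. The Python sketch below only encodes the rule as stated in this post (at least one base other than G in the first two cycles); the function name is hypothetical, and the second example sequence is arbitrary.

```python
def passes_rta2_index_rule(index):
    """True if at least one of the first two index bases is not G."""
    first_two = index.upper()[:2]
    return any(base != "G" for base in first_two)


if __name__ == "__main__":
    for index in ["GGACTCCT", "TAAGGCGA"]:
        status = "ok" if passes_rta2_index_rule(index) else "fails (starts with GG)"
        print(index, status)
```

The first sequence is the i7 705 index mentioned above, which is the one that fails the rule.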

Source:
Illumina Nextera tech notes, Illumina Low diversity note
See also TruSeq Guide


]]>
NGS reads and their Scores Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/ngs-reads-and-their-scores?blog=2 2014-08-01T14:42:00Z 2017-07-20T09:47:03Z

Quality scoring of the base calls

"Quality scores measure the probability that a base is called incorrectly. With SBS technology, each base in a read is assigned a quality score by a phred-like algorithm, similar to that originally developed for Sanger sequencing experiments. The quality score of a given base, Q, is defined by the equation
Q = -10 * log10(e)
where e is the estimated probability of the base call being wrong. Thus, a higher quality score indicates a smaller probability of error."(1)
The quality score is usually quoted as QXX, where XX is the score, meaning that a particular base call (or all base calls of a read, a sample, or a run) has a probability of error of 10^(-XX/10). For example, Q30 equates to an error rate of 1 in 1,000 (0.1%), and Q40 to an error rate of 1 in 10,000 (0.01%).
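The conversion between Q and the error probability e can be written down directly from the definition above. A small Python sketch (the function names are just for illustration):

```python
import math


def phred_to_error(q):
    """Error probability for a Phred-scaled quality: e = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(e):
    """Phred-scaled quality for an error probability: Q = -10 * log10(e)."""
    return -10 * math.log10(e)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print("Q%d -> error probability %g" % (q, phred_to_error(q)))
```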

During the primary analysis (real-time analysis, RTA) on the sequencing machine, quality scoring is performed by calculating a set of predictors for each base call, and using those predictor values to look up the quality score in a quality table. The quality table is generated using a modification of the Phred algorithm on a calibration data set representative of run and sequence variability.

"It is important to note how quickly or slowly quality scores degrade over the course of a read. With short-read sequencing, quality scores largely dictate the read length limits of different sequencing platforms. Thus, a longer read length specification suggests that the raw data from that platform have consistently higher quality scores across all bases." (1)

Mapping / Alignment scores

For each alignment, BWA calculates a mapping quality score, which is the (Phred-scaled) probability of the alignment being incorrect. The algorithm is similar between BWA and MAQ, except that BWA assumes that the true hit can always be found. The probability for each alignment is calculated as:

p = 10 ^ (-q/10)

where q is the mapping quality. For example, a mapping quality of 40 gives 10^-4 = 0.0001, which means there is a 0.01% chance that the read is aligned incorrectly.

Example for a whole read:

If your read is 25 bp long and the expected sequencing error rate is 1%, the probability of the read with 0 errors is:

0.99^25 = 0.78

If there is 1 perfect alignment and 5 possible alignment positions with 1 mismatch, we combine these probabilities. The probability of the read with 1 error is:

25 * 0.01 * 0.99^24 = 0.20

The combined posterior probability that the best alignment is correct is:

P(0-errors) / (P(0-errors) + 5 * P(1-errors))

= 0.44, which is low.
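The numbers in this example can be reproduced with a short Python sketch. It assumes that the probability of a read with exactly one error is the binomial term 25 * 0.01 * 0.99^24, which matches the 0.20 quoted above; the function name is made up, and this is not how BWA computes its mapping quality internally.

```python
def best_alignment_posterior(read_length, error_rate, one_mismatch_hits):
    """Posterior that the perfect hit is the true origin of the read."""
    # probability of sequencing the read without any error
    p_zero_errors = (1 - error_rate) ** read_length
    # probability of exactly one error anywhere in the read (binomial term)
    p_one_error = read_length * error_rate * (1 - error_rate) ** (read_length - 1)
    posterior = p_zero_errors / (p_zero_errors + one_mismatch_hits * p_one_error)
    return p_zero_errors, p_one_error, posterior


if __name__ == "__main__":
    p0, p1, posterior = best_alignment_posterior(25, 0.01, 5)
    print("P(0 errors) = %.2f" % p0)
    print("P(1 error)  = %.2f" % p1)
    print("posterior   = %.2f" % posterior)
```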

Base quality is apparently not considered when evaluating hits in BWA.

Sources:

  1. Illumina
  2. BWA paper
  3. DaveTang blog
  4. jwfoley on SEQanswers
  5. Ying Wei's notes
  6. Gene-Test bioinformatics (PGS / NGS) consulting

]]>
Mount Windows share in Linux system Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/mount-windows-share-in-linux?blog=2 2014-08-01T14:42:00Z 2014-12-18T14:12:36Z

Using a text editor, create a file for your remote server's logon credentials:

gedit ~/.smbcredentials

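Enter your Windows username and password in the file:

username=msusername
password=mspassword

Save the file and restrict its permissions so that only you can read it:

chmod 600 ~/.smbcredentials

Edit your /etc/fstab file and add a line for the share:

//servername/sharename /media/windowsshare cifs credentials=/home/ubuntuusername/.smbcredentials,iocharset=utf8,sec=ntlm 0 0

Then mount everything listed in /etc/fstab to test the new entry:

sudo mount -a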

Ref: https://wiki.ubuntu.com/MountWindowsSharesPermanently


]]>
Testing for Equivalence Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/testing-for-equivalence?blog=2 2014-08-01T14:41:00Z 2017-05-05T14:17:58Z

To assess whether a new test (e.g. a diagnostic test or medical device testing for disease or non-disease status) is equivalent to an existing test, the following measures can be reported. They can be of importance for the submission of premarket notification (510(k)) or premarket approval (PMA) applications for diagnostic devices (tests) to the U.S. Food and Drug Administration (FDA). A new test is usually compared to an existing and established test or a generally trusted reference. If the existing test (or reference) is not perfect, the FDA recommends reporting the positive and negative percent agreement (PPA/NPA). These are calculated from the numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) as follows (1):

              Existing Test
New Test      R+        R-        Total
T+            TP        FP        TP+FP
T-            FN        TN        FN+TN
Total         TP+FN     FP+TN     TP+FP+FN+TN

  PPA = TP * 100 / (TP + FN)

  NPA = TN * 100 / (TN + FP)
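The two formulas translate directly into code. A minimal Python sketch, with hypothetical counts chosen only to show the calculation:

```python
def percent_agreement(tp, fp, fn, tn):
    """Return (PPA, NPA) in percent from the cells of the 2x2 table."""
    ppa = 100.0 * tp / (tp + fn)  # agreement on reference-positive samples
    npa = 100.0 * tn / (tn + fp)  # agreement on reference-negative samples
    return ppa, npa


if __name__ == "__main__":
    # hypothetical study: 100 reference-positive and 100 reference-negative samples
    ppa, npa = percent_agreement(tp=90, fp=5, fn=10, tn=95)
    print("PPA = %.1f%%, NPA = %.1f%%" % (ppa, npa))
```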

Measures of accuracy

The FDA "recommends you report measures of diagnostic accuracy (sensitivity and specificity pairs, positive and negative likelihood ratio pairs) or measures of agreement (percent positive agreement and percent negative agreement) and their two-sided 95 percent confidence intervals. We recommend reporting these measures both as fractions (e.g., 490/500) and as percentages (e.g., 98.0%)." (2) Sensitivity and specificity are explained here. In general, the FDA recommends reporting (2):

  • the 2x2 table of results comparing the new test with the non-reference standard
  • a description of the non-reference standard
  • measures of agreement and corresponding confidence intervals.
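For the two-sided 95 percent confidence interval of an agreement measure, one common choice is the Wilson score interval for a binomial proportion. The Python sketch below uses it as an assumption; the guidance document should be consulted for the interval methods it actually accepts. The function name is made up, and 490/500 is simply the fraction used in the quote above.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Two-sided Wilson score interval for a proportion (z = 1.96 for 95%)."""
    p = successes / total
    denominator = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / total
                               + z * z / (4 * total * total)) / denominator
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    lower, upper = wilson_interval(490, 500)
    print("490/500 = %.1f%% (95%% CI %.1f%% to %.1f%%)"
          % (100 * 490 / 500, 100 * lower, 100 * upper))
```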


References:

  1. Workshop notes (B. Biswas): Assessing agreement for diagnostic devices  
  2. FDA recommendation "Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests" 
  3. STAndards for the Reporting of Diagnostic accuracy studies (STARD)
  4. Wikipedia page

]]>
SNP calling & the VCF format Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/snp-calling-the-vcf-format?blog=2 2014-05-16T11:00:00Z 2014-12-18T14:34:10Z

SNP calling refers to the process of identifying positions where the genome of a sequenced sample differs from the reference genome. This might lead to finding disease-causing genomic alterations.
In the following, I wanted to re-align short NGS reads against a specific reference (in this case the mitochondrial genome sequence). A simple way is to use samtools.

1. make a reference genome index

bwa index -a is NCBI_chrM.fa

2. filter reads

samtools view -F4 -hb A1_S1.bam chrM > A1_S1_chrM.bam
samtools view -f4 -hb A1_S1.bam > A1_S1_unmapped.bam
samtools merge A1_S1_chrM_and_Un.bam A1_S1_chrM.bam A1_S1_unmapped.bam

3. create fastq

bamToFastq -i A1_S1_chrM_and_Un.bam -fq A1_S1_chrM_and_Un.1.fq \
 -fq2 A1_S1_chrM_and_Un.2.fq 2> bwa.err &

4. align to new reference

bwa aln NCBI_chrM.fa A1_S1_chrM_and_Un.1.fq > A1_S1_chrM_and_Un.1.sai
bwa aln NCBI_chrM.fa A1_S1_chrM_and_Un.2.fq > A1_S1_chrM_and_Un.2.sai
bwa sampe NCBI_chrM.fa A1_S1_chrM_and_Un.1.sai A1_S1_chrM_and_Un.2.sai \
 A1_S1_chrM_and_Un.1.fq A1_S1_chrM_and_Un.2.fq > A1_S1_chrM_realigned.sam
samtools view -F4 -Sbh A1_S1_chrM_realigned.sam \
 | samtools sort -o - sorted > A1_S1_chrM_realigned.bam

5. call SNPs

samtools mpileup -uD -f NCBI_chrM.fa A1_S1_chrM_realigned.bam \
 | bcftools view -cg - > A1_S1_chrM_realigned.sam.vcf
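The -F4 and -f4 options used in step 2 select reads by the FLAG field of each alignment record: bit 0x4 marks a read as unmapped, -F excludes reads that have any of the given bits set, and -f requires all of the given bits to be set. The Python sketch below mimics that test on plain integers; the function names are made up and the two example flag values are arbitrary.

```python
UNMAPPED = 0x4  # FLAG bit for "read is unmapped"


def kept_by_exclude_filter(flag, mask):
    """Mimic 'samtools view -F mask': keep reads with none of the mask bits set."""
    return (flag & mask) == 0


def kept_by_require_filter(flag, mask):
    """Mimic 'samtools view -f mask': keep reads with all of the mask bits set."""
    return (flag & mask) == mask


if __name__ == "__main__":
    for flag in (99, 77):  # 99: mapped read of a proper pair, 77: unmapped first mate
        print(flag,
              "-F4 keeps it:", kept_by_exclude_filter(flag, UNMAPPED),
              "| -f4 keeps it:", kept_by_require_filter(flag, UNMAPPED))
```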

From the Samtools help pages:
One should consider to apply the following parameters to mpileup in different scenarios:

  • Apply -C50 to reduce the effect of reads with excessive mismatches. This aims to fix overestimated mapping quality and appears to be preferred for BWA-short.
  • Given multiple technologies, apply -P to specify which technologies to use for collecting initial INDEL candidates. It is recommended to find INDEL candidates from technologies with low INDEL error rate, such as Illumina. When this option is in use, the value(s) following the option must appear in the PL tag in the @RG header lines.
  • Apply -D and -S to keep per-sample read depth and strand bias. This is preferred if there are more than one samples at high coverage.
  • Adjust -m and -F to control when to initiate indel realignment (requiring r877+). Samtools only finds INDELs where there are sufficient reads containing the INDEL at the same position. It does this to avoid excessive realignment that is computationally demanding. The default works well for many low-coverage samples but not for, say, 500 exomes. In the latter case, using -m 3 -F 0.0002 (3 supporting reads at minimum 0.02% frequency) is necessary to find singletons.
  • Use `-BQ0 -d10000000 -f ref.fa' if the purpose is to get the precise depth of coverage rather than call SNPs. Under this setting, mpileup will count low-quality bases, process all reads (by default the depth is capped at 8000), and skip the time-demanding BAQ calculation.
  • Apply -A to use anomalous read pairs in mpileup, which are not used by default (requiring r874+).

The VCF format

The Variant Call Format (VCF) is the emerging standard for storing variant data. Originally designed for SNPs and short INDELs, it also works for structural variations.

VCF consists of a header and a data section. The header must contain a line starting with one '#', showing the name of each field, and then the sample names starting at the 10th column. The data section is TAB delimited with each line consisting of at least 8 mandatory fields (the first 8 fields in the table below).

Col	Field	Description
1	CHROM	Chromosome name
2	POS	1-based position. For an indel, this is the position 
		preceding the indel.
3	ID	Variant identifier. Usually the dbSNP rsID.
4	REF	Reference sequence at POS involved in the variant. 
		For a SNP, it is a single base.
5	ALT	Comma delimited list of alternative sequence(s).
6	QUAL	Phred-scaled probability of all samples being 
		homozygous reference.
7	FILTER	Semicolon delimited list of filters that the variant 
		fails to pass.
8	INFO	Semicolon delimited list of variant information.
9	FORMAT	Colon delimited list of the format of individual 
		genotypes in the following fields.
10+	Sample(s)  Individual genotype information defined by FORMAT.
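To show how the columns in this table map onto a data line, here is a minimal Python sketch that splits one TAB-delimited VCF line into named fields. It is not a full VCF parser (no header handling, no type conversion beyond POS), the function name is hypothetical, and the example line is invented.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Split one VCF data line into a dict of the mandatory fields plus samples."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError("expected at least 8 TAB-delimited fields")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position
    record["ALT"] = record["ALT"].split(",")  # comma delimited alternatives
    # column 9 names the per-sample values found in columns 10+
    record["FORMAT"] = fields[8].split(":") if len(fields) > 8 else []
    record["SAMPLES"] = [dict(zip(record["FORMAT"], sample.split(":")))
                         for sample in fields[9:]]
    return record


if __name__ == "__main__":
    example = "chrM\t150\t.\tC\tT\t222\t.\tDP=100;AF1=1\tGT:PL:GQ\t1/1:255,255,0:99"
    print(parse_vcf_line(example))
```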

The following table gives the INFO tags used by samtools and bcftools.

Tag	Description
AC	Allele count in genotypes
AC1	Max-likelihood estimate of the first ALT allele count 
	(no HWE assumption)
AF1	Max-likelihood estimate of the first ALT allele frequency 
	(assuming HWE)
AN	Total number of alleles in called genotypes
CGT	The most probable constrained genotype configuration in the trio
CLR	Log ratio of genotype likelihoods with and without the constraint
DP	Raw read depth (sum for all samples)
DP4	Number of high-quality ref-forward bases, ref-reverse, alt-forward 
	and alt-reverse bases
FQ	Phred probability of all samples being the same
G3	ML estimate of genotype frequencies
HWE	Hardy-Weinberg equilibrium test (PMID:15789306)
ICF	Inbreeding coefficient F
INDEL	Indicates that the variant is an INDEL.
IS	Maximum number of reads supporting an indel and fraction of 
	indel reads
MDV	Maximum number of high-quality nonRef reads in samples
MQ	Root-mean-square mapping quality of covering reads
PC2	Phred probability of the nonRef allele frequency in group1 samples 
	being larger (, smaller) than in group2.
PCHI2	Posterior weighted chi2 P-value for testing the association 
	between group1 and group2 samples.
PR	Number of permutations yielding a smaller PCHI2.
PV4	P-values for strand bias, baseQ bias, mapQ bias and tail 
	distance bias
QBD	Quality by Depth: QUAL/#reads
QCHI2	Phred scaled PCHI2.
RP	# permutations yielding a smaller PCHI2
RPB	Read Position Bias
SF	Source File (index to sourceFiles, f when filtered)
TYPE	Variant type
UGT	The most probable unconstrained genotype configuration in the trio
VDB	Variant Distance Bias (v2) for filtering splice-site artefacts 
	in RNA-seq data.
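The INFO column holds these tags as a semicolon delimited list of KEY=VALUE pairs, with flags such as INDEL appearing without a value. A small Python sketch for splitting it up (function name and example string are made up; values are kept as strings rather than converted to numbers):

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flags without a value map to True."""
    tags = {}
    if info in ("", "."):  # '.' marks an empty INFO field
        return tags
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # comma delimited values (e.g. DP4) become lists
            tags[key] = value.split(",") if "," in value else value
        else:
            tags[item] = True
    return tags


if __name__ == "__main__":
    print(parse_info("INDEL;DP=14;DP4=0,0,8,6;MQ=60"))
```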

]]>
Linux Firewall Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/linux-firewall?blog=2 2013-07-04T13:58:00Z 2013-07-31T08:40:31Z

To block a specific IP address from network access to your (Ubuntu Linux) system, you can add it to your firewall settings:
sudo iptables -A INPUT -s 223.4.208.56 -j DROP
To remove this entry:
sudo iptables -D INPUT -s 223.4.208.56 -j DROP
To just list current firewall rules:
sudo iptables -L

Sources: cyberciti.biz


]]>
GC content of human chromosomes Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/gc-content-of-human-chromosomes?blog=2 2013-04-16T12:41:00Z 2013-05-03T09:18:29Z

The GC content is the molar ratio of guanine+cytosine bases in DNA. The human genome is a mosaic of GC-rich and GC-poor regions of around 300 kb in length, called isochores. GC content is an important factor in many experiments and bioinformatic analyses. This is especially true for next-generation sequencing, where the DNA being sequenced has gone through multiple rounds of PCR amplification.
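For a single sequence, the GC content can be computed with a few lines of Python. In this sketch, N and other ambiguity codes are left out of the denominator, which is an assumption; other conventions divide by the full sequence length instead.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return 0.0  # no unambiguous bases at all
    return gc / (gc + at)


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
```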

Average GC content per chromosome:

1   0.417439
2   0.402438
3   0.396943
4   0.382479
5   0.395163
6   0.396109
7   0.407513
8   0.401757
9   0.413168
10  0.415849
11  0.415657
12  0.40812
13  0.385265
14  0.408872
15  0.42201
16  0.447894
17  0.455405
18  0.39785
19  0.483603
20  0.441257
21  0.408325
22  0.479881
X   0.394963
Y   0.391288
MT  0.443626

The common way to reduce the GC bias in data analysis is to:

  1. calculate the GC ratio (number of G/C bases / number of bases) in the region of interest (ROI) being measured
  2. find the average value measured (a) across the genome in all regions with this ratio
  3. normalize the value measured in the ROI (m) with this value: m/a
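The three steps above can be sketched in Python as follows. Regions are grouped by their GC ratio rounded to two decimals, which is an assumption made here to decide which regions count as having "this ratio"; the function name and the example values are made up.

```python
from collections import defaultdict


def gc_normalize(regions, precision=2):
    """regions: list of (gc_ratio, measured_value); returns m/a per region."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # step 2: average measured value over all regions sharing a (rounded) GC ratio
    for gc_ratio, value in regions:
        key = round(gc_ratio, precision)
        sums[key] += value
        counts[key] += 1
    # step 3: divide each measured value m by the average a of its GC group
    normalized = []
    for gc_ratio, value in regions:
        key = round(gc_ratio, precision)
        average = sums[key] / counts[key]
        normalized.append(value / average if average else 0.0)
    return normalized


if __name__ == "__main__":
    # hypothetical (GC ratio, read count) pairs for four regions
    print(gc_normalize([(0.40, 100), (0.40, 50), (0.60, 20), (0.60, 60)]))
```

After normalization, a value of 1 means the region behaves like the genome-wide average for its GC ratio.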

Benjamini and Speed describe the GC bias in next-generation sequencing in more detail: "The bias is not consistent between samples; and there is no consensus as to the best methods to remove it in a single sample. (...) It is the GC content of the full DNA fragment, not only the sequenced read, that most influences fragment count. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are underrepresented in the sequencing results. This empirical evidence strengthens the hypothesis that PCR is the most important cause of the GC bias."

Correcting the bias can follow a "read model", "fragment model" or a "global model".

Sources: biostars.org, PubMed, PubMed

See also: Chromosome length table


]]>
Windows Task Scheduler Error Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/windows-task-scheduler-error?blog=2 2013-04-16T12:40:00Z 2013-04-16T12:40:48Z

A scheduled task on Microsoft Windows 2008 that had previously run without problems failed "due to a time trigger condition", with an error message that included "Data: Error Value 2147943726."
The reason was that the network-wide password of the user account assigned to run the task had been changed after the task was set up.
Re-opening the task properties (double-click the task in the "Active Tasks" list and select "Options" from the right-hand menu) and saving them with the new password fixed the problem.
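
The error value is easier to interpret in hexadecimal. The following Python sketch splits it according to the standard HRESULT layout (failure bit, 11-bit facility, 16-bit code); the function name is made up for illustration.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    value &= 0xFFFFFFFF
    is_failure = bool(value >> 31)     # highest bit: severity
    facility = (value >> 16) & 0x7FF   # bits 16-26: source of the error
    code = value & 0xFFFF              # low 16 bits: the error code itself
    return is_failure, facility, code


is_failure, facility, code = decode_hresult(2147943726)
print(hex(2147943726), is_failure, facility, code)
```

For 2147943726 (0x8007052E) this gives facility 7 (Win32) and code 1326, which Windows defines as ERROR_LOGON_FAILURE. That matches the changed password described above.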


]]>
Chromosome lengths Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/chromosome-lengths?blog=2 2013-03-20T10:10:00Z 2015-07-27T14:06:16Z

Here is a quick list of the sizes of human chromosomes in assembly GRCh37 as defined by Ensembl:

chrom	 length [bp]
 1	 249,250,621 
 2	 243,199,373 
 3	 198,022,430 
 4	 191,154,276 
 5	 180,915,260 
 6	 171,115,067 
 7	 159,138,663 
 8	 146,364,022 
 9	 141,213,431 
10	 135,534,747 
11	 135,006,516 
12	 133,851,895 
13	 115,169,878 
14	 107,349,540 
15	 102,531,392 
16	  90,354,753 
17	  81,195,210 
18	  78,077,248 
19	  59,128,983 
20	  63,025,520 
21	  48,129,895 
22	  51,304,566 
X	 155,270,560 
Y	  59,373,566 
Mt	      16,569

These sizes are useful for calculations of percent coverage of genomic features or sequencing reads.
They are often required when working with BED files.
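
As a rough illustration of such a coverage calculation, the Python sketch below computes the percentage of a chromosome covered by BED-style intervals (0-based start, end not included). The function name and the small dictionary are assumptions made for this example; the lengths are copied from the table above, using "MT" as the key for the mitochondrial genome.

```python
# Subset of the GRCh37 lengths listed above
GRCH37_LENGTHS = {"1": 249250621, "21": 48129895, "MT": 16569}


def percent_coverage(intervals, chrom_lengths):
    """Percent of each chromosome covered by (chrom, start, end) intervals.

    Overlapping intervals are merged first so that no base is counted twice.
    """
    spans_by_chrom = {}
    for chrom, start, end in intervals:
        spans_by_chrom.setdefault(chrom, []).append((start, end))

    result = {}
    for chrom, spans in spans_by_chrom.items():
        covered = 0
        cur_start, cur_end = None, None
        for start, end in sorted(spans):
            if cur_end is None or start > cur_end:
                # gap before this interval: close the previous merged block
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
        result[chrom] = 100.0 * covered / chrom_lengths[chrom]
    return result
```

Called with the intervals ("MT", 0, 1000), ("MT", 500, 1500) and ("MT", 10000, 10069), the function counts 1,569 covered bases out of 16,569.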

Related: Chromosome ideograms and nomenclature, chromosome GC content


]]>
Simple Website monitor Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/simple-website-monitor?blog=2 2012-11-02T11:42:00Z 2012-11-14T10:56:47Z

There are many sophisticated services and scripts to monitor the accessibility of your website or various aspects of your web server. Check stack-overflow and look at Monastic for examples. Here is a very simple solution I needed to monitor the availability of a specific server using its IP address within the internal network. It was necessary after the server's IP address, which is used in third-party software to provide specific services, was "stolen" by other machines. In this case the DHCP server assigned the IP address that should have been reserved to mobile devices that connected to the wireless network.

This approach simply fetches a page from a specific URL using the "reserved" IP and looks for a word/pattern you know should be there. The script is run on a second machine (host name "ubuntu64"), an Ubuntu VM. (It does not use any of the additional security measures you will want if you expose the machine externally.)

Prepare the second machine to send notification emails:
Install sendmail, sendemail, mailutils and sensible-mda (to have the whole set).
Add or modify this entry in /etc/hosts:

127.0.1.1 ubuntu64.network.local ubuntu64

Run "sudo sendmailconfig", then test with:

Code

sendemail -q -f cron@ubuntux64.network.local -t my@email.com -u "mailtest" -m "mail works!"

Write a bash script to fetch and check the page and send alert emails:

Code

# define address and pattern to expect
address='192.168.1.1/phpmyadmin/main.php'
searchword='phpMyAdmin'
 
# define alert email
sender="cron@ubuntux64"
receiver="my@email.com"
body="system on machine 192.168.1.1 at risk"
subj="Important server unresponsive"
 
# fetch page and count the lines containing the pattern
resp=$(wget -q -O - "$address" | grep -c "$searchword")
# alert if the pattern is missing (also the case when the fetch fails);
# the quotes keep subject and body together as single arguments
if [ "$resp" -lt 1 ]; then
  sendemail -q -f "$sender" -t "$receiver" -u "$subj" -m "$body"
fi

Add a crontab entry to automatically run this script every 10 minutes:

Code

*/10 * * * * sh /home/user/server_check.sh

Additional improvements could include the options to stop alerting after a specific number of alerts or checking the response time.
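
These two improvements could look roughly like the following Python sketch. It is not part of the bash script above; the class and function names, the limit of three alerts and the five-second threshold are all assumptions chosen for illustration.

```python
import time
import urllib.request


def fetch_url(url, timeout=10):
    """Download a page and return its text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_page(fetch, searchword, max_seconds=5.0):
    """Return (ok, elapsed_seconds) for one check.

    fetch is any zero-argument callable returning the page text, e.g.
    lambda: fetch_url("http://192.168.1.1/phpmyadmin/main.php").
    """
    start = time.monotonic()
    try:
        text = fetch()
    except OSError:          # connection errors, timeouts, HTTP errors
        text = ""
    elapsed = time.monotonic() - start
    ok = searchword in text and elapsed <= max_seconds
    return ok, elapsed


class AlertLimiter:
    """Allow at most max_alerts alerts in a row; a success resets the count."""

    def __init__(self, max_alerts=3):
        self.max_alerts = max_alerts
        self.sent = 0

    def should_alert(self, ok):
        if ok:
            self.sent = 0
            return False
        self.sent += 1
        return self.sent <= self.max_alerts
```

Because cron starts a fresh process for every run, the alert counter would have to be stored between runs, for example in a small file.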

Alternatively you can just look up the MAC address associated with the "reserved" IP and compare it to the known physical address of your server and wrap this up into a little script:

>arp -a 192.168.1.1

Interface: 192.168.1.152 --- 0xb
  Internet Address      Physical Address      Type
  192.168.1.1           00-11-18-2c-2e-6d     dynamic
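
Such a script could be sketched in Python as follows. The sketch assumes the Windows-style output shown above has already been captured as text (the output of arp on Linux is formatted differently), and the function names are made up for this example.

```python
import re

MAC_PATTERN = r"([0-9a-f]{2}:){5}[0-9a-f]{2}"


def mac_for_ip(arp_output, ip):
    """Return the MAC address listed for ip (lower case, colon-separated) or None."""
    for line in arp_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == ip:
            mac = fields[1].lower().replace("-", ":")
            if re.fullmatch(MAC_PATTERN, mac):
                return mac
    return None


def ip_is_ours(arp_output, ip, expected_mac):
    """True if ip currently resolves to the server's known physical address."""
    return mac_for_ip(arp_output, ip) == expected_mac.lower().replace("-", ":")
```

When ip_is_ours returns False, either another machine currently holds the address or the server is not in the ARP table at all, so an alert is justified in both cases.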


]]>
ENCODE publication interview Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/encode-publication-interview?blog=2 2012-10-01T15:08:00Z 2014-09-01T09:55:21Z

Following on from the publication of the main papers of the ENCODE (Encyclopedia Of DNA Elements) scale-up phase, I gave an interview to BlueGnome's marketing team for the Newstrack customer newsletter in 2012.

These are my personal opinions, not my employer's (past or present). They might be of interest to researchers considering joining a large-scale project like this.

Q. What was it like to be part of the ENCODE project?
It was a great experience to work on a project of this scale with more than 400 scientists from 32 groups spread across the globe. Many of them are the leaders in their field, but at consortium meetings and the many phone conferences everyone could contribute. The amount of data and different technologies was overwhelming at times, so I think it’s an impressive achievement how this project was run and now the findings have been published.

Q. What are the main outcomes of the project?
There has been a very lively discussion about the outcome and how it was presented. In my opinion, the most important result is the data itself. ENCODE has created an enormous repository of measurements across the human genome that has been compiled in a systematic and standardised way. The data will be the basis of future research trying to understand genomic processes involved in basic cellular processes as well as in various diseases.
ENCODE has pushed the development of standards and new applications to interrogate the genome, in particular using sequencing technologies.
The results also remind us that there is a lot of activity in the genome that we currently do not fully understand. Up to 80% of the human genome is biochemically active, there are thousands of additional (non-coding) genes in introns and in the intergenic space, and up to 75% of the genome is transcribed at some point. These observations paint a very dynamic genomic landscape, with overlapping active zones and signals of different complexity, indicating that we have to keep the concept of genes and genome regulation pretty flexible in our minds.

Q. What are potential implications for BlueGnome and its customers?
I’m afraid the interpretation of CNV regions is getting even more complex as regulatory regions far away from the actual disease genes might be relevant for cases the clinical customers might come across. This is especially true for the interpretation of cancer profiles – which is highly complex already. We won’t be able to use these new interconnections directly in most cases, but we are looking through the data and have started to incorporate the knowledge by providing new genome-wide annotation data sets as optional BED files on the BlueGnome website, e.g. with GWAS results and regulatory element locations.

Q. Where do you see the human genome in 5 years’ time?
ENCODE is entering its next phase now to extend the catalogue to many additional cell lines as well as the mouse genome. With the recent publications scientists around the world are now more aware of this data and how to use it, so my hope is that we will see an acceleration in algorithm development, data mining and scientific findings. In 5 years we still won’t understand the genome entirely, but we should have a complete parts list and more connections between the parts. Some of these will be clinically relevant to allow progress in understanding and fighting today’s ‘big killers’ like certain types of cancer.

Q. Would you personally be interested in having your genome sequenced?
As a data exploration exercise I would find this really interesting, but the definitive answers you can get from it are still limited today. I would certainly want to make sure this data is kept private and under my control. With BlueGnome now being part of Illumina we can actually help to develop these ideas further.

Further information: Nature's Encode portal, "An integrated encyclopedia of DNA elements in the human genome" publication, Guardian Interview with Ewan Birney


]]>
SAM format summary Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/sam-format-summary?blog=2 2012-08-30T14:53:00Z 2013-05-22T13:34:54Z

The Sequence Alignment/Map (SAM) format is a generic format for storing read alignments against reference sequences. It is a text format that stores the data in a series of tab-delimited ASCII columns and is commonly used in next-generation sequencing data processing. It is the human-readable (non-binary) counterpart of the BAM format and contains information about each read and its aligned position in the genome. It was developed by Heng Li and others in Richard Durbin's group and is described in their paper.

After a header section the alignment section describes all results of the aligned read data. The format is best explained with an example line:

Code

1:497:R:-272+13M17D24M  113  1  497  37  37M  15  100338662  0  CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG  0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>  XT:A:U  NM:i:0  SM:i:37  AM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:37

Field name	Description	Example data
QNAME	read name	1:497:R:-272+13M17D24M
FLAG	bitwise alignment flag	113
RNAME	reference sequence (chromosome) of the alignment	1
POS	alignment start position (1-based)	497
MAPQ	overall mapping quality	37
CIGAR	alignment CIGAR string	37M
MRNM/RNEXT	reference sequence of the next alignment in the group (mate)	15
MPOS/PNEXT	position of the next alignment in the group (mate)	100338662
ISIZE/TLEN	observed template length	0
SEQ	read sequence	CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG
QUAL	quality per base	0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>
TAGs	further tags with alignment info	XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37
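
To make the column layout concrete, here is a minimal Python sketch that splits one alignment line into the eleven mandatory fields and decodes the FLAG value. It assumes the line is tab-delimited, as in a real SAM file; the function names are made up for illustration, while the flag bits follow the SAM specification.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")

# Meaning of the individual FLAG bits
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_alignment(line):
    """Split one tab-delimited alignment line into a dict of named fields."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    record["TAGS"] = cols[11:]   # optional tags, left as raw strings
    return record


def decode_flag(flag):
    """List the names of all bits set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

For the example line, FLAG 113 = 64 + 32 + 16 + 1 decodes to: paired, read on the reverse strand, mate on the reverse strand, first read in the pair.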

The tags are optional and vary between alignment programs; the examples shown are from BWA. The tags usually important for filtering are X0:i (number of best hits of this read in the genome) and XM:i (number of mismatches in the alignment).

       Tag	Meaning
       NM	Edit distance
       MD	Mismatching positions/bases
       AS	Alignment score
       BC	Barcode sequence
       X0	Number of best hits
       X1	Number of suboptimal hits found by BWA
       XN	Number of ambiguous bases in the reference
       XM	Number of mismatches in the alignment
       XO	Number of gap opens
       XG	Number of gap extensions
       XT	Type: Unique/Repeat/N/Mate-sw
       XA	Alternative hits; format: (chr,pos,CIGAR,NM;)*
       XS	Suboptimal alignment score
       XF	Support from forward/reverse alignment
       XE	Number of supporting seeds
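
A filter on these tags might look like the Python sketch below. It assumes the optional fields have already been split into a list of strings; the function names and the mismatch threshold are assumptions for illustration, not recommended settings.

```python
def parse_tags(tag_fields):
    """Turn ['NM:i:0', 'XT:A:U', ...] into {'NM': 0, 'XT': 'U', ...}."""
    tags = {}
    for field in tag_fields:
        name, value_type, value = field.split(":", 2)
        if value_type == "i":        # integer
            value = int(value)
        elif value_type == "f":      # floating point number
            value = float(value)
        tags[name] = value           # other types are kept as strings
    return tags


def is_unique_hit(tags, max_mismatches=2):
    """Keep reads with exactly one best hit (X0) and few mismatches (XM)."""
    return tags.get("X0") == 1 and tags.get("XM", 0) <= max_mismatches


example = "XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37"
print(is_unique_hit(parse_tags(example.split())))
```

The example read has one best hit and no mismatches, so it passes the filter.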

Read names (at least those from Illumina machines) are constructed as:

[instrument-name]:[run ID]:[flowcell ID]:[lane-number]:[tile-number]:
[x-pos]:[y-pos] [read number]:[is filtered]:[control number]:
[barcode sequence]

example:

@M01117:25:000000000-A37B9:1:1101:14984:1386 1:N:0:4
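
A read name of this form can be split into its parts with a few lines of Python. The function name and dictionary keys are made up for illustration, and the sketch assumes the header contains exactly the eleven parts listed above.

```python
def parse_illumina_name(header):
    """Split an Illumina read header into its named parts."""
    left, right = header.lstrip("@").split(" ", 1)
    instrument, run_id, flowcell, lane, tile, x_pos, y_pos = left.split(":")
    read_number, is_filtered, control_number, barcode = right.split(":")
    return {
        "instrument": instrument,
        "run_id": int(run_id),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x_pos": int(x_pos),
        "y_pos": int(y_pos),
        "read_number": int(read_number),
        "is_filtered": is_filtered == "Y",
        "control_number": int(control_number),
        "barcode": barcode,
    }


name = parse_illumina_name("@M01117:25:000000000-A37B9:1:1101:14984:1386 1:N:0:4")
print(name["instrument"], name["flowcell"], name["lane"], name["tile"])
```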

Sources:
genome.sph.umich.edu with further useful details, full specs.


]]>
Male infertility genetics Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/male-infertility-genetics?blog=2 2012-08-17T08:06:00Z 2015-07-08T14:31:12Z

10-15% of couples in the western world face some kind of infertility issue; in almost half of these cases there are (co-)factors on the male side.
Male infertility factors are often based on sperm abnormalities, which can be categorized into:

  • Azoospermic: no sperm in the semen
  • Oligozoospermic: a low sperm count
  • Asthenozoospermic: poor sperm motility
  • Teratozoospermic: abnormal sperm morphology

The genetic region responsible for spermatogenesis and most of these abnormalities is the azoospermia factor (AZF) region on Yq11. It contains the sub-regions AZFa, AZFb and AZFc. Microdeletions in these regions are responsible for many genetic cases of male infertility. Alterations in the AZFc region (which contains the genes PRY2, BPY2, DAZ and CDY1) are believed to be the most frequent molecularly defined cause of spermatogenic failure. This is due to high genomic variability: AZFc is one of the most genetically dynamic regions in the human genome. This property may serve as a counter to the genetic degeneracy associated with the lack of a meiotic partner, meaning that no exchange of genetic material with a counterpart chromosomal region from the mother can happen.
Intracytoplasmic sperm injection (ICSI) can result in pregnancies, but passes on the genetic infertility to any sons born.

It has been reported that the average sperm count of men in the western world has declined by up to 50% in the past 50 years. These findings are not conclusive, however, as different studies found different trends across the world. It seems clear, however, that exposure to chemical compounds in our environment will influence the hormone balance, have an adverse effect on male fertility and promote diseases like testicular cancer.

Sources: srlworld.com, endotext.org, Page et al. (1999), Navarro-Costa et al. (2010).


]]>
Display today's Date with JavaScript Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/display-todays-date-with-javascript?blog=2 2012-08-10T08:43:00Z 2015-07-08T14:31:42Z

To display the current date, day of the week and time on a web page, you don't want to refresh the entire page every second or minute. Instead, you will want to use JavaScript to dynamically update just the date/clock display element. Here is the code for a display in the format

Friday, 10.8.2012    9:41:49

Code

<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
function startTime(){
  var today = new Date();
  var h = today.getHours();
  var m = today.getMinutes();
  var s = today.getSeconds();
  var month = today.getMonth() + 1;  // getMonth() counts from 0
  var day = today.getDate();
  // getDay() returns 0 (Sunday) to 6 (Saturday)
  var myDays = ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"];
  var wday = myDays[today.getDay()];
  var year = today.getFullYear();
  // add a zero in front of numbers<10
  m = checkTime(m);
  s = checkTime(s);
  document.getElementById('txt').innerHTML = wday + ", " + day + "." + month + "." + year + "&nbsp;&nbsp;&nbsp;&nbsp;" + h + ":" + m + ":" + s;
  // refresh the display twice per second
  setTimeout(startTime, 500);
}
 
function checkTime(i){
  if (i<10){
    i="0" + i;
  }
  return i;
}
</script>
</head>
 
<body onload="startTime()">
<div id="txt"></div>
</body>
</html>

Sources: trans4mind.com, w3schools.com


]]>
Data Processing with Biopieces Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/biopieces?blog=2 2012-08-02T11:02:00Z 2015-07-08T14:35:23Z

Biopieces is a fine set of scripts that form an orderly pipeline (or framework) for processing bioinformatics data on the Unix command line. You can, for example, process sequencing (NGS) data like this:

Code

biopieces>
./read_fastq -n 1000 -i data/reads.fastq | ./plot_scores -t png -o data/scores.png --no_stream

to read the first 1000 sequences from a FASTQ file and plot the scores to an image file.
The result might look like this:
[Figure: Data Processing with Biopieces]
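
The pipe in the command above hands records from one tool to the next as they are read. As a rough analogy in Python (this is not how biopieces itself is implemented, and all function names and record keys here are made up), generators can be chained in the same way:

```python
import io


def read_fastq(handle, limit=None):
    """Yield one record (a dict) per four-line FASTQ entry."""
    count = 0
    while limit is None or count < limit:
        name = handle.readline().rstrip()
        if not name:
            break
        seq = handle.readline().rstrip()
        handle.readline()                     # '+' separator line
        scores = handle.readline().rstrip()
        yield {"SEQ_NAME": name[1:], "SEQ": seq, "SCORES": scores}
        count += 1


def add_mean_score(records, offset=33):
    """Add the mean base quality to every record passing through."""
    for rec in records:
        values = [ord(c) - offset for c in rec["SCORES"]]
        rec["SCORES_MEAN"] = sum(values) / len(values) if values else 0.0
        yield rec


def write_tab(records, out):
    """Write name and mean score as tab-separated lines."""
    for rec in records:
        out.write("{}\t{:.1f}\n".format(rec["SEQ_NAME"], rec["SCORES_MEAN"]))


data = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!!\n")
result = io.StringIO()
write_tab(add_mean_score(read_fastq(data)), result)
print(result.getvalue())
```

Each stage only sees one record at a time, so the whole file never has to be held in memory.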

The general logic is
        read_data | calculate_something | write_results
with the data being passed through as a "stream" and all modules having the same interface to each other. Installation instructions are here; on my Ubuntu VM I had to follow these steps:

  1. we need Perl, Ruby, Python, SVN. Install as needed.

    Code

    sudo apt-get install subversion
  2. get biopieces code:

    Code

    svn checkout http://biopieces.googlecode.com/svn/trunk/ biopieces
    cd biopieces
    svn checkout http://biopieces.googlecode.com/svn/wiki bp_usage
  3. check pre-requisites with the project's installer script

    Code

    bash biopieces_installer.sh
  4. missing Perl modules were listed nicely and could be installed as suggested.
  5. missing Ruby gems could not be installed due to incompatibilities, e.g.:

    Code

    sudo gem install RubyInline
    ERROR: Error installing RubyInline: ZenTest requires RubyGems version > 1.8.

    But the project supplies an excellent Ruby installer on the downloads page to create a separate Ruby 1.9 installation, as the default 1.8 one is too old for biopieces and the newer one is not officially supported on Ubuntu.
  6. modify your ~/.bashrc file to include:

    Code

    export BP_DIR="$HOME/bin/biopieces"
    export BP_DATA="$HOME/bin/biopieces/BP_DATA"
    export BP_TMP="$HOME/bin/biopieces/tmp"
    export BP_LOG="$HOME/bin/biopieces/BP_LOG"
    export PATH="/home/test/bin/biopieces/ruby_install/bin:/home/test/bin/biopieces/biopieces/bp_bin:$PATH"
    export RUBYLIB="/home/test/bin/biopieces/biopieces/code_ruby/lib:$RUBYLIB"
    export PERL5LIB="/home/test/bin/biopieces/biopieces/code_perl:$PERL5LIB"

    Code

    source ~/.bashrc
    mkdir $BP_DATA $BP_TMP $BP_LOG
    The Ruby and Perl lib definitions are necessary to avoid errors like

    Code

    cannot load such file -- maasha/biopieces (LoadError)
    ----
    Can't locate Maasha/Fasta.pm in @INC

Some of the almost 200 methods that are implemented in biopieces at this time include:

  • read and write various formats like bed, tab, gff, fasta, fastq
  • blast sequences against each other or against a genome
  • calculate the N50 value for a set of sequences
  • create statistics about the exon, intron, etc. content of a (12-column) BED file
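
To illustrate one of the listed operations, here is a minimal Python sketch of an N50 calculation. It is not the biopieces implementation; the function name n50 and the use of a plain list of sequence lengths are assumptions made for this example. It uses the common definition of N50: the length L such that sequences of length L or longer contain at least half of all bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration only, not part of biopieces.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half of all
    # bases are covered; the current length is then the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

In a real pipeline the lengths would come from the sequence records in the stream rather than from a hard-coded list.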

]]>
Building Config Files from a Skeleton Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/config-file-from-skeleton?blog=2 2012-07-12T08:03:00Z 2012-07-29T14:37:55Z

To run programs or pipelines automatically it is often necessary to create or adjust configuration files. Ideally this should be done dynamically by a script from a skeleton (layout) file, replacing placeholders with the adjusted values. This can be done with a Unix shell script that even contains the skeleton within:

Code

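#!/bin/sh
# pass in variables from command-line arguments
prog=$1
var1=$2
var2=$3

# name of the config file to generate
outputfile=output.txt

# do other required tasks
# ...

# config skeleton; the single quotes keep $var1 and $var2
# unexpanded until the eval below
template='#config file for pipeline
parameter_1=$var1
parameter_2=$var2'

# Generate file output.txt from variable
# $template using placeholders above.
# Note: eval executes the expanded text, so only
# use this with trusted input values.
echo "$(eval "echo \"$template\"")" \
> "$outputfile"

# run the specified program
# with the new config file
"./${prog}" -conf "${outputfile}"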

Save as script.sh and call with parameters:
sh script.sh program_name par1 par2

Source: stackoverflow
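
As an alternative to eval, the same skeleton idea can be sketched in Python with string.Template from the standard library, which only substitutes named placeholders and never executes the text. The names SKELETON and render_config are assumptions made for this example; the placeholder names mirror the shell script above.

```python
from string import Template

# Config skeleton with the same placeholders as the shell version.
SKELETON = Template(
    "#config file for pipeline\n"
    "parameter_1=$var1\n"
    "parameter_2=$var2\n"
)


def render_config(var1, var2):
    """Fill the skeleton; raises KeyError if a placeholder is left unset."""
    return SKELETON.substitute(var1=var1, var2=var2)


if __name__ == "__main__":
    # Write the generated config to output.txt, as in the shell script.
    with open("output.txt", "w") as handle:
        handle.write(render_config("par1", "par2"))
```

Because substitute() fails on missing placeholders, a forgotten value shows up as an error rather than as an empty line in the config file.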


]]>
Analysing Variation with Ensembl and PolyPhen Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/analysing-variation-with-ensembl-and-polyphen?blog=2 2012-05-28T11:28:00Z 2012-08-07T08:19:56Z

The Ensembl variation resources provide information about structural variants and sequence variants (including Single Nucleotide Polymorphisms (SNPs), insertions, deletions and somatic mutations) in the human genome. Details and references are described on the web site and in Chen et al. (2010) Ensembl Variation Resources, BMC Genomics and other publications listed on the site.

Sources and Descriptions currently included in Ensembl variation resources (v67):

  • dbSNP - Variants (including SNPs and indels) imported from dbSNP
  • DGVa - Database of Genomic Variants Archive
  • NHGRI_GWAS_catalog - Variants associated with phenotype data from the NHGRI GWAS catalog
  • COSMIC - Somatic mutations found in human cancers from the COSMIC project
  • EGA - Variants imported from the European Genome-phenome Archive with phenotype association
  • Uniprot - Variants with protein annotation imported from Uniprot
  • HGMD-PUBLIC - Variants from HGMD-PUBLIC dataset March 2012
  • OMIM - Variations linked to entries in the Online Mendelian Inheritance in Man (OMIM) database
  • Open Access GWAS Database - Johnson & O'Donnell 'An Open Access Database of Genome-wide Association Results' PMID:19161620
  • LSDB_LEPRE1 - LEPRE1 homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database
  • LSDB_PPIB - PPIB homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database
  • LSDB_CRTAP - CRTAP homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database
  • LSDB_FKBP10 - FKBP10 homepage - Osteogenesis Imperfecta Variant Database - Leiden Open Variation Database

Ensembl offers the possibility to run the underlying code on your own data and predict the functional consequences of known and unknown variants using the Variant Effect Predictor (VEP).

Internally the VEP uses PolyPhen which is further explained below:

For a given amino acid substitution in a protein, PolyPhen-2 extracts various sequence and structure-based features of the substitution site and feeds them to a probabilistic classifier.

Sequence-based features include binding or linking sites, transmembrane regions, regulatory modification sites. Profile matrices are calculated to assess the likelihood of the occurrence of this amino acid at the given position.

Structural features include the comparison to known protein 3D structures in PDB, using DSSP (Dictionary of Secondary Structure in Proteins), accessible surface area and properties.

PolyPhen-2 also looks at functional significance of an allele replacement using the UniProtKB database. It uses the "HumDiv" classifier to find disease-related changes and "HumVar" for variations in the "normal" population.

Ensembl have now added a nice blog entry about this with some more details.


]]>
Sequence Mappability & Alignability Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/sequence-mappability-alignability?blog=2 2012-05-16T16:46:00Z 2019-07-15T08:46:21Z
Sequence Mappability & Alignability

Sequence uniqueness within the genome plays an important part when attempting to map short sequences, e.g. next-generation short sequencing reads. It is one of the factors that can introduce a bias in sequencing or its analysis. The other important factor is GC content: GC-rich sequences (e.g. genic/exonic regions) as well as very GC-poor regions are often under-represented (Bentley et al. 2008), mainly because of amplification steps in the protocol. Reads mapped to multiple regions are often discarded, so genomic regions with high sequence degeneracy / low sequence complexity show lower mapped read coverage than unique regions, creating a systematic bias.

The CRG Alignability tracks at the UCSC genome browser display how uniquely k-mer sequences align to a region of the genome. As you can see from the tracks, the mappability increases with read length:

CRG mappability tracks for different read lengths at the UCSC browser

For each window (of sizes 36, 40, 50, 75 or 100 nts), a mappability score was computed: S = 1 / (number of matches found in the genome), so S=1 means one match in the genome, S=0.5 is two matches in the genome, and so on. Further description in the publication of Thomas Derrien, Paolo Ribeca, et al.

The data for these tracks can be downloaded. If you are working with other read lengths or genomes, you can run the software to generate the data yourself:

Get the GEM library (latest version at GitHub), unpack it with tar xjvf GEM-libraries-Linux-x86_64.tbz2, create an index:

Code

gem-do-index -i genome.fasta -o gem_index

run the mappability part, e.g. with a read length of 250:

Code

gem-mappability -I gem_index -l 250 -o mappability_250.gem

To query a specific region for its mappability you can also use this online tool http://surveyor.chgr.org/. An alternative is to look at the "uniqueome" data and publication.

Refs:

  • Fast computation and applications of genome mappability. Derrien T, et al. PLoS One. 2012
  • The uniqueome: a mappability resource for short-tag sequencing. Koehler et al. Bioinformatics. 2011; 27(2): 272–274.
  • Blog post at MassGenomics
  • Systematic bias in high-throughput sequencing data and its correction by BEADS. Cheung et al. 2011
  • Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry. Bentley et al., Nature 2008
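
To make the score definition S = 1 / (number of matches) from the text above concrete, here is a toy Python sketch. It is not how GEM works internally: it counts only exact matches on the forward strand of a short string, and the function name kmer_mappability is an assumption made for this example.

```python
from collections import Counter


def kmer_mappability(genome, k):
    """Return S = 1 / (number of exact occurrences) for each k-mer start.

    Toy illustration: forward strand only, no mismatches allowed.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [1 / counts[kmer] for kmer in kmers]


if __name__ == "__main__":
    # "ACG" occurs twice, so both of its start positions score 0.5.
    print(kmer_mappability("ACGTACGA", 3))
```

Longer k-mers are less likely to repeat, which is the same effect the tracks show: mappability increases with read length.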

]]>
Ruby Sorting Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/ruby-sorting?blog=2 2012-05-09T11:46:00Z 2012-05-17T12:50:24Z

Sorting (elements in an array) is a very common task in many scripts. A lot of research has gone into finding the most efficient way to sort.
In Ruby the "sort" function performs a standard comparison according to the data type inspected, but as in most other languages you can define any specific order.

   open_orders.sort

is equivalent to

   open_orders.sort { |x, y| x <=> y }

The sort algorithm assumes that this comparison function/block returns a value according to the following logic (like the comparison operators):

    return -1 if x < y
    return  0 if x == y
    return  1 if x > y

So using this logic I can define a custom function to compare the elements that need sorting and call it in the sort function afterwards. In my simple example I need to sort order numbers by two criteria: by a string first ("UK" before "ORD") and by ascending numbers afterwards.

Code

def custom_order_sorting(x_ord, y_ord)
    if x_ord.match('UK') && y_ord.match('ORD')
       # use UK first
       return -1
    elsif x_ord.match('ORD') && y_ord.match('UK')
       # use UK first
       return 1
    else
      # use smaller number first; convert to integers
      # so that e.g. 9 sorts before 10
      x_num = x_ord.match('\w(\d+)$')[1].to_i
      y_num = y_ord.match('\w(\d+)$')[1].to_i
      return x_num <=> y_num
    end
end

open_orders.sort! { |x, y| custom_order_sorting(x, y) }

Source: stackoverflow.com
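
For comparison, the same two-level ordering can be written with a sort key instead of a comparison function. The Python sketch below assumes order IDs of the form "UK9" or "ORD10" (a prefix followed by a trailing number); the ID format, the sample data and the name order_sort_key are illustrative assumptions.

```python
import re


def order_sort_key(order_id):
    """Build a (prefix rank, number) tuple so that UK sorts before ORD."""
    prefix_rank = 0 if "UK" in order_id else 1
    # Compare the trailing digits numerically, not as text.
    number = int(re.search(r"(\d+)$", order_id).group(1))
    return (prefix_rank, number)


if __name__ == "__main__":
    open_orders = ["ORD10", "UK9", "ORD2", "UK10"]
    print(sorted(open_orders, key=order_sort_key))
```

A key is computed once per element, whereas a comparison block is called for every pair that the sort inspects.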


]]>
Genometastasis Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/genometastasis?blog=2 2012-05-09T09:23:00Z 2015-07-08T14:37:20Z

The hypothesis of genometastasis was suggested by García-Olmo et al. more than a decade ago (1) and states (simplified) that normal cells could be turned into cancer cells through contact with (dying) cancer cells. In particular, "metastases might develop as a result of transfection of susceptible cells in distant target organs with dominant oncogenes that circulate in the plasma and are derived from the primary tumor." It can therefore be considered a form of horizontal gene / DNA transfer. The uptake of the genomic material was explained through apoptotic bodies from cancer cells as described by Holmgren et al. (2). The ideas were actually already described a century ago (6,7).
An alternative could be the involvement of a virus as a transmitter as described by zur Hausen (8).

In a later study (3) the same group could show that plasma from colorectal cancer patients could transform cultured cells oncogenically (fig 1):

Genometastasis

Further research by the group was published recently (4), describing the transformation of cells cultured from healthy individuals through particles from cultured colon cancer cells. Goldenberg et al. (5) could stably transform cells between species through cell fusion, resulting in hamster cells that express human oncogenes.

The evidence for horizontal gene transfer, in particular that cancer cells, dying parts of the cells or even cell-free cancer DNA can induce malignancy, is worrying. It is likely only possible under very specific conditions and with certain (aggressive) cancer types, but it is certainly an interesting research area to watch. If confirmed it could have dramatic effects on treatment strategies and could open up new methodological possibilities for molecular research.

References:

  1. García-Olmo D, et al. (1999) Tumor DNA circulating in the plasma might play a role in metastasis. The hypothesis of the genometastasis. Histol Histopathol. 14(4):1159-64.
  2. Holmgren L, et al. (1999) Horizontal transfer of DNA by the uptake of apoptotic bodies. Blood. 93:3956-3963.
  3. García-Olmo D, García-Olmo DC (2001) Functionality of circulating DNA: the hypothesis of genometastasis. Ann N Y Acad Sci. 945:265-75.
  4. García-Olmo D, et al. (2010) Cell-Free Nucleic Acids Circulating in the Plasma of Colorectal Cancer Patients Induce the Oncogenic Transformation of Susceptible Cultured Cells. Cancer Res. 70(2):560-7.
  5. Goldenberg DM, et al. (2011) Horizontal transmission and retention of malignancy, as well as functional human genes, after spontaneous fusion of human glioblastoma and hamster host cells in vivo. International Journal of Cancer 131,1.
  6. Goldenberg DM (1968) Über die Progression der Malignität: Eine Hypothese [On the progression of malignancy: A hypothesis]. Klin Wochenschr. 46:898–99.
  7. Aichel O (1911) Über Zellverschmelzung mit qualitative abnormer Chromosomenverteilung als Ursache der Geschwulstbildung [On cell fusion with qualitative abnormal chromosome distribution as the cause of tumor formation]. In: Roux W, ed. Vorträge und Aufsätze über Entwicklungsmechanik der Organismen, Vol. 13.
  8. zur Hausen H (2000) Papillomaviruses Causing Cancer: Evasion From Host-Cell Control in Early Events in Carcinogenesis. J Natl Cancer Inst. 92(9).

]]>
Uniparental Disomy Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/uniparental-disomy?blog=2 2012-05-04T14:09:00Z 2015-07-08T14:38:28Z

When a cell carries two copies of a chromosome, or part of a chromosome, from one parent and no copy from the other parent, we call it uniparental disomy (UPD). While all DNA information is present, the development of the cell (and the organism) is hindered because of missing or wrong epigenetic markers. The basic mechanism of how this faulty distribution of chromosomes can occur is shown in fig. 1.

Uniparental Disomy

Sources:

  • Wikipedia
  • Eggermann and Kotzot (2010) Uniparental disomy, Onset mechanisms and their relevance in clinical genetics [German], Medizinische Genetik

]]>
Version Control with Perforce on the Command-line Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/version-control-with-perforce-on-the-command-line?blog=2 2012-04-19T09:12:00Z 2012-05-17T12:51:03Z

Besides the visual client, the version control system Perforce can be operated through the command line (Unix prompt or Windows DOS window) and can therefore be controlled from other programs such as MATLAB:

[status, result] = dos(p4command);

A reference manual is available; here are a few hints.
Check the environment settings:

p4 set
  P4CHARSET=winansi
  P4CLIENT=try1 (set)
  P4EDITOR=C:\Windows\SysWOW64\notepad.exe (set)
  P4PORT=perforce:1666
  P4USER=Felix_Kokocinski

and edit if necessary with

set P4CHARSET=winansi

P4EDITOR is optional, P4CLIENT is the checkout / workspace name.
The settings can also be set permanently in the visual client under
Edit / Preferences / Connection / Change Settings
If these are wrong you will get messages like "file(s) not on client".

Most common commands:
synchronize repository:

p4 sync

checkout file:

p4 edit filename.txt
  or
p4 edit //depot/path/in/perforce/filename.txt

submit changes:

p4 submit -d "description of changes" filename.txt

revert to version in repository:

p4 revert filename.txt

add new file:

p4 add filename.txt

get help:

p4 help
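
The commands above can also be driven from a script, much like the MATLAB dos() call at the top. Here is a minimal Python sketch, assuming the p4 executable is on the PATH and the environment settings (P4PORT, P4USER, P4CLIENT) are already in place. The helper names build_p4_command and run_p4 are made up for this illustration and are not part of Perforce:

```python
import subprocess


def build_p4_command(action, *args):
    """Return the argument list for a p4 call, e.g. ['p4', 'edit', 'filename.txt']."""
    return ["p4", action] + list(args)


def run_p4(action, *args):
    """Run a p4 command and return (exit status, combined output).

    Assumption: the p4 executable is on the PATH and the environment
    settings shown by "p4 set" are correct.
    """
    completed = subprocess.run(
        build_p4_command(action, *args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return completed.returncode, completed.stdout


# Usage, equivalent to "p4 edit filename.txt" on the command line:
#   status, result = run_p4("edit", "filename.txt")
```

Passing the arguments as a list avoids quoting problems with the submit description, which may contain spaces.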

Here are some useful one-liners for various tasks.


]]>
OMIM Symbols Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/omim-symbols?blog=2 2012-04-16T15:19:00Z 2012-08-09T11:10:54Z

Online Mendelian Inheritance in Man (OMIM) is a manually reviewed catalog of human genes and regions involved in genetic disorders and traits. Each entry has a name and a number, e.g. "#154780 MARSHALL SYNDROME". According to the OMIM FAQs, these are the meanings of the symbols preceding a MIM number:

  1. An asterisk (*) before an entry number indicates a gene.
  2. A number symbol (#) before an entry number indicates that it is a descriptive entry, usually of a phenotype, and does not represent a unique locus. The reason for the use of the number symbol is given in the first paragraph of the entry. Discussion of any gene(s) related to the phenotype resides in another entry(ies) as described in the first paragraph.
  3. A plus sign (+) before an entry number indicates that the entry contains the description of a gene of known sequence and a phenotype.
  4. A percent sign (%) before an entry number indicates that the entry describes a confirmed mendelian phenotype or phenotypic locus for which the underlying molecular basis is not known.
  5. No symbol before an entry number generally indicates a description of a phenotype for which the mendelian basis, although suspected, has not been clearly established or that the separateness of this phenotype from that in another entry is unclear.
  6. A caret (^) before an entry number means the entry no longer exists because it was removed from the database or moved to another entry as indicated.
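
The symbol list above can be turned into a small lookup. This is a Python sketch only: the function name parse_omim_entry and the shortened meaning strings are assumptions made for the example, and the entry format is taken from the "#154780 MARSHALL SYNDROME" example rather than from an official OMIM file specification:

```python
# Meanings of the symbols preceding a MIM number, condensed from the list above
MIM_PREFIXES = {
    "*": "gene",
    "#": "descriptive entry, usually a phenotype",
    "+": "gene of known sequence and a phenotype",
    "%": "confirmed mendelian phenotype, molecular basis unknown",
    "^": "entry removed or moved",
    "": "phenotype, mendelian basis not clearly established",
}


def parse_omim_entry(entry):
    """Split an entry such as '#154780 MARSHALL SYNDROME' into its parts."""
    entry = entry.strip()
    # The symbol, if any, is the first character of the entry
    prefix = entry[0] if entry and entry[0] in "*#+%^" else ""
    number, _, name = entry[len(prefix):].strip().partition(" ")
    return {
        "prefix": prefix,
        "meaning": MIM_PREFIXES[prefix],
        "mim_number": number,
        "name": name.strip(),
    }
```

For the example entry, parse_omim_entry("#154780 MARSHALL SYNDROME") gives the MIM number "154780" and flags it as a descriptive entry.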

To fetch a non-redundant list of OMIM annotations through the Ensembl Perl API, you can look at the external references (xrefs/dblinks):

Code

# $gene is a Bio::EnsEMBL::Gene object fetched beforehand
my $att = "MIM_GENE";
# or: my $att = "MIM_MORBID";
my $attribs = $gene->get_all_DBLinks($att);
my (%ids, %descriptions);
foreach my $attrib (@{ $attribs }){
  # keep each OMIM ID only once
  next if exists $ids{$attrib->primary_id};
  $ids{$attrib->primary_id} = $attrib->display_id;
  $descriptions{$attrib->description} = $attrib->display_id;
}

Ref:
OMIM publication, http://omim.org/


]]>
Nucleotide Ambiguity Codes Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/nucleotide-ambiguity-codes?blog=2 2012-04-04T10:32:00Z 2012-05-17T12:51:38Z

The symbols to describe the different nucleotides in DNA are the following:

------------------------------------------
Symbol   Meaning             Nucleic Acid
------------------------------------------
A        A                   Adenine
C        C                   Cytosine
G        G                   Guanine
T        T                   Thymine
U        U                   Uracil
M        A or C
R        A or G
W        A or T
S        C or G
Y        C or T
K        G or T
V        A or C or G
H        A or C or T
D        A or G or T
B        C or G or T
X        G or A or T or C
N        G or A or T or C
------------------------------------------
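
In a script, the table can be kept as a simple dictionary. Here is a minimal Python sketch; the names AMBIGUITY_CODES, matches and expand are chosen for this example only:

```python
from itertools import product

# Ambiguity codes from the table above, mapped to the bases they stand for
AMBIGUITY_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def matches(symbol, base):
    """Return True if the (possibly ambiguous) symbol can stand for the base."""
    return base.upper() in AMBIGUITY_CODES[symbol.upper()]


def expand(sequence):
    """List every unambiguous sequence that an ambiguous sequence can represent."""
    options = [AMBIGUITY_CODES[symbol] for symbol in sequence.upper()]
    return ["".join(combination) for combination in product(*options)]
```

For example, expand("AY") lists "AC" and "AT". Note that the number of expansions grows quickly with each N or X in the sequence.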

Note: these letters are also used in the "samtools tview" program to visually show NGS read alignments.

Sources:


]]>
1000 Genomes Project Populations Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/1000-genomes-project-populations?blog=2 2012-04-03T08:49:00Z 2012-04-10T09:45:26Z

The goal of the 1000 Genomes Project is to create "A Deep Catalog of Human Genetic Variation" by measuring and analysing most genetic variants that have frequencies of at least 1% in the populations studied.

The population codes used in the project are the following (Source: 1000 Genomes / ftp site):

CHB	Han Chinese             Han Chinese in Beijing, China
JPT	Japanese                Japanese in Tokyo, Japan
CHS	Southern Han Chinese    Han Chinese South
CDX	Dai Chinese             Chinese Dai in Xishuangbanna, China
KHV	Kinh Vietnamese         Kinh in Ho Chi Minh City, Vietnam
CHD	Denver Chinese          Chinese in Denver, Colorado (pilot 3 only)

CEU	CEPH                    Utah residents (CEPH) with Northern and Western European ancestry
TSI	Tuscan                  Toscani in Italia
GBR	British                 British in England and Scotland
FIN	Finnish                 Finnish in Finland
IBS	Spanish                 Iberian populations in Spain

YRI	Yoruba                  Yoruba in Ibadan, Nigeria
LWK	Luhya                   Luhya in Webuye, Kenya
GWD	Gambian                 Gambian in Western Division, The Gambia
MSL	Mende                   Mende in Sierra Leone
ESN	Esan                    Esan in Nigeria

ASW	African-American SW     African Ancestry in Southwest US
ACB	African-Caribbean       African Caribbean in Barbados
MXL	Mexican-American        Mexican Ancestry in Los Angeles, California
PUR	Puerto Rican            Puerto Rican in Puerto Rico
CLM	Colombian               Colombian in Medellin, Colombia
PEL	Peruvian                Peruvian in Lima, Peru

GIH	Gujarati                Gujarati Indian in Houston, TX
PJL	Punjabi                 Punjabi in Lahore, Pakistan
BEB	Bengali                 Bengali in Bangladesh
STU	Sri Lankan              Sri Lankan Tamil in the UK
ITU	Indian                  Indian Telugu in the UK

]]>
Canonical transcripts Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/canonical-transcripts?blog=2 2012-01-03T14:59:00Z 2012-12-22T11:43:01Z

As reported in the Ensembl 2009 NAR paper, canonical transcripts are defined for all genes and for all species in the Ensembl gene sets. "The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA. Should a transcript already regarded as canonical not be selected using the above rules, there is support for storing this information in the Ensembl database."
For the human gene annotation, the hierarchy used when a gene has more than one protein-coding transcript is:

  1. CCDS transcripts
  2. Havana manual annotation transcripts of the type "protein_coding"
  3. Havana manual annotation transcripts that are also protein-coding
  4. Ensembl protein-coding transcripts

If there are multiple transcripts within a group, take the longest CDS of the highest-priority group.
For non-coding types, take the longest cDNA of

  1. Havana transcripts
  2. Ensembl transcripts
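
The selection rules above can be sketched in a few lines of Python. This is not how Ensembl implements it; it is only one reading of the rules, with transcripts represented as plain dictionaries. The keys 'id', 'coding', 'group', 'cds_length' and 'cdna_length' are hypothetical and not part of the Ensembl API:

```python
def pick_canonical(transcripts):
    """Pick a representative transcript following the hierarchy above.

    Each transcript is a dict with the hypothetical keys:
      'id'          - any identifier
      'coding'      - True for protein-coding transcripts
      'group'       - priority rank within the coding or the non-coding
                      list above (1 = highest priority)
      'cds_length'  - length of the CDS (coding transcripts only)
      'cdna_length' - length of the cDNA
    """
    coding = [t for t in transcripts if t["coding"]]
    if coding:
        # Translated transcripts exist: compare CDS lengths
        pool, length_key = coding, "cds_length"
    else:
        # No translated transcripts: compare cDNA lengths
        pool, length_key = list(transcripts), "cdna_length"
    if not pool:
        return None
    # Restrict to the highest-priority group, then take the longest
    best_group = min(t["group"] for t in pool)
    candidates = [t for t in pool if t["group"] == best_group]
    return max(candidates, key=lambda t: t[length_key])
```

A CCDS transcript (group 1) is therefore preferred over a longer Ensembl-only transcript (group 4), and length only decides between transcripts of the same group.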

Source:
Ensembl 2009 NAR paper, Ensembl mailing list

These objects can be regarded as representative transcripts for the gene and can be fetched with the Perl API method

Bio::EnsEMBL::Gene::canonical_transcript()

Some caution is needed with the pseudo-autosomal regions (PAR): for genes from the Y PAR, the method will return a transcript with X coordinates. While not really a bug, this might mess up your data if it goes unnoticed. To check and fix this, something like the following will work:

Code

# $slice_adaptor and $transcript_adaptor are assumed to have been
# fetched from the Ensembl registry beforehand

#fetch slice from Y PAR
my $slice = $slice_adaptor->fetch_by_region(
  'chromosome', 'Y', 59100480, 59115127);

#get an example gene
my $gene = $slice->get_all_Genes->[0];

#get canonical transcript from the gene
my $transcript = $gene->canonical_transcript;

#re-fetch transcript on Y to avoid getting
# X locations for PAR
if($gene->slice->seq_region_name eq "Y"){

  my $sid = $transcript->stable_id;
  $transcript = undef;
  #second argument: load the exons along with the transcripts
  my $transcripts =
    $transcript_adaptor->fetch_all_by_Slice($slice, 1);

  foreach my $poss_transcript (@$transcripts){
    next unless($poss_transcript->stable_id eq $sid);
    $transcript = $poss_transcript;
    last;
  }

}

]]>
Telomeric and Centromeric regions Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/telomeric-and-centromeric-regions?blog=2 2011-09-22T15:10:00Z 2012-10-30T18:31:34Z

Telomeres form caps on the ends of chromosomes that prevent fusion of chromosomal ends and provide genomic stability.

During gametogenesis, reprogramming of the germ cells leads to elongation of telomeres up to their species-specific maximum.

In normal somatic cells, telomeres are progressively shortened with every cell division. This shortening in normal human cells limits the number of cell divisions. For human cells to proliferate beyond the senescence checkpoint, they need to stabilize telomere length. This is accomplished mainly by reactivation of the telomerase enzyme. Telomerase expression is under the control of many factors. Expression of telomerase can lead to cell immortalization and is activated during tumorigenesis, i.e. cancer.

Male Xq-telomeres are 1100 bp shorter than female Xq-telomeres.

The telomeric repeat found on all human chromosomes is "TTAGGG".

The centromeres and telomeres of the human chromosomes are not explicitly defined as region attributes in the Ensembl Perl API. To check these regions, one option is to pull them out of the UCSC table browser (use the "Mapping and Sequencing tracks" group and the "Gap" table) and define them manually. You can, for example, create an array of hashes with the regions and use them in your script:

Code

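#read data (listed below) from a file, one region per line;
#$fh is assumed to be a filehandle already opened on that file
my @telomeres;
while (my $line = <$fh>) {
  chomp $line;
  my @data = split(/\s+/, $line);
  my %telomere = (
      'chrom' => $data[0],
      'start' => $data[1],
      'end'   => $data[2],
   );
  push(@telomeres, \%telomere);
}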

The list of centromere regions for GRCh37 (transformed from the 0-based UCSC system to the 1-based coordinate system) is:

Code

1       121535435       124535434
2       92326172       95326171
3       90504855       93504854
4       49660118       52660117
5       46405642       49405641
6       58830167       61830166
7       58054332       61054331
8       43838888       46838887
9       47367680       50367679
10      39254936       42254935
11      51644206       54644205
12      34856695       37856694
13      16000001       19000000
14      16000001       19000000
15      17000001       20000000
16      35335802       38335801
17      22263007       25263006
18      15460899       18460898
19      24681783       27681782
20      26369570       29369569
21      11288130       14288129
22      13000001       16000000
X       58632013       61632012
Y       10104554       13104553

The list of telomere regions for GRCh37 is (1-based):

Code

1       1               10000
1       249240622       249250621
2       1               10000
2       243189374       243199373
3       1               10000
3       198012431       198022430
4       1               10000
4       191144277       191154276
5       1               10000
5       180905261       180915260
6       1               10000
6       171105068       171115067
7       1               10000
7       159128664       159138663
8       1               10000
8       146354023       146364022
9       1               10000
9       141203432       141213431
10      1               10000
10      135524748       135534747
11      1               10000
11      134996517       135006516
12      1               10000
12      133841896       133851895
13      1               10000
13      115159879       115169878
14      1               10000
14      107339541       107349540
15      1               10000
15      102521393       102531392
16      1               10000
16      90344754        90354753
18      1               10000
18      78067249        78077248
19      1               10000
19      59118984        59128983
20      1               10000
20      63015521        63025520
21      1               10000
21      48119896        48129895
22      1               10000
22      51294567        51304566
X       1               10000
X       155260561       155270560
Y       1               10000
Y       59363567        59373566

Telomeres of chromosome 17 have not been defined for assembly GRCh37. They are short, but do exist nonetheless. An assembly patch will address this.
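
To show how such region lists can be used, here is a minimal Python sketch that converts a UCSC-style interval to 1-based coordinates and tests whether a feature overlaps a centromere or telomere. Only a few rows from the lists above are included, and the function names are chosen for this example:

```python
# A few rows copied from the GRCh37 lists above (1-based, inclusive)
CENTROMERES = {"1": (121535435, 124535434), "X": (58632013, 61632012)}
TELOMERES = {
    "1": [(1, 10000), (249240622, 249250621)],
    "X": [(1, 10000), (155260561, 155270560)],
}


def ucsc_to_one_based(start, end):
    """Convert a 0-based, half-open UCSC interval to 1-based, inclusive."""
    return start + 1, end


def overlaps(start, end, region_start, region_end):
    """True if two 1-based, inclusive intervals share at least one base."""
    return start <= region_end and end >= region_start


def classify(chrom, start, end):
    """Return 'centromere', 'telomere' or None for a 1-based interval."""
    if chrom in CENTROMERES and overlaps(start, end, *CENTROMERES[chrom]):
        return "centromere"
    for region in TELOMERES.get(chrom, []):
        if overlaps(start, end, *region):
            return "telomere"
    return None
```

Only the start coordinate changes in the conversion, because the 0-based end of a half-open interval equals the 1-based inclusive end.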

Sources:


]]>
Cytogenetic Nomenclature Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/cytogenetic-nomenclature?blog=2 2011-09-06T10:47:00Z 2013-04-04T17:17:57Z

The karyotype (number and set-up of the chromosomes) of a person, or any changes in specific regions of human chromosomes, are described with a system of numbers and symbols defined by a group of cytogenetic experts as the International System for Human Cytogenetic Nomenclature (ISCN). It was initiated in 1960 by a committee, following a suggestion by Charles E. Ford, and resulted in the "Proposed Standard System of Nomenclature of Human Mitotic Chromosomes". The system is based on the ideogram definitions of visual bands (described e.g. in Francke et al. 1981) and was most recently revised in 2005 and 2009.

The visual bands are created by staining techniques and describe regions that are similar in functionality and base composition (GC content); their lengths are 5-10 Mb.

Figure: Example ideograms of human chromosome 1 in different resolutions, from WashU

Numbers used:

The regions are numbered from the centromere outwards in both directions, towards the telomeres on the shorter p arm and the longer q arm. The numbers are not read as ordinary decimal numbers: 21 is read as 2-1 (region 2, band 1), not as twenty-one. Counting starts at the centromere with region 1 (or 1-0) and continues to 11 (1-1), 21 (2-1), 22 (2-2) etc. Sub-bands are added in a similar way, e.g. 21.1 and 21.2, if the bands are small or only appear at a higher resolution.

There are different levels of resolution that can be used as bands, e.g. 1q32 in the 400-bands resolution can be split up into 1q32.1, 1q32.2, 1q32.3 in the 550-bands resolution (see Figure above as an example for chr 1).
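
The numbering scheme described above can be made explicit by splitting a band name into its parts. Here is a minimal Python sketch, assuming names of the form chromosome + arm + region digit + band digit + optional sub-band (as in 1q32.1 or 1p36.11). The function name parse_band and the pattern are written for this example and do not cover the full ISCN notation:

```python
import re

# chromosome, arm, region digit, band digit, optional sub-band digits
BAND_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(name):
    """Split a band name such as '1q32.1' into chromosome, arm, region, band, sub-band."""
    match = BAND_PATTERN.match(name)
    if match is None:
        raise ValueError("not a recognised band name: %r" % name)
    chromosome, arm, region, band, subband = match.groups()
    return {
        "chromosome": chromosome,
        "arm": arm,
        "region": int(region),
        "band": int(band),
        # kept as text: '11' in 1p36.11 is sub-band 1-1, not eleven
        "subband": subband,
    }
```

With this, 1q32.1 is read as chromosome 1, long arm, region 3, band 2, sub-band 1, while 1q32 from the lower resolution has no sub-band.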

Here is a list of chromosome, band (arm, region, band, subband), genomic start and end position, from the Ensembl database for assembly GRCh37:

Code

1  p11.1  121500001  125000000  acen
1  p11.2  120600001  121500000  gneg
1  p12  117800001  120600000  gpos50
1  p13.1  116100001  117800000  gneg
1  p13.2  111800001  116100000  gpos50
1  p13.3  107200001  111800000  gneg
1  p21.1  102200001  107200000  gpos100
1  p21.2  99700001  102200000  gneg
1  p21.3  94700001  99700000  gpos75
1  p22.1  92000001  94700000  gneg
1  p22.2  88400001  92000000  gpos75
1  p22.3  84900001  88400000  gneg
1  p31.1  69700001  84900000  gpos100
1  p31.2  68900001  69700000  gneg
1  p31.3  61300001  68900000  gpos50
1  p32.1  59000001  61300000  gneg
1  p32.2  56100001  59000000  gpos50
1  p32.3  50700001  56100000  gneg
1  p33  46800001  50700000  gpos75
1  p34.1  44100001  46800000  gneg
1  p34.2  40100001  44100000  gpos25
1  p34.3  34600001  40100000  gneg
1  p35.1  32400001  34600000  gpos25
1  p35.2  30200001  32400000  gneg
1  p35.3  28000001  30200000  gpos25
1  p36.11  23900001  28000000  gneg
1  p36.12  20400001  23900000  gpos25
1  p36.13  16200001  20400000  gneg
1  p36.21  12700001  16200000  gpos50
1  p36.22  9200001  12700000  gneg
1  p36.23  7200001  9200000  gpos25
1  p36.31  5400001  7200000  gneg
1  p36.32  2300001  5400000  gpos25
1  p36.33  1  2300000  gneg
1  q11  125000001  128900000  acen
1  q12  128900001  142600000  gvar
1  q21.1  142600001  147000000  gneg
1  q21.2  147000001  150300000  gpos50
1  q21.3  150300001  155000000  gneg
1  q22  155000001  156500000  gpos50
1  q23.1  156500001  159100000  gneg
1  q23.2  159100001  160500000  gpos50
1  q23.3  160500001  165500000  gneg
1  q24.1  165500001  167200000  gpos50
1  q24.2  167200001  170900000  gneg
1  q24.3  170900001  172900000  gpos75
1  q25.1  172900001  176000000  gneg
1  q25.2  176000001  180300000  gpos50
1  q25.3  180300001  185800000  gneg
1  q31.1  185800001  190800000  gpos100
1  q31.2  190800001  193800000  gneg
1  q31.3  193800001  198700000  gpos100
1  q32.1  198700001  207200000  gneg
1  q32.2  207200001  211500000  gpos25
1  q32.3  211500001  214500000  gneg
1  q41  214500001  224100000  gpos100
1  q42.11  224100001  224600000  gneg
1  q42.12  224600001  227000000  gpos25
1  q42.13  227000001  230700000  gneg
1  q42.2  230700001  234700000  gpos50
1  q42.3  234700001  236600000  gneg
1  q43  236600001  243700000  gpos75
1  q44  243700001  249250621  gneg
2  p11.1  90500001  93300000  acen
2  p11.2  83300001  90500000  gneg
2  p12  75000001  83300000  gpos100
2  p13.1  73500001  75000000  gneg
2  p13.2  71500001  73500000  gpos50
2  p13.3  68600001  71500000  gneg
2  p14  64100001  68600000  gpos50
2  p15  61300001  64100000  gneg
2  p16.1  55000001  61300000  gpos100
2  p16.2  52900001  55000000  gneg
2  p16.3  47800001  52900000  gpos100
2  p21  41800001  47800000  gneg
2  p22.1  38600001  41800000  gpos50
2  p22.2  36600001  38600000  gneg
2  p22.3  32100001  36600000  gpos75
2  p23.1  30000001  32100000  gneg
2  p23.2  27900001  30000000  gpos25
2  p23.3  24000001  27900000  gneg
2  p24.1  19200001  24000000  gpos75
2  p24.2  16700001  19200000  gneg
2  p24.3  12200001  16700000  gpos75
2  p25.1  7100001  12200000  gneg
2  p25.2  4400001  7100000  gpos50
2  p25.3  1  4400000  gneg
2  q11.1  93300001  96800000  acen
2  q11.2  96800001  102700000  gneg
2  q12.1  102700001  106000000  gpos50
2  q12.2  106000001  107500000  gneg
2  q12.3  107500001  110200000  gpos25
2  q13  110200001  114400000  gneg
2  q14.1  114400001  118800000  gpos50
2  q14.2  118800001  122400000  gneg
2  q14.3  122400001  129900000  gpos50
2  q21.1  129900001  132500000  gneg
2  q21.2  132500001  135100000  gpos25
2  q21.3  135100001  136800000  gneg
2  q22.1  136800001  142200000  gpos100
2  q22.2  142200001  144100000  gneg
2  q22.3  144100001  148700000  gpos100
2  q23.1  148700001  149900000  gneg
2  q23.2  149900001  150500000  gpos25
2  q23.3  150500001  154900000  gneg
2  q24.1  154900001  159800000  gpos75
2  q24.2  159800001  163700000  gneg
2  q24.3  163700001  169700000  gpos75
2  q31.1  169700001  178000000  gneg
2  q31.2  178000001  180600000  gpos50
2  q31.3  180600001  183000000  gneg
2  q32.1  183000001  189400000  gpos75
2  q32.2  189400001  191900000  gneg
2  q32.3  191900001  197400000  gpos75
2  q33.1  197400001  203300000  gneg
2  q33.2  203300001  204900000  gpos50
2  q33.3  204900001  209000000  gneg
2  q34  209000001  215300000  gpos100
2  q35  215300001  221500000  gneg
2  q36.1  221500001  225200000  gpos75
2  q36.2  225200001  226100000  gneg
2  q36.3  226100001  231000000  gpos100
2  q37.1  231000001  235600000  gneg
2  q37.2  235600001  237300000  gpos50
2  q37.3  237300001  243199373  gneg
3  p11.1  87900001  91000000  acen
3  p11.2  87200001  87900000  gneg
3  p12.1  83500001  87200000  gpos75
3  p12.2  79800001  83500000  gneg
3  p12.3  74200001  79800000  gpos75
3  p13  69800001  74200000  gneg
3  p14.1  63700001  69800000  gpos50
3  p14.2  58600001  63700000  gneg
3  p14.3  54400001  58600000  gpos50
3  p21.1  52300001  54400000  gneg
3  p21.2  50600001  52300000  gpos25
3  p21.31  44200001  50600000  gneg
3  p21.32  44100001  44200000  gpos50
3  p21.33  43700001  44100000  gneg
3  p22.1  39400001  43700000  gpos75
3  p22.2  36500001  39400000  gneg
3  p22.3  32100001  36500000  gpos50
3  p23  30900001  32100000  gneg
3  p24.1  26400001  30900000  gpos75
3  p24.2  23900001  26400000  gneg
3  p24.3  16400001  23900000  gpos100
3  p25.1  13300001  16400000  gneg
3  p25.2  11800001  13300000  gpos25
3  p25.3  8700001  11800000  gneg
3  p26.1  4000001  8700000  gpos50
3  p26.2  2800001  4000000  gneg
3  p26.3  1  2800000  gpos50
3  q11.1  91000001  93900000  acen
3  q11.2  93900001  98300000  gvar
3  q12.1  98300001  100000000  gneg
3  q12.2  100000001  100900000  gpos25
3  q12.3  100900001  102800000  gneg
3  q13.11  102800001  106200000  gpos75
3  q13.12  106200001  107900000  gneg
3  q13.13  107900001  111300000  gpos50
3  q13.2  111300001  113500000  gneg
3  q13.31  113500001  117300000  gpos75
3  q13.32  117300001  119000000  gneg
3  q13.33  119000001  121900000  gpos75
3  q21.1  121900001  123800000  gneg
3  q21.2  123800001  125800000  gpos25
3  q21.3  125800001  129200000  gneg
3  q22.1  129200001  133700000  gpos25
3  q22.2  133700001  135700000  gneg
3  q22.3  135700001  138700000  gpos25
3  q23  138700001  142800000  gneg
3  q24  142800001  148900000  gpos100
3  q25.1  148900001  152100000  gneg
3  q25.2  152100001  155000000  gpos50
3  q25.31  155000001  157000000  gneg
3  q25.32  157000001  159000000  gpos50
3  q25.33  159000001  160700000  gneg
3  q26.1  160700001  167600000  gpos100
3  q26.2  167600001  170900000  gneg
3  q26.31  170900001  175700000  gpos75
3  q26.32  175700001  179000000  gneg
3  q26.33  179000001  182700000  gpos75
3  q27.1  182700001  184500000  gneg
3  q27.2  184500001  186000000  gpos25
3  q27.3  186000001  187900000  gneg
3  q28  187900001  192300000  gpos75
3  q29  192300001  198022430  gneg
4  p11  48200001  50400000  acen
4  p12  44600001  48200000  gneg
4  p13  41200001  44600000  gpos50
4  p14  35800001  41200000  gneg
4  p15.1  27700001  35800000  gpos100
4  p15.2  21300001  27700000  gneg
4  p15.31  17800001  21300000  gpos75
4  p15.32  15200001  17800000  gneg
4  p15.33  11300001  15200000  gpos50
4  p16.1  6000001  11300000  gneg
4  p16.2  4500001  6000000  gpos25
4  p16.3  1  4500000  gneg
4  q11  50400001  52700000  acen
4  q12  52700001  59500000  gneg
4  q13.1  59500001  66600000  gpos100
4  q13.2  66600001  70500000  gneg
4  q13.3  70500001  76300000  gpos75
4  q21.1  76300001  78900000  gneg
4  q21.21  78900001  82400000  gpos50
4  q21.22  82400001  84100000  gneg
4  q21.23  84100001  86900000  gpos25
4  q21.3  86900001  88000000  gneg
4  q22.1  88000001  93700000  gpos75
4  q22.2  93700001  95100000  gneg
4  q22.3  95100001  98800000  gpos75
4  q23  98800001  101100000  gneg
4  q24  101100001  107700000  gpos50
4  q25  107700001  114100000  gneg
4  q26  114100001  120800000  gpos75
4  q27  120800001  123800000  gneg
4  q28.1  123800001  128800000  gpos50
4  q28.2  128800001  131100000  gneg
4  q28.3  131100001  139500000  gpos100
4  q31.1  139500001  141500000  gneg
4  q31.21  141500001  146800000  gpos25
4  q31.22  146800001  148500000  gneg
4  q31.23  148500001  151100000  gpos25
4  q31.3  151100001  155600000  gneg
4  q32.1  155600001  161800000  gpos100
4  q32.2  161800001  164500000  gneg
4  q32.3  164500001  170100000  gpos100
4  q33  170100001  171900000  gneg
4  q34.1  171900001  176300000  gpos75
4  q34.2  176300001  177500000  gneg
4  q34.3  177500001  183200000  gpos100
4  q35.1  183200001  187100000  gneg
4  q35.2  187100001  191154276  gpos25
5  p11  46100001  48400000  acen
5  p12  42500001  46100000  gpos50
5  p13.1  38400001  42500000  gneg
5  p13.2  33800001  38400000  gpos25
5  p13.3  28900001  33800000  gneg
5  p14.1  24600001  28900000  gpos100
5  p14.2  23300001  24600000  gneg
5  p14.3  18400001  23300000  gpos100
5  p15.1  15000001  18400000  gneg
5  p15.2  9800001  15000000  gpos50
5  p15.31  6300001  9800000  gneg
5  p15.32  4500001  6300000  gpos25
5  p15.33  1  4500000  gneg
5  q11.1  48400001  50700000  acen
5  q11.2  50700001  58900000  gneg
5  q12.1  58900001  62900000  gpos75
5  q12.2  62900001  63200000  gneg
5  q12.3  63200001  66700000  gpos75
5  q13.1  66700001  68400000  gneg
5  q13.2  68400001  73300000  gpos50
5  q13.3  73300001  76900000  gneg
5  q14.1  76900001  81400000  gpos50
5  q14.2  81400001  82800000  gneg
5  q14.3  82800001  92300000  gpos100
5  q15  92300001  98200000  gneg
5  q21.1  98200001  102800000  gpos100
5  q21.2  102800001  104500000  gneg
5  q21.3  104500001  109600000  gpos100
5  q22.1  109600001  111500000  gneg
5  q22.2  111500001  113100000  gpos50
5  q22.3  113100001  115200000  gneg
5  q23.1  115200001  121400000  gpos100
5  q23.2  121400001  127300000  gneg
5  q23.3  127300001  130600000  gpos100
5  q31.1  130600001  136200000  gneg
5  q31.2  136200001  139500000  gpos25
5  q31.3  139500001  144500000  gneg
5  q32  144500001  149800000  gpos75
5  q33.1  149800001  152700000  gneg
5  q33.2  152700001  155700000  gpos50
5  q33.3  155700001  159900000  gneg
5  q34  159900001  168500000  gpos100
5  q35.1  168500001  172800000  gneg
5  q35.2  172800001  176600000  gpos25
5  q35.3  176600001  180915260  gneg
6  p11.1  58700001  61000000  acen
6  p11.2  57000001  58700000  gneg
6  p12.1  52900001  57000000  gpos100
6  p12.2  51800001  52900000  gneg
6  p12.3  46200001  51800000  gpos100
6  p21.1  40500001  46200000  gneg
6  p21.2  36600001  40500000  gpos25
6  p21.31  33500001  36600000  gneg
6  p21.32  32100001  33500000  gpos25
6  p21.33  30400001  32100000  gneg
6  p22.1  27000001  30400000  gpos50
6  p22.2  25200001  27000000  gneg
6  p22.3  15200001  25200000  gpos75
6  p23  13400001  15200000  gneg
6  p24.1  11600001  13400000  gpos25
6  p24.2  10600001  11600000  gneg
6  p24.3  7100001  10600000  gpos50
6  p25.1  4200001  7100000  gneg
6  p25.2  2300001  4200000  gpos25
6  p25.3  1  2300000  gneg
6  q11.1  61000001  63300000  acen
6  q11.2  63300001  63400000  gneg
6  q12  63400001  70000000  gpos100
6  q13  70000001  75900000  gneg
6  q14.1  75900001  83900000  gpos50
6  q14.2  83900001  84900000  gneg
6  q14.3  84900001  88000000  gpos50
6  q15  88000001  93100000  gneg
6  q16.1  93100001  99500000  gpos100
6  q16.2  99500001  100600000  gneg
6  q16.3  100600001  105500000  gpos100
6  q21  105500001  114600000  gneg
6  q22.1  114600001  118300000  gpos75
6  q22.2  118300001  118500000  gneg
6  q22.31  118500001  126100000  gpos100
6  q22.32  126100001  127100000  gneg
6  q22.33  127100001  130300000  gpos75
6  q23.1  130300001  131200000  gneg
6  q23.2  131200001  135200000  gpos50
6  q23.3  135200001  139000000  gneg
6  q24.1  139000001  142800000  gpos75
6  q24.2  142800001  145600000  gneg
6  q24.3  145600001  149000000  gpos75
6  q25.1  149000001  152500000  gneg
6  q25.2  152500001  155500000  gpos50
6  q25.3  155500001  161000000  gneg
6  q26  161000001  164500000  gpos50
6  q27  164500001  171115067  gneg
7  p11.1  58000001  59900000  acen
7  p11.2  54000001  58000000  gneg
7  p12.1  50500001  54000000  gpos75
7  p12.2  49000001  50500000  gneg
7  p12.3  45400001  49000000  gpos75
7  p13  43300001  45400000  gneg
7  p14.1  37200001  43300000  gpos75
7  p14.2  35000001  37200000  gneg
7  p14.3  28800001  35000000  gpos75
7  p15.1  28000001  28800000  gneg
7  p15.2  25500001  28000000  gpos50
7  p15.3  20900001  25500000  gneg
7  p21.1  16500001  20900000  gpos100
7  p21.2  13800001  16500000  gneg
7  p21.3  7300001  13800000  gpos100
7  p22.1  4500001  7300000  gneg
7  p22.2  2800001  4500000  gpos25
7  p22.3  1  2800000  gneg
7  q11.1  59900001  61700000  acen
7  q11.21  61700001  67000000  gneg
7  q11.22  67000001  72200000  gpos50
7  q11.23  72200001  77500000  gneg
7  q21.11  77500001  86400000  gpos100
7  q21.12  86400001  88200000  gneg
7  q21.13  88200001  91100000  gpos75
7  q21.2  91100001  92800000  gneg
7  q21.3  92800001  98000000  gpos75
7  q22.1  98000001  103800000  gneg
7  q22.2  103800001  104500000  gpos50
7  q22.3  104500001  107400000  gneg
7  q31.1  107400001  114600000  gpos75
7  q31.2  114600001  117400000  gneg
7  q31.31  117400001  121100000  gpos75
7  q31.32  121100001  123800000  gneg
7  q31.33  123800001  127100000  gpos75
7  q32.1  127100001  129200000  gneg
7  q32.2  129200001  130400000  gpos25
7  q32.3  130400001  132600000  gneg
7  q33  132600001  138200000  gpos50
7  q34  138200001  143100000  gneg
7  q35  143100001  147900000  gpos75
7  q36.1  147900001  152600000  gneg
7  q36.2  152600001  155100000  gpos25
7  q36.3  155100001  159138663  gneg
8  p11.1  43100001  45600000  acen
8  p11.21  39700001  43100000  gneg
8  p11.22  38300001  39700000  gpos25
8  p11.23  36500001  38300000  gneg
8  p12  28800001  36500000  gpos75
8  p21.1  27400001  28800000  gneg
8  p21.2  23300001  27400000  gpos50
8  p21.3  19000001  23300000  gneg
8  p22  12700001  19000000  gpos100
8  p23.1  6200001  12700000  gneg
8  p23.2  2200001  6200000  gpos75
8  p23.3  1  2200000  gneg
8  q11.1  45600001  48100000  acen
8  q11.21  48100001  52200000  gneg
8  q11.22  52200001  52600000  gpos75
8  q11.23  52600001  55500000  gneg
8  q12.1  55500001  61600000  gpos50
8  q12.2  61600001  62200000  gneg
8  q12.3  62200001  66000000  gpos50
8  q13.1  66000001  68000000  gneg
8  q13.2  68000001  70500000  gpos50
8  q13.3  70500001  73900000  gneg
8  q21.11  73900001  78300000  gpos100
8  q21.12  78300001  80100000  gneg
8  q21.13  80100001  84600000  gpos75
8  q21.2  84600001  86900000  gneg
8  q21.3  86900001  93300000  gpos100
8  q22.1  93300001  99000000  gneg
8  q22.2  99000001  101600000  gpos25
8  q22.3  101600001  106200000  gneg
8  q23.1  106200001  110500000  gpos75
8  q23.2  110500001  112100000  gneg
8  q23.3  112100001  117700000  gpos100
8  q24.11  117700001  119200000  gneg
8  q24.12  119200001  122500000  gpos50
8  q24.13  122500001  127300000  gneg
8  q24.21  127300001  131500000  gpos50
8  q24.22  131500001  136400000  gneg
8  q24.23  136400001  139900000  gpos75
8  q24.3  139900001  146364022  gneg
9  p11.1  47300001  49000000  acen
9  p11.2  43600001  47300000  gneg
9  p12  41000001  43600000  gpos50
9  p13.1  38400001  41000000  gneg
9  p13.2  36300001  38400000  gpos25
9  p13.3  33200001  36300000  gneg
9  p21.1  28000001  33200000  gpos100
9  p21.2  25600001  28000000  gneg
9  p21.3  19900001  25600000  gpos100
9  p22.1  18500001  19900000  gneg
9  p22.2  16600001  18500000  gpos25
9  p22.3  14200001  16600000  gneg
9  p23  9000001  14200000  gpos75
9  p24.1  4600001  9000000  gneg
9  p24.2  2200001  4600000  gpos25
9  p24.3  1  2200000  gneg
9  q11  49000001  50700000  acen
9  q12  50700001  65900000  gvar
9  q13  65900001  68700000  gneg
9  q21.11  68700001  72200000  gpos25
9  q21.12  72200001  74000000  gneg
9  q21.13  74000001  79200000  gpos50
9  q21.2  79200001  81100000  gneg
9  q21.31  81100001  84100000  gpos50
9  q21.32  84100001  86900000  gneg
9  q21.33  86900001  90400000  gpos50
9  q22.1  90400001  91800000  gneg
9  q22.2  91800001  93900000  gpos25
9  q22.31  93900001  96600000  gneg
9  q22.32  96600001  99300000  gpos25
9  q22.33  99300001  102600000  gneg
9  q31.1  102600001  108200000  gpos100
9  q31.2  108200001  111300000  gneg
9  q31.3  111300001  114900000  gpos25
9  q32  114900001  117700000  gneg
9  q33.1  117700001  122500000  gpos75
9  q33.2  122500001  125800000  gneg
9  q33.3  125800001  130300000  gpos25
9  q34.11  130300001  133500000  gneg
9  q34.12  133500001  134000000  gpos25
9  q34.13  134000001  135900000  gneg
9  q34.2  135900001  137400000  gpos25
9  q34.3  137400001  141213431  gneg
10  p11.1  38000001  40200000  acen
10  p11.21  34400001  38000000  gneg
10  p11.22  31300001  34400000  gpos25
10  p11.23  29600001  31300000  gneg
10  p12.1  24600001  29600000  gpos50
10  p12.2  22600001  24600000  gneg
10  p12.31  18700001  22600000  gpos75
10  p12.32  18600001  18700000  gneg
10  p12.33  17300001  18600000  gpos75
10  p13  12200001  17300000  gneg
10  p14  6600001  12200000  gpos75
10  p15.1  3800001  6600000  gneg
10  p15.2  3000001  3800000  gpos25
10  p15.3  1  3000000  gneg
10  q11.1  40200001  42300000  acen
10  q11.21  42300001  46100000  gneg
10  q11.22  46100001  49900000  gpos25
10  q11.23  49900001  52900000  gneg
10  q21.1  52900001  61200000  gpos100
10  q21.2  61200001  64500000  gneg
10  q21.3  64500001  70600000  gpos100
10  q22.1  70600001  74900000  gneg
10  q22.2  74900001  77700000  gpos50
10  q22.3  77700001  82000000  gneg
10  q23.1  82000001  87900000  gpos100
10  q23.2  87900001  89500000  gneg
10  q23.31  89500001  92900000  gpos75
10  q23.32  92900001  94100000  gneg
10  q23.33  94100001  97000000  gpos50
10  q24.1  97000001  99300000  gneg
10  q24.2  99300001  101900000  gpos50
10  q24.31  101900001  103000000  gneg
10  q24.32  103000001  104900000  gpos25
10  q24.33  104900001  105800000  gneg
10  q25.1  105800001  111900000  gpos100
10  q25.2  111900001  114900000  gneg
10  q25.3  114900001  119100000  gpos75
10  q26.11  119100001  121700000  gneg
10  q26.12  121700001  123100000  gpos50
10  q26.13  123100001  127500000  gneg
10  q26.2  127500001  130600000  gpos50
10  q26.3  130600001  135534747  gneg
11  p11.11  51600001  53700000  acen
11  p11.12  48800001  51600000  gpos75
11  p11.2  43500001  48800000  gneg
11  p12  36400001  43500000  gpos100
11  p13  31000001  36400000  gneg
11  p14.1  27200001  31000000  gpos75
11  p14.2  26100001  27200000  gneg
11  p14.3  21700001  26100000  gpos100
11  p15.1  16200001  21700000  gneg
11  p15.2  12700001  16200000  gpos50
11  p15.3  10700001  12700000  gneg
11  p15.4  2800001  10700000  gpos50
11  p15.5  1  2800000  gneg
11  q11  53700001  55700000  acen
11  q12.1  55700001  59900000  gpos75
11  q12.2  59900001  61700000  gneg
11  q12.3  61700001  63400000  gpos25
11  q13.1  63400001  65900000  gneg
11  q13.2  65900001  68400000  gpos25
11  q13.3  68400001  70400000  gneg
11  q13.4  70400001  75200000  gpos50
11  q13.5  75200001  77100000  gneg
11  q14.1  77100001  85600000  gpos100
11  q14.2  85600001  88300000  gneg
11  q14.3  88300001  92800000  gpos100
11  q21  92800001  97200000  gneg
11  q22.1  97200001  102100000  gpos100
11  q22.2  102100001  102900000  gneg
11  q22.3  102900001  110400000  gpos100
11  q23.1  110400001  112500000  gneg
11  q23.2  112500001  114500000  gpos50
11  q23.3  114500001  121200000  gneg
11  q24.1  121200001  123900000  gpos50
11  q24.2  123900001  127800000  gneg
11  q24.3  127800001  130800000  gpos50
11  q25  130800001  135006516  gneg
12  p11.1  33300001  35800000  acen
12  p11.21  30700001  33300000  gneg
12  p11.22  27800001  30700000  gpos50
12  p11.23  26500001  27800000  gneg
12  p12.1  21300001  26500000  gpos100
12  p12.2  20000001  21300000  gneg
12  p12.3  14800001  20000000  gpos100
12  p13.1  12800001  14800000  gneg
12  p13.2  10100001  12800000  gpos75
12  p13.31  5400001  10100000  gneg
12  p13.32  3300001  5400000  gpos25
12  p13.33  1  3300000  gneg
12  q11  35800001  38200000  acen
12  q12  38200001  46400000  gpos100
12  q13.11  46400001  49100000  gneg
12  q13.12  49100001  51500000  gpos25
12  q13.13  51500001  54900000  gneg
12  q13.2  54900001  56600000  gpos25
12  q13.3  56600001  58100000  gneg
12  q14.1  58100001  63100000  gpos75
12  q14.2  63100001  65100000  gneg
12  q14.3  65100001  67700000  gpos50
12  q15  67700001  71500000  gneg
12  q21.1  71500001  75700000  gpos75
12  q21.2  75700001  80300000  gneg
12  q21.31  80300001  86700000  gpos100
12  q21.32  86700001  89000000  gneg
12  q21.33  89000001  92600000  gpos100
12  q22  92600001  96200000  gneg
12  q23.1  96200001  101600000  gpos75
12  q23.2  101600001  103800000  gneg
12  q23.3  103800001  109000000  gpos50
12  q24.11  109000001  111700000  gneg
12  q24.12  111700001  112300000  gpos25
12  q24.13  112300001  114300000  gneg
12  q24.21  114300001  116800000  gpos50
12  q24.22  116800001  118100000  gneg
12  q24.23  118100001  120700000  gpos50
12  q24.31  120700001  125900000  gneg
12  q24.32  125900001  129300000  gpos50
12  q24.33  129300001  133851895  gneg
13  p11.1  16300001  17900000  acen
13  p11.2  10000001  16300000  gvar
13  p12  4500001  10000000  stalk
13  p13  1  4500000  gvar
13  q11  17900001  19500000  acen
13  q12.11  19500001  23300000  gneg
13  q12.12  23300001  25500000  gpos25
13  q12.13  25500001  27800000  gneg
13  q12.2  27800001  28900000  gpos25
13  q12.3  28900001  32200000  gneg
13  q13.1  32200001  34000000  gpos50
13  q13.2  34000001  35500000  gneg
13  q13.3  35500001  40100000  gpos75
13  q14.11  40100001  45200000  gneg
13  q14.12  45200001  45800000  gpos25
13  q14.13  45800001  47300000  gneg
13  q14.2  47300001  50900000  gpos50
13  q14.3  50900001  55300000  gneg
13  q21.1  55300001  59600000  gpos100
13  q21.2  59600001  62300000  gneg
13  q21.31  62300001  65700000  gpos75
13  q21.32  65700001  68600000  gneg
13  q21.33  68600001  73300000  gpos100
13  q22.1  73300001  75400000  gneg
13  q22.2  75400001  77200000  gpos50
13  q22.3  77200001  79000000  gneg
13  q31.1  79000001  87700000  gpos100
13  q31.2  87700001  90000000  gneg
13  q31.3  90000001  95000000  gpos100
13  q32.1  95000001  98200000  gneg
13  q32.2  98200001  99300000  gpos25
13  q32.3  99300001  101700000  gneg
13  q33.1  101700001  104800000  gpos100
13  q33.2  104800001  107000000  gneg
13  q33.3  107000001  110300000  gpos100
13  q34  110300001  115169878  gneg
14  p11.1  16100001  17600000  acen
14  p11.2  8100001  16100000  gvar
14  p12  3700001  8100000  stalk
14  p13  1  3700000  gvar
14  q11.1  17600001  19100000  acen
14  q11.2  19100001  24600000  gneg
14  q12  24600001  33300000  gpos100
14  q13.1  33300001  35300000  gneg
14  q13.2  35300001  36600000  gpos50
14  q13.3  36600001  37800000  gneg
14  q21.1  37800001  43500000  gpos100
14  q21.2  43500001  47200000  gneg
14  q21.3  47200001  50900000  gpos100
14  q22.1  50900001  54100000  gneg
14  q22.2  54100001  55500000  gpos25
14  q22.3  55500001  58100000  gneg
14  q23.1  58100001  62100000  gpos75
14  q23.2  62100001  64800000  gneg
14  q23.3  64800001  67900000  gpos50
14  q24.1  67900001  70200000  gneg
14  q24.2  70200001  73800000  gpos50
14  q24.3  73800001  79300000  gneg
14  q31.1  79300001  83600000  gpos100
14  q31.2  83600001  84900000  gneg
14  q31.3  84900001  89800000  gpos100
14  q32.11  89800001  91900000  gneg
14  q32.12  91900001  94700000  gpos25
14  q32.13  94700001  96300000  gneg
14  q32.2  96300001  101400000  gpos50
14  q32.31  101400001  103200000  gneg
14  q32.32  103200001  104000000  gpos50
14  q32.33  104000001  107349540  gneg
15  p11.1  15800001  19000000  acen
15  p11.2  8700001  15800000  gvar
15  p12  3900001  8700000  stalk
15  p13  1  3900000  gvar
15  q11.1  19000001  20700000  acen
15  q11.2  20700001  25700000  gneg
15  q12  25700001  28100000  gpos50
15  q13.1  28100001  30300000  gneg
15  q13.2  30300001  31200000  gpos50
15  q13.3  31200001  33600000  gneg
15  q14  33600001  40100000  gpos75
15  q15.1  40100001  42800000  gneg
15  q15.2  42800001  43600000  gpos25
15  q15.3  43600001  44800000  gneg
15  q21.1  44800001  49500000  gpos75
15  q21.2  49500001  52900000  gneg
15  q21.3  52900001  59100000  gpos75
15  q22.1  59100001  59300000  gneg
15  q22.2  59300001  63700000  gpos25
15  q22.31  63700001  67200000  gneg
15  q22.32  67200001  67300000  gpos25
15  q22.33  67300001  67500000  gneg
15  q23  67500001  72700000  gpos25
15  q24.1  72700001  75200000  gneg
15  q24.2  75200001  76600000  gpos25
15  q24.3  76600001  78300000  gneg
15  q25.1  78300001  81700000  gpos50
15  q25.2  81700001  85200000  gneg
15  q25.3  85200001  89100000  gpos50
15  q26.1  89100001  94300000  gneg
15  q26.2  94300001  98500000  gpos50
15  q26.3  98500001  102531392  gneg
16  p11.1  34600001  36600000  acen
16  p11.2  28100001  34600000  gneg
16  p12.1  24200001  28100000  gpos50
16  p12.2  21200001  24200000  gneg
16  p12.3  16800001  21200000  gpos50
16  p13.11  14800001  16800000  gneg
16  p13.12  12600001  14800000  gpos50
16  p13.13  10500001  12600000  gneg
16  p13.2  7900001  10500000  gpos50
16  p13.3  1  7900000  gneg
16  q11.1  36600001  38600000  acen
16  q11.2  38600001  47000000  gvar
16  q12.1  47000001  52600000  gneg
16  q12.2  52600001  56700000  gpos50
16  q13  56700001  57400000  gneg
16  q21  57400001  66700000  gpos100
16  q22.1  66700001  70800000  gneg
16  q22.2  70800001  72900000  gpos50
16  q22.3  72900001  74100000  gneg
16  q23.1  74100001  79200000  gpos75
16  q23.2  79200001  81700000  gneg
16  q23.3  81700001  84200000  gpos50
16  q24.1  84200001  87100000  gneg
16  q24.2  87100001  88700000  gpos25
16  q24.3  88700001  90354753  gneg
17  p11.1  22200001  24000000  acen
17  p11.2  16000001  22200000  gneg
17  p12  10700001  16000000  gpos75
17  p13.1  6500001  10700000  gneg
17  p13.2  3300001  6500000  gpos50
17  p13.3  1  3300000  gneg
17  q11.1  24000001  25800000  acen
17  q11.2  25800001  31800000  gneg
17  q12  31800001  38100000  gpos50
17  q21.1  38100001  38400000  gneg
17  q21.2  38400001  40900000  gpos25
17  q21.31  40900001  44900000  gneg
17  q21.32  44900001  47400000  gpos25
17  q21.33  47400001  50200000  gneg
17  q22  50200001  57600000  gpos75
17  q23.1  57600001  58300000  gneg
17  q23.2  58300001  61100000  gpos75
17  q23.3  61100001  62600000  gneg
17  q24.1  62600001  64200000  gpos50
17  q24.2  64200001  67100000  gneg
17  q24.3  67100001  70900000  gpos75
17  q25.1  70900001  74800000  gneg
17  q25.2  74800001  75300000  gpos25
17  q25.3  75300001  81195210  gneg
18  p11.1  15400001  17200000  acen
18  p11.21  10900001  15400000  gneg
18  p11.22  8500001  10900000  gpos25
18  p11.23  7100001  8500000  gneg
18  p11.31  2900001  7100000  gpos50
18  p11.32  1  2900000  gneg
18  q11.1  17200001  19000000  acen
18  q11.2  19000001  25000000  gneg
18  q12.1  25000001  32700000  gpos100
18  q12.2  32700001  37200000  gneg
18  q12.3  37200001  43500000  gpos75
18  q21.1  43500001  48200000  gneg
18  q21.2  48200001  53800000  gpos75
18  q21.31  53800001  56200000  gneg
18  q21.32  56200001  59000000  gpos50
18  q21.33  59000001  61600000  gneg
18  q22.1  61600001  66800000  gpos100
18  q22.2  66800001  68700000  gneg
18  q22.3  68700001  73100000  gpos25
18  q23  73100001  78077248  gneg
19  p11  24400001  26500000  acen
19  p12  20000001  24400000  gvar
19  p13.11  16300001  20000000  gneg
19  p13.12  14000001  16300000  gpos25
19  p13.13  13900001  14000000  gneg
19  p13.2  6900001  13900000  gpos25
19  p13.3  1  6900000  gneg
19  q11  26500001  28600000  acen
19  q12  28600001  32400000  gvar
19  q13.11  32400001  35500000  gneg
19  q13.12  35500001  38300000  gpos25
19  q13.13  38300001  38700000  gneg
19  q13.2  38700001  43400000  gpos25
19  q13.31  43400001  45200000  gneg
19  q13.32  45200001  48000000  gpos25
19  q13.33  48000001  51400000  gneg
19  q13.41  51400001  53600000  gpos25
19  q13.42  53600001  56300000  gneg
19  q13.43  56300001  59128983  gpos25
20  p11.1  25600001  27500000  acen
20  p11.21  22300001  25600000  gneg
20  p11.22  21300001  22300000  gpos25
20  p11.23  17900001  21300000  gneg
20  p12.1  12100001  17900000  gpos75
20  p12.2  9200001  12100000  gneg
20  p12.3  5100001  9200000  gpos75
20  p13  1  5100000  gneg
20  q11.1  27500001  29400000  acen
20  q11.21  29400001  32100000  gneg
20  q11.22  32100001  34400000  gpos25
20  q11.23  34400001  37600000  gneg
20  q12  37600001  41700000  gpos75
20  q13.11  41700001  42100000  gneg
20  q13.12  42100001  46400000  gpos25
20  q13.13  46400001  49800000  gneg
20  q13.2  49800001  55000000  gpos75
20  q13.31  55000001  56500000  gneg
20  q13.32  56500001  58400000  gpos50
20  q13.33  58400001  63025520  gneg
21  p11.1  10900001  13200000  acen
21  p11.2  6800001  10900000  gvar
21  p12  2800001  6800000  stalk
21  p13  1  2800000  gvar
21  q11.1  13200001  14300000  acen
21  q11.2  14300001  16400000  gneg
21  q21.1  16400001  24000000  gpos100
21  q21.2  24000001  26800000  gneg
21  q21.3  26800001  31500000  gpos75
21  q22.11  31500001  35800000  gneg
21  q22.12  35800001  37800000  gpos50
21  q22.13  37800001  39700000  gneg
21  q22.2  39700001  42600000  gpos50
21  q22.3  42600001  48129895  gneg
22  p11.1  12200001  14700000  acen
22  p11.2  8300001  12200000  gvar
22  p12  3800001  8300000  stalk
22  p13  1  3800000  gvar
22  q11.1  14700001  17900000  acen
22  q11.21  17900001  22200000  gneg
22  q11.22  22200001  23500000  gpos25
22  q11.23  23500001  25900000  gneg
22  q12.1  25900001  29600000  gpos50
22  q12.2  29600001  32200000  gneg
22  q12.3  32200001  37600000  gpos50
22  q13.1  37600001  41000000  gneg
22  q13.2  41000001  44200000  gpos50
22  q13.31  44200001  48400000  gneg
22  q13.32  48400001  49400000  gpos50
22  q13.33  49400001  51304566  gneg
X  p11.1  58100001  60600000  acen
X  p11.21  54800001  58100000  gneg
X  p11.22  49800001  54800000  gpos25
X  p11.23  46400001  49800000  gneg
X  p11.3  42400001  46400000  gpos75
X  p11.4  37600001  42400000  gneg
X  p21.1  31500001  37600000  gpos100
X  p21.2  29300001  31500000  gneg
X  p21.3  24900001  29300000  gpos100
X  p22.11  21900001  24900000  gneg
X  p22.12  19300001  21900000  gpos50
X  p22.13  17100001  19300000  gneg
X  p22.2  9500001  17100000  gpos50
X  p22.31  6000001  9500000  gneg
X  p22.32  4300001  6000000  gpos50
X  p22.33  1  4300000  gneg
X  q11.1  60600001  63000000  acen
X  q11.2  63000001  64600000  gneg
X  q12  64600001  67800000  gpos50
X  q13.1  67800001  71800000  gneg
X  q13.2  71800001  73900000  gpos50
X  q13.3  73900001  76000000  gneg
X  q21.1  76000001  84600000  gpos100
X  q21.2  84600001  86200000  gneg
X  q21.31  86200001  91800000  gpos100
X  q21.32  91800001  93500000  gneg
X  q21.33  93500001  98300000  gpos75
X  q22.1  98300001  102600000  gneg
X  q22.2  102600001  103700000  gpos50
X  q22.3  103700001  108700000  gneg
X  q23  108700001  116500000  gpos75
X  q24  116500001  120900000  gneg
X  q25  120900001  128700000  gpos100
X  q26.1  128700001  130400000  gneg
X  q26.2  130400001  133600000  gpos25
X  q26.3  133600001  138000000  gneg
X  q27.1  138000001  140300000  gpos75
X  q27.2  140300001  142100000  gneg
X  q27.3  142100001  147100000  gpos100
X  q28  147100001  155270560  gneg
Y  p11.1  11600001  12500000  acen
Y  p11.2  3000001  11600000  gneg
Y  p11.31  2500001  3000000  gpos50
Y  p11.32  1  2500000  gneg
Y  q11.1  12500001  13400000  acen
Y  q11.21  13400001  15100000  gneg
Y  q11.221  15100001  19800000  gpos50
Y  q11.222  19800001  22100000  gneg
Y  q11.223  22100001  26200000  gpos50
Y  q11.23  26200001  28800000  gneg
Y  q12  28800001  59373566  gvar
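A list like this can be used to look up the band that contains a given genomic position. The following is a minimal sketch under some assumptions: the function names `parse_rows` and `band_at` are made up for this example, only four rows of the list above are embedded to keep it short, and the coordinates are treated as 1-based and inclusive, as the start and end values in the list suggest.

```python
from typing import List, Optional, Tuple

# A few rows copied from the list above (chromosome 1, GRCh37).
ROWS = """\
1  q31.3  193800001  198700000  gpos100
1  q32.1  198700001  207200000  gneg
1  q32.2  207200001  211500000  gpos25
1  q32.3  211500001  214500000  gneg
"""

Row = Tuple[str, str, int, int, str]


def parse_rows(text: str) -> List[Row]:
    """Parse whitespace-separated lines: chromosome, band, start, end, stain."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        chromosome, band, start, end, stain = fields
        rows.append((chromosome, band, int(start), int(end), stain))
    return rows


def band_at(rows: List[Row], chromosome: str, position: int) -> Optional[str]:
    """Return the band name (e.g. '1q32.1') covering a 1-based position, or None."""
    for row_chromosome, band, start, end, _stain in rows:
        if row_chromosome == chromosome and start <= position <= end:
            return row_chromosome + band
    return None
```

With the four rows above, `band_at(parse_rows(ROWS), "1", 200000000)` returns `1q32.1`. A linear scan is enough for a list of this size; for many lookups the rows could be sorted and searched by bisection instead.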

Symbols used:

The following symbols are often used with band numbers when describing changes in the karyogram, e.g. of cancer cells, in ISCN notation.

 , 	Separates chromosome modal number, sex chromosomes, 
        and chromosome abnormalities

 - 	Loss of a chromosome

 ( ) 	Surround structurally altered chromosomes and breakpoints

 + 	Gain of a chromosome

 ++     Multiple signals on one chromosome

 ; 	Separates rearranged chromosomes and breakpoints involving
        more than one chromosome

 / 	Separates cell lines or clones

 // 	Separates recipient and donor cell lines in bone marrow
        transplants

 ~      approximation

 x      multiple copies of chromosomes or regions

 .      Separates multiple techniques

 amp    amplification

 arr    microarray data

 dim    diminished fluorescence ratio intensity: deletion

 del 	Deletion

 der 	Derivative chromosome (used when only one chromosome from
        a translocation is present, or when one chromosome has two
        or more structural abnormalities).
        Alternative description:
        Structurally rearranged chromosome generated either by a 
        rearrangement involving two or more chromosomes or by multiple
        aberrations within a single chromosome (e.g. an inversion and a
        deletion of the same chromosome, or deletions in both arms of a
        single chromosome).[1] The term always refers to the chromosome
        that has an intact centromere.

 dic 	Dicentric chromosome

 dn 	Chromosomal abnormality not inherited from parents (de novo)

 dup 	Duplication of a portion of a chromosome

 enh 	Enhanced fluorescence ratio intensity: duplication

 fra 	Fragile site (usually used with Fragile-X syndrome)

 h 	Heterochromatic region of chromosome

 hlpa   Multiplex ligation-dependent probe amplification (MLPA)

 hmz    Homozygosity

 htz    Heterozygosity

 i 	Isochromosome (both arms of the chromosome are the same)

 ins 	Insertion of a portion of a chromosome

 inv 	Inversion

 .ish 	Precedes karyotype results from fluorescence in situ
        hybridization (FISH) analysis

 mar 	Marker chromosome (unidentifiable piece of chromosome)

 mat 	Maternally derived chromosome rearrangement

 p 	Short arm of a chromosome

 pat 	Paternally derived chromosome rearrangement

 psu dic 	Only one centromere is active (pseudo dicentric)

 q 	Long arm of a chromosome

 r 	Ring chromosome

 t 	Translocation

 ter 	Terminal end of arm (e.g. 2qter - end of the long arm
        of chromosome 2)

 tri	Trisomy

 trp 	Triplication of a portion of a chromosome 
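
As a small illustration of the "," separator described above, a simple karyotype string can be split into its modal number, sex chromosomes and abnormalities. This is a naive Python sketch, not a full ISCN parser: the function name is made up, the example strings are only illustrations, and clone separators ("/", "//") and technique prefixes are not handled.

```python
def split_karyotype(karyotype):
    """Split a simple ISCN string at top-level commas.

    Returns (modal_number, sex_chromosomes, abnormalities).
    Commas inside parentheses are not treated as separators.
    """
    fields, depth, current = [], 0, ""
    for char in karyotype.replace(" ", ""):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            fields.append(current)
            current = ""
        else:
            current += char
    fields.append(current)
    sex = fields[1] if len(fields) > 1 else ""
    return fields[0], sex, fields[2:]


print(split_karyotype("47,XX,+21"))
print(split_karyotype("46,XY,t(9;22)(q34;q11)"))
```

The abnormality strings are returned unparsed; interpreting symbols such as "t", "del" or "der" would need a proper grammar.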

The software CyDAS appears to be able to work with this kind of karyotype data, and there is also a discussion about shortcomings of the nomenclature. (Some of these issues might have been fixed in the meantime.)
There is also a publication by Mascarello et al. on the shortcomings of the system, based on a survey comparing how different clinicians used it. Up to 50% of the notations used for specific cases were incorrect, and only 8% of participants used the exact same string to describe a trisomy 21 in uncultured amniocytes.

Sources and further reading:


]]>
Encode pilot regions Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/encode-pilot-regions?blog=2 2011-07-28T11:17:00Z 2011-08-04T09:45:51Z

This is the list of genomic regions that were analysed as the 1% of the human genome in the ENCODE pilot phase. (The main phase of ENCODE looks at the entire human genome.) The coordinates are for assembly NCBI36 (hg18).
See also the entry about ENCODE and the UCSC pages.

Name Chr. Start End Description
ENr231 1 149424685 149924684 Random Picks
ENr131 2 234156564 234656627 Random Picks
ENr331 2 219985590 220485589 Random Picks
ENr112 2 51512209 52012208 Random Picks
ENr121 2 118011044 118511043 Random Picks
ENr113 4 118466104 118966103 Random Picks
ENr212 5 141880151 142380150 Random Picks
ENm002 5 131284314 132284313 Manual Picks:Interleukin
ENr221 5 55871007 56371006 Random Picks
ENr222 6 132218540 132718539 Random Picks
ENr223 6 73789953 74289952 Random Picks
ENr323 6 108371397 108871396 Random Picks
ENr334 6 41405895 41905894 Random Picks
ENm013 7 89621625 90736048 Manual Picks
ENm001 7 115597757 117475182 Manual Picks:CFTR
ENm010 7 26924046 27424045 Manual Picks:HOXA
ENm012 7 113720369 114720368 Manual Picks:FOXP2
ENm014 7 125865892 127029088 Manual Picks
ENr321 8 118882221 119382220 Random Picks
ENr232 9 130725123 131225122 Random Picks
ENr114 10 55153819 55653818 Random Picks
ENr312 11 130604798 131104797 Random Picks
ENr332 11 63940889 64440888 Random Picks
ENm009 11 4730996 5732587 Manual Picks:Beta
ENm011 11 1699992 2306039 Manual Picks:IGF2/H19
ENm003 11 115962316 116462315 Manual Picks:Apo
ENr123 12 38626477 39126476 Random Picks
ENr111 13 29418016 29918015 Random Picks
ENr132 13 112338065 112838064 Random Picks
ENr311 14 52947076 53447075 Random Picks
ENr322 14 98458224 98958223 Random Picks
ENr233 15 41520089 42020088 Random Picks
ENm008 16 1 500000 Manual Picks:Alpha
ENr313 16 60833950 61333949 Random Picks
ENr211 16 25780428 26280428 Random Picks
ENr213 18 23719232 24219231 Random Picks
ENr122 18 59412301 59912300 Random Picks
ENm007 19 59023585 60024460 Manual Picks:Chr19
ENr333 20 33304929 33804928 Random Picks
ENr133 21 39244467 39744466 Random Picks
ENm005 21 32668237 34364221 Manual Picks:Chr21
ENm004 22 30133954 31833953 Manual Picks:Chr22
ENr324 X 122609996 123109995 Random Picks
ENm006 X 152767492 154063081 Manual Picks:ChrX
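
To sanity-check the list, the size of each region can be computed from its coordinates. The Python sketch below assumes the coordinates are 1-based and inclusive (so length = end - start + 1) and copies in only three rows from the table; the function name is made up.

```python
# A few rows copied from the table above: name, chromosome, start, end.
ROWS = """\
ENr231 1 149424685 149924684
ENm002 5 131284314 132284313
ENm008 16 1 500000
"""


def region_lengths(text):
    """Map region name to its length in bp (1-based, inclusive coordinates)."""
    lengths = {}
    for line in text.splitlines():
        name, _chrom, start, end = line.split()[:4]
        lengths[name] = int(end) - int(start) + 1
    return lengths


print(region_lengths(ROWS))
```

Under that assumption the random picks shown here come out at 500 kb each, while the manual picks vary in size.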

]]>
Public Ensembl databases Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/public-ensembl-databases?blog=2 2011-07-05T12:22:00Z 2011-08-10T09:52:15Z

A quick reminder of the specifications to connect to the public Ensembl mySQL databases:

Database                        Server                 Port
Ensembl (v 24-47)               ensembldb.ensembl.org  3306
Ensembl (v 48 and above)        ensembldb.ensembl.org  5306
Ensembl Mart                    martdb.ensembl.org     5316
Ensembl Genomes                 mysql.ebi.ac.uk        4157
Ensembl (curr. v) in US cloud   useastdb.ensembl.org   5306

user = "anonymous"

pass = ""

The mysql command line for connecting:

Code

mysql -uanonymous -hensembldb.ensembl.org -P5306
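
Since the port depends on the release, a tiny helper can pick it for you. This Python sketch only builds the command string from the table above; the function name is made up, and the host/port values are the ones listed in this post, which may have changed since.

```python
import shlex


def ensembl_mysql_command(release, host="ensembldb.ensembl.org"):
    """Build the mysql client call for a given Ensembl release number."""
    if release < 24:
        raise ValueError("the table above starts at release 24")
    # Releases 24-47 are served on port 3306, release 48 and above on 5306.
    port = 3306 if release <= 47 else 5306
    args = ["mysql", "-uanonymous", "-h" + host, "-P" + str(port)]
    return " ".join(shlex.quote(arg) for arg in args)


print(ensembl_mysql_command(61))
```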


]]>
SQLite Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/sqlite?blog=2 2011-06-03T13:45:00Z 2013-04-18T13:03:25Z

Using the SQLite database Engine

SQLite is different (to MySQL) in a number of ways, the main one being that it is server-less and file-based. The other distinctive features are nicely listed here with pros and cons.

It's an ideal choice if you want to bundle a database with your application, as SQLite is small, platform independent and without any usage restrictions.

It can be accessed with the Perl DBI modules:

Code

use DBI;

# path to the SQLite database file
my $db_file = "db_file_name.db";

my $dbh =
DBI->connect("dbi:SQLite:dbname=$db_file","","")
or die "Unable to connect: $DBI::errstr\n";

It can also be browsed visually with the (free) Firefox plugin SQLite Manager or the (paid) application SQLite Maestro, or used on the command line by calling:
sqlite db_file_name.db
Special sqlite commands are preceded by a ".", e.g. to exit type ".exit".

The SQL syntax is not identical to MySQL's but very similar. Converter tools are listed here, and here are some Stack Overflow notes on the topic.

Some compatibility notes: SQLite supports sub queries.
It does not support deletes on joined tables.
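
The two notes combine nicely: a delete that would use a join in MySQL can be written with a subquery in SQLite. Here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; the tables and values are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
    INSERT INTO gene VALUES (1, 'keep_me'), (2, 'drop_me');
    INSERT INTO transcript VALUES (10, 1), (20, 2), (21, 2);
""")

# MySQL would allow: DELETE t FROM transcript t JOIN gene g ON ... WHERE ...
# In SQLite the join condition is expressed as a subquery instead.
con.execute("""
    DELETE FROM transcript
    WHERE gene_id IN (SELECT gene_id FROM gene WHERE name = 'drop_me')
""")

remaining = [row[0] for row in con.execute(
    "SELECT transcript_id FROM transcript ORDER BY transcript_id")]
print(remaining)
```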

To make the output more readable you can:

Code

.header on
.separator \t

To inspect the structure of a database you can use the following commands.
1. list table names:

Code

.tables  #or
.tables table_na%  # "like" pattern matching

2. show the create statement:

Code

.schema table_name  #or
.schema table_na%   #or
SELECT sql FROM sqlite_master WHERE name = 'table_name';
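
The same `sqlite_master` query also works from a script. A short sketch with Python's sqlite3 module, using an in-memory database and invented tables; `table_schemas` is a made-up helper name.

```python
import sqlite3


def table_schemas(con, pattern="%"):
    """Return {table_name: CREATE statement} for tables matching a LIKE pattern."""
    rows = con.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ORDER BY name",
        (pattern,),
    )
    return dict(rows.fetchall())


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_name (id INTEGER, label TEXT)")
con.execute("CREATE TABLE other (id INTEGER)")
print(table_schemas(con, "table_na%"))
```

Note that "_" is itself a single-character wildcard in LIKE patterns, so "table_na%" would also match a name such as "tableXname".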

To export all data from a database into files separated by table, you can use the "export table" function in SQLite Manager, or use the command line if you have many tables:
1. Create a file with all table names in your database (get the names as mentioned above).
2. Then call sqlite with each to export the data:

Code

cat tables.txt | awk '{print ".mode csv\n.output "$1".txt\nselect * from "$1";"}' | sqlite dbname.db

Alternative export formats are column, html, insert, line, list, tabs, tcl
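
If you prefer a script over the shell pipeline, the per-table export can be sketched with Python's sqlite3 and csv modules. This is an illustration rather than a replacement for the sqlite shell: the function name, the demo table and the tab-separated output are my own choices.

```python
import csv
import os
import sqlite3
import tempfile


def export_tables(con, out_dir, delimiter="\t"):
    """Write every table to <out_dir>/<table>.txt and return the file paths."""
    names = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    paths = []
    for name in names:
        path = os.path.join(out_dir, name + ".txt")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
            # Names come from sqlite_master; double quotes guard odd characters
            # (a name that itself contains a double quote is not handled).
            writer.writerows(con.execute('SELECT * FROM "%s"' % name))
        paths.append(path)
    return paths


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sample (id INTEGER, label TEXT)")
con.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "a"), (2, "b")])
out_dir = tempfile.mkdtemp()
paths = export_tables(con, out_dir)
with open(paths[0], newline="") as handle:
    print(handle.read())
```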

Import of these text files can be done with

Code

.import file.txt table_name

The separator for export and import needs to be the same, otherwise you will get errors like

data.txt line 1: expected 10 columns of data but found 1

If there are line breaks in the data fields, the parsing of the import will break in a similar way. Try to set the separator to

\t

and not specify

.mode csv

for the export.
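
Both failure modes are easy to reproduce outside of sqlite. The sketch below uses Python's csv module (not the sqlite shell's own import code) to show a tab-separated export whose field contains a line break, and a reader that fails early when the separator does not match; the function name and the error wording are made up to mirror the message above.

```python
import csv
import io


def read_rows(text, delimiter, expected_columns):
    """Parse delimited text and fail early on a column-count mismatch."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = []
    for number, row in enumerate(reader, 1):
        if len(row) != expected_columns:
            raise ValueError(
                "record %d: expected %d columns of data but found %d"
                % (number, expected_columns, len(row)))
        rows.append(row)
    return rows


# Export with a tab separator; the csv module quotes the field holding a line break.
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(
    [["1", "two\nlines"], ["2", "plain"]])
exported = buffer.getvalue()

print(read_rows(exported, "\t", 2))
```

Reading the same text with "," as the separator raises the column-count error on the first record, which is the same symptom as the sqlite message.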

Here are some very useful FAQs.


]]>
Command line options in Windows Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/windows-command-line?blog=2 2011-05-20T07:57:00Z 2011-06-15T07:57:41Z

Missing the lovely Unix command-line tools when working on MS Windows machines, I've been trying a few options to speed up everyday tasks like easy file processing:

  • Cygwin as a Unix emulation. Works fine most of the time, but you can feel that it's an alien in the Windows environment unless you configure it extensively: the old problem of different line break encodings, the different way to map/list directories.
  • PowerShell. A useful alternative to the Windows command window with a window split into command and output screen and an extended command set.
  • UnxUtils. A collection of all those Unix tools I missed, wrapped up to be usable from the Windows command line (grep, ls, head, awk...). Nice!

    Remember to add "UnxUtils\usr\local\wbin" to your PATH.


]]>
Using dbVar Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/using-dbvar?blog=2 2011-05-12T17:19:00Z 2013-02-25T10:36:04Z

"Structural variation (SV) is generally defined as a region of DNA approximately 1 kb and larger in size and can include inversions and balanced translocations or genomic imbalances (insertions and deletions), commonly referred to as copy number variants (CNVs). These CNVs often overlap with segmental duplications (regions of DNA >1 kb present more than once in the genome). If present at >1% in a population a CNV may be referred to as copy number polymorphism (CNP)."

Estimates of how much of the human genome is covered by CNVs range from 10% to 20%.

dbVar is the NCBI database of genomic structural variation designed to store data on variant DNA ≥ 1 bp in size.

The database ids are organised in the following manner:

  • std: the study id - this identifies a submitted study
  • sv: the structural variant id - this identifies the submitted region of variation
  • ssv: the supporting structural variant id - this identifies the supporting regions of variation (often sample-specific) that were used to call the submitted region of variation
  • The ids are prefixed with 'n' if the study was submitted to NCBI, or 'e' if it was submitted to EBI
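
The id scheme above is regular enough to take apart with a small regular expression. A Python sketch, assuming ids always look like prefix letter + type + number (e.g. esv10580); the function name and the returned labels are my own.

```python
import re

ID_PATTERN = re.compile(r"^(?P<archive>[ne])(?P<kind>std|ssv|sv)(?P<number>\d+)$")
ARCHIVES = {"n": "NCBI", "e": "EBI"}
KINDS = {
    "std": "study",
    "sv": "structural variant",
    "ssv": "supporting structural variant",
}


def parse_dbvar_id(identifier):
    """Return (archive, kind, number) for ids such as 'esv10580'."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError("not a recognised dbVar id: %r" % identifier)
    return (ARCHIVES[match.group("archive")],
            KINDS[match.group("kind")],
            int(match.group("number")))


print(parse_dbvar_id("esv10580"))
print(parse_dbvar_id("essv57440"))
```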

This means that multiple experimental results, i.e. regions identified from different samples and stored as "supporting variants", are combined into regions that describe these as one "event" and are stored as a "variant".

An example: esv10580 includes the supporting variants essv57440, essv75601, essv61475 and others. The individual (GRCh37/hg19) coordinates, e.g.

Chr1	521,413	564,458
Chr1	521,413	564,458
Chr1	521,648	575,095

result in the maximum coordinates for the variant:

Chr1	521,413	575,095
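
The arithmetic of this example is simply the smallest start and the largest end. The Python sketch below reproduces it for the three supporting variants listed above; it only mirrors this example and makes no claim about how dbVar itself merges regions, and the function name is made up.

```python
def merge_supporting(regions):
    """Collapse supporting variants on one chromosome into their outer bounds."""
    chroms = {chrom for chrom, _start, _end in regions}
    if len(chroms) != 1:
        raise ValueError("expected regions on exactly one chromosome")
    return (chroms.pop(),
            min(start for _chrom, start, _end in regions),
            max(end for _chrom, _start, end in regions))


supporting = [
    ("Chr1", 521413, 564458),
    ("Chr1", 521413, 564458),
    ("Chr1", 521648, 575095),
]
print(merge_supporting(supporting))
```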

They all belong to the study estd20 by Conrad et al. (2010).

There is a good overview page explaining structural variations and related methods.

Source: dbVar


]]>
Parsing OMIM data Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/parsing-omim-data?blog=2 2011-04-18T09:41:00Z 2012-10-12T11:08:31Z

The Online Mendelian Inheritance in Man (OMIM) data is a "catalog of human genes and genetic disorders and traits, with particular focus on the molecular relationship between genetic variation and phenotypic expression. It is a phenotypic companion to the Human Genome Project." (omim.org)

To get human disease annotation for your gene data, the fine data from the OMIM database can be downloaded from their FTP site and parsed with one of multiple OMIM parsers within the BioPerl framework.

I used Christian Zmasek's OMIMparser.pm to get hashes with the ids and names:

Code

use strict;
use warnings;
use Bio::Phenotype::OMIM::OMIMparser;

# paths to the downloaded OMIM files (placeholders, adjust to your setup)
my $omim_genemap = "genemap";
my $omim_all     = "omim.txt";

my (%omim_names, %omim_ids, @all_omim);

my $omim_parser = Bio::Phenotype::OMIM::OMIMparser->new(
    -genemap  => $omim_genemap,
    -omimtext => $omim_all );

while ( my $omim_entry = $omim_parser->next_phenotype() ) {
  my $numb  = $omim_entry->MIM_number();
  my $title = $omim_entry->title();

  #remove the leading MIM number (with optional prefix symbol)
  #and the symbol after the ';' from the title line
  $title =~ s/^.?(\d+) //;
  $title =~ s/;.*$//;

  #store omim ids by disease names
  $omim_names{$title} = $numb;

  #store genes and disease names in hash ref by omim id
  $omim_ids{$numb}->{'disease'} = $title;
  my @symbols = $omim_entry->each_gene_symbol();
  $omim_ids{$numb}->{'genes'} = \@symbols;

  push(@all_omim, $numb.":".$title);
}

If you stumble over an exception like this:

------------- EXCEPTION -------------

MSG: 16.13.3 does not make sense: 'arm' or 'cen' missing

STACK Bio::Map::CytoPosition::cytorange BioPerl-1.6.0/Bio/Map/CytoPosition.pm:165

You need to fix an error in the genemap file from OMIM:

line 9053 should be

16.25|2|2|10|16p13.3|CHTF18....

instead of

16.25|2|2|10|16.13.3|CHTF18...
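
To find such lines without waiting for the parser to fail, the location field can be checked up front. A rough Python sketch, assuming the cytogenetic location is the fifth pipe-separated field (as in the line above) and flagging only the specific pattern seen here, a chromosome directly followed by "." and a digit; the sample lines are shortened and the function name is made up.

```python
import re

# Chromosome directly followed by "." and a digit, i.e. no p/q arm given.
MISSING_ARM = re.compile(r"^(?:\d{1,2}|[XY])\.\d")


def suspicious_lines(lines, location_field=4):
    """Yield (line_number, location) where the band lacks a p/q arm."""
    for number, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("|")
        if len(fields) > location_field and MISSING_ARM.match(fields[location_field]):
            yield number, fields[location_field]


sample = ["16.25|2|2|10|16.13.3|CHTF18", "16.25|2|2|10|16p13.3|CHTF18"]
print(list(suspicious_lines(sample)))
```

Run over the real genemap file (e.g. `suspicious_lines(open("genemap"))`, with the path being whatever you downloaded), this lists the line numbers that need a manual fix.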

source

OMIM ids are prefixed with defined symbols. An explanation of what these characters mean can be found on their FAQ site or here.

Please note that OMIM band start locations have a 1 bp offset to the definitions e.g. in ENSEMBL (probably from a 0-based coordinate system). The "16p11.2" band below is listed as chr16 28100001 - 34600000 in Ensembl.

OMIM example


]]>
PAR regions Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/par-regions?blog=2 2011-04-06T15:37:00Z 2013-08-05T15:54:49Z

The pseudo-autosomal regions are homologous DNA sequences on the (human) X and Y chromosomes (see wikipedia for more). They allow the pairing and crossing-over of these sex chromosomes the same way the autosomal chromosomes do during meiosis. As these genomic regions are identical between X and Y, they are oftentimes only stored once.

To pull out the coordinates of the pseudo-autosomal regions (PAR) from the Ensembl database, you can perform the following query on the Ensembl core database:

Code

select
  (select sr.name from seq_region sr where sr.seq_region_id = ae.seq_region_id) as chrom_1,
  ae.seq_region_start as start_1,
  ae.seq_region_end as end_1,
  (select sr.name from seq_region sr where sr.seq_region_id = ae.exc_seq_region_id) as chrom_2,
  ae.exc_seq_region_start as start_2,
  ae.exc_seq_region_end as end_2
from assembly_exception ae
where ae.exc_type = "PAR";

For the human database with schema 61 (assembly GRCh37/hg19), the result shows where the corresponding regions are located:

+---------+----------+----------+---------+-----------+-----------+
| chrom_1 | start_1  | end_1    | chrom_2 | start_2   | end_2     |
+---------+----------+----------+---------+-----------+-----------+
| Y       |    10001 |  2649520 | X       |     60001 |   2699520 |
| Y       | 59034050 | 59373566 | X       | 154931044 | 155270560 |
+---------+----------+----------+---------+-----------+-----------+

For the old assembly (NCBI36/hg18) you will get:

+---------+----------+----------+---------+-----------+-----------+
| chrom_1 | start_1  | end_1    | chrom_2 | start_2   | end_2     |
+---------+----------+----------+---------+-----------+-----------+
| Y       |        1 |  2709520 | X       |         1 |   2709520 |
| Y       | 57443438 | 57772954 | X       | 154584238 | 154913754 |
+---------+----------+----------+---------+-----------+-----------+

You can alternatively use the API:

Code

my $aefa = $db->get_AssemblyExceptionFeatureAdaptor();
my $sa   = $db->get_SliceAdaptor;
my $slice = $sa->fetch_by_region("chromosome", "Y");
my @aefs = @{$aefa->fetch_all_by_Slice($slice)};
foreach my $ae (@aefs){
  print $ae->display_id."\t".$ae->start."\t".$ae->end."\n";
}

Output for Y:

X	10001	2649520
X	59034050	59373566

or for X:

Y	60001	2699520
Y	154931044	155270560

So to translate from Y to X PAR locations you can use the following for GRCh37 / hg19:

Y 10001 - 2649520      <->  X 60001 - 2699520, band Xp22.33
Y 59034050 - 59373566  <->  X 154931044 - 155270560, band Xq28

and for NCBI36 / hg18:

Y 1 - 2709520          <-> X  1 - 2709520, band Xp22.33
Y 57443438 - 57772954  <-> X  154584238 - 154913754, band Xq28
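
The translation can be wrapped in a small function. This Python sketch assumes a constant offset within each PAR block (i.e. no insertions or deletions between the X and Y copies in the assembly) and uses the coordinates from the tables above; the names are made up.

```python
# (Y start, Y end, X start) per PAR block, taken from the tables above.
PAR_BLOCKS = {
    "GRCh37": [(10001, 2649520, 60001), (59034050, 59373566, 154931044)],
    "NCBI36": [(1, 2709520, 1), (57443438, 57772954, 154584238)],
}


def y_to_x(pos, assembly="GRCh37"):
    """Translate a chrY PAR position to chrX, or return None outside the PARs."""
    for y_start, y_end, x_start in PAR_BLOCKS[assembly]:
        if y_start <= pos <= y_end:
            return pos - y_start + x_start
    return None


print(y_to_x(2649520))
print(y_to_x(59034050))
```

The reverse direction (X to Y) works the same way with the start columns swapped.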

Please note that these coordinates do not agree with the definitions at the GRC and NCBI. This difference of the PAR-2 end coordinates (chrX:155.260.560 / 155.270.560 or chrY:59.363.566 / 59.373.566) is caused by the 10kb telomeric (gap) region which needs to be included in the PAR-2 definition to correctly represent this arrangement.

See also the telomere & centromere definition notes.

A nice list of official HGNC genes that are located in the pseudo-autosomal regions can be found here.


]]>
PDL: The Perl Data Language Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/pdl-the-perl-data-language?blog=2 2011-03-01T21:56:00Z 2011-05-20T13:10:50Z

It doesn't always have to be R!

The Perl Data Language is a Perl extension for numerical manipulation that provides the convenience of Perl with the speed of compiled C.

It also contains plotting modules.

Install with cpan PDL or check these descriptions.

Code example for getting basic stats from a few values:

Code

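use strict;
use warnings;
use PDL;

my @numbers = (1,4,6,8,10);
my $piddle = pdl(@numbers);

# statsover returns, in this order: mean, population RMS deviation (N-1),
# median, min, max, average absolute deviation, RMS deviation (N)
my ($mean,$prms,$median,$min,$max,$adev,$rms) = statsover($piddle);

print "Mean=$mean\n".
      "Population RMS deviation (N-1)=$prms\n".
      "Median=$median\n".
      "Min=$min\n".
      "Max=$max\n".
      "Average absolute deviation=$adev\n".
      "RMS deviation (N)=$rms\n\n";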

]]>
Running CronJobs Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/running-cronjobs?blog=2 2011-02-04T16:33:00Z 2015-04-13T08:17:16Z

cron is an extremely useful Unix utility that allows tasks to be run automatically in the background at regular intervals.

You need the script / command you want to run and the time it should run. You can then use the crontab command to edit the service:

  1. crontab -e Edit your crontab file, or create one if it doesn't already exist.
  2. crontab -l Display your crontab file.
  3. crontab -r Remove your crontab file.

Format of entries:


*     *     *   *    *        command to be executed
-     -     -   -    -
|     |     |   |    |
|     |     |   |    +----- day of week (0 - 6) (Sunday=0)
|     |     |   +------- month (1 - 12)
|     |     +--------- day of month (1 - 31)
|     +----------- hour (0 - 23)
+------------- min (0 - 59)

Example:

00 03 * * * bash /users/fsk/backup_db.sh

This runs my backup script at 03:00 every day.

*/10 * * * * echo "job done"

This runs an echo every 10 minutes of every hour of every day.

To receive an email with any result from the jobs, add

Code

MAILTO=yourmail@home.com

to the top of the crontab. To discard any output add

Code

>/dev/null 2>&1

to the end of the job line. To stop cron from mailing output for all jobs, set MAILTO="" at the top of the crontab instead.
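
The order of the two redirections matters. As a small sketch (noisy_job is a made-up stand-in for a real cron job), the following shows what cron would still see in each case:

```shell
#!/bin/sh
# A stand-in for a cron job that writes to both stdout and stderr.
noisy_job() {
    echo "regular output"
    echo "error output" >&2
}

# No redirection: both streams reach the caller (cron would mail them).
all_output=$(noisy_job 2>&1)

# ">/dev/null 2>&1": stdout goes to /dev/null, then stderr follows it there.
silenced=$( { noisy_job >/dev/null 2>&1; } 2>&1 )

# Reversed order "2>&1 >/dev/null": stderr still gets through.
stderr_only=$(noisy_job 2>&1 >/dev/null)

echo "unredirected: $all_output"
echo "silenced: '$silenced'"
echo "reversed order: $stderr_only"
```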

Source


]]>
AnnoTrack: Rails System Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/annotrack-rails-system?blog=2 2011-01-31T13:01:00Z 2011-02-02T09:11:13Z

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

For general Ruby or Rails questions, please read elsewhere; there are blog entries about Ruby & Rails terminology and the Rails application layout.

The AnnoTrack Ruby-on-Rails code can be found in svn/gencode/tracking_system/rails/. Most AnnoTrack-specific code is stored as "plugin" code in the Redmine directory. This means that when trying to find a specific piece of code, you have to check the default application directory app, but also the plugin directory vendor/plugins/redmine_annotrack/app.

The language files defining the terminology and browser links used on the websites are svn/gencode/tracking_system/rails/lang/en.yml and svn/gencode/tracking_system/rails/vendor/plugins/redmine_annotrack/lang/en.yml.

In these files an entry like

Code

label_project_new: New Gene

means "if you come across the term label_project_new, display it as New Gene in the browser".
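
Since a term can be defined in either language file, it can help to search both at once. This is a rough sketch that assumes the flat "key: value" layout shown above; lookup_label is a hypothetical helper, not part of AnnoTrack.

```shell
#!/bin/sh
# lookup_label KEY FILE...: print "file: display text" for every language file
# that defines KEY as a flat "key: value" entry.
lookup_label() {
    key=$1
    shift
    awk -v key="$key" '
        index($0, key ":") == 1 {        # line starts with "KEY:"
            sub(/^[^:]*:[ \t]*/, "")     # strip the key and the colon
            print FILENAME ": " $0
        }
    ' "$@"
}

# Example call from the rails/ directory:
#   lookup_label label_project_new lang/en.yml vendor/plugins/redmine_annotrack/lang/en.yml
```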

To understand the code underlying specific web pages, it is helpful to check the routing entries in config/routes.rb and vendor/plugins/redmine_annotrack/routes.rb. Specific paths in the browser are mapped to specific functions in the Rails code, e.g.:

Code

map.connect 'flags/show_tecs', :controller => 'flags', :action => 'show_tecs'

maps the URL http://annotrack.sanger.ac.uk/human/flags/show_tecs to the show_tecs function in the file app/controllers/flags_controller.rb.
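
The same mapping can be read off mechanically. This sketch assumes route lines written exactly in the map.connect style shown above; route_target is a made-up helper name.

```shell
#!/bin/sh
# route_target: read map.connect lines on stdin and print the controller file
# and action each one points to, as "file#action".
route_target() {
    sed -n "s|.*:controller => '\([^']*\)', :action => '\([^']*\)'.*|app/controllers/\1_controller.rb#\2|p"
}

echo "map.connect 'flags/show_tecs', :controller => 'flags', :action => 'show_tecs'" | route_target
```

For a whole file, something like route_target < config/routes.rb lists every target that follows this pattern; routes written in another style are silently skipped.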

The list of chromosomes used as well as the different priority values are set on this page.

Some options for links on the transcript pages etc. can be changed through the administration interface.

These actions require administrator user rights in the AnnoTrack system. The list of user rights for all groups is shown here.

The documentation pages can be edited with a wiki-style syntax by clicking on the edit pencil on each page.


]]>
AnnoTrack: General Documentation Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/annotrack-general-documentation?blog=2 2011-01-31T11:17:00Z 2011-02-02T09:06:26Z

Setting up a new system & adjusting it to your needs

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

The system is flexible enough to be of use for other groups and projects performing genome annotation in a collaborative effort and is therefore provided here. These are notes on how to start a new annotation project with AnnoTrack.

General Redmine installation notes for troubleshooting are here, but all the source code required for AnnoTrack is available here.

Most of the AnnoTrack code is written as a plugin for the Redmine system (rails/vendor/plugins/redmine_annotrack), but since there are some other changes required, which override Redmine's default code, you will need the complete package from this site.

General notes

You will need

  1. a database server (e.g. MySQL 5)
  2. a Ruby on Rails installation

    source and help on the official Rails page; documentation for running on Mac OS X (usually pre-installed)

  3. a web server (e.g. Apache) when running in production mode; for testing, the WEBrick server supplied with Rails is fine.
  4. get the AnnoTrack source code and database from this page

    unpack:

    Code

    tar xzvf annotrack.version.tgz

Database

Create your database and load the supplied schema:

Code

mysql -u<user> -p<password> -h<host> -P<port> -e"create database annotrack"
mysql -u<user> -p<password> -h<host> -P<port> -Dannotrack < annotrack/database.sql

The main tables of the database are outlined in this diagram.

Rails server

  • we have frozen the additional external Rails modules used by the application (gems) into the AnnoTrack rails code (rails/vendor/rails/) so you don't necessarily need to install all of them separately.
  • set your environment variables GEM_PATH and RAILS_ENV in your shell or in the file annotrack/rails/config/environment.rb
  • adjust the database configuration file annotrack/rails/config/database.yml with your settings (production and development if desired)

    additional environments can be created (e.g. for multiple organisms) by adding an entry (e.g. "production_mouse") and a file of the same name in environments (e.g. environments/production_mouse.rb)

  • start the server e.g. on port 6223:

    Code

    cd annotrack/rails
    ruby script/server -e development -p 6223 #(to use the development setup)
  • In a web browser your application will usually be at http://localhost:6223/. Log in as administrator ("admin"/"admin") to set up some initial values.

    The Redmine admin interface is at DEFAULT_URL/admin. Modifications should in particular be made on these pages:

    1. Settings: "Application title", "Welcome text", "Host name"
    2. AnnoTrack settings: "Menu links", "Browsers links", "other settings"

      vendor/plugins/redmine_annotrack/lang/en.yml holds the URL patterns used for browser links.

    3. Flags: define new flags to highlight errors
    4. Users: create & adjust user accounts
  • we have stored a gene with two transcripts and two flags for demonstration;

    you can see these by clicking on "Transcripts" at the top of the page and then selecting "View all transcripts".

  • you can create a new gene-level entry manually at DEFAULT_URL/projects/add for testing; in general these will be created by scripts writing directly to the database.
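
When adding an extra environment as described in the list above, it is easy to forget one of the two pieces. The following is only a sketch of a pre-flight check; check_rails_env and the environment name production_mouse are illustrative, and it assumes the environment appears as a top-level key in database.yml.

```shell
#!/bin/sh
# check_rails_env ENV RAILS_ROOT: verify that ENV has both an entry in
# config/database.yml and a matching config/environments/ENV.rb file.
check_rails_env() {
    env_name=$1
    config_dir=$2/config
    rc=0
    if ! grep -q "^${env_name}:" "$config_dir/database.yml" 2>/dev/null; then
        echo "missing entry '$env_name' in $config_dir/database.yml"
        rc=1
    fi
    if [ ! -f "$config_dir/environments/${env_name}.rb" ]; then
        echo "missing file $config_dir/environments/${env_name}.rb"
        rc=1
    fi
    return $rc
}
```

A call such as check_rails_env production_mouse annotrack/rails prints nothing and returns 0 when both pieces are in place.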

Perl API/scripts

  • You can adjust the settings for your system in the central config.pm file.
  • We use the scripts/cron_jobs.pl file to run automatic updates of the core annotation and to update the stats given on the front page (issue and flag counts); please adjust this to your needs.

    Some Perl programming knowledge is required to adjust / write parsers to handle the specific data you will be using.

  • The following additional Perl modules (many of which are part of a standard installation) are required to use the AnnoTrack Perl API:

    • Bio::Das::Lite
    • MIME::Lite
    • DBI
    • Getopt::Long
    • UNIVERSAL::require
    • Bio::EnsEMBL::DBSQL::DBAdaptor (when accessing Ensembl-style databases)
  • most probably you will have to adjust the source-specific scripts used for data loading and analysis stored in annotrack/perl/modules/annotrack
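
To see quickly which of the modules listed above still need installing, each one can be loaded in turn. This is just a sketch; missing_perl_modules is a made-up helper and it assumes perl is on the PATH.

```shell
#!/bin/sh
# missing_perl_modules MODULE...: print every module that perl cannot load.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

missing_perl_modules Bio::Das::Lite MIME::Lite DBI Getopt::Long \
    UNIVERSAL::require Bio::EnsEMBL::DBSQL::DBAdaptor
```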

Further hints

  • New genes/transcripts, categories and flags would usually be created by script access. There are functions for all of this, documented here (/human/docs/core).
  • This (/documents/show/8) is a basic source adaptor reading data from a tab-delimited file to demonstrate how the modules work.
  • This (/documents/show/10) is an example source adaptor to demonstrate a module reading from a database with DBI.
  • Here (/human/docs/core) is the Perl-doc of the AnnoTrack core module.

Further adjustments

To customize the system for your own set-up there are a number of files you can modify:

  • rails/app/views/layouts/base.rhtml: Start page layout
  • rails/vendor/plugins/redmine_annotrack/lang/en.yml: Names and paths to browsers and project-related links
  • we are using a Lucene-based search engine for AnnoTrack; you can switch between this and the Redmine-internal search engine on the Administration/Settings/Annotrack page
  • the scripts annotrack/perl/scripts/cron_jobs.pl.example and annotrack/perl/scripts/cron_queries.sh.example should be adjusted to your environment and run regularly (nightly) to update annotation data, update counts and optimize tables.
  • many "helper variables" are stored in the tmp_values table. Have a look there if stats etc. are not displayed as expected.

Upgrading

General notes on upgrading existing Redmine installations are here.


]]>
AnnoTrack: Web-Server Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/annotrack-web-server?blog=2 2011-01-27T10:02:00Z 2011-02-02T09:30:11Z

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

AnnoTrack is a Ruby-on-Rails application which is executed by an Apache2 server with the mod_rails (Passenger) plugin. It runs on virtual machines (VMs) where we don't run any other services, as Rails does not play nicely with other web services.

James Smith (webteam) knows most about this, Tim Cutts & Dave Holland (infrastructure management) can help with the VMs.

Access restrictions apply to all of the following services and to superuser rights.

There is a test environment on the VM web-annotrack; the production servers are running on two VM clones, web-annotrack1 and web-annotrack2. All can be accessed directly with SSH:

Code

ssh web-annotrack
cd /var/www/annotrack-app

The different species have their own AnnoTrack/Redmine code installations, as there does not seem to be another way to have them running in parallel:

annotrack-app == human

annotrack-app-mouse == mouse

annotrack-app-zfish == zebrafish

Rails/Passenger requires symbolic links from the root-level to the public folder:

human -> annotrack-app/public/
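
Such a link can be created with ln -s. The sketch below does this in a scratch directory so it is safe to try; on the server the link would live in the web root, whose exact location is not given here.

```shell
#!/bin/sh
# Build the layout in a temporary directory instead of the real web root.
web_root=$(mktemp -d)
mkdir -p "$web_root/annotrack-app/public"

# human -> annotrack-app/public/ (relative link, as in the listing above)
ln -s annotrack-app/public/ "$web_root/human"

ls -l "$web_root/human"
```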

The test system is visible at http://web-annotrack.internal.sanger.ac.uk:8000

The port and other specific server settings are set in the apache2/sites-available/default file.

Re-starting the Rails server:

Code

ssh web-annotrack1   # repeat on web-annotrack2
cd /var/www/annotrack-app
sudo touch tmp/restart.txt

Re-starting the entire web server:

Code

ssh web-annotrack1   # repeat on web-annotrack2
sudo apache2ctl -k graceful
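
Both production clones need the same treatment, so a small loop saves typing. This is only a sketch: restart_rails and the DRY_RUN switch are made up for illustration, it assumes the application sits in /var/www/annotrack-app on the clones as it does on the test VM, and sudo over SSH may need a terminal (ssh -t) depending on the sudo configuration.

```shell
#!/bin/sh
# restart_rails: touch tmp/restart.txt on both production VMs.
# With DRY_RUN set to a non-empty value the commands are only printed.
restart_rails() {
    for host in web-annotrack1 web-annotrack2; do
        ${DRY_RUN:+echo} ssh "$host" "cd /var/www/annotrack-app && sudo touch tmp/restart.txt"
    done
}

DRY_RUN=1
restart_rails
```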

Service monitoring

The VMs are monitored with vSphere (web access, Windows client available as well) and Nagios (web-annotrack 1 / 2).

The website is also checked by the Montastic monitoring service.


]]>
Submitting to EMBLdb Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/submitting-to-embldb?blog=2 2011-01-24T13:31:00Z 2011-02-08T16:16:31Z

To submit DNA sequences from capillary (Sanger) sequencing to the public EMBL database, the strategy is to create one submission on the Webin submission page of the European Nucleotide Archive (ENA) at the EBI and attach a FASTA file with all sequences. These steps can be taken:

  1. remove low-quality sequences. In my case the filter criteria were:

    • max 5 consecutive Ns
    • max 10% Ns
    • min 80bp length
  2. screen for vector contamination:

    • Use the NCBI web interface for small sets
    • Use BioPerl for large sets: get the EMVEC file in EMBL format and convert it to a FASTA file with BioPerl

      Code

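      use strict;
      use warnings;
      use Bio::SeqIO;

      # read the EMVEC entries in EMBL format
      my $inseq = Bio::SeqIO->new(
            -file   => "<file.dat",
            -format => "embl" );

      # write them out again in FASTA format
      my $outseq = Bio::SeqIO->new(
            -file   => ">file.fa",
            -format => "fasta" );

      while (my $seq = $inseq->next_seq) {
        $outseq->write_seq($seq);
      }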
    • index with formatdb

      To extract sequences from a BLAST database you need an index file (for protein-dbs these files end with the extension: ".pin", for DNA dbs: ".nin"), a sequence file (".psq", ".nsq") and a header file (".phr" and ".nhr"). formatdb turns FASTA files into BLAST databases.

      Code

      formatdb -i emvec.fa -p F -o F

    • run BioPerl Blast with the sequences to be submitted against the EMVEC db:

      Code

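      use Bio::Tools::Run::StandAloneBlast;

      # the database name must match the file that was indexed with formatdb
      my @blast_params = (program  => 'blastn', database => 'emvec.fa');
      my $factory = Bio::Tools::Run::StandAloneBlast->new(@blast_params);

      # $seq is a Bio::Seq object holding one sequence to be submitted
      my $blast_report = $factory->blastall($seq);

      and filter out sequences with low E-value hits (<0.1) and long matching regions.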

  3. submit the sequences, in my case as ESTs. Log in to Webin, create a new submission, choose the molecule type (e.g. "EST"), add a reference publication, specify the number of sequences, describe the header (at least one field, e.g. the clone identifier, must be specified to be read from the FASTA header), add common values in the small table to be added to all entries (e.g. organism "Homo sapiens") and upload your FASTA file.
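
The filter criteria from step 1 can be sketched in a few lines of Python. This is only an illustration and not the script I used for the submission: the function names `passes_quality_filter` and `read_fasta` are made up, and I am assuming that "max 5 consecutive Ns" means a run of six or more Ns fails.

```python
import re


def passes_quality_filter(seq, max_n_run=5, max_n_fraction=0.10, min_length=80):
    """Return True if a sequence meets the three criteria from step 1."""
    seq = seq.upper()
    # Criterion: minimum length of 80 bp.
    if len(seq) < min_length:
        return False
    # Criterion: no run of Ns longer than max_n_run (assumed reading of the rule).
    if re.search("N{%d,}" % (max_n_run + 1), seq):
        return False
    # Criterion: at most 10% Ns over the whole sequence.
    return seq.count("N") / len(seq) <= max_n_fraction


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Feeding an open FASTA file handle to `read_fasta` and keeping only the records for which `passes_quality_filter` returns True gives the set that goes into the vector screen.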

Sources:


]]>
Sequence Contaminations Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/sequence-contaminations?blog=2 2011-01-20T17:11:00Z 2012-11-20T11:42:05Z

When analysing sequences from public databases or from your own sequencer you have to be aware of potential contamination.

A contaminated sequence is one that does not faithfully represent the genetic information from the biological source organism/organelle because it contains one or more sequence segments of foreign origin. [NCBI]

The primary approach to screening nucleic acid sequences for vector contamination is to run a sequence similarity search against a database of vector sequences. The preferred tool for conducting such a search is NCBI's VecScreen. VecScreen detects contamination by running a BLAST sequence similarity search against the UniVec vector sequence database.

EMVEC Database BLAST is an interactive web service to scan sequences for contamination.

Help with the interpretation of the results of BLAST2 EVEC.

See also this post about submitting to EMBL db and this post about screening NGS reads locally.


]]>
GVF Format Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/gvf-format?blog=2 2011-01-11T09:23:07Z 2011-03-01T23:45:03Z

The Genome Variation Format (GVF) is a file format for describing sequence variants at nucleotide resolution relative to a reference genome. The GVF format was published in Reese et al., Genome Biol., 2010: A standard variation file format for human genome sequences.

GVF is a type of GFF3 file with additional pragmas and attributes specified.

Two examples:

Code

chr16 samtools SNV 49291141 49291141 . + . ID=ID_1;Variant_seq=A,G;Reference_seq=G;Genotype=heterozygous
 
chr16 samtools SNV 49291360 49291360 . + . ID=ID_2;Variant_seq=G;Reference_seq=C;Genotype=homozygous

Code

chr16 samtools SNV 49291141 49291141 . + . ID=ID_1;Variant_seq=A,G;Reference_seq=G;Genotype=heterozygous;Variant_effect=synonymous_codon 0 mRNA NM_022162;
 
chr16 samtools SNV 49302125 49302125 . + . ID=ID_3;Variant_seq=T,C;Reference_seq=C;Genotype=heterozygous;Variant_effect=nonsynonymous_codon 0 mRNA NM_022162;Alias=NP_071445.1:p.P45S;

This is used e.g. by Ensembl to write out "Watson SNPs" from the variation database (ftp).

Source and full specs: Sequenceontology.org
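
Because GVF is GFF3 with extra attributes, a record can be taken apart with a few lines of Python. The sketch below is only an illustration: the function name `parse_gvf_line` and the layout of the returned dictionary are my own choices and not part of the specification. Real files are tab-separated; the fallback to whitespace splitting is only there so that the space-separated examples from this post can be pasted in.

```python
def parse_gvf_line(line):
    """Split one GVF record into its columns and a dictionary of attributes."""
    line = line.rstrip("\n")
    # Nine columns as in GFF3; the ninth holds the attributes and may contain spaces.
    fields = line.split("\t") if "\t" in line else line.split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    seqid, source, feature_type, start, end, score, strand, phase, attr_text = fields

    attributes = {}
    for pair in attr_text.strip().split(";"):
        if not pair:
            continue  # trailing semicolon
        key, _, value = pair.partition("=")
        # Several values (e.g. both alleles in Variant_seq) are comma-separated.
        attributes[key.strip()] = value.strip().split(",")

    return {
        "seqid": seqid,
        "source": source,
        "type": feature_type,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }
```

For the first example record this gives `["A", "G"]` for `Variant_seq` and `["heterozygous"]` for `Genotype`.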


]]>
QSEQ File Format Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/qseq-files-format?blog=2 2011-01-06T16:57:31Z 2011-03-01T23:45:03Z

QSEQ is a plain-text file format for sequence reads produced directly by many current next-generation sequencing machines. The content can be described as follows.

Each record is one line of tab-separated fields in the following order:

- Machine name: unique identifier of the sequencer.

- Run number: unique number to identify the run on the sequencer.

- Lane number: positive integer (currently 1-8).

- Tile number: positive integer.

- X: x coordinate of the spot. Integer (can be negative).

- Y: y coordinate of the spot. Integer (can be negative).

- Index: positive integer. Runs without indexing should have a value of 1.

- Read Number: 1 for single reads; 1 or 2 for paired ends.

- Sequence (BASES)

- Quality: the calibrated quality string. (QUALITIES)

- Filter: Did the read pass filtering? 0 - No, 1 - Yes.
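
The field list translates directly into a small parser. The following Python sketch is only an illustration: the names `QseqRecord`, `parse_qseq_line` and `fastq_header` are made up, the machine name in the usage note is hypothetical, and the way the header is put together from the fields is an assumption and not taken from the SRA guide.

```python
from collections import namedtuple

# One field per column, in the order listed above.
QseqRecord = namedtuple(
    "QseqRecord",
    "machine run lane tile x y index_value read_number bases qualities passed_filter",
)


def parse_qseq_line(line):
    """Turn one tab-separated QSEQ line into a QseqRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 fields, got %d" % len(fields))
    machine, run, lane, tile, x, y, index_value, read_number, bases, qualities, flt = fields
    return QseqRecord(
        machine, run, int(lane), int(tile), int(x), int(y),
        index_value, int(read_number), bases, qualities,
        flt == "1",  # filter column: 1 = passed, 0 = failed
    )


def fastq_header(rec):
    """Build a read name with index and read number (layout is an assumption)."""
    return "@%s_%s:%d:%d:%d:%d#%s/%d" % (
        rec.machine, rec.run, rec.lane, rec.tile, rec.x, rec.y,
        rec.index_value, rec.read_number,
    )
```

A line for machine `HWI-EAS1`, run 7, lane 3, tile 12, coordinates -5/1024, index 1 and read 2 would get the header `@HWI-EAS1_7:3:12:-5:1024#1/2`.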

Source: SRA_File_Formats_Guide.pdf


]]>
GENCODE: Generating release files Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/generate-gencode-freeze-files?blog=2 2011-01-04T09:35:23Z 2011-03-30T12:00:47Z

These are notes about the data handling steps involved in creating the GTF files released by the GENCODE project and submitted to the DCC. (Valid as of February 2011)

For general information and data access please visit the project website at http://www.gencodegenes.org, this blog post or the AnnoTrack annotation tracking system.

A. Input sources

-ensembl core database with gene models, stable ids and xrefs

-vega database of same release for id-lookup

-3-way pseudogene file with gene ids:

from Yale, based on pre-dump file from same release (using the newfullmerge.pl script)

-2-way (Yale/UCSC) pseudogene file with full locations and 2 sets of ids (from Yale)

-level-1 (and level-4 if defined) transcript file containing stable-ids

-optional file with additional annotation remarks

-file from HGNC web site with columns

HGNC-ID, gene_symbol, Pubmed-IDs, Vega-ID

-RefSeq NP / NM mapping from current xref database (from Ensembl core team):

Code

mysql -uensro -hens-research -Dianl_human_xref_release_61 \
  -e'select accession1, accession2 from pairs where accession1 like "NP%" and accession2 like "NM%"' \
  > RefSeq_relations.txt

B. Code to use

svn/gencode/scripts/data_release/newfullmerge.pl

		      .../write_class_file.pl

		      .../gencode_addmetadata.pl

svn/gencode/modules/Gencode/Ensembl2GTF.pm

C. Procedure

Create directory where output files are written to and the following input files are placed:

3-way_consensus_pseudogenes.txt, classes.def, validated_level_1_ids.txt

The paths to these are needed in the newfullmerge.pl script...

mkdir /work/dir/gencode_7

for LSF output files:

mkdir /work/dir/gencode_7/outfiles

dump annotation data (using main chromosomes only)

Code

foreach chr ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT )
 
    bsub -o /work/dir/gencode_7/outfiles/gencode_$chr.out perl svn/gencode/scripts/data_release/newfullmerge.pl -basedir /work/dir/gencode_7 -chrom $chr
 
end

check jobs

Code

grep -c "^Successfully" gencode_*out

update PAR region (We are currently writing out X and Y PAR regions separately. They are stored only once in the Ensembl db though, so the ids need to be made non-redundant with this step)

Code

perl svn/gencode/scripts/data_release/update_y_ids.pl -x gencode_X.gtf -y gencode_Y.gtf -out gencode_YY.gtf

create joined file

add header to release file gencode.v7.annotation.gtf:

##description: evidence-based annotation of the human genome (GRCh37), version 7 (Ensembl 62)

##provider: GENCODE

##contact: gencode@sanger.ac.uk

##format: gtf

##date: 2011-03-23

Code

foreach chr ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X YY MT )
 
  cat gencode_$chr.gtf >> gencode.v7.annotation.gtf
 
end
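
The same joining step can also be sketched in Python, which has the small advantage that the output file is written from scratch and not appended to, so a second run does not duplicate the content. This is only an illustration; the function names `chromosome_names` and `write_release_file` are made up and not part of the GENCODE scripts.

```python
def chromosome_names():
    """Chromosome labels in release order, with YY for the PAR-updated Y file."""
    return [str(n) for n in range(1, 23)] + ["X", "YY", "MT"]


def write_release_file(out_path, header_lines, chromosome_files):
    """Write the header lines, then the content of each per-chromosome GTF file."""
    with open(out_path, "w") as out:
        for header in header_lines:
            out.write(header.rstrip("\n") + "\n")
        for path in chromosome_files:
            with open(path) as handle:
                for line in handle:
                    out.write(line)
```

The list of input files would then be `["gencode_%s.gtf" % c for c in chromosome_names()]`.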

check gene and transcript numbers (compare to the previous release and the database, ignoring haplotype regions etc.)

Code

awk '{if($3=="gene"){g++}else{if($3=="transcript"){t++}}} END{print "genes: "g"\ntranscripts: "t"\n"}' gencode.v7.annotation.gtf

check tags (annotation remarks) (compare to the previous release)

Code

foreach t ( seleno pseudo_consens CCDS mRNA_start_NF mRNA_end_NF cds_start_NF cds_end_NF non_org_supp exp_conf PAR alternative_3_UTR alternative_5_UTR readthrough NMD_exception not_organism-supported not_best-in-genome_evidence non-submitted_evidence upstream_ATG downstream_ATG upstream_uORF overlapping_uORF NAGNAG_splice_site non_canonical_conserved non_canonical_genome_sequence_error non_canonical_other non_canonical_polymorphism non_canonical_U12 non_canonical_TEC )
 
  echo -n $t"\t"; awk '{if($3=="transcript"){print $0}}' gencode.v7.annotation.gtf | grep -c "$t"
 
end

split by level (levels 1/2 and 3 are displayed as two separate tracks in the UCSC browser)

Code

awk '{if($26=="3;"){print $0}}' gencode.v7.annotation.gtf | awk '{if($3!="gene"){print $0}}' > gencode.v7.annotation.level_3.gtf
 
awk '{if($26!="3;"){print $0}}' gencode.v7.annotation.gtf | awk '{if($3!="gene"){print $0}}' > gencode.v7.annotation.level_1_2.gtf
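
The awk commands rely on the level value always being the 26th whitespace-separated field. A variant that looks up the `level` attribute by name could look like the following Python sketch. It is only an illustration and not part of the release scripts: the name `split_by_level` is made up, and unlike the awk version it skips header lines and lines without a level attribute.

```python
import re

# Matches the unquoted GTF attribute 'level 3;' (quotes tolerated).
LEVEL_RE = re.compile(r'(?:^|\s)level "?(\d+)"?;')


def split_by_level(lines):
    """Return (level_1_2, level_3) lists of GTF lines, leaving out gene lines."""
    level_1_2, level_3 = [], []
    for line in lines:
        if line.startswith("#"):
            continue  # header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] == "gene":
            continue
        match = LEVEL_RE.search(columns[8])
        if match is None:
            continue  # no level attribute found
        if match.group(1) == "3":
            level_3.append(line)
        else:
            level_1_2.append(line)
    return level_1_2, level_3
```

Writing the two returned lists to `gencode.v7.annotation.level_1_2.gtf` and `gencode.v7.annotation.level_3.gtf` gives the two track files.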

make class file (data loading at UCSC requires a mapping of all gene and transcript ids to a level and a type)

Find classes not yet defined:

Code

grep -h "^Class not defined" gencode_*.out | sort -u

add these manually to the classes.def file. Write out new lists:

Code

perl svn/gencode/scripts/data_release/write_class_file.pl -in gencode.v7.annotation.level_1_2.gtf -class classes.def -out gencode.v7.annotation.level_1_2.classes
 
perl svn/gencode/scripts/data_release/write_class_file.pl -in gencode.v7.annotation.level_3.gtf -class classes.def -out gencode.v7.annotation.level_3.classes

generate meta-data

perl svn/gencode/scripts/data_release/gencode_addmetadata.pl

requires list of new PAR region IDs

generate tRNAs

Code

bsub -o trna.out perl svn/gencode/scripts/data_release/newfullmerge.pl -trna -out gencode.v7.tRNAs.gtf

[622 lines]

Code

nice perl svn/gencode/scripts/data_release/write_class_file.pl -in gencode.v7.tRNAs.gtf -class classes.def -out gencode.v7.tRNAs.classes -types tRNAscan

generate polyAs

Code

nice perl svn/gencode/scripts/data_release/dump_polyAs.pl -out gencode.v7.polyAs.gtf

[28966 lines]

Code

nice perl svn/gencode/scripts/data_release/write_class_file.pl -in gencode.v7.polyAs.gtf -class classes.def -out gencode.v7.polyAs.classes

re-format 2-way pseudogenes (from Yale) NEEDS UPDATING

(create header)

Code

awk 'BEGIN{c=0} {print $1"\tYale_UCSC\ttranscript\t"$2"\t"$3"\t.\t"$4"\t.\tgene_id \"Overlap"c"\"; transcript_id \"Overlap"c"\"; gene_type \"pseudogene\"; gene_status \"UNKNOWN\"; gene_name \"Overlap"c"\"; transcript_type \"pseudogene\"; transcript_status \"UNKNOWN\"; transcript_name \"Overlap"c"\"; level 3; tag \"2way_pseudo_cons\"; yale_id \""$5"\"; ucsc_id \""$6"\"; parent_id \""$7"\";"; c++}' yale_ucsc_2way_consensus >> gencode.v7.2wayconspseudos.GRCh37.gtf
 
nice perl svn/gencode/scripts/data_release/write_class_file.pl -in gencode.v7.2wayconspseudos.GRCh37.gtf -class classes.def -out gencode.v7.2wayconspseudos.GRCh37.classes -types transcript

create transcript sequence files

Code

foreach chr ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT )
 
    bsub -o seqs/trans_$chr.out perl svn/gencode/scripts/data_release/newfullmerge.pl -outfile seqs/trans_$chr.fa -ass GRCh37 -sequence -chrom $chr
 
end

create protein sequence files

Code

foreach chr ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT )
 
    bsub -o seqs/prot_$chr.out perl svn/gencode/scripts/data_release/newfullmerge.pl -outfile seqs/prot_$chr.fa -ass GRCh37 -sequence -protein -chrom $chr
 
end

update PAR regions in sequence files

Code

nice perl svn/gencode/scripts/data_release/update_y_ids.pl -fasta -x gencode_X.gtf -y seqs/trans_Y.fa -out seqs/trans_YY.fa
 
nice perl svn/gencode/scripts/data_release/update_y_ids.pl -fasta -x gencode_X.gtf -y seqs/prot_Y.fa -out seqs/prot_YY.fa

combine sequence files

Code

foreach chr ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X YY MT )
 
  cat seqs/prot_$chr.fa >> gencode.v7.pc_translations.fa
 
  cat seqs/trans_$chr.fa >> gencode.v7.pc_transcripts.fa
 
end

files to release to the DCC

gencode.v7.annotation.level_1_2.gtf	 

gencode.v7.annotation.level_1_2.classes  

gencode.v7.annotation.level_3.gtf	

gencode.v7.annotation.level_3.classes	

gencode.v7.polyAs.gtf

gencode.v7.polyAs.classes

gencode.v7.2wayconspseudos.gtf

gencode.v7.2wayconspseudos.classes

metadata/

  gencode_Exon_supporting_feature

  gencode_HGNC

  gencode_PDB

  gencode_Pubmed_id

  gencode_RefSeq

  gencode_Source

  gencode_SwissProt

  gencode_Transcript_supporting_feature

Code

tar -czvf gencode7_GRCh37.tgz gencode7
 
cp gencode7_GRCh37.tgz PUB_FTP/gencode/release_7/gencode7_GRCh37.tgz

It can take up to 20 minutes before the files are visible on the public FTP site.

These additional files are added to the FTP sites individually for general users:

gencode.v7.annotation.gtf.gz

gencode.v7.pc_transcripts.fa.gz

gencode.v7.pc_translations.fa.gz

gencode.v7.polyAs.gtf.gz

gencode.v7.tRNAs.gtf.gz

Code

nice gzip -c gencode.v7.pc_transcripts.fa > PUB_FTP/gencode/release_7/gencode.v7.pc_transcripts.fa.gz

etc.

Other notes:

  • After every Havana/Ensembl merge a new OTT-/ENS ID mapping should be generated and loaded into the AnnoTrack tracking system. This can be done with the script
    svn/gencode/scripts/store_id_conversion.pl

    which will read the GTF file or a list of ids and create the SQL statements. It is better to use a release file without version numbers in the Ensembl ids, as versioned ids cannot be linked to the Ensembl web site directly and the "." might break some functions in AnnoTrack. Please remember that this might create links to ids that are not "valid" until the official Ensembl release date.

    Code

    perl svn/gencode/scripts/store_id_conversion.pl -gtf -infile gencode.v7.annotation.gtf -out new_id_conversions.sql
     
    mysql -h -P -u -p -D gencode_tracking < new_id_conversions.sql
  • Also the external annotations in AnnoTrack should be updated from the new ensembl database. These are stored as custom_values with this script:

    Code

    bsub -q long -o job.out perl svn/gencode/tracking_system/perl/scripts/update_external_info.pl \
                 -coredb homo_sapiens_core_61_37f \
                 -comparadb ensembl_compara_61 \
                 -ontologydb ensembl_ontology_61

    This is looking at the live-mirror dbs by default, so either modify this or run this after the Ensembl release date.

  • Selenocysteine tags are now read directly from the database. To pull them out separately into a file for other purposes you can do:

    Code

    mysql -uensro -hens-livemirror -Dhomo_sapiens_core_60_37e -e"select tsi.stable_id, ta.value from translation_attrib ta, transcript_stable_id tsi, translation tl where tl.transcript_id=tsi.transcript_id and tl.translation_id=ta.translation_id and ta.attrib_type_id=12 order by stable_id;" | awk '{print $1"\t"$2}' > selenocystein.transcripts

]]>
FASTQ Sequence Files Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/fastq-sequence-files?blog=2 2010-12-15T09:52:00Z 2012-06-11T15:37:46Z

A good description of the FASTQ format can be found at Illumina:

"A fastq file is an ASCII encoded text file that stores DNA or RNA sequences and their corresponding IDs and quality scores. It uses unix newlines and consists of 4 lines per sequence unless wrapping occurs due to sequence length. The first line begins with an "@" followed by an identifier (ID) which acts as a label for the read/sequence, plus index and read (pair) numbers. Read numbers are 1 for single reads and 1 or 2 for paired reads. The second line represents a DNA or RNA sequence, and should consist only of standard bases, and IUPAC ambiguity codes (ACTGNURYSWKMBDHV). This line must be wrapped with newlines if the read is longer than 80nt. The third line must be a single "+" which signifies the end of the sequence, optionally followed by the identifier again. The fourth line is a quality score string showing the quality of each base in the prior sequence, represented as the ASCII character corresponding to the quality Phred score + 33. Phred scores must be 0 and 60 (ASCII chars 33 aka "!" to 93 aka "]"). The quality score must also be wrapped to multiple lines if longer than 80 characters, but must be exactly equal in length to it's corresponding sequence."

Example:

@READNAME[#index]/read_number

BASES

+READNAME[#index]/read_number

QUALITIES
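
To make the "Phred score + 33" encoding of the quality line concrete, here is a minimal Python sketch. The function names `decode_phred33` and `error_probability` are made up for this illustration; the conversion to an error probability uses the usual Phred definition Q = -10 * log10(P).

```python
def decode_phred33(quality_string):
    """Convert a Phred+33 quality string into a list of integer Phred scores."""
    return [ord(char) - 33 for char in quality_string]


def error_probability(phred_score):
    """Error probability for a Phred score Q, i.e. 10 to the power of -Q/10."""
    return 10 ** (-phred_score / 10)
```

The quality string `!5I]` decodes to the scores 0, 20, 40 and 60, and a score of 20 corresponds to an error probability of 1 in 100.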

As a sanity and QC check, the DCC of the 1000 Genomes Project applies the following rules (source):
Syntax Checks:
-Each header line begins with @
-The third line always starts with a +
-There are four lines in each entry (implied by the above two rules)
-On line 3, if a name follows the + sign, the name has to match the one found on line 1
-The sequence and quality lines are the same length
-For paired end files, the _1 and _2 files have the same number of reads in them.
-For SOLID colourspace fastq, each read starts with a base followed by a string of numbers

Sequence Checks:
-Read is longer than 35bp for Solexa, 25bp for Solid, and 30 bp for 454
-Read does not contain any N's in the first 25, 30 or 35bp
-Quality values are all 2 or higher in the first 25bp, 30bp or 35bp
-The reads contain more than one type of base in the first 25, 30, or 35bp
-Read does not contain more than 50% Ns in its whole length
-Read does not contain characters other than ATGCN (this rule does not apply to SOLID reads)
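
Some of these rules are easy to sketch in Python. The code below is only an illustration and not the DCC's implementation: the function names are made up, it assumes unwrapped records of exactly four lines, it accepts lower-case bases, and it covers only the header, "+" line, length, character and 50% N rules.

```python
def check_fastq_record(header, bases, plus_line, qualities):
    """Return a list of rule violations for one four-line FASTQ record."""
    problems = []
    if not header.startswith("@"):
        problems.append("header does not start with @")
    if not plus_line.startswith("+"):
        problems.append("third line does not start with +")
    elif len(plus_line) > 1 and plus_line[1:] != header[1:]:
        problems.append("name after + differs from header")
    if len(bases) != len(qualities):
        problems.append("sequence and quality lengths differ")
    if set(bases.upper()) - set("ACGTN"):
        problems.append("characters other than ATGCN")
    if bases and bases.upper().count("N") / len(bases) > 0.5:
        problems.append("more than 50% Ns")
    return problems


def check_fastq_lines(lines):
    """Yield (record_number, problems) for every record that breaks a rule."""
    stripped = [line.rstrip("\n") for line in lines]
    if len(stripped) % 4 != 0:
        raise ValueError("line count is not a multiple of four")
    for i in range(0, len(stripped), 4):
        problems = check_fastq_record(*stripped[i:i + 4])
        if problems:
            yield i // 4 + 1, problems
```

The platform-specific rules (minimum read length, first 25, 30 or 35 bp) would need the sequencing platform as an extra parameter and are left out here.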

Taking it further:


]]>
Ensembl Core Database Schema Diagram Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/ensembl-core-database-schema-diagram?blog=2 2010-11-26T09:23:00Z 2013-11-19T15:26:17Z

To understand the concept of Ensembl and learn how to query the tables I find it extremely useful to have a schema diagram of the database in front of me.

This can be generated by using the schema.sql and foreign_keys.sql files from the sql directory of the Ensembl API cvs checkout. After loading this data into a program like the free MySQL Workbench the tables and connections can be arranged to your liking.

Here is a pdf version I created based on Ensembl core 59 with the MySQL Workbench file.

UPDATE:
Nice schema diagrams and a description of the different tables can now be found on the Ensembl pages!


]]>
Genomic Start Coordinates Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/genomic-start-coordinates?blog=2 2010-10-23T19:01:00Z 2012-05-17T13:03:51Z

Adding to the confusion about different notations of phases/frames, the start coordinates of genomic features are also noted differently between different genome browsers and file formats.

1. One-based

Counting bases starting with "1" at the first position.

Regions are specified by a closed interval, so both the start and the end position belong to the region. Used e.g. by the Ensembl genome browser and annotation system and by the GFF/GTF, SAM and wiggle file formats.

2. Zero-based

The interbase system counts the spaces between the bases, starting with "0" in front of the first base.

Regions are specified by a half-open interval: the start is included, the end is not. Used by the UCSC genome browser, Chado (the GMOD database schema used e.g. by FlyBase) and the BED, BAM and PSL file formats.

An example:

    One-based

     1 2 3 4 5 6
     | | | | | |
     C G A T G C
    | | | | | | |
    0 1 2 3 4 5 6

    Zero-based

The ATG interval would be described as 3-5 in the first and as 2-5 in the second system.
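
Converting between the two systems only touches the start coordinate. The following Python sketch shows this for the example above; the function names are chosen for this illustration only. Python string slices happen to be zero-based and half-open as well, which makes the result easy to check.

```python
def one_to_zero_based(start, end):
    """One-based closed interval -> zero-based half-open interval."""
    return start - 1, end


def zero_to_one_based(start, end):
    """Zero-based half-open interval -> one-based closed interval."""
    return start + 1, end


seq = "CGATGC"

# ATG is 3-5 in the one-based system
start0, end0 = one_to_zero_based(3, 5)
print(start0, end0)       # 2 5
print(seq[start0:end0])   # ATG

# The length is end - start in the zero-based system
# and end - start + 1 in the one-based system.
print(end0 - start0)      # 3
```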


]]>
Gene Models (and the Central Dogma of Molecular Biology) Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/gene-models-and-gene-prediction?blog=2 2010-10-23T18:50:08Z 2011-05-13T08:22:19Z

What is a Gene Model?

I found the following text on the teaching pages of Prof. Ann Loraine and found it worth repeating (slightly modified) here:

Gene models are hypotheses about the structure of transcripts produced by a gene. Like all models, they may be correct, partly correct, or entirely wrong. Typically, with evidence-based gene-prediction programs, we use information from ESTs (expressed sequence tags), cDNAs or RNA-Seq reads to evaluate or create gene models. Alternatively, models can be derived from the genomic sequence alone by looking for well-known characteristics of gene sequences (open reading frames, splice sites, stop codons, etc.). This approach is called ab initio gene prediction.

It's important to remember at all times that a gene model is only that: a model.

To understand what a gene model represents, you need to refresh your memory about how transcription, RNA splicing, and polyadenylation operate.

Most protein-coding genes in eukaryotic organisms (like humans, the research plant Arabidopsis thaliana, fruit flies, etc.) are transcribed into RNA by an enzyme complex called RNA polymerase II, which binds to the five prime end of a gene in its so-called promoter region. The promoter region typically contains binding sites for transcription factors that help the RNA polymerase complex recognize the position in the genomic DNA where it should begin transcription. Many genes have multiple places in the genomic DNA where transcription can begin, and so transcripts arising from the same gene may have different five-prime ends. Transcripts arising from the same gene that have different transcription start sites are said to come from alternative promoters.

Once the RNA polymerase complex binds to the five prime end of a gene, it can begin building an RNA copy of the DNA sense strand via the process known as transcription. The ultimate product of transcription is thus called a transcript. During and after transcription, another large complex of proteins and non-coding RNAs called the spliceosome attaches to the growing RNA molecule, cuts out segments of RNA called introns, and joins together (splices) the flanking sequences, which are called exons. Not every newly synthesized transcript is processed in this way; sometimes no introns are removed at all. Genes whose products do not undergo splicing are often called single-exon genes.

Also, splicing may remove different segments from transcripts arising from the same gene. This variability in splicing patterns is called alternative splicing. In addition to splicing, RNA transcripts undergo another processing reaction called polyadenylation.

In polyadenylation, a segment of sequence at the 3-prime end of the RNA transcript is cut off, and a polymer consisting of adenosine residues called a polyA tail is attached to the 3-prime end of the transcript. The length of the polyA tail may vary a lot from transcript to transcript, and the position where it is added may also differ. Genes whose transcripts can receive a polyA tail at more than one location are said to be subject to alternative polyadenylation or alternative 3-prime end processing. One of the functions of this polyA tail is thought to be increased stability of the transcript.

These processing reactions are believed to take place in the nucleus. Ultimately, most of the mature or maturing RNA transcripts are exported from the nucleus into the cytoplasm, where they will be translated by ribosomes into proteins, chains of amino acids that perform work in the cell (such as enzymes) or that provide form and structure (like actin in the cytoskeleton).

The continuous sequence of bases in an RNA that encodes a protein is called a coding region, and the coding region typically starts with an AUG codon and terminates with one of three possible stop codons. The segments of sequence that make up a coding region are called CDSs and they generally occupy the same sequences as the exons, apart from the untranslated regions five prime of the start codon and three prime of the stop codon.

Most RNAs code for one protein sequence, but there are some interesting exceptions in which one mature mRNA may contain more than one translated open reading frame. The three bases where the ribosome initiates translation are called a start codon and the triplet of bases immediately following the last translated codon is called the stop codon. The start codon typically encodes the amino acid methionine, and the stop codon doesn't code for any amino acid.
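
The start and stop codon rules can be turned into a very naive open reading frame search. The sketch below is a toy example with a made-up function name: it takes the first AUG that is followed in frame by a stop codon and ignores everything else that influences real translation initiation. The returned interval is zero-based, half-open and includes the stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_coding_region(mrna):
    """Return (start, end) of the first AUG...stop ORF, or None.

    Coordinates are zero-based and half-open; the stop codon is
    included in the interval. DNA input (T instead of U) is accepted.
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    while start != -1:
        # Walk codon by codon in the frame set by this AUG
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
        # No in-frame stop found, try the next AUG
        start = mrna.find("AUG", start + 1)
    return None


print(find_coding_region("GGAUGGCCUAAGG"))  # (2, 11)
```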

A gene model thus consists of a collection of introns and exons and their locations in the genomic sequence, as well as the location of the translated region or regions. Thus, a gene model implies a theory about where the RNA polymerase started transcription, as well as the location of the polyadenylation site and the starts and stops of translation. Usually, we draw gene models as showing the location of introns and exons relative to the genomic sequence, as if we are mapping the RNA copy back onto the genomic DNA itself.
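
As a data structure, such a model can be reduced to a list of exon coordinates plus the bounds of the translated region; introns and CDS segments then follow from these. The class below is a minimal sketch under assumptions made for this example only: the names are invented, coordinates are one-based closed genomic positions, and there is a single translated region per transcript.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # one-based, closed genomic coordinates


@dataclass
class TranscriptModel:
    exons: List[Interval]
    cds_start: int  # leftmost translated genomic position
    cds_end: int    # rightmost translated genomic position

    def introns(self) -> List[Interval]:
        """Introns are the gaps between consecutive exons."""
        exons = sorted(self.exons)
        return [(left[1] + 1, right[0] - 1)
                for left, right in zip(exons, exons[1:])]

    def cds_segments(self) -> List[Interval]:
        """Parts of the exons that lie inside the translated region."""
        segments = []
        for start, end in sorted(self.exons):
            seg_start = max(start, self.cds_start)
            seg_end = min(end, self.cds_end)
            if seg_start <= seg_end:
                segments.append((seg_start, seg_end))
        return segments


model = TranscriptModel(exons=[(100, 200), (300, 400), (500, 600)],
                        cds_start=150, cds_end=550)
print(model.introns())       # [(201, 299), (401, 499)]
print(model.cds_segments())  # [(150, 200), (300, 400), (500, 550)]
```

A gene with alternative splicing or alternative promoters would simply carry several such transcript models.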

This text nicely describes the classical central dogma of (molecular) biology (DNA -> RNA -> protein), what gene models are and some thoughts about gene prediction at the same time...


]]>
Repeat Finding and Masking Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/repeat-finding-and-masking?blog=2 2010-10-13T10:27:00Z 2015-07-21T16:26:17Z

What are genomic interspersed repeats? [from the RepeatMasker documentation] In the mid 1960s scientists discovered that many genomes contain stretches of highly repetitive DNA sequences (see Reassociation Kinetics Experiments and C-Value Paradox). These sequences were later characterized and placed into five categories:

  1. Simple Repeats - Duplications of simple sets of DNA bases (typically 1-5bp) such as A, CA, CGG etc.
  2. Tandem Repeats - Typically found at the centromeres and telomeres of chromosomes, these are duplications of more complex 100-200 base sequences.
  3. Segmental Duplications - Large blocks of 10-300 kilobases that have been copied to another region of the genome.
  4. Interspersed Repeats
    1. Processed Pseudogenes, Retrotranscripts, SINEs - Non-functional copies of RNA genes which have been reintegrated into the genome with the assistance of a reverse transcriptase.
    2. DNA Transposons
    3. Retrovirus Retrotransposons
    4. Non-Retrovirus Retrotransposons (LINEs)

Currently up to 50% of the human genome is repetitive in nature and as improvements are made in detection methods this number is expected to increase.
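
The first category in the list above, simple repeats, is easy to illustrate with a regular expression. The following Python sketch is a toy scan, not the way RepeatMasker or TRF work internally; the function name and the min_copies threshold are assumptions made for this example.

```python
import re


def find_simple_repeats(seq, max_unit=5, min_copies=4):
    """Find perfect tandem copies of a short unit (1 to max_unit bases).

    Returns a list of (start, end, unit) tuples with zero-based,
    half-open coordinates. Only exact copies are detected.
    """
    # Lazy unit match, so CACACACA is reported as (CA)n and not (CACA)n
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


print(find_simple_repeats("GGCACACACACATT"))  # [(2, 12, 'CA')]
```

Real repeat finders also have to cope with diverged, interrupted copies, which is why they rely on alignment or statistical models rather than exact matching.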

Software for repeat identification

  • The best known program is RepeatMasker (Arian Smit, Washington University), which screens DNA sequences for interspersed repeats and low complexity DNA sequences. Sequence comparisons in RepeatMasker are performed by the program cross_match, an efficient implementation of the Smith-Waterman-Gotoh algorithm developed by Phil Green. Alternatively, WU-BLAST can be used for faster processing. A web-based analysis can be carried out at repeatmasker.org, but the sequence size limit there is 100kb.
  • Recon and RepeatScout are (less well-maintained) software packages for de novo repeat finding.
  • Dust is a program for filtering low complexity regions from nucleic acid sequences and has been used within BLAST for many years. (Paper in J. Comp. Biol.)
  • TRF is the "Tandem repeats finder". (Paper in NAR)

The default parameters for RepeatMasker as part of the Ensembl gene-prediction pipeline, e.g. for mouse, are:

-nolow -species mouse -s

Further reading: Table in Nature with different programs. See also Tarailo-Graovac and Chen "Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences" in Current Protocols in Bioinformatics, March 2009. See also this RepeatMasker readme at animalgenome.org.


]]>
RNA-Seq data quality scores Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/rna-seq-data-quality-scores?blog=2 2010-02-26T12:52:00Z 2013-04-22T16:09:41Z

There are different ways to encode the quality scores in FASTQ files from next-generation sequencing machines. It is important to find out which encoding was used before working with the data and to convert between formats if necessary.

  • Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126 (although in raw read data the Phred quality score rarely exceeds 60, higher scores are possible in assemblies or read maps).
  • Illumina 1.3+ format can encode a Phred quality score from 0 to 62 using ASCII 64 to 126 (although in raw read data Phred scores from 0 to 40 only are expected).
  • Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score from -5 to 62 using ASCII 59 to 126 (although in raw read data Solexa scores from -5 to 40 only are expected).

  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126


 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
    with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)

Source: wikipedia
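
The character ranges in the diagram suggest a simple heuristic for finding out which encoding a file uses: look at the lowest quality character that occurs. The Python sketch below is an assumption-laden illustration (the function name and the returned labels are made up here), and it can only give a likely answer. A file that happens to contain high-quality bases only may be misclassified, so it should be run on many reads.

```python
def guess_quality_encoding(quality_strings):
    """Guess the quality encoding from the lowest character seen.

    Raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(char) for qual in quality_strings for char in qual)
    if lowest < 59:
        # Characters below ';' only occur with an offset of 33
        return "Phred+33"
    if lowest < 64:
        # ';' to '?' correspond to the negative Solexa scores
        return "Solexa+64"
    # '@' and above: most likely Phred+64, but not certain
    return "Phred+64"


print(guess_quality_encoding(["IIII5555", "!!##IIII"]))  # Phred+33
```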

For a simple look-up from ASCII to numeric scores in the Sanger (Phred+33) encoding you can use the following list:

ASCII	numeric		ASCII	numeric
!	0		@	31
"	1		A	32
#	2		B	33
$	3		C	34
%	4		D	35
&	5		E	36
'	6		F	37
(	7		G	38
)	8		H	39
*	9		I	40
+	10		J	41
,	11		K	42
-	12		L	43
.	13		M	44
/	14		N	45
0	15		O	46
1	16		P	47
2	17		Q	48
3	18		R	49
4	19		S	50
5	20		T	51
6	21		U	52
7	22		V	53
8	23		W	54
9	24		X	55
:	25		Y	56
;	26		Z	57
<	27		[	58
=	28		\	59
>	29		]	60
?	30		^	61
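
The list is nothing more than the ASCII code of each character minus the offset of 33, so the same look-up can be done in one line of Python. The function name and the offset parameter are illustrative; an offset of 64 would be used for the Illumina 1.3+ and Solexa encodings instead.

```python
def decode_qualities(quality_string, offset=33):
    """Translate a FASTQ quality string into numeric scores."""
    return [ord(char) - offset for char in quality_string]


print(decode_qualities("!+5?I"))            # [0, 10, 20, 30, 40]
print(decode_qualities("@h", offset=64))    # [0, 40]
```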

You can convert the Solexa read quality to Sanger read quality with Maq:

maq sol2sanger s_1_sequence.txt s_1_sequence.fastq

where s_1_sequence.txt is the Solexa read sequence file. Skipping this step will lead to unreliable SNP calling when aligning reads with Maq.

Source: maq-manual
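
To get a feeling for what such a conversion involves: Solexa scores are based on the odds p/(1-p) of an error instead of the error probability p itself, and the commonly published relation between the two scales is Q_phred = 10 * log10(10^(Q_solexa/10) + 1). The Python sketch below applies this relation to single characters. It is an illustration of the formula with made-up function names, not a reimplementation of Maq, and the rounding used here is an assumption.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa quality score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa_char_to_sanger_char(char):
    """Re-encode one Solexa+64 character as a Phred+33 character."""
    q_phred = round(solexa_to_phred(ord(char) - 64))
    return chr(q_phred + 33)


print(round(solexa_to_phred(0), 2))        # 3.01
print(solexa_char_to_sanger_char("h"))     # I  (Solexa 40 -> Phred 40)
```

The two scales differ noticeably for low scores only; above a score of about 10 they are practically identical.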

Phred itself is a base calling program for DNA sequence traces developed during the initial automation phase of the sequencing of the human genome.
After calling bases, Phred examines the peaks around each base call to assign a quality score to each base call. Quality scores range from 4 to about 60, with higher values corresponding to higher quality. The quality scores are logarithmically linked to error probabilities, as shown in the following table:

Phred quality	Probability of		Accuracy of
score		wrong base call		base call
10 		1 in 10 		90%
20 		1 in 100 		99%
30 		1 in 1,000 		99.9%
40 		1 in 10,000 		99.99%
50 		1 in 100,000 		99.999%

"High quality bases" are usually scores of 20 and above ("Phred20 score").

You can read the original publications about the Phred program and scoring by Brent Ewing et al. from Phil Green's lab here and here.

Source: www.phrap.com
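
The logarithmic link shown in the table above is Q = -10 * log10(P), with P being the probability of a wrong base call. A short Python sketch with illustrative function names:

```python
import math


def phred_to_error_probability(quality):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Phred quality score for a given error probability."""
    return -10 * math.log10(probability)


for quality in (10, 20, 30, 40, 50):
    p_error = phred_to_error_probability(quality)
    print(quality, p_error, "%.3f%%" % (100 * (1 - p_error)))
```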


]]>
ENCODE cell lines Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/encode-cell-lines?blog=2 2010-02-24T15:08:42Z 2010-08-25T14:28:50Z

These are some of the cell lines that are used in the various analyses of the ENCODE project. The first two are so-called tier-1 lines, which are covered by all the different types of experiments within ENCODE; the others are tier-2 lines. Additionally, there are a number of tier-3 cell lines.

  • GM12878 is a lymphoblastoid cell line produced from the blood of a female donor with northern and western European ancestry by EBV transformation. It was one of the original HapMap cell lines and has been selected by the International HapMap Project for deep sequencing using the Solexa/Illumina platform. This cell line has a relatively normal karyotype and grows well. Choice of this cell line offers potential synergy with the International HapMap Project and genetic variation studies. It represents the mesoderm cell lineage.
  • K562 is an immortalized cell line produced from a female patient with chronic myelogenous leukemia (CML). It is a widely used model for cell biology, biochemistry, and erythropoiesis. It grows well, is transfectable, and represents the mesoderm lineage.
  • HepG2 is a cell line derived from a male patient with liver carcinoma. It is a model system for metabolism disorders and much data on transcriptional regulation have been generated using this cell line. It grows well, is transfectable, and represents the endoderm lineage.
  • HeLa-S3 is an immortalized cell line that was derived from a cervical cancer patient. It grows extremely well in suspension and is transfectable. It represents the ectoderm lineage. Many data sets were produced using this cell line during the pilot phase of the ENCODE Project. In addition, these cells have been widely used in biochemical and molecular genetic studies of gene function and regulation.
  • HUVEC (human umbilical vein endothelial cells) have a normal karyotype and are readily expandable to 10^8-10^9 cells. They represent the mesoderm lineage.
  • Keratinocytes have a normal karyotype and are readily expandable to 10^8-10^9 cells. They represent the ectoderm lineage.
  • H1 human embryonic stem cells.

Source

Full list


]]>
AnnoTrack: Data maintenance Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/annotrack-notes?blog=2 2009-12-08T12:26:12Z 2011-02-08T15:05:51Z

This document is part of the administrator documentation for the AnnoTrack software for genome annotation tracking.

Regular updates

The following Perl scripts update the data and reset priorities and flags. They usually update Havana annotation data, but all other sources can be checked as well by activating their entries in the config file. The scripts run as a cron job every night, but can also be run manually if needed. The cron job is executed from svn/gencode/tracking_system/perl/scripts/cron_jobs.pl.

The general procedure (which can also be used to push new data into the system) is:

  1. update.pl - update all sources specified in the config
  2. set_priorities.pl - update the active flags on transcripts and set appropriate priorities. This also adds new flag types to the tmp_values table to keep count.
  3. set_relations.pl - create links between flagged transcripts in the same genomic region
  4. cron_queries.sh - SQL queries that update the counts shown on the front page

Common parameters are:

  1. env defines the target database as

    • prod or human: main human production database
    • dev: test database with human data on the mcs4a db server
    • zfish: zebrafish production db
    • mouse: mouse production db
  2. write: connect with write access and store changes in chosen db
  3. verbose: write a lot of output for testing

Running data update scripts:

Code

perl svn/gencode/tracking_system/perl/scripts/update.pl -env proc -core -write

Run this as a farm job; the sources to be updated are defined in config.pm. Then set the active flags and priorities based on the flags:

Code

perl svn/gencode/tracking_system/perl/scripts/set_priorities.pl -env proc -core -write

Specific updates

  1. Our Ensembl friends regularly compare CCDS, exon, intron and cDNA features between Ensembl and Havana annotations. This generates text files with locations and IDs that need to be reloaded into AnnoTrack. There are specific source modules for these files, so adjusting the config.pm file (for the affected source definition: pointing the "file" hash entry to the new file and setting the "active" flag to "1") and running the update.pl script should be sufficient.
  2. After every Havana/Ensembl merge, a new OTT-/ENS-ID mapping should be generated and loaded into the AnnoTrack tracking system. This can be done with the script svn/gencode/scripts/store_id_conversion.pl, which reads the GTF file or a list of IDs and creates the SQL statements.

    Code

    perl svn/gencode/scripts/store_id_conversion.pl -gtf -infile current_freeze.gtf -out new_id_conversions.sql
     
    mysql -h -P -u -p -D gencode_tracking < new_id_conversions.sql

Adding new data

  1. Importing Ensembl objects

    If an important gene model is missing from Havana but was annotated by Ensembl, it can be imported into AnnoTrack with the script svn/gencode/tracking_system/perl/scripts/import_from_ensembl.pl and the following options:

    Code

    perl import_from_ensembl.pl -user Felix
                                -category Ensembl
                                -gene ENSG00012048
                             (or)
                                -transcript ENST00309486
                                -flag manual_selection
                                -note "important gene"

    Setting a flag (with the chosen flag-name) and adding a note (that will be displayed next to the flag) are optional.

  2. Importing via DAS

    A number of GENCODE sources were imported from external DAS servers. For updates or new sources these source adaptors should be checked at svn/gencode/tracking_system/perl/modules/gencode_tracking_system/sources/

  3. Importing from a file

    There are source adaptors for reading tab-delimited files (tab_file.pm) and GTF files (which can also be used for GFF3). Have a look at the source code of the parser in case it needs slight modifications to read your file format.

  4. Importing via other sources

    If there are new types of data sources that do not fit the above categories, a new source adaptor has to be created. The best way to do this is to copy and modify an existing one from svn/gencode/tracking_system/perl/modules/gencode_tracking_system/sources/.

  5. Creating new entries through the web interface is possible but not recommended. A gene can be added on this admin page (Trackers: only Features is required, Modules: only Issue tracking is required); transcripts can then be added using the URL format "annotrack.sanger.ac.uk/human/projects/NEW-GENE-ID/issues/new".

For all imports with the update.pl script an entry describing the new data source needs to be created in the svn/gencode/tracking_system/perl/modules/gencode_tracking_system/config.pm config file. A hash "%OTHER_SERVERS" contains an entry for every source name with the parameters required:

  • active - set to "1" to include the source in the update procedure, all others should be set to "0"
  • dns/type/proxy - the server definitions for DAS sources
  • user_name - the login name from the users table
  • category - a name for the new data, usually the same as the source name itself
  • detached -
  • by_chrom - does the update need to be performed chromosome-by-chromosome? (for slow DAS servers)
  • description - a short description of the data source
  • update_function - name of the module used for the update, e.g. "gtf" or "missing_ccds"
  • data_type - name of the feature type, e.g. "UCSC_novel_genes"

Working with flags

Flags are the most important feature of the system; they define what problems we are focusing on.

New flags can be set:

  • Through the web interface (see image 1) by any logged-in user by clicking on "add flag" on a transcript page
  • Through the web interface using a list of IDs (e.g. "OTTHUMT00000334332") with this form.
  • Through the Perl script svn/gencode/tracking_system/perl/scripts/set_flags_from_file.pl. Other scripts (e.g. import_from_ensembl.pl) can also have an option to set new flags on the features they are working with.

If the same type of flag was already set and not resolved yet, the scripts should NOT set another flag.
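
The rule above can be sketched in a few lines of Python. This is only an illustration: the Flag class, the should_set_flag function and the use of "missing_ccds" as a flag type are hypothetical and not taken from the AnnoTrack scripts, which work against the flags table in the database.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    """Hypothetical in-memory stand-in for one row of the flags table."""
    transcript_id: str
    flag_type: str
    resolved: bool = False


def should_set_flag(existing, transcript_id, flag_type):
    """Return True only if the transcript has no unresolved flag of this type."""
    return not any(
        f.transcript_id == transcript_id
        and f.flag_type == flag_type
        and not f.resolved
        for f in existing
    )


# Example data (assumed): one open flag and one resolved flag on the same transcript.
flags = [
    Flag("OTTHUMT00000334332", "manual_selection"),
    Flag("OTTHUMT00000334332", "missing_ccds", resolved=True),
]
```

With this data a second "manual_selection" flag would be skipped, while a new "missing_ccds" flag would be allowed because the earlier one is already resolved.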

To resolve flags

  • On the web interface the flags for every transcript can be resolved individually by clicking on the check/deny images next to them
  • or multiple flags at once by activating the checkbox and clicking on the check/deny images below the list of flags
  • programmatically, multiple flags can be resolved with the script svn/gencode/tracking_system/perl/scripts/resolve_flags.pl and a text file of solutions. Please check the perldocs.

image 1: resolving flags through the web interface

New types of flags can be created here. This creates an entry in the flags table (with issue_id=-1) and in the tmp_values table where stats are stored. Also check the list of all flag types and their priorities.

The description of flag types can be updated here.

In general it's a good idea to run new updates / imports against the development environment / test database first (by setting $ENV = "dev" in the config file or passing -env dev to the scripts). Changes can then be checked in the database or on a test server first (at the Sanger at http://web-annotrack.internal.sanger.ac.uk:8000/human/).


]]>
Conditional Formatting in Ms Excel Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/conditional-formatting-in-ms-excel?blog=2 2009-11-12T14:20:01Z 2009-11-16T10:34:34Z

To change the format of a cell based on the content of that or another cell, conditional formatting can be used.

  1. For simple things and up to three options, the dialog "Format"-"Conditional Formatting" can be called after selecting the target cell. You can select
    • "Value" to use the content of the cell
    • "Formula" to insert any Excel formula, e.g. =FIND("needle", A3)
    and then choose the desired style (font, background etc.).
  2. For other functions you can write a macro in VBA (Visual Basic for Applications). Choose "Tools"-"Macro"-"Visual Basic Editor". In the editor, right-click on the "VBAProject" in the project box and add a module. Code away; an example that changes the background color based on the occurrence of certain strings is given below. It can be run directly from the editor or from the worksheet ("Tools"-"Macro" menu).

Code

Sub Color_groups()

    ' Color each cell in A2:A1000 according to the group prefix it contains.
    Dim MyPlage As Range
    Dim Cell As Range

    Set MyPlage = Range("A2:A1000")

    For Each Cell In MyPlage
        ' InStr returns the 1-based position of the match, or 0 if there is none.
        If InStr(1, Cell.Value, "Vic_") > 0 Then
            Cell.Interior.ColorIndex = 3
        ElseIf InStr(1, Cell.Value, "Tyl_") > 0 Then
            Cell.Interior.ColorIndex = 4
        ElseIf InStr(1, Cell.Value, "Wol_") > 0 Then
            Cell.Interior.ColorIndex = 6
        ElseIf InStr(1, Cell.Value, "Sim_") > 0 Then
            Cell.Interior.ColorIndex = 7
        ElseIf InStr(1, Cell.Value, "Sea_") > 0 Then
            Cell.Interior.ColorIndex = 8
        ElseIf InStr(1, Cell.Value, "Mar_") > 0 Then
            Cell.Interior.ColorIndex = 15
        ElseIf InStr(1, Cell.Value, "Lio_") > 0 Then
            Cell.Interior.ColorIndex = 17
        End If
    Next Cell

End Sub

]]>
Caching in ENSEMBL Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/caching-in-ensembl?blog=2 2009-11-11T17:49:00Z 2012-05-17T13:09:16Z

How to avoid falling in the cache...

Caching is a powerful way to speed up queries to the Ensembl database. It can become a problem, however, for example if you repeat a query multiple times but have updated the data set in between. It is important to know how to turn caching off if needed, although this is not officially documented.

To turn the caching off on the MySQL server:

Code

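# Get any adaptor to reach the underlying database handle.
my $sa  = $reg->get_adaptor($species, "core", "slice");

# Switch the MySQL query cache off for this session only.
my $sth = $sa->dbc->db_handle->prepare("SET SESSION query_cache_type = OFF");
$sth->execute || die "set session failed\n";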

To reset the caches in the Perl API:

Code

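# Note: $registry (the Ensembl registry) must be in scope.
sub free_caches{
  my $species = shift;
  my $group = shift;

  # Walk over every adaptor registered for this species and group.
  foreach my $adap (@{$registry->get_all_adaptors(-species => $species, -group => $group)}){
    # Drop the per-slice feature cache.
    $adap->{'_slice_feature_cache'} = undef;

    # Drop the generic adaptor cache, if there is one.
    if(defined($adap->{'cache'})){
      $adap->{'cache'} = undef;
    }

    # Replace the seq_region cache with a fresh one and re-point the lookups.
    if(defined($adap->{'seq_region_cache'})){
      my $seq_region_cache = $adap->{'seq_region_cache'} =
        Bio::EnsEMBL::Utils::SeqRegionCache->new();

      $adap->{'sr_name_cache'} = $seq_region_cache->{'name_cache'};
      $adap->{'sr_id_cache'}   = $seq_region_cache->{'id_cache'};
    }
  }
}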

Source: Ian Longden, EBI


]]>
Proserver Setup Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/proserver-setup?blog=2 2009-11-02T13:38:36Z 2010-07-28T11:43:34Z

Installing and Running Proserver to serve data via DAS

The Distributed Annotation System (DAS) is an elegant way of sharing data and using data from diverse sources. More information at http://www.biodas.org and on these blog pages. The Proserver is a lightweight software system to provide your data as a DAS source.

  1. Download from http://proserver.svn.sf.net/

    or

    Code

    svn co https://proserver.svn.sf.net/svnroot/proserver/trunk Bio-Das-ProServer

  2. Move it to your favorite location
  3. Build:

    Code

    cd Bio-Das-ProServer
    perl Build.PL
    ./Build
    ./Build test
    (optional:) ./Build install
  4. Run:

    Code

    eg/proserver -x -c eg/local.ini

Adjust the ini file with the source you want to serve, e.g.:

Code

[otter_das]
state        = on
adaptor      = otter_das
title        = Havana manual annotations
description  = A DAS source that provides access to the Havana annotation.
coordinates  = NCBI_36,Chromosome,Homo sapiens => 21:25673390,25733000
dsncreated   = 2008-03-11
maintainer   = felix@work.ac.uk
doc_href     = http://www.dasregistry.org/showProjectDetails.jsp?project_id=80
host         = otterlive
user         = username
port         = 3306
dbname       = loutre
driver       = mysql

Dependencies to re-install:

The compression libraries (Bundle-Compress-Zlib, Compress::Zlib and related modules, see http://search.cpan.org/dist/Compress-Raw-Zlib/lib/Compress/Raw/Zlib.pm) must match each other's versions to avoid errors like "does not match bootstrap parameter".

Links:

Source & Full Guide

Sanger Institute pages


]]>
Assessing Gene Predictions Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/assessing-gene-predictions?blog=2 2009-11-02T09:32:00Z 2013-07-10T09:47:58Z

To compare gene predictions to a reference gene set (and similar tasks), the commonly used measures for calculating the prediction rate are specificity (precision) and sensitivity (recall) (Burset and Guigo, Genomics 34, 353-367, 1996).

 Specificity = TP / (TP + FP)

 Sensitivity = TP / (TP + FN)

with

 TP = true positives (correctly identified)

 FP = false positives (overpredictions)

 TN = true negatives  (correctly un-called)

 FN = false negatives (missed)

You can calculate a combined score like

  Score = Specificity x Sensitivity / 2

To assess base-coverage:

Correlation Coefficient:

\[
CC = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}}
\]

See also this text by Roderic Guigo.

Alternatively you can use the combined F1 score:

    F1 = 2 x Specificity x Sensitivity / (Specificity + Sensitivity)

Defined by van Rijsbergen in 1979, Source
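
As a rough sketch, the measures above can be computed from the four counts in Python. The function name prediction_metrics and the example counts are made up for illustration. Specificity is taken in the precision sense, TP / (TP + FP), as in the opening paragraph, and all denominators are assumed to be non-zero.

```python
import math


def prediction_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity (precision), CC and F1 from raw counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true features that were found
    specificity = tp / (tp + fp)   # precision: share of predictions that are correct
    # Correlation coefficient over all four cells of the confusion matrix.
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = ((tp * tn) - (fn * fp)) / denominator
    # F1 is the harmonic mean of specificity (precision) and sensitivity.
    f1 = 2 * specificity * sensitivity / (specificity + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "cc": cc,
        "f1": f1,
    }


# Hypothetical base-level counts for one prediction set.
metrics = prediction_metrics(tp=80, fp=20, tn=880, fn=20)
```

For these example counts sensitivity, specificity and F1 all come out at 0.8, and the correlation coefficient at about 0.78.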


]]>
Improving Website performance Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/improving-website-performance?blog=2 2009-10-29T10:54:06Z 2012-02-16T11:40:41Z

Some general notes on ways to shorten the response time of your web site.

1. Make fewer HTTP requests - Reducing 304 requests with Cache-Control Headers

2. Use a CDN

3. Use a customized php.ini - Creating and using a custom PHP.ini

4. Add an Expires header

- Caching with mod_expires on Apache

- Using an .htaccess file with:

Code

<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
Header set Expires "Thu, 15 Apr 2010 20:00:00 GMT"
</FilesMatch>

(source) or better in your httpd.conf file as described here.
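
A hard-coded Expires date like the one above goes stale and has to be edited by hand. As a small sketch, assuming you generate the header value yourself (the function name expires_header is hypothetical), Python's standard library can produce the HTTP date format for a date a given number of days ahead:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(days_ahead, now=None):
    """Return an HTTP-date string `days_ahead` days after `now` (UTC)."""
    # Fall back to the current UTC time if no reference time is given.
    now = now or datetime.now(timezone.utc)
    # usegmt=True renders the zone as "GMT", as HTTP headers expect.
    return format_datetime(now + timedelta(days=days_ahead), usegmt=True)


# Thirty days after 16 March 2010, 20:00 UTC.
example = expires_header(30, datetime(2010, 3, 16, 20, 0, 0, tzinfo=timezone.utc))
```

Here example holds the same value as in the .htaccess snippet, "Thu, 15 Apr 2010 20:00:00 GMT". On Apache, mod_expires does this relative calculation for you.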

5. Gzip components

- http://askapache.info/2.0/mod/mod_deflate.html

- or with an .htaccess file:

Code

<IfModule mod_gzip.c>
   mod_gzip_on Yes
   mod_gzip_dechunk Yes
   mod_gzip_item_include file \.(html?|txt|css|js|php|pl|jpg|png|gif|xml)$
   mod_gzip_item_include handler ^cgi-script$
   mod_gzip_item_include mime ^text/.*
   mod_gzip_item_include mime ^application/x-javascript.*
   mod_gzip_item_exclude mime ^image/.*
   mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</IfModule>

source

6. Put CSS at the top, in the head

7. Move JavaScript to the bottom

8. Avoid CSS expressions, keep it simple

9. Make CSS and unobtrusive JavaScript external files, not inline

10. Reduce DNS lookups - Use a static IP address, use a subdomain with a static IP address for static content.

11. Minimize JavaScript - Refactor the code, compress with dojo

12. Avoid external redirects - Use internal redirection with mod_rewrite, The correct way to redirect with 301

13. Turn off ETags - Prevent Caching with htaccess

14. Make AJAX cacheable and small

Source: Firebug-extension & http://www.askapache.com/web-cache/top-methods-for-faster-speedier-web-sites.html


]]>
awk Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/awk?blog=2 2009-09-02T14:30:16Z 2011-02-24T13:45:29Z

awk is an extremely useful Unix tool for quick command-line tasks, in particular in combination with other commands like grep or sort.

"AWK is a data-driven programming language designed for processing text-based data, either in files or data streams. It is an example of a programming language that extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions." [wikipedia]

Built-in Variables

  • ARGC

    The number of command line arguments (not including options or the awk program itself).

  • ARGV

    The array of command line arguments. The array is indexed from 0 to ARGC - 1. Dynamically changing the contents of ARGV can control the files used for data.

  • CONVFMT

    The conversion format to use when converting numbers to strings.

  • ENVIRON

    An array containing the values of the environment variables. The array is indexed by variable name, each element being the value of that variable. Thus, the environment variable HOME would be in ENVIRON["HOME"]. Its value might be `/u/close'. Changing this array does not affect the environment seen by programs which awk spawns via redirection or the system function. Some operating systems do not have environment variables. The array ENVIRON is empty when running on these systems.

  • FILENAME

    The name of the current input file. If no files are specified on the command line, the value of FILENAME is `-'.

  • FNR

    The input record number in the current input file.

  • FS

    The input field separator, a blank by default.

    Using multiple alternative field separators:

    FS="\t|=|;" (nawk)

  • NF

    The number of fields in the current input record.

  • NR

    The total number of input records seen so far.

  • OFMT

    The output format for numbers for the print statement, "%.6g" by default.

  • OFS

    The output field separator, a blank by default.

  • ORS

    The output record separator, by default a newline.

  • RS

    The input record separator, by default a newline. RS is exceptional in that only the first character of its string value is used for separating records. If RS is set to the null string, then records are separated by blank lines. When RS is set to the null string, then the newline character always acts as a field separator, in addition to whatever value FS may have.

  • RSTART

    The index of the first character matched by match; 0 if no match.

  • RLENGTH

    The length of the string matched by match; -1 if no match.

  • SUBSEP

    The string used to separate multiple subscripts in array elements, by default "\034".
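
For readers who think more easily in Python, the interplay of FS, NF and NR can be mimicked as below. This is only an analogy and not how awk is implemented; the function name awk_like_fields and the sample line are invented. Note that awk numbers fields from 1 ($1, $2, ...), while the Python list starts at index 0.

```python
import re


def awk_like_fields(lines, fs=r"\t|=|;"):
    """Yield (NR, NF, fields) per record, splitting on the regex `fs`."""
    for nr, line in enumerate(lines, start=1):
        record = line.rstrip("\n")   # RS: one record per line
        # An empty record has no fields (NF is 0), as in awk.
        fields = re.split(fs, record) if record else []
        yield nr, len(fields), fields


# One made-up input line using all three separators from the FS example.
records = list(awk_like_fields(["chr1\tgene_id=g1;note=x\n", "\n"]))
```

The first record splits into five fields (chr1, gene_id, g1, note, x), and the empty second record has none.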

String functions

  • index(in, find)
  • length(string)
  • match(string, regexp)
  • split(string, array, fieldsep)
  • sprintf(format, expression1,...)
  • sub(regexp, replacement, target)
  • gsub(regexp, replacement, target)
  • substr(string, start, length)
  • tolower(string)
  • toupper(string)

source: http://people.cs.uu.nl/piet/docs/nawk/nawk_toc.html


]]>
The GTF Format Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/the-gtf-format?blog=2 2009-07-09T11:54:00Z 2010-07-23T13:02:46Z

GTF stands for Gene Transfer Format. It borrows from GFF, but has additional structure that warrants a separate definition and format name. The current version is 2.2.

The structure follows GFF, with tab-separated fields:

Code

<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]

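Each attribute is a key-value pair, with the key and the value separated by a single space. Textual values are enclosed in double quotes.

Each attribute ends with a semicolon, and multiple attributes are separated by "; ".

The attributes list must start with gene_id and transcript_id.

Example line: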

seq1     BLASTX  similarity   101  235 87.1 + 0  gene_id "gene-0"; transcript_id "transcript-0-1"; gene_name "Frst1"; expression 1;
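As a minimal sketch of how such a line can be taken apart, the following Python splits the eight fixed fields and collects the attributes into a dictionary. The function name parse_gtf_line and the field list are my own choices for illustration, the sketch assumes tab-separated fields, and it does not handle trailing comments.

```python
import re

# Names of the eight fixed columns, in order.
GTF_FIELDS = ["seqname", "source", "feature", "start", "end",
              "score", "strand", "frame"]

def parse_gtf_line(line):
    """Split one tab-separated GTF line into fixed fields plus an attribute dict."""
    parts = line.rstrip("\n").split("\t", 8)
    record = dict(zip(GTF_FIELDS, parts[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    if len(parts) > 8:
        # Each attribute looks like `key value;`, with text values in double quotes.
        for key, value in re.findall(r'(\S+) ("[^"]*"|[^;\s]+);', parts[8]):
            attributes[key] = value.strip('"')
    record["attributes"] = attributes
    return record

example = "\t".join([
    "seq1", "BLASTX", "similarity", "101", "235", "87.1", "+", "0",
    'gene_id "gene-0"; transcript_id "transcript-0-1"; gene_name "Frst1"; expression 1;',
])
record = parse_gtf_line(example)
print(record["attributes"])
```

The attribute values are kept as strings here; converting numeric ones such as expression is left to the caller.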

More details:

http://mblab.wustl.edu/GTF22.html

http://www.bioperl.org/wiki/GTF


]]>
The SRF Format Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/next-gen-sequencing-data-processing?blog=2 2009-07-02T15:34:31Z 2009-12-11T14:18:05Z

SRF (Sequence Read Format) is a generic and flexible container format for sequencing and next-generation sequencing files.

Format working group: http://srf.sourceforge.net

It's the preferred format for the submission of sequencing results to archives like the European Nucleotide Archive.

How to use it:

SOLiD software to map SOLiD to SRF files.

SOLiD software to map MA (mapping) to GFF files.

An API: http://sourceforge.net/projects/srf

SRF tools are also implemented within the Staden package.

To get basic read counts:

Code

/software/solexa/bin/srf_info -l 1 file.srf

To convert them to fastq:

Code

/software/solexa/bin/srf2fastq -c file.srf

(run without parameters/file for more options.)

Filter out reads flagged as "bad":

srf_filter -b infile.srf outfile.srf

Related blog post on Politigenomics

More info from SOLiD


]]>
RSYNC Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/rsync?blog=2 2009-06-15T13:17:00Z 2015-05-12T10:46:19Z

Using rsync

"rsync is a software application for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate." [Wikipedia]

It is very fast for file transfers, often faster than cp or scp when part of the data already exists at the destination, and ideal for home-brew back-up solutions. There is a lot of documentation on the internet; here are some pointers that were useful for me.

Basic command to sync DIR to a different location:
rsync -r DIR ~/backup/

Basic command to sync DIR from hostA to hostB:
ssh hostA
rsync -r DIR user@hostB:~/backup/

Basic command to list files at a remote server:
rsync rsync://pub@your-ip-or-hostname/

If an rsync daemon requires a password, the easiest way is to put it in a file that only you can read (chmod 600) and pass that file with:
--password-file=your_file

To fetch recursively, use the recursive -r or the archive -a option:
rsync -aPv source/someDirectory .

To specify a timeout you can set these options (in seconds):
--contimeout=1000 --timeout=1000

You can conveniently specify selected files and directories you want to transfer in an include file, ignoring the rest. Example: transfer basic data from a sequencing run to a remote host:

Code

> cat ~/include_file.txt
Data
Data/Intensities
Data/Intensities/BaseCalls
Data/Intensities/BaseCalls/***
InterOp/***
RunInfo.xml
RunParameters.xml
*.csv

> rsync -arv --include-from="$HOME/include_file.txt" --exclude='*' RunFolder_A34MJNACXX/ ~/temp/

Use --dry-run to test the results before starting a long transfer.

Use the -vvvv option to see the full step-wise processing of all files and directories.
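If the transfer is started from a script, the same command can be assembled programmatically. The following is only a sketch: build_rsync_command is a hypothetical helper of mine, and it defaults to --dry-run so that nothing is copied until the file list has been checked. It expands ~ in Python because a quoted ~ is not expanded by the shell.

```python
import os
import shlex

def build_rsync_command(source, destination, include_file, dry_run=True):
    """Return the rsync argument list for an include-file driven transfer."""
    command = ["rsync", "-av"]
    if dry_run:
        # Only report what would be transferred.
        command.append("--dry-run")
    command += [
        "--include-from=" + os.path.expanduser(include_file),
        "--exclude=*",
        source,
        os.path.expanduser(destination),
    ]
    return command

command = build_rsync_command("RunFolder_A34MJNACXX/", "~/temp/", "~/include_file.txt")
print(shlex.join(command))

# To actually run it (assumes rsync is installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids a second round of shell quoting, which is where the include and exclude patterns tend to go wrong.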

Note: The Sanger firewall seems to block rsync; you can use it on the (guest) wireless network though.


]]>
Majordomo Commands Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/majordomo-commands?blog=2 2009-06-11T09:13:40Z 2009-06-11T09:14:12Z

User commands for the mailing list software Majordomo

Send these commands in the body of an email to majordomo@your-server.com, for example:

mailto: Majordomo@ebi.ac.uk

        who mac-list

help

    Majordomo replies with a list of acceptable Majordomo commands. 

subscribe listname

    Majordomo subscribes the sender to the named list.

subscribe listname address

    Majordomo subscribes the address given to the named list. 

unsubscribe listname

    Majordomo unsubscribes the sender from the named list if the sender sent the mail from exactly the address he was subscribed to.

unsubscribe listname address

    Majordomo unsubscribes the address from the named list. 

which

    Majordomo sends back a catalogue of the mailing lists the sender is subscribed to at the address he sent the mail from. 

which address

    Majordomo sends back a list of the mailing lists the address given is subscribed to. 

lists

    Majordomo sends back a catalogue of the mailing lists which Majordomo handles with a half-line description of each list.

info listname

    Majordomo sends back summary information about the list.

who listname

    Majordomo replies with a roster of the e-mail addresses that are subscribed to the named list. 

index listname

    Messages sent to all mailing lists are archived monthly unless the list owner explicitly requests no archiving. Majordomo replies with the filenames of the archived files of the named list. 

get listname filename

    Majordomo sends the archived messages for the filename requested. Files are archived under the filename listname.yymm. Example: www-l.9503 contains all messages sent to the list www-l during March 1995. 

end

    Majordomo ignores anything in an e-mail message to Majordomo which comes after the command "end". This can be useful if you have a signature or other text at the end of your message, or if you want to include more than one Majordomo command. 
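A request like this can also be put together in a script. The sketch below only builds the message: build_majordomo_request is a hypothetical helper, the sender address is a placeholder, and the commented-out sending step assumes a local SMTP server. It puts one command per line and closes the body with "end" so that a signature added later is ignored.

```python
from email.message import EmailMessage

def build_majordomo_request(sender, server_address, commands):
    """Build an email whose body holds one Majordomo command per line."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = server_address
    # Majordomo stops reading at "end", so anything after it is ignored.
    body_lines = list(commands) + ["end"]
    message.set_content("\n".join(body_lines) + "\n")
    return message

request = build_majordomo_request(
    "me@example.org",               # placeholder sender address
    "majordomo@your-server.com",
    ["who mac-list", "which"],
)
print(request.get_content())

# Sending (assumes an SMTP server on localhost):
# import smtplib
# with smtplib.SMTP("localhost") as smtp:
#     smtp.send_message(request)
```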

Source


]]>
Lucene Search Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/lucene-search?blog=2 2009-06-11T08:39:26Z 2011-03-15T13:20:21Z

To use the Lucene search engine for querying the AnnoTrack annotation tracking system of Gencode, an XML dump must be prepared. This should be done daily to allow a regular re-indexing of the search.

We are indexing on the gene and transcript level separately. The XML can be written out with this script:

~fsk/3_scripts/gencode/lucene_dump.pl

It writes the following format:

XML

    <entry id="otthumg00000159378">
      <name>OTTHUMG00000159378</name>
      <description>Description: putative novel protein
Genename: AP000221.2</description>
      <cross_references>
        <ref dbname="vega" dbkey="OTTHUMG00000159378" />
        <ref dbname="gentrack_transcript" dbkey="548587" />
      </cross_references>
      <additional_fields>
        <field name="transcript_count">1</field>
        <field name="location">21:25747378,25760913:-</field>
        <field name="chromosome">21</field>
        <field name="category">HAVANA</field>
      </additional_fields>
    </entry>
  </entries>

Each entry carries cross-references to the gene/transcript entries in AnnoTrack and Vega.

Characters to escape in XML:

XML

"   &quot;
<   &lt;
>   &gt;
&   &amp;
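The dump script itself is written in Perl and is not shown here; purely as an illustration, the same escaping can be done in Python with the standard library. escape() covers &, < and > by default, and the extra mapping adds the double quote. The name escape_for_xml and the sample text are mine.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; this mapping adds the double quote.
QUOTE_ENTITY = {'"': "&quot;"}

def escape_for_xml(text):
    """Replace the four characters listed above with their XML entities."""
    return escape(text, QUOTE_ENTITY)

escaped = escape_for_xml('Description: "BRCA1 & BRCA2" <putative>')
print(escaped)
```

The ampersand is replaced first, so entities produced for the other characters are not escaped a second time.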

The search can be initiated from http://www.sanger.ac.uk/search, a direct link could look like http://www.sanger.ac.uk/search?db=annotrack&t=brca1


]]>
FTP at the Sanger Institute Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/ftp-at-the-sanger-institute?blog=2 2009-06-08T09:34:21Z 2009-06-12T16:25:16Z

To upload data to the Sanger FTP server:

  • Using an ftp program, connect to ftp.sanger.ac.uk
  • Anonymous logins are possible.
  • Change into the pub/incoming directory
  • Transfer the files.
  • Disconnect
  • Once a file has been untouched for 20 minutes, it is copied to the /nfs/ftp_uploads/default directory. This directory should be mounted on every Sanger machine, and so be readily accessible internally. If someone is waiting for the files, let them know once they have been uploaded.
  • Files in /nfs/ftp_uploads will be automatically removed after 30 days, to conserve space.
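The upload steps above can be scripted with Python's ftplib. This is a sketch under assumptions: upload_to_incoming and the file name results.tar.gz are made up for illustration, and RecordingFTP is a stand-in of mine that only records the calls, so the example can be tried without network access.

```python
import io
from ftplib import FTP

def upload_to_incoming(ftp, filename, handle):
    """Run the upload steps on an already connected ftplib.FTP-like object."""
    ftp.login()                                  # no arguments: anonymous login
    ftp.cwd("pub/incoming")                      # change into the incoming directory
    ftp.storbinary("STOR " + filename, handle)   # transfer the file
    ftp.quit()                                   # disconnect

class RecordingFTP:
    """Stand-in that records calls so the sketch can be tried offline."""
    def __init__(self):
        self.calls = []
    def login(self):
        self.calls.append("login")
    def cwd(self, path):
        self.calls.append("cwd " + path)
    def storbinary(self, command, handle):
        self.calls.append(command)
    def quit(self):
        self.calls.append("quit")

fake = RecordingFTP()
upload_to_incoming(fake, "results.tar.gz", io.BytesIO(b"example"))
print(fake.calls)

# Real use (needs network access):
# with open("results.tar.gz", "rb") as handle:
#     upload_to_incoming(FTP("ftp.sanger.ac.uk"), "results.tar.gz", handle)
```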

Source: http://intweb.sanger.ac.uk/Sysman/FAQ/ftp.shtml


]]>
Next-Gen Sequence-Submissions to the ENA Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/submit-sequences?blog=2 2009-06-03T17:23:00Z 2012-11-14T11:05:45Z

To make them available to everybody (and for paper submissions), next-generation sequencing results should be submitted to the European Read Archive (ERA), now called the European Nucleotide Archive (ENA), which collaborates with the NCBI Short Read Archive (SRA), if this is still being funded.

In the GENCODE project we submitted the RT-PCR-Seq data to the ENA using the ArrayExpress submission system.

Please note this system was about to change at the time of writing and might be different now...

Documentation (EBI)

General guidelines (NCBI)

Meta data hierarchy:


  Study

    Sample

      Experiment

        Run

  Submission

In detail:

The SRA tracks the following five objects:

Study - Identifies the sequencing study or project and contains multiple experiments.

Sample - Identifies the organism, isolate, or individual being sequenced.

Experiment - Specifies the sample, sequencing protocol, sequencing platform, and data processing that will result in one or more runs.

Run - Identifies run data files, the experiment they are contained in, and any runtime parameters gathered from the sequencing instrument.

Analysis - Packages data associated with short read objects that are intended for downstream usage or that otherwise need an archival home. Examples include assemblies, alignments, spreadsheets, QC reports, and read lists.

XML schemata for different levels

Re-sequenced transcripts (Sanger sequencing) are submitted to the EMBL database using the Webin interface.

All the meta-data in the ERA is available here

The sample.xml file contains a single attribute for each sample, e.g.

XML

<SAMPLE_ATTRIBUTES>
   <SAMPLE_ATTRIBUTE>
       <TAG>sample_origin</TAG>
       <VALUE>Trypanosome brucei genetic crosses between T. brucei 927 and T. b. gambiense 386</VALUE>
   </SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES>

You can put any name/value pair in a SAMPLE_ATTRIBUTE block; check the attributes column at

http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=table&f=sample&m=data&s=sample

to see what other people have used.
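When there are many samples, the attribute block can be generated rather than typed. A minimal sketch with Python's ElementTree follows; build_sample_attributes is a hypothetical helper, and the second tag ("strain") and both values are invented for illustration. ElementTree takes care of escaping characters such as & in the values.

```python
import xml.etree.ElementTree as ET

def build_sample_attributes(pairs):
    """Build a SAMPLE_ATTRIBUTES element from (tag, value) pairs."""
    root = ET.Element("SAMPLE_ATTRIBUTES")
    for tag, value in pairs:
        attribute = ET.SubElement(root, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attribute, "TAG").text = tag
        ET.SubElement(attribute, "VALUE").text = value
    return root

root = build_sample_attributes([
    ("sample_origin", "genetic cross between T. brucei 927 & T. b. gambiense 386"),
    ("strain", "927 x 386"),   # hypothetical second attribute
])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```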

See also the Trace archive @ Sanger


An easier way to automate large parts of the process is to submit the data through ArrayExpress. This can be done through the MAGE-TAB web interface.

  • Create a username/password (New submitter) and log in
  • Create a new experiment by giving it a name, selecting UHTS, and selecting those parameters from the drop-down list that are appropriate for your data. I've used Biological design: "Organism part comparison", Technology used: "Transcription profiling by high-throughput sequencing", Materials used: "Organism part" (cell tissue extractions), Organisms used: "Homo sapiens".
  • Submitting this gives you the option to generate and download a meta-data file. You can import this into Excel and fill in the information that is required (submitter names, experiment description, information on the data sets, etc.), which is used to store the data in IDF and SDRF formats.
  • "Upload files" from here or from the Experiment list page gives you the option to select the experiment and submit the meta-data file saved as a txt file and the raw data file as a compressed file / archive.
  • You can change and re-generate the meta-data file by selecting Edit from the Experiment list page.
  • Submitting this will create a ticket through which the people at ArrayExpress can get in touch with you. I've found them to be very helpful, answering all my silly questions.
  • If the raw data is too big, upload an empty compressed file, place the data on the FTP site at ftp://ftp-private.ebi.ac.uk (user name is aexpress and password is aexpress1) and let ArrayExpress know what the name of the file is.

more info: ArrayExpress help


]]>
Unix Process Information Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/unix-process-information?blog=2 2009-05-14T08:24:31Z 2009-12-16T18:31:18Z

To find out more about processes running on your machine you can:

  1. top
  2. top -p 6363 for a specific process (here the one with PID 6363)
  3. ps and ps -gauwxe | more
  4. look into the process's data in /proc

    e.g. less /proc/5987/cmdline
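The cmdline file separates the arguments with NUL bytes rather than spaces, which is why it can look odd in less. As a small sketch (Linux only, with function names of my own choosing), this reads it from Python and splits it back into the argument list:

```python
import os

def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    if not raw:
        return []            # e.g. kernel threads have an empty cmdline
    if raw.endswith(b"\0"):
        raw = raw[:-1]       # drop the terminator after the last argument
    return [part.decode(errors="replace") for part in raw.split(b"\0")]

def read_cmdline(pid):
    """Return the command line of the given process as a list of strings."""
    with open(f"/proc/{pid}/cmdline", "rb") as handle:
        return parse_cmdline(handle.read())

print(parse_cmdline(b"top\0-p\0" + b"6363\0"))

# /proc/self refers to the current process; only present on Linux.
if os.path.exists("/proc/self/cmdline"):
    print(read_cmdline("self"))
```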


]]>
Ruby String Functions Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/ruby-string-functions?blog=2 2009-04-09T12:58:00Z 2009-04-09T12:58:00Z

METHODS THAT ARE OPERATORS

Operators such as + and * work on strings (concatenate and replicate). The % operator is a short form for sprintf. The << operator also concatenates, but unlike + it appends to the receiver in place instead of returning a new string. You can also index into a string much like an array of characters, e.g. with [] or slice.

OTHER METHODS

To change case:

capitalize - first character to upper, rest to lower

downcase - all to lower case

swapcase - changes the case of all letters

upcase - all to upper case

To rejustify:

center - add white space padding to center string

ljust - pads string, left justified

rjust - pads string, right justified

To trim:

chop - remove last character

chomp - remove trailing line separators

squeeze - reduces successive equal characters to singles

strip - deletes leading and trailing white space

To examine:

count - returns the number of characters that belong to the given set of characters

empty? - returns true if empty

include? - is a specified target string present in the source?

index - return the position of one string in another

length or size - return the length of a string

rindex - returns the last position of one string in another

slice - returns a partial string

To encode and alter:

crypt - password encryption

delete - deletes all characters that are in the intersection of the given character sets

dump - adds extra \ characters to escape specials

hex - takes string as hex digits and returns number

next or succ - successive or next string (eg ba -> bb)

oct - take string as octal digits and returns number

replace - replace one string with another

reverse - turns the string around

slice! - DELETES a partial string and returns the part deleted

split - returns an array of partial strings exploded at separator

sum - returns a checksum of the string

to_f and to_i - return string converted to float and integer

tr - to map all occurrences of specified char(s) to other char(s)

tr_s - as tr, then squeeze out resultant duplicates

unpack - to extract from a string into an array using a template

To iterate:

each_char - process each character in turn (String#each, which iterated over lines in Ruby 1.8, was removed in Ruby 1.9)

each_line - process each line in a string

each_byte - process each byte in turn

upto - iterate through successive strings (see "next" above)

source: http://www.wellho.net/solutions/ruby-string-functions-in-ruby.html
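
A few of the methods listed above in action. This is only a small sketch with made-up sample strings. The .dup calls are there so that the in-place methods (<< and slice!) work on a modifiable copy.

```ruby
s = "hello world"

# Operators
greeting = "hello" + " " + "world"   # concatenation returns a new string
rule     = "-" * 10                  # replication
padded   = "%05.1f" % 3.14159        # % is shorthand for sprintf

# << appends in place, so every reference to the object sees the change
a = "foo".dup
b = a
a << "bar"

# Case, justification and trimming
cap      = s.capitalize              # first character upper, rest lower
centered = s.center(15, "*")         # pad on both sides to width 15
squeezed = "aaabbb".squeeze          # collapse runs of equal characters
stripped = "  hi \n".strip           # drop leading and trailing white space

# Examining
count_lo = s.count("lo")             # number of characters that are "l" or "o"
pos      = s.index("o")              # first position of "o"
last_pos = s.rindex("o")             # last position of "o"

# Altering
mapped   = s.tr("lo", "01")          # map l to 0 and o to 1
parts    = "a,b,c".split(",")        # explode at the separator
nxt      = "az".succ                 # successor string, carries over like a counter
number   = "ff".hex                  # read the string as hex digits

# slice! removes the part from the receiver and returns it
t = "hello world".dup
removed = t.slice!(0, 6)

puts greeting, rule, padded, b, cap, centered, squeezed, stripped
puts count_lo, pos, last_pos, mapped, parts.inspect, nxt, number, removed, t
```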


]]>
HTML Codes Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/html-codes?blog=2 2009-03-17T09:17:00Z 2012-04-25T11:10:59Z

Here's a quick list with the HTML codes to safely display common characters on web pages.

More details are e.g. here.
See also the list of ASCII codes only.

HTML Code  Browser View  HTML Code  Browser View  HTML Code  Browser View  HTML Code  Browser View  HTML Code  Browser View

&copy; © &#33; ! &#95; _ &#157;  &#219; Û
&reg; ® &#34; " &#96; ` &#158; ž &#220; Ü
&nbsp;   &#35; # &#97; a &#159; Ÿ &#221; Ý
&quot; " &#36; $ &#98; b &#160;   &#222; Þ
&amp; & &#37; % &#99; c &#161; ¡ &#223; ß
&lt; < &#38; & &#100; d &#162; ¢ &#224; à
&gt; > &#39; ' &#101; e &#163; £ &#225; á
&Agrave; À &#40; ( &#102; f &#164; ¤ &#226; â
&Aacute; Á &#41; ) &#103; g &#165; ¥ &#227; ã
&Acirc; Â &#42; * &#104; h &#166; ¦ &#228; ä
&Atilde; Ã &#43; + &#105; i &#167; § &#229; å
&Auml; Ä &#44; , &#106; j &#168; ¨ &#230; æ
&Aring; Å &#45; - &#107; k &#169; © &#231; ç
&AElig; Æ &#46; . &#108; l &#170; ª &#232; è
&Ccedil; Ç &#47; / &#109; m &#171; « &#233; é
&Egrave; È &#48; 0 &#110; n &#172; ¬ &#234; ê
&Eacute; É &#49; 1 &#111; o &#173; ­ &#235; ë
&Ecirc; Ê &#50; 2 &#112; p &#174; ® &#236; ì
&Euml; Ë &#51; 3 &#113; q &#175; ¯ &#237; í
&Igrave; Ì &#52; 4 &#114; r &#176; ° &#238; î
&Iacute; Í &#53; 5 &#115; s &#177; ± &#239; ï
&Icirc; Î &#54; 6 &#116; t &#178; ² &#240; ð
&Iuml; Ï &#55; 7 &#117; u &#179; ³ &#241; ñ
&ETH; Ð &#56; 8 &#118; v &#180; ´ &#242; ò
&Ntilde; Ñ &#57; 9 &#119; w &#181; µ &#243; ó
&Otilde; Õ &#58; : &#120; x &#182; ¶ &#244; ô
&Ouml; Ö &#59; ; &#121; y &#183; · &#245; õ
&Oslash; Ø &#60; < &#122; z &#184; ¸ &#246; ö
&Ugrave; Ù &#61; = &#123; { &#185; ¹ &#247; ÷
&Uacute; Ú &#62; > &#124; | &#186; º &#248; ø
&Ucirc; Û &#63; ? &#125; } &#187; » &#249; ù
&Uuml; Ü &#64; @ &#126; ~ &#188; ¼ &#250; ú
&Yacute; Ý &#65; A &#127; ? &#189; ½ &#251; û
&THORN; Þ &#66; B &#128; € &#190; ¾ &#252; ü
&szlig; ß &#67; C &#129;  &#191; ¿ &#253; ý
&agrave; à &#68; D &#130; ‚ &#192; À &#254; þ
&aacute; á &#69; E &#131; ƒ &#193; Á &#255; ÿ
&aring; å &#70; F &#132; „ &#194; Â
&aelig; æ &#71; G &#133; … &#195; Ã
&ccedil; ç &#72; H &#134; † &#196; Ä
&egrave; è &#73; I &#135; ‡ &#197; Å
&eacute; é &#74; J &#136; ˆ &#198; Æ
&ecirc; ê &#75; K &#137; ‰ &#199; Ç
&euml; ë &#76; L &#138; Š &#200; È
&igrave; ì &#77; M &#139; ‹ &#201; É
&iacute; í &#78; N &#140; Œ &#202; Ê
&icirc; î &#79; O &#141;  &#203; Ë
&iuml; ï &#80; P &#142; Ž &#204; Ì
&eth; ð &#81; Q &#143;  &#205; Í
&ntilde; ñ &#82; R &#144;  &#206; Î
&ograve; ò &#83; S &#145; ‘ &#207; Ï
&oacute; ó &#84; T &#146; ’ &#208; Ð
&ocirc; ô &#85; U &#147; “ &#209; Ñ
&otilde; õ &#86; V &#148; ” &#210; Ò
&ouml; ö &#87; W &#149; • &#211; Ó
&oslash; ø &#88; X &#150; – &#212; Ô
&ugrave; ù &#89; Y &#151; &#213; Õ    
&uacute; ú &#90; Z &#152; ˜ &#214; Ö    
&ucirc; û &#91; [ &#153; ™ &#215; ×
&yacute; ý &#92; \ &#154; š &#216; Ø    
&thorn; þ &#93; ] &#155; › &#217; Ù
&yuml; ÿ &#94; ^ &#156; œ &#218; Ú    
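
The numeric codes in the table are simply the character's code point written as &#NNN;, so they can be generated instead of looked up. Here is a rough Ruby sketch of that idea. The method name to_numeric_refs and the choice of which characters count as safe are assumptions made for this example only.

```ruby
# Characters left untouched (an arbitrary choice made for this sketch).
SAFE = /[A-Za-z0-9 .,-]/

# Replace every other character by its decimal numeric reference,
# e.g. "<" becomes "&#60;" and "é" becomes "&#233;".
def to_numeric_refs(text)
  text.each_char.map { |ch| ch.match?(SAFE) ? ch : "&##{ch.ord};" }.join
end

puts to_numeric_refs("a < b")
puts to_numeric_refs("caf\u00e9")
```

For real pages an established escaping routine of your web framework is usually the better choice; this only illustrates where the numbers in the table come from.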

]]>
mySQL Data Types admin http://blog.kokocinski.net/blogs/ http://blog.kokocinski.net/index.php/mysql-data-types-1?blog=2 2009-03-06T17:12:07Z 2011-09-26T11:20:04Z

A quick table to look up the data types used in the MySQL database management system.

Type | Size | Description
CHAR[Length] | Length bytes | A fixed-length field from 0 to 255 characters long.
VARCHAR(Length) | String length + 1 bytes | A variable-length field from 0 to 255 characters long (0 to 65,535 since MySQL 5.0.3).
TINYTEXT | String length + 1 bytes | A string with a maximum length of 255 characters.
TEXT | String length + 2 bytes | A string with a maximum length of 65,535 characters.
MEDIUMTEXT | String length + 3 bytes | A string with a maximum length of 16,777,215 characters.
LONGTEXT | String length + 4 bytes | A string with a maximum length of 4,294,967,295 characters.
TINYINT[Length] | 1 byte | Range of -128 to 127 or 0 to 255 unsigned.
SMALLINT[Length] | 2 bytes | Range of -32,768 to 32,767 or 0 to 65,535 unsigned.
MEDIUMINT[Length] | 3 bytes | Range of -8,388,608 to 8,388,607 or 0 to 16,777,215 unsigned.
INT[Length] | 4 bytes | Range of -2,147,483,648 to 2,147,483,647 or 0 to 4,294,967,295 unsigned.
BIGINT[Length] | 8 bytes | Range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 or 0 to 18,446,744,073,709,551,615 unsigned.
FLOAT | 4 bytes | A small number with a floating decimal point.
DOUBLE[Length, Decimals] | 8 bytes | A large number with a floating decimal point.
DECIMAL[Length, Decimals] | Length + 1 or Length + 2 bytes | A DOUBLE stored as a string, allowing for a fixed decimal point.
DATE | 3 bytes | In the format of YYYY-MM-DD.
DATETIME | 8 bytes | In the format of YYYY-MM-DD HH:MM:SS.
TIMESTAMP | 4 bytes | In the format of YYYYMMDDHHMMSS; acceptable range ends in January 2038.
TIME | 3 bytes | In the format of HH:MM:SS.
ENUM | 1 or 2 bytes | Short for enumeration, which means that each column can have one of several possible values.
SET | 1, 2, 3, 4, or 8 bytes | Like ENUM except that each column can have more than one of several possible values.

source: http://www.peachpit.com/, mySQL
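
The integer ranges in the table follow directly from the storage size: n bytes give 8n bits, so a signed column covers -2^(8n-1) to 2^(8n-1)-1 and an unsigned one 0 to 2^(8n)-1. A short Ruby sketch to recompute them; the constant and method names are invented for this example.

```ruby
# Storage size in bytes of the integer types, as listed in the table.
INTEGER_TYPE_BYTES = {
  "TINYINT"   => 1,
  "SMALLINT"  => 2,
  "MEDIUMINT" => 3,
  "INT"       => 4,
  "BIGINT"    => 8
}

# Value range of an integer column that is stored in the given number of bytes.
def integer_range(bytes, unsigned: false)
  bits = bytes * 8
  if unsigned
    (0..(2**bits - 1))
  else
    ((-(2**(bits - 1)))..(2**(bits - 1) - 1))
  end
end

INTEGER_TYPE_BYTES.each do |type, bytes|
  signed   = integer_range(bytes)
  unsigned = integer_range(bytes, unsigned: true)
  puts "#{type}: #{signed.begin} to #{signed.end}, or 0 to #{unsigned.end} unsigned"
end
```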


]]>
SourceAdaptor for Proserver DAS sources admin http://blog.kokocinski.net/blogs/ http://blog.kokocinski.net/index.php/sourceadaptor-for-proserver-das-sources-1?blog=2 2009-01-28T13:12:27Z 2011-02-03T14:42:51Z

The Distributed Annotation System (DAS) allows data to be shared across servers and applications. See also some other blog entries about DAS.

To use and serve the data from a specific source (your database or flat file) using the Perl ProServer you need to define a SourceAdaptor. This will translate from your specific data format to the common DAS XML format understood by all DAS servers and clients.

Here are two example SourceAdaptors for ProServer DAS sources using the 1.53e standard (and complying with the GENCODE format).

Example 1: Reading data from GFF file

Code

package Bio::Das::ProServer::SourceAdaptor::example;

use strict;
use warnings;

#Proserver module:
use base qw(Bio::Das::ProServer::SourceAdaptor);

#for the datestamp format:
use Date::Format;

# General initialization function
# Set metadata such as the commands supported by this source.
sub init {
  my ($self) = @_;
  $self->{'capabilities'} = { 'features' => '1.0' };
}

# General function for "features" DAS command
# Gather the features annotated in a given segment of sequence.
sub build_features {
  my ($self, $args) = @_;

  my $segment = $args->{'segment'}; # The query segment ID
  my $start   = $args->{'start'};   # The query start position (optional)
  my $end     = $args->{'end'};     # The query end position (optional)

  my @features = ();
  my %group_start;
  my %group_end;
  my %group_count;

  #category controlled vocabulary:
  #id: ECO:00000067; name: inferred from electronic annotation
  my $typecategory = "ECO:00000067";

  #read data from gff file
  open my $fh, '<', '/Users/fsk/great_data/annotation.gff'
    or die "Unable to open data file: $!";

  while (defined (my $line = <$fh>)) {

    chomp $line;

    #skip empty lines and GFF comment lines
    next if $line =~ /^\s*(#|$)/;

    my ($f_seg, $method, $type, $f_start, $f_end, $score, $strand, $phase, $add) = split /\t/, $line;

    #get extra info from last column
    my ($f_id, $stamp) = split(";", $add);

    #replace unwanted characters
    $f_id =~ s|\"||g;
    $f_seg =~ s/^chr//;

    #create group attributes for new set of features
    if($type eq "mRNA"){
      $group_start{$f_id} = $f_start;
      $group_end{$f_id}   = $f_end;
      #reset the feature counter for this group
      $group_count{$f_id} = 0;
      next;
    }

    #convert datestamp from machine time format into desired format (2006-04-07T15:15:58+0100)
    my $modstamp = time2str("%Y-%m-%dT%H:%M:%S%z", $stamp);

    #index for unique feature id
    $group_count{$f_id}++;

    #get the features overlapping this genomic region only
    #(start and end are optional: without them the whole segment is returned)
    if (($f_seg eq $segment) &&
        (!defined $end   || $f_start <= $end) &&
        (!defined $start || $f_end   >= $start)) {

      #create individual feature
      my $feature = {
         #unique id
         'id'           => $f_id."_".$group_count{$f_id},
         #genomic start
         'start'        => $f_start,
         #genomic end
         'end'          => $f_end,
         #strand: +/-/0
         'ori'          => $strand,
         #name of this method/annotation
         'method'       => $method,
         #type must be exon, intron, etc.
         'type'         => $type,
         #category type: ECO id
         'typecategory' => $typecategory,
         #phase: 0/1/2/-
         'phase'        => '-',
         #score, 0 if n.a.
         'score'        => 0,
         #note for various fields, key=value pairs
         'note'         => [
               'lastmod='.$modstamp,
               ],
         #group of features
         'group_id'     => $f_id,
         'grouptype'    => $method."_prediction",
         'groupnote'    => 'Note='.$group_start{$f_id}."-".$group_end{$f_id},
        };

      #store in feature array
      push @features, $feature;
    }

  }
  close $fh or warn "Problem closing data file";

  #return entire features array
  return @features;
}

1;

Example 2: Connecting to a database to serve transcript features

Code

package Bio::Das::ProServer::SourceAdaptor::example2;

use strict;
use warnings;

#Proserver module:
use base qw(Bio::Das::ProServer::SourceAdaptor);

#for accessing mysql dbs
#docu eg. at http://www.perl.com/pub/a/1999/10/DBI.html
use DBI;

# General initialization function
# Set metadata such as the commands supported by this source.
sub init {
  my ($self) = @_;
  $self->{'capabilities'} = { 'features' => '1.0' };
}

# General function for "features" DAS command
# Gather the features annotated in a given segment of sequence.
sub build_features {
  my ($self, $args) = @_;

  my $config = $self->config;

  #the region of interest
  my $qchrom = $args->{'segment'}; # The query segment ID
  my $qstart = $args->{'start'};   # The query start position (optional)
  my $qend   = $args->{'end'};     # The query end position (optional)

  my @features = ();

  #category, controlled vocabulary:
  #example: id: ECO:00000067; name: inferred from electronic annotation
  my $typecategory = "ECO:00000067";

  #method used to create genes/transcripts
  my $method = "CAPS_analysis";

  #read data from mysql database
  #connection parameters are given in the proserver config (ini) file
  my $dsn = "DBI:".$config->{driver}.":".$config->{dbname}.":".
            $config->{host}.":".$config->{port};
  my $db  = DBI->connect($dsn, $config->{user}, $config->{dbpass})
    or die "can't connect to database ".$config->{dbname}.": ".$DBI::errstr."\n";

  #example query to get transcripts overlapping roi:
  #a transcript overlaps if it starts before the region ends
  #and ends after the region starts
  my $type  = "transcript";
  my $query = "SELECT transcript_name, chromosome, start, end, ".
              "strand, phase, gene_name, lastmod ".
              "FROM transcripts ".
              "WHERE chromosome = ? AND start <= ? AND end >= ? ".
              "ORDER BY chromosome, start";
  my $handle = $db->prepare($query);
  $handle->execute($qchrom, $qend, $qstart);
  while (my ($name, $chromosome, $start, $end, $strand,
             $phase, $gene_name, $lastmod) = $handle->fetchrow_array) {

    #create individual feature
    my $feature = {
       #unique id
       'id'           => $name,
       #genomic start
       'start'        => $start,
       #genomic end
       'end'          => $end,
       #strand: +/-/0
       'ori'          => $strand,
       #name of this method/annotation
       'method'       => $method,
       #type must be exon, intron, etc.
       'type'         => $type,
       #category type: ECO id
       'typecategory' => $typecategory,
       #phase: 0/1/2/-
       'phase'        => $phase,
       #score, - if n.a.
       'score'        => '-',
       #note for various fields, key=value pairs
       'note'         => [
              'lastmod='.$lastmod,
             ],
      };

    #store in feature array
    push @features, $feature;
  }

  $handle->finish();
  $db->disconnect();

  #return entire features array
  return @features;
}

1;
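
Both adaptors only return features that overlap the query region. That test is easy to get the wrong way round, so here it is in isolation: two intervals overlap when each one starts before the other one ends. The sketch is in Ruby (used elsewhere on this blog) rather than Perl, and the method name and sample coordinates are made up.

```ruby
# Two closed intervals overlap when each one starts before the other one ends.
def overlaps?(f_start, f_end, q_start, q_end)
  f_start <= q_end && f_end >= q_start
end

# Invented sample features with genomic start (from) and end (to) positions.
features = [
  { id: "t1", from: 100, to: 200 },  # reaches into the query region
  { id: "t2", from: 250, to: 400 },  # starts inside, ends outside
  { id: "t3", from: 301, to: 500 },  # starts after the region ends
  { id: "t4", from: 10,  to: 149 }   # ends before the region starts
]

query_start = 150
query_end   = 300

hits = features.select { |f| overlaps?(f[:from], f[:to], query_start, query_end) }
puts hits.map { |f| f[:id] }.join(", ")
```

Requiring a feature to lie completely inside the region would need the stricter test f_start >= q_start && f_end <= q_end instead.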

]]>
Ruby & MySQL admin http://blog.kokocinski.net/blogs/ http://blog.kokocinski.net/index.php/ruby-aamp-mysql?blog=2 2009-01-12T17:36:23Z 2009-01-12T17:36:23Z

Using MySQL from Ruby

Ruby can connect to a MySQL database using the Ruby/MySQL module. A complete introduction to the subject can be found here:

http://www.kitebird.com/articles/ruby-mysql.html

The core functions of the module are explained here:

http://www.tmtm.org/en/mysql/ruby/

To establish a connection:


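   require "mysql"

   begin
     # connect to the MySQL server
     dbh = Mysql.real_connect("localhost", "testuser", "testpass", "test")

     # .....

   rescue Mysql::Error => e
     # report connection or query problems
     puts "Error code: #{e.errno}"
     puts "Error message: #{e.error}"
   ensure
     # disconnect, even if an error occurred
     dbh.close if dbh
   end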

To run a basic query you can simply do the following:


   # issue a retrieval query, perform a fetch loop, print
   # the row count, and free the result set

   res = dbh.query("SELECT name, category FROM animal")

   while row = res.fetch_row do
     printf "%s, %s\n", row[0], row[1]
   end
   puts "Number of rows returned: #{res.num_rows}"

   res.free

This can also help to try out small queries from within a Rails application, e.g. like this:

     @issue_count = ActiveRecord::Base.connection.execute(
         "SELECT default_count from tmp").fetch_row[0].to_i

]]>
Rails Optimization Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/rails-optimization?blog=2 2009-01-08T18:36:00Z 2015-01-06T11:23:56Z

Find bottlenecks with NewRelic RPM

  1. Register at https://rpm.newrelic.com for the free light version of the RPM benchmarking tool.
  2. An email will be sent with a configuration file which goes into rails-application/config/.
  3. Go into rails-application/ and run
    script/plugin install http://svn.newrelic.com/rpm/agent/newrelic_rpm
    This will automatically fetch and install the plugin.
  4. Restart the server.
  5. Play around with your Rails app like any user would. The benchmarking results will be available almost instantly when you go back to https://rpm.newrelic.com (you need to be logged in).


]]>
Grep Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/grep?blog=2 2008-12-02T12:15:40Z 2011-02-24T13:38:14Z

grep is another very useful Unix command-line text search utility. The name comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines.

It can be used to find occurrences of a specific string or pattern in a file or in all files in a large directory in a few seconds.

Some useful options for the Unix command grep:

just COUNT the matching lines:

grep -c find text.txt

using NOT:

grep -v notfind text.txt

ignore the case:

grep -i UPorLOW text.txt

using OR (grep -E, formerly egrep):

grep -E 'this|that' text.txt

show context:

also show 2 previous lines: grep -B2 find text.txt

also show 2 next lines: grep -A2 find text.txt
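
Ruby arrays have a grep method of their own, which makes it easy to try out what the options above select. A small sketch with invented sample lines; in a real script the lines would come from something like File.readlines("text.txt").

```ruby
# Invented sample input, one string per line of text.
lines = ["find me", "Nothing here", "FIND again", "this one", "that one"]

count       = lines.grep(/find/).size    # like grep -c: number of matching lines
non_matches = lines.grep_v(/find/)       # like grep -v: lines that do NOT match
any_case    = lines.grep(/find/i)        # like grep -i: ignore the case
either      = lines.grep(/this|that/)    # like grep -E 'this|that'

puts count
puts non_matches.inspect
puts any_case.inspect
puts either.inspect
```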


]]>
Sanger Web-Proxy Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/sanger-web-proxy?blog=2 2008-11-20T09:36:27Z 2011-08-17T08:05:16Z

If you work at the Wellcome Trust Sanger Institute and would like to browse web pages at home the same way you do at work, here is how to set it up. You need an ssh login. Open a terminal window and connect like this:



ssh -L3128:webcache.sanger.ac.uk:3128 YOUR-LOGIN-NAME@ssh.sanger.ac.uk

Then change the proxy settings in your web browser to point to:

localhost 3128

In Firefox this can be found at Edit / Preferences / Advanced / Network / Connection

This will create a "tunnel", forwarding the pages and other data you request through the Sanger network. You can see the intweb and journal pages, etc.

Even more conveniently, you can store the tunnel setup in the ~/.ssh/config file and then just connect with ssh username@ssh.sanger.ac.uk.

Example:

Host ssh.sanger.ac.uk
  LocalForward 14301 imap.sanger.ac.uk:143
  LocalForward 25001 mail.sanger.ac.uk:25
  LocalForward 3128 wwwcache.sanger.ac.uk:3128
  LocalForward 2222 deskproXXXX.dynamic.sanger.ac.uk:22

A good resource for more SSH productivity tips is this blog entry.


]]>
Ruby & Rails Terminology Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/ruby-aamp-rails-terminology?blog=2 2008-11-04T14:42:00Z 2009-01-09T09:08:45Z

Basic Terminology from troubleshooters.com

Rails

A framework for developing web applications.

Ruby on Rails

Synonym for Rails.

Ruby

The computer language used to write Rails, and also the language you use to turn the Rails framework into an application. Ruby is a dynamically typed, interpreted language with a full yet simple object model, and in my opinion is a very productive computer language.

Web application

A computer program that interfaces with the user through a web browser.

Framework

A ready-made collection of code and code generators that does the majority of a software program's work. It is then up to the application developer to add the code that makes the application unique. Such code is typically added in many different spots throughout the framework.

MVC

Stands for Model, View and Controller. Many web application frameworks, including Rails, partition their code into models, views and controllers. Doing so makes it easier to change and scale the program.

Model

The part of the application that interfaces to persistent data, whether that data is stored in a DBMS (MySQL, Postgres, MS SQL Server, Oracle and the like), or as a flat file on the local disk, or some other way. The persistent data is accessed and validated by code in the model.

There is typically one model for each database table, and one for each relevant flat file.

View

The part of the application that paints screens. Ideally, code in the view paints the screen but does nothing else. Lookups and calculations are done elsewhere, and the view simply sends results of those lookups and calculations to the screen, properly formatted.

There is typically one view for each type of screen, although often one view is used for several similar but slightly different screens. For instance, screens for data insert, modification and deletion are all similar enough to be accomplished with one view using flags set by the controller.
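As a rough sketch of one view serving several similar screens, the example below uses ERB from Ruby's standard library rather than a full Rails view. The template, the `mode` flag and the `render_record` helper are all assumptions made up for this illustration.

```ruby
require "erb"

# Hypothetical template shared by the insert, edit and delete screens.
# It only formats values it is given; it performs no lookups or calculations.
RECORD_TEMPLATE = ERB.new(<<~HTML)
  <h1><%= mode.capitalize %> customer</h1>
  <% if mode == "delete" %>
  <p>Really delete <%= ERB::Util.html_escape(name) %>?</p>
  <% else %>
  <input name="name" value="<%= ERB::Util.html_escape(name) %>">
  <% end %>
HTML

# The caller (standing in for the controller) sets the mode flag;
# the view just paints the matching variant of the screen.
def render_record(mode, name)
  RECORD_TEMPLATE.result_with_hash(mode: mode, name: name)
end

puts render_record("edit", "Ada")
puts render_record("delete", "Ada")
```

The same template produces an input field for "insert" and "edit" and a confirmation prompt for "delete", which is the kind of reuse described above.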

Controller

The part of the application that does what the model and view don't. Some people claim the controller contains the "business rules". I consider that a little pompous. After all, many applications are not intended to be used just for business. Also, some business rules, such as "we don't accept anyone with a credit score under 500" are typically implemented in a model as validation routines.

Every Rails application has at least one controller. There might be more, but usually not a large number. One way of splitting the work is to create a controller for each type of person using the system. For instance, there might be one controller called DataEntryPerson, another called Accountant, and a third called Administrator.
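The split between model and controller can be sketched in plain Ruby, without Rails itself. The class names `Applicant` and `DataEntryController`, the `params` hash and the returned screen names are hypothetical; a real Rails application would use its own model and controller base classes instead.

```ruby
# Model: owns the data and the validation rules, including the
# credit score rule mentioned above.
class Applicant
  MIN_CREDIT_SCORE = 500

  attr_reader :name, :credit_score, :errors

  def initialize(name, credit_score)
    @name = name
    @credit_score = credit_score
    @errors = []
  end

  # Collects error messages and reports whether the record is acceptable.
  def valid?
    @errors = []
    @errors << "name is required" if name.to_s.strip.empty?
    if credit_score < MIN_CREDIT_SCORE
      @errors << "credit score must be at least #{MIN_CREDIT_SCORE}"
    end
    @errors.empty?
  end
end

# Controller: takes the request data, asks the model to validate it,
# and decides which screen comes next. It contains no validation logic.
class DataEntryController
  def create(params)
    applicant = Applicant.new(params[:name], params[:credit_score])
    if applicant.valid?
      { screen: :confirmation, applicant: applicant }
    else
      { screen: :form, errors: applicant.errors }
    end
  end
end

result = DataEntryController.new.create({ name: "Ada", credit_score: 450 })
puts result[:screen]
puts result[:errors].inspect
```

Changing the threshold means editing one constant in the model; the controller does not need to know the rule exists.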

DRY

Stands for Don't Repeat Yourself. This means keeping each piece of information in exactly one place. This is a basic part of the Rails philosophy, and of course is also the philosophy behind data normalization.

AJAX

Stands for Asynchronous JavaScript And XML. This technology enables a web page to communicate with the server and update parts of itself without refreshing the whole page, thereby saving bandwidth.

WEBrick

The web server that comes with Rails. You run it with this command:

script/server

It can serve only a single application on a single port, so it's more useful for development and testing than for production. Luckily, other web servers can serve Rails pages in production.

Apache

The market leader in web servers. Apache can serve Rails pages if you're willing to put in some deployment work.

InstantRails

Ruby, Gems and Rails, with a production-quality web server, in one bundle. Unfortunately, as of 1/18/2006 it's Windows only, but a Linux/Unix/BSD version is being worked on.

Locomotive

A production quality Rails-capable web server, which unfortunately is Mac only.

fastcgi

A system whereby CGI (Common Gateway Interface) programs stay in memory rather than being spawned as an individual process for each request. This makes for much better efficiency. The lighttpd server comes with a fastcgi interface.

lighttpd

Production quality, Rails-capable, Ruby-centric web server available for Linux/Unix/BSD. Requires fastcgi. See http://wiki.rubyonrails.com/rails/pages/Lighttpd and http://www.lighttpd.net/.

RubyGems

A package manager for Ruby packages. Used to install Rails.

scaffold

An automatically generated chunk of code that creates screens to list a data table and to create, edit and delete its records. The scaffold generator reads the structure of the data table and uses it as a specification. You can use a few scaffolds to create a quick and dirty web app to show your client.
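The idea of using the table structure as a specification can be shown with a toy generator in plain Ruby. This is not how the Rails scaffold generator is implemented; the `COLUMNS` hash and the `scaffold_form` method are invented for the sketch.

```ruby
# Hypothetical table structure, as a generator might read it from the database.
COLUMNS = { "name" => :string, "email" => :string, "age" => :integer }

# Emit one form field per column, choosing the input type from the column type.
def scaffold_form(table, columns)
  fields = columns.map do |column, type|
    input_type = type == :integer ? "number" : "text"
    %(  <label>#{column.capitalize} <input type="#{input_type}" name="#{table}[#{column}]"></label>)
  end
  (["<form>"] + fields + ["</form>"]).join("\n")
end

puts scaffold_form("customer", COLUMNS)
```

Adding a column to the hash adds a field to the generated form without any other change, which is what makes scaffolds quick for prototypes.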

session

A hash-like structure within Rails apps that holds state between pages. It's a front end to cookies, where the state info is really held.

flash

This is NOT Macromedia flash, and is nothing like Macromedia flash!

In Rails, the term "flash" refers to a facility to pass temporary objects between actions. It's a module: ActionController::Flash. Whatever you place in flash will be exposed in the very next action, but then deleted, so you don't need to delete it manually (which is why it's better than the session for this type of thing). It's often used for error, warning and informational messages displayed on the screen that follows the one the user has just filled out.
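The "visible in the next action, then gone" behaviour can be imitated in a few lines of plain Ruby. `SimpleFlash` and its `sweep` method are hypothetical names for this sketch; it is not the code of ActionController::Flash.

```ruby
# A toy flash: values survive into the following action and are then dropped.
class SimpleFlash
  def initialize
    @values = {}
    @stale = [] # keys that have already been carried into one later action
  end

  def []=(key, value)
    @values[key] = value
    @stale.delete(key) # a freshly set value gets a full lifetime again
  end

  def [](key)
    @values[key]
  end

  # Called once at the end of every action (an assumption of this sketch):
  # drop what was already carried over, then mark the rest for next time.
  def sweep
    @stale.each { |key| @values.delete(key) }
    @stale = @values.keys
  end
end

flash = SimpleFlash.new
flash[:notice] = "Record saved" # action 1 stores a message
flash.sweep                     # end of action 1
puts flash[:notice]             # action 2 can still display it
flash.sweep                     # end of action 2
puts flash[:notice].inspect     # action 3 no longer sees it
```

With a plain session hash the message would stay until someone deleted it, which is the manual cleanup the flash avoids.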


]]>
Gerp score Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/gerp-score?blog=2 2008-10-09T17:22:41Z 2009-01-09T09:09:19Z

GERP (Genomic Evolutionary Rate Profiling)

"GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. We refer to these deficits as "Rejected Substitutions". Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element." [Sidow lab]
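Read literally, a rejected-substitution count is the expected number of substitutions under neutrality minus the number observed. The sketch below does only that arithmetic; the per-column numbers are invented, the method name is hypothetical, and how GERP actually estimates the expected and observed values is described in the publication rather than here.

```ruby
# Invented per-column values for a short alignment element:
# substitutions expected under neutrality, and substitutions observed.
expected = [2.0, 2.0, 1.5, 2.5]
observed = [0.5, 0.0, 1.5, 0.5]

# Rejected substitutions per column: the deficit of observed versus expected.
def rejected_substitutions(expected, observed)
  expected.zip(observed).map { |exp, obs| exp - obs }
end

per_column = rejected_substitutions(expected, observed)
puts per_column.inspect

# Summing over the columns gives one deficit for the whole element;
# a larger value suggests stronger constraint.
puts per_column.sum
```

A column whose observed count matches the neutral expectation contributes zero, so only columns with a deficit raise the element's total.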

It was developed primarily by Greg Cooper in the lab of Arend Sidow at Stanford University (Depts of Pathology and Genetics), in close collaboration with Eric Stone (Biostatistics, NC State), and George Asimenos and Eugene Davydov in the lab of Serafim Batzoglou (Dept. of Computer Science, Stanford).

For more information, see the GERP section of the track description for the ENCODE TBA Conservation track in the Human May 2004 (hg17) genome browser, and the GERP web page at the Sidow lab:

http://mendel.stanford.edu/sidowlab/downloads/gerp/index.html

and the publication:

Cooper et al., "Distribution and intensity of constraint in mammalian genomic sequence", Genome Research, July 2005

http://www.genome.org/cgi/content/abstract/15/7/901

Blog entry with more details.


]]>
Subversion Notes Felix Kokocinski https://www.gene-test.com http://blog.kokocinski.net/index.php/subversion-notes?blog=2 2008-08-06T18:07:06Z 2011-02-02T14:12:22Z

Subversion (svn) is an open-source version control system like the Concurrent Versions System (cvs).

Its home is here: http://subversion.apache.org/

These are notes from the creation and maintenance of the Gencode code repository at the Sanger Institute.

Checking out a repository

svn co svn+ssh://cvs.internal.sanger.ac.uk/repos/svn/gencode

Adding an external subversion code directory to my repository:

1. Remove existing svn files recursively:

find tracking_system/ -name ".svn" -print | xargs rm -rf

2. Tell svn to ignore certain types of files:

vi ~/.subversion/config

add the line: global-ignores =