xargs command

April 16th, 2008

xargs is a Unix command that builds and runs command lines from arguments read on standard input. It is useful when a command has to handle a large number of arguments, or when the output of one command should become the arguments of another, e.g. when you extract specific strings from a file and want to use them in a second command:
cut -f2 file.txt | xargs -i grep {} another_file.txt

Some of the options:

  • -i - Normally xargs places input arguments at the end of the command. With the -i option, xargs instead runs the command once per input line and replaces every {} with that line. In GNU xargs, -i is deprecated in favour of the equivalent -I {}. If your shell treats braces specially, put them in single quotes ('{}') or put a backslash (\) before each brace.
  • -t - Echo each command (to standard error) before executing it. Nice for debugging.
  • -p - Prompts the user before executing each command. Useful for debugging.
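A small sketch of the options above, using made-up demo files (file.txt and another_file.txt are created in a temporary directory just for this example). It uses the -I {} form, which GNU xargs recommends over -i, and also shows -n, which limits how many arguments go into each command.

```shell
#!/bin/sh
# Hypothetical demo input, created in a temp dir.
tmp=$(mktemp -d)
printf 'id1\tapple\nid2\tbanana\n' > "$tmp/file.txt"
printf 'apple pie\ncherry tart\nbanana split\n' > "$tmp/another_file.txt"

# -I {}: run grep once per input line, substituting that line for {}.
# Adding -t here would echo each grep command to stderr before it runs.
result=$(cut -f2 "$tmp/file.txt" | xargs -I {} grep {} "$tmp/another_file.txt")
echo "$result"

# -n 2: pass at most two arguments to each invocation of echo.
pairs=$(printf 'a b c d\n' | xargs -n 2 echo)
echo "$pairs"

rm -r "$tmp"
```

The first pipeline prints "apple pie" and "banana split"; the second prints "a b" and "c d" on separate lines.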

Good introduction: http://en.wikipedia.org/wiki/Xargs
Arguments: gnu.org

Image Manipulation with ImageMagick

April 16th, 2008

ImageMagick provides a suite of command-line utilities for creating, converting, editing, and displaying images:

Display is a machine architecture independent image processing and display program. It can display an image on any workstation display running an X server.

Import reads an image from any visible window on an X server and outputs it as an image file. You can capture a single window, the entire screen, or any rectangular portion of the screen.

Montage creates a composite by combining several separate images. The images are tiled on the composite image with the name of the image optionally appearing just below the individual tile.

Convert converts an input file using one image format to an output file with a differing image format.

Mogrify transforms an image or a sequence of images. These transforms include image scaling, image rotation, color reduction, and others. The transmogrified image overwrites the original image.

Identify describes the format and characteristics of one or more image files. It will also report if an image is incomplete or corrupt.

Composite composites images to create new images.

Conjure interprets and executes scripts in the Magick Scripting Language (MSL).


convert example:

convert -compress JPEG -quality 80 input_file.jp2 output_file.jpeg

in batch:

ls *.jp2 | awk -F. '{print $1"."$2" "$1".jpeg"}' | xargs -n2 convert
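The ls | awk pipeline above splits file names on every ".", so a name such as scan.v2.jp2 is mapped to scan.v2 and scan.jpeg, and names containing spaces are split by xargs. A minimal alternative sketch, assuming a POSIX shell, loops over the files and strips only the final .jp2 suffix. It is written as a dry run (the hypothetical function jp2_to_jpeg only prints the commands); remove the echo to actually run convert.

```shell
#!/bin/sh
# Dry-run sketch: print one convert command per .jp2 file in the current
# directory. Remove "echo" to really convert.
# ${f%.jp2} strips only the trailing ".jp2", so extra dots survive, and
# quoting "$f" keeps names with spaces together.
jp2_to_jpeg() {
  for f in *.jp2; do
    [ -e "$f" ] || continue   # no .jp2 files: the pattern stays literal, skip it
    echo convert -quality 80 "$f" "${f%.jp2}.jpeg"
  done
}
```

Usage: run jp2_to_jpeg inside the directory holding the images, check the printed commands, then drop the echo.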

Proserver features

April 13th, 2008

When serving data over the DAS protocol with ProServer, the features command calls the build_features method of the appropriate source adapter.

It is passed one of:

 { 'segment'    => $, 'start' => $, 'end' => $ }
 { 'feature_id' => $ }
 { 'group_id'   => $ }

and is expected to return a reference to an array of hash references, i.e.

[{},{}...{}]

Each hash returned represents a single feature and should contain a subset of the following keys and types. For scalar types (i.e. numbers and strings) refer to the specification on biodas.org.

 start                         => $
 end                           => $
 note                          => $ or [$,$,$...]
 id       || feature_id        => $
 label    || feature_label     => $
 type                          => $
 typetxt                       => $
 method                        => $
 method_label                  => $
 group_id || group             => $ or [{
                                         grouplabel   => $,
                                         grouptype    => $,
                                         groupnote    => $,
                                         grouplink    => $,
                                         grouplinktxt => $,
                                         note         => $ or [$,$,$...],
                                         target       => [{
                                                            id        => $,
                                                            start     => $,
                                                            stop      => $,
                                                            targettxt => $,
                                                           }],
                                        },{}...]
 grouplabel                    => $
 grouptype                     => $
 groupnote                     => $
 grouplink                     => $
 grouplinktxt                  => $
 score                         => $
 ori                           => $
 phase                         => $
 link                          => $
 linktxt                       => $
 target                        => scalar or [{
                                              id        => $,
                                              start     => $,
                                              stop      => $,
                                              targettxt => $,
                                             },{}...]
 target_id                     => $
 target_start                  => $
 target_stop                   => $
 targettxt                     => $
 typecategory || type_category => $
 typesubparts                  => $
 typesuperparts                => $
 typereference                 => $

Note: This description is based on the DAS 1.53 specification; please always check for the latest version!

Some other blog entries about DAS.

Ref: SourceAdapter documentation

GFF Format

April 3rd, 2008

Information about working with the Generic (General) Feature Format

GFF is a file format used for describing genes and other features of DNA, RNA and protein sequences. It's useful for describing the localization of elements in the genome. The latest standard GFF3 also allows the definition of parent/child relationships.

Fields (separated by tabs) are:

<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes][comments]

Missing or undefined values are written as ".".

Attributes consist of key-value pairs.

Multiple attributes are separated by semicolons (";").

In GFF2, free-text values must be enclosed in double quotes.
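As a quick illustration of the column layout, here is a small awk sketch (the function name gff_lengths is hypothetical) that prints seqname, feature type and feature length for each line of a GFF file. It assumes the columns really are tab-separated, skips lines starting with "#" such as the ##gff-version header, and uses end - start + 1 because GFF coordinates are 1-based and inclusive.

```shell
#!/bin/sh
# Sketch: summarise a tab-separated GFF file (GFF2 or GFF3).
# Output columns: seqname, feature type, length (end - start + 1).
gff_lengths() {
  awk -F'\t' '
    /^#/ { next }       # skip ##gff-version and other comment lines
    NF < 8 { next }     # skip blank or malformed lines
    { print $1 "\t" $3 "\t" ($5 - $4 + 1) }
  ' "$1"
}
```

Usage: gff_lengths example.gff. For a feature spanning 101 to 235 this prints a length of 135.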

The format of the attributes is different between GFF2 and GFF3:

GFF2

The key and the value of each attribute are separated by a single space.

Example line:

##gff-version 2

seq1     BLASTX  similarity   101  235 87.1 + 0  Target "HBA_HUMAN"; E_value 0.0003

GFF 2 format definition

GFF3

The key and the value of each attribute are separated by "=".

All attribute names that begin with an uppercase letter (e.g. ID, Parent, Target) are reserved.

Example line:

##gff-version 3

seq1     BLASTX  similarity   101  235 87.1 + 0  ID=201;target=HBA_HUMAN;e_value=0.0003

GFF 3 format definition
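To pull a single attribute out of column 9, a short awk sketch like the one below can help. The function name gff3_attr is hypothetical; it assumes well-formed GFF3 (tab-separated columns, key=value attributes joined with ";") and does not decode percent-escaped characters.

```shell
#!/bin/sh
# Sketch: print the value of one GFF3 attribute (column 9) per feature line.
# Usage: gff3_attr KEY FILE
gff3_attr() {
  awk -F'\t' -v key="$1" '
    /^#/ || NF < 9 { next }               # skip comments and short lines
    {
      n = split($9, pairs, ";")
      for (i = 1; i <= n; i++) {
        sub(/^ +/, "", pairs[i])          # tolerate "; " separators
        eq = index(pairs[i], "=")
        if (eq && substr(pairs[i], 1, eq - 1) == key)
          print substr(pairs[i], eq + 1)
      }
    }
  ' "$2"
}
```

For a tab-separated version of the example line, gff3_attr ID example.gff prints 201 and gff3_attr e_value example.gff prints 0.0003.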

Perl software module: http://www.sanger.ac.uk/Software/formats/GFF

http://doc.bioperl.org/releases/bioperl-1.2/Bio/DB/GFF.html

validate_gff3 is a Perl script (also available as an online GFF3 validator).

When run locally it lets you validate huge files and define a specific ontology for checking the terms in the third column. It requires a few additional Perl modules from CPAN (FindBin::Real, File::Format, Config::General in my case) and uses a local database (e.g. MySQL) for temporary storage.

perl validate_gff3.pl -gff3_file example.gff \
 -config Validator-Files-2007-12-17/validate_gff3.cfg \
 -out validate.out -dbname fsk_gff3_db -dbhost dbhost1 -dbport 3306 -pass dbpass1

More information on different formats

R programming (intro level)

March 27th, 2008

Some notes on basic programming using the statistical language/environment R

Alternatives for reading data from files

Simple flat files:

data <- scan("simple_file.txt")

data <- read.table("table.txt", header=TRUE, sep=",")

Microsoft Excel files:

library(RODBC)

channel <- odbcConnectExcel("myexcel.xls")

data <- sqlFetch(channel, "mysheet")

odbcClose(channel)

Getting data from a MySQL database:

library(RMySQL)

mycon <- dbConnect(MySQL(), user='cws', dbname="cws", host="pi", password='delores')

rs <- dbSendQuery(mycon, "SELECT slide_name FROM arrays limit 5")

data1 <- fetch(rs, n = -1)

dbClearResult(rs)

dbDisconnect(mycon)

Data Conversion:

number1 <- as.numeric(string1)

Use an existing R script:

source("script.R")

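Run scriptA.R non-interactively (batch mode). The script contains your R code and ends by closing the current graphics device and quitting:

# ... your R code (functions etc.) ...

dev.off()

q()

Then run it from the shell, writing all output to a log file:

R --no-save < scriptA.R > scriptA.log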

Getting help:

type help(command) or ?command for a specific command, or help.search("word") to search the help pages for a word

transpose data:

t(data1)

Sources:

http://www.onlamp.com

kickstart