FTP at the Sanger Institute

June 8th, 2009

To upload data to the Sanger FTP server:

  • Using an ftp program, connect to ftp.sanger.ac.uk
  • Anonymous logins are possible.
  • Change into the pub/incoming directory
  • Transfer the files.
  • Disconnect
  • Once a file has been left untouched for 20 minutes, it is copied to the /nfs/ftp_uploads/default directory. This directory should be mounted on every Sanger machine, so the file is readily accessible internally. If someone is waiting for the files, you will need to tell them that the upload is complete.
  • Files in /nfs/ftp_uploads will be automatically removed after 30 days, to conserve space.
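
The upload steps above could be scripted roughly as follows. This is only a sketch using Ruby's Net::FTP library; the method name upload_to_incoming and the ftp: parameter (which lets a stand-in client be passed in for testing) are made up for illustration, and the host and directory are simply the ones named in the list.

```ruby
# Sketch only: upload local files to the incoming directory of an FTP server.
# `ftp` may be any object responding to chdir, putbinaryfile and close
# (a hypothetical injection point for testing). When it is omitted, a
# Net::FTP connection with an anonymous login is opened.
def upload_to_incoming(paths, ftp: nil, host: "ftp.sanger.ac.uk", dir: "pub/incoming")
  if ftp.nil?
    require "net/ftp"          # shipped as a bundled gem on recent Ruby versions
    ftp = Net::FTP.new(host)
    ftp.login                  # no arguments means an anonymous login
  end
  ftp.chdir(dir)               # change into pub/incoming
  paths.each do |path|
    # Binary mode, keeping only the file name on the remote side.
    ftp.putbinaryfile(path, File.basename(path))
  end
  paths.length                 # number of files transferred
ensure
  ftp.close if ftp             # disconnect, even after an error
end
```

A call such as upload_to_incoming(["reads.fastq.gz"]) would then perform the connect, change directory, transfer and disconnect steps in one go; the file name here is just a placeholder.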

Source: http://intweb.sanger.ac.uk/Sysman/FAQ/ftp.shtml

Next-Gen Sequence-Submissions to the ENA

June 3rd, 2009

To make them available to everybody (and for paper submissions), Next-Generation Sequencing results should be submitted to the European Read Archive (ERA), now called the European Nucleotide Archive (ENA), which collaborates with the NCBI Short Read Archive (SRA), provided the latter is still being funded.

In the GENCODE project we submitted the RT-PCR-Seq data to the ENA using the ArrayExpress submission system.

Please note this system was about to change at the time of writing and might be different now...

Documentation (EBI)

General guidelines (NCBI)

Meta data hierarchy:

  Study
    Sample
      Experiment
        Run
  Submission

In detail:

The SRA tracks the following five objects:

Study - Identifies the sequencing study or project and contains multiple experiments.

Sample - Identifies the organism, isolate, or individual being sequenced.

Experiment - Specifies the sample, sequencing protocol, sequencing platform, and data processing that will result in one or more runs.

Run - Identifies run data files, the experiment they are contained in, and any runtime parameters gathered from the sequencing instrument.

Analysis - Packages data associated with short read objects that is intended for downstream usage or that otherwise needs an archival home. Examples include assemblies, alignments, spreadsheets, QC reports, and read lists.
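
To make the relationships between these objects a little more concrete, here is a rough model of them as plain Ruby Structs. It is not the SRA schema: the field names and the example values (study name, sample name, platform, file name) are all invented for illustration, and Analysis is left out for brevity.

```ruby
# Hypothetical, simplified model of the SRA objects described above.
Study      = Struct.new(:name, :experiments)               # a study contains experiments
Sample     = Struct.new(:name)                             # what was sequenced
Experiment = Struct.new(:name, :sample, :platform, :runs)  # sample + platform, yields runs
Run        = Struct.new(:data_files)                       # the run's data files

# Made-up example values, only to show how the objects nest.
sample     = Sample.new("liver RNA")
run        = Run.new(["lane1.fastq"])
experiment = Experiment.new("RT-PCR-seq liver", sample, "Illumina", [run])
study      = Study.new("GENCODE validation", [experiment])

# Walk from the study down to the data files of every run.
run_files = study.experiments.flat_map { |e| e.runs.flat_map(&:data_files) }
p run_files   # ["lane1.fastq"]
```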

XML schemata for different levels

Re-sequenced transcripts (Sanger sequencing) are submitted to the EMBL database using the Webin interface.

All the meta-data in the ERA is available here

The sample.xml file contains a single attribute for each sample, e.g.


<SAMPLE_ATTRIBUTES>
   <SAMPLE_ATTRIBUTE>
       <TAG>sample_origin</TAG>
       <VALUE>Trypanosome brucei genetic crosses between T. brucei 927 and T. b. gambiense 386</VALUE>
   </SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES>

You can put any name/value pair in a SAMPLE_ATTRIBUTE block. Check the attributes column at

http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=table&f=sample&m=data&s=sample

to see what other people have used.
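
If there are many samples, a SAMPLE_ATTRIBUTES block like the one above could be generated from name/value pairs instead of typed by hand. The following is a minimal sketch with plain string building; the helper names xml_escape and sample_attributes_xml and the "note" attribute are invented for illustration, and the output is not validated against the SRA schema.

```ruby
# Characters that must be escaped inside XML text.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                '"' => "&quot;", "'" => "&apos;" }.freeze

# Hypothetical helper: escape XML special characters in a value.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: turn a hash of tag => value pairs into a
# SAMPLE_ATTRIBUTES block with one SAMPLE_ATTRIBUTE per pair.
def sample_attributes_xml(attributes)
  lines = ["<SAMPLE_ATTRIBUTES>"]
  attributes.each do |tag, value|
    lines << "   <SAMPLE_ATTRIBUTE>"
    lines << "       <TAG>#{xml_escape(tag)}</TAG>"
    lines << "       <VALUE>#{xml_escape(value)}</VALUE>"
    lines << "   </SAMPLE_ATTRIBUTE>"
  end
  lines << "</SAMPLE_ATTRIBUTES>"
  lines.join("\n")
end

attrs = {
  "sample_origin" => "Trypanosome brucei genetic crosses between T. brucei 927 and T. b. gambiense 386",
  "note"          => "A & B"   # made-up attribute, shows the escaping
}
puts sample_attributes_xml(attrs)
```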

See also the Trace archive @ Sanger


An easier way to automate large parts of the process is to submit the data through ArrayExpress. This can be done through the MAGE-TAB web interface.

  • Create a username/password (New submitter) and log in
  • Create a new experiment by giving it a name, selecting UHTS, and selecting those parameters from the drop-down lists that are appropriate for your data. I've used Biological design:"Organism part comparison", Technology used:"Transcription profiling by high-throughput sequencing", Materials used:"Organism part" (cell tissue extractions), Organisms used:"Homo sapiens"
  • Submitting this gives you the option to generate and download a meta data file. You can import this into Excel and fill in the information that is required (Submitter names, experiment description, information on the data sets, etc.) which is used to store the data in IDF and SDRF formats.
  • "Upload files" from here or from the Experiment list page gives you the option to select the experiment and submit the meta-data file saved as a txt file and the raw data file as a compressed file / archive.
  • You can change and re-generate the meta-data file by selecting Edit from the Experiment list page
  • Submitting this will create a ticket through which the people at ArrayExpress can get in touch with you. I've found them to be very helpful, answering all my silly questions.
  • If the raw data is too big, upload an empty compressed file, place the data on the FTP site at ftp://ftp-private.ebi.ac.uk (user name is aexpress and password is aexpress1) and let ArrayExpress know what the name of the file is.

more info: ArrayExpress help

Unix Process Information

May 14th, 2009

To find out more about processes running on your machine you can:

  1. top
  2. top -p 6363 for a specific process (here the one with PID 6363)
  3. ps, or ps -gauwxe | more for a full listing
  4. look into the process's data in /proc

    e.g. less /proc/5987/cmdline
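
The same /proc information can also be read from a script. This is a small sketch for Linux only; the helper name proc_cmdline is made up, and it relies on the arguments in /proc/<pid>/cmdline being separated by NUL bytes.

```ruby
# Hypothetical helper: return the command line of a process as an array of
# arguments by reading /proc/<pid>/cmdline (Linux only).
# Returns nil when the file does not exist or is not readable.
def proc_cmdline(pid)
  path = "/proc/#{pid}/cmdline"
  return nil unless File.readable?(path)
  # Arguments are separated by NUL bytes, so read as binary and split on them.
  File.binread(path).split("\0")
end

# Example: inspect the current Ruby process itself.
args = proc_cmdline(Process.pid)
puts args.inspect if args
```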

Ruby String Functions

April 9th, 2009

METHODS THAT ARE OPERATORS

Operators such as + and * work on strings (concatenate and replicate). The % operator is a short form for sprintf. The << operator also concatenates, but unlike + it appends to the existing string in place rather than returning a new one. You can also index into a string much as you would into an array of characters.
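
A few lines to try the operators out; the variable names and values are just for illustration.

```ruby
greeting = "Hello" + ", " + "world"       # + builds a new string
rule     = "-" * 10                       # * repeats the string
label    = "%-6s|%5.1f" % ["pi", 3.14159] # % formats like sprintf

buffer = String.new("abc")
id_before = buffer.object_id
buffer << "def"                           # << appends to the same object

p greeting        # "Hello, world"
p rule            # "----------"
p label           # "pi    |  3.1"
p buffer          # "abcdef"
p buffer.object_id == id_before   # true: still the same string object
p greeting[0]     # "H" (a one-character string in Ruby 1.9 and later)
p greeting[0, 5]  # "Hello"
```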

OTHER METHODS

To change case:

capitalize - first character to upper, rest to lower

downcase - all to lower case

swapcase - changes the case of all letters

upcase - all to upper case

To rejustify:

center - add white space padding to center string

ljust - pads string, left justified

rjust - pads string, right justified

To trim:

chop - remove last character

chomp - remove trailing line separators

squeeze - reduces successive equal characters to singles

strip - deletes leading and trailing white space

To examine:

count - return a count of matches

empty? - returns true if empty

include? - is a specified target string present in the source?

index - return the position of one string in another

length or size - return the length of a string

rindex - returns the last position of one string in another

slice - returns a partial string

To encode and alter:

crypt - one-way password hashing (takes a salt string)

delete - deletes the characters found in the intersection of its arguments

dump - adds extra \ characters to escape specials

hex - takes string as hex digits and returns number

next or succ - successive or next string (eg ba -> bb)

oct - take string as octal digits and returns number

replace - replaces the contents of the string with another string

reverse - turns the string around

slice! - DELETES a partial string and returns the part deleted

split - returns an array of partial strings exploded at separator

sum - returns a checksum of the string

to_f and to_i - return string converted to float and integer

tr - to map all occurrences of specified char(s) to other char(s)

tr_s - as tr, then squeeze out resultant duplicates

unpack - to extract from a string into an array using a template

To iterate:

each_char - process each character in turn

each_line - process each line in a string

each_byte - process each byte in turn

upto - iterate through successive strings (see "next" above)

source: http://www.wellho.net/solutions/ruby-string-functions-in-ruby.html
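
A quick tour of some of the methods listed above, as a sketch to paste into irb. The sample strings are arbitrary, and the results in the comments are for Ruby 1.9 or later.

```ruby
s = "hello world"

p s.capitalize            # "Hello world"
p s.center(15, "*")       # "**hello world**"
p "line\n".chomp          # "line"
p "aaabbb".squeeze        # "ab"
p s.count("lo")           # 5 (counts every l and every o)
p s.index("o")            # 4
p s.rindex("o")           # 7
p "ff".hex                # 255
p "17".oct                # 15
p "az".succ               # "ba"
p "hello".tr("el", "ip")  # "hippo"
p "hello".tr_s("l", "r")  # "hero"
p "a,b,c".split(",")      # ["a", "b", "c"]
p "aa".upto("ad").to_a    # ["aa", "ab", "ac", "ad"]
p s.each_char.first(3)    # ["h", "e", "l"]
```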

HTML Codes

March 17th, 2009

Here's a quick list with the HTML codes to safely display common characters on web pages.

More details are e.g. here.
See also the list of ASCII codes only.

Each row below lists up to five pairs of HTML code followed by its browser view:

&copy; © &#33; ! &#95; _ &#157;  &#219; Û
&reg; ® &#34; " &#96; ` &#158; ž &#220; Ü
&nbsp;   &#35; # &#97; a &#159; Ÿ &#221; Ý
&quot; " &#36; $ &#98; b &#160;   &#222; Þ
&amp; & &#37; % &#99; c &#161; ¡ &#223; ß
&lt; < &#38; & &#100; d &#162; ¢ &#224; à
&gt; > &#39; ' &#101; e &#163; £ &#225; á
&Agrave; À &#40; ( &#102; f &#164; ¤ &#226; â
&Aacute; Á &#41; ) &#103; g &#165; ¥ &#227; ã
&Acirc; Â &#42; * &#104; h &#166; ¦ &#228; ä
&Atilde; Ã &#43; + &#105; i &#167; § &#229; å
&Auml; Ä &#44; , &#106; j &#168; ¨ &#230; æ
&Aring; Å &#45; - &#107; k &#169; © &#231; ç
&AElig; Æ &#46; . &#108; l &#170; ª &#232; è
&Ccedil; Ç &#47; / &#109; m &#171; « &#233; é
&Egrave; È &#48; 0 &#110; n &#172; ¬ &#234; ê
&Eacute; É &#49; 1 &#111; o &#173; ­ &#235; ë
&Ecirc; Ê &#50; 2 &#112; p &#174; ® &#236; ì
&Euml; Ë &#51; 3 &#113; q &#175; ¯ &#237; í
&Igrave; Ì &#52; 4 &#114; r &#176; ° &#238; î
&Iacute; Í &#53; 5 &#115; s &#177; ± &#239; ï
&Icirc; Î &#54; 6 &#116; t &#178; ² &#240; ð
&Iuml; Ï &#55; 7 &#117; u &#179; ³ &#241; ñ
&ETH; Ð &#56; 8 &#118; v &#180; ´ &#242; ò
&Ntilde; Ñ &#57; 9 &#119; w &#181; µ &#243; ó
&Otilde; Õ &#58; : &#120; x &#182; ¶ &#244; ô
&Ouml; Ö &#59; ; &#121; y &#183; · &#245; õ
&Oslash; Ø &#60; < &#122; z &#184; ¸ &#246; ö
&Ugrave; Ù &#61; = &#123; { &#185; ¹ &#247; ÷
&Uacute; Ú &#62; > &#124; | &#186; º &#248; ø
&Ucirc; Û &#63; ? &#125; } &#187; » &#249; ù
&Uuml; Ü &#64; @ &#126; ~ &#188; ¼ &#250; ú
&Yacute; Ý &#65; A &#127; ? &#189; ½ &#251; û
&THORN; Þ &#66; B &#128; € &#190; ¾ &#252; ü
&szlig; ß &#67; C &#129;   &#191; ¿ &#253; ý
&agrave; à &#68; D &#130; ‚ &#192; À &#254; þ
&aacute; á &#69; E &#131; ƒ &#193; Á &#255; ÿ
&aring; å &#70; F &#132; „ &#194; Â
&aelig; æ &#71; G &#133; … &#195; Ã
&ccedil; ç &#72; H &#134; † &#196; Ä
&egrave; è &#73; I &#135; ‡ &#197; Å
&eacute; é &#74; J &#136; ˆ &#198; Æ
&ecirc; ê &#75; K &#137; ‰ &#199; Ç
&euml; ë &#76; L &#138; Š &#200; È
&igrave; ì &#77; M &#139; ‹ &#201; É
&iacute; í &#78; N &#140; Œ &#202; Ê
&icirc; î &#79; O &#141;   &#203; Ë
&iuml; ï &#80; P &#142; Ž &#204; Ì
&eth; ð &#81; Q &#143;   &#205; Í
&ntilde; ñ &#82; R &#144;   &#206; Î
&ograve; ò &#83; S &#145; ‘ &#207; Ï
&oacute; ó &#84; T &#146; ’ &#208; Ð
&ocirc; ô &#85; U &#147; “ &#209; Ñ
&otilde; õ &#86; V &#148; ” &#210; Ò
&ouml; ö &#87; W &#149; • &#211; Ó
&oslash; ø &#88; X &#150; – &#212; Ô
&ugrave; ù &#89; Y &#151; &#213; Õ    
&uacute; ú &#90; Z &#152; ˜ &#214; Ö    
&ucirc; û &#91; [ &#153; ™ &#215; ×
&yacute; ý &#92; \ &#154; š &#216; Ø    
&thorn; þ &#93; ] &#155; › &#217; Ù
&yuml; ÿ &#94; ^ &#156; œ &#218; Ú
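
To apply codes like these from a script, something along the following lines might do. It is a sketch only: the name html_encode is made up, it uses the named codes for the four markup characters and the numeric &#N; form for everything above plain ASCII, and it assumes UTF-8 input. The number emitted is the Unicode code point, which matches the table above for 160 to 255 but not for the 128 to 159 rows.

```ruby
# Named codes for the characters that have special meaning in HTML.
HTML_NAMED = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Hypothetical helper: replace markup characters by their named codes and
# every character above plain ASCII (code > 126) by its numeric code.
def html_encode(text)
  text.each_char.map do |ch|
    if HTML_NAMED.key?(ch)
      HTML_NAMED[ch]
    elsif ch.ord > 126
      format("&#%d;", ch.ord)
    else
      ch
    end
  end.join
end

puts html_encode("Caf\u00E9 <b>\u00A9</b>")
# prints: Caf&#233; &lt;b&gt;&#169;&lt;/b&gt;
```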