Majordomo Commands

June 11th, 2009

User commands for the mailing list software Majordomo

Send these commands in the body of an email to majordomo@your-server.com. For example, an email to Majordomo@ebi.ac.uk with the body

        who mac-list

help

    Majordomo replies with a list of acceptable Majordomo commands. 

subscribe listname

    Majordomo subscribes the sender to the named list. 

subscribe listname address

    Majordomo subscribes the address given to the named list. 

unsubscribe listname

    Majordomo unsubscribes the sender from the named list, provided the mail was sent from exactly the address that was subscribed. 

unsubscribe listname address

    Majordomo unsubscribes the address from the named list. 

which

    Majordomo sends back a catalogue of the mailing lists that the sending address is subscribed to. 

which address

    Majordomo sends back a list of the mailing lists the address given is subscribed to. 

lists

    Majordomo sends back a catalogue of the mailing lists which Majordomo handles, with a half-line description of each list. 

info listname

    Majordomo sends back summary information about the named list. 

who listname

    Majordomo replies with a roster of the e-mail addresses that are subscribed to the named list. 

index listname

    Messages sent to all mailing lists are archived monthly unless the list owner explicitly requests no archiving. Majordomo replies with the filenames of the archived files of the named list. 

get listname filename

    Majordomo sends the archived messages for the filename requested. Files are archived under the filename listname.yymm. Example: www-l.9503 contains all messages sent to the list www-l during March 1995. 
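The archive filename can be derived from the list name, year, and month; a minimal Python sketch of the naming scheme (the function name is made up for illustration):

```python
def archive_filename(listname, year, month):
    """Return the Majordomo archive filename, e.g. www-l.9503 for March 1995."""
    return f"{listname}.{year % 100:02d}{month:02d}"

print(archive_filename("www-l", 1995, 3))  # www-l.9503
```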

end

    Majordomo ignores anything in an e-mail message to Majordomo which comes after the command "end". This can be useful if you have a signature or other text at the end of your message, or if you want to include more than one Majordomo command. 
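A complete message to Majordomo might therefore look like this (list name and signature are made up for illustration):

```
subscribe www-l
which
end
--
Joe Bloggs | joe@example.com
```

Majordomo processes the two commands and ignores the signature after "end".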


Lucene Search

June 11th, 2009

To use the Lucene search engine for querying AnnoTrack, the annotation tracking system of GENCODE, an XML dump must be prepared. This should be done daily to allow regular re-indexing of the search.

We index at the gene and transcript level separately. The XML can be written out with this script:

~fsk/3_scripts/gencode/lucene_dump.pl

It writes the following format:

XML

<entries>
  <entry id="otthumg00000159378">
    <name>OTTHUMG00000159378</name>
    <description>Description: putative novel protein
Genename: AP000221.2</description>
    <cross_references>
      <ref dbname="vega" dbkey="OTTHUMG00000159378" />
      <ref dbname="gentrack_transcript" dbkey="548587" />
    </cross_references>
    <additional_fields>
      <field name="transcript_count">1</field>
      <field name="location">21:25747378,25760913:-</field>
      <field name="chromosome">21</field>
      <field name="category">HAVANA</field>
    </additional_fields>
  </entry>
</entries>

It includes cross-references to gene/transcript entries in AnnoTrack and Vega.
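The actual dump is produced by the Perl script above; purely as an illustration of the format, a hypothetical Python sketch building one entry with the standard library could look like this (the function and argument names are made up):

```python
import xml.etree.ElementTree as ET

def make_entry(entry_id, name, description, refs, fields):
    """Build one <entry> element in the Lucene dump format shown above."""
    entry = ET.Element("entry", id=entry_id)
    ET.SubElement(entry, "name").text = name
    ET.SubElement(entry, "description").text = description
    xrefs = ET.SubElement(entry, "cross_references")
    for dbname, dbkey in refs:
        ET.SubElement(xrefs, "ref", dbname=dbname, dbkey=dbkey)
    extra = ET.SubElement(entry, "additional_fields")
    for field_name, value in fields.items():
        ET.SubElement(extra, "field", name=field_name).text = str(value)
    return ET.tostring(entry, encoding="unicode")

xml = make_entry(
    "otthumg00000159378", "OTTHUMG00000159378",
    "Description: putative novel protein",
    [("vega", "OTTHUMG00000159378"), ("gentrack_transcript", "548587")],
    {"transcript_count": 1, "chromosome": "21"},
)
```

ElementTree takes care of escaping the text content, which matters for the characters listed below.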

Characters to escape in XML:

XML

"   &quot;
<   &lt;
>   &gt;
&   &amp;
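In Python, for instance, this escaping can be done with the standard library rather than by hand (a sketch; note that the quote character has to be passed explicitly):

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes go via the entities dict
text = 'say "<&>"'
print(escape(text, {'"': '&quot;'}))  # say &quot;&lt;&amp;&gt;&quot;
```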

The search can be initiated from http://www.sanger.ac.uk/search; a direct link could look like http://www.sanger.ac.uk/search?db=annotrack&t=brca1
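Such a direct link can also be assembled programmatically; a minimal Python sketch:

```python
from urllib.parse import urlencode

# build the query string for a direct AnnoTrack search link
base = "http://www.sanger.ac.uk/search"
url = base + "?" + urlencode({"db": "annotrack", "t": "brca1"})
print(url)  # http://www.sanger.ac.uk/search?db=annotrack&t=brca1
```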

FTP at the Sanger Institute

June 8th, 2009

To upload data to the Sanger FTP server:

  • Using an FTP program, connect to ftp.sanger.ac.uk (anonymous logins are possible).
  • Change into the pub/incoming directory.
  • Transfer the files.
  • Disconnect.
  • Once a file has been left untouched for 20 minutes, it is copied to the /nfs/ftp_uploads/default directory. This directory should be mounted on every Sanger machine and so is readily accessible internally. If someone is waiting for the files, let them know once the upload is complete.
  • Files in /nfs/ftp_uploads are automatically removed after 30 days, to conserve space.
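The steps above might look like this in a command-line FTP session (the filename is a placeholder; prompts vary between clients):

```
$ ftp ftp.sanger.ac.uk
Name: anonymous
Password: (your e-mail address)
ftp> cd pub/incoming
ftp> put mydata.tar.gz
ftp> quit
```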

Source: http://intweb.sanger.ac.uk/Sysman/FAQ/ftp.shtml

Next-Gen Sequence-Submissions to the ENA

June 3rd, 2009

To make them available to everybody (and for paper submissions), Next-Generation Sequencing results should be submitted to the European Read Archive (ERA), now called the European Nucleotide Archive (ENA), which collaborates with the NCBI Short Read Archive (SRA) (if that is still being funded).

In the GENCODE project we submitted the RT-PCR-Seq data to the ENA using the ArrayExpress submission system.

Please note this system was about to change at the time of writing and might be different now...

Documentation (EBI)

General guidelines (NCBI)

Meta data hierarchy:


  Study

    Sample

      Experiment

        Run

  Submission

In detail:

The SRA tracks the following five objects:

Study - Identifies the sequencing study or project and contains multiple experiments.

Sample - Identifies the organism, isolate, or individual being sequenced.

Experiment - Specifies the sample, sequencing protocol, sequencing platform, and data processing that will result in one or more runs.

Run - Identifies run data files, the experiment they are contained in, and any runtime parameters gathered from the sequencing instrument.

Analysis - Packages data associated with short read objects that are intended for downstream usage or that otherwise need an archival home. Examples include assemblies, alignments, spreadsheets, QC reports, and read lists.

XML schemata for different levels

Re-sequenced transcripts (Sanger sequencing) are submitted to the EMBL database, using the Webin interface

All the meta-data in the ERA is available here

The sample.xml file contains a single attribute for each sample, e.g.:

XML

<SAMPLE_ATTRIBUTES>
   <SAMPLE_ATTRIBUTE>
       <TAG>sample_origin</TAG>
       <VALUE>Trypanosome brucei genetic crosses between T. brucei 927 and T. b. gambiense 386</VALUE>
   </SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES>

You can put any name/value pair in a SAMPLE_ATTRIBUTE block; check the attributes column at

http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=table&f=sample&m=data&s=sample

to see what other people have used.
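A hypothetical Python sketch generating such a SAMPLE_ATTRIBUTES block from name/value pairs (the function name and sample value are made up):

```python
import xml.etree.ElementTree as ET

def sample_attributes(pairs):
    """Build a SAMPLE_ATTRIBUTES element from (tag, value) pairs."""
    root = ET.Element("SAMPLE_ATTRIBUTES")
    for tag, value in pairs:
        attr = ET.SubElement(root, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attr, "TAG").text = tag
        ET.SubElement(attr, "VALUE").text = value
    return ET.tostring(root, encoding="unicode")

block = sample_attributes([("sample_origin", "T. brucei genetic cross")])
```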

See also the Trace archive @ Sanger


An easier way to automate large parts of the process is to submit the data through ArrayExpress. This can be done through the MAGE-TAB web interface.

  • Create a username/password (New submitter) and log in
  • Create a new experiment by giving it a name, selecting UHTS, and selecting those parameters from the drop-down lists that are appropriate for your data. I've used Biological design: "Organism part comparison", Technology used: "Transcription profiling by high-throughput sequencing", Materials used: "Organism part" (cell tissue extractions), Organisms used: "Homo sapiens"
  • Submitting this gives you the option to generate and download a meta data file. You can import this into Excel and fill in the information that is required (Submitter names, experiment description, information on the data sets, etc.) which is used to store the data in IDF and SDRF formats.
  • "Upload files", from here or from the Experiment list page, gives you the option to select the experiment and submit the meta-data file (saved as a txt file) and the raw data file as a compressed file / archive.
  • You can change and re-generate the meta-data file by selecting Edit from the Experiment list page
  • Submitting this will create a ticket through which the people at Array Express can get in touch with you. I've found them to be very helpful, answering all my silly questions.
  • If the raw data is too big, upload an empty compressed file, place the data on the FTP site at ftp://ftp-private.ebi.ac.uk (user name is aexpress and password is aexpress1) and let ArrayExpress know what the name of the file is.

more info: ArrayExpress help

Unix Process Information

May 14th, 2009

To find out more about processes running on your machine you can:

  1. top
  2. top -p 6363 for a specific process
  3. ps and ps -gauwxe | more
  4. look at the process's data in /proc, e.g. less /proc/5987/cmdline
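Option 4 can also be scripted; a small Python sketch reading a process's command line from /proc (Linux only; the function name is made up):

```python
import os

def proc_cmdline(pid):
    """Return the argv of a running process, read from /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as fh:
        raw = fh.read()
    # arguments are NUL-separated, with a trailing NUL
    return [arg.decode() for arg in raw.split(b"\0") if arg]

print(proc_cmdline(os.getpid()))
```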