Allow others to upload data to your Amazon S3 bucket

August 14th, 2017

When you process data in the cloud for somebody else (e.g. a customer), or want to share large amounts of data for other reasons, a good option is to use cloud storage directly. Besides Dropbox (which requires a contract with a monthly fee for this kind of data) and other offerings, Amazon provides S3: you already get 5 GB of free S3 storage as part of the "Free Tier", and for anything beyond that you only pay for what you use. At the current price (August 2017) you pay $0.023 per GB per month for standard storage. Additionally you pay for access operations: PUT, COPY, POST, or LIST requests cost $0.005 per 1,000 requests, GET and all other requests $0.004 per 10,000 requests. Data transfers between S3 buckets or from S3 to any Amazon cloud service within the same region are free; transfers to other regions or downloads via the internet incur additional charges. Please check the Amazon pages for current prices and conditions.
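To get a feeling for these numbers, here is a minimal Python sketch that estimates a monthly bill from the August 2017 prices quoted above. The function name and the example volumes are made up for illustration, and the sketch assumes no Free Tier allowance and no data transfer charges.

```python
# Prices as quoted above (August 2017); check the AWS pages for current values.
STORAGE_PER_GB_MONTH = 0.023  # standard storage, USD per GB per month
WRITE_PER_1000 = 0.005        # PUT, COPY, POST, or LIST requests
READ_PER_10000 = 0.004        # GET and all other requests


def monthly_s3_cost(stored_gb, write_requests, read_requests):
    """Estimate the monthly cost in USD, ignoring Free Tier and data transfer."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    writes = write_requests / 1000 * WRITE_PER_1000
    reads = read_requests / 10000 * READ_PER_10000
    return storage + writes + reads


# Hypothetical workload: 200 GB stored, 5,000 uploads, 20,000 downloads.
print(f"{monthly_s3_cost(200, 5000, 20000):.2f} USD")
```

For data sets of this size the storage term dominates; the request charges only start to matter with very many small files.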

Following the excellent walk-through example from AWS, here is a condensed version of the set-up that worked for me. I needed to process large data files for a customer, called DataHeros Ltd. here; downloading them to my machine and then uploading them to the cloud would have been highly inefficient, and FTP or other direct access to the customer data was not possible.

The main steps are:

  1. Create an IAM profile for the customer
    - Sign in to AWS and create a new IAM user (let's call him/her "data-heros-ltd") with a password but no permissions yet.
    - Also create a group (called "customers") and add the new user to it.

  2. Create an S3 bucket with a folder for the customer
    - Go to the S3 console.
    - Create a bucket "my-company-bucket" with top-level folders "customer-A", "customer-B", etc. Within "customer-A", create a subfolder "DataHeros-Ltd". All customers will see the top-level listing, but they don't need to see the actual names of the other customers. At the same time, the customer can be sure that the second-level folder is the correct one for his/her data.
    - Also create a folder "other-data" that only I have access to.

  3. Set permissions to allow access to this area only
    All customers (the group) need list access to the bucket; the current customer DataHeros-Ltd needs write access to the "customer-A/DataHeros-Ltd" area.
    The permissions are set using:
    - the inline policy of the user data-heros-ltd: Policy 1 below
    - a policy attached to the group customers: Policy 2
    - the general bucket policy in the Permissions tab of the bucket: Policy 3, which includes your AWS account ID. You can find the ID at the top of the billing console.

  4. From the console you can now customize the link to give to customers along with their username and password. They should then be able to upload data to their folder using the web interface, and I can process it from there.

 

Policy 1:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowListBucketIfSpecificPrefixIsIncludedInRequest",
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::my-company-bucket"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "customer-A/*"
                    ]
                }
            }
        },
        {
            "Sid": "AllowUserToReadWriteObjectDataInDevelopmentFolder",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::my-company-bucket/customer-A/*"
            ]
        }
    ]
}
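Policy 1 only differs between customers in the bucket name and the folder prefix, so it can be generated rather than edited by hand. The following is a minimal sketch using only the Python standard library; the helper name customer_policy is hypothetical, and the output still has to be pasted into (or uploaded to) IAM yourself.

```python
import json


def customer_policy(bucket, prefix):
    """Build an inline IAM policy shaped like Policy 1 for one folder prefix."""
    prefix = prefix.strip("/")  # tolerate "customer-A/" as well as "customer-A"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is only allowed when the request names this prefix.
                "Sid": "AllowListBucketIfSpecificPrefixIsIncludedInRequest",
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Read, write and delete objects below the prefix only.
                "Sid": "AllowUserToReadWriteObjectDataInDevelopmentFolder",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


print(json.dumps(customer_policy("my-company-bucket", "customer-A"), indent=4))
```

Passing "customer-A/DataHeros-Ltd" as the prefix would restrict the customer to the second-level folder instead of the whole "customer-A" area.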

 

Policy 2:

{
  "Version": "2012-10-17",                 
  "Statement": [
    {
      "Sid": "AllowGroupToSeeBucketListAndAlsoAllowGetBucketLocationRequiredForListBucket",
      "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::*"]
    },
    {
      "Sid": "AllowRootLevelListingOfCompanyBucket",
      "Action": ["s3:ListBucket"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::my-company-bucket"],
      "Condition":{
          "StringEquals":{"s3:prefix":[""]}
       }
    },
    {
      "Sid": "RequireFolderStyleList",
      "Action": ["s3:ListBucket"],
      "Effect": "Deny",
      "Resource": ["arn:aws:s3:::*"],
      "Condition":{
          "StringNotEquals":{"s3:delimiter":"/"}
       }
     },
    {
      "Sid": "ExplictDenyAccessToPrivateFolderToEveryoneInTheGroup",
      "Action": ["s3:*"],
      "Effect": "Deny",
      "Resource":["arn:aws:s3:::my-company-bucket/other-data/*"]
    },
    {
      "Sid": "DenyListBucketOnPrivateFolder",
      "Action": ["s3:ListBucket"],
      "Effect": "Deny",
      "Resource": ["arn:aws:s3:::*"],
      "Condition":{
          "StringLike":{"s3:prefix":["other-data/"]}
       }
    }
  ]
}

Policy 3:

{
    "Version": "2012-10-17",
    "Id": "Policy1502460168202",
    "Statement": [
        {
            "Sid": "Stmt1502460165827",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::MY-AWS-ID:root"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::my-company-bucket/*"
        }
    ]
}

 

 

GAL file format

July 20th, 2017

GenePix Array List (GAL) files are text files with specific information about the location, size, and name of each DNA spot on a microarray. They are therefore of vital importance for the analysis of scanned microarray images. The format defines a specific header that precedes the list of data columns:

Example:

ATF	1
9	5
Type=GenePix ArrayList V1.0
BlockCount=1
BlockType=0
"Block1=10000, 38780, 150, 20, 200, 18, 200"
Supplier=BioRobotics
ArrayerSoftwareName=TAS Application Suite (MicroGrid II)
ArrayerSoftwareVersion=2.7.1.18
ScanResolution=10
Block	Column	Row	ID	Name
1	1	1	RP11-163J21	Clone 1
1	1	2	RP11-163J21	Clone 2

Explanations:

ATF -> File conforms to the Axon Text File (ATF) format
1 -> Version number of ATF
9 -> Number of header lines before the "Block, Column, Row, ..." line
5 -> Number of data columns (Block, Column, Row, ID, Name)
Type=GenePix ArrayList V1.0 -> Type of file, same for all GAL files
BlockCount=1 -> Number of blocks described in the file
BlockType=0 -> Type of block, 0 = rectangular block
BlockX=A, B, C, D, E, F, G -> The position and dimensions of each block (X is the block number)
A -> xOrigin
B -> yOrigin
C -> Feature diameter
D -> xFeatures
E -> xSpacing
F -> yFeatures
G -> ySpacing
ScanResolution -> Optional parameter to scale the position on higher-resolution images

Block arrangement:

1	2	3	4
5	6	7	8
9	10	11	12
13	14	15	16

The data columns are:

  • Block
  • Column
  • Row
  • Name
  • ID
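As an illustration of the layout described above, here is a minimal Python sketch of a GAL reader. It is not a full implementation of the format: the function names are made up, the declared header and column counts are stored but not enforced, and the feature-centre calculation assumes that xOrigin/yOrigin refer to the centre of the first feature (column 1, row 1) of a block.

```python
def parse_gal(text):
    """Parse GAL text into (header dict, list of data-row dicts)."""
    # Drop empty lines and surrounding whitespace (including trailing tabs).
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("ATF"):
        raise ValueError("not an ATF/GAL file")
    n_header, n_columns = (int(v) for v in lines[1].split()[:2])
    header = {"header_lines": n_header, "data_columns": n_columns}
    idx = 2
    # Header records are key=value pairs; some writers wrap them in quotes.
    while idx < len(lines) and "=" in lines[idx]:
        key, value = lines[idx].strip('"').split("=", 1)
        header[key.strip()] = value.strip()
        idx += 1
    columns = lines[idx].split("\t")  # e.g. Block, Column, Row, ID, Name
    rows = [dict(zip(columns, ln.split("\t"))) for ln in lines[idx + 1:]]
    return header, rows


def block_geometry(header, block=1):
    """Split a 'BlockX=A, B, C, D, E, F, G' record into named integers."""
    names = ["xOrigin", "yOrigin", "diameter",
             "xFeatures", "xSpacing", "yFeatures", "ySpacing"]
    values = [int(v) for v in header[f"Block{block}"].split(",")]
    return dict(zip(names, values))


def feature_center(geometry, column, row):
    """Centre of a feature, ASSUMING the origin is the centre of feature (1, 1)."""
    x = geometry["xOrigin"] + (column - 1) * geometry["xSpacing"]
    y = geometry["yOrigin"] + (row - 1) * geometry["ySpacing"]
    return x, y


SAMPLE = "\n".join([
    "ATF\t1",
    "9\t5",
    "Type=GenePix ArrayList V1.0",
    "BlockCount=1",
    "BlockType=0",
    '"Block1=10000, 38780, 150, 20, 200, 18, 200"',
    "ScanResolution=10",
    "Block\tColumn\tRow\tID\tName",
    "1\t1\t1\tRP11-163J21\tClone 1",
    "1\t1\t2\tRP11-163J21\tClone 2",
])

header, rows = parse_gal(SAMPLE)
geometry = block_geometry(header, 1)
print(geometry["xFeatures"], rows[1]["Name"], feature_center(geometry, 1, 2))
```

Because the data rows are keyed by the column names from the file itself, the sketch does not depend on whether ID or Name comes first.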

aCGH array QC measures

July 20th, 2017

The within-array quality for (genomic) microarrays is often measured using the following metrics:

  1. Standard Deviation Autosome / Robust (SD autosome): Measure of the dispersion of the Log2 ratio of all clones on the array, giving an overall picture of the noise in the array. It is calculated on the normalised but unsmoothed data. The SD robust is based on the middle 58%/66% of the data. Because outliers are excluded, large changes such as trisomies will not cause this number to change significantly. (The SD robust is the number we use when we say “3 SDs away from the noise” in the calling algorithm.) Both measures are given after all data processing but excluding any smoothing. For BlueFuse Multi processed data the values should be 0.07-0.15 for the autosome measure and 0.05-0.11 for the robust measure.
  2. Signal to Background Ratio (SBR): Brightness of the mean signal (after the background has been subtracted) divided by the raw background signal (global signal).
  3. Derivative Log2 Ratio / Fused (DLR): Measure of the probe-to-probe variability. In an ideal world, probes within a region will have essentially the same ratio. In a noisy array, adjacent probes can have a very large ratio difference. The DLR raw is calculated before any data processing; the DLR fused is calculated after normalization and data correction but always on unsmoothed data, so it is independent of user settings and cannot be adjusted by the user, thereby giving a consistent array-to-array measure of noise. BlueFuse results should be < 0.2.
  4. % included clones: Percentage of all clones on a BAC array that were not excluded due to inconsistencies between clone replicates. For BlueFuse results this should be > 95 %.
  5. Mean Spot Amplitude: The mean fluorescent signal intensities for the two channels; channel 1 = sample (by default Cy3; ex 550 nm, em 570 nm) and channel 2 = reference (by default Cy5; ex 650 nm, em 670 nm). This metric varies between the available scanners. The mean spot amplitude can give an indication of how well the DNA has been labelled with fluorescent dyes but, more importantly, very high values can indicate over-scanning of the microarray image or poor washing that has left a lot of non-specific signal. The balance between channels can also be assessed; the Cy5 signal tends to give a higher intensity than Cy3, but major differences between the channels may indicate a labelling or scanner problem.
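The BlueFuse ranges quoted in the list above can be turned into a simple automated check. The sketch below is only an illustration: the metric keys and function names are made up, the range endpoints are assumed to be inclusive, and the SBR helper assumes that the mean signal is passed in before background subtraction.

```python
# Expected ranges as quoted in the list above (BlueFuse processed data).
QC_CHECKS = {
    "sd_autosome": lambda v: 0.07 <= v <= 0.15,  # endpoints assumed inclusive
    "sd_robust": lambda v: 0.05 <= v <= 0.11,    # endpoints assumed inclusive
    "dlr_fused": lambda v: v < 0.2,
    "pct_included_clones": lambda v: v > 95.0,
}


def signal_to_background(mean_signal, background):
    """SBR: background-subtracted mean signal divided by the raw background."""
    return (mean_signal - background) / background


def failed_metrics(metrics):
    """Return the sorted names of all supplied metrics outside their range."""
    return sorted(
        name
        for name, in_range in QC_CHECKS.items()
        if name in metrics and not in_range(metrics[name])
    )


# Hypothetical array: everything is fine except a slightly high SD robust.
example = {
    "sd_autosome": 0.10,
    "sd_robust": 0.12,
    "dlr_fused": 0.18,
    "pct_included_clones": 97.5,
}
print(failed_metrics(example), signal_to_background(1200, 200))
```

Metrics that are missing from the input are skipped rather than reported, so the same check can be used for platforms that do not provide all values.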

Source: BlueGnome user docs

See also: Microarray Scanners and PGS consulting in the UK & Ireland

Vaccination of newborns

April 28th, 2017

Most of us take vaccinations for granted and rely on them from our very first days. Whooping cough, for example, can be deadly, especially for babies who are too young to be protected by their own vaccination. Since 2010, the Centers for Disease Control and Prevention (CDC) has recorded between 10,000 and 50,000 cases each year in the United States, with up to 20 babies dying. One recent study showed that many whooping cough deaths among babies could be prevented if all babies received the first dose of the vaccine on time at 2 months of age, when they are old enough to be vaccinated (CDC). Still, some parents believe they know better and risk their children's lives by not vaccinating them at all.

 

For the US the CDC recommends vaccination of newborns / babies against the following diseases:

For Germany the situation is almost the same, and the following vaccinations are recommended for babies under 2 years:

  • Hib (H. influenzae type b)
  • Diphtheria
  • Hepatitis B
  • Measles
  • Mumps
  • Pertussis (whooping cough)
  • Pneumococcus
  • Poliomyelitis (polio)
  • Rubella
  • Tetanus
  • Rotavirus
  • Varicella (chickenpox)
  • Meningococcus C

Sources: CDC, Robert-Koch-Institut

Genetic Conditions Screened in Newborns

April 13th, 2017

As part of the health assessment of newborn babies, a test for common genetic conditions is done by drawing a few drops of blood from the heel of the baby and sending them off for analysis. Any positive result is then followed up by a confirmatory test, and treatment can be initiated if required. Most of the conditions are life-threatening or disabling for the child if they remain undiagnosed or untreated.

Below is a list of conditions that are screened as part of the current standard panel of core conditions and secondary conditions in the US health system. Secondary conditions are conditions that are additionally (unintentionally) revealed when testing for the core conditions. If desired, there are even more options for testing (supplemental screening). Which tests are offered or paid for depends on the state and the insurance. This information is taken from babysfirsttest.org.

 

1. Metabolic Disorders

ORGANIC ACID CONDITIONS

  • 2-Methyl-3-Hydroxybutyric Acidemia (2M3HBA)
  • 2-Methylbutyrylglycinuria (2MBG)
  • 3-Hydroxy-3-Methylglutaric Aciduria (HMG) *
  • 3-Methylcrotonyl-CoA Carboxylase Deficiency (3-MCC) *
  • 3-Methylglutaconic Aciduria (3MGA)
  • Beta-Ketothiolase Deficiency (BKT) *
  • Ethylmalonic Encephalopathy (EME)
  • Glutaric Acidemia, Type I (GA-1) *
  • Holocarboxylase Synthetase Deficiency (MCD)
  • Isobutyrylglycinuria (IBG)
  • Isovaleric Acidemia (IVA) *
  • Malonic Acidemia (MAL)
  • Methylmalonic Acidemia (Cobalamin Disorders) (Cbl A,B) *
  • Methylmalonic Acidemia (Methylmalonyl-CoA Mutase Deficiency) (MUT) *
  • Methylmalonic Acidemia with Homocystinuria (Cbl C, D, F)
  • Propionic Acidemia (PROP) *

FATTY ACID OXIDATION DISORDERS

  • 2,4 Dienoyl-CoA Reductase Deficiency (DE RED)
  • Carnitine Acylcarnitine Translocase Deficiency (CACT)
  • Carnitine Palmitoyltransferase I Deficiency (CPT-IA)
  • Carnitine Palmitoyltransferase Type II Deficiency (CPT-II)
  • Carnitine Uptake Defect (CUD) *
  • Glutaric Acidemia, Type II (GA-2)
  • Long-Chain L-3 Hydroxyacyl-CoA Dehydrogenase Deficiency (LCHAD) *
  • Medium-Chain Acyl-CoA Dehydrogenase Deficiency (MCAD) *
  • Medium-Chain Ketoacyl-CoA Thiolase Deficiency (MCAT)
  • Medium/Short-Chain L-3 Hydroxyacyl-CoA Dehydrogenase Deficiency (M/SCHAD)
  • Short-Chain Acyl-CoA Dehydrogenase Deficiency (SCAD)
  • Trifunctional Protein Deficiency (TFP) *
  • Very Long-Chain Acyl-CoA Dehydrogenase Deficiency (VLCAD) *

AMINO ACID DISORDERS

  • Argininemia (ARG)
  • Argininosuccinic Aciduria (ASA) *
  • Benign Hyperphenylalaninemia (H-PHE)
  • Biopterin Defect in Cofactor Biosynthesis (BIOPT-BS)
  • Biopterin Defect in Cofactor Regeneration (BIOPT-REG)
  • Carbamoyl Phosphate Synthetase I Deficiency (CPS)
  • Citrullinemia, Type I (CIT) *
  • Citrullinemia, Type II (CIT II)
  • Classic Phenylketonuria (PKU) *
  • Homocystinuria (HCY) *
  • Hypermethioninemia (MET)
  • Hyperornithinemia with Gyrate Deficiency (Hyper ORN)
  • Maple Syrup Urine Disease (MSUD) *
  • Nonketotic Hyperglycinemia (NKH)
  • Ornithine Transcarbamylase Deficiency (OTC)
  • Prolinemia (PRO)
  • Tyrosinemia, Type I (TYR I) *
  • Tyrosinemia, Type II (TYR II)
  • Tyrosinemia, Type III (TYR III)

 

2. Endocrine Disorders

  • Congenital Adrenal Hyperplasia (CAH) *
  • Primary Congenital Hypothyroidism (CH) *

 

3. Hemoglobin Disorders

  • Glucose-6-Phosphate Dehydrogenase Deficiency (G6PD)
  • Hemoglobinopathies (Var Hb)
  • S, Beta-Thalassemia (Hb S/βTh) *
  • S, C Disease (Hb S/C) *
  • Sickle Cell Anemia (Hb SS) *

 

4. Other Disorders

  • Adrenoleukodystrophy (ALD)
  • Biotinidase Deficiency (BIOT) *
  • Classic Galactosemia (GALT) *
  • Congenital Toxoplasmosis (TOXO)
  • Critical Congenital Heart Disease (CCHD) *
  • Cystic Fibrosis (CF) *
  • Formiminoglutamic Acidemia (FIGLU)
  • Galactoepimerase Deficiency (GALE)
  • Galactokinase Deficiency (GALK)
  • Hearing loss (HEAR)
  • Human Immunodeficiency Virus (HIV)
  • Hyperornithinemia-Hyperammonemia-Homocitrullinuria Syndrome (HHH)
  • Pyroglutamic Acidemia (5-OXO)
  • Severe Combined Immunodeficiency (SCID) *
  • T-cell Related Lymphocyte Deficiencies

 

5. Lysosomal Storage Disorders

  • Fabry (FABRY)
  • Gaucher (GBA)
  • Krabbe
  • Mucopolysaccharidosis Type-I (MPS I)
  • Mucopolysaccharidosis Type-II (MPS II)
  • Niemann-Pick Disease (NPD)
  • Pompe (POMPE)

 

See more at: www.babysfirsttest.org