Microsoft will host large, open Genomics data sets and is assisting with Molecular Informatics

Molecular Informatics and Next Generation Sequencing (NGS)

A fundamental big data challenge Microsoft is working on concerns next-generation sequencing (NGS) technologies, which generate data far faster than compute and storage capacity are growing. Researchers predict that the amount of genomic data will exceed that of YouTube, and that genomic computations will exceed 5 million compute days per year by 2020.

By the end of 2016, Microsoft will, for the first time, host large, open genomics data sets on the Microsoft Azure cloud platform at no cost. Such data sets, including 1000 Genomes, will be valuable for researchers and practitioners at universities, hospitals, and other health institutions who are making precision medicine a reality.

Diabetes care

The overall umbrella in this exciting area is “Molecular Informatics”. Here are just a few insights into current developments:

Bio Model Analyzer is a sketching tool that enables users to draw out a biological system of interest (e.g. a genetic regulatory network) by dragging and dropping cells, their contents (DNA, proteins, etc.), extracellular components and relationships onto a simple canvas.

Azimuth is a machine learning system that allows researchers to use the powerful gene editing tool CRISPR more quickly and effectively. It predicts which part of a gene to target when a scientist wants to knock out, or shut off, a gene. Machine learning enables the model to make predictions for any gene of interest, including those not seen in the experimental training data.

Project Premonition is a system that aims to detect infectious disease outbreaks before they become widespread, with the goal of preventing major health disasters. Project Premonition seeks to detect pathogens in animals before these pathogens make people sick. It does this by treating a “mosquito as a device” that can find animals and sample their blood.

Virginia Tech boosted research with high-performance cloud computing. The cloud provides storage and processing resources to handle exponentially increasing data and accelerates speed to insight through cloud-based data analysis. “Azure is enabling us to keep up with the data deluge in the DNA sequencing space. We’re not only analyzing data faster, but analyzing it more intelligently,” said Wu Feng, Professor of Computer Science.