Of microbes and centralization

October 11, 2008

The National Institutes of Health this week awarded more than $14 million to informatics projects intended to support the Human Microbiome Project, which aims to take a genomic tally of microbes that live in the body and on the skin.

Bioinform

It’s good to see our friendly little symbiotes get some informatics attention. A big chunk of the money goes to a data analysis and coordination center. I am getting increasingly disillusioned by such centers. It’s one thing to have centers to generate data, but data analysis and coordination? I think we need to think more about distributed work, and let centers emerge organically. The analogy that comes to mind is git (distributed version control) and GitHub (an organically grown central repository that makes it easy to move code around).

Your thoughts?


Comments


    It's somehow close to the epigenetics project that was also recently funded by the US government.
    The idea itself is awesome! After sequencing the genes and investigating the influence of a particular microbe, you can make new drugs for some exotic disease or problem, e.g. for body odor, gut disease, etc.
    I think the problem here is that, by and large, analysis services need to stay next to the data. If the data is large and the analysis complex enough to require significant compute, there is always going to be an argument for getting it centralised. I don't personally believe the cloud options are sufficiently advanced to allow large-scale bioinformatics work. The simple fact is that it's still no fun chucking terabytes of data around between sites. Working on a 'half grid, half cloud' project for the last 18 months has given me a real appreciation that the data centre isn't going to be killed off just yet by emerging computing paradigms.
    I'm not arguing against the data center or data generation activities, but rather against the centralization of data analysis paradigms and methods. We need more people at different institutes working on this via loose federations.
    You'll get no argument from me on the sharing of analysis methods and code. I just hope that, in the UK at least, it becomes as important to release these as it is to release the data (most research councils now mandate that the data be released at the end of the project). I would love to see the same conditions imposed on the methodology.
    I'm with Dan on this one. If I were heading up the computational analyses from this sort of project, I would want dedicated, on-site infrastructure and I'd want it in place before data started to roll out. I don't think we're anywhere close to being able to do that kind of work by aggregating tools from multiple, remote locations.

    Where the center could make a difference is in making the data and software open. This could encourage contributions from interested parties and ultimately lead to a kind of federated analysis.
    It depends on what you're trying to achieve and what your goals are. If you are churning out data by the gazillions and want to crunch on that in near real time, that's one thing. If you want to make it accessible and enable access, we have a different discussion. And based on what I am seeing on a nearly daily basis now, many are going through that very decision making process.

    For a start take a look at what companies like Mashery and Gnip enable, at least in the API world. That's something we need to think about at the very least.

    That said, the argument in this post is not about physical resources, but about the mental resources required to come up with data analysis methods and standards. $9m just for a center for data analysis and coordination (not a computational resource per se) seems like overkill as a percentage of the grant.