#Genomic analysis and #BigData using #FPGAs
Posted on November 17th, 2016
11/17/2016 @ Phosphorus, 1140 Broadway, NY, 11th floor
Rami Mehio @Edico Genome spoke about the fast analysis of a human genome. Edico initially worked on secondary analysis, which is similar to telecommunications in that both must correct errors introduced by a channel. In sequencing, the errors come from the process itself: repeats in the genome and mistakes made by the sequencer.
Historically, genomic data has doubled every 7 months, but the computational speed available to analyze it lags behind, since Moore’s law gives a doubling only every 18 months. With standard CPUs, mapping takes 10 to 30 hours on a 24-core server, and quality control adds several more hours.
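To make the gap between the two doubling rates concrete, here is a minimal sketch of the arithmetic. It uses only the 7-month and 18-month figures quoted above; the function names and the chosen time horizons are illustrative assumptions, not something shown in the talk.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years`, given a doubling period in months."""
    return 2.0 ** (years * 12.0 / doubling_months)


def gap_after(years: float,
              data_doubling_months: float = 7.0,
              compute_doubling_months: float = 18.0) -> float:
    """How many times faster data has grown than compute after `years`."""
    data = growth_factor(years, data_doubling_months)
    compute = growth_factor(years, compute_doubling_months)
    return data / compute


if __name__ == "__main__":
    # Horizons of 1, 3 and 5 years are arbitrary illustration points.
    for years in (1, 3, 5):
        print(f"after {years} years: "
              f"data x{growth_factor(years, 7.0):.1f}, "
              f"compute x{growth_factor(years, 18.0):.1f}, "
              f"gap x{gap_after(years):.1f}")
```

Because both curves are exponential, the gap itself grows exponentially, which is the motivation for moving the heaviest steps into hardware.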
In addition, a human genome is an 80GB FASTQ file. (This is only for a rough look at the genome at 30x coverage, meaning each position in the DNA is read about 30 times on average.)
Using FPGAs reduced the analysis time to 20 minutes. The files are also smaller: with CRAM compression they shrink to 50GB.
The server code is written in C/C++. The FPGAs are not programmed in the usual software sense; instead, the connections between their logic elements are specified using the VITAL or VHDL languages.
The HMM and Smith-Waterman algorithms take up the bulk of the processing time, so both are implemented in the FPGAs. Another challenge is supplying enough data to keep the FPGAs busy, which means the software feeding them needs to run in parallel. The FPGAs are also configured so that they can selectively switch algorithms, to take advantage of whatever needs to be done at the time.
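For readers unfamiliar with Smith-Waterman, here is a minimal software sketch of the local-alignment recurrence. It is a plain textbook version with a linear gap penalty, not Edico's FPGA implementation; the scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
def smith_waterman(a: str, b: str,
                   match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # start a new alignment
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

The fill step costs time proportional to len(a) × len(b), but every cell depends only on its upper, left, and upper-left neighbors, so cells along an anti-diagonal can be computed at the same time. That regular, data-parallel structure is one reason the algorithm is a common candidate for hardware acceleration.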