Current computational mass spectrometry techniques are limited by data acquisition methods that often make sub-optimal use of mass spectrometry hardware and produce datasets that may not uniquely identify all proteins present in the biological sample. This is largely because data analysis is performed offline, only after acquisition is complete. Recently proposed online data analysis techniques that guide data acquisition, known as information-driven or directed tandem mass spectrometry (MS/MS) techniques, show promise in producing datasets that uniquely identify a greater number of proteins, but they have not yet been feasible because of the strict real-time requirements of mass spectrometry data acquisition. With the introduction of novel parallel programming models, such as the heterogeneous multicore Cell broadband engine (Cell B/E) architecture, information-driven MS/MS may now be possible. One of the biggest computational hurdles in creating an information-driven MS/MS system is the need to rapidly search proteomic databases for peptide fragments as the mass spectrometer identifies them in real time. Therefore, as a first step toward information-driven MS/MS, we have implemented a parallel string matching algorithm tailored to single peptide fragment searches over large proteomic databases. The Orthogonal Parabix algorithm introduced here has achieved sustained throughputs of 215.4 Gbps on a QS22 Cell blade, representing a 4x speedup over leading general-purpose string matching algorithms on comparable hardware and a speedup of more than 10x over an equivalent serial algorithm on a modern desktop processor. The peptide string matching algorithms developed here will form an integral part of a complete real-time information-driven MS/MS system, which is expected to achieve higher-confidence protein identifications, particularly for low-abundance proteins and biomarkers.
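
To make the search problem concrete, the following is a minimal sketch of exact matching of a single peptide fragment against protein sequences, using the classic bit-parallel Shift-And method. It is not the Orthogonal Parabix algorithm described above and does not target the Cell B/E; it only illustrates the general idea of advancing many partial matches with one bitwise operation per database character. The function name `shift_and_search`, the toy `database`, and the `peptide` string are all hypothetical and chosen purely for illustration.

```python
def shift_and_search(pattern: str, text: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    if not pattern:
        return []

    # One bitmask per symbol: bit i is set when pattern[i] equals that symbol.
    masks: dict[str, int] = {}
    for i, symbol in enumerate(pattern):
        masks[symbol] = masks.get(symbol, 0) | (1 << i)

    accept = 1 << (len(pattern) - 1)  # bit that marks a complete match
    state = 0                         # bit i set: pattern[:i + 1] ends here
    hits = []
    for pos, symbol in enumerate(text):
        # Extend every partial match by one symbol and start a new one.
        state = ((state << 1) | 1) & masks.get(symbol, 0)
        if state & accept:
            hits.append(pos - len(pattern) + 1)
    return hits


# Hypothetical toy database of made-up sequences (illustration only).
database = {
    "protein_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "protein_B": "GSHMKQRQISAAKQRQIS",
}
peptide = "KQRQIS"

# Map each sequence name to the positions where the peptide occurs.
matches = {name: shift_and_search(peptide, seq) for name, seq in database.items()}
```

Because the state update touches each database character exactly once and uses only shifts and bitwise ANDs, this style of matching lends itself to wide registers and data-parallel hardware, which is the kind of property a real-time search would need to exploit.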

Additional Metadata
Keywords: Bioinformatics, Cell broadband engine (Cell B/E), Exact string matching, High-performance computing, Proteomics, Tandem mass spectrometry (MS/MS)
Persistent URL: dx.doi.org/10.5405/jmbe.824
Journal: Journal of Medical and Biological Engineering
Citation
Peace, R.J. (Robert James), Mahmoud, H.A. (Hanan Akram), & Green, J. (2011). Exact string matching for MS/MS protein identification using the cell broadband engine. Journal of Medical and Biological Engineering, 31(2), 99–104. doi:10.5405/jmbe.824