The Cell BE is a heterogeneous multi-core processor offering multiple levels of parallelism. When these are properly leveraged, the Cell BE demonstrates impressive performance acceleration for several high-performance computing applications, including exact string matching on streaming data. The present study investigates the suitability of the Cell BE for a string matching problem of relevance to proteomics: the identification of tryptic digest points based on the presence of a short sequence motif. Three string matching algorithms are implemented and evaluated over several proteomic datasets. Parabix, a method of high-throughput XML stream processing that relies on bit transposition and the effective use of single-instruction multiple-data (SIMD) instructions, is applied here to bioinformatics for the first time, with great success. This method performs very well when the protein database is pre-processed into parallel bit streams. Double buffering is also critical for hiding the latency of DMA data transfers. Performance results are reported for both the cycle-accurate Cell BE simulator and real hardware. The problem is also placed in the larger context of using the Cell BE to achieve hypothesis-driven protein identification.
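
The idea of locating digest points with parallel bit streams can be illustrated with a minimal sketch. The abstract does not state the motif, so the sketch assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P). The function names are hypothetical, and Python integers stand in for SIMD registers. A full Parabix pipeline would first transpose the input bytes into bit-plane streams; here the character-class streams are built directly for brevity.

```python
def char_class_stream(seq, chars):
    """Return an int whose bit i is set when seq[i] is one of chars."""
    bits = 0
    for i, residue in enumerate(seq):
        if residue in chars:
            bits |= 1 << i
    return bits


def tryptic_cleavage_sites(seq):
    """Return 0-based indices i such that trypsin is assumed to cut after seq[i].

    Assumed rule (not taken from the abstract): cut after K or R,
    except when the following residue is P.
    """
    n = len(seq)
    mask = (1 << n) - 1                  # keep only bits inside the sequence
    kr = char_class_stream(seq, "KR")    # bit i set when seq[i] is K or R
    p = char_class_stream(seq, "P")      # bit i set when seq[i] is P
    next_is_p = p >> 1                   # bit i set when seq[i + 1] is P
    cut = kr & ~next_is_p & mask         # one bitwise pass over all positions
    return [i for i in range(n) if (cut >> i) & 1]


if __name__ == "__main__":
    print(tryptic_cleavage_sites("AKRPGKLR"))
```

For the sequence AKRPGKLR the R at index 2 is followed by P and is therefore skipped. A terminal K or R is reported as a site, although a cut there produces no additional peptide.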

Additional Metadata
Keywords Parallel processing, String matching
Persistent URL
Conference 2009 Canadian Conference on Electrical and Computer Engineering, CCECE '09
Green, J., Mahmoud, H. (Hanan), & Dumontier, M. (Michel). (2009). Modeling tryptic digestion on the Cell BE processor. Presented at the 2009 Canadian Conference on Electrical and Computer Engineering, CCECE '09. doi:10.1109/CCECE.2009.5090220