As wet-lab techniques continue to improve in both throughput and precision, the progressive elucidation of positive protein-protein interactions (PPIs) has increased the number and quality of known PPIs across the spectrum of life. Creating high-quality datasets of positive PPIs is critical for training PPI prediction algorithms and for assessing the performance of PPI detection efforts. We present the Positome, a web service for acquiring sets of positive PPIs based on user-defined criteria pertaining to data provenance, including interaction type, throughput level, and detection method, in addition to filtration by multiple lines of evidence (i.e., PPIs reported by independent research groups). The Positome provides a tunable interface for obtaining a specified subset of PPIs from the BioGRID database. Both intra- and inter-species PPIs are supported. Using a number of model organisms, we demonstrate the trade-off between data quality and quantity, and the benefit of higher data quality on PPI prediction precision and recall. A web interface and REST web service are available at http://bioinf.sce.carleton.ca/POSITOME/.
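
The provenance-based filtering described in the abstract can be illustrated with a minimal sketch. This is not the Positome implementation or its REST API: the `Interaction` record, its field names, the `filter_positive_ppis` function, and the sample values are all hypothetical, and distinct publication identifiers are used here only as an assumed stand-in for "independent research groups".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one reported interaction (not a BioGRID schema)."""
    protein_a: str
    protein_b: str
    interaction_type: str  # e.g. "physical" or "genetic"
    throughput: str        # e.g. "low" or "high"
    method: str            # detection method label
    publication: str       # identifier of the reporting study


def filter_positive_ppis(records, interaction_type=None, throughput=None,
                         methods=None, min_evidence=1):
    """Return the set of protein pairs that pass the provenance filters.

    A pair is kept only if at least `min_evidence` distinct publications
    report it among the records that satisfy the other criteria.
    """
    evidence = defaultdict(set)
    for r in records:
        if interaction_type is not None and r.interaction_type != interaction_type:
            continue
        if throughput is not None and r.throughput != throughput:
            continue
        if methods is not None and r.method not in methods:
            continue
        # Treat (A, B) and (B, A) as the same undirected pair.
        pair = tuple(sorted((r.protein_a, r.protein_b)))
        evidence[pair].add(r.publication)
    return {pair for pair, pubs in evidence.items() if len(pubs) >= min_evidence}


# Illustrative, made-up records.
sample = [
    Interaction("A", "B", "physical", "low", "two-hybrid", "pub1"),
    Interaction("B", "A", "physical", "low", "affinity capture", "pub2"),
    Interaction("A", "C", "physical", "high", "two-hybrid", "pub3"),
    Interaction("C", "D", "genetic", "low", "synthetic lethality", "pub4"),
]

# Stricter settings yield fewer pairs, each with stronger support.
strict = filter_positive_ppis(sample, interaction_type="physical", min_evidence=2)
lenient = filter_positive_ppis(sample)
```

Raising `min_evidence` or restricting the throughput level shrinks the returned set, which is the quality-versus-quantity trade-off the abstract refers to.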

Additional Metadata
Keywords: data provenance, data quality, datasets, machine learning, protein-protein interaction prediction
Persistent URL: dx.doi.org/10.1109/CIBCB.2017.8058545
Conference: 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2017
Citation
Dick, K. (Kevin), Dehne, F., Golshani, A., & Green, J. (2017). Positome: A method for improving protein-protein interaction quality and prediction accuracy. In 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2017. doi:10.1109/CIBCB.2017.8058545