In this research we set out to identify successful and unsuccessful projects on GitHub from a sample of 5000 randomly picked projects across several randomly selected languages (Java, PHP, JavaScript, C#/C++, HTML). We selected 1000 projects for each of these languages through the publicly available GitHub API, refined the dataset, and applied different machine learning algorithms to it. We first ran numerous queries against the dataset and found meaningful relationships and correlations between some of the fetched attributes that affect the popularity of these projects. Building on these results, an application could later be developed to determine the success or failure of a specific open source project.
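
As a rough illustration of the data collection step, the sketch below builds the paginated request URLs that would list 1000 repositories for one language through GitHub's public repository search endpoint. It is not the authors' method: the abstract does not say which API endpoint or sampling procedure was used, and search results are ranked rather than random. The function name `search_urls` and its parameters are hypothetical, and the sketch only constructs URLs without sending any request.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
API_ROOT = "https://api.github.com/search/repositories"


def search_urls(language, total=1000, per_page=100):
    """Return the paginated search URLs needed to list `total` repositories.

    Assumption: projects are listed by language through the search endpoint;
    the abstract does not state how the sample was actually drawn.
    """
    pages = -(-total // per_page)  # ceiling division
    urls = []
    for page in range(1, pages + 1):
        # urlencode percent-encodes characters such as ':' and '#'.
        query = urlencode(
            {"q": f"language:{language}", "per_page": per_page, "page": page}
        )
        urls.append(f"{API_ROOT}?{query}")
    return urls


if __name__ == "__main__":
    for url in search_urls("Java")[:2]:
        print(url)
```

With the defaults, one language needs ten requests of 100 results each. A real collection run would also need authentication and rate-limit handling, which are left out here.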

Additional Metadata
Keywords Data mining, GHTorrent, Git, GitHub, Machine learning, MSR, Open source, SourceForge, Truck-factor, WEKA
Persistent URL
Conference 2017 International Conference on Management Engineering, Software Engineering and Service Sciences, ICMSS 2017
Maqsood, J. (Junaid), Eshraghi, I. (Iman), & Ali, S.S. (Syed Sarmad). (2017). Success or failure identification for GitHub's open source projects. In ACM International Conference Proceeding Series (pp. 145–150). doi:10.1145/3034950.3034957