6th International Conference on Data Mining & Knowledge Management Process (DKMP-2018)

February 17~18, 2018, Melbourne, Australia

Accepted Papers

  • Event Detection over Continuous Data Stream for the Sustainable Growth
    Janagan Sivagnanasundaram and Athula Ginige, Western Sydney University, Australia and Jeevani Goonetillake, University of Colombo, Sri Lanka
    In the real world, coordination failure is a concept that can explain why human communities fail to coordinate and act properly on a specific problem. A lack of proper coordination among communities can lead to societal problems such as under- and over-production of goods, pollution, healthcare issues, poor disaster management and poverty, which can significantly affect a country's sustainable growth. Information and Communication Technologies (ICT) have enabled the rise of a new trend, "Collaborative Consumption": a peer-to-peer protocol for coordinating and sharing services through online services within a community, which is expected to alleviate the societal problems mentioned above. Advances in ICT produce massive, fast-moving, streamed and heterogeneous data contributed by the users of a given community in a collaborative consumption environment. Moreover, most of these data describe events associated with people and their activities: an event describing a situation is initiated by one user and followed by others in the community within a defined time frame. Extracting and detecting useful event patterns from such user-contributed collaborative environments, and acting on them, is a possible way to overcome many societal problems. In this paper, we propose a conceptual solution for building an event-based knowledge management service that supports the retrieval of important events from various data streams so that users can act and make better decisions.
  • Comparison of Bankruptcy Prediction Models with Public Records and Firmographics
    Lili Zhang, Jennifer Priestley, and Xuelei Ni, Kennesaw State University, Georgia, USA
    Many business operations and strategies rely on bankruptcy prediction. In this paper, we study the impact of public records and firmographics on predicting bankruptcy over a 12-month-ahead horizon, using different classification models and adding value to traditionally used financial ratios. Univariate analysis shows the statistical association and significance of public records and firmographics indicators with bankruptcy. Further, seven statistical and machine learning models were developed: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. The performance of the models was evaluated and compared based on classification accuracy, Type I error, Type II error, and ROC curves on the hold-out dataset. Moreover, an experiment was set up to show the importance of oversampling for rare-event prediction. The results also show that Bayesian Network is comparatively more robust than the other models without oversampling.
  • Predicting Players' Performance in One Day International Cricket Matches Using Machine Learning
    Kalpdrum Passi and Niravkumar Pandey, Laurentian University, Canada
    Player selection is one of the most important tasks in any sport, and cricket is no exception. The performance of players depends on various factors such as the opposition team, the venue and their current form. The team management, the coach and the captain select 11 players for each match from a squad of 15 to 20 players, analyzing different characteristics and statistics of the players to select the best playing 11 for each match. Each batsman contributes by scoring as many runs as possible, and each bowler contributes by taking as many wickets as possible while conceding as few runs as possible. This paper attempts to predict players' performance, namely how many runs each batsman will score and how many wickets each bowler will take, for both teams. Both problems are treated as classification problems in which the number of runs and the number of wickets are classified into different ranges. We used Naive Bayes, Random Forest, multiclass SVM and Decision Tree classifiers to generate prediction models for both problems. The Random Forest classifier was found to be the most accurate for both problems.
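    The bankruptcy-prediction abstract evaluates models by accuracy and Type I/Type II error and stresses oversampling for rare events. The short Python sketch below illustrates why accuracy alone misleads on rare events and what simple random oversampling does. It is not the authors' implementation: the function names `error_rates` and `random_oversample` are hypothetical, label 1 is assumed to mean "bankrupt", and Type I error is assumed to mean a bankrupt firm classified as healthy (the opposite convention also appears in the literature, and the paper's own definitions may differ).

```python
import random
from typing import List, Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, type_i, type_ii) for binary labels where 1 = bankrupt.

    Assumed convention (hypothetical, may differ from the paper):
      Type I  = bankrupt firm predicted healthy (missed bankruptcy), as a rate over bankrupt firms
      Type II = healthy firm predicted bankrupt (false alarm), as a rate over healthy firms
    """
    pairs = list(zip(y_true, y_pred))
    n_bankrupt = sum(1 for t, _ in pairs if t == 1)
    n_healthy = len(pairs) - n_bankrupt
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_alarm = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    type_i = missed / n_bankrupt if n_bankrupt else 0.0
    type_ii = false_alarm / n_healthy if n_healthy else 0.0
    return accuracy, type_i, type_ii


def random_oversample(rows: Sequence, labels: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen label-1 rows until both classes have equal counts.

    If label 1 is absent or already at least as frequent as label 0,
    the data are returned unchanged (as new lists).
    """
    rng = random.Random(seed)
    minority = [(r, y) for r, y in zip(rows, labels) if y == 1]
    majority = [(r, y) for r, y in zip(rows, labels) if y == 0]
    if not minority or len(minority) >= len(majority):
        return list(rows), list(labels)
    # Sample extra copies of minority rows with replacement.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)
    return [r for r, _ in combined], [y for _, y in combined]


# A classifier that always predicts "healthy" on a 20%-bankrupt sample:
print(error_rates([0] * 8 + [1] * 2, [0] * 10))  # -> (0.8, 1.0, 0.0)
```

    The always-healthy predictor scores 80% accuracy yet misses every bankruptcy, which is why error rates per class matter. In a typical setup, oversampling is applied to the training split only, so that the hold-out set keeps its natural class ratio.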
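    The cricket abstract turns run and wicket prediction into classification by grouping counts into ranges. The Python sketch below shows one way to perform that discretization. The bin edges `RUN_EDGES` and `WICKET_EDGES` and the helper names are hypothetical, since the abstract does not state the actual ranges used.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical range boundaries; the paper's actual ranges are not given in the abstract.
RUN_EDGES = [10, 25, 50, 75]  # classes: 0-9, 10-24, 25-49, 50-74, 75+
WICKET_EDGES = [1, 3, 5]      # classes: 0, 1-2, 3-4, 5+


def to_class(value: int, edges: Sequence[int]) -> int:
    """Map a raw count (runs or wickets) to the index of its range, 0 = lowest range."""
    return bisect_right(edges, value)


def class_labels(edges: Sequence[int]) -> List[str]:
    """Human-readable label for each range, e.g. '10-24' or '75+'."""
    lows = [0] + list(edges)
    labels = []
    for i, low in enumerate(lows):
        if i == len(edges):
            labels.append(f"{low}+")  # open-ended top range
        else:
            high = edges[i] - 1
            labels.append(str(low) if low == high else f"{low}-{high}")
    return labels


print([to_class(r, RUN_EDGES) for r in [0, 12, 48, 101]])  # -> [0, 1, 2, 4]
print(class_labels(WICKET_EDGES))  # -> ['0', '1-2', '3-4', '5+']
```

    Each resulting class index would then serve as the target label for a multiclass classifier such as Naive Bayes, Random Forest, multiclass SVM or Decision Tree.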