This section gives an overview of the university data analytics research projects I have worked on, listed in reverse chronological order.
I am involved in a project led by Professor Christopher O'Shea, University of Birmingham.
Hybrid
Corresponding Author (Supervisor): Professor Christopher O'Shea
Ref. available on request
Joint research work supervised by Prof. Paul King (Data Science Project)
Abstract: This research project examines the severity of road traffic accidents in the UK, combining machine learning algorithms, econometric techniques, and traditional statistical methods to analyse longitudinal historical data. Our analysis framework includes descriptive, inferential, bivariate, and multivariate methods; correlation analysis using Pearson's correlation coefficient and Spearman's rank correlation coefficient; multiple and logistic regression models; multicollinearity assessment; and model validation. To address heteroscedasticity and autocorrelation in the error terms, we used the Generalized Method of Moments (GMM), improving the precision and reliability of our regression estimates. In addition, we applied Vector Autoregressive (VAR) and Autoregressive Integrated Moving Average (ARIMA) models for time-series forecasting. These forecasts outperformed a naive forecast, with a Mean Absolute Scaled Error (MASE) of 0.800 (a MASE below 1 indicates a lower average error than the naive benchmark) and a Mean Error (ME) of -73.80. We also built a random forest classifier that achieved a precision of 73 per cent, a recall of 78 per cent, and an F1-score of 73 per cent. Building on this, we used H2O AutoML to automate model selection, which produced an XGBoost model with an RMSE of 0.1761 and an MAE of 0.0874. Factor analysis was used to identify latent factors that explain the pattern of correlations among the observed variables. Finally, we monitored each model's scoring history (its performance metrics recorded over the course of training) to track how performance evolved and guide tuning.
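To make the MASE figure concrete: MASE divides a model's forecast MAE by the in-sample MAE of a naive forecast, so values below 1 mean the model beats the naive benchmark. The sketch below assumes the standard non-seasonal definition (the naive forecast is the previous observation) and errors defined as actual minus forecast; the project may have used a different benchmark, such as a seasonal naive forecast. The functions `mase` and `mean_error` and all series values are hypothetical illustrations, not the project's code or data.

```python
def mean_error(actual, forecast):
    # ME: average of (actual - forecast); the sign shows systematic bias.
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mase(train, actual, forecast):
    # Scale: in-sample MAE of the one-step naive forecast (y[t] predicted by y[t-1]).
    naive_mae = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train))) / (len(train) - 1)
    # Out-of-sample MAE of the model's forecasts.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae


train = [100, 110, 105, 120, 115, 130]  # hypothetical history
actual = [128, 135, 140]                # hypothetical hold-out values
forecast = [131, 133, 144]              # hypothetical model forecasts

print(round(mase(train, actual, forecast), 3))
print(round(mean_error(actual, forecast), 3))
```

With these toy numbers the model's MAE is 3 against a naive MAE of 10, giving a MASE of 0.3; the ME of about -1.667 means the forecasts sit slightly above the actual values on average.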
We also applied Explainable AI (XAI) techniques, using SHAP (SHapley Additive exPlanations) to understand which factors contribute to accident severity. Features such as Driver_Home_Area_Type, Longitude, Driver_IMD_Decile, Road_Type, Casualty_Home_Area_Type, and Casualty_IMD_Decile were identified as influential predictors. This research contributes to a more detailed understanding of traffic accident severity and demonstrates the potential of advanced statistical, econometric, and machine learning techniques to inform evidence-based interventions and policies for improving road safety.
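SHAP attributes a single prediction to individual features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating every feature subset for a tiny, hypothetical three-feature scoring function, with absent features replaced by a single baseline point (standing in for the background dataset SHAP typically uses). It illustrates the idea that the SHAP library computes or approximates far more efficiently; the function names, model, and inputs are invented and are not the project's model or the `shap` package's API.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy scoring function with one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def shapley_values(f, x, baseline):
    n = len(x)

    def value(subset):
        # Features in `subset` take their values from x; the rest from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                coalition = set(combo)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print([round(p, 3) for p in phi])
print(round(sum(phi), 3), round(model(x) - model(baseline), 3))
```

Here the interaction term (worth 2 at this input) is split equally between features 0 and 2, giving attributions of 3.0, 2.0, and 1.0 that sum to f(x) - f(baseline) = 6, the additivity property SHAP relies on.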
Supervised by Prof. Jeremy Levesley (Data Analytics for E-sports Project)
Abstract: The objective of this project is to strengthen the creditworthiness of the e-sports sector by building a predictive model that forecasts the revenue and profit from merchandise sales. Using Python, I analysed data from the previous few years and applied machine learning methods to build a regression model for making future predictions. The dataset includes variables such as game, earnings, player count, tournament count, date, and merchandise profit. I evaluated the model's performance using statistical tests, feature correlation analysis, and visualisations. By forecasting upcoming merchandise sales and revenue, the model can help improve the financial health of the e-sports industry, enabling stakeholders to make data-driven decisions that support the sector and increase its revenue and profitability.
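As an illustration of the regression and feature-correlation steps, the sketch below fits an ordinary least-squares model with NumPy on a few hypothetical rows, using three of the listed variables (earnings, player count, tournament count) to predict merchandise profit, and computes each feature's Pearson correlation with the target. Linear regression stands in here for whichever machine learning regressor the project actually used; the data, the generating rule, and the next-period inputs are all invented for illustration.

```python
import numpy as np

# Hypothetical rows: earnings, player_count, tournament_count.
X_raw = np.array([
    [1000.0, 50.0, 4.0],
    [1500.0, 60.0, 5.0],
    [1200.0, 80.0, 6.0],
    [2000.0, 70.0, 8.0],
    [2500.0, 90.0, 7.0],
    [3000.0, 100.0, 10.0],
])
# Hypothetical merchandise profit generated from a known linear rule so the fit can be checked:
# profit = 10 + 0.1 * earnings + 2 * player_count + 5 * tournament_count
y = 10 + X_raw @ np.array([0.1, 2.0, 5.0])

# Pearson correlation of each feature with the target.
corr = [np.corrcoef(X_raw[:, j], y)[0, 1] for j in range(X_raw.shape[1])]

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Forecast for a hypothetical next period (leading 1.0 is the intercept term).
next_period = np.array([1.0, 3200.0, 110.0, 11.0])
forecast = float(next_period @ coef)

print(np.round(corr, 3))
print(np.round(coef, 3))
print(round(forecast, 2))
```

Because the toy target follows an exact linear rule, the fitted coefficients recover roughly 10, 0.1, 2, and 5, and the next-period forecast is about 605. With real data, the model would instead be judged on later, held-out periods rather than the years it was fitted to.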