D6.2 Automatic Feature Selection and Machine Learning Algorithm Training
Task 6.2, “Automatic Feature Selection and Machine Learning Algorithm Training”, is the second task of Work Package (WP) 6 in the RAYUELA project. In this deliverable, we use the synthetic data generated in T6.1 and the real data collected from the RAYUELA pilots to identify hidden patterns and create risk profiles (of victims and offenders) with the help of machine learning algorithms. The goal is to obtain key insights from the game adventures and the decisions taken by the players, setting the foundation for evidence-based recommendations, guidelines and measures in WP7.
Specifically, as a first step, the synthetic datasets were used to identify potential variables/questions that could help identify player risk profiles. In the next step, a feature engineering pipeline was developed to preprocess and serialize the data collected from the pilots. The core idea is to use the synthetic and pilot data together to further fine-tune the synthetic data generation process and to obtain more training samples for the predictive algorithms. Various machine learning algorithms were then applied to the real dataset to address the research questions proposed in Section 1.
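To make the preprocessing and serialization step more concrete, the following is a minimal sketch and not the project's actual pipeline. It assumes each player record is a dictionary of answers keyed by question; the field names (player_id, age, q1_share_photo, q2_report), the mean imputation of missing numeric values and the JSON Lines output format are all hypothetical choices made only for illustration.

```python
import json

# Hypothetical raw records: one dict per player, keyed by question id.
raw_records = [
    {"player_id": "p1", "age": 14, "q1_share_photo": "yes", "q2_report": "no"},
    {"player_id": "p2", "age": 16, "q1_share_photo": "no", "q2_report": "yes"},
    {"player_id": "p3", "age": None, "q1_share_photo": "yes", "q2_report": "yes"},
]

CATEGORICAL = ["q1_share_photo", "q2_report"]  # assumed categorical questions
NUMERIC = ["age"]                              # assumed numeric fields


def build_vocabulary(records, columns):
    """Collect the sorted set of observed answers for each categorical column."""
    return {
        col: sorted({r[col] for r in records if r.get(col) is not None})
        for col in columns
    }


def numeric_means(records, columns):
    """Compute the mean of the non-missing values of each numeric column."""
    means = {}
    for col in columns:
        values = [r[col] for r in records if r.get(col) is not None]
        means[col] = sum(values) / len(values)
    return means


def encode(record, vocab, defaults):
    """Turn one raw record into a flat dict of numeric features."""
    features = {}
    for col in NUMERIC:
        value = record.get(col)
        # Missing numeric values are replaced by the column mean.
        features[col] = defaults[col] if value is None else value
    for col, answers in vocab.items():
        for answer in answers:
            # One-hot encode each observed answer.
            features[f"{col}={answer}"] = 1 if record.get(col) == answer else 0
    return features


vocab = build_vocabulary(raw_records, CATEGORICAL)
defaults = numeric_means(raw_records, NUMERIC)
encoded = [encode(r, vocab, defaults) for r in raw_records]

# Serialize as JSON Lines: one encoded player per line.
serialized = "\n".join(json.dumps(row, sort_keys=True) for row in encoded)
```

Because the vocabulary is built once and reused, the same encoding could in principle be applied to synthetic and pilot records so that both end up with identical feature columns.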
The evaluation results indicate that decision trees are an optimal choice for profiling players into victimization or perpetrator roles. They also provide interpretations of their predictions, indicating the important features that affect an outcome. The conclusions indicate that the data collected from the game pilots contain valuable information and that the responses from a subset of game adventures are indicative of cyberbullying.
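The interpretability of decision trees mentioned above can be illustrated with a small sketch using scikit-learn. This is not the model or the data used in the deliverable: the feature names, the randomly generated toy responses and the labeling rule are all hypothetical, chosen only to show how feature importances and readable decision rules can be obtained from a fitted tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data only: binary answers to three hypothetical in-game decisions.
rng = np.random.default_rng(0)
feature_names = ["shared_photo", "reported_incident", "ignored_warning"]
X = rng.integers(0, 2, size=(200, 3))

# Hypothetical labeling rule: a player is flagged as "at risk" only when
# both shared_photo and ignored_warning are 1; reported_incident is noise.
y = X[:, 0] & X[:, 2]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Impurity-based importance of each feature, normalized to sum to 1.
importances = dict(zip(feature_names, clf.feature_importances_))

# Human-readable if/else rules learned by the tree.
rules = export_text(clf, feature_names=feature_names)
print(importances)
print(rules)
```

In this toy setting the tree recovers the labeling rule, and the irrelevant feature receives zero importance. On real pilot data the importances would be less clear-cut and should be read together with the evaluation metrics.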