D6.1 Agent-based simulator for synthetic data generation

Task 6.1, “Agent-based modeling and synthetic data generation”, is the first task of WP6 in the RAYUELA project. This deliverable is therefore the first output of this essential WP, whose objectives are to:

  • Process and interpret the data gathered using the serious game developed in WP3, via Bayesian data analysis algorithms and Machine Learning techniques.
  • Build an ad hoc model that generates synthetic data, both to fine-tune the developed data analysis algorithms and techniques and to increase the volume of usable data.
  • Obtain key insights that lay the foundations for evidence-based recommendations, guidelines, policies and measures in WP7.

This deliverable presents an agent-based model built on the previous research and game development efforts carried out in WP1, WP2 and WP3. The primary objective of this simulator is to calibrate the algorithms and techniques that will be used in later stages of the project, as well as to generate synthetic data that increases the amount of data available in the project. These synthetic data, combined with the incoming data from the pilot studies in WP5, will serve as a baseline for the analysis described in Tasks 6.2 and 6.3.

The deliverable is organized as follows: 

  • Section 1 introduces the context and motivation for this work within the RAYUELA project, together with the objective and summary of the content. 
  • Section 2 discusses the concept of synthetic data and the reasons for its rise in recent years. It also covers the main types of synthetic data and introduces key concepts needed to understand the work developed.
  • Section 3 covers the early stages of simulator development, outlining design considerations and the proposed architecture.
  • Section 4 details the implementation of each part of the simulator, explaining the proposed agent models and the approach used to take advantage of external information (e.g., expert knowledge and prevalence data). 
  • Section 5 shows some examples of datasets generated with the simulator. It also reports identifiability tests, in which Bayesian and Machine Learning models are trained on the generated data to check that the data are useful.
  • Finally, Section 6 draws the main conclusions from the work carried out. 

The code developed in this work is open and freely available in the GitHub repository of the RAYUELA project. 
