JULY CECAN WORKSHOP: COMPLEX-IT and the SACS TOOLKIT: A Case-Based Computational Modeling Platform for Data Mining Complex Issues in Policy and Evaluation

 CECAN Training Workshop





When: Friday 7th July 2017 (1 day)

Location: University of Surrey, Guildford, UK

Purpose: The complex socio-technical arenas (nexus issues) that government seeks to improve (e.g., health, food, water, safety, infrastructure) are not driven by a single factor or consequence.  Instead, they are driven by multiple factors at multiple levels, which lead to different trends or outcomes for different areas/groups of people.
The challenge is how to model such diversity and complexity. The complexity sciences, data mining and big data offer some useful solutions. The difficulty, however, lies in stitching these methodological solutions together into a user-friendly platform and APP that policy makers, social scientists, evaluation commissioners and civil servants can use; hence our creation of COMPLEX-IT and the SACS TOOLKIT.

Intended audience: This workshop is for anyone who is involved in evaluating the impact of policy (and its improvement) on complex nexus issues and who would like to explore new software and mixed-methods options for doing so.

Level of prior knowledge of subject required: For policy makers and evaluation commissioners, it is helpful to have a basic sense of statistics and an interest in data mining and the complexity sciences. For researchers and methodologists, it is helpful to have an understanding of the latest developments in interdisciplinary mixed methods, computational modeling and the data mining of big data.
Participants are strongly encouraged to bring to the workshop a policy issue or research concern (e.g., modeling multiple trajectories across time, dealing with large numbers of variables, etc.) that they would like to use COMPLEX-IT and the SACS TOOLKIT to explore.

At the end of this course, participants will:

GOAL 1: Understand the theory behind case-based computational modeling, including
  • Having a basic sense of the principles guiding case-based complexity.
  • Understanding the philosophy behind data mining and computational modeling.
  • Developing a working knowledge of the COMPLEX-IT APP and the SACS TOOLKIT.
GOAL 2: Learn how to apply case-based computational modeling to their nexus topic, including how to:
  • Build a complex systems model of their nexus issue.
  • Explore how policy impacts different groups or areas across time/space.
  • Use this information to create their study’s case-based profile.
  • Identify major and minor case-based clusters and key causal factors.
  • Identify major and minor cluster trends (for longitudinal data).
  • Identify key global-temporal dynamics, such as spiraling sources and saddle points.
  • Use network analysis (where appropriate) to explore cluster links and structure.
  • Examine how different clusters and trends lead to different outcomes.
  • Run simulations to explore how policy can change outcomes.
  • Compare the resulting model to the original theoretical formulation.
GOAL 3: Learn how to use the COMPLEX-IT APP, including how to:
  • Download and install the software.
  • Run the software, including the RStudio environment in which it works.
  • Upload the case study database.
  • Identify key variables for case-based profiles.
  • Explore how to deal with missing data and errors in variable choice.
  • Use k-means cluster analysis to identify initial clusters, including how to identify optimal solutions and run k-means for trend data.
  • Use the SOM neural net to corroborate clusters and identify possible sub-clusters.
  • Use SOM and k-means to identify the underlying causal model.
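
To give a feel for the k-means step listed above, here is a minimal sketch of clustering case profiles and comparing solutions for different numbers of clusters. It is not COMPLEX-IT's own code (the APP runs in R); it is a plain-Python illustration under stated assumptions. The six cases and their two indicator scores are hypothetical, the function names are invented for this example, and seeding uses a deterministic farthest-point rule rather than random starts so that the result is reproducible. In practice the variables would usually be standardised first.

```python
# Minimal k-means sketch on hypothetical case profiles (not COMPLEX-IT code).

def squared_distance(a, b):
    """Squared Euclidean distance between two case profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(rows):
    """Variable-by-variable mean of a group of case profiles."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def farthest_point_seeds(cases, k):
    """Deterministic seeding: start from the first case, then repeatedly
    add the case that is farthest from every seed chosen so far."""
    seeds = [cases[0]]
    while len(seeds) < k:
        seeds.append(max(cases, key=lambda c: min(squared_distance(c, s) for s in seeds)))
    return seeds

def kmeans(cases, k, max_iter=100):
    """Return (labels, centroids, within-cluster sum of squares)."""
    centroids = farthest_point_seeds(cases, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(c, centroids[j]))
                      for c in cases]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:
                centroids[j] = mean_profile(members)
    wss = sum(squared_distance(c, centroids[lab]) for c, lab in zip(cases, labels))
    return labels, centroids, wss

# Hypothetical cases: six areas scored on two made-up indicators.
cases = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 10)]

labels, centroids, wss = kmeans(cases, 2)
print(labels)  # → [0, 0, 0, 1, 1, 1]

# Comparing the within-cluster sum of squares across k is one common,
# informal way to judge how many clusters are worth keeping.
for k in (1, 2, 3):
    print(k, round(kmeans(cases, k)[2], 3))
```

The within-cluster sum of squares always falls as k grows, so the point of interest is where the drop levels off, not the smallest value.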
GitHub - Cschimpf/Complex-It: Complex-It Development

Click here to download R Studio

RNetLogo - ...and two worlds are yours
R Marries NetLogo: Introduction to the RNetLogo Package | Thiele | Journal of Statistical Software
GitHub - NetLogo/Mathematica-Link: allows Mathematica to control NetLogo (and not vice versa)
Georg-August-Universität Göttingen - Agent-based/individual-based simulation tools
CRAN - Package RNetLogo
CRAN - Package gafit
CRAN - Package GA
How to load the {rJava} package after the error "JAVA_HOME cannot be determined from the Registry" | R-statistics blog
Agent Based Models and RNetLogo | R-bloggers
Facilitating Parameter Estimation and Sensitivity Analysis of Agent-Based Models
NetLogo User Community Models: SegregationExtended
Blog overview of SOMbrero R Package
Stochastic Gradient Descent
Wikipedia Article on Stochastic Gradient Descent
The art of running the SOM and choosing the map size
Discussion about setting the random seed for reproducible results
This is the main one we used for our Workshop.

Welsh Multiple Index of Deprivation

Another data website source.

Full dataset, which is a very useful case study for exploring how to use the COMPLEX-IT App
(The entire dataset is shown for all authority areas, across multiple indicators)

This provides multiple databases for multiple levels of analysis

This goes with the above, as it is their annual data release website for IMD


LSOA MAPS for Wales (Lower Super Output Areas, roughly 1,500 people each, for a total of 1,896 LSOAs in Wales): http://gov.wales/docs/statistics/lsoamaps/lsoa.htm

NOTE: Area maps for each LSOA in Wales are available from the links below. LSOA is the geographic unit used in the Welsh Index of Multiple Deprivation (WIMD). LSOAs are built from groups of Output Areas (OAs) used for the 2001 Census. There are 1,896 LSOAs in Wales each with a population of about 1,500 people. Because the size and boundaries of LSOAs have not changed since they were created in 2004, the same areas are analysed in the three recent WIMD updates (WIMD 2005, WIMD 2008 and WIMD 2008: Child Index). The maps can be used alongside each of the three updates to identify the area covered by each LSOA.

Brian Castellani, Ph.D. is Professor of Sociology and Lead of the Complexity in Health and Infrastructure Group at Kent State University, as well as Adjunct Professor of Psychiatry, Northeast Ohio Medical University and co-editor of the Complexity in Social Science series, Routledge.  Trained as a sociologist, clinical psychologist and methodologist, Brian has spent the past ten years developing a new case-based data mining approach to modeling complex social systems, which he and his colleagues have used to help practitioners and policy makers address and improve complex public health issues such as community wellbeing, stress and coping (allostatic load), comorbid depression in primary care, addiction, medical education and grid reliability. Recently, Brian received a systems science scholarship from the Robert Wood Johnson Foundation to present at the 2016 AcademyHealth Conference – the leading organization in the States for health services researchers, policymakers, and health care practitioners and stakeholders. For more information, including publications on case-based complexity, see Brian’s website at www.personal.kent.edu/~bcastel3/

Corey Schimpf, Ph.D. is a Learning Analytics Scientist at the Concord Consortium, a not-for-profit company just outside of Boston that develops curriculum and software for K-12 science, technology, engineering and math learning. He received a Ph.D. in Engineering Education and an M.A. in Sociology from Purdue University and has several years of programming and software development experience. One avenue of Corey’s work focuses on the development and analysis of learning analytics that model students’ cognitive states or strategies from fine-grained computer-logged data from students participating in open-ended technology-centered science and engineering projects. In another avenue of Corey’s work, he has been the lead or a team member developing software to assist researchers dealing with complex, high-dimensional problems and data sets, such as an interface and infrastructure to integrate several methodological tools, or a multi-purpose data processing tool for high-volume data with limited structure.