# Install all required packages
install.packages(c(
  "tidyverse",       # Data manipulation and visualization
  "haven",           # Import/export SPSS, SAS, Stata files
  "writexl",         # Export to Excel
  "arrow",           # Parquet files for big data
  "lavaan",          # Structural equation modeling in R
  "MplusAutomation"  # Interface between R and Mplus
))

Software Tools
Software Integration for Data Analysis
Overview
In this class, we will focus on becoming multilingual in data science. We will learn how to seamlessly move data between different statistical software environments (R, SPSS, SAS, Mplus, and Excel). We will also compare outputs from the same statistical model (CFA) run in both Mplus and R’s lavaan package. Finally, we will discuss the use of AI tools in statistical programming and data analysis.
Required Packages
Please install the following R packages before class:
| Package | Description | Documentation |
|---|---|---|
| tidyverse | Data wrangling and visualization | tidyverse.org |
| haven | Import/export SPSS, SAS, Stata | haven.tidyverse.org |
| writexl | Export data to Excel (.xlsx) | docs.ropensci.org/writexl |
| arrow | Parquet files & big data | arrow.apache.org/docs/r |
| lavaan | Structural equation modeling | lavaan.ugent.be |
| MplusAutomation | Run Mplus from R | MplusAutomation docs |
Note: To use MplusAutomation, you must have Mplus installed on your computer.
Learning Objectives
By the end of this lesson, you will be able to:
- Simulate scale data (e.g., Likert scale) with known parameters
- Export data from R to multiple software formats
- Run a confirmatory factor analysis (CFA) in Mplus
- Run the equivalent CFA in R using lavaan
- Compare and interpret outputs from different software
- Use AI tools effectively for coding
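The first objective, simulating Likert-type scale data with known parameters, can be sketched in base R. This is a minimal illustration, not the course's own simulation code: the sample size, the seed, the five loading values, and the cut points are all assumptions chosen for the example.

```r
set.seed(2024)  # arbitrary seed so the example is reproducible

n <- 500                                 # assumed sample size
loadings <- c(0.8, 0.7, 0.6, 0.5, 0.4)   # assumed "true" standardized loadings

# One latent factor with unit variance
eta <- rnorm(n)

# Continuous item scores: loading * factor + unique error.
# Error variance is 1 - loading^2, so each item has variance 1.
errors <- sapply(loadings, function(l) rnorm(n, sd = sqrt(1 - l^2)))
y_star <- outer(eta, loadings) + errors

# Cut the continuous scores into five ordered categories (1 to 5)
thresholds <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)  # assumed cut points
likert <- apply(y_star, 2, function(col) as.integer(cut(col, breaks = thresholds)))
colnames(likert) <- paste0("y", 1:5)
scale_data <- as.data.frame(likert)

head(scale_data)
round(cor(eta, y_star), 2)  # should be close to the assumed loadings
```

Because the five-category responses are a coarsened version of the continuous scores, loadings estimated from `scale_data` as if it were continuous will typically come out somewhat lower than the values in `loadings`.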
Topics
Part I: The Multilingual Analyst
- Data Simulation: Creating psychological scale data with Likert responses (5 items) and known factor loadings
- Data Interoperability: Transforming data across formats:
  - R \(\leftrightarrow\) Excel/CSV
  - R \(\leftrightarrow\) SPSS (.sav)
  - R \(\leftrightarrow\) SAS (.sas7bdat)
  - R \(\leftrightarrow\) Parquet (.parquet)
  - R \(\leftrightarrow\) Mplus (.dat, .inp)
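The export side of these conversions might look like the sketch below. The toy data frame, file names, and the `-999` missing-value flag are assumptions made for illustration. CSV and the Mplus files use base R only; the SPSS, Excel, and Parquet steps run only if the corresponding package is installed. For SAS, haven can read `.sas7bdat` files with `read_sas()`, and the sketch writes a SAS transport file with `write_xpt()` rather than a `.sas7bdat` file.

```r
# Toy data standing in for simulated scale data (hypothetical values)
scale_data <- data.frame(
  y1 = c(1L, 3L, 5L, 2L),
  y2 = c(2L, 3L, 4L, NA),
  y3 = c(1L, 4L, 5L, 3L)
)

out_dir <- tempdir()  # temporary folder; replace with a project folder in real use

# CSV: base R, readable by Excel, SPSS, SAS, and most other tools
csv_path <- file.path(out_dir, "scale_data.csv")
write.csv(scale_data, csv_path, row.names = FALSE)
round_trip <- read.csv(csv_path)

# Mplus .dat: no header row, missing values written as a numeric flag
dat_path <- file.path(out_dir, "scale_data.dat")
write.table(scale_data, dat_path, row.names = FALSE, col.names = FALSE,
            na = "-999", quote = FALSE)

# Mplus .inp: a plain-text input file that names the variables itself
inp_path <- file.path(out_dir, "cfa.inp")
writeLines(c(
  "TITLE: One-factor CFA (illustrative);",
  "DATA: FILE = scale_data.dat;",
  "VARIABLE: NAMES = y1-y3;",
  "  MISSING = ALL (-999);",
  "MODEL: f BY y1-y3;",
  "OUTPUT: STDYX;"
), inp_path)

# Package-based formats, skipped when the package is not installed
if (requireNamespace("haven", quietly = TRUE)) {
  haven::write_sav(scale_data, file.path(out_dir, "scale_data.sav"))  # SPSS
  haven::write_xpt(scale_data, file.path(out_dir, "scale_data.xpt"))  # SAS transport
}
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(scale_data, file.path(out_dir, "scale_data.xlsx"))
}
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_parquet(scale_data, file.path(out_dir, "scale_data.parquet"))
}
```

MplusAutomation's `prepareMplusData()` can generate the `.dat` file and matching input syntax in one step; the base R version is shown here only to make the file layout visible.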
Part II: Comparing Mplus and lavaan
- Mplus CFA: Writing input files, running models, and extracting results with MplusAutomation
- lavaan CFA: Specifying and fitting models in R
- Output Comparison: Understanding differences in syntax, defaults, and output formats
- Parameter Recovery: Verifying that both programs recover the true simulation parameters
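The lavaan side of this comparison, together with a simple parameter-recovery check, could be sketched as follows. The sample size, seed, and loading values are assumptions for the example, and the items are kept continuous so that lavaan's default maximum likelihood estimator applies directly.

```r
library(lavaan)

set.seed(123)  # arbitrary seed
n <- 1000      # assumed sample size
true_loadings <- c(y1 = 0.8, y2 = 0.7, y3 = 0.6, y4 = 0.5, y5 = 0.4)  # assumed

# Simulate continuous items from a single standardized factor
eta <- rnorm(n)
items <- sapply(true_loadings, function(l) l * eta + rnorm(n, sd = sqrt(1 - l^2)))
cfa_data <- as.data.frame(items)

# One-factor model; std.lv = TRUE fixes the factor variance to 1
# so that all five loadings are freely estimated
model <- "f =~ y1 + y2 + y3 + y4 + y5"
fit <- cfa(model, data = cfa_data, std.lv = TRUE)

# Pull the loading estimates and compare them with the true values
est <- parameterEstimates(fit)
est_loadings <- est$est[est$op == "=~"]

recovery <- data.frame(
  item     = names(true_loadings),
  true     = unname(true_loadings),
  estimate = round(est_loadings, 3),
  bias     = round(est_loadings - unname(true_loadings), 3)
)
print(recovery)

fitMeasures(fit, c("chisq", "df", "cfi", "rmsea"))
```

When matching this against Mplus output, keep in mind that both programs fix the first loading to 1 by default, so the Mplus model needs the equivalent of `std.lv = TRUE` (free first loading, factor variance fixed to 1) for the unstandardized estimates to line up. Mplus also estimates item intercepts by default, whereas `cfa()` does so only with `meanstructure = TRUE`.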
Part III: AI-Assisted Analysis
- AI Capabilities: Code generation, debugging, translation, and learning
- AI Limitations: Domain expertise, verification, and reproducibility concerns
- Best Practices: Verification, understanding, documentation, and iteration
- Ethics: Transparency, data privacy, and disclosure
Materials
Assignment Overview
In the assignment, you will:
- Simulate your own psychological scale data with known factor loadings
- Introduce missing data for one group, with values missing completely at random (MCAR)
- Fit a CFA model using Mplus or lavaan
- Compare your estimated parameters to the true values
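One possible reading of the missing-data step is sketched below: within a single group, each response on one item is deleted with the same fixed probability, independent of any observed or unobserved values. The group labels, the item chosen, and the 20% rate are assumptions for illustration; the assignment's own specification takes precedence.

```r
set.seed(42)  # arbitrary seed
n <- 200      # assumed sample size

# Hypothetical two-group data with two Likert-type items
dat <- data.frame(
  group = rep(c("A", "B"), each = n / 2),
  y1 = sample(1:5, n, replace = TRUE),
  y2 = sample(1:5, n, replace = TRUE)
)

miss_rate <- 0.20               # assumed proportion of missing responses
in_group_b <- dat$group == "B"  # only group B receives missing data

# Each group-B response on y2 is dropped with probability miss_rate,
# regardless of the value of y1 or y2
drop <- in_group_b & (runif(n) < miss_rate)
dat$y2[drop] <- NA

# Proportion missing on y2 by group
tapply(is.na(dat$y2), dat$group, mean)
```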
Readings
Required
- Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling, 25(4), 621–638.
Supplemental
- Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
- Wickham, H., & Grolemund, G. (2016). R for Data Science. O’Reilly Media. (Chapter on Data Import/Tidy Data).