Software Tools

Software Integration for Data Analysis

Author

Francisco Cardozo, PhD

Published

March 30, 2026

Overview

This class focuses on becoming multilingual in data science. We will learn how to move data between statistical software environments and file formats (R, SPSS, SAS, Mplus, Excel, and Parquet). We will also compare output from the same confirmatory factor analysis (CFA) model fitted in Mplus and in R’s lavaan package. Finally, we will discuss the use of AI tools in statistical programming and data analysis.

Required Packages

Please install the following R packages before class:

# Install all required packages
install.packages(c(
    "tidyverse",      # Data manipulation and visualization
    "haven",          # Import/export SPSS, SAS, Stata files
    "writexl",        # Export to Excel
    "arrow",          # Parquet files for big data
    "lavaan",         # Structural equation modeling in R
    "MplusAutomation" # Interface between R and Mplus
))

| Package         | Description                     | Documentation             |
|-----------------|---------------------------------|---------------------------|
| tidyverse       | Data wrangling and visualization | tidyverse.org             |
| haven           | Import/export SPSS, SAS, Stata  | haven.tidyverse.org       |
| writexl         | Export data to Excel (.xlsx)    | docs.ropensci.org/writexl |
| arrow           | Parquet files & big data        | arrow.apache.org/docs/r   |
| lavaan          | Structural equation modeling    | lavaan.ugent.be           |
| MplusAutomation | Run Mplus from R                | MplusAutomation docs      |

Note: MplusAutomation does not include Mplus itself. To run models from R, you must have Mplus installed on your computer.
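
A quick way to check this before class is to ask R whether it can see an Mplus executable. The following is a minimal sketch using only base R. It assumes the executable is named `mplus` and is on the system PATH, so a non-standard installation may not be detected even when Mplus is present.

```r
# Look for an Mplus executable on the system PATH.
# Assumption: the executable is called "mplus"; installations outside
# the PATH will not be found by this check.
mplus_path  <- unname(Sys.which("mplus"))
mplus_found <- nzchar(mplus_path)

if (mplus_found) {
  message("Mplus found at: ", mplus_path)
} else {
  message("Mplus not found on PATH; models cannot be run from R yet.")
}
```

If the check fails but Mplus is installed, adding the Mplus folder to the PATH is usually the first thing to try.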

Learning Objectives

By the end of this lesson, you will be able to:

  1. Simulate scale data (e.g., Likert scale) with known parameters
  2. Export data from R to multiple software formats
  3. Run a confirmatory factor analysis (CFA) in Mplus
  4. Run the equivalent CFA in R using lavaan
  5. Compare and interpret outputs from different software
  6. Use AI tools effectively for coding

Topics

Part I: The Multilingual Analyst

  • Data Simulation: Creating psychological scale data with Likert responses (5 items) and known factor loadings
  • Data Interoperability: Transforming data across formats:
    • R \(\leftrightarrow\) Excel/CSV
    • R \(\leftrightarrow\) SPSS (.sav)
    • R \(\leftrightarrow\) SAS (.sas7bdat)
    • R \(\leftrightarrow\) Parquet (.parquet)
    • R \(\leftrightarrow\) Mplus (.dat, .inp)
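
The two Part I topics can be sketched together as follows. This is an illustration under stated assumptions, not the class script: the sample size, loadings, thresholds, and file names are all hypothetical, and the data come from a simple one-factor model whose continuous responses are cut into five categories. CSV and Mplus `.dat` files are written with base R; the package-based formats are guarded so the sketch still runs if a package is missing. For SAS, the sketch writes a transport file (`.xpt`) with `haven::write_xpt()` rather than a `.sas7bdat` file, which is a substitution on my part; reading `.sas7bdat` files is done with `haven::read_sas()`.

```r
set.seed(2026)

n             <- 500
true_loadings <- c(0.8, 0.7, 0.6, 0.5, 0.4)   # illustrative values
thresholds    <- c(-1.5, -0.5, 0.5, 1.5)      # cut points for 5 categories

# Latent factor plus unique parts, scaled so each latent response has variance 1
eta    <- rnorm(n)
eps    <- matrix(rnorm(n * 5), nrow = n)
y_star <- outer(eta, true_loadings) +
  sweep(eps, 2, sqrt(1 - true_loadings^2), `*`)

# Discretize each continuous response into Likert categories 1 to 5
likert     <- apply(y_star, 2, findInterval, vec = thresholds) + 1L
scale_data <- as.data.frame(likert)
names(scale_data) <- paste0("y", 1:5)

out_dir <- tempdir()                            # hypothetical output folder
csv_file <- file.path(out_dir, "scale_data.csv")
dat_file <- file.path(out_dir, "scale_data.dat")

# CSV (base R); Excel can open this directly
write.csv(scale_data, csv_file, row.names = FALSE)

# Mplus free-format data: numbers only, no header, explicit missing code
write.table(scale_data, dat_file,
            row.names = FALSE, col.names = FALSE, na = "-999")

# Package-based formats, each skipped if the package is not installed
if (requireNamespace("haven", quietly = TRUE)) {
  haven::write_sav(scale_data, file.path(out_dir, "scale_data.sav"))  # SPSS
  haven::write_xpt(scale_data, file.path(out_dir, "scale_data.xpt"))  # SAS transport
}
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(scale_data, file.path(out_dir, "scale_data.xlsx"))
}
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_parquet(scale_data, file.path(out_dir, "scale_data.parquet"))
}

# Round trip: reading the CSV back should give the same dimensions
back <- read.csv(csv_file)
stopifnot(identical(dim(back), dim(scale_data)))
```

`MplusAutomation::prepareMplusData()` can also write the `.dat` file and a matching syntax skeleton; the base R call is used here only to keep the sketch free of extra dependencies.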

Part II: Comparing Mplus and lavaan

  • Mplus CFA: Writing input files, running models, and extracting results with MplusAutomation
  • lavaan CFA: Specifying and fitting models in R
  • Output Comparison: Understanding differences in syntax, defaults, and output formats
  • Parameter Recovery: Verifying that both programs recover the true simulation parameters
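
The workflow above might look like the following sketch. It assumes continuous indicators (not Likert responses) to keep the example short, and the loadings, file names, and working folder are hypothetical. The Mplus input is only written to disk; running it requires Mplus, so those calls are left as comments. The lavaan part is skipped if the package is not installed.

```r
set.seed(2026)

n             <- 1000
true_loadings <- c(y1 = 0.8, y2 = 0.7, y3 = 0.6, y4 = 0.5, y5 = 0.4)  # illustrative

# One-factor data with unit-variance indicators
eta      <- rnorm(n)
cfa_data <- as.data.frame(
  sapply(true_loadings, function(l) l * eta + sqrt(1 - l^2) * rnorm(n))
)

work_dir <- tempdir()                              # hypothetical folder
inp_file <- file.path(work_dir, "cfa_model.inp")
write.table(cfa_data, file.path(work_dir, "cfa_data.dat"),
            row.names = FALSE, col.names = FALSE)

# Mplus input: free all loadings and fix the factor variance to 1
mplus_input <- c(
  "TITLE:    One-factor CFA on simulated data;",
  "DATA:     FILE = cfa_data.dat;",
  "VARIABLE: NAMES = y1-y5;",
  "ANALYSIS: ESTIMATOR = ML;",
  "MODEL:    f BY y1* y2-y5;",
  "          f@1;",
  "OUTPUT:   STDYX;"
)
writeLines(mplus_input, inp_file)

# With Mplus installed, MplusAutomation can run and parse the model:
# MplusAutomation::runModels(inp_file)
# mplus_fit <- MplusAutomation::readModels(file.path(work_dir, "cfa_model.out"))

# Equivalent lavaan model: std.lv = TRUE fixes the factor variance to 1,
# meanstructure = TRUE estimates intercepts as Mplus does by default
if (requireNamespace("lavaan", quietly = TRUE)) {
  fit <- lavaan::cfa("f =~ y1 + y2 + y3 + y4 + y5",
                     data = cfa_data, std.lv = TRUE, meanstructure = TRUE)
  est      <- lavaan::parameterEstimates(fit)
  loadings <- est[est$op == "=~", c("rhs", "est")]
  loadings$true <- true_loadings[loadings$rhs]     # parameter recovery check
  print(loadings)
}
```

Two default differences are worth keeping in mind when the outputs do not match: lavaan drops incomplete rows unless `missing = "fiml"` is requested, while Mplus uses full-information maximum likelihood by default, and lavaan omits the mean structure unless asked. Setting `mimic = "Mplus"` in lavaan is one way to bring the defaults closer together.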

Part III: AI-Assisted Analysis

  • AI Capabilities: Code generation, debugging, translation, and learning
  • AI Limitations: Domain expertise, verification, and reproducibility concerns
  • Best Practices: Verification, understanding, documentation, and iteration
  • Ethics: Transparency, data privacy, and disclosure
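
As one illustration of the verification practice, code suggested by an AI tool can be checked against a trusted reference and against properties the result must have. The sketch below is hypothetical: `ai_standardize()` stands in for any AI-generated helper, and base R's `scale()` serves as the reference implementation.

```r
# Hypothetical AI-generated helper: z-standardize a numeric vector
ai_standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, 4, 5, 7, NA)
z <- ai_standardize(x)

# 1. Agreement with a trusted reference implementation
reference <- as.numeric(scale(x))
stopifnot(isTRUE(all.equal(z, reference)))

# 2. Properties the result must satisfy
stopifnot(
  abs(mean(z, na.rm = TRUE)) < 1e-12,          # centered at zero
  isTRUE(all.equal(sd(z, na.rm = TRUE), 1)),   # unit standard deviation
  identical(is.na(z), is.na(x))                # missing values stay missing
)
```

Keeping checks like these in the analysis script also documents what was verified, which supports the reproducibility and disclosure points above.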

Materials

Slides

Access the class presentation slides:

View Class Slides

Assignment

Complete the CFA with Missing Data assignment:

View Assignment

Assignment Overview

In the assignment, you will:

  1. Simulate your own psychological scale data with known factor loadings
  2. Introduce missing data that are missing completely at random (MCAR) in one group
  3. Fit a CFA model using Mplus or lavaan
  4. Compare your estimated parameters to the true values
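
Step 2 can be sketched in base R as follows. This is a minimal illustration, not the assignment solution: the random item values stand in for your own simulated scale data, and the group labels and the 20% missingness rate are hypothetical. Each cell in the chosen group is deleted with the same probability regardless of any item value, which is what makes the missingness completely at random within that group.

```r
set.seed(2026)

n         <- 400
item_cols <- paste0("y", 1:5)

# Stand-in for simulated scale data: five Likert items, values 1 to 5
items <- as.data.frame(matrix(sample(1:5, n * 5, replace = TRUE), nrow = n))
names(items) <- item_cols
items$group  <- rep(c("A", "B"), each = n / 2)   # hypothetical group labels

miss_rate <- 0.20                                # hypothetical missingness rate
in_b      <- items$group == "B"

# MCAR within group B: every cell has the same deletion probability,
# independent of observed and unobserved item values
drop_cell <- matrix(runif(sum(in_b) * length(item_cols)) < miss_rate,
                    nrow = sum(in_b))
b_items            <- as.matrix(items[in_b, item_cols])
b_items[drop_cell] <- NA
items[in_b, item_cols] <- b_items

# Proportion of missing cells by group: 0 in A, near miss_rate in B
na_rate <- tapply(rowMeans(is.na(items[item_cols])), items$group, mean)
print(na_rate)
```

When fitting the CFA afterwards, remember that lavaan uses listwise deletion unless `missing = "fiml"` is set, so the choice of missing-data handling should be stated alongside the results.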

Readings

Required

Supplemental

Join the Open Source Community!

Learning to code is about more than syntax; it is also about joining a global community. Discover R user groups, conferences, and ways to get involved.

Explore Communities