What Do I Have to Do to Prepare My Data for RMLCA?
The most challenging step in data preparation for us has been deciding how to code the indicators and how many repeated measures to model. It can be difficult to find the right balance between precision and parsimony when choosing the nature and specific coding of the indicators. For example, we decided to use a simple binary indicator of smoking status repeated 27 times. This meant that we lost the ability to distinguish patterns based on the amount smoked on any given day (we knew only whether the person smoked or not), but it also allowed us to estimate a model out to 27 days and get a broad picture of patterns of change in the first four weeks of quitting. We stopped at 27 days because we noticed a drop in observations beyond that point and did not want our data matrix to become sparse at the end of the epoch of interest. We have not been able to get models with more complex indicators (e.g., number of cigarettes) to converge with so many repeats.
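As an illustration of the coding decision described above, the following minimal Python sketch collapses daily cigarette counts into a binary smoking indicator while leaving missing days missing. The function name, the example counts, and the use of None for a missing report are assumptions made for illustration; they are not the code we used.

```python
from typing import List, Optional


def smoked_indicator(cigarettes: Optional[int]) -> Optional[int]:
    """Return 1 if any smoking was reported, 0 if none, None if the day is missing."""
    if cigarettes is None:
        # Keep missing days missing rather than treating them as abstinence.
        return None
    return 1 if cigarettes > 0 else 0


# Hypothetical daily cigarette counts for one subject (None = no report that day).
daily_counts: List[Optional[int]] = [0, 12, None, 3, 0]
daily_smoked = [smoked_indicator(c) for c in daily_counts]
print(daily_smoked)
```

The amount smoked is discarded at this step, which is the loss of precision traded for a model that can span all 27 repeats.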
Class indicators need to have a minimum value of 1 (not 0). Dummy-coded (0/1) variables will therefore need to be recoded, for example to 1/2.
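One simple way to recode a dummy-coded indicator is to add 1 to every observed value, so that 0/1 becomes 1/2. The sketch below assumes missing values are stored as None and that 0 meant "did not smoke"; both are illustrative assumptions, and the function name is hypothetical.

```python
from typing import List, Optional


def shift_to_one_based(value: Optional[int]) -> Optional[int]:
    """Shift a 0/1 code to 1/2, leaving missing values untouched."""
    return None if value is None else value + 1


# Hypothetical dummy-coded repeated measures: 0 = did not smoke, 1 = smoked.
dummy_coded: List[Optional[int]] = [0, 1, None, 1]
recoded = [shift_to_one_based(v) for v in dummy_coded]
print(recoded)  # 1 = did not smoke, 2 = smoked
```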
We also had to identify cases that had no relevant observations in the target period. Some missingness is acceptable and is addressed using maximum likelihood estimation, but a case must have at least one observed indicator to be retained in the model.
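The screening step can be sketched as a filter that keeps any case with at least one observed indicator and flags the rest. The subject IDs, values, and the None-for-missing convention below are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical wide records: subject ID -> repeated indicators (None = missing).
records: Dict[str, List[Optional[int]]] = {
    "s01": [1, None, 2],
    "s02": [None, None, None],  # no observations in the target period
    "s03": [None, 2, None],
}


def has_any_observation(indicators: List[Optional[int]]) -> bool:
    """True if at least one repeated measure was observed."""
    return any(v is not None for v in indicators)


retained = {sid: vals for sid, vals in records.items() if has_any_observation(vals)}
dropped = sorted(set(records) - set(retained))
print(list(retained), dropped)
```

Cases with partial missingness, such as s01 and s03 here, stay in the file; only cases with nothing observed are removed.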
The data need to be in a wide file format (not a long file), with the repeated indicators stored as separate variables rather than separate records. That is, each subject will have one record, with one variable holding the value of the outcome at each time point.
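The long-to-wide restructuring can be sketched as follows. This is a minimal illustration in plain Python, assuming each long record is a (subject ID, day, value) tuple with days numbered from 1; the names and data are hypothetical, and days with no record are left as None.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical long file: one record per subject per observed day.
long_rows: List[Tuple[str, int, int]] = [
    ("s01", 1, 2),
    ("s01", 2, 1),
    ("s02", 1, 1),
    ("s02", 3, 2),
]
N_DAYS = 3  # number of repeated measures to model (27 in our application)


def long_to_wide(
    rows: List[Tuple[str, int, int]], n_days: int
) -> Dict[str, List[Optional[int]]]:
    """Build one record per subject with one slot per day (None = not observed)."""
    wide: Dict[str, List[Optional[int]]] = {}
    for subject_id, day, value in rows:
        row = wide.setdefault(subject_id, [None] * n_days)
        row[day - 1] = value  # day 1 goes in the first slot
    return wide


wide = long_to_wide(long_rows, N_DAYS)
print(wide)
```

Each list in the result corresponds to one row of the wide file, with position k holding the indicator for day k.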
RMLCA can be conducted in Mplus or SAS PROC LCA, and the data should be formatted according to the program you will be using. We ran the same models in both Mplus and SAS to ensure convergent results.