Introduction to Factor Analysis
Factor analysis is a statistical technique for uncovering hidden connections, or latent variables (called factors), among a large set of observed variables. The data becomes easier to interpret because correlated variables are grouped into a smaller number of unobserved factors that are representative of the data as a whole. Mathematically, factor analysis assumes that:
Observed variables = Linear combinations of latent factors + error terms.
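In conventional notation (a standard statement of the common factor model, not a formula given elsewhere in this text), for p observed variables and k factors:

\[ x_i = \lambda_{i1} F_1 + \lambda_{i2} F_2 + \cdots + \lambda_{ik} F_k + \varepsilon_i, \qquad i = 1, \dots, p \]

where each x_i is an observed variable, the F_j are the latent factors, the loadings λ_ij weight each factor's contribution, and ε_i is the unique error term for that variable.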
Tools such as SPSS and AMOS help uncover these hidden relationships among observed variables. Grouping related variables into common factors makes large collections of data easier to interpret and reveals trends that are not apparent at first glance. Because factor analysis is fundamentally about finding patterns in data and reducing its dimensionality, it works especially well for surveys, social research, and psychological testing. In practice, SPSS is most often used for Exploratory Factor Analysis (EFA), while AMOS is used for Confirmatory Factor Analysis (CFA) to test models that have already been specified.

Purpose of Factor Analysis
1. Data Reduction: Combining a large number of variables into a smaller set of factors that are easier to interpret, without losing important information.
2. Finding Structure: Identifying the patterns or structures that underlie different sets of data.
3. Scale Construction: Building psychometric scales or survey instruments by verifying that groups of items belong together.
4. Handling Multicollinearity: Predictor variables that are too highly correlated in regression can be replaced with a smaller set of factors.
5. Measuring Abstract Concepts: Factor analysis puts numbers on constructs that are hard to observe directly, such as anxiety, intelligence, or customer satisfaction.
Key Features
• Latent Variables: Unobserved constructs inferred from the variables that can be observed.
• Dimensionality Reduction: Reducing the number of dimensions makes the data easier to interpret.
• Correlated Inputs: The method assumes the input variables are correlated with one another.
• Factor Loadings: Indicate the strength and direction of the relationships between factors and variables.
• Orthogonal and Oblique Rotations: Improve the interpretability of the factor solution.

Core Applications of Factor Analysis
1. Scale Development
Used when building new questionnaires or measurement instruments to see which items group together. For example, while developing a motivation scale, FA might show how certain questions cluster into groups such as intrinsic or extrinsic motivation.
2. Construct Validation
Checks whether survey questions or items actually relate to the latent constructs they are intended to measure, providing evidence that the instrument measures what it claims to.
3. Latent Structure Identification
Helps uncover structures that have not yet been hypothesized but could explain the patterns among observed variables. This is useful for exploratory research and for generating new theoretical hypotheses.
4. Data Reduction
Researchers can condense a large number of correlated variables into a small set of factors. This makes the data simpler to work with while preserving the important information, and it is an effective way to group and describe many variables at once.
Types of Factor Analysis: EFA vs. CFA
There are two primary kinds of factor analysis, depending on the study's purpose and how much is already known about the factor structure:
Exploratory Factor Analysis (EFA)
When there is no existing theory, Exploratory Factor Analysis (EFA) is a data-driven method for discovering the underlying structure of a set of observed variables. The objective is to determine how many latent constructs (or factors) can explain the correlations among the observed variables.
Below is a step-by-step explanation of the procedures used in EFA:
1. Data Suitability Checks
Before running EFA, it is important to make sure the dataset is suitable for factor analysis.
a. Sample Size
• At least 5–10 observations per variable (more than 100 cases overall is best).
• Larger samples give more stable and dependable factor solutions.
b. Linearity and Normality
• EFA assumes linear relationships between variables.
• EFA is robust to small departures from normality.
c. Correlation Matrix Inspection
• EFA presupposes that the variables are at least moderately correlated.
• A correlation matrix is computed to examine these relationships.
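As a quick illustration, the Python sketch below builds and inspects a correlation matrix with pandas; the DataFrame `df` and its item columns are hypothetical placeholders, not data from this text.

```python
# Minimal sketch of the correlation-matrix inspection step.
# `df` is a made-up survey DataFrame; replace it with real responses.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 6)),
                  columns=[f"item{i}" for i in range(1, 7)])

corr = df.corr()            # pairwise Pearson correlations
print(corr.round(2))

# Items that barely correlate with any other item are poor candidates
# for factor analysis, so report each item's strongest correlate.
off_diag = corr.where(~np.eye(len(corr), dtype=bool))
print(off_diag.abs().max().round(2))
```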
2. Tests for Factorability
These tests check that the dataset is suitable for extracting meaningful factors.
a. Kaiser-Meyer-Olkin (KMO) Test
• Measures whether the sample is adequate for factor analysis.
• A KMO value above 0.6 is acceptable; above 0.8 is ideal.
• Indicates whether the patterns of correlations are compact enough for factor extraction to be reliable.
b. Bartlett’s Test of Sphericity
• Tests whether the correlation matrix differs significantly from an identity matrix, which would mean the variables are uncorrelated.
• A p-value below 0.05 indicates that EFA is appropriate.
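In Python, the third-party factor_analyzer package (pip install factor_analyzer) implements both tests; a minimal sketch, reusing the hypothetical `df` from the earlier snippet:

```python
# Bartlett's test and KMO via factor_analyzer; thresholds per the text above.
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)

chi_square, p_value = calculate_bartlett_sphericity(df)
print(f"Bartlett: chi2 = {chi_square:.1f}, p = {p_value:.4f}")  # want p < 0.05

kmo_per_item, kmo_total = calculate_kmo(df)
print(f"Overall KMO = {kmo_total:.2f}")  # > 0.6 acceptable, > 0.8 ideal
```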
3. Factor Extraction Methods
This phase determines how the factors are extracted from the data.
a. Principal Component Analysis (PCA)
• Often confused with EFA, although it is primarily a data-reduction technique.
• It analyzes total variance (common + unique).
• PCA is useful as a point of comparison, but it is not true factor analysis.
b. Principal Axis Factoring (PAF)
• Extracts only the common variance shared across variables.
• Appropriate when the objective is to uncover latent constructs.
c. Maximum Likelihood (ML)
• Assumes the data are multivariate normal.
• Makes statistical tests and confidence intervals possible.
4. Determining the Number of Factors to Retain
There are several ways to decide how many factors to extract (see the sketch after this list):
a. Eigenvalues > 1 (Kaiser Criterion)
• Retain factors with eigenvalues greater than 1.
• Such factors explain more variance than a single observed variable does.
b. Scree Plot
• A graph of eigenvalues against the number of factors.
• Find the "elbow" point where the slope levels off, and keep the factors that come before it.
c. Parallel Analysis
• Compares the eigenvalues of the real data with those of randomly generated data.
• More accurate than the eigenvalue > 1 rule.
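A sketch of the Kaiser criterion and a bare-bones parallel analysis, again on the hypothetical `df`; factor_analyzer exposes the observed eigenvalues, and the random-data eigenvalues are simulated by hand:

```python
# Kaiser criterion and a simple hand-rolled parallel analysis.
import numpy as np
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(df)
eigenvalues, _ = fa.get_eigenvalues()   # eigenvalues of the correlation matrix
print("Kaiser criterion retains:", int((eigenvalues > 1).sum()), "factors")

# Parallel analysis: average eigenvalues of random data of the same shape.
rng = np.random.default_rng(1)
n_obs, n_vars = df.shape
random_eigs = np.mean(
    [np.sort(np.linalg.eigvalsh(
         np.corrcoef(rng.normal(size=(n_obs, n_vars)), rowvar=False)))[::-1]
     for _ in range(100)],
    axis=0,
)
print("Parallel analysis retains:", int((eigenvalues > random_eigs).sum()), "factors")
```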
5. Factor Rotation
Rotation makes the solution easier to interpret by simplifying the factor loadings (a code sketch follows this list).
a. Orthogonal Rotation (Varimax)
• Assumes the factors are uncorrelated (independent).
• Simplifies the columns of the factor loading matrix.
b. Oblique Rotation (Promax, Oblimin)
• Allows the factors to be correlated.
• More realistic in psychological and social research.
• Produces a pattern matrix (loading strengths) and a structure matrix (variable–factor correlations).
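A sketch comparing the two rotation families with factor_analyzer; the choice of three factors is arbitrary and purely illustrative:

```python
# Varimax (orthogonal) vs. promax (oblique) loadings on the hypothetical `df`.
import pandas as pd
from factor_analyzer import FactorAnalyzer

for rotation in ("varimax", "promax"):
    fa = FactorAnalyzer(n_factors=3, rotation=rotation)
    fa.fit(df)
    loadings = pd.DataFrame(fa.loadings_, index=df.columns,
                            columns=["F1", "F2", "F3"])
    print(f"\n{rotation} loadings:\n{loadings.round(2)}")
```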
6. Interpretation of Factor Loadings
• Factor Loadings: The correlation coefficients between observed variables and latent factors.
o Loadings above 0.4 or below -0.4 are typically considered meaningful.
• Items are grouped according to the factor on which they load most strongly.
• Factors are named according to the conceptual meaning of the variables grouped under them.
7. Reliability Testing
After identifying factors:
• Cronbach's Alpha measures the internal consistency of the items loading on each factor.
• Alpha > 0.7 indicates that the scale is reliable.
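Cronbach's alpha is easy to compute directly; a sketch with numpy, where the item columns chosen are hypothetical names from the earlier `df`:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array with rows = respondents, columns = scale items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

alpha = cronbach_alpha(df[["item1", "item2", "item3"]].to_numpy())
print(f"alpha = {alpha:.2f}")   # > 0.7 indicates acceptable reliability
```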
Purpose & Uses of EFA
Exploratory Factor Analysis (EFA) is used in the early phases of research to find the underlying structure among a large number of variables. It is particularly helpful when there is no existing theory or model and the objective is to discover latent constructs or factors that explain the observed correlations. Researchers use EFA to combine similar variables, reduce the number of dimensions, and create new scales or measurement models that later studies can confirm and analyze.
Key Features
• An inductive, data-driven technique
• The number of factors is not set in advance
• Factors are identified using eigenvalues and visual aids such as the scree plot
• Loadings are interpreted after applying rotation methods (Varimax for orthogonal, Promax for oblique)
2. Confirmatory Factor Analysis (CFA)
Confirmatory Factor Analysis (CFA) is a statistical method for testing whether a hypothesized factor structure fits the data that has been collected. The method involves specifying the model by determining which observed variables load onto which latent factors and whether the factors are correlated. After the model has been identified and its parameters estimated using methods such as Maximum Likelihood (ML), the model is evaluated with goodness-of-fit indices such as CFI, RMSEA, and SRMR.
Purpose
Its main task is to verify measurement models, check construct validity, and confirm theoretical constructs such as job satisfaction, motivation, or anxiety. CFA is usually applied in the later phases of research, once a trustworthy scale has been developed using procedures such as EFA.
Process of Performing CFA
1. Model Specification
Define (see the example syntax after this list):
• which observed variables go with which latent factors;
• whether the factors are correlated with each other;
• any error covariances that make sense in theory.
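As an illustration, a two-factor specification in the lavaan-style syntax accepted by both R (lavaan) and Python (semopy); the factor and item names (Motivation, Satisfaction, q1–q6) are invented for this sketch:

```python
# Measurement model: `=~` defines loadings, `~~` a factor covariance.
model_desc = """
Motivation   =~ q1 + q2 + q3
Satisfaction =~ q4 + q5 + q6
Motivation ~~ Satisfaction
"""
```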
2. Model Identification
Make sure the model can be mathematically estimated by having:
• enough data points (observations > parameters to estimate);
• enough indicators per factor (at least three observed variables per factor is a good rule of thumb).
3. Model Estimation
Use estimation techniques such as:
- Maximum Likelihood (ML)
- Generalized Least Squares (GLS)
Software options: AMOS, LISREL, Mplus, R (lavaan), Python (semopy)
4. Model Evaluation (Goodness-of-Fit Indices)
Evaluate how well the model fits the actual data using several fit indices:

Fit Index | Acceptable Value | Interpretation
Chi-Square (χ²) | p > 0.05 | Low values = good fit
CFI (Comparative Fit Index) | > 0.90 (good), > 0.95 (excellent) | Compares model fit to null model
RMSEA (Root Mean Square Error of Approximation) | < 0.08 (good), < 0.05 (excellent) | Measures approximation error
SRMR (Standardized Root Mean Square Residual) | < 0.08 | Measures difference between observed and predicted correlations
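A sketch of estimation and fit evaluation with semopy (pip install semopy), continuing the hypothetical `model_desc` from the specification sketch above; `survey_df` stands in for a DataFrame with columns q1–q6, and the column labels passed to the print call assume the names semopy reports in its fit-statistics table:

```python
# Fit the CFA and report goodness-of-fit indices.
import semopy

model = semopy.Model(model_desc)
model.fit(survey_df)              # maximum-likelihood estimation by default

stats = semopy.calc_stats(model)  # one-row DataFrame of fit statistics
print(stats[["chi2 p-value", "CFI", "RMSEA"]])
```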
5. Model Modification
• If the fit is poor, use modification indices to find relationships that should be added or removed (such as error covariances).
• Only make changes that are justified by theory; do not let data-driven overfitting creep in.
6. Model Interpretation
• Factor loadings should be at least 0.5 (0.7 is better).
• To judge significance, examine the standard errors and critical ratios.
• Check validity by computing construct reliability (CR) and average variance extracted (AVE), as sketched below.
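A sketch of the CR and AVE computations from standardized loadings, using the usual formulas CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and AVE = mean(λ²); the loadings below are invented numbers for illustration:

```python
# Construct reliability (CR) and average variance extracted (AVE).
import numpy as np

loadings = np.array([0.72, 0.68, 0.81])   # one factor's standardized loadings
errors = 1 - loadings**2                  # item error variances

cr = loadings.sum()**2 / (loadings.sum()**2 + errors.sum())
ave = (loadings**2).mean()
print(f"CR = {cr:.2f} (want > 0.7), AVE = {ave:.2f} (want > 0.5)")
```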
Key Features
• A theory-based (deductive) strategy
• The researcher sets the factor structure ahead of time
• Statistical indices are used to check how well the model fits:
• CFI/TLI (> 0.90 indicates a good fit)
• RMSEA (< 0.08 indicates acceptable error)
• SRMR (< 0.08 indicates low residuals)
• Chi-square/df (< 3 is usually acceptable)
The Difference Between EFA & CFA:

Aspect | Exploratory Factor Analysis (EFA) | Confirmatory Factor Analysis (CFA)
Purpose | Discover underlying structure | Test a predefined structure
Theoretical Framework | Not required | Required
Approach | Data-driven (inductive) | Hypothesis-driven (deductive)
Factor Specification | Factors and loadings are derived from data | Factors and loadings are specified in advance
Rotation Used? | Yes (to clarify loadings) | No (model structure is fixed)
Model Fit Indices | Not applicable | Required (CFI, RMSEA, SRMR, etc.)
Best Suited For | New scale development, exploring unknown structures | Theory testing, scale validation, measurement invariance studies
Software Tools | SPSS, R (psych), Python | AMOS, LISREL, Mplus, R (lavaan)
Principal Component Analysis
Principal Component Analysis (PCA) is a method used in statistics and machine learning to make large datasets easier to work with by transforming the original variables into a new set of uncorrelated variables known as principal components. The first few of these components retain much of the variation (information) in the original dataset.
Mathematically, PCA finds the eigenvalues and eigenvectors of the covariance or correlation matrix to identify the directions (principal components) in which the data varies the most. The first principal component captures the most variance, the second captures the next most variance in a direction orthogonal to the first, and so on. PCA is commonly used to compress data, make it easier to visualize, and prepare it for machine learning by improving model performance (a brief sketch follows).
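A minimal scikit-learn sketch; `X` stands in for any numeric feature matrix, and the data here are randomly generated for illustration:

```python
# PCA on standardized features with scikit-learn.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))               # hypothetical feature matrix

X_std = StandardScaler().fit_transform(X)   # standardize so PCA uses correlations
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)

print(pca.explained_variance_ratio_)        # variance captured per component
print(scores.shape)                         # (200, 2): reduced representation
```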
Factor Analysis vs Principal Component Analysis (PCA)
Factor Analysis (FA) and Principal Component Analysis (PCA) are commonly treated as interchangeable, but they are distinct methods that rest on different statistical assumptions:

Feature | Factor Analysis (FA) | Principal Component Analysis (PCA)
Purpose | Identify latent constructs | Summarize total variance
Based on | Shared (common) variance | Total variance (common + unique + error)
Error Assumption | Accounts for measurement error | Does not account for measurement error
Use Case | Theory-driven: construct validation | Data-driven: dimensionality reduction
Output | Latent variables (factors) | Principal components (composite scores)
Real-Life Examples of EFA and CFA
Exploratory Factor Analysis (EFA) – Real-Life Example:
Context: A university sends out a 30-question survey to find out how satisfied students are with things like teaching quality, campus amenities, support services, and extracurricular activities.
Application: The institution does not know in advance how these aspects group together, so it uses EFA to explore the data. The analysis shows that the questions fall into four main groups: Academic Experience, Infrastructure Satisfaction, Administrative Support, and Campus Life. With these groups, the university can better understand what drives satisfaction and simplify future surveys.
Confirmatory Factor Analysis (CFA) – Real-Life Example:
Context: Based on the EFA above, the university creates a refined 20-item student satisfaction survey organized around the four factors previously identified.
Application: CFA is run on a new group of students to see whether the four-factor model fits the data well. The university verifies the structure and certifies the instrument's reliability using fit indices such as CFI and RMSEA. This gives it the confidence to use the scale for regular student feedback and benchmarking across institutions.