Built upon the blogs about text clustering and text model, we are ready to dive into a classic topic - Sentiment Analysis. Sentiment analysis computationally identifies and categorizes emotions in text, helping understand public opinions or consumer feedback.
What is Sentiment Analysis?
Sentiment analysis determines the emotional tone behind text, classifying it as positive, negative, or neutral. It’s widely used for social media monitoring and understanding consumer needs.
Why Use Sentiment Analysis?
- Public Opinion: Gauge sentiment on topics or brands.
- Consumer Insights: Quickly identify customer reactions (e.g., Expedia Canada’s commercial case).
Challenges
Human language is complex, and machines struggle with nuances like sarcasm (e.g., “Verrryyyyyy goooodddd!!” may be misread as positive). Algorithms are evolving to handle such cases but aren’t 100% accurate.
Sentiment Analysis Process
- Text Preprocessing:
- Tokenization: Split text into words or phrases.
- Stop Word Filtering: Remove common words (e.g., “and,” “the”).
- Negation Handling: Address negations (e.g., “not good” vs. “not not good”).
- Stemming: Reduce words to their root form (e.g., “running” to “run”).
- Sentiment Classification: Assign polarity (positive/negative) using lexicons or algorithms.
- Sentiment Scoring: Quantify sentiment strength, considering factors like capitalization (e.g., “GOOD” indicates stronger emotion).
Example Data
| Text | Sentiment | |——|———-| | Loves the German bakeries in Sydney… | Positive | | @VivaLaLauren Mine is broken too!… | Negative | | @Mofette briliant! May the fourth be with you… | Positive |
Sentiment Analysis in R
Using R’s tm
, syuzhet
, and other packages, we can preprocess text and analyze sentiment.
Preprocessing and Word Cloud
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)
# Sample text data for sentiment analysis
text <- c(
"I absolutely love this product! It has exceeded all my expectations and works perfectly.",
"The service was terrible. I've never been more disappointed with a purchase in my life.",
"This is just okay. Not great, but not bad either. It gets the job done I suppose.",
"The customer support team was incredibly helpful and resolved my issue within minutes. Amazing!",
"The quality is quite poor for the price. I expected much better based on the reviews.",
"This is hands down the best thing I've bought this year. Worth every penny!",
"I'm really frustrated with the shipping delay. The product is good but the wait was unacceptable.",
"The instructions were unclear, but once I figured it out, the product worked as described.",
"I wouldn't recommend this to anyone. Complete waste of money and time.",
"The design is beautiful and it's very easy to use. I'm extremely satisfied with my purchase!"
)
docs <- Corpus(VectorSource(text))
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)
docs <- tm_map(docs, stemDocument)
# Create Term-Document Matrix
dtm <- TermDocumentMatrix(docs)
dtm_m <- as.matrix(dtm)
dtm_v <- sort(rowSums(dtm_m), decreasing=TRUE)
dtm_d <- data.frame(word = names(dtm_v), freq=dtm_v)
# Generate word cloud with adjusted parameters
# Set up plot margins (bottom, left, top, right)
par(mar = c(0, 0, 0, 0)) # Remove all margins
# Create a new plot with larger dimensions
png("wordcloud.png", width = 10, height = 8, units = "in", res = 300) # High resolution
set.seed(1234)
wordcloud(
words = dtm_d$word,
freq = dtm_d$freq,
min.freq = 1,
max.words = 50,
random.order = FALSE,
rot.per = 0, # No rotation
scale = c(4, 0.8), # Scale between largest and smallest words
colors = brewer.pal(8, "Dark2"),
vfont = c("sans serif", "plain"),
use.r.layout = TRUE # Better layout algorithm
)
dev.off() # Close the device
# Display the saved image
if (requireNamespace("png", quietly = TRUE) && requireNamespace("grid", quietly = TRUE)) {
library(png)
library(grid)
if (file.exists("wordcloud.png")) {
img <- png::readPNG("wordcloud.png")
grid::grid.raster(img)
} else {
warning("Word cloud image not found. Please check the file path.")
}
} else {
warning("Please install 'png' and 'grid' packages to display the word cloud.")
}
This code preprocesses text, removes noise, and visualizes frequent words in a word cloud.
Sentiment Scoring
Using syuzhet
for sentiment analysis with different lexicons:
library(syuzhet)
library(ggplot2)
# Sentiment scoring with multiple methods
syuzhet_vector <- get_sentiment(text, method="syuzhet")
bing_vector <- get_sentiment(text, method="bing")
afinn_vector <- get_sentiment(text, method="afinn")
# Compare first few scores
rbind(
sign(head(syuzhet_vector)),
sign(head(bing_vector)),
sign(head(afinn_vector))
)
# Emotion classification with NRC
d <- get_nrc_sentiment(text)
td <- data.frame(t(d))
td_new <- data.frame(rowSums(td))
names(td_new) <- "count"
td_new <- cbind("sentiment" = rownames(td_new), td_new)
# Create a more informative plot
ggplot(td_new, aes(x = reorder(sentiment, count), y = count, fill = sentiment)) +
geom_bar(stat = "identity") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") + # Remove legend since it's redundant
labs(title = "Sentiment Analysis Results",
x = "Sentiment",
y = "Count") +
scale_fill_brewer(palette = "Set3") +
coord_flip() # Flip coordinates for better readability
# Create multiple visualizations
# 1. Basic sentiment scores comparison
sentiment_scores <- data.frame(
Text = 1:length(text),
Syuzhet = syuzhet_vector,
Bing = bing_vector,
Afinn = afinn_vector
)
# Reshape data for plotting
sentiment_long <- tidyr::pivot_longer(sentiment_scores,
cols = c(Syuzhet, Bing, Afinn),
names_to = "Method",
values_to = "Score")
# Plot 1: Compare different sentiment scoring methods
p1 <- ggplot(sentiment_long, aes(x = Text, y = Score, fill = Method)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
labs(title = "Comparison of Sentiment Scoring Methods",
x = "Text Sample",
y = "Sentiment Score") +
scale_fill_brewer(palette = "Set2")
# Plot 2: NRC Emotion Analysis (already created above)
p2 <- ggplot(td_new, aes(x = reorder(sentiment, count), y = count, fill = sentiment)) +
geom_bar(stat = "identity") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") +
labs(title = "Emotion Analysis Results",
x = "Emotion",
y = "Count") +
scale_fill_brewer(palette = "Set3") +
coord_flip()
# Plot 3: Word Cloud (already created above)
# The word cloud is saved as "wordcloud.png"
# Display all plots
print(p1)
print(p2)
# Print summary statistics
cat("\nSummary of Sentiment Scores:\n")
print(summary(sentiment_scores[, -1]))
# Print the most positive and negative texts
cat("\nMost Positive Text:\n")
print(text[which.max(syuzhet_vector)])
cat("\nMost Negative Text:\n")
print(text[which.min(syuzhet_vector)])
Summary of Sentiment Scores:
Syuzhet Bing Afinn
Min. :-1.750 Min. :-2.00 Min. :-5.00
1st Qu.:-0.250 1st Qu.: 0.00 1st Qu.:-0.75
Median : 0.325 Median : 0.00 Median : 1.50
Mean : 0.600 Mean : 0.80 Mean : 2.20
3rd Qu.: 1.738 3rd Qu.: 2.75 3rd Qu.: 5.75
Max. : 3.150 Max. : 4.00 Max. :10.00
Most Positive Text:
[1] "The customer support team was incredibly helpful and resolved my issue within minutes. Amazing!"
Most Negative Text:
[1] "I'm really frustrated with the shipping delay. The product is good but the wait was unacceptable."
This code applies sentiment scoring using syuzhet
, bing
, and afinn
lexicons and visualizes emotions (e.g., joy, sadness) with the NRC lexicon.
Lexicon-Based Analysis
Lexicons like bing
and afinn
assign sentiment scores to words:
- Bing: Binary (positive/negative, e.g., “abandon” = negative).
- Afinn: Numeric scores (e.g., “abandon” = -2).
- NRC: Categorizes emotions (anger, joy, etc.).
Example: Hotel Sentiment Scores
Hotel | Agoda Sentiment | Agoda Rating | Booking.com Sentiment | Booking.com Rating |
---|---|---|---|---|
One World | 6.85 | 8.5 | 6.59 | 8.5 |
Summer Suite | 7.27 | 8.4 | 7.1 | 8.7 |
These scores reflect overall sentiment from reviews, often aligning with ratings but providing deeper emotional insights.
Conclusion
Sentiment analysis offers a powerful way to understand emotions in text, though it requires careful interpretation due to linguistic complexities. Using R’s tm
and syuzhet
packages, you can preprocess text, score sentiments, and visualize emotions, making it ideal for social media or review analysis.