Natural Language Processing
Review Sentiment Analysis
About this Project
The goal of this project was to analyze product reviews from two different websites to determine the accuracy of
review scores using sentiment analysis. By leveraging Selenium and Python, I scraped product reviews from both
websites and employed Natural Language Processing (NLP) techniques to assess the sentiment conveyed in these
reviews.
Data Collection
Using Selenium, I automated the extraction of product reviews from two distinct websites. This allowed me to gather a
substantial amount of review data for subsequent analysis.
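Once Selenium has rendered a page, pulling the review text out is a plain parsing step. As a minimal sketch of that step (the real pages' HTML structure and class names are not shown in this write-up, so the `review-text` class below is a hypothetical stand-in), the stdlib `html.parser` can extract review bodies from a saved page:

```python
from html.parser import HTMLParser

# Hypothetical structure: each review sits in a <div class="review-text">.
# The real sites' markup differs; this only illustrates the extraction step.
class ReviewExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "review-text") in attrs:
            self._in_review = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_review = False

    def handle_data(self, data):
        if self._in_review and data.strip():
            self.reviews.append(data.strip())

page = '<div class="review-text">Sturdy frame, weak drawers.</div>'
parser = ReviewExtractor()
parser.feed(page)
print(parser.reviews)
```

In the actual pipeline, Selenium handles navigation and pagination and hands the rendered HTML to a parser like this.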
Sentiment Analysis
Once the data was collected, I utilized NLP to analyze the sentiment of each review. Specifically, I employed two
models: VADER (a lexicon-based analyzer from NLTK) and a RoBERTa model from Hugging Face, to process and evaluate
the sentiment behind the review text.
The analysis revealed a notable discrepancy: one website appeared to artificially inflate its review scores. Despite
a broad range of sentiment in the review text itself, the displayed scores consistently fell between 4.1 and 5.0,
suggesting that the website's scoring system was skewed to present a more positive overall image.
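To illustrate the lexicon-based side of this comparison, here is a toy scorer in the spirit of VADER: each word carries a valence, and the normalized sum approximates a compound score. The tiny lexicon below is invented for illustration only; the real VADER ships a lexicon of thousands of rated tokens and also handles negation, intensifiers, and punctuation.

```python
# Invented mini-lexicon for illustration; not VADER's actual word scores.
LEXICON = {"sturdy": 1.5, "hoping": 0.4, "better": 0.9, "not": -1.2, "weak": -1.6}

def toy_compound(text: str) -> float:
    """Sum word valences and squash into [-1, 1], loosely mirroring
    VADER's compound-score normalization."""
    words = (w.strip(".,!?") for w in text.lower().split())
    score = sum(LEXICON.get(w, 0.0) for w in words)
    return score / (abs(score) + 5)

print(round(toy_compound("The frame is sturdy but the drawers are weak."), 3))
```

The RoBERTa model, by contrast, is a transformer classifier that reads the whole sentence in context rather than summing per-word scores, which is why running both gives a useful cross-check.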
Data Verification
To ensure the accuracy of the sentiment analysis, I manually compared several reviews' text against their sentiment
scores. This manual check confirmed that the website's review system was indeed skewed: the lower bound of its
4.1-5.0 scale misrepresented the range of a true 1-10 scale.
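One way to see why a 4.1-5.0 band is suspicious is to rescale it: map the observed band back onto a full rating scale and see what the non-compressed equivalents would be. The sketch below uses min-max rescaling with made-up example scores, and assumes a 1-5 star display scale for illustration (the scale bounds are parameters, so any target range works):

```python
# Illustrative only: map scores observed in [obs_lo, obs_hi] onto the
# full [true_lo, true_hi] band via min-max rescaling.
def rescale(score, obs_lo=4.1, obs_hi=5.0, true_lo=1.0, true_hi=5.0):
    frac = (score - obs_lo) / (obs_hi - obs_lo)
    return true_lo + frac * (true_hi - true_lo)

observed = [4.1, 4.3, 4.6, 5.0]   # made-up example scores
print([round(rescale(s), 2) for s in observed])
```

Under this mapping, a displayed 4.1 is the site's effective minimum, i.e. the equivalent of a bottom-of-scale rating.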
Analysis Results
In my analysis, I noticed a clear anomaly: nearly every review from one site (LivingSpaces) was rated between 4.1
and 5.0. This narrow range pointed to an inflated rating system that skewed the perceived positivity of the
reviews.
Proof is in the pudding:
I also noticed that LivingSpaces had identical reviews on completely different SKUs, often for totally unrelated items. This was not a one-time occurrence.
Identical Review Examples:
- ID 333, SKU 241131: "The body of the dresser is very sturdy, the drawers are not as sturdy. I bought this online and was hoping that the drawers would have been better."
- ID 283, SKU 81481: "The body of the dresser is very sturdy, the drawers are not as sturdy. I bought this online and was hoping that the drawers would have been better."
- ID 313, SKU 81478: "The body of the dresser is super sturdy, the drawers are not very sturdy. This is par with how everyone is making furniture anymore. I was hoping for better."
- ID 357, SKU 241125: "The body of the dresser is super sturdy, the drawers are not very sturdy. This is par with how everyone is making furniture anymore. I was hoping for better."
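The duplicate check above can be sketched as a simple grouping: index reviews by their text and flag any text that appears under more than one SKU. The IDs and SKUs below come from the examples above (review texts truncated for brevity):

```python
from collections import defaultdict

# (review_id, sku, text) tuples; IDs/SKUs from the examples above,
# texts truncated for brevity.
reviews = [
    (333, "241131", "The body of the dresser is very sturdy, the drawers are not as sturdy."),
    (283, "81481",  "The body of the dresser is very sturdy, the drawers are not as sturdy."),
    (313, "81478",  "The body of the dresser is super sturdy, the drawers are not very sturdy."),
    (357, "241125", "The body of the dresser is super sturdy, the drawers are not very sturdy."),
]

# Group SKUs by review text; any text attached to more than one SKU is suspect.
by_text = defaultdict(set)
for _id, sku, text in reviews:
    by_text[text].add(sku)

duplicates = {text: skus for text, skus in by_text.items() if len(skus) > 1}
for text, skus in sorted(duplicates.items()):
    print(f"{len(skus)} SKUs share: {text[:40]}...")
```

Exact-match grouping like this only catches verbatim copies; near-duplicates would need fuzzy matching, but verbatim copies were enough to make the point here.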
Key Insights
- One website's review scores were consistently high, indicating potential artificial inflation.
- Sentiment analysis using VADER and RoBERTa confirmed discrepancies in the review rating systems.
- Manual verification supported the finding of inflated review scores: even the lower end of the score range
represented an exaggeratedly positive rating.
Thank you for exploring this analysis!
Review sentiment analysis powered by Hugging Face NLP models
Full Jupyter notebook and code on GitHub