Check this before buying a new Toyota Corolla Hybrid
You decided to buy a brand new car, and you are really excited. After all, owning a car is a big deal here (200% tax). But is it really going to be worth it? If you buy one, will it be reliable, hold its value on the second-hand market, and make you happy?
These were the questions bugging me when I decided to buy my first brand new car. I had already picked the car: a Toyota Corolla Hybrid. It looked reliable and fuel efficient, and the trunk was a good size. Trying new tech is also exciting, so the fact that it is a hybrid influenced me too. It was the third best-selling car, so it should be fairly easy to sell if I wanted to. As most people do, I went and checked eksisozluk, which has an opinion on most things. But the opinions varied: some people liked the car and some did not, so I decided to search for keywords about chronic problems. Some are listed and you can find them by Googling, but I wondered whether I could get a summary of users' opinions.
So I decided to crawl some of the most used websites for product reviews and complaints.
I aimed to answer these three questions:
- Can I find useful information among those opinions, so that the data can help my decision process?
- Can I build something generic out of it, so I can use it for other products as well?
- Can I find additional chronic problems or issues beyond those in blog posts or Googled answers?
I created a crawler for eksisozluk.com and used a quick way to crawl complaints from sikayetvar.com. The Toyota Corolla Hybrid has 236 comments on eksisozluk.com and 60 complaints on sikayetvar.com.
eksi_toyota.head()
sikayetvar_toyota.head()
The raw data was a bit dirty, so I had to remove \n, \r and some special characters before working on it. I then translated all comments from Turkish to English using the Google Cloud Translate API.
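A minimal sketch of what such a cleaning step could look like. The function name `clean_comment` and the exact set of characters kept are assumptions for illustration, not the code used in the analysis, and the translation step is not shown.

```python
import re


def clean_comment(text: str) -> str:
    """Normalize whitespace and drop special characters from one comment."""
    # Replace newlines, carriage returns and tabs with a single space.
    text = re.sub(r"[\r\n\t]+", " ", text)
    # Keep letters and digits (Turkish letters included), whitespace and
    # a small set of punctuation; everything else becomes a space.
    text = re.sub(r"[^\w\s.,!?'%-]", " ", text)
    # Collapse the runs of spaces left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


print(clean_comment("Çok iyi\r\nbir araba ★★ 4.5 lt/100km"))
```

Cleaning before translation keeps stray control characters out of the translated text, but the order is a choice and could be reversed.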
After cleaning, I removed specific keywords such as toyota and vehicle, and lemmatized the words.
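Roughly, this step could be sketched as below. The names `DOMAIN_KEYWORDS` and `normalize_tokens` are hypothetical, and the lemmatizer is passed in as a plain function so the sketch does not depend on a specific NLP library. Lemmatizing before filtering is an assumption made here so that plural forms of a keyword are dropped too.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical keyword list; the post only names these two as examples.
DOMAIN_KEYWORDS = {"toyota", "vehicle"}


def normalize_tokens(
    text: str,
    keywords: Iterable[str] = DOMAIN_KEYWORDS,
    lemmatize: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Lowercase, tokenize, lemmatize, then drop the unwanted keywords."""
    blocked = {k.lower() for k in keywords}
    # Split on whitespace and trim punctuation stuck to the token edges.
    tokens = [t.strip(".,!?'\"") for t in text.lower().split()]
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    # Remove empty tokens and the blocked keywords.
    return [t for t in tokens if t and t not in blocked]


# Toy lemmatizer for demonstration only: strips a trailing "s".
def toy_lemma(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


print(normalize_tokens("Toyota vehicles burn 4 liters.", lemmatize=toy_lemma))
```

A real lemmatizer, for example the `lemmatize` method of NLTK's `WordNetLemmatizer`, could be passed in place of `toy_lemma` once its WordNet data is downloaded.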
I computed comment_length and word_count for each comment, then used TextBlob to predict the sentiment of each comment.
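As an illustration, the features could be computed along these lines. This is a sketch under assumptions: `comment_length` is taken to be the number of characters and `word_count` the number of whitespace-separated words, and the helper name `comment_features` is made up. The sentiment part uses TextBlob's `sentiment.polarity`, which is a float between -1.0 and 1.0.

```python
from typing import Dict


def comment_features(text: str, with_sentiment: bool = False) -> Dict[str, float]:
    """Return length, word count and (optionally) sentiment for one comment."""
    features: Dict[str, float] = {
        "comment_length": len(text),      # assumed: number of characters
        "word_count": len(text.split()),  # assumed: whitespace-separated words
    }
    if with_sentiment:
        # Imported here so the length features work without TextBlob installed.
        from textblob import TextBlob

        # Polarity below 0 reads as negative, above 0 as positive.
        features["sentiment"] = TextBlob(text).sentiment.polarity
    return features


print(comment_features("fuel consumption is very low"))
```

Calling `comment_features(text, with_sentiment=True)` on the translated English text adds the polarity score, since TextBlob's default analyzer is built for English.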
Since eksisozluk.com is a place where people write about both good and bad experiences, the sentiment distribution looks balanced. Most of the comments are short.
On the other hand, people usually visit sikayetvar.com when they run into an issue, so complaint length and sentiment are more diverse.
On eksisozluk.com, people tend to write longer comments if they like the car.
On sikayetvar.com, by contrast, users seem to write longer when they are frustrated or negative.
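One simple way to check a length versus sentiment pattern like this is to average the word count per sentiment bucket. The sketch below assumes each comment has already been reduced to a (polarity, word_count) pair; the function name and the numbers in the example are toy values, not results from the crawled data.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_length_by_sentiment(rows: Iterable[Tuple[float, int]]) -> Dict[str, float]:
    """Average word count for negative, neutral and positive comments."""
    buckets: Dict[str, List[int]] = {"negative": [], "neutral": [], "positive": []}
    for polarity, word_count in rows:
        if polarity > 0:
            buckets["positive"].append(word_count)
        elif polarity < 0:
            buckets["negative"].append(word_count)
        else:
            buckets["neutral"].append(word_count)
    # Skip empty buckets so mean() never receives an empty list.
    return {name: mean(values) for name, values in buckets.items() if values}


# Toy data for demonstration only.
toy_rows = [(0.5, 40), (0.2, 60), (-0.3, 10), (0.0, 5)]
print(mean_length_by_sentiment(toy_rows))
```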
On eksisozluk.com, people mostly talk about the car's consumption and speed.
On sikayetvar.com, people seem to complain more about service and warranty issues.
I tried several methods to find the topics; in the end, LDA was the best model among them. Let's see if we can divide the comments and complaints into 3 main topics.
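For readers who want to try this, here is a minimal sketch of fitting a 3-topic LDA model. It assumes the gensim library, which is my guess based on the shape of the topic output and not something confirmed here; the function name, `passes=10` and `random_state=42` are illustrative choices.

```python
from typing import List, Tuple


def fit_lda_topics(
    tokenized_docs: List[List[str]], num_topics: int = 3, num_words: int = 10
) -> List[Tuple[int, str]]:
    """Fit LDA on tokenized comments and return (topic_id, formula) pairs."""
    # Imported here so the module still loads where gensim is not installed.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Map each unique token to an integer id.
    dictionary = Dictionary(tokenized_docs)
    # Represent every comment as a bag of (token_id, count) pairs.
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        passes=10,        # assumed number of training passes
        random_state=42,  # fixed seed so runs are repeatable
    )
    # Each topic comes back as a string of weighted words.
    return lda.print_topics(num_topics=num_topics, num_words=num_words)
```

The input is one list of tokens per comment, for example `[["fuel", "consumption", "city"], ["engine", "battery", "speed"]]`. With only a few hundred short comments the topics can shift between runs, which is why the seed is fixed.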
Here are the three topics, each written as a weighted sum of its top words:
Topic 0: 0.023*"km" + 0.019*"car" + 0.013*"engine" + 0.009*"liter" + 0.009*"fuel" + 0.009*"speed" + 0.008*"use" + 0.007*"drive" + 0.007*"consumption" + 0.007*"do"
Topic 1: 0.016*"car" + 0.011*"engine" + 0.010*"km" + 0.009*"like" + 0.007*"drive" + 0.007*"electric" + 0.007*"use" + 0.006*"road" + 0.006*"gasoline" + 0.006*"fuel"
Topic 2: 0.023*"car" + 0.013*"engine" + 0.009*"km" + 0.009*"city" + 0.009*"fuel" + 0.008*"use" + 0.008*"battery" + 0.008*"drive" + 0.008*"like" + 0.007*"consumption"
Here we can see that on eksisozluk.com the topics are:
- The car's consumption efficiency
- The car's speed
- More or less similar to 1.
Here are the three topics, again written as weighted sums of their top words:
Topic 0: 0.044*"car" + 0.033*"km" + 0.024*"engine" + 0.018*"fuel" + 0.017*"battery" + 0.015*"consumption" + 0.014*"like" + 0.014*"think" + 0.014*"use" + 0.013*"gasoline"
Topic 1: 0.038*"km" + 0.037*"car" + 0.021*"drive" + 0.021*"city" + 0.020*"liter" + 0.020*"use" + 0.017*"fuel" + 0.016*"do" + 0.016*"engine" + 0.015*"burn"
Topic 2: 0.036*"engine" + 0.033*"car" + 0.019*"km" + 0.016*"speed" + 0.016*"like" + 0.016*"battery" + 0.015*"electric" + 0.013*"buy" + 0.013*"fuel" + 0.012*"drive"
Each number is the weight of that word in building the corresponding topic.
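If you want to work with these weights as data instead of reading the strings, a small parser is enough. This is an illustrative sketch, and `parse_topic` is a made-up helper name; the pattern also accepts terms where the `*` between weight and word is missing.

```python
import re
from typing import List, Tuple

# One term is a decimal weight, an optional "*", then the word in double quotes.
TERM = re.compile(r'(\d*\.\d+)\*?"([^"]+)"')


def parse_topic(formula: str) -> List[Tuple[str, float]]:
    """Turn a topic formula string into a list of (word, weight) pairs."""
    return [(word, float(weight)) for weight, word in TERM.findall(formula)]


pairs = parse_topic('0.036*"engine" + 0.033*"car" + 0.019*"km"')
# Sort by weight to see which words matter most for the topic.
print(sorted(pairs, key=lambda p: p[1], reverse=True))
```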
Here we can see that on sikayetvar.com the topics are:
- The first topic is generally about the engine.
- The second one is about consumption.
- The third one is similar to the second but also covers the battery.
It seems I found answers to the first two of my questions from the beginning: people liked the car, and there are complaints about service and warranty. This approach can be turned into a generic application, so we can search for keywords and get a summary about products. However, I could not find any information about chronic problems at first glance. Maybe it can be derived with deeper analysis.
I've created an interactive dashboard using Streamlit and LDA for the three top-selling cars of 2022 in Turkiye, so you can explore the results in the application. You can find more information about the analysis in the GitHub repository.