Term Frequency Analysis – Part One
As Revinate’s Data Scientist, I’m looking forward to discussing how data science can assist hoteliers, both big and small, at the upcoming Hotel Data Conference in Nashville, of which Revinate is a Gold Sponsor.
To provide insight into our customers’ data, we need the right technologies at our core: namely, a data infrastructure that can handle the data needs of all our customers and partners, today and in the future.
One of the new approaches our technology lets us experiment with is Term Frequency Analysis, a simple yet powerful way to gain insight into trends across multiple properties, or even multiple markets. Term Frequency Analysis takes an online review and counts how often each word appears in it. We also record the metadata associated with the review, such as review score, sub-ratings and sentiment, so that we can see how frequently a word appears within any of those segments. We then repeat this process for thousands or millions of reviews, which shows us how popular a word is in online reviews and whether it appears more often in positive or negative ones.
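As a rough illustration of the counting step described above, here is a minimal Python sketch. It assumes each review is a dictionary with a `text` field and a `score` field (hypothetical field names, not Revinate’s actual schema) and uses a naive regex tokenizer; a production pipeline would likely handle punctuation, stop words and other languages differently.

```python
import re
from collections import Counter, defaultdict

# Hypothetical review records: the text plus one metadata field (score).
# Real reviews would also carry sub-ratings, sentiment, etc.
reviews = [
    {"text": "Great location, friendly staff. Great breakfast!", "score": 5},
    {"text": "Dirty room and rude staff.", "score": 1},
    {"text": "Friendly staff but the room was small.", "score": 3},
]

# Naive tokenizer: runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())

def term_frequencies(reviews, segment_key):
    """Count word occurrences across all reviews, segmented by one metadata field."""
    counts = defaultdict(Counter)
    for review in reviews:
        counts[review[segment_key]].update(tokenize(review["text"]))
    return counts

by_score = term_frequencies(reviews, "score")
print(by_score[5]["great"])  # 2
print(by_score[1]["staff"])  # 1
```

Segmenting by a different field (for example a hypothetical `sentiment` key) only requires passing a different `segment_key`; the counting logic stays the same.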
The interactive example below looks at ~2,500 of the most popular words that appear across approximately five million English-language reviews from all over the world. We split the results by review score, which appears across the top of the graph. (Note: scores have been normalized and rounded to whole numbers from 1 to 5.) The vertical axis on the left shows how popular a word is for that review score, from 1 (the most popular) to 4,000 (very unpopular). The four text boxes at the top let you choose which words to search for; the auto-complete list shows which words are available. Once you have selected the words you want to graph, click the “Graph” button on the right-hand side. You can then see whether specific words are mentioned more or less frequently as the review score changes.
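The normalization and the popularity ranking behind the graph can be approximated along these lines. This is a sketch under two assumptions the post does not state: raw scores are mapped onto 1 to 5 with a simple linear rescale followed by half-up rounding, and words with equal counts are ranked alphabetically. The helper names `normalize_score` and `popularity_ranks` are hypothetical.

```python
import math
from collections import Counter

def normalize_score(raw, scale_min, scale_max):
    """Linearly map a raw score onto 1-5, then round half-up to a whole number.
    Assumption: the actual normalization used for the graph may differ."""
    scaled = 1 + 4 * (raw - scale_min) / (scale_max - scale_min)
    return math.floor(scaled + 0.5)

def popularity_ranks(counts):
    """Rank words by frequency within one score segment: 1 = most frequent.
    Ties are broken alphabetically so the ordering is deterministic."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered, start=1)}

# Example: an 8 on a 1-10 scale becomes a 4 on the 1-5 scale.
print(normalize_score(8, 1, 10))  # 4

# Example: ranks within a single (hypothetical) 5-star segment.
five_star = Counter({"staff": 3, "room": 5, "great": 3})
ranks = popularity_ranks(five_star)
print(ranks["room"], ranks["great"], ranks["staff"])  # 1 2 3
```

Computing `popularity_ranks` separately for each normalized score and then looking up the same word in each result gives one line on the graph: a word whose rank number falls as the score rises is mentioned relatively more often in positive reviews.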
On Wednesday, in part two, I will share some interesting graphs I created, along with a brief commentary on why I found them interesting.