BitterBetter uses textual descriptions of beer to predict bitterness.
The intended use case is to help brewers calibrate their brand stories.
Consider the marketing copy for the following two beers.
“A moderately dark rye ale brewed with traditional Bavarian Hefeweizen yeast to give a roasty and fruity finish.”

“Being a grown-up means responsibilities galore but it also means getting grown-up treats. And we don’t mean fruit snacks and goldfish. We’re talking big, bold, 8.5% treats. Our Daddy’s Juice Box Double IPA is a big, juicy, hop declaration of a treat. With a full nose of tropical aromas and a super smooth finish, this hop beast is sure to reward. So get your own juice box, kids. This one’s for daddy.”
The description of the dark rye, with words like “Hefeweizen” and “roasty,” doesn't sound bitter at all, while Daddy's Juice Box, with “big, bold” and “hop beast,” sounds like it will be quite bitter, especially for a juicy IPA. In fact, the opposite is true: the dark rye's IBU rating is 70, which is quite bitter, while the IPA's is only 25, indicating a low level of bitterness.
What's the result? Consumers who try these beers based on their descriptions might be disappointed, while consumers who would have been more likely to enjoy them might not bother trying them. This points to the importance of having a description that captures the right information about the bitterness level of the beer.
Beer bitterness is estimated on the International Bitterness Units (IBU) scale, which measures the concentration of certain bittering acids that arise during the brewing process. An IBU rating does not correspond precisely to perceived bitterness, but the two are highly correlated.
The data for this project was obtained from ratebeer.com using a custom-built distributed scraper. The database includes over 600,000 beer entries, of which about 85,000 were usable.
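The scraper itself isn't shown here; as a rough illustration of the extraction step, the sketch below pulls a beer name and IBU out of a hypothetical page fragment using only the standard library. The class names and markup are invented for the example and do not reflect ratebeer.com's actual HTML.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; the real site's structure differs.
SAMPLE_PAGE = """
<div class="beer">
  <span class="name">Daddy's Juice Box Double IPA</span>
  <span class="ibu">25</span>
</div>
"""

class BeerParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="ibu"> tags."""
    def __init__(self):
        super().__init__()
        self._field = None   # which field the next text node belongs to
        self.record = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "span" and cls in ("name", "ibu"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.record[self._field] = data.strip()
            self._field = None

parser = BeerParser()
parser.feed(SAMPLE_PAGE)
print(parser.record)  # → {'name': "Daddy's Juice Box Double IPA", 'ibu': '25'}
```

A production scraper would layer request scheduling, retries, and distribution across workers on top of an extraction step like this.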
BitterBetter uses natural language processing methods to predict the IBU rating of a beer from its description. Descriptions are tokenized, and both individual words and bigrams are one-hot encoded. A combination of linear regression with ridge and lasso regularization and random forest ensembles is used to generate the predictions.
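The feature-extraction code isn't shown in this writeup; a minimal pure-Python sketch of the tokenize-then-one-hot step is below. In practice a library vectorizer (e.g. scikit-learn's CountVectorizer with `ngram_range=(1, 2)` and `binary=True`) would do the same job, and the resulting matrix would feed the ridge/lasso and random forest models.

```python
def tokenize(text):
    """Lowercase and split on whitespace, stripping simple punctuation."""
    return [w.strip(".,!?'\"").lower() for w in text.split()]

def bigrams(tokens):
    """Adjacent word pairs, joined with a space."""
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

def one_hot(descriptions):
    """Binary encoding of unigram + bigram presence across a corpus."""
    docs = []
    for text in descriptions:
        toks = tokenize(text)
        docs.append(set(toks) | set(bigrams(toks)))
    vocab = sorted(set().union(*docs))
    # One row per description, one 0/1 column per vocabulary feature.
    return vocab, [[int(feat in doc) for feat in vocab] for doc in docs]

vocab, X = one_hot(["big bold hop beast", "roasty and fruity finish"])
```

Each row of X is a sparse binary feature vector; the regression models then learn per-feature IBU contributions, which is what makes the "strongest contributions" list below possible to read off directly.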
Some of the individual words and bigrams with the strongest contributions to the model are as follows:
The median absolute error on a validation set is around 6.75 IBU. This is good resolution in practice, since IBU differences at that scale are overshadowed by other flavor components.
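For reference, median absolute error is simply the median of the absolute prediction errors, which makes it robust to a few badly mispredicted beers. A quick pure-Python version (the toy numbers are illustrative, not the project's data):

```python
import statistics

def median_absolute_error(y_true, y_pred):
    """Median of |y_true - y_pred|; insensitive to outlier errors."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))

# Toy IBU values and predictions for illustration.
y_true = [70, 25, 40, 55]
y_pred = [63, 30, 46, 50]
print(median_absolute_error(y_true, y_pred))  # → 5.5
```

scikit-learn's `median_absolute_error` computes the same quantity.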
This project was implemented in Python. Specific technologies and packages used included: