Beer Can Detection | Sarvesh Patil

The problem statement for this project came from an ad agency operating primarily in Southeast Asian countries such as Thailand, Cambodia, Vietnam, and Laos. I trained a YOLOv5 model with a self-supervised learning loop to detect all can-shaped objects from very few ground-truth annotations (<20). I then scraped images of beer cans from the web and trained a ResNet-50 classifier, which reached only 65% accuracy across 19 beer brands (classes). To improve this score, I used StyleGAN2 (the latest version in 2020) to generate synthetic images conditioned on the web-scraped data, as a form of data augmentation. This raised the classification accuracy from 65% to 85%.
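One common way to build a detection loop from so few annotations is pseudo-labelling: train on the seed labels, predict on unlabelled images, keep only the confident detections as new labels, and retrain. The sketch below illustrates that idea in plain Python. It is an assumption about how such a loop could look, not the code used in this project: `train` and `predict` are hypothetical stand-ins for YOLOv5 training and inference, and the threshold and number of rounds are illustrative values.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]             # (box, confidence score)
Labels = Dict[str, List[Box]]             # image id -> list of boxes


def select_pseudo_labels(predictions: Dict[str, List[Detection]],
                         threshold: float) -> Labels:
    """Keep only detections whose confidence is at or above the threshold."""
    selected: Labels = {}
    for image_id, detections in predictions.items():
        boxes = [box for box, conf in detections if conf >= threshold]
        if boxes:  # skip images with no confident detection
            selected[image_id] = boxes
    return selected


def self_training_loop(labeled: Labels,
                       unlabeled_ids: List[str],
                       train: Callable,    # hypothetical: labels -> model
                       predict: Callable,  # hypothetical: (model, ids) -> predictions
                       rounds: int = 3,
                       threshold: float = 0.8):
    """Grow the training set with confident predictions, retraining each round."""
    training_set = dict(labeled)
    remaining = list(unlabeled_ids)
    model = train(training_set)
    for _ in range(rounds):
        if not remaining:
            break
        predictions = predict(model, remaining)
        new_labels = select_pseudo_labels(predictions, threshold)
        if not new_labels:
            break  # nothing confident enough to add; stop early
        training_set.update(new_labels)
        remaining = [i for i in remaining if i not in new_labels]
        model = train(training_set)
    return model, training_set
```

A high threshold keeps label noise low at the cost of adding fewer images per round, which matters when the seed set is as small as it was here.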

Challenge:

The ad agency's contract workers were ordinary members of the public in their respective countries. Hence, most of the images the agency received were taken with low-quality smartphone cameras, with no consideration for reflections or external illumination. This led to 2 main challenges:

Detecting objects behind bright ambient illumination was hard because the glare occluded parts of them.
Classifying the detected image patches was hard because reflections from the environment added noise to them.

Motivation:

Different brands pay varying amounts of money to have their products displayed on grocery store shelves. Ad agencies track how these products are stocked across vendors to ensure that the payment is justified.