Avoiding overfitting in an object detection problem

Posted on Tue 20 December 2016 • Tagged with Deep Learning, Code Snippets

Recently I took part in an AI Hackathon (2nd prize, btw!) in Minsk with a pretty interesting challenge. My team tried to build a model that detects weeds in aerial photos of fields. The photos were taken with a special multispectral camera and looked like this:

Weeds!

A sample image with two types of weeds marked up.

We marked up the photos manually and fitted the model (it was based on this repository). The results looked promising until I tried feeding the model a photo of myself. I was kind of surprised to see a lot of "weeds" on my face.

We expected that some smart-ass on the jury might try the same thing, so this was an obvious problem to solve. The solution was straightforward: we needed to fit the model not only with photos of fields, but also with all kinds of other crap, the more variety the better. At the same time we were limited by the schedule, so we could not feed the network gigabytes of data to get that variety.

So the solution was to create collages.

How to create collages

First of all, you need to find a wide variety of pictures. It's pretty easy. Open Dataset is our friend here. Find the link to the "Image URLs and metadata" file and download it.

You need nothing special, just fairly standard libraries: numpy, PIL, requests and pandas. The last two are not strictly required, but I guess you have them installed anyway.
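
If you would rather skip pandas and requests, the same URL handling can be sketched with the standard library alone. This is only an illustration: the `OriginalURL` column name comes from the script in this post, while `read_urls`, `fetch_bytes`, the `ImageID` column, the example.com URLs and the 10 second timeout are assumptions made up for the example.

```python
import csv
import io
import random
from urllib.request import urlopen


def read_urls(csv_file, column='OriginalURL'):
    """Collect the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader if row.get(column)]


def fetch_bytes(url, timeout=10):
    """Download raw image bytes (the timeout value is an arbitrary choice)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# A tiny in-memory stand-in for the downloaded metadata file (hypothetical rows).
sample_csv = io.StringIO(
    'ImageID,OriginalURL\n'
    'a1,https://example.com/a.jpg\n'
    'b2,https://example.com/b.jpg\n'
)
urls = read_urls(sample_csv)
picked = random.choice(urls)  # plays the role of images.sample(1)
```

With the real file you would pass `open('img.csv', newline='')` to `read_urls` and hand the result of `fetch_bytes(picked)` to `BytesIO`, just as the script does with `r.content`.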

Let's make pictures from a 3*4 grid of crops (3 rows of 4). These pics were good enough for the hackathon solution (quick and dirty); more mature solutions may need more randomness.

from io import BytesIO
from uuid import uuid4

import numpy as np
import pandas as pd
import requests
from PIL import Image

images = pd.read_csv('img.csv')['OriginalURL']


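def crop_image(img):
    # Return a cropped image, or None if downloading or parsing fails.
    try:
        return _crop_image(img)
    except Exception:  # a bare except would also swallow Ctrl-C
        print('failed while parsing img')
        return None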


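def _crop_image(img):
    # Download the image and cut a 512x512 square from its center.
    print('fetching image')
    r = requests.get(img, timeout=10)  # timeout value is arbitrary
    r.raise_for_status()
    # Force 3 channels so grayscale or RGBA crops can be stacked with the rest.
    im = Image.open(BytesIO(r.content)).convert('RGB')
    half_the_width = im.size[0] // 2
    half_the_height = im.size[1] // 2

    img = im.crop(
        (
            half_the_width - 256,
            half_the_height - 256,
            half_the_width + 256,
            half_the_height + 256
        )
    )
    return img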


def _unite_pics(pics):
    # Stack 12 crops into 3 rows of 4 and save the collage as a JPEG.
    first = pics[:4]
    second = pics[4:8]
    third = pics[8:]
    # np.hstack needs a sequence; a generator is rejected by recent NumPy.
    first = np.hstack([np.asarray(i) for i in first])
    second = np.hstack([np.asarray(i) for i in second])
    third = np.hstack([np.asarray(i) for i in third])
    imgs_comb = np.vstack((first, second, third))
    img = Image.fromarray(imgs_comb)

    img.save('pic_{}.jpg'.format(uuid4()))
    print('image saved')


def unite_pics(pics):
    # Return 1 if the collage was saved, 0 if stacking or saving failed.
    try:
        _unite_pics(pics)
        return 1
    except Exception:
        return 0


def sample_img():
    return images.sample(1).tolist()[0]


def main():
    # Keep going until 50 collages have been saved.
    count = 0
    while count < 50:
        current = []
        while len(current) != 12:
            img = crop_image(sample_img())
            if img is not None:
                current.append(img)
        if unite_pics(current):
            count += 1


if __name__ == '__main__':
    main()
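
The script always cuts the center of each picture. One way to get more variety out of the same downloads is to pick the crop position at random. The following is a minimal sketch, not what we used at the hackathon; `random_crop_box` is a hypothetical helper, and skipping pictures smaller than the crop is my own assumption.

```python
import random


def random_crop_box(width, height, size=512, rng=random):
    """Return a (left, upper, right, lower) box for a random size x size crop.

    Returns None when the picture is smaller than the crop, so the caller
    can skip it the same way the script skips failed downloads.
    """
    if width < size or height < size:
        return None
    left = rng.randint(0, width - size)    # randint is inclusive on both ends
    upper = rng.randint(0, height - size)
    return (left, upper, left + size, upper + size)


rng = random.Random(0)  # seeded only to make the example repeatable
box = random_crop_box(1024, 768, rng=rng)
too_small = random_crop_box(300, 800)  # None: the picture is narrower than 512
```

Inside `_crop_image` the box would replace the hand-written center coordinates: compute `box = random_crop_box(*im.size)` and, if it is not `None`, return `im.crop(box)`.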

The script generates 2048*1536 pictures (4 crops wide and 3 crops high, 512 pixels each), which is what we needed. For most problems that's too big, so you may need to tune it a bit.
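
To see how the grid and the crop size translate into the output size, here is a small sketch of the layout arithmetic. `collage_layout` is a hypothetical helper, not part of the script; its defaults simply mirror the 3 rows, 4 columns and 512-pixel crops used above.

```python
def collage_layout(rows=3, cols=4, crop=512):
    """Return the collage (width, height) and the top-left offset of each tile.

    Offsets are listed row by row, the same order in which the script
    stacks its crops.
    """
    size = (cols * crop, rows * crop)
    offsets = [(col * crop, row * crop)
               for row in range(rows)
               for col in range(cols)]
    return size, offsets


size, offsets = collage_layout()  # size == (2048, 1536), 12 offsets
small_size, small_offsets = collage_layout(rows=2, cols=2, crop=256)  # (512, 512)
```

With PIL the offsets can be used directly: create a canvas with `Image.new('RGB', size)` and call `canvas.paste(tile, offset)` for each crop, which avoids the hard-coded slices in `_unite_pics`.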

The resulting images can look impressive.

This guy is watching you!


A puppy wearing a hood, a chart, fire and a naked man. Nice combo!
