
How to prepare different image sizes for classification?

Posted: Fri Mar 10, 2017 8:19 pm
by rmk60
I am using Activation Network with Backpropagation Learning.
I want to classify images of objects in the sky such as airplanes, birds and clouds (all grayscale and of different sizes, taken from motion tracking).
In the end I only need 2 classes, e.g. it is a bird or it isn't.
The network input has a fixed size of 32x32 pixels = 1024 inputs.
Each object image is scaled by its width or its height, whichever is larger, so that it fits into the final 32x32 image.
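
The resize step described above can be sketched as follows. This is only an illustration of the arithmetic, not the code used in the original post: the function names `fit_size` and `padding` are hypothetical, and centering the image in the square is an assumption based on the bars appearing on both sides.

```python
def fit_size(width, height, target=32):
    """Scale (width, height) so the larger side equals `target`,
    keeping the aspect ratio. Returns (new_width, new_height)."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return new_w, new_h


def padding(new_w, new_h, target=32):
    """Pixels still missing on each side when the scaled image is
    centered in a target x target square (centering is an assumption).
    Returns (left, top, right, bottom)."""
    pad_w = target - new_w
    pad_h = target - new_h
    left = pad_w // 2
    top = pad_h // 2
    return left, top, pad_w - left, pad_h - top


# A 64x48 object is scaled to 32x24 and needs 4 rows above and below.
print(fit_size(64, 48))   # (32, 24)
print(padding(32, 24))    # (0, 4, 0, 4)
```

The four padding values are exactly the "missing pixels" that the two approaches below fill in different ways.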

- As a first approach I set all missing pixels to "0" (black), which results in black bars at the top and bottom or at the left and right.
Example: Image with "0"s on left and right side

I tried several network designs and settled on 1024 inputs, 49 neurons in layer 1, 24 neurons in layer 2 and 2 output neurons.
Unfortunately I have only 230 training samples and around 200 test samples. The result is 88% accuracy.
However, I'm not sure this approach will really work, because the black bars seem to have a strong influence on the weights.
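
To put the size of this network next to the amount of training data, the trainable parameters can be counted. This is a rough sketch that assumes fully connected layers with one bias (threshold) per neuron. That is an assumption about the network, not something stated in the post.

```python
def count_parameters(layer_sizes):
    """Weights plus one bias per neuron for a fully connected network.
    layer_sizes lists the input size followed by each layer's size."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


layers = [1024, 49, 24, 2]      # sizes taken from the post
params = count_parameters(layers)
samples = 230                   # training samples mentioned in the post

print(params)                   # 51475
print(params / samples)         # parameters per training sample
```

Under that assumption the network has far more parameters than training samples, which is one reason to be careful about how much the 88% figure says.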

- The second approach was to fill the missing pixels with the mean gray value of the original image.
However, with this fill the learning algorithm does not converge to a good minimum.
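
Both filling strategies can be written as one small function, which makes it easy to switch between them when comparing results. This is a minimal sketch on plain lists of gray values. The name `pad_to_square` and the `fill` parameter are hypothetical, and the image is assumed to be already scaled so that neither side exceeds the target.

```python
def pad_to_square(pixels, target=32, fill="zero"):
    """Center an image (list of rows of gray values 0..255) in a
    target x target canvas. fill="zero" uses black for the missing
    pixels, fill="mean" uses the mean gray value of the image."""
    h = len(pixels)
    w = len(pixels[0])
    if fill == "mean":
        value = round(sum(sum(row) for row in pixels) / (h * w))
    else:
        value = 0
    top = (target - h) // 2
    left = (target - w) // 2
    canvas = [[value] * target for _ in range(target)]
    for y, row in enumerate(pixels):
        # Overwrite w canvas pixels with the w image pixels of this row.
        canvas[top + y][left:left + w] = row
    return canvas


tiny = [[10, 20], [30, 40]]
black = pad_to_square(tiny, target=4, fill="zero")
gray = pad_to_square(tiny, target=4, fill="mean")
print(black[0])  # [0, 0, 0, 0]
print(gray[0])   # [25, 25, 25, 25]
```

Flattening the canvas row by row then gives the 1024 network inputs for a 32x32 target.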

Can anyone give some advice on how to preprocess the images?
My idea was to use Canny edge filtering, which would give a black background for all images, or histogram equalization?

Note: The approach should work both at night (dark background) and during the day (light background).
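
For reference, the histogram equalization idea mentioned above can be sketched for an 8-bit grayscale image as below. It uses the standard cumulative-histogram mapping, written on plain lists so it is self-contained. The function name `equalize_histogram` is hypothetical, and whether this actually helps the classifier is not something this thread establishes.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize an image given as rows of integer gray
    values in 0..levels-1. Returns a new image of the same shape."""
    flat = [p for row in pixels for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of gray values.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in pixels]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[p] for p in row] for row in pixels]


night = [[10, 10], [20, 30]]        # dark scene
day = [[200, 200], [210, 220]]      # same structure, bright scene
print(equalize_histogram(night) == equalize_histogram(day))  # True
```

In this toy case a dark and a bright version of the same structure map to the same output, which is the property the day/night note asks for.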

Re: How to prepare different image sizes for classification?

Posted: Wed Feb 27, 2019 6:03 am
by vedika31
Large batch size
In theory, a larger mini-batch gives a less noisy gradient estimate, which should help the network converge to a better minimum and therefore reach better final accuracy. People usually get stuck here because of GPU memory: at the time of writing, the biggest consumer GPUs only go up to 12GB (Titan X), and cloud GPUs up to 16GB (V100). There are two ways around that limit:
(1) Distributed training: Split up your training over multiple GPUs. On each training step, your batch is split across the available GPUs. For example, if you have a batch size of 8 and 8 GPUs, each GPU processes one image. You then combine all the gradients and outputs at the end. You take a small hit from the data transfer between GPUs, but still gain a big speed boost from the parallel processing. This functionality is supported out of the box in many deep learning libraries, including Keras.
(2) Changing the batch and image size during training: Part of the reason why many research papers can report such large batch sizes is that many standard research datasets have images that aren't very big. When training networks on ImageNet, for example, most state-of-the-art networks used crops between 200 and 350 pixels on a side, so of course they can have large batches with such small images. In practice, due to current camera technology, most of the time we are working with images that are 1080p or not far off from it.
To get around this small bump in the road, you can start your training with smaller images and a larger batch size. Do this by downsampling your training images, so that many more of them fit into one batch. With the large batch size and small images you should already get some decent results. To complete the training, fine-tune the network with a smaller learning rate, larger images and a smaller batch size. This gets the network to re-adapt to the higher resolution, while the lower learning rate keeps it from jumping away from the good minimum found with the large batch. As a result, the network reaches a good minimum from the large-batch training and also works well on your high-resolution images after the fine-tuning.
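
The two-phase schedule described above could be written down like this. It is only a sketch: the `Phase` class, the function names and all default numbers are made up for illustration, and the rule that the batch size shrinks with the pixel count is a rough assumption about memory use, not a measured value.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    image_size: int       # side length of the square training images
    batch_size: int
    learning_rate: float


def batch_size_for(image_size, ref_size, ref_batch):
    """Rough estimate: memory per image grows with the pixel count,
    so the batch shrinks by (ref_size / image_size) squared."""
    return max(1, int(ref_batch * (ref_size / image_size) ** 2))


def make_schedule(small=128, large=512, small_batch=64,
                  base_lr=0.01, lr_factor=0.1):
    """Phase 1: small images, large batch. Phase 2: large images,
    smaller batch, lower learning rate for fine-tuning."""
    return [
        Phase(small, small_batch, base_lr),
        Phase(large, batch_size_for(large, small, small_batch),
              base_lr * lr_factor),
    ]


for phase in make_schedule():
    print(phase)
```

With these placeholder numbers, going from 128 to 512 pixels per side cuts the batch from 64 to 4 images, which shows why the second phase has to run with a smaller batch.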