CS-E4580 exercise: Neural Networks


Overview

In this task, you will use a Convolutional Neural Network to classify images.

Examples

[Input image: a castle]

 86.28% [483] castle
  3.24% [449] boathouse
  2.74% [497] church, church building
  2.54% [975] lakeside, lakeshore
  1.78% [698] palace

[Input image: some cups]

 21.03% [968] cup
 16.04% [504] coffee mug
  7.45% [725] pitcher, ewer
  7.03% [470] candle, taper, wax light
  6.68% [868] tray

Detailed specification

Background

You will need some basic understanding of the concept of neural networks and, more specifically, convolutional neural networks.

VGG-19 network

In this task, you will not need to design a neural network or train it. We will use an existing pre-trained network that is freely available online.

The network that we will use is called VGG-19, developed by Karen Simonyan and Andrew Zisserman, and described in the following technical report:

Karen Simonyan and Andrew Zisserman (2014): Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556.

The network is called “configuration E” in the above paper. The network is available for download under the CC BY 4.0 license.

The input of the network is an RGB image of dimensions 224 × 224, and the outputs of the network are classification labels; there are 1000 possible labels that the network can output.

The network consists of 19 layers, but each of them is very simple: there are convolution layers that simply calculate convolutions with a sliding window of size 3 × 3 and apply ReLU; there are maxpool layers that reduce the dimensionality by replacing each 2 × 2 block with its maximum value; and finally there are fully connected layers that are familiar from ordinary neural networks.
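As a concrete illustration of the first two layer types, here is a naive C++ sketch of one 3 × 3 "same" convolution layer with ReLU and one 2 × 2 maxpool layer. The flat [row][column][channel] array layout and the weight layout are assumptions made for illustration; the baseline implementation may store its data differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One 3x3 "same" convolution layer followed by ReLU. The image is a flat
// vector indexed [y][x][channel]; weights are indexed
// [dy][dx][in_channel][out_channel]; pixels outside the image count as zero.
std::vector<float> conv3x3_relu(const std::vector<float>& in,
                                int h, int w, int cin, int cout,
                                const std::vector<float>& weights,
                                const std::vector<float>& bias) {
    std::vector<float> out(static_cast<size_t>(h) * w * cout);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int oc = 0; oc < cout; ++oc) {
                float v = bias[oc];
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        int yy = y + dy, xx = x + dx;
                        if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                        for (int ic = 0; ic < cin; ++ic)
                            v += in[(yy * w + xx) * cin + ic]
                               * weights[(((dy + 1) * 3 + (dx + 1)) * cin + ic) * cout + oc];
                    }
                out[(y * w + x) * cout + oc] = std::max(v, 0.0f);  // ReLU
            }
    return out;
}

// 2x2 maxpool: halves the image dimensions, keeps the channel count.
std::vector<float> maxpool2x2(const std::vector<float>& in,
                              int h, int w, int c) {
    std::vector<float> out(static_cast<size_t>(h / 2) * (w / 2) * c);
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x)
            for (int k = 0; k < c; ++k) {
                float m = in[((2 * y) * w + 2 * x) * c + k];
                m = std::max(m, in[((2 * y) * w + 2 * x + 1) * c + k]);
                m = std::max(m, in[((2 * y + 1) * w + 2 * x) * c + k]);
                m = std::max(m, in[((2 * y + 1) * w + 2 * x + 1) * c + k]);
                out[(y * (w / 2) + x) * c + k] = m;
            }
    return out;
}
```

Almost all of the running time is spent in the six nested loops of the convolution, so that is the part worth optimising.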

Computation starts with a 3-dimensional array of dimensions 224 × 224 × 3. This is our input image; 224 × 224 pixels and 3 channels (RGB). Then we apply 64 different convolutions with a window of size 3 × 3 to obtain an array of dimensions 224 × 224 × 64. Another convolution layer maps this to a new array of dimensions 224 × 224 × 64, and a maxpool layer then reduces the dimensionality to 112 × 112 × 64. Note that we have increased the number of channels and decreased the image dimensions. Similar steps of interleaved convolution and maxpool layers are repeated until we are left with an array of dimensions 7 × 7 × 512, and this is finally interpreted as a flat array with 25088 elements. Now regular fully connected layers are applied to map this to an array of 4096 elements, then another array of 4096 elements, and finally to an array of 1000 classification labels.
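The fully connected stage at the end is a plain matrix–vector product followed by ReLU. A minimal sketch, assuming a row-major weight layout (verify the actual layout used by the baseline's weight file before reusing this):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One fully connected layer: out = relu(W * in + b). W is stored row-major,
// as n_out rows of n_in weights each (layout is an assumption; check how
// the baseline stores its weights).
std::vector<float> dense_relu(const std::vector<float>& in,
                              const std::vector<float>& weights,
                              const std::vector<float>& bias,
                              int n_in, int n_out) {
    std::vector<float> out(n_out);
    for (int i = 0; i < n_out; ++i) {
        float v = bias[i];
        for (int j = 0; j < n_in; ++j)
            v += weights[static_cast<size_t>(i) * n_in + j] * in[j];
        out[i] = std::max(v, 0.0f);
    }
    return out;
}
```

Applied with sizes 25088 → 4096 → 4096 → 1000, this reproduces the final stage described above (note that the very last layer is typically followed by a softmax rather than a ReLU).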

Baseline implementation

We have prepared a simple baseline implementation that solves the task in a straightforward manner; you can use this baseline implementation as a starting point in your work.

To get started, download the package nn.zip (511MB) and unzip it. In the package, you will find both the source code of the classifier and the weights of the VGG-19 network in a format that is easy to use in a C++ program. The classifier should compile in our classroom environment by running “make”. Here is a quick recipe that you can follow:

wget https://users.aalto.fi/~suomelj1/ppc-2017/nn.zip
unzip nn.zip
rm nn.zip
cd nn
make

To see that it works, try it out with two examples:

./cvgg examples/nn1.bin
./cvgg examples/nn2.bin

The first example is a picture with a castle, and the second example is a picture with some cups, and the classifier should identify these more or less correctly.

To run the classifier with your own image, you will need to convert it to a suitable format: a flat file with 224 × 224 × 3 floating point numbers. To convert your own image (input.png) to a suitable format (input.bin), you can use the following command:

./imgconv.py input.png input.bin

The conversion tool should work fine with both JPEG and PNG files. It is recommended to use a square image; otherwise the conversion tool will crop it. Once you have your image in the right format, you can run the classifier:

./cvgg input.bin

The output will list the top 5 classes among the 1000 classes that the classifier recognises. The running time will be ca. 20 s in our classroom environment.
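The plumbing around the network is simple. A hypothetical sketch of reading a flat float32 file such as input.bin (assuming raw float32 values in native byte order) and picking the top-5 classes from the 1000 output scores; the function names here are illustrative, not part of the baseline:

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Read a flat binary file of float32 values into a vector
// (assumes native byte order; returns an empty vector on failure).
std::vector<float> read_floats(const char* path, size_t count) {
    std::vector<float> data(count);
    FILE* f = std::fopen(path, "rb");
    if (!f || std::fread(data.data(), sizeof(float), count, f) != count) {
        if (f) std::fclose(f);
        return {};
    }
    std::fclose(f);
    return data;
}

// Indices of the k largest scores, best first -- what the classifier
// does when it prints the top-5 classes.
std::vector<int> top_k(const std::vector<float>& scores, int k) {
    std::vector<int> idx(scores.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return scores[a] > scores[b]; });
    idx.resize(k);
    return idx;
}
```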

Rules

You can freely build on top of our baseline implementation: you can use any parts of it and modify it in any way you want. You can also write your own implementation from scratch if you prefer. You are free to modify the Makefiles and add whatever compiler flags make sense.

Make sure that your implementation produces the same results as our baseline implementation. The goal here is not to improve the quality of the network, but the running time of the classifier.

Warning: large files

The network is very large; the first dense layer alone is a matrix with dimensions 25088 × 4096, and the file that contains all weights is ca. 548 MB. Please do not, for example, push this file to a Git repository. Be careful with disk space and disk quotas.

Tasks

Check the course home page to see which tasks you are supposed to solve each week.

Task NN1 — CPU implementation (challenging)

Subdirectory: nn1 (no template provided).

Write an efficient CPU implementation that classifies images with the VGG-19 network. You are expected to use multithreading and vector operations in your implementation. Make sure it produces the same results as our baseline implementation. Measure the performance in our classroom environment and compare it with the baseline. At least factor-10 improvements are expected for full points.
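One possible multithreading pattern, sketched here with std::thread (the baseline may organise its work differently): split the rows of each layer's output across hardware threads, since rows can be computed independently of one another.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Parallelism sketch: split the rows of one layer's output among hardware
// threads. compute_row is a stand-in for the per-row work of a layer
// (e.g. one row of a convolution); the same pattern applies to any layer.
template <typename F>
void parallel_rows(int h, F compute_row) {
    int nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back([=] {
            // Interleaved assignment: thread t handles rows t, t+nthreads, ...
            for (int y = t; y < h; y += nthreads)
                compute_row(y);
        });
    }
    for (auto& th : pool) th.join();
}
```

Within each row, the innermost channel loop is a natural target for vector operations.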

Your submission has to contain a written report, as a PDF file with name report.pdf, and the source code of your implementation. Do not push the file vgg19-weights.bin to GitHub.

Task NN2 — GPU implementation (challenging)

Subdirectory: nn2 (no template provided).

Write an efficient GPU implementation that classifies images with the VGG-19 network. Make sure it produces the same results as our baseline implementation. Measure the performance in our classroom environment and compare it with the baseline. At least factor-10 improvements are expected for full points.

Your submission has to contain a written report, as a PDF file with name report.pdf, and the source code of your implementation. Do not push the file vgg19-weights.bin to GitHub.

Hints

References

Karen Simonyan and Andrew Zisserman (2014): Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556.