ICS-E4020 exercise: Sorting

Overview Implementation Details Tasks Hints

Overview

Implement a parallel sorting algorithm that outperforms single-threaded std::sort.

Implementation

We will implement a benchmark tool so-test that performs the following steps:

Usage

The benchmark tool can be used as follows:

    ./so-test N

where N is the number of elements in the test data. The tool will run 5 tests, all with the same number of elements but with different input data distributions: 0 = large random elements, 1 = small random elements, 2 = constant input, 3 = increasing values, and 4 = decreasing values.

If you run, for example

    ./so-test 100000000

you will get an output that looks similar to the following:

    so  0  100000000  2.135  7.957       
    so  1  100000000  1.574  2.325       
    so  2  100000000  1.357  1.735       
    so  3  100000000  1.079  1.713       
    so  4  100000000  1.136  1.233       

Here column 2 identifies the input data distribution; the first row is the most interesting case, with large random elements. Column 4 is the running time of your implementation and column 5 is the running time of std::sort.

Template

You can find a working implementation in our shared code repository, in the subdirectories so*. There is one directory per task.

You only need to implement the subroutine psort() in file so*/so.cc. See so*/so.h for the description of the interface.

Once you have implemented the function correctly, you should be able to run make to compile your code, make test to check that it works correctly (at least for some test cases) and make benchmark to see how well it performs.

Detailed specification

You need to implement the following function:

    void psort(int n, data_t* data)

Here n is the size of the input array data. All input elements are of type data_t, i.e., unsigned long long, i.e., 64-bit unsigned integers.

Correct output

Your function should have the same behaviour as:

    std::sort(data, data+n);

That is, you need to sort this in place: modify the input array so that it will be in a sorted order. You are free to allocate some additional memory to keep temporary data if needed.

Rules

In your implementation, you are permitted and encourage to use all single-threaded features of the C++ standard library. In particular, you can freely use the single-threaded std::sort as a subroutine in your code, as well as all other features of the algorithms library.

For multi-threading, you can use the basic primitives of OpenMP, as usual.

Tasks

Check the course home page to see which tasks you are supposed to solve each week.

Task SO1

Subdirectory: so1, file to edit: so1/so.cc.

Implement an efficient parallel sorting algorithm, using the basic idea of merge sort.

Examples of a sufficient performance with classroom computers:

Note that the single-threaded std::sort takes approx. 8 seconds in this case.

If you run make benchmark2 and you get running times in the following ballpark, you can be happy:

    so  0  100000000  2.135  7.957       
    so  1  100000000  1.574  2.325       
    so  2  100000000  1.357  1.735       
    so  3  100000000  1.079  1.713       
    so  4  100000000  1.136  1.233       

Task SO2 — optional

Subdirectory: so2, file to edit: so2/so.cc.

Implement an efficient parallel sorting algorithm, using the basic idea of quicksort.

Examples of a sufficient performance with classroom computers:

Note that the single-threaded std::sort takes approx. 8 seconds in this case.

If you run make benchmark2 and you get running times in the following ballpark, you can be happy:

    so  0  100000000  2.161  7.973   
    so  1  100000000  0.772  2.313   
    so  2  100000000  0.118  1.805   
    so  3  100000000  0.686  1.750   
    so  4  100000000  1.483  1.273   

Hints

Task SO1

If you have p threads available, you can split your input data in p blocks. Sort each block with e.g. std::sort, simultaneously in parallel. Then merge the results. For example, first merge p/2 pairs of blocks in parallel so that you are left with p/2 sorted blocks, and repeat this until you are left with 1 sorted block.

Helpful OpenMP features:

Task SO2

Write a fairly normal recursive quicksort implementation. Then parallelise it using the following ideas:

Helpful OpenMP features: