Sorting Reference #

Sorting is a key aspect of many algorithms and a foundational topic in computer science. There are many sorting algorithms, each with tradeoffs in time complexity, required space, ease of implementation, and even the prior knowledge they assume about the data being sorted. In this post, I first describe what must hold before sorting is possible, then implement several sorting algorithms and list the tradeoffs each one makes.

Background #

How do we know that it is possible to sort a given set of elements? As long as there is a total order on the set, it is possible to compare any two elements and establish an ordering. For example, the integers are totally ordered: any two integers can be compared, so there is a single arrangement from smallest to largest in which every earlier element is less than or equal to every later one. Floats, however, are only partially ordered. Consider the case where NaN is part of the set to sort: NaN compares as neither less than, equal to, nor greater than any other value, so it could technically be placed anywhere in the sorted output. To sort floats, we must additionally define how to treat NaNs, which essentially establishes a total order over the floats. Given a total order over the set to be sorted, one of the following sorting algorithms can be used.
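For instance, the comparator below is a minimal sketch (not from the original post) of how a total order can be imposed on doubles by placing NaNs after all other values; it can be passed to the C standard library's qsort. The name total_compare is illustrative.

#include <math.h>

// A sketch of a qsort-style comparator imposing a total order on doubles
int total_compare(const void* a, const void* b) {
    double x = *(const double*)a;
    double y = *(const double*)b;

    // Order NaNs after every ordinary value so all pairs are comparable
    if (isnan(x)) return isnan(y) ? 0 : 1;
    if (isnan(y)) return -1;

    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
}

With this comparator, qsort(values, n, sizeof(double), total_compare) sorts an array containing NaNs without relying on an ordering that IEEE 754 does not define.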

There are several useful properties for sorting algorithms to have. A sort is in-place if the additional space it requires is constant (i.e. \(\Theta(1)\)). A stable sort guarantees that if two elements a and b are equal, and b comes first in the input, then b will come before a in the output. An unstable sort is just the opposite: it makes no guarantees about the relative ordering of equal elements before and after the sort. An adaptive sort performs better on inputs where most elements are already in sorted order. Lastly, an online sort is able to begin sorting even without the full set of input elements.

Many common sorting algorithms (merge sort, insertion sort, quick sort, and selection sort) are comparison sorts. A comparison sort is a sorting algorithm that determines order based only on direct comparisons between pairs of input elements. Comparison sorts are great for general purpose sorting where there is no prior information about the unsorted elements. However, these sorts have a lower bound on their runtime. A comparison sort can be viewed as a decision tree where each node compares two elements and each edge is one outcome of the comparison, in this case either \(\leq\) or \(>\). The leaves of a full decision tree must contain every possible permutation of the input, and there are \(n!\) such permutations. Therefore, by a property of binary trees, the height of the tree must be at least \(\log_2(n!)\). The height is the number of comparisons performed along the longest root-to-leaf path, so it is a lower bound on the number of comparisons required in the worst case. After simplifying \(\log_2(n!)\), it can be shown that a comparison sort must perform \(\Omega(n \log n)\) comparisons in the worst case.
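To see why \(\log_2(n!)\) is \(\Omega(n \log n)\), note that the largest \(n/2\) factors of \(n!\) are each at least \(n/2\):

\[
\log_2(n!) = \sum_{i=1}^{n} \log_2 i \;\geq\; \sum_{i=n/2}^{n} \log_2 \frac{n}{2} \;\geq\; \frac{n}{2} \log_2 \frac{n}{2} = \Omega(n \log n)
\]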

graph TB
    A(("1:2")) -- "≤" --> B(("2:3"))
    A(("1:2")) -- ">" --> C(("1:3"))
    B(("2:3")) -- "≤" --> D["[1,2,3]"]
    B(("2:3")) -- ">" --> E(("1:3"))
    E(("1:3")) -- "≤" --> F["[1,3,2]"]
    E(("1:3")) -- ">" --> G["[3,1,2]"]
    C(("1:3")) -- "≤" --> H["[2,1,3]"]
    C(("1:3")) -- ">" --> I(("2:3"))
    I(("2:3")) -- "≤" --> J["[2,3,1]"]
    I(("2:3")) -- ">" --> K["[3,2,1]"]

If there is more information about the elements being sorted, such as a maximum and a minimum value, then it is possible to sort in linear time. Examples of such sorting algorithms include counting sort, radix sort, and bucket sort.

Insertion Sort #

An insertion sort takes elements from the unsorted set one at a time and inserts each into its correct location in the sorted output. Note that this still takes only \(\Theta(1)\) additional space: the array is treated as a sorted prefix followed by an unsorted suffix. During each iteration, the first element of the unsorted suffix is inserted into its correct location in the sorted prefix. This is not necessarily the smallest remaining element, just the next one the insertion sort encounters.

void insertion_sort(int* array, int size) {
    for (int i = 1; i < size; i++) {
        int element = array[i];
        int j = i - 1;

        // Move greater elements up to make room
        while (j >= 0 && array[j] > element) {
            array[j + 1] = array[j];
            j -= 1;
        }

        // Insert the element into the correct spot
        array[j + 1] = element;
    }
}
| Worst Case Comparisons | Average Case Comparisons | Best Case Comparisons | Space |
| --- | --- | --- | --- |
| \(O(n^2)\) | \(O(n^2)\) | \(O(n)\) | \(O(1)\) |

Benefits

  • Stable
  • Online
  • Adaptive: runs in O(n) on mostly sorted input
  • In-Place

Drawbacks

  • Slow in the general case.
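As a quick sanity check, a minimal driver (not part of the original post) might exercise insertion_sort like this; the same pattern applies to the other array-based sorts below.

#include <stdio.h>

int main(void) {
    int values[] = {5, 2, 4, 6, 1, 3};
    int size = sizeof(values) / sizeof(values[0]);

    insertion_sort(values, size);

    // Prints: 1 2 3 4 5 6
    for (int i = 0; i < size; i++) {
        printf("%d ", values[i]);
    }
    printf("\n");
    return 0;
}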

Selection Sort #

Selection sort is similar to insertion sort. However, instead of taking the first element it encounters in the unsorted set, it picks the smallest element in the unsorted set and appends it to the end of the sorted prefix.

void selection_sort(int* array, int size) {
    for (int i = 0; i < size; i++) {
        // Find the smallest element in the unsorted suffix
        int smallest_idx = i;
        for (int j = i + 1; j < size; j++) {
            if (array[j] < array[smallest_idx]) {
                smallest_idx = j;
            }
        }

        // Move it to the end of the sorted prefix
        swap(array, i, smallest_idx);
    }
}
| Worst Case Comparisons | Average Case Comparisons | Best Case Comparisons | Space |
| --- | --- | --- | --- |
| \(O(n^2)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) |

Benefits

  • Constant additional space

Drawbacks

  • Slow; unlike insertion sort, it performs \(\Theta(n^2)\) comparisons even on already sorted input
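Note that the selection sort above, as well as the heap sort and quick sort below, rely on a swap helper that this post does not define; a minimal version would look something like:

#include <stddef.h>

// Assumed helper: exchange the elements at indices a and b
void swap(int* array, size_t a, size_t b) {
    int tmp = array[a];
    array[a] = array[b];
    array[b] = tmp;
}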

Merge Sort #

Merge sort is a divide and conquer algorithm. It divides the unsorted set into two roughly equal halves, recursively merge sorts each half, and finally merges the two sorted halves together into a complete sorted list. If a partition is small enough, it is usually more efficient to fall back to a sorting algorithm with a small constant factor that works well on small inputs, such as insertion sort (see the hybrid sketch at the end of this section).

#include <string.h>  // for memcpy

void merge(int* array, size_t left_idx, size_t center_idx, size_t right_idx);

void merge_sort(int* array, size_t left_idx, size_t right_idx) {
    if (left_idx < right_idx) {
        // Written this way to avoid overflow in (left_idx + right_idx) / 2
        size_t center_idx = left_idx + (right_idx - left_idx) / 2;

        merge_sort(array, left_idx, center_idx);
        merge_sort(array, center_idx + 1, right_idx);

        merge(array, left_idx, center_idx, right_idx);
    }
}

void merge(int* array, size_t left_idx, size_t center_idx, size_t right_idx) {
    int left_array_size = center_idx - left_idx + 1;
    int right_array_size = right_idx - center_idx;

    int left_array[left_array_size];
    int right_array[right_array_size];

    // memcpy counts bytes, and both halves are offset from left_idx
    memcpy(left_array, array + left_idx, left_array_size * sizeof(int));
    memcpy(right_array, array + center_idx + 1, right_array_size * sizeof(int));

    int left = 0, right = 0;
    size_t dst = left_idx;
    while (left < left_array_size && right < right_array_size) {
        // Taking from the left half on ties keeps the sort stable
        if (left_array[left] <= right_array[right]) {
            array[dst] = left_array[left];
            left += 1;
        } else {
            array[dst] = right_array[right];
            right += 1;
        }

        dst += 1;
    }

    // Copy whichever half still has elements remaining
    while (left < left_array_size) {
        array[dst++] = left_array[left++];
    }

    while (right < right_array_size) {
        array[dst++] = right_array[right++];
    }
}
| Worst Case Comparisons | Average Case Comparisons | Best Case Comparisons | Space |
| --- | --- | --- | --- |
| \(O(n \log n)\) | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n)\) |

Benefits

  • Can be stable
  • Parallelizable
  • Works well on arbitrary inputs
  • Merge sort on linked-lists only requires constant additional space.

Drawbacks

  • Requires a lot of additional space
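As mentioned above, the recursion can fall back to insertion sort on small partitions. The following hybrid is a sketch of that idea, reusing the merge function from this section; the cutoff of 16 is an arbitrary illustrative choice.

// Below an (arbitrary) cutoff, sort the range directly with insertion
// sort instead of recursing further
#define MERGE_CUTOFF 16

void hybrid_merge_sort(int* array, size_t left_idx, size_t right_idx) {
    if (left_idx >= right_idx) {
        return;
    }

    if (right_idx - left_idx + 1 <= MERGE_CUTOFF) {
        // Insertion sort on array[left_idx..right_idx], inclusive
        for (size_t i = left_idx + 1; i <= right_idx; i++) {
            int element = array[i];
            size_t j = i;
            while (j > left_idx && array[j - 1] > element) {
                array[j] = array[j - 1];
                j--;
            }
            array[j] = element;
        }
        return;
    }

    size_t center_idx = left_idx + (right_idx - left_idx) / 2;
    hybrid_merge_sort(array, left_idx, center_idx);
    hybrid_merge_sort(array, center_idx + 1, right_idx);
    merge(array, left_idx, center_idx, right_idx);
}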

Heap Sort #

A heap sort is similar to a selection sort, but instead of linearly searching for the largest or smallest element, it takes advantage of a heap data structure, which can restore its ordering invariant in \(O(\log n)\) time after the top element is removed. A max-heap is a binary tree in which every parent node is greater than or equal to its children, so the root is the largest element in the tree. A min-heap is the same except every parent node is less than or equal to its children. To sort a given set of elements, first build a max-heap in-place in \(O(n)\) time. Then swap the first element (i.e. the largest element) with the last element of the heap, decrease the size of the heap by one, and fix the heap by sifting the new first element down to its correct location. Once the heap is down to a single element, everything has been sorted in-place.

void build_max_heap(int* array, size_t size);
void fix_heap(int* array, size_t heap_size, size_t index);

void heap_sort(int* array, size_t size) {
    build_max_heap(array, size);

    size_t heap_size = size;
    while (heap_size > 1) {
        heap_size -= 1;
        // Move the largest element to the end, then shrink the heap
        swap(array, 0, heap_size);
        fix_heap(array, heap_size, 1);
    }
}

void build_max_heap(int* array, size_t size) {
    // Sift down every internal node, from the bottom of the heap up
    for (size_t i = size / 2; i >= 1; i--) {
        fix_heap(array, size, i);
    }
}

// Heap indices are 1-based; heap element i lives at array[i - 1]
void fix_heap(int* array, size_t heap_size, size_t index) {
    size_t left_child_idx = 2 * index;
    size_t right_child_idx = 2 * index + 1;

    size_t largest_idx = index;

    if (left_child_idx <= heap_size &&
        array[left_child_idx - 1] > array[largest_idx - 1]) {
        largest_idx = left_child_idx;
    }

    if (right_child_idx <= heap_size &&
        array[right_child_idx - 1] > array[largest_idx - 1]) {
        largest_idx = right_child_idx;
    }

    // Sift down until the max-heap property is restored
    if (largest_idx != index) {
        swap(array, largest_idx - 1, index - 1);
        fix_heap(array, heap_size, largest_idx);
    }
}
| Worst Case Comparisons | Average Case Comparisons | Best Case Comparisons | Space |
| --- | --- | --- | --- |
| \(O(n \log n)\) | \(O(n \log n)\) | \(O(n)\) | \(O(1)\) |

Benefits

  • Fast and space efficient with no quadratic worst case
  • In-place

Drawbacks

  • Usually slower in real life than quick sort
  • Not a stable sort

Quick Sort #

Quick sort is another divide and conquer algorithm. Pick an arbitrary element q of the array, called the pivot, and divide the array into three sections: the first contains all the elements less than q, the second contains all the elements equal to q, and the last contains all the elements larger than q. Run quick sort on the first and last sections. No merge step is needed, because the concatenation of the three sections is already in sorted order. If the pivot divides the array into two semi-balanced subsections, then it is a good pivot, and the runtime is asymptotically equivalent to merge sort. If the pivot is always bad and the subsections are unbalanced, quick sort degrades to the quadratic behavior of a selection sort. The pivot is usually chosen randomly, so in the average case with random inputs the pivot will be balanced most of the time. (A three-way partition sketch appears at the end of this section.)

size_t partition(int* array, size_t left, size_t right);

void quick_sort(int* array, size_t left, size_t right) {
    if (left < right) {
        size_t q_idx = partition(array, left, right);

        // Guard against size_t underflow when q_idx == 0
        if (q_idx > left) {
            quick_sort(array, left, q_idx - 1);
        }
        quick_sort(array, q_idx + 1, right);
    }
}

size_t partition(int* array, size_t left, size_t right) {
    // Move a randomly chosen pivot out of the way, to the end
    size_t q_idx = random_between(left, right);
    swap(array, q_idx, right);

    int pivot = array[right];
    size_t i = left;

    // Grow the "less than or equal" region from the left
    for (size_t j = left; j < right; j++) {
        if (array[j] <= pivot) {
            swap(array, i, j);
            i += 1;
        }
    }

    // Place the pivot between the two regions
    swap(array, i, right);

    return i;
}
| Worst Case Comparisons | Average Case Comparisons | Best Case Comparisons | Space |
| --- | --- | --- | --- |
| \(O(n^2)\) | \(O(n \log n)\) | \(O(n)\) | \(O(\log n)\) |

Benefits

  • Very fast in practice with a small constant factor
  • In-place (aside from the recursion stack)

Drawbacks

  • Worst case runtime is quadratic, although a random pivot makes this unlikely
  • Not a stable sort
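The partition above is the two-way Lomuto scheme, which only produces a less-than-or-equal section and a greater-than section. The three sections described earlier correspond to a three-way partition; here is a sketch under the same assumptions (the swap and random_between helpers from before).

// Three-way partition: afterwards array[left..lt-1] < pivot,
// array[lt..gt] == pivot, and array[gt+1..right] > pivot, so only
// the first and third sections need further sorting
void quick_sort_3way(int* array, size_t left, size_t right) {
    if (left >= right) {
        return;
    }

    swap(array, random_between(left, right), right);
    int pivot = array[right];

    size_t lt = left;   // next slot for an element < pivot
    size_t gt = right;  // next slot for an element > pivot
    size_t i = left;

    while (i <= gt) {
        if (array[i] < pivot) {
            swap(array, i, lt);
            lt++;
            i++;
        } else if (array[i] > pivot) {
            swap(array, i, gt);
            gt--;
        } else {
            i++;
        }
    }

    if (lt > left) {
        quick_sort_3way(array, left, lt - 1);
    }
    quick_sort_3way(array, gt + 1, right);
}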

Counting Sort #

In a counting sort, it is assumed that each of the \(n\) integers to sort falls within a range from \(0\) to \(k\). The main idea is to count the number of elements less than a given element: if there are \(q\) elements smaller than an element, it belongs at index \(q\). Counting sort uses an auxiliary array of size \(k + 1\) to store the counts. Since only a few linear passes over the counts and the input are needed, the runtime is \(O(n + k)\).

void counting_sort(int* array, int* dst, size_t size, int maximum_value) {
    // One count slot for each value in the range 0..maximum_value
    int counts[maximum_value + 1];
    for (int i = 0; i <= maximum_value; i++) {
        counts[i] = 0;
    }

    for (size_t i = 0; i < size; i++) {
        counts[array[i]] += 1;
    }

    // Prefix sums: counts[v] becomes the number of elements <= v
    for (int i = 1; i <= maximum_value; i++) {
        counts[i] += counts[i - 1];
    }

    // Scan from the back so that equal elements keep their order (stable)
    for (int i = (int)size - 1; i >= 0; i--) {
        counts[array[i]] -= 1;
        dst[counts[array[i]]] = array[i];
    }
}
| Worst Case | Average Case | Best Case | Space |
| --- | --- | --- | --- |
| \(O(n + k)\) | \(O(n + k)\) | \(O(n + k)\) | \(O(k)\) |

Benefits

  • Linear time
  • Stable

Drawbacks

  • If the range of values is large, the auxiliary space required could be very large.
  • Only works on a restricted range of inputs.
  • Not an in-place sort
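Because counting sort is not in-place, the caller supplies the output buffer. A minimal usage sketch (the values and the maximum of 9 are purely illustrative):

int main(void) {
    int array[] = {3, 0, 9, 3, 7, 1};
    size_t size = sizeof(array) / sizeof(array[0]);
    int dst[6];

    // Elements are known to fall in the range 0..9, so k = 9
    counting_sort(array, dst, size, 9);

    // dst now holds 0 1 3 3 7 9; array is left untouched
    return 0;
}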

Radix Sort #

Consider sorting books alphabetically by title. Intuitively, you might begin by arranging the books by the first letter, then sort within each group by the second letter, and so on until all the books have been sorted. Radix sort follows a similar principle but works digit by digit in the opposite order: stably sort the whole set by the least significant digit, then by the next digit, and continue up through the most significant digit. Choosing a stable sort guarantees that orderings from previous passes are maintained. Otherwise, each pass would lose the work of the passes before it and the sort would fail. If counting sort is chosen as the stable sort and there are \(k\) digits, then a radix sort completes in \(O(kn)\) time and takes \(O(n)\) additional space.

void radix_sort(int* array, size_t size, size_t num_digits) {
    // Least significant digit first; each pass must be stable
    for (size_t digit = 0; digit < num_digits; digit++) {
        counting_sort_on(array, size, digit);
    }
}

void counting_sort_on(int* array, size_t size, size_t digit) {
    int counts[10] = {0};
    int output[size];

    // 10^digit selects which base-10 digit to sort on
    int exp = 1;
    for (size_t i = 0; i < digit; i++) {
        exp *= 10;
    }

    for (size_t i = 0; i < size; i++) {
        counts[(array[i] / exp) % 10] += 1;
    }

    // Prefix sums: counts[d] becomes the number of elements with digit <= d
    for (int i = 1; i < 10; i++) {
        counts[i] += counts[i - 1];
    }

    // Scan from the back so the pass is stable
    for (int i = (int)size - 1; i >= 0; i--) {
        int d = (array[i] / exp) % 10;
        counts[d] -= 1;
        output[counts[d]] = array[i];
    }

    memcpy(array, output, size * sizeof(int));
}
| Worst Case | Average Case | Best Case | Space |
| --- | --- | --- | --- |
| \(O(kn)\) | \(O(kn)\) | \(O(kn)\) | \(O(n)\) |

Benefits

  • Linear time

Drawbacks

  • Takes a lot of additional space
  • Not an in-place sort
  • May not necessarily be faster than a comparison sort in practical applications

Bucket Sort #

Bucket sort assumes that the input is drawn from a uniform distribution. Specifically, it assumes that input elements are distributed uniformly and independently over the interval \([0,1)\). To sort an input of size \(n\), create \(n\) buckets, where each bucket is a linked list spanning an interval of size \(\frac{1}{n}\). Insert each element into the bucket whose interval contains it, sort each non-empty bucket (usually with insertion sort), then visit each bucket in order and add all the elements back to the original array.

// Assumes a singly linked node type holding a double, plus helpers that
// are not defined in this post: create_buckets (returns an array of size
// empty lists), list_append, and a linked-list variant of insertion sort
void bucket_sort(double* array, size_t size) {
    node** buckets = create_buckets(size);

    // Values in [0, 1) map uniformly onto bucket indices 0..size-1
    for (size_t i = 0; i < size; i++) {
        size_t index = (size_t)(size * array[i]);
        buckets[index] = list_append(buckets[index], array[i]);
    }

    for (size_t i = 0; i < size; i++) {
        buckets[i] = insertion_sort_list(buckets[i]);
    }

    // Concatenating the sorted buckets yields the sorted array
    size_t index = 0;
    for (size_t i = 0; i < size; i++) {
        for (node* head = buckets[i]; head != NULL; head = head->next) {
            array[index++] = head->data;
        }
    }
}
| Worst Case | Average Case | Best Case | Space |
| --- | --- | --- | --- |
| \(O(n^2)\) | \(O(n)\) | \(O(n)\) | \(O(n)\) |

Benefits

  • Sorts quickly even if the input is not perfectly uniformly distributed
  • Linear time

Drawbacks

  • Takes a lot of additional space
  • May not necessarily be faster than a comparison sort in practical applications
June 19, 2020