Technical

Timsort — “The Fastest Sorting Algorithm”

Timsort: A very fast, O(n log n), is a  hybrid stable sorting algorithm.  It was implemented by Tim Peters in 2002 for use in the Python programming language and now used in java Arrays.sort() as well.  Timsort first analyses the list it is trying to sort and then chooses an approach based on the analysis of the list. In the background, it’s basically working by making use of two famous sorting algorithm namely merge sort and insertion sort, in a very optimistic way.

Timsort’s time complexity is O(n log n)

Fig: showing a comparison of Timsort with merge and quick sort

Let’s analyze the asymptotic analysis of some well known and fast sorting algorithms namely: heap, merge, quick, insertion sort (yeah this guy too since it very useful when no. of elements  to be sorted is less, despite having the performance of O(n2) in the worst case)

Heapsort: Time complexity of heapify is O(logn). The time complexity of creating and building heap is O(n) and the overall time complexity of Heap Sort is O(N log N).

Merge sort: Merge sort also have the worst-case complexity of O(N log N). It makes O(N log N) comparisons to sort a list of n elements.

Quick Sort: In Quick Sort the worst case occurs when the partition process always picks the greatest or smallest element as pivot. If we consider the above partition strategy where the last element is always picked as a pivot, then its time complexity yields O(n2).On the other hand, in a case i.e best case, when the partition process always picks the middle element as pivot give time complexity as  O(N log N).

 

Now the natural question arises what makes Timsort so extraordinary Over other sorting algorithms.

Timsort uses insertion sort for very small amounts of data; this is typically more efficient than a pure merge sort because the benefits of merge sort over insertion sort are asymptotic. Merge sort is asymptotically faster than insertion sort, which means that there is a threshold N such that if n≥N then sorting n elements with merge sort is faster than with insertion sort. The numerical value of threshold depends on the specific implementations though. With typical optimized implementations, insertion sort beats merge sort for a small amount of data. (YES, you read it correctly).

Most sort routines in the real world are hybrid, using an O(N log N), the divide-and-conquer technique for large amounts of data and using a different technique (usually insertion sort)  they’ve broken down the data into small enough pieces(say runs). Thus a properly implemented timsort is faster on average than a pure merge sort.

S = ( 15, 11,   6, 4,    7, 12, 14, 25, 39,      3, 6, 10, 14, 15, 21, 22   , 20, 15, 10, 8, 5, 1   )

<–first run->   <–second run–>   < —- -third run– —>    < —fourth run ->

example:  A sequence S and its run decomposition computed by TimSort: for each run, the first two elements determine if it is increasing or decreasing, then it continues with the maximum number of consecutive elements that preserves the monotonicity.

Timsort is furthermore optimized to deal well with real-world data. Real-world data is not too much randomly distributed: it’s common to have sorted runs in the data to be sorted. Compared with a basic merge+insertion sort, timsort attempts to detect and make use of such runs.

Though this adds a small overhead of checking whether parts of the data are already sorted, with typical real-world data, this saves some actual sorting work which more than compensates.

 

Now let us understand how Timsort works (the in-depth algorithm)-

The Role of insertion sort:

This is most effective on smaller lists (i.e less data). It is quite slow at larger lists, but very fast with small lists.

Rule in timsort for using insertion sort: If our list has less than 64 elements in it, run insertion sort. Insertion sort has wonderful complexity (when we have a smaller list).

The gif above for an illustration of how Insertion sort works!

The “runs” of Timsort

If the list is larger than 64 elements, Timsort iterates over the data looking for natural runs of at least two elements that are either non-descending (each element is or strictly descending. The descending runs are later reversed, except equal elements in order to maintain the algorithm’s stability. A reference to each run is then pushed onto a stack.

Timsort calls these already sorted parts- runs.

There is another term called “minrun”

Minrun is a size that is determined based on the size of the array. Minrun is chosen from the range 32 to 64 inclusive, such that the size of the data, divided by minrun, is equal to, or slightly less than, a power of two.  The algorithm selects it so that most runs in a random array are, or become minrun, in length.

Timsort is adaptive, in nature it uses again insertion sort to combine runs smaller than the minimum run size (minrun), and merge sort otherwise. This is because merging is most efficient when the number of runs is equal to, or slightly less than, a power of two, and notably less efficient when the number of runs is slightly more than a power of two. Timsort chooses minrun to satisfy the former condition.

Timsort algorithm searches for minimum-size ordered sequences, min runs, to perform its sort

Now the merge part

Algorithm 1: TimSort.

Input: A sequence S to sort

Result: The sequence S is sorted into a single run, which remains on the stack.

Note: The function merge repeatedly pops the last two runs on the

stack R merges them and pushes the resulting run back on the stack.

1 runs <— a run decomposition of S

2 R  <— an empty stack

3 while runs !=0; do                                                                      // main loop of TimSort

4 remove a run r from runs and push r onto R

5 merge(R)

6 if height(R) != 1 then // the height of R is its number of runs

7 merge(R)

——————————————————————————————————–

Algorithm 2: The merge procedure.

Input: A stack of runs R

Result: The invariant of the Below Equations are established.

Note: The runs on the stack are denoted by R[1] . . .R[height(R)], from top to bottom.

The length of run R[i] is denoted by ri.

1 while height(R) > do

2 n  <—– height(R) − 2

3 if (n > 0 and r3 < = r2 + r1)  then

4 if r3 < r1 then

5 merge runs R[2] and R[3] on the stack

6 else merge runs R[1] and R[2] on the stack

7 else if r2 <= r1 then

8 merge runs R[1] and R[2] on the stack

9 else break

—————————————————————————————————

The runs are considered in the order given by the run decomposition and successively pushed onto a stack. If some conditions on the lengths of the topmost runs of the stack are not satisfied after a new run has been pushed, this can trigger a series of merges between pairs of runs at the top or right under. And at the end, when all the runs in the initial decomposition have been pushed, the last operation is to merge the remaining runs two by two, starting at the top of the stack, to get a sorted sequence. The conditions on the stack and the merging rules are implemented in the subroutine called merge ( in Algorithm 2)

Another strength of TimSort is the use of many effective heuristics to save time, such as ensuring that the initial runs are not too small by using a special technique called “galloping” to optimize the merges.

 

Now Let analyze Algorithm 2 which is a pseudo-code transcription of the merge procedure.===>>The  main  idea is that:

During merging we have to balance the run lengths as closely as possible while keeping a low bound on the number of runs we have to remember. To achieve this, the merging conditions of merge_collapse are designed to ensure that the following invariant  is true at the end of the procedure:

ri+2=|Z| > ri+1|Y| + ri |X| —>>>>(1)

ri+1 |Y|> ri.|X|—->>>(2)

 

This means that the runs lengths ri=X on the stack grow at least as fast as the Fibonacci numbers and, therefore, that the height of the stack stays logarithmic.

Timsort performs an almost in-place merge sort, as actual in-place merge sort implementations have a high overhead. First timsort performs a binary search to find the location in the first run of the first element in the second run, and the location in the second run of the last element in the first run. Elements before and after these locations are already in their correct place and may be removed from further consideration. This not only optimizes element movements and running time, but also allows the use of less temporary memory. Then the smaller of the remaining runs are copied into temporary memory, and elements are merged with the larger run, into the now free space. (if this is confusing then refer the example below)

Say, for example, two runs X and Y are to be merged, with X being the smaller run. In this case, a binary search examines X to find the first element xʹ larger than the first element of Y. Note that X and Y are already sorted individually. When xʹ is found, the algorithm can ignore elements before that position when merging. Similarly, the algorithm also looks for the first element yʹ in Y greater than the last element of X. The elements after yʹ can also be ignored when merging.

 

Fig: The runs are inserted in a stack. If |Z| ≤ |Y| + |X|, then X and Y are merged and replaced on the stack. In this way, merging is continued until all runs satisfy i)|Z| > |Y| + |X| and ii). |Y| > |X|.,where Z=ri+2, Y=ri+1 ,x=ri

 

Fig: To merge, timsort copies the elements of the smaller array (X in this illustration) to temporary memory, then sorts and fills elements in final order into the combined space of X and Y

 

Elements (pointed to by blue arrow) are compared and the smaller element is moved to its final position (pointed to by red arrow).

 

All red elements are smaller than blue (here, 21). Thus they can be moved in a chunk to the final array

 

Galloping mode:

In galloping mode, the algorithm searches for the first element of the former array into latter array by comparing that initial element of former with the (2 ^J − 1)th element of the latter (J=3,5,7,..) just to get the range for the initial element of former to lie. This reduces binary searching. In cases where galloping is found to be less efficient than binary search, galloping mode is exited.

This first initial threshold value for galloping  =7 (in java). To avoid the drawbacks of galloping mode, the merging functions adjust this threshold value. If the selected element is from the same array that returned an element previously, this threshold value is reduced by one else incremented by one, thus discouraging or we can say punishing to return to galloping mode.

In this way by optimistic use of insertion, merging, binary search; timsort reduces the time complexity of sorting data of the real world having some sorted chunks. Its a type of adaptive sort. If you are interested in knowing more about adaptive sort or timsort then go to the link below.

 

REFERENCES:

https://en.wikipedia.org/wiki/Timsort

https://www.geeksforgeeks.org/timsort/

https://www.youtube.com/watch?v=jVXsjswWo44

https://en.wikipedia.org/wiki/Adaptive_sort

 

IMAGES:

https://en.wikipedia.org/wiki/Timsort

 

3 thoughts on “Timsort — “The Fastest Sorting Algorithm””

  1. I came up with a single pass, O(n) algorithm called StalinSort. You iterate through the list, and any element out of order is eliminated. In the end, you’re left with a sorted list.

Leave a Reply

Your email address will not be published. Required fields are marked *