Algorithmic efficiency and Big O Notation

How to compare the efficiency of algorithms?

Let’s imagine you just wrote a new search algorithm. How would you measure its performance? How would you compare it with other already existing algorithms? One of the most important elements of every algorithm is its time complexity. Measuring time complexity allows us to predict how long it takes for an algorithm to complete its execution and this is crucial for every algorithm that has changeable input. To be able to compare different algorithms we use asymptotic notation. Big O, Big Ω and Big Θ are all examples of asymptotic notation, they belong to the family of Bachmann-Landua notations and they can precisely describe the complexity of algorithms. Although all three previously mentioned notations are accurate ways of describing algorithms, software developers tend to use only Big O notation.

There is also a difference between what academics and non-academics mean by Big O notation. In the academic environment Big O puts an upper bound on the algorithm. This means that an algorithm with an average time complexity of n can be described by Big O notation as $$O(n^2)$$ as the algorithm is not slower than $$n^2$$ (and it’s also not slower than $$n log n$$). Software developers however, tend to use the tightest bound possible when it comes to Big O notation. When your fellow programmer tells you that his algorithm has O(n) complexity it means that on average it executes in linear time not in logarithmic or constant time. In the academic setup that tight bound is described by Big Θ.

I know it’s a little bit confusing, but for some reason software developers use Big O as if it was Big Θ. It might be too late to change it now so I think we should just accept this discrepancy. It’s also worth mentioning that very often programmers just drop the “O” and just say that, for example, their algorithm has $$n^2$$ complexity (in most cases they just mean $$O(n^2)$$).

What is Big O notation?

As we mentioned before Big O is a tech industry standard for describing the complexity of algorithms. Big O notation can be used to describe both time and space complexity. In this article we’ll focus on time complexity, but it’s also important to understand space complexity. Time complexity describes how the runtime scales regarding the input parameters. On the other hand, space complexity describes how memory scales regarding the input parameter. It is useful to determine how much memory is needed for running our algorithms smoothly.

In Big O notation, the complexity of algorithms is described using well known mathematical functions. The most common ones are constant, logarithmic, linear, quadratic, exponential and factorial functions. Big O simplifies more complicated functions to show the growth rate of a function rather than every detail of it.

Average, best and worst cases

Big O notation can be used to describe best, worst and average case. In most cases, we just focus on the average and worst scenario as best execution is not that relevant. Let’s use bubble sort as an example. As we know bubble sort is a simple sorting algorithm which compares each pair of adjacent items and swaps them if they are in the wrong order. We can easily imagine a situation when our input array is already sorted. In that situation, our bubble sort goes through our input array only once as there are no swaps required. It means that it’s best time complexity is O(n). What’s the probability of this scenario? From basic combinatorics we can calculate that it’s $$1/n!$$ If we have an array with just 10 random elements our chances are $$1/3628800$$. That’s why we almost never focus on the best performance. What matters in most cases is the average and worst-case performance. Although the worst case might be also very unlikely, it’s important to be aware of it as it might degenerate our algorithm and cause huge trouble (going from O(log n) to O(n) for big enough input may freeze the whole system for a while). This is exactly why we focus mostly on the average and worst cases and not on the best one.

Now let’s look at common ways of simplifying Big O notation.

Dropping constants

When using Big O and other asymptotic notations it’s standard practice to drop all the constants. What’s the reason for this? Let’s have a look at an example. Let’s say our algorithm works in $$3n^2$$ time complexity. How does our constant “3” influence our algorithm? Let’s have a look at the table below.

nNumber of Operations for 3n^2number of operations for n^2
131
10300100
1003000010000

As we can see the constant can change the final result. However, we are interested in the rate of increase and the major factor for that increase is $$n^2$$, not the constant. To simplify Big O we omit constants.

Dropping non-dominant terms

Dropping non-dominant terms has exactly the same purpose as dropping constants. It simplifies our function and focuses on what is important which is the rate of increase. Again, let’s have a look at a quick example: $$n^2 + n$$

nnumber of operations for n^2+nnumber of operations for n^2
121
10110100
1001010010000

As we can see when input grows our non-dominant n term means less and less and becomes irrelevant. That’s why we can easily drop it.

Common pitfalls

Different input variables

Let’s take a function which accepts two different lists and for every element in the first list it iterates through every element of the second list.

Some developers might be tempted to classify this as $$O(n^2)$$ which is not true. Although we iterate through the internal array the input lists are not of the same size thus the complexity is O(ab) where a is the length of the first list and b is the length of the second one. This is a very common mistake while calculating algorithm’s performance and we need to be able to avoid it.

When to add and when to multiply

This is going to be our last rule for calculating Big O. Let’s imagine we have two loops in our algorithm. When should we add and when should we multiply their complexity? The rule is quite simple. If your algorithm processes one loop and goes to the other one while the processing of the first one is finished we add. If for every element in the first loop the algorithm goes to every single element of the second loop we multiply. Of course adding doesn’t really bring any value as we’re going to drop constants after that, so if we have 2 loops going through n elements our complexity is still O(n).

Different complexities with examples

Let’s have a look at the chart with all of our common functions plotted.

As we can see the difference between every single function is huge. Let’s have a closer look at each complexity.

O(1) – constant

O(1) means that complexity does not depend on the size of the input. No matter how big our input is it never changes the speed of our algorithm. On the chart, it’s represented as a horizontal line.

nNumber of Operations
15
105
1005

Although sometimes our constant might be quite big it’s still a constant and it doesn’t grow when the input changes.

Examples:

• Searching for an element in a HashMap (although it can degenerate to O(n) if hashes are not efficient)
• Adding two integers

O(log n) – logarithmic

O(log n) algorithms grow in a logarithmic time regarding the input. What’s the base of the logarithm? The answer is that it’s not really relevant as logarithms with different bases are different by a constant factor that we drop in Big O notation anyway. O(log n) logarithms are very often based on halving problems to find a result.

nNumber of Operations
10
10≈3.32
100≈6.64

Examples:

• Binary search
• Insertion, deletion and search in a binary tree (although it may degenerate to O(n))

O(n) – linear

O(n) algorithms have linear growth rate which means their complexity is strictly related to the input. Let’s have a look at the table below.

nNumber of Operations
11
1010
100100

Examples:

• finding an element in an array by iterating through the whole array
• printing all the elements of an array

O(n*log n) – quasilinear

O(n*log n) is a very common complexity of algorithms. It’s especially common amongst efficient sorting algorithms.

nNumber of Operations
10
10≈33.22
100≈664.39

Examples:

• the average case of Quick Sort
• fast Fourier transform

$$O(n^2)$$ – quadratic

$$O(n^2)$$ algorithms grow in a quadratic fashion in the relation to the input. This is where our algorithms start to become very inefficient and it is a very common practice to try to reduce quadratic complexity to O(n log n) if possible.

Let’s have a look at another table. We can easily see that the rate of growth is quite fast.

nNumber of Operations
11
10100
10010000

Examples:

• Bubble Sort
• Insertion Sort

$$O(2^n)$$ – exponential time

This is where our algorithms become really slow. Unfortunately, there is a set of problems that cannot be sped up any further and they end up having exponential time complexity.

nNumber of Operations
12
101024
1001.267651e+30

Examples:

• solving the travelling salesman problem using dynamic programming

O(n!) – factorial

The worst of them all. These types of algorithms are incredibly slow. Let’s have a look at the main chart. The orange line shoots up dramatically.

nNumber of Operations
11
103628800
1009.332622e+157

Example:

• solving travelling salesman problem by the brute force method
• generating all the possible permutations of a list

Why is it important to know all of this?

It’s all about being able to predict how well your algorithm performs when the input scales. When it comes to small numbers it may not really matter. Your bubble sort may work exactly the same as merge sort (or even faster), but when it comes to the big numbers,  complexity plays a huge role. Imagine that your algorithm accepts an array as its input. Let’s also imagine that the same algorithm processes every single element of that input array in one millisecond. Let’s pass an array with 1 million elements to our algorithm and let’s assume we just wrote 2 versions of that algorithm: one which works in O(log n) time and another one which works in O(n) time. This is how long it takes for 2 versions of our algorithm to complete its execution.

O(n): n=1 000 000 ms = 1000 seconds = 16 minutes

O(log n): log(1 000 000ms) = log(1000 seconds) = 9.96 seconds

The numbers speak for themselves. There is nothing really to add here. Let me just quote a famous proverb “time is money”. It may cost our company a fortune if we decide to use inefficient algorithms.

Summary

As you can see being able to determine the speed of an algorithm is a crucial skill for a software developer (and any other technologist too). There is a huge performance difference between particular algorithms and we always need to be aware of that. I hope I convinced you in this article that the complexity of algorithms and Big O notation matter and it’s extremely important to understand them.

What are robo-advisors?

What are robo-advisors? Have you ever come across a term robo-advisor? Have you ever wondering what that is? The first thing that […]