Data structures and algorithms is a very important topic in computer science. It studies various algorithms and their performance, teaching programmers how to develop fast and efficient solutions.
The book I have been reading to learn about this topic, which I highly recommend, is "Grokking Algorithms". The book covers the fundamentals in a simple, approachable manner. Even though the topic is usually mathematically dense, the book does not require deep mathematical knowledge, allowing anyone to grasp the core concepts.
These are my personal notes about the book, so that I have all the concepts at hand when I need them. I hope you can find them useful as well; if you have any suggestions or corrections, please contact me.
An algorithm is a precise set of actions to follow to accomplish a task. Algorithms are everywhere in our everyday life; a simple example of an algorithm could be a recipe to cook a certain dish. In the context of computer science, algorithms are a set of instructions that a computer needs to follow to solve a problem or accomplish a task. We instruct the computer to perform a set of steps by translating the algorithm into a program.
When we face a problem we can find multiple ways to solve it, and most of the time, one way is better than the others. Therefore, we can think of multiple algorithms to accomplish the same task, but their performance can vary significantly.
The usual way to keep track of an algorithm's performance is to count the maximum number of steps it takes to solve the problem, relative to the size of the problem we are facing. For example, the simple search algorithm, which is linear, takes n steps in the worst-case scenario to find an element inside an array of n elements. Therefore, its performance is of order n, which in big O notation is written as O(n).
The big O notation essentially means "of order", and it describes how the number of steps required to solve the problem grows with respect to the size of the problem n. This notation helps us understand how the algorithm scales regardless of the specific details of the problem.
Some recurrent types of performance in terms of big O notation are:

- O(n): linear (ex. simple search)
- O(log(n)): logarithmic (ex. binary search)
- O(n^2): quadratic (ex. selection sort)
- O(n*log(n)): linearithmic (ex. quicksort)
- O(n!): factorial (ex. traveling salesperson)
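To get a feel for how differently these orders grow, here is a quick sketch that prints illustrative step counts at n = 1024 (constants are ignored; the numbers are not real measurements):

```python
import math

# Rough step counts for each order of growth at n = 1024
# (purely illustrative; constant factors are ignored).
n = 1024
print(f"O(log n)   : {math.log2(n):.0f}")      # 10
print(f"O(n)       : {n}")                     # 1024
print(f"O(n log n) : {n * math.log2(n):.0f}")  # 10240
print(f"O(n^2)     : {n ** 2}")                # 1048576
print(f"O(n!)      : {math.factorial(20)} (already at n = 20!)")
```

Notice how the factorial order explodes long before reaching n = 1024; this is why the traveling salesperson problem below is considered intractable.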
Now, let's look at a specific algorithm: binary search. This algorithm is fast and solves the problem of searching for an item in an ordered list. In our case we will think about looking for a person inside of the phonebook.
To appreciate the power of binary search let's first consider simple search. When searching for a person in a phonebook using simple search, one would start turning the pages one at a time from the beginning until finding the name. In this case the algorithm's performance is O(n) since, in the worst-case scenario, every page of the book needs to be checked. Thus, as the number of pages in the phonebook grows, the steps needed to perform the search grow linearly.
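A minimal sketch of simple search over a list of names (the function name and the example phonebook are made up for illustration):

```python
def simple_search(name, names):
    # Check every entry from the beginning; worst case: n steps.
    for i, current in enumerate(names):
        if current == name:
            return i  # position where the name was found
    return None  # name not present

phonebook = ["Alice", "Bob", "Carol", "Dave"]
print(simple_search("Carol", phonebook))  # 2
print(simple_search("Zed", phonebook))    # None
```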
However, simple search is not very efficient. In fact, intuitively, when looking for a specific name in a phonebook, one would open it around the middle and move forward or backward based on the starting letter.
A more efficient algorithm is binary search. To simplify the explanation we will imagine ripping the phonebook in half at each step and keeping only the half the name could be in. In this case the "recipe" for this algorithm would be ripping the phonebook in half over and over again until finding the name or determining it's not present.
In this case the performance is much better: if you double the phonebook's pages, the number of additional steps required to solve the problem is just one. Translating this consideration into big O notation, the algorithm's performance is of logarithmic order, O(log(n)).
Here is an example of a binary search Python function that looks for a number in an ordered list of integers.
def binary_search(number, mylist):
    # Returns the number found and how many iterations it took,
    # or None if the number is not in the list.
    iterations = 0
    start = 0
    end = len(mylist) - 1
    while start <= end:
        iterations += 1
        mid = (start + end) // 2  # middle of the current range
        if mylist[mid] == number:
            return [number, iterations]
        elif mylist[mid] < number:
            start = mid + 1  # discard the left half
        else:
            end = mid - 1  # discard the right half
    return None
Let's now explore an algorithm that performs badly. The problem to solve in this case is the following: a salesperson needs to visit a certain number n of cities to sell their goods. To save time and fuel, they would like to know the order in which to visit the cities so as to travel the least amount of kilometers possible. To find the best route, all the routes made of all the possible orderings of the cities need to be compared, and there are n! of them. Thus the performance of the algorithm solving this problem is O(n!), which is really bad, and yet this is the best exact algorithm known so far, unless approximate solutions are used.
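A brute-force sketch of this idea, using made-up city coordinates and straight-line distances (everything here is for illustration only), could look like this:

```python
from itertools import permutations

# Hypothetical city coordinates (made up for illustration).
cities = {"A": (0, 0), "B": (1, 5), "C": (4, 1), "D": (3, 3)}

def route_length(route):
    # Sum of straight-line distances between consecutive cities.
    return sum(
        ((cities[a][0] - cities[b][0]) ** 2
         + (cities[a][1] - cities[b][1]) ** 2) ** 0.5
        for a, b in zip(route, route[1:])
    )

# Brute force: try all n! orderings and keep the shortest.
# With 4 cities that's 24 routes; with 20 cities it would
# already be about 2.4 * 10^18 routes.
best = min(permutations(cities), key=route_length)
print(best, round(route_length(best), 2))
```

Even this tiny example makes the n! growth tangible: adding a single city multiplies the number of routes to compare by the new total number of cities.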