ALGORITHM

Sorting Algorithm
Sorting 分类：稳定排序和不稳定排序
Quick Sort
Merge Sort - Divide and Conquer
Binary Search
Insertion Sort
Bubble Sort
Graph Algorithm
BFS 广度优先搜索
DFS 深度优先搜索
Dijkstra - 最短路径算法 - TODO
二叉树遍历算法 - 基本思想：递归
最小生成树算法
Dynamic Programming
基本思想
经典动态规划问题总结
KMP算法

Sorting Algorithm

Sorting 分类：稳定排序和不稳定排序

稳定排序：排序前后两个相等的数相对位置不变，则算法稳定。
e.g. 插入排序，冒泡排序，归并排序。
不稳定排序：排序前后两个相等的数相对位置发生了变化，则算法不稳定。
e.g. 希尔排序、直接选择排序、堆排序、快速排序

Quick Sort

与merge sort一样，快速排序也是一种divide and conquer algorithm。它选择一个元素作为pivot，并围绕所选pivot对给定数组进行partition。有以下选择pivot的方式：
- 始终选择第一个元素作为pivot。
- 始终选择最后一个元素作为pivot。
- 选择一个随机元素作为pivot。
- 选择中位数作为pivot。
quickSort 中的关键过程是 partition()。partition的目标是，给定一个数组和数组的一个元素 x 作为 pivot，将 x 放在排序数组中的正确位置并将所有较小的元素（小于 x）放在 x 之前，并将所有更大的元素（大于比 x) 在 x 之后。所有这些都应该在线性时间内完成。
- 一种partition的逻辑为，从最左边的元素开始，跟踪较小（或等于）元素的索引为 i。在遍历时，如果我们找到一个更小的元素，我们将当前元素与 arr[i] 交换。否则，我们忽略当前元素。

Pseudo Code

Quicksort(A, left, right)
if |A| = 0 then return //A is empty 
end if
split = Partition(A, left, right) 
Quicksort(A, left, split − 1)
Quicksort(A, split + 1, right)

Initial call: Quicksort(A, 1, n)

Partition(A, left, right)
	pivot = A[right]
	split = left − 1

	for j=left to right−1 do
		if A[j] ≤ pivot then 
			swap(A[j], A[split + 1]) 
			split = split + 1
		end if 
	end for

	swap(pivot, A[split + 1])        //place pivot after A[split] (why?) 
	return split + 1                 //the final position of pivot

Implementation by 选择最后一个元素作为pivot

# Python3 implementation of QuickSort
# Function to find the partition position
def partition(array, low, high):

	# Choose the rightmost element as pivot
	pivot = array[high]

	# Pointer for greater element
	i = low - 1

	# Traverse through all elements
	# compare each element with pivot
	for j in range(low, high):
		if array[j] <= pivot:
			# If element smaller than pivot is found
			# swap it with the greater element pointed by i
			i = i + 1

			# Swapping element at i with element at j
			(array[i], array[j]) = (array[j], array[i])

	# Swap the pivot element with
	# e greater element specified by i
	(array[i + 1], array[high]) = (array[high], array[i + 1])

	# Return the position from where partition is done
	return i + 1

# Function to perform quicksort


def quick_sort(array, low, high):
	if low < high:

		# Find pivot element such that
		# element smaller than pivot are on the left
		# element greater than pivot are on the right
		pi = partition(array, low, high)

		# Recursive call on the left of pivot
		quick_sort(array, low, pi - 1)

		# Recursive call on the right of pivot
		quick_sort(array, pi + 1, high)

Time complexity - O(n*logn)

Merge Sort - Divide and Conquer

Merge sort 的工作原理是将数组划分为更小的子数组，对每个子数组进行排序，然后将已排序的子数组合并回一起以形成最终的排序数组。
可以把它想象成一种递归算法，不断地将数组分成两半，直到无法进一步分割为止。

Pseudo Code

step 1: start

step 2: declare array and left, right, mid variable

step 3: perform merge function.
    if left == right
        return
    mid= (left+right)/2
    mergesort(array, left, mid)
    mergesort(array, mid+1, right)
    merge(array, left, mid, right)

step 4: Stop

merge (A, left, right, mid)
	L = A[left, mid]
	R = A[mid + 1, right]
	Maintain two pointers p_L, p_R, initialized to point to the first
	elements of L, R, respectively
	while both lists are nonempty do
		Let x, y be the elements pointed to by pL, pR
		Compare x, y and append the smaller to the output
		Advance the pointer in the list with the smaller of x, y
	end while
	Append the remainder of the non-empty list to the output.

Remark: the output is stored directly in A[left, right], thus the
subarray A[left, right] is sorted after merge(A, left, right, mid).

Implementation

# Python program for implementation of MergeSort
def mergeSort(arr):
	if len(arr) > 1:

		# Finding the mid of the array
		mid = len(arr)//2

		# Dividing the array elements
		L = arr[:mid]

		# into 2 halves
		R = arr[mid:]

		# Sorting the first half
		mergeSort(L)

		# Sorting the second half
		mergeSort(R)

		i = j = k = 0

		# Copy data to temp arrays L[] and R[]
		while i < len(L) and j < len(R):
			if L[i] <= R[j]:
				arr[k] = L[i]
				i += 1
			else:
				arr[k] = R[j]
				j += 1
			k += 1

		# Checking if any element was left
		while i < len(L):
			arr[k] = L[i]
			i += 1
			k += 1

		while j < len(R):
			arr[k] = R[j]
			j += 1
			k += 1

Time complexity
$T(n) = 2T(n/2) + cn$ - $O(nlogn)$ by Master Theorem

Binary Search

Given a sorted array arr[] of n elements, write a function to search a given element x in arr[] and return the index of x in the array.
Algorithm
1. 将array升序排列。
2. 将第一个元素设为low index，最后一个元素设为high index。
3. 将middle index设为 (high + low) / 2
4. 如果middle index上的元素就是target，则返回middle。
5. 如果target小于middle index上的元素，则将 high 设为 middle - 1。
6. 如果target大于middle index上的元素，则将 low 设为 middle + 1。
7. 重复3-6直至找到target或者no such target found when no more element。

Recursive Method

# Python3 Program for recursive binary search.

# Returns index of x in arr if present, else -1

def binarySearch(arr, l, r, x):

	# Check base case
	if r >= l:

		mid = l + (r - l) // 2

		# If element is present at the middle itself
		if arr[mid] == x:
			return mid

		# If element is smaller than mid, then it
		# can only be present in left subarray
		elif arr[mid] > x:
			return binarySearch(arr, l, mid-1, x)

		# Else the element can only be present
		# in right subarray
		else:
			return binarySearch(arr, mid + 1, r, x)

	else:
		# Element is not present in the array
		return -1

Iteration Method

def binarySearch(v, To_Find):
	lo = 0
	hi = len(v) - 1

	# This below check covers all cases, so need to check
	# for mid=lo-(hi-lo)/2
	while hi - lo > 1:
		mid = (hi + lo) // 2
		if v[mid] < To_Find:
			lo = mid + 1
		else:
			hi = mid

	if v[lo] == To_Find:
		print("Found At Index", lo)
	elif v[hi] == To_Find:
		print("Found At Index", hi)
	else:
		print("Not Found")

Running Time
The time complexity of the binary search algorithm is O(log n). The best-case time complexity would be O(1) when the central index would directly match the desired value. Binary search worst case differs from that. The worst-case scenario could be the values at either extremity of the list or values not in the list.

Insertion Sort

过程

该算法通过重复从数组的未排序部分中取出一个元素并将其插入数组已排序部分中的正确位置来对项目数组进行排序。

Input: Array A
n = length(A)
1. 外部 for loop 从索引“1”开始并运行“n-1”次迭代，其中“n”是数组的长度。
2. 内部 while loop从外部 for 循环的当前索引 i 开始，
	 并将每个元素与其左邻居进行比较。 如果一个元素小于它的左邻居，则交换元素。
3. 内部 while loop 继续将元素向左移动，直到它小于其左侧的元素停止while loop。
4. 内部 while loop 完成后，当前索引"j"处的元素即为数组已排序部分中的正确位置。
5. 外层的 for 循环继续遍历数组，直到所有元素都在正确的位置并且数组已完全排序。

insertionSort(A: list of sortable items)
   n = length(A)
   for i = 1 to n - 1 do
       j = i
       while j > 0 and A[j-1] > A[j] do
           swap(A[j], A[j-1])
           j = j - 1
       end while
   end for
end algorithm

Time complexity - O(N^2)

Bubble Sort

比较相邻的元素，如果第一个比第二个大就进行交换，对每一对相邻元素做同样的工作。排序算法稳定，时间复杂度 O(n2)，空间复杂度 O(1)。

Graph Algorithm

BFS 广度优先搜索

Breadth-first search (BFS): explore G starting from s outward in all possible directions, adding reachable nodes one layer at a time.
- First add all nodes that are joined by an edge to the source: these nodes form the first layer. If G is unweighted, these are the nodes at distance 1 from s.
- Then add all nodes that are joined by an edge to a node in the first layer: these nodes form the second layer.
- If G is unweighted, these are the nodes at distance 2 from s.
- And so on and so forth.
Facts of BFS
- $L_i$ is the set of nodes that are at distance $i$ from s. Equivalently, the length of the shortest s-v path for all $v ∈ L_i$ equals $i$ .

Pseudo Code with explanation

BFS(G = (V, E), s ∈ V )
	# discovered array表示一个vertex是否已被访问过
	# 如果被访问过则记为1，没有则记为0
	array discovered[V] initialized all vertices to 0 ---- O(n)

	# The dist array记录了source到每一个vertex的最短路径
	# 除了source自身，所有vertex的初始路径都为无穷大
	array dist[V] initialized to infinity array ---- O(n)

	# The parent array记录了每一个vertex的parent
	parent[V] initialized to NIL ---- O(n)

	# q记录正在被处理或者当前被访问的vertices
	queue q ---- O(1)

	# Initialize with source ---- O(1)
	discovered[s] = 1
	dist[s] = 0
	parent[s] = NIL

	# 因为source是第一个visited的vertex, 将当前要处理的vertex s加入queue
	enqueue(q, s) ---- O(1)

	# 当当前需要处理的queue不为空时
	while size(q)>0 do
	# 取出第一个vertex 'u'并检索所有从'u'延伸出的edges(u, v)
	u =dequeue(q)
		# 对于每一个与'u'相连的vertex 'v'
		for (u, v) ∈ E do
		# 如果'v'还未被访问过，即discovered[v] == 0
			if discovered[v] == 0 then 
				# 则标记'v'为已访问过，即discovered[v] == 1
				discovered[v] = 1
				# 对于unweighted graph，source到'v'的距离 = source到'u'的距离 + 1
				dist[v] = dist[u] + 1
				# 更新当前vertex 'v'的parent为'u'
				parent[v] = u
				# 把'v'添加进q作为下一个layer要检索的vertex
				enqueue(q, v)
				# 如果vertex'v'已被访问过，则进入下一个for loop检索未被访问的'v'
			end if 
		end for
	end while

Time complexity analysis
前面初始化部分的时间复杂度为O(n)，因为只需要遍历每个节点一次。对于while循环来讲，q的最大数量就是vertices的总数，所以while loop不会超过|V|次迭代，即O(|V|)。对于for loop，检索每一个和’u’临近的节点v，显然不会超过|E|次迭代，这一步的时间实际上与’u’的degrees成比例。因此，检查图中与所有顶点相邻的所有边所花费的总时间与所有顶点的degrees之和成正比，即 2|E| (因为每条边都对恰好两个顶点)。
综上所述，BFS的时间复杂度为O(|V| + |E|)。

Implementation - without tracking the distance and parent

# Python3 Program to print BFS traversal
# from a given source vertex. BFS(int s)
# traverses vertices reachable from s.
from collections import defaultdict

# This class represents a directed graph
# using adjacency list representation


class Graph:

	# Constructor
	def __init__(self):

		# default dictionary to store graph
		self.graph = defaultdict(list)

	# function to add an edge to graph
	def addEdge(self, u, v):
		self.graph[u].append(v)

	# Function to print a BFS of graph
	def BFS(self, s):

		# Mark all the vertices as not visited
		visited = [False] * (max(self.graph) + 1)

		# Create a queue for BFS
		queue = []

		# Mark the source node as
		# visited and enqueue it
		queue.append(s)
		visited[s] = True

		while queue:

			# Dequeue a vertex from
			# queue and print it
			s = queue.pop(0)
			print(s, end=" ")

			# Get all adjacent vertices of the
			# dequeued vertex s. If a adjacent
			# has not been visited, then mark it
			# visited and enqueue it
			for i in self.graph[s]:
				if visited[i] == False:
					queue.append(i)
					visited[i] = True

应用

1. A natural way to solve 2-colorability(bipartite-ness)

If a graph contains an odd-length cycle, then it is not 2-colorable.

bipartite(G = (V,E))
	1. Start BFS from any vertex; color it red.
	2. Color white all nodes in the first layer L1 of the BFS tree.
	If there is an edge between two nodes in L1, output no and stop.
	3. Otherwise, continue from layer L1, coloring red the vertices
	in even layers and white in odd layers.
	4. If BFS terminates and all nodes in V have been explored 
	(hence 2-colored), output yes.

2. 找到所有的SCC

AllConnectedComponents(G = (V, E))
	1. Start with an arbitrary node s; run BFS(G, s) and output
	the resulting BFS tree as one connected component.
	2. Continue with any node u that has not been visited by
	BFS(G, s); run BFS from u and output the resulting BFS
	tree as one connected component.
	3. Repeat until all nodes in V have been visited.

DFS 深度优先搜索

Depth-first search is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking.
Basic idea: Start from the root or any arbitrary node and mark the node and move to the adjacent unmarked node and continue this loop until there is no unmarked adjacent node. Then backtrack and check for other unmarked nodes and traverse them. Finally, print the nodes in the path.

Algorithm with explanation

# array explored来记录一个vertex是否已被访问
DFS(G = (V, E)) 
	# Initialize the global variable time to 0
	time = 0
	# 对于每一个V中的vertex，标记每一个vertex的初始状态为未被访问
	for u ∈ V do
		explored[u] = 0 
	end for
	
	# 对于每一个V中的vertex u
	for u ∈ V do
		# 如果'u'尚未被访问，则call检索function search(u)
		if explored[u] == 0 then 
			Search(u) 
		end if
	end for

Search(u) 
	previsit(u)
	#更改'u'的标记为已被访问
	explored[u] = 1 
	# 对于每一个与'u'相连的vertex 'v'
	for (u, v) ∈ E do
		# 如果'v'未被访问，则call search(v)
		if explored[v] == 0 then 
			Search(v)
		end if 
	end for
	postvisit(u)

# start(u) is the time when u is pushed in the stack
# finish(u) is the time when u is popped from the stack

previsit(u)
	time += 1
	start(u) = time
	push(stack, u)

postvisit(u)
	time += 1
	finish(u) = time
	pop(stack, u)

Here is a flow chart of how this algorithm works.

Time complexity analysis
The time complexity of the DFS algorithm is O(|V| + |E|), where |V| is the number of vertices in the graph, and |E| is the number of edges in the graph. The initialization of the 'explored' array takes O(|V|) time, and the Search function is called for each vertex in the graph, which takes O(|V|) time. The total time taken to examine all edges in the graph is proportional to the sum of the degrees of all vertices, which is 2|E|. Therefore, the time complexity of examining all edges in the graph is O(|E|). Overall, the time complexity of the DFS algorithm is O(|V| + |E|).

Implementation

# Python3 program to print DFS traversal
# from a given graph
from collections import defaultdict

# This class represents a directed graph using
# adjacency list representation


class Graph:

	# Constructor
	def __init__(self):

		# default dictionary to store graph
		self.graph = defaultdict(list)

	# function to add an edge to graph
	def addEdge(self, u, v):
		self.graph[u].append(v)

	# A function used by DFS
	def DFSUtil(self, v, visited):

		# Mark the current node as visited
		# and print it
		visited.add(v)
		print(v, end=' ')

		# Recur for all the vertices
		# adjacent to this vertex
		for neighbour in self.graph[v]:
			if neighbour not in visited:
				self.DFSUtil(neighbour, visited)

	# The function to do DFS traversal. It uses
	# recursive DFSUtil()
	def DFS(self, v):

		# Create a set to store visited vertices
		visited = set()

		# Call the recursive helper function
		# to print DFS traversal
		self.DFSUtil(v, visited)

应用
- 1. 检测cycle: G = (V, E) has a cycle if and only if DFS(G) yields a back edge.
- 2. Topological sorting in DAGs (DAG - Directed acyclic graph)

Dijkstra - 最短路径算法 - TODO

Given a graph and a source vertex in the graph, find the shortest paths from the source to all vertices in the given graph.

Pseudocode with an explanation - Greedy Algorithm 贪心算法

Dijkstra-v1(G = (V, E, w), s ∈ V) 

	Initialize(G, s)

	# Initialize the set S with only source vertex s.
	S = {s}

	# This loop continues until S contains all vertices in the graph.
	while S != V do
		# This loop iterates over all vertices x 
		# that are not in S and have at least one edge from S.
		For every x ∈ V − S with at least one edge from S compute
			d(x) = min {dist[u] + w_{ux}}
					u ∈ S,(u,x) ∈ E
		Select v such that d(v) = min d(x)
														x ∈ V−S
		S = S ∪ {v} 
		dist[v] = d(v) 
		prev[v] = u
	end while

# This function initializes the distances and predecessors of all vertices in the graph. 
# It sets the distance of the source vertex s to 0 and the distance of all other vertices to infinity.
Initialize(G, s) 
	for v ∈ V do
		dist[v] = ∞
		prev[v] = NIL 
	end for
	dist[s] = 0

二叉树遍历算法 - 基本思想：递归

前序遍历 - Preorder - 根结点 ---> 左子树 ---> 右子树

def preorderTraversal(root):
    if root != None:
        # Visit the root node
        print(root.val)
        # Recursively traverse the left subtree
        preorderTraversal(root.left)
        # Recursively traverse the right subtree
        preorderTraversal(root.right)

中序遍历 - Inorder - 左子树 ---> 根结点 ---> 右子树

def inorderTraversal(root):
    if root != None:
        # Recursively traverse the left subtree
        inorderTraversal(root.left)
        # Visit the root node
        print(root.val)
        # Recursively traverse the right subtree
        inorderTraversal(root.right)

后序遍历 - Postorder - 左子树 ---> 右子树 ---> 根结点

def postorderTraversal(root):
    if root != None:
        # Recursively traverse the left subtree
        postorderTraversal(root.left)
        # Recursively traverse the right subtree
        postorderTraversal(root.right)
        # Visit the root node
        print(root.val)

最小生成树算法

最小权重生成树（minimum weight spanning tree）的简称。对于有 n 个结点的原图，生成原图的极小连通子图，使其包含原图中的所有 n 个结点，并且有保持图连通的最少的边，或者权重和最小的边。
普里姆算法：取图中任意一个顶点 v 作为生成树的根，之后往生成树上添加新的顶点 w。在添加的顶点 w 和已经在生成树上的顶点v 之间必定存在一条边，并且该边的权值在所有连通顶点 v 和 w 之间的边中取值最小。之后继续往生成树上添加顶点，直至生成树上含有 n-1 个顶点为止。
克鲁斯卡尔算法：先构造一个只含 n 个顶点的子图 SG，然后从权值最小的边开始，若它的添加不使 SG 中产生回路，则在 SG 上加上这条边，如此重复，直至加上 n-1 条边为止。

Dynamic Programming

基本思想

讲一个问题拆解成子问题，直到子问题可以解决，把子问题的答案保存起来，从而减少重复计算。随后根据子问题的答案进行反推，得到原问题解。

经典动态规划问题总结

KMP算法

前缀：不包含尾字符的所有子串

后缀：不包含首字符的所有子串

KMP：找到最长的相等前后缀

前缀表

e.g. aabaaf

例题：匹配串：aabaabaaf；模式串：aabaaf

从匹配串的首开始匹配，到模式串中的f出现时，与匹配串中的b不匹配，那么不需要一个一个向后移动匹配串中的指针，而是直接跳到模式串中最长相等前缀的下一个位置继续匹配，此题中相当于index=2，即b字符的位置。