List
Note: This is the first post in this blog.
I will investigate basic data structures and algorithms to understand how they can be implemented and how the implementation affect their performance. The contents is mainly based on Data Structures and Algorithm Analysis (https://people.cs.vt.edu/shaffer/Book/). In the book the implementations are written in Java, however, I will use Scala instead. Also, my own analyses are added to the contennts. Enough talking, let's begin.
The list is arguably the most fundamental data structure in computer science. It is just a simple sequence that contain multiple data. However, there are more than one implementation for it. The space complexity of a list and the time complexities of the operations on it is dependent on the implementation. Put implementation aside, we can define the operations we want a list to perform. Let's call it ADT (abstract data types). ADT consists of operations the data structure must perform however they are implemented. It can have the smallest number of operations so that combining the operations can do more complicated operations. First, let's consider it for the list.
List "barebone" ADT
trait BareBoneListADT[E] {
def append(item: E): Unit
def getValue: Option[E]
}
However, straightforward implementation of this ADT cause performance issue for some of the operations. I will detail it later. Instead, we can come up with ADT with more operation to perform better for the real world requirements.
List "realistic" ADT
trait ListADT[E] {
def clear: Unit
def insert(item: E): Unit
def append(item: E): Unit
def remove: Option[E]
def moveToStart: Unit
def moveToEnd: Unit
def prev: Unit
def next: Unit
def length: Int
def currPos: Int
def moveToPos(pos: Int): Unit
def getValue: Option[E]
}
Note that We have done important design decision in this ADT that we make it hold the current position in a list. There are two distinctive ways we can implement a list. One is called array-based list, and the other is called linked list.
Array-based list implementation
import scala.reflect.ClassTag
class AList[E: ClassTag](size: Int = 10) extends ListADT[E] {
private val maxSize: Int = size
private var listSize: Int = 0
private var curr: Int = 0
private val listArray: Array[E] = new Array[E](maxSize)
def clear: Unit = {
listSize = 0
curr = 0
}
def insert(item: E): Unit = {
for (i <- listSize until curr by -1) listArray(i) = listArray(i - 1)
listArray(curr) = item
listSize += 1
}
def append(item: E): Unit = {
listSize += 1
listArray(listSize) = item
}
def remove: Option[E] = {
curr match {
case _ if curr < 0 || curr >= listSize => None
case _ =>
val item = listArray(curr)
for (i <- curr until listSize - 1) listArray(i) = listArray(i + 1)
listSize -= 1
Some(item)
}
}
def moveToStart: Unit = curr = 0
def moveToEnd: Unit = curr = listSize
def prev: Unit = if (curr != 0) curr -= 1
def next: Unit = if (curr < listSize) curr += 1
def length: Int = listSize
def currPos: Int = curr
def moveToPos(pos: Int): Unit = {
assert(pos >= 0 && pos <= listSize)
curr = pos
}
def getValue: Option[E] = {
// use pattern match to return None if curr is out of range
curr match {
case _ if curr >= 0 && curr < listSize => Some(listArray(curr))
case outOfRange => None
}
}
}
Linked list implementation
class Link[E](item: Option[E], nextVal: Option[Link[E]]) {
private var linkElement: Option[E] = item
private var nextLink: Option[Link[E]] = nextVal
def next: Option[Link[E]] = nextLink
def setNext(nextVal: Option[Link[E]]): Unit = nextLink = nextVal
def element: Option[E] = linkElement
def setElement(item: Option[E]): Unit = linkElement = item
}
class LList[E](size: Int = 0) extends ListADT[E] {
// ignore the input size which is there for the consistency with AList
private var head: Link[E] = new Link[E](None, None)
private var tail: Link[E] = head
private var curr: Link[E] = head
private var listSize: Int = 0
def clear: Unit = {
head.setNext(None)
head = new Link[E](None, None)
curr = head
tail = head
listSize = 0
}
def insert(item: E): Unit = {
curr.setNext(Some(new Link[E](Some(item), curr.next)))
if (tail == curr) tail = curr.next.get
listSize += 1
}
def append(item: E): Unit = {
tail.setNext(Some(new Link[E](Some(item), None)))
tail = tail.next.get
listSize += 1
}
def remove: Option[E] = {
curr.next match {
case None => None
case Some(link) =>
val optionItem = link.element
if (tail == link) tail = curr
curr.setNext(link.next)
listSize -= 1
optionItem
}
}
def moveToStart: Unit = curr = head
def moveToEnd: Unit = curr = tail
def prev: Unit = {
if (curr == head) return
var temp: Link[E] = head
// Since temp is between head and curr, temp.next is not None
while (temp.next.get != curr) temp = temp.next.get
curr = temp
}
def next: Unit = if (curr != tail) curr = curr.next.get
def length: Int = listSize
def currPos: Int = {
var temp: Link[E] = head
var i = 0
while (i < listSize) {
if (curr == temp) return i
temp = temp.next.get
i += 1
}
i
}
def moveToPos(pos: Int): Unit = {
assert (pos >= 0 && pos <= listSize)
curr = head
for (i <- 0 until pos) curr = curr.next.get
}
def getValue: Option[E] = {
curr.next match {
case Some(link) => link.element
case None => None
}
}
}
This implementation holds several design decision for convenience. The most important one is that the curr (current position) points to the element preceding to the current position. It allows the linked list can perform some of the operation efficiently.
Since this linked list is singly-linked, that is, only have the pointer to the next element, some of the operations such as prev takes $\Theta(n)$ time. One Obvious improvement of singly-linked list is the doubly-linked list. Doubly-linked list lowers time complexity for moving the current position to one previous position though it takes more space on memory.
Comparison of list implementations
Time complexity
Operation | Array | Singly-linked | Doubly-linked |
---|---|---|---|
clear | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |
insert | $\Theta(n)$ | $\Theta(1)$ | $\Theta(1)$ |
append | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |
remove | $\Theta(n)$ | $\Theta(1)$ | $\Theta(1)$ |
moveToStart | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |
moveToEnd | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |
prev | $\Theta(1)$ | $\Theta(n)$ | $\Theta(1)$ |
next | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |
length | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |
currPos | $\Theta(1)$ | $\Theta(n)$ | $\Theta(n)$ |
moveToPos | $\Theta(1)$ | $\Theta(n)$ | $\Theta(n)$ |
getValue | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |
Space complexity and other limitations
Since array-based list needs to know what the maximum number of elements is and takes space for them, it takes $\Omega(n)$ space.
Linked list does not need to specify the number of elements when it is initialized and it takes $\Theta(n)$ space. However, linked list needs one or two pointers for each element. It is significant disadvantage compared to array-based list when most of the list is empty.