Lists

List

Note: This is the first post in this blog.

I will investigate basic data structures and algorithms to understand how they can be implemented and how the implementation affect their performance. The contents is mainly based on Data Structures and Algorithm Analysis (https://people.cs.vt.edu/shaffer/Book/). In the book the implementations are written in Java, however, I will use Scala instead. Also, my own analyses are added to the contennts. Enough talking, let's begin.

The list is arguably the most fundamental data structure in computer science. It is just a simple sequence that contain multiple data. However, there are more than one implementation for it. The space complexity of a list and the time complexities of the operations on it is dependent on the implementation. Put implementation aside, we can define the operations we want a list to perform. Let's call it ADT (abstract data types). ADT consists of operations the data structure must perform however they are implemented. It can have the smallest number of operations so that combining the operations can do more complicated operations. First, let's consider it for the list.

List "barebone" ADT

In [6]:
trait BareBoneListADT[E] {
    def append(item: E): Unit
    
    def getValue: Option[E]
}
Out[6]:
defined trait BareBoneListADT

However, straightforward implementation of this ADT cause performance issue for some of the operations. I will detail it later. Instead, we can come up with ADT with more operation to perform better for the real world requirements.

List "realistic" ADT

In [1]:
trait ListADT[E] {
    def clear: Unit
    
    def insert(item: E): Unit
    
    def append(item: E): Unit
    
    def remove: Option[E]
    
    def moveToStart: Unit
    
    def moveToEnd: Unit
    
    def prev: Unit
    
    def next: Unit
    
    def length: Int
    
    def currPos: Int
    
    def moveToPos(pos: Int): Unit
    
    def getValue: Option[E]
}
Out[1]:
defined trait ListADT

Note that We have done important design decision in this ADT that we make it hold the current position in a list. There are two distinctive ways we can implement a list. One is called array-based list, and the other is called linked list.

Array-based list implementation

In [2]:
import scala.reflect.ClassTag

class AList[E: ClassTag](size: Int = 10) extends ListADT[E] {
    private val maxSize: Int = size
    private var listSize: Int = 0
    private var curr: Int = 0
    private val listArray: Array[E] = new Array[E](maxSize)
    
    def clear: Unit = {
        listSize = 0
        curr = 0
    }
    
    def insert(item: E): Unit = {
        for (i <- listSize until curr by -1) listArray(i) = listArray(i - 1)
        listArray(curr) = item
        listSize += 1
    }
    
    def append(item: E): Unit = {
        listSize += 1
        listArray(listSize) = item
    }
    
    def remove: Option[E] = {
        curr match {
            case _ if curr < 0 || curr >= listSize => None
            case _ =>
                val item = listArray(curr)
                for (i <- curr until listSize - 1) listArray(i) = listArray(i + 1)
                listSize -= 1
                Some(item)
        }
    }
    
    def moveToStart: Unit = curr = 0
    
    def moveToEnd: Unit = curr = listSize
    
    def prev: Unit = if (curr != 0) curr -= 1
    
    def next: Unit = if (curr < listSize) curr += 1
    
    def length: Int = listSize
    
    def currPos: Int = curr
    
    def moveToPos(pos: Int): Unit = {
        assert(pos >= 0 && pos <= listSize)
        curr = pos
    }
    
    def getValue: Option[E] = {
        // use pattern match to return None if curr is out of range
        curr match {
            case _ if curr >= 0 && curr < listSize => Some(listArray(curr))
            case outOfRange => None
        }
    }
}
Out[2]:
import scala.reflect.ClassTag


defined class AList

    Linked list implementation

    In [10]:
    class Link[E](item: Option[E], nextVal: Option[Link[E]]) {
        private var linkElement: Option[E] = item
        
        private var nextLink: Option[Link[E]] = nextVal
        
        def next: Option[Link[E]] = nextLink
        
        def setNext(nextVal: Option[Link[E]]): Unit = nextLink = nextVal
        
        def element: Option[E] = linkElement
        
        def setElement(item: Option[E]): Unit = linkElement = item
    }
    
    class LList[E](size: Int = 0) extends ListADT[E] {
        // ignore the input size which is there for the consistency with AList
        private var head: Link[E] = new Link[E](None, None)
        
        private var tail: Link[E] = head
        
        private var curr: Link[E] = head
        
        private var listSize: Int = 0
        
        def clear: Unit = {
            head.setNext(None)
            head = new Link[E](None, None)
            curr = head
            tail = head
            listSize = 0
        }
        
        def insert(item: E): Unit = {
            curr.setNext(Some(new Link[E](Some(item), curr.next)))
            if (tail == curr) tail = curr.next.get
            listSize += 1
        }
        
        def append(item: E): Unit = {
            tail.setNext(Some(new Link[E](Some(item), None)))
            tail = tail.next.get
            listSize += 1
        }
        
        def remove: Option[E] = {
            curr.next match {
                case None => None
                case Some(link) =>
                    val optionItem = link.element
                    if (tail == link) tail = curr
                    curr.setNext(link.next)
                    listSize -= 1
                    optionItem
            }
        }
        
        def moveToStart: Unit = curr = head
        
        def moveToEnd: Unit = curr = tail
        
        def prev: Unit = {
            if (curr == head) return
            var temp: Link[E] = head
            // Since temp is between head and curr, temp.next is not None
            while (temp.next.get != curr) temp = temp.next.get
            curr = temp
        }
        
        def next: Unit = if (curr != tail) curr = curr.next.get
        
        def length: Int = listSize
        
        def currPos: Int = {
            var temp: Link[E] = head
            var i = 0
            while (i < listSize) {
                if (curr == temp) return i
                temp = temp.next.get
                i += 1
            }
            i
        }
        
        def moveToPos(pos: Int): Unit = {
            assert (pos >= 0 && pos <= listSize)
            curr = head
            for (i <- 0 until pos) curr = curr.next.get
        }
        
        def getValue: Option[E] = {
            curr.next match {
                case Some(link) => link.element
                case None => None
            }
        }
    }
    
    Out[10]:
    defined class Link
    defined class LList

    This implementation holds several design decision for convenience. The most important one is that the curr (current position) points to the element preceding to the current position. It allows the linked list can perform some of the operation efficiently.

    Since this linked list is singly-linked, that is, only have the pointer to the next element, some of the operations such as prev takes $\Theta(n)$ time. One Obvious improvement of singly-linked list is the doubly-linked list. Doubly-linked list lowers time complexity for moving the current position to one previous position though it takes more space on memory.

    Comparison of list implementations

    Time complexity

    Operation Array Singly-linked Doubly-linked
    clear $\Theta(1)$ $\Theta(1)$ $\Theta(1)$
    insert $\Theta(n)$ $\Theta(1)$ $\Theta(1)$
    append $\Theta(1)$ $\Theta(1)$ $\Theta(1)$
    remove $\Theta(n)$ $\Theta(1)$ $\Theta(1)$
    moveToStart $\Theta(1)$ $\Theta(1)$ $\Theta(1)$
    moveToEnd $\Theta(1)$ $\Theta(1)$ $\Theta(1)$
    prev $\Theta(1)$ $\Theta(n)$ $\Theta(1)$
    next $\Theta(1)$ $\Theta(1)$ $\Theta(1)$
    length $\Theta(1)$ $\Theta(1)$ $\Theta(1)$
    currPos $\Theta(1)$ $\Theta(n)$ $\Theta(n)$
    moveToPos $\Theta(1)$ $\Theta(n)$ $\Theta(n)$
    getValue $\Theta(1)$ $\Theta(1)$ $\Theta(1)$

    Space complexity and other limitations

    Since array-based list needs to know what the maximum number of elements is and takes space for them, it takes $\Omega(n)$ space.

    Linked list does not need to specify the number of elements when it is initialized and it takes $\Theta(n)$ space. However, linked list needs one or two pointers for each element. It is significant disadvantage compared to array-based list when most of the list is empty.