Database Fundamentals

Databases, Algorithms, Indexing, Scaling

What is a Database?

  • A database is a collection of data that is stored in a computer system.
  • Databases allow their users to retrieve, update, and manage data.
  • Generally databases use persistent storage to store data. Although, some databases use in-memory storage.

How databases work?

  1. Transport layer recieves a query.
  2. The query is parsed and validated.
  3. The query is optimized.
  4. The query is executed with the help of the storage engine.
  5. The result is sent back to the client.

A simple database

{"id":1, "title": "Todo X", "ownerId": 1, "done": true}
{"id":2, "title": "Todo Y", "ownerId": 2, "done": false}
{"id":3, "title": "Todo Z", "ownerId": 3, "done": true}
{"id":1, "title": "Todo U", "ownerId": 2, "done": true}
{"id":2, "title": "Todo V", "ownerId": 2, "done": false}
{"id":3, "title": "Todo W", "ownerId": 1, "done": true}
  • We can store our data in a plain text file.
  • Each line is a valid JSON object.
  • We can read the file line by line and parse each line as a JSON object.
  • It is simple but not efficient.

Time Complexity

  • Time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm.
  • Since an algorithm's running time may vary among different inputs of the same size, one commonly considers the worst-case time complexity, which is the maximum amount of time required for inputs of a given size.

Big O Notation

Definition 1: if and only if there exist positive constants and such that for all .

Definition 2: as

  • : Constant time
  • : Logarithmic time
  • : Linear time
  • : Linearithmic time
  • : Quadratic time
  • : Polynomial time
  • : Exponential time

Time complexity examples

  • :
    function add(a: number, b: number): number {
      return a + b;
    }
    
  • :
    function sum(arr: number[]): number {
      let sum = 0;
      for (let i = 0; i < arr.length; i++) {
        sum += arr[i];
      }
      return sum;
    }
    

Time complexity examples

  • :
    function bubbleSort(arr: number[]): number[] {
      for (let i = 0; i < arr.length; i++) {
        for (let j = 0; j < arr.length - i - 1; j++) {
          if (arr[j] > arr[j + 1]) {
            const temp = arr[j];
            arr[j] = arr[j + 1];
            arr[j + 1] = temp;
          }
        }
      }
      return arr;
    }
    

Complexity of Searching and Sorting Algorithms

center

Complexity of our simple database

{"id":1, "title": "Todo X", "ownerId": 1, "done": true}
{"id":2, "title": "Todo Y", "ownerId": 2, "done": false}
{"id":3, "title": "Todo Z", "ownerId": 3, "done": true}
{"id":1, "title": "Todo U", "ownerId": 2, "done": true}
{"id":2, "title": "Todo V", "ownerId": 2, "done": false}
{"id":3, "title": "Todo W", "ownerId": 1, "done": true}
  • Get item with its id:
  • Get items with a specific owner id:
  • Get items with a specific done status:
  • Write an item:
  • Update an item:
  • Delete an item:

Hash Tables

center

Hash Index

  • A hash index is an index that is built on a hash table.
  • Hash indexes are used to quickly locate a record in a database table.
  • Hash indexes has time complexity for read and write operations.
  • Hash indexing only supports equality queries.

Binary Trees

center

BTrees

center

Indexing

Joins

CAP Theorem

  • Consistency: All nodes see the same data at the same time.
  • Availability: Every request receives a response.
  • Partition Tolerance: The system continues to operate despite network partitions.

ACID Properties

  • Atomicity: All operations in a transaction are successful or none of them are.
  • Consistency: The database is always in a consistent state.
  • Isolation: Transactions are isolated from each other.
  • Durability: Once a transaction is committed, it is permanent.

SQL vs NoSQL

  • SQL databases are relational databases.
  • NoSQL means "Not Only SQL".
  • NoSQL databases may be relational or non-relational.
  • Also SQL databases may be non-relational.

SQL Databases

  • Generally uses vertical scaling and replication.
  • Generally uses BTree for indexing but also supports other types of indexes.
  • Generally uses transactions.
  • Generally they are ACID compliant.
  • Generally they are not fault tolerant.
  • Generally they are not elastic.
  • Some examples of SQL databases are MySQL, PostgreSQL, and Amazon Aurora.

NoSQL Database Types

  • Key-Value Stores
  • Document Stores
  • Column-Family Stores
  • Graph Databases

Key-Value Stores

  • Key-Value stores are the simplest type of NoSQL databases.
  • Key-Value stores are used to store data in a key-value format.
  • Key-Value stores are used to store data in a distributed system.
  • Key-Value stores are used to store data in a fault tolerant system.
  • Some examples of Key-Value stores are Redis, Memcached, and Dynamo DB.

Document Stores

  • Document stores are used to store data in a document format.
  • Document stores are used to store data in a distributed system.
  • Some examples of Document stores are MongoDB, Couchbase, and CouchDB.

Column-Family Stores

  • Column-Family stores are used to store data in a column-family format.
  • Column-Family stores are used to store data in a distributed system.
  • Some examples of Column-Family stores are Cassandra, HBase, and ScyllaDB.

Graph Databases

  • Graph databases are used to store data in a graph format.
  • Graph databases are used to store data in a distributed system.
  • Some examples of Graph databases are Neo4j, ArangoDB, and Amazon Neptune.

Database Scaling

  • Scaling a database is an hard task compared to scaling computation.
  • There are two main ways to scale a database: vertical scaling and horizontal scaling.
  • Vertical scaling is adding more resources to a single machine.
  • Horizontal scaling is adding more machines to a system.

center

Vertical Scaling

Pros

  • Vertical scaling is simple.
  • Does not require any changes to the application.

Cons

  • Vertical scaling is adding more resources to a single machine.
  • Vertical scaling is limited by the capacity of the machine.
  • Vertical scaling is expensive.
  • Vertical scaling is not fault tolerant.
  • Vertical scaling is not elastic.

Horizontal Scaling

Pros

  • Horizontal scaling is adding more machines to a system.
  • Horizontal scaling is fault tolerant.
  • Horizontal scaling is elastic.
  • Horizontal scaling is cheaper than vertical scaling.

Cons

  • Horizontal scaling is complex.
  • Horizontal scaling requires changes to the application.

Horizontal Scaling: Replication

  • Replication is a type of horizontal scaling.
  • We are writing to leader and reading from followers.
  • Replication is used to increase read performance and fault tolerance.
  • Replication does not increase write performance.

center

Horizontal Scaling: Sharding

  • Sharding is a type of horizontal scaling.
  • We are partitioning the data and storing each partition on a different machine.
  • Sharding is used to increase read and write performance.