Implementing Efficiency Techniques

Leader Election

Leader election is the process by which the nodes in a cluster elect a leader to perform the primary functions of the service. All the nodes in the system know who the leader is and can elect a new leader if the current leader dies. They do this using a consensus algorithm such as Paxos or Raft, in practice usually through a third-party key-value service such as Etcd or ZooKeeper that implements consensus internally.

Python Implementation using Etcd

python

import etcd3  # python-etcd3 client; lease, transaction and watch calls below are its API
import sys
import time
from threading import Event

LEADER_KEY = 'LEADER_KEY'

def main(server_name):
    client = etcd3.client(host="localhost", port=2379)

    while True:
        is_leader, lease = leader_election(client, server_name)
        if is_leader:
            print("I am the leader")
            on_leadership_gained(lease)
        else:
            print("I am a follower")
            wait_for_next_election(client)

def leader_election(client, server_name):
    print("New leader election happening")
    # The leader key is deleted with this lease unless it is refreshed within 5 seconds
    lease = client.lease(5)
    is_leader = try_insert(client, LEADER_KEY, server_name, lease)
    return is_leader, lease

def try_insert(client, key, server_name, lease):
    # Atomically write the key only if it does not exist yet (version 0).
    # transaction() returns a (succeeded, responses) tuple, so unpack it;
    # returning the tuple itself would always be truthy.
    insert_succeeded, _ = client.transaction(
        compare=[client.transactions.version(key) == 0],
        success=[client.transactions.put(key, server_name, lease)],
        failure=[],
    )
    return insert_succeeded

def on_leadership_gained(lease):
    while True:
        try:
            print("Refreshing lease, still the leader")
            lease.refresh()
            do_work()
        except Exception:
            # Any failure: give up leadership so another node can take over
            lease.revoke()
            return
        except KeyboardInterrupt:
            lease.revoke()
            sys.exit(1)

def wait_for_next_election(client):
    election_event = Event()

    def watch_callback(resp):
        # The leader key is deleted when its lease expires or is revoked
        for event in resp.events:
            if isinstance(event, etcd3.events.DeleteEvent):
                print("Leader election required")
                election_event.set()

    watch_id = client.add_watch_callback(LEADER_KEY, watch_callback)

    try:
        while not election_event.is_set():
            time.sleep(1)
    except KeyboardInterrupt:
        client.cancel_watch(watch_id)
        sys.exit(1)

    # cancel_watch requires the id returned by add_watch_callback
    client.cancel_watch(watch_id)


def do_work():
    # Placeholder for the leader's real work; must finish well within the lease TTL
    time.sleep(1)


if __name__ == "__main__":
    # Usage: python <script> <server_name>
    main(sys.argv[1])

Polling and Streaming

Polling is the act of requesting data updates at a regular interval. This is typically done when the server exposes a REST API. Streaming is the act of receiving continuous data updates from the server through an open connection. This is often achieved using WebSockets, which keep a connection open between the server and the client so that either party can send information at any time. Streaming is preferred when the information is time-sensitive, that is, when the client needs each update as soon as it happens.
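The difference can be sketched with a small in-memory simulation. Everything here is hypothetical and for illustration only: `PriceFeed` stands in for a server, `get_latest` for a REST endpoint that a client polls, and `subscribe` for an open streaming connection. No real network or WebSocket library is involved.

```python
class PriceFeed:
    """Hypothetical in-memory server holding one value that changes over time."""

    def __init__(self):
        self.value = None
        self._subscribers = []

    def get_latest(self):
        # Polling: the client asks and gets whatever the value is right now.
        return self.value

    def subscribe(self, callback):
        # Streaming: the client registers once and the server pushes each update.
        self._subscribers.append(callback)

    def publish(self, value):
        self.value = value
        for callback in self._subscribers:
            callback(value)


def run_demo():
    feed = PriceFeed()

    streamed = []
    feed.subscribe(streamed.append)  # streaming client

    polled = []
    updates = [101, 102, 103, 104, 105, 106]
    for tick, value in enumerate(updates, start=1):
        feed.publish(value)
        if tick % 3 == 0:  # the polling client only asks on every third tick
            polled.append(feed.get_latest())

    return polled, streamed
```

In this sketch the streaming client sees all six updates as they happen, while the polling client only sees the values that were current when it asked. A shorter polling interval narrows the gap at the cost of more requests.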

Pub Sub

Pub Sub, or publishing and subscribing, is a method of dividing streamed data into topics that clients can subscribe to. When a new event is published to a topic, all of the clients subscribed to it receive the update. These systems often come with guarantees such as at-least-once delivery, persistent storage/queues, ordering of messages, and replayability of messages. Because at-least-once delivery means the same message can arrive more than once, messages typically have to be idempotent operations: the outcome has to be the same regardless of how many times the message is processed. If the same message is sent multiple times through a Pub Sub framework, it must have the same effect on all clients as if it had been sent once. Some popular Pub Sub frameworks include Apache Kafka and Google Cloud Pub/Sub.
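A minimal sketch of topics and idempotent handling, assuming each message carries a unique `id` field. The `Broker` and `AccountSubscriber` classes and the message layout are made up for this example and do not reflect the API of Kafka or Cloud Pub/Sub.

```python
class Broker:
    """Hypothetical in-memory broker that routes messages by topic."""

    def __init__(self):
        self._subscribers = {}  # topic name -> list of handler callables

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        # Every handler subscribed to this topic receives the message.
        for handler in self._subscribers.get(topic, []):
            handler(message)


class AccountSubscriber:
    """Applies deposits idempotently by remembering processed message ids."""

    def __init__(self):
        self.balance = 0
        self._seen_ids = set()

    def handle(self, message):
        if message["id"] in self._seen_ids:
            return  # duplicate delivery: already applied, so ignore it
        self._seen_ids.add(message["id"])
        self.balance += message["amount"]


def run_demo():
    broker = Broker()
    account = AccountSubscriber()
    broker.subscribe("deposits", account.handle)

    deposit = {"id": "m1", "amount": 50}
    broker.publish("deposits", deposit)
    broker.publish("deposits", deposit)  # redelivery under at-least-once
    broker.publish("deposits", {"id": "m2", "amount": 25})
    broker.publish("withdrawals", {"id": "m3", "amount": 10})  # other topic

    return account.balance
```

Without the `_seen_ids` check, the redelivered deposit would be applied twice. Tracking ids is only one way to get idempotency; designing messages as absolute values ("set balance to 75") instead of deltas is another.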

Configuration

Configuration is a set of variables/parameters that determine certain behaviors within the application. Static configuration is hard-coded and shipped with the application, so changing it requires a new deployment, while dynamic configuration is kept outside of the application and can be edited more easily while the system is running. Static configuration is typically written in formats such as JSON and YAML, while dynamic configuration is often kept in a third-party key-value store.
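As a rough sketch of the two kinds side by side: the static values are parsed once from a JSON document shipped with the code, and a plain dictionary stands in for the external key-value store that holds dynamic overrides. The names `STATIC_CONFIG_JSON`, `DynamicConfig`, and the keys are assumptions made for this example.

```python
import json

# Static configuration: shipped with the application and read once at startup.
STATIC_CONFIG_JSON = '{"max_connections": 100, "feature_new_ui": false}'


class DynamicConfig:
    """Reads each value from an external store on every lookup.

    `store` is a plain dict here; in a real system it would be a client
    for a key-value service, which is an assumption of this sketch.
    """

    def __init__(self, store, defaults):
        self._store = store
        self._defaults = defaults

    def get(self, key):
        # Fall back to the static default when the store has no override.
        return self._store.get(key, self._defaults[key])


def run_demo():
    static_config = json.loads(STATIC_CONFIG_JSON)
    kv_store = {}  # stand-in for the external key-value store
    config = DynamicConfig(kv_store, static_config)

    before = config.get("feature_new_ui")
    kv_store["feature_new_ui"] = True  # an operator flips the flag at runtime
    after = config.get("feature_new_ui")

    return before, after, static_config["feature_new_ui"]
```

The flag changes for the running application as soon as the store is updated, while the static value stays as shipped until the next deployment.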

Rate Limiting

Rate Limiting is the process of limiting the number of requests that can be made to the system in a given period of time. This is typically done by IP address to prevent people from abusing the server and taking up all of its resources. This type of attack is known as a DoS Attack, or denial of service. If the attack is performed from multiple machines, it is a DDoS Attack, or distributed denial of service, and is much harder to defend against because the requests no longer come from a single address. Rate limiting is typically implemented with a key-value store such as Redis, which keeps track of how many times a particular IP has accessed a service.
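One simple way to count requests per IP is a fixed-window counter, sketched below. This is only one of several rate limiting strategies, and the in-process dictionary is a stand-in for a shared store such as Redis. The class name, the limits, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time


class FixedWindowRateLimiter:
    """Allows at most `max_requests` per IP in each fixed time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the example can be tested
        # (ip, window index) -> request count. A shared store with key
        # expiry would replace this dict; old windows are never evicted here.
        self._counters = {}

    def allow(self, ip):
        window = int(self._clock() // self.window_seconds)
        key = (ip, window)
        count = self._counters.get(key, 0) + 1
        self._counters[key] = count
        return count <= self.max_requests


def run_demo():
    now = [0.0]  # fake clock so the demo does not need to sleep
    limiter = FixedWindowRateLimiter(
        max_requests=2, window_seconds=10, clock=lambda: now[0]
    )

    results = [limiter.allow("203.0.113.5") for _ in range(3)]
    other_ip = limiter.allow("203.0.113.9")  # counted separately

    now[0] = 10.0  # the next window starts, so the counter resets
    after_reset = limiter.allow("203.0.113.5")

    return results, other_ip, after_reset
```

A fixed window is easy to reason about but lets a client send up to twice the limit across a window boundary. Sliding-window and token-bucket schemes smooth that out at the cost of more bookkeeping.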
