THREAD-LOCAL STORAGE

Braden Steffaniak - Feb 22, 2017

WHAT IS THREAD-LOCAL STORAGE (TLS)?

Thread-local storage [5] is global and static memory that, instead of being shared between threads, is local to each individual thread. To better explain this, I will briefly go over the types of memory in a program [6]:

  • Data: preinitialized modifiable static and global data
  • BSS: uninitialized static and global data
  • Heap: dynamically allocated memory
  • Stack: local variables and parameters
  • Registers: really fast CPU memory

Data, BSS, and Heap are shared among all threads in a process. Each thread has its own stack and its own register state.

This poses a problem when you want a variable that is static or global but not shared among threads (i.e., each thread has its own copy of the variable). Different languages have different solutions to this problem.

LANGUAGE IMPLEMENTATIONS

PThread Implementation

Pthread implementations in C provide pthread_key_create and pthread_key_delete to create and destroy a key that identifies a thread-specific slot, and pthread_getspecific and pthread_setspecific to retrieve and set the calling thread's value for that key.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define NUMTHREADS 4

pthread_key_t glob_var_key;

void do_something(void)
{
  // get this thread's value for the key
  int *glob_spec_var = pthread_getspecific(glob_var_key);
  // pthread_t is an opaque type; the cast is for display only and is not portable
  printf("Thread %lu before mod value is %d\n", (unsigned long)pthread_self(), *glob_spec_var);
  *glob_spec_var += 1;
  printf("Thread %lu after mod value is %d\n", (unsigned long)pthread_self(), *glob_spec_var);
}

void* thread_func(void *arg)
{
  int *p = malloc(sizeof(int));
  *p = 1;
  pthread_setspecific(glob_var_key, p);
  do_something();
  do_something();
  pthread_setspecific(glob_var_key, NULL);
  free(p);
  pthread_exit(NULL);
}

int main(void)
{
  pthread_t threads[NUMTHREADS];
  int i;

  pthread_key_create(&glob_var_key, NULL);
  for (i=0; i < NUMTHREADS; i++)
    pthread_create(threads+i,NULL,thread_func,NULL);

  for (i=0; i < NUMTHREADS; i++)
    pthread_join(threads[i], NULL);

  pthread_key_delete(glob_var_key); // release the key once all threads are done
  return 0;
}

Swift Implementation

In Swift, thread-local data is typically stored in threadDictionary, a per-thread dictionary provided by Foundation's Thread class (formerly NSThread).

import Foundation

public func checkThreadLocal<T: AnyObject>(key: String, create: () -> T) -> T {
  // Each thread has its own dictionary
  let threadDictionary = Thread.current.threadDictionary

  if let cachedObject = threadDictionary[key] as? T {
    return cachedObject
  } else {
    let newObject = create()
    threadDictionary[key] = newObject
    return newObject
  }
}

func getFormatter() -> DateFormatter {
  return checkThreadLocal(key: "SomeName") {
    print("This block will only be executed once per thread")
    let enUSPOSIXLocale = Locale(identifier: "en_US_POSIX")
    let formatter = DateFormatter()
    formatter.locale = enUSPOSIXLocale
    formatter.dateFormat = "yyyy'-'MM'-'dd'T'HH':'mm':'ss'Z'"
    formatter.timeZone = TimeZone(secondsFromGMT: 0)
    return formatter
  }
}

let x1 = getFormatter()
let x2 = getFormatter()
let x3 = getFormatter()

In the above code example, the checkThreadLocal function checks whether an entry with the given key has already been added to the current thread's threadDictionary. If it has, the function simply returns the already created object. Otherwise it creates the object (using the create() closure), stores it in the dictionary, and returns it. x1, x2, and x3 all reference the same object in the currently running thread's memory.

However, if you were to run getFormatter() once again in a new thread, you would create a separate instance of the NSDateFormatter:

let x1 = getFormatter() // NEW INSTANCE CREATED!!
let x2 = getFormatter() // References the instance created by x1
let x3 = getFormatter() // References the instance created by x1

Thread.detachNewThread {
  let x4 = getFormatter() // NEW INSTANCE CREATED!!
  let x5 = getFormatter() // References the instance created by x4
  let x6 = getFormatter() // References the instance created by x4
}

let x7 = getFormatter() // References the instance created by x1

This is because the block passed to Thread.detachNewThread runs on a separate thread with its own threadDictionary. Note that DispatchQueue.main.async would not demonstrate this when the surrounding code is already on the main thread: its block also runs on the main thread, so it would reuse the instance created by x1.

Java Implementation

Java's implementation uses a data structure that is nearly identical to what Flat uses. Java provides a ThreadLocal object that takes a generic argument for the type of data being stored, and you use its get/set/remove methods to manage that data within the thread.

public class ThreadLocalExample {
  public static class MyRunnable extends Thread {
    private static ThreadLocal<Integer> threadLocal = new ThreadLocal<>();

    public void run() {
      threadLocal.set((int)(Math.random() * 100));

      try {
          Thread.sleep(2000);
      } catch (InterruptedException e) {}

      System.out.println(threadLocal.get());
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread thread1 = new MyRunnable();
    Thread thread2 = new MyRunnable();

    thread1.start();
    thread2.start();

    thread1.join(); // wait for thread 1 to terminate
    thread2.join(); // wait for thread 2 to terminate
  }
}

Flat Implementation

The same code in Flat would look like:

class ThreadLocalExample {
  static class MyRunnable extends Thread {
    static ThreadLocal<Int> threadLocal = ThreadLocal()

    public run() {
      threadLocal.set((Int)(Math.random() * 100))

      Thread.sleep(2000)

      Console.writeLine(threadLocal.get())
    }
  }

  public static main(String[] args) {
    let thread1 = MyRunnable()
    let thread2 = MyRunnable()

    thread1.start()
    thread2.start()

    thread1.join() // wait for thread 1 to terminate
    thread2.join() // wait for thread 2 to terminate
  }
}

They are nearly identical. However, this is not the only way that Flat allows you to declare thread locals. With a little syntax sugar from the thread_local modifier (or [ThreadLocal] annotation), you can achieve the same result while writing the code almost exactly as you would for an ordinary static field:

class ThreadLocalExample {
  static class MyRunnable extends Thread {
    static thread_local Int threadLocal

    public run() {
      threadLocal = (Int)(Math.random() * 100)

      Thread.sleep(2000)

      Console.writeLine(threadLocal)
    }
  }

  public static main(String[] args) {
    let thread1 = MyRunnable()
    let thread2 = MyRunnable()

    thread1.start()
    thread2.start()

    thread1.join() // wait for thread 1 to terminate
    thread2.join() // wait for thread 2 to terminate
  }
}

The thread_local modifier also allows for more platform-specific optimizations to take place. For instance, when compiled to C, the compiler will output the thread_local fields using the __thread modifier (a GCC/Clang extension), which is typically faster than a runtime lookup such as pthread_getspecific. Flat's thread_local modifier will be available in version 0.3.7 and up.

Conclusion

Of the approaches showcased in this article, Flat adopted a ThreadLocal data structure because it pairs cleanly with the thread_local modifier. There needed to be a seamless way to define thread-local data without using some sort of key/value pair to keep track of it. Instead of requiring you to keep track of both a lookup key and the variable, the thread_local modifier lets you focus on the variable itself.

Footnotes:

5. More information on thread-local storage can be found here.

6. More information can be found here.
