Multithreading and File Descriptors

As we all know, concurrency control is important in multithreaded programs, because data races cause unexpected problems.

What is a data race?  A data race is the condition in which multiple threads in the same process access the same memory, at least one of them writes to it, and the accesses are not synchronized with one another.

To tackle data races, people use mutexes or similar primitives to temporarily block access to critical pieces of memory.  What is often forgotten is the state behind the underlying system calls.  A representative example is file descriptors.
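A minimal Python sketch of that idea, with illustrative names: every thread takes the same lock around its read-modify-write of a shared counter. Without the lock, increments from different threads could interleave and some would be lost; CPython's global interpreter lock does not make `counter += 1` atomic.

```python
import threading

counter = 0                      # memory shared by every thread
counter_lock = threading.Lock()  # one mutex guarding every access to counter


def add_many(n: int) -> None:
    global counter
    for _ in range(n):
        # The read-modify-write is done only while the lock is held.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```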

File descriptors can be thought of as indexes into a table of information about all the files opened by the process.  This table cannot be accessed directly by the application; it is manipulated only through system calls such as open, write, read, close, dup, and accept.  Under multithreading, this table is shared by every thread in the process, so it is also exposed to races.  The kernel keeps each individual call consistent, but calls made by different threads can interleave in any order, and descriptor numbers are reused as soon as they are freed.  Sometimes you will hold a file descriptor that looks valid, but when you try to use it, the call fails with a bad file descriptor error (EBADF) because another thread has already closed it.  Or your descriptor may now refer to a file other than the one you requested, because another thread closed it and a later open or accept received the same number.  Both are caused by uncontrolled access to file descriptors.
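The reuse problem can be replayed deterministically in a single thread, because POSIX `open` returns the lowest descriptor number not currently open in the process. In this minimal Python sketch each step stands for what a different thread might do; `a.txt` and `b.txt` are hypothetical scratch files.

```python
import os
import shutil
import tempfile

scratch = tempfile.mkdtemp()
path_a = os.path.join(scratch, "a.txt")  # the file one thread asked for
path_b = os.path.join(scratch, "b.txt")  # an unrelated file opened later

fd_a = os.open(path_a, os.O_WRONLY | os.O_CREAT, 0o600)
stale = fd_a  # "thread 2" keeps its own copy of the number

os.close(fd_a)  # "thread 1" closes the descriptor...
fd_b = os.open(path_b, os.O_WRONLY | os.O_CREAT, 0o600)  # ..."thread 3" opens another file

reused = fd_b == stale  # the lowest free number is handed out again
os.write(stale, b"meant for a.txt\n")  # "thread 2" writes through its stale number
os.close(fd_b)

with open(path_a, "rb") as f:
    content_a = f.read()
with open(path_b, "rb") as f:
    content_b = f.read()
shutil.rmtree(scratch)

print(reused)     # True
print(content_a)  # b''
print(content_b)  # b'meant for a.txt\n'
```

In a real program the three steps run in different threads, so whether the stray write lands in the wrong file depends on timing, which is what makes these bugs hard to reproduce.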

In order to avoid such races, we have to control concurrent access.  There are two common ways to do this.  The first is to put all system calls that change the descriptor table in critical regions.  For example, we create one global mutex at startup; whenever we open a file, we lock the mutex, open the file, and unlock the mutex.  All other such system calls follow the same steps, so they are all serialized by the same mutex.  This method is simple and effective.  But avoid making too many system calls depend on the same mutex, because threads that wait on it too often will slow your application down.  Also avoid using the same file descriptor in multiple threads.  Instead, duplicate the descriptor and give each thread its own copy: closing one copy does not affect the others, so one thread cannot close a descriptor out from under another.  Note that a duplicate made with dup shares the file offset with the original, so threads that need independent offsets should use pread and pwrite or open the file separately.
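A minimal Python sketch of the first method, assuming a POSIX system (`os.pwrite` is not available on Windows). `locked_open`, `locked_dup`, and `locked_close` are hypothetical helpers, not a library API: they route every call that changes the descriptor table through one global lock created once. Each worker then receives its own duplicate and writes at its own offset with `pwrite`, since duplicates share one file offset.

```python
import os
import tempfile
import threading

# One process-wide mutex, created once, shared by every call that
# changes the descriptor table. Hypothetical helpers, not a library API.
fd_table_lock = threading.Lock()


def locked_open(path: str, flags: int, mode: int = 0o600) -> int:
    with fd_table_lock:
        return os.open(path, flags, mode)


def locked_dup(fd: int) -> int:
    with fd_table_lock:
        return os.dup(fd)


def locked_close(fd: int) -> None:
    with fd_table_lock:
        os.close(fd)


def worker(fd: int, index: int) -> None:
    # Each worker owns its duplicate. Duplicates share one file offset,
    # so write at an explicit offset with pwrite instead of write.
    os.pwrite(fd, f"{index}\n".encode(), index * 2)
    locked_close(fd)


path = os.path.join(tempfile.mkdtemp(), "shared.txt")
base = locked_open(path, os.O_WRONLY | os.O_CREAT)
copies = [locked_dup(base) for _ in range(4)]
threads = [threading.Thread(target=worker, args=(fd, i)) for i, fd in enumerate(copies)]
for t in threads:
    t.start()
for t in threads:
    t.join()
locked_close(base)

with open(path, "rb") as f:
    data = f.read()
print(data)  # b'0\n1\n2\n3\n'
```

The lock only serializes the table changes; the writes themselves run in parallel, which keeps the time spent waiting on the shared mutex short.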

The other method is to make a dedicated thread responsible for these calls, so that all operations on file descriptors are done in a single thread.  This method is more complex to implement, but if you need advanced storage access scheduling, it may be your choice.  You can even create a synchronized interface for each file separately.  In such a scheme, one main thread allocates and destroys file descriptors, and each of several other threads controls one file descriptor.  When other threads need file access, they send requests to the thread that controls the descriptor and wait for the requests to complete.  A benefit of this method is that the controlling thread can reorder requests, for example to reduce disk seek time, or manage data blocks lazily.
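A minimal Python sketch of the dedicated-thread method, simplified so that the owner thread opens and closes its own descriptor instead of receiving it from a main thread. `FileOwner`, `append`, and `read_all` are hypothetical names. Other threads never see the descriptor; they put a request plus a private reply queue on the owner's request queue and block until the owner answers.

```python
import os
import queue
import tempfile
import threading


class FileOwner:
    """One thread owns one descriptor; other threads only send it requests."""

    def __init__(self, path: str) -> None:
        self._requests: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, args=(path,))
        self._thread.start()

    def _run(self, path: str) -> None:
        # Only this thread ever opens, uses, or closes the descriptor.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            while True:
                request = self._requests.get()
                if request is None:  # shutdown sentinel
                    break
                op, payload, reply = request
                if op == "append":
                    os.lseek(fd, 0, os.SEEK_END)
                    reply.put(os.write(fd, payload))
                else:  # "read_all"
                    os.lseek(fd, 0, os.SEEK_SET)
                    chunks = []
                    while chunk := os.read(fd, 4096):
                        chunks.append(chunk)
                    reply.put(b"".join(chunks))
        finally:
            os.close(fd)

    def _call(self, op: str, payload: bytes = b""):
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._requests.put((op, payload, reply))
        return reply.get()  # block until the owner thread has done the work

    def append(self, data: bytes) -> int:
        return self._call("append", data)

    def read_all(self) -> bytes:
        return self._call("read_all")

    def close(self) -> None:
        self._requests.put(None)
        self._thread.join()


owner = FileOwner(os.path.join(tempfile.mkdtemp(), "log.txt"))
clients = [threading.Thread(target=owner.append, args=(b"x",)) for _ in range(8)]
for t in clients:
    t.start()
for t in clients:
    t.join()
contents = owner.read_all()
owner.close()
print(contents)  # b'xxxxxxxx'
```

Because only the owner thread moves the file offset, an `lseek` followed by `write` or `read` can never be interleaved with another thread's operation. The request loop is also the natural place to reorder or batch requests for the scheduling ideas mentioned above.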

There may be other methods; choose a suitable one depending on your needs.  Just do not forget to tightly control your file descriptor accesses.