Autonomic ReplicatIon of Software TransactiOnal memorieS :: Project Summary
Software Transactional Memories (STMs) have attracted considerable interest of late, as multi-core and many-core CPUs have become the architecture of choice for mainstream computing. STMs are an attractive way to spare programmers the pitfalls of conventional explicit lock-based thread synchronization: they leverage proven concurrency-control concepts, used for decades by the database community, to simplify mainstream concurrent programming. When using STMs, programmers only need to specify which operations on shared data structures are to be executed within the scope of an atomic and isolated transaction. By relieving the programmer of the burden of managing locks or other error-prone low-level concurrency control mechanisms, STMs have been shown to enable a significant boost in productivity, as well as in code reliability.
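To make the programming model concrete, the following is a minimal sketch of an optimistic, version-validating STM. It is an illustration only, not the design of any particular STM: the names `TVar`, `Transaction`, `atomic` and `transfer` are hypothetical, and the sketch ignores many concerns a real STM must handle (for example, a transaction that later aborts may briefly observe inconsistent values). The point is that the application code in `transfer` only marks what must run atomically and never acquires a lock itself.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

# Internal to the sketch: serializes snapshots and commits. User code never touches it.
_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> tentative value, invisible until commit

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def try_commit(self):
        with _commit_lock:
            # Abort if anything we read was changed by another commit.
            if any(t.version != seen for t, seen in self.reads.items()):
                return False
            for t, value in self.writes.items():
                t.value = value
                t.version += 1
            return True

def atomic(body):
    """Run body(tx) as a transaction, retrying until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.try_commit():
            return result

def transfer(src, dst, amount):
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomic(body)

a, b = TVar(100), TVar(0)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.value, b.value)  # 50 50
```

Conflicting transfers are detected at commit time and simply re-executed, so the total balance is preserved without any lock management in `transfer`.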
The large research effort that STMs have attracted in recent years has led to the investigation of a wide range of alternative approaches, as well as to the development of the first complex STM-based applications. This allows us to draw two main observations:
- The search for a “one size fits all” solution in the vast multi-dimensional design space of STMs has proved inconclusive. Several independent research works have clearly shown that no single “panacea” solution can maximize performance for every STM workload.
- As STMs make their way out of research labs and start to be adopted in real-world applications, they face harsh scalability and dependability challenges that cannot be tackled effectively because efficient STM replication schemes are currently lacking. This is the case, for instance, of the FenixEDU system, a complex web-based application used in one of the largest Portuguese universities, which relies extensively on STM technology. With a population of more than 14000 users and a steadily increasing traffic volume, the FenixEDU system urgently needs efficient replication mechanisms capable of ensuring adequate levels of scalability and fault tolerance.
In light of the above, this project pursues a twofold goal:
- Extending the conventional notion of STM, traditionally confined within the boundaries of a single multi-processor machine, towards the distributed computing paradigm, introducing a novel programming abstraction that combines the simplicity of STMs with the scalability and failure resiliency achievable by leveraging the resource redundancy typical of large-scale cluster environments.
- Given the impossibility of conceiving a single, universally optimal solution even for the simpler scenario of non-distributed STMs, we will focus on the design and implementation of an autonomic, self-optimizing distributed STM platform, ARISTOS (Autonomic ReplicatIon of Software TransactiOnal memorieS). The ARISTOS platform will autonomously monitor the workload generated by user-level applications and seek optimal performance by transparently adapting the mechanisms used both 1) to regulate concurrency between local transactions (i.e. at the STM level), and 2) to detect conflicts between transactions executing at different nodes (i.e. at the replication protocol level).
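The monitor-and-adapt idea can be sketched as a small feedback loop. This is only an illustration under strong assumptions: the class `AdaptiveController`, the strategy labels `"optimistic"` and `"pessimistic"`, the abort-rate metric and the threshold values are all hypothetical, and a plain threshold rule stands in for whatever workload characterization and forecasting the platform would actually use.

```python
from collections import deque

class AdaptiveController:
    """Hypothetical controller: picks a strategy from the recent abort rate."""

    def __init__(self, window=100, high=0.30, low=0.10):
        self.outcomes = deque(maxlen=window)  # True means the transaction aborted
        self.high = high                      # switch to pessimistic above this rate
        self.low = low                        # switch back to optimistic below this rate
        self.strategy = "optimistic"

    def record(self, aborted):
        """Record one transaction outcome and return the strategy to use next."""
        self.outcomes.append(aborted)
        rate = sum(self.outcomes) / len(self.outcomes)
        # Two separate thresholds (hysteresis) avoid flapping between strategies.
        if self.strategy == "optimistic" and rate > self.high:
            self.strategy = "pessimistic"
        elif self.strategy == "pessimistic" and rate < self.low:
            self.strategy = "optimistic"
        return self.strategy

ctl = AdaptiveController(window=10)
for aborted in [False] * 6 + [True] * 4:
    ctl.record(aborted)
print(ctl.strategy)  # pessimistic
```

Under these assumptions, one such controller could run per adaptation level, one for local concurrency control and one for the replication protocol, each fed with the outcomes observed at its own level.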
To achieve these results, several challenging issues must be addressed during this project, including:
- Architecting scalable and fault-tolerant replication mechanisms explicitly tailored to meet the unique requirements of STM systems
- Developing effective workload characterization strategies and performance forecasting models to automatically identify the optimal contention management strategy to adopt
- Designing and implementing flexible and efficient mechanisms that allow both the local and the global contention management schemes to be altered dynamically while preserving the system’s consistency