Dynamic Job Replication for Balancing Fault Tolerance, Latency, and Economic Efficiency: Work in Progress

Published

Author(s)

Vladimir V. Marbukh

Abstract

Recent research has demonstrated the benefits of request replication with canceling, which launches multiple concurrent replicas of a request, uses the first successful result, and immediately removes the remaining replicas of the completed request from the system. This paper suggests that these benefits may come at the risk of an abrupt system transition to an undesirable, highly congested equilibrium. To expose, evaluate, and ultimately manage these risk/benefit trade-offs, we generalize the replication strategy by: (a) accounting for possible inefficiency of “remote” service, (b) allowing replication only when static routing fails to identify an idle “local” server, and (c) requiring one or more replicas of the same request to be completed, which improves fault tolerance through a majority-rule decision. Because the Markov performance model is intractable, our analysis is based on mean-field and fluid approximations. Future research should evaluate the accuracy of assertions based on these approximations and ultimately develop practical solutions for optimizing the various performance trade-offs in distributed systems with replication.
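The replication-with-canceling mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes that all replicas start at the same time on idle servers, that there is no queueing and no failures, and that each replica's service time is known. The function name `replicate_with_canceling` and its parameters are hypothetical, introduced only for illustration.

```python
def replicate_with_canceling(service_times, required=1):
    """Latency and total server work for one replicated request.

    service_times: per-replica service times (assumed known and independent
                   of load; this is an illustrative simplification).
    required:      number of replicas that must complete before the rest
                   are canceled (1 = use the first result; larger values
                   correspond to waiting for several results, e.g. for a
                   majority-rule decision).
    """
    if not 1 <= required <= len(service_times):
        raise ValueError("required must be between 1 and the number of replicas")
    # The request completes when the `required`-th fastest replica finishes.
    latency = sorted(service_times)[required - 1]
    # Each replica consumes server time until it finishes or is canceled.
    work = sum(min(t, latency) for t in service_times)
    return latency, work


# Three replicas; use the first result and cancel the other two.
print(replicate_with_canceling([3.0, 1.0, 2.0], required=1))
# Three replicas; wait for two results before canceling the third.
print(replicate_with_canceling([3.0, 1.0, 2.0], required=2))
```

In this toy setting, waiting for more replicas raises both latency and total server work, which hints at the kind of latency versus resource-cost trade-off the abstract refers to. The congestion effects studied in the paper are not captured here.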
Proceedings Title
IEEE SERVICES 2018
Conference Dates
July 2-7, 2018
Conference Location
San Francisco, CA

Keywords

Dynamic job replication, fault tolerance, latency, economic efficiency, risk/benefit trade-offs.

Citation

Marbukh, V. (2018), Dynamic Job Replication for Balancing Fault Tolerance, Latency, and Economic Efficiency: Work in Progress, IEEE SERVICES 2018, San Francisco, CA, [online], https://doi.org/10.1109/SCC.2018.00043 (Accessed January 2, 2025)

Created September 6, 2018, Updated May 14, 2020