Dynamic Job Replication for Balancing Fault Tolerance, Latency, and Economic Efficiency: Work in Progress
Published
Author(s)
Vladimir V. Marbukh
Abstract
Recent research has demonstrated the benefits of request replication with canceling, which initiates multiple concurrent replicas of a request, uses the first successful result, and immediately removes the remaining replicas of the completed request from the system. This paper suggests that the benefits of replication may come at the risk of an abrupt system transition to an undesirable, highly congested equilibrium. To expose, evaluate, and ultimately manage these risk/benefit trade-offs, we generalize the replication strategy by: (a) accounting for possible inefficiency of remote service, (b) allowing replication only when static routing fails to identify an idle local server, and (c) requiring one or more replicas of the same request to be completed to improve fault tolerance using a majority-rule decision. Because the Markov performance model is intractable, our analysis is based on mean-field and fluid approximations. Future research should evaluate the accuracy of assertions based on these approximations and ultimately develop practical solutions for optimizing various performance trade-offs in distributed systems with replication.
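The replication-with-canceling mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's model, which is analyzed through mean-field and fluid approximations. It only shows the request-level behavior: start several replicas, wait until a required number complete, cancel the rest, and take the majority result. The names `replica` and `replicate_with_cancel`, the `delays` list standing in for server service times, and the `required` parameter are all assumptions made for illustration.

```python
import asyncio
from collections import Counter


async def replica(server_id, payload, delay):
    """Hypothetical server call: waits `delay` seconds, then returns a result."""
    await asyncio.sleep(delay)
    return payload.upper()


async def replicate_with_cancel(payload, delays, required=1):
    """Run one replica per entry in `delays`.

    Waits until `required` replicas have completed, cancels the remaining
    replicas, and returns (majority_result, number_of_results_used).
    """
    tasks = [
        asyncio.create_task(replica(i, payload, d))
        for i, d in enumerate(delays)
    ]
    results = []
    try:
        for finished in asyncio.as_completed(tasks):
            results.append(await finished)
            if len(results) >= required:
                break
    finally:
        # Remove the remaining replicas from the "system".
        for task in tasks:
            task.cancel()  # has no effect on tasks that already finished
        await asyncio.gather(*tasks, return_exceptions=True)

    # Majority rule over the collected results (trivial when required == 1).
    winner, _count = Counter(results).most_common(1)[0]
    return winner, len(results)


if __name__ == "__main__":
    print(asyncio.run(replicate_with_cancel("job", [0.05, 0.01, 0.2], required=2)))
```

With `required=1` this reduces to plain "first result wins" replication; larger values of `required` trade extra latency and server load for fault tolerance, which is the kind of trade-off the abstract refers to.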
Marbukh, V. (2018), Dynamic Job Replication for Balancing Fault Tolerance, Latency, and Economic Efficiency: Work in Progress, IEEE SERVICES 2018, San Francisco, CA, [online], https://doi.org/10.1109/SCC.2018.00043
(Accessed January 2, 2025)