Cutting Latency Tail: Analyzing and Validating Replication without Canceling

Authors:
Resource type:
Publication date:
2017
Institution:
Universidad del Rosario
Repository:
Repositorio EdocUR - U. Rosario
Language:
eng
OAI Identifier:
oai:repository.urosario.edu.co:10336/24250
Online access:
https://doi.org/10.1109/TPDS.2017.2706268
https://repository.urosario.edu.co/handle/10336/24250
Keywords:
Application programs
Benchmarking
Computer software
Computer software selection and evaluation
Legacy systems
MATLAB
Web services
Effective solution
Matrix analytic methods
Response time distribution
Response time variability
Service time
Software applications
Software quality engineering
Speculative computing
Response time (computer systems)
Correlated service times
Rights
License
Open (Full Text)
Description
Summary: Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have focused mainly on the mean latency of systems that implement replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., an average error of 7 percent (respectively, 4 percent) on the 99th percentile response time for the benchmark (respectively, MediaWiki), whose requests execute on the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios. © 2017 IEEE.
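The mechanism described in the summary (several copies of a request, with the first result used and the remaining copies left to run) can be illustrated with a minimal Python sketch. This is not the authors' model or code: the names `replicate` and `handler` and the fixed sleep-based service times are assumptions made only for illustration, and the sketch does not reproduce the paper's response-time analysis or its correlated service-time model.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def handler(request, delay):
    # Hypothetical request handler: sleeps to mimic a variable service time.
    time.sleep(delay)
    return (request, delay)


def replicate(request, delays, pool):
    """Submit one replica per entry in `delays` and return the first result.

    The slower replicas are NOT canceled: they keep occupying workers in
    `pool` until they finish, and their results are simply ignored.
    """
    futures = [pool.submit(handler, request, d) for d in delays]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    return next(iter(done)).result()
```

Calling `replicate("req-1", [0.05, 0.3, 0.4], pool)` with a `ThreadPoolExecutor` of at least three workers returns the result of the fastest replica, while the other two continue to consume capacity. That extra load is the cost that has to be weighed against the latency gain when choosing a replication level.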