The adoption and success of online applications are driven by many factors, among them service response time. This is natural, since users tend to prefer a faster service to a slower one. However, consistently fast response times are hard to deliver because the infrastructure running the application has inherent performance variability. This variability causes a fraction of user requests to experience unusually high latency, known as tail latency. Tail latency becomes more pronounced as the application infrastructure scales out, because each additional server adds another potential point of delay. It is therefore critical to track each server's response time so that a faster server can be preferred when possible; this technique is called replica selection.
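The following worked example shows why scaling out amplifies tail latency. It uses a simplified model that is an assumption, not taken from this work: each of $n$ servers independently responds more slowly than its own 99th-percentile latency with probability 0.01, and a request must wait for all $n$ responses before completing.

```latex
P(\text{request is slow}) = 1 - (1 - 0.01)^{n},
\qquad n = 100 \;\Rightarrow\; 1 - 0.99^{100} \approx 0.63
```

Under these assumptions, a request fanned out to 100 servers encounters at least one slow response about 63% of the time, even though each individual server is slow only 1% of the time.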
Replica selection algorithms have been shown to reduce tail latency in key-value datastores. However, these datastores have evolved to support more sophisticated data models and query languages. As a result, previously proposed methods have become unusable, because they were designed under assumptions about the datastores that no longer hold. In this work, a Linear Regression Based Replica Selection Algorithm is proposed. The regression model estimates how long a specific query will take to be serviced, and based on this estimate a server with more or fewer available resources is chosen to service the query. The proposed approach reduces high-percentile (p999) latency by up to 20% without negatively impacting throughput.
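The idea of predicting a query's service time and then routing it according to that prediction can be illustrated with a minimal sketch. This is not the author's algorithm. The single input feature (for example, documents scanned), the `free_capacity_ms` field, and the best-fit routing rule are all hypothetical choices made for this illustration. The sketch fits an ordinary least-squares line to past (feature, service time) samples. Each query then goes to the replica with the least spare capacity that can still absorb the query's predicted cost. If no replica has enough spare capacity, the query goes to the replica with the most spare capacity.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], x: float) -> float:
    """Estimated service time (ms) for a query with feature value x."""
    slope, intercept = model
    return slope * x + intercept


@dataclass
class Replica:
    name: str
    # Hypothetical measure of how much service time the replica can absorb now.
    free_capacity_ms: float


def choose_replica(predicted_ms: float, replicas: Sequence[Replica]) -> Replica:
    """Best-fit routing (an illustrative assumption, not the paper's rule):
    cheap queries land on less-resourced replicas, expensive ones on
    better-resourced replicas."""
    fitting = [r for r in replicas if r.free_capacity_ms >= predicted_ms]
    if fitting:
        # Smallest replica that still fits the predicted cost.
        return min(fitting, key=lambda r: r.free_capacity_ms)
    # Nothing fits: fall back to the replica with the most spare capacity.
    return max(replicas, key=lambda r: r.free_capacity_ms)


if __name__ == "__main__":
    model = fit_linear([10, 20, 30, 40], [5, 9, 13, 17])  # slope 0.4, intercept 1.0
    replicas = [Replica("a", 50), Replica("b", 20), Replica("c", 200)]
    for docs in (5, 100, 1000):
        est = predict(model, docs)
        print(docs, round(est, 1), choose_replica(est, replicas).name)
```

With the toy training data above, a 5-document query (about 3 ms) goes to `b`, a 100-document query (about 41 ms) goes to `a`, and a 1000-document query (about 401 ms) exceeds every replica's spare capacity and falls back to `c`. The best-fit rule keeps the best-resourced replica free for expensive queries. Other policies, such as minimizing expected completion time, are equally plausible.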