Friday, January 4, 2013

Back to the Future: Adaptive Load Management for Internet Services




This is the first post in the series "Back to the Future". Here is a summary of an interesting design approach I came across for dynamically managing the load of Internet services (I too don't like the term Web Services :-) ).

Overload management is a critical design goal for Internet-based systems and services. Most service designs take overload into account by treating it as a capacity-planning problem, rather than designing the service to behave gracefully under extreme load. Feedback-driven control, rather than static resource limits, should be the basis for detecting and controlling overload. Adaptive admission controllers can then be used to meet performance targets, such as a 90th-percentile response time.

Most operating systems adhere to the principle of resource virtualization to simplify application development. Unfortunately, this approach makes it difficult for applications to be aware of, or adapt to, real resource limitations. The programming models used for developing services generally fail to express resource constraints in a meaningful way. For example, Java RMI calls can throw a generic exception due to any type of failure, but there is typically little that an application can do when this occurs: should the application fail, retry the operation, or invoke an alternate interface? Consider a service consisting of several independent components communicating through a common protocol such as SOAP/HTTP. When one component becomes a resource bottleneck, the only overload management technique generally used is for the service to refuse additional TCP connections. While this effectively shields that service from load, other participants face very long connection delays (e.g., due to TCP's exponential SYN retransmit back-off behavior), causing the bottleneck to propagate through the entire distributed application.

An interesting framework for building Internet services that are inherently robust to load uses two simple techniques, dynamic resource management and fine-grained admission control, on a software architecture called the staged event-driven architecture (or SEDA), which decomposes an Internet service into a network of event-driven stages connected by explicit event queues. Load management in SEDA is accomplished by introducing a feedback loop that observes the behavior and performance of each stage, and applies resource control and admission control to effectively manage overload.
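To make the stage-and-queue structure concrete, here is a minimal Python sketch (the actual SEDA implementation, Sandstorm, is in Java; the class and method names below are illustrative, not from that codebase). Each stage couples an event handler with an explicit, bounded input queue, and a failed enqueue becomes an overload signal to the caller:

```python
import queue

class Stage:
    """A single SEDA-style stage: an event handler fed by an explicit,
    bounded event queue. Names and capacities here are illustrative."""
    def __init__(self, name, handler, capacity=100):
        self.name = name
        self.handler = handler          # function(event) -> None
        self.events = queue.Queue(maxsize=capacity)

    def enqueue(self, event):
        """Admission point: fails fast when the stage's queue is full."""
        try:
            self.events.put_nowait(event)
            return True
        except queue.Full:
            return False                # overload signal to the caller

    def run_once(self):
        """Dequeue and process one event (in SEDA this is driven by the
        stage's thread pool)."""
        event = self.events.get()
        self.handler(event)
```

Because the queue is explicit, both the application and the runtime can observe its length, reorder or filter pending events, and reject new ones, which is exactly the leverage the overload controllers described below exploit.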

Dynamic Overload Management
The classic approach to resource management in Internet services is static resource containment, in which a priori resource limits are imposed on an application or service to avoid overcommitment. Various kinds of resource limits are used: bounding the number of processes or threads within a server is a common technique, as is limiting the number of client socket connections to the service. Both of these approaches have the fundamental problem that it is generally not possible to know what the ideal resource limits should be. Setting the limit too low underutilizes resources, while setting the limit too high can lead to oversaturation and serious performance degradation under overload. Refusing to accept additional TCP connections under heavy load is inadvisable, as it causes clients to retransmit the initial SYN packet with exponential backoff, leading to very long response times. This approach is also too coarse-grained, in the sense that even a single client can consume all of the resources in the system; imposing process or connection limits does not solve the more general resource management issue.

Another style of resource containment is that typified by a variety of real-time and multimedia systems. In this approach, resource limits are typically expressed as reservations or shares, as in "process P gets X percent of the CPU." In this model, the operating system must be careful to account for and control the resource usage of each process. Applications are given a set of resource guarantees, and the system prevents guarantees from being exceeded through scheduling or forced termination. Reservation- and share-based resource limits work well for real-time and multimedia applications, which have relatively static resource demands that can be expressed as straightforward, fixed limits on coarse-grained entities such as processes. In an Internet service, by contrast, the focus is on individual requests, for which it is permissible (and often desirable) to meet statistical performance targets over a large number of requests, rather than to enforce guarantees for particular requests.

Desirable properties for overload management:

Exposure of the request stream: Event queues make the request stream within the service explicit, allowing the application (and the underlying runtime environment) to observe and control the performance of the system, e.g., through reordering or filtering of requests.

Focused, application-specific admission control: A stage that consumes many resources can be conditioned to load by throttling the rate at which events are admitted to just that stage, rather than refusing all new requests in a generic fashion.

In SEDA, each stage is subject to dynamic resource control, which attempts to keep each stage within its ideal operating regime by tuning parameters of the stage's operation. For example, one such controller adjusts the number of threads executing within each stage based on an observation of the stage's offered load (incoming queue length) and performance (throughput).
This approach frees the application programmer from manually setting “knobs” that can have a serious impact on performance.
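A minimal sketch of such a thread-pool controller might look like the following (the thresholds and increments are illustrative assumptions, not the tuned values from the SEDA work):

```python
class ThreadPoolController:
    """Grows a stage's thread pool additively while the stage's input
    queue stays above a threshold, and shrinks it when the stage idles.
    All constants here are illustrative, not from the SEDA implementation."""
    def __init__(self, min_threads=1, max_threads=20, queue_threshold=10):
        self.threads = min_threads
        self.min_threads = min_threads
        self.max_threads = max_threads
        self.queue_threshold = queue_threshold

    def observe(self, queue_length):
        """Called periodically with the stage's current input queue length."""
        if queue_length > self.queue_threshold and self.threads < self.max_threads:
            self.threads += 1    # offered load exceeds capacity: add a thread
        elif queue_length == 0 and self.threads > self.min_threads:
            self.threads -= 1    # stage is idle: retire a thread
        return self.threads
```

The point of putting this logic in a per-stage controller rather than a global configuration file is precisely what the text describes: the programmer never has to guess the "right" thread count in advance.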

Each stage has an associated admission controller that guards access to the event queue for that stage.

When the admission controller rejects a request, the corresponding enqueue operation fails, indicating to the originating stage that there is a bottleneck in the system. Applications are therefore responsible for reacting to these "overload signals" in some way. The simplest response is to block until the downstream stage can accept the request, which leads to back-pressure within the graph of stages. Another response is to drop the request, possibly sending an error message to the client or using the HTTP redirect mechanism to bounce the request to another server.
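The two responses described above (block for back-pressure, or drop) can be sketched as a small dispatch helper. This is an illustrative Python sketch over a bounded queue; the policy names, retry count, and backoff are assumptions, not part of SEDA's API:

```python
import queue
import time

def dispatch(q, event, policy="drop", retries=3, backoff=0.01):
    """React to a full downstream queue (an 'overload signal').
    'block' applies back-pressure by waiting and retrying;
    'drop' sheds the request, leaving the caller to send an error
    or an HTTP redirect. Constants here are illustrative."""
    def try_enqueue():
        try:
            q.put_nowait(event)
            return True
        except queue.Full:
            return False

    if try_enqueue():
        return "accepted"
    if policy == "block":
        for _ in range(retries):
            time.sleep(backoff)      # wait for the downstream stage to drain
            if try_enqueue():
                return "accepted"
    return "rejected"
```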

The key is that the architecture is explicit about signaling overload conditions and allows the application to participate in load management decisions.

Response time controller:

 The controller consists of several components. A monitor measures the response time of each request passing through a stage. The measured 90th-percentile response time over some interval is passed to the controller, which adjusts the admission control parameters based on an administrator-supplied response-time target. The basic overload control algorithm makes use of additive-increase/multiplicative-decrease (AIMD) tuning of the token bucket rate, based on the current observation of the 90th-percentile response time of processed requests. Because the controller is driven by processed requests, it will rarely run when the token bucket rate is low; the algorithm therefore "times out" and recalculates the 90th-percentile response time after a certain interval. When the 90th-percentile estimate is above a high-water mark (e.g., 10% above the administrator-specified target), the token bucket rate is reduced by a multiplicative factor (e.g., dividing the admission rate by 2). When the estimate is below a low-water mark, the token bucket rate is increased by a small additive factor.
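The AIMD rule above can be sketched directly. The 10% watermark band and the divide-by-2 factor come from the text; the additive increment, initial rate, and rate bounds are illustrative assumptions:

```python
class ResponseTimeController:
    """AIMD controller for a stage's token-bucket admission rate.
    The 10% watermarks and the multiplicative factor of 2 follow the
    text; the other constants are illustrative assumptions."""
    def __init__(self, target_ms, rate=100.0, min_rate=1.0, max_rate=1000.0):
        self.target = target_ms
        self.rate = rate            # admitted requests/sec (token refill rate)
        self.min_rate = min_rate
        self.max_rate = max_rate

    def update(self, p90_ms):
        """Adjust the admission rate from the observed 90th-percentile
        response time over the last measurement interval."""
        if p90_ms > 1.10 * self.target:       # above the high-water mark
            self.rate = max(self.min_rate, self.rate / 2.0)   # multiplicative decrease
        elif p90_ms < 0.90 * self.target:     # below the low-water mark
            self.rate = min(self.max_rate, self.rate + 2.0)   # additive increase
        return self.rate
```

As with TCP's congestion control, the asymmetry matters: the rate backs off quickly when the latency target is violated, but probes upward only gently, so the stage converges near the highest admission rate that still meets the target.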

Rather than rejecting requests, applications may degrade the quality of delivered service in order to admit a larger number of requests under a given response-time target. If service degradation is ineffective (say, because the load is too high to support even the lowest quality setting), the stage can re-enable admission control to cause requests to be rejected.
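One way to picture this is a controller that steps down through quality levels on overload and only starts rejecting once the lowest level is reached. This is an illustrative sketch; the quality levels and method names are assumptions, not part of SEDA:

```python
class DegradingAdmission:
    """Before rejecting, lower the delivered quality (e.g. serving
    smaller images) to admit more requests; fall back to rejection only
    at the lowest quality. Level names here are illustrative."""
    def __init__(self, levels=("full", "reduced", "minimal")):
        self.levels = levels
        self.current = 0                 # index into levels

    def on_overload(self):
        """Called when the response-time target is exceeded."""
        if self.current < len(self.levels) - 1:
            self.current += 1            # degrade service quality first
            return ("degrade", self.levels[self.current])
        return ("reject", self.levels[self.current])  # already at minimum quality

    def on_underload(self):
        """Called when the target is comfortably met again."""
        if self.current > 0:
            self.current -= 1
        return self.levels[self.current]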

Likewise, by prioritizing requests from certain users over others, an application can implement various policies related to class-based service level agreements. A common example is to give better service to requests from "gold" customers (who might pay more money for the offered service, or who have more items in their shopping cart). Another possible controller is one that rejects lower-class requests more aggressively than higher-class requests when a stage is overloaded.
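A minimal sketch of such a class-based admission policy, assuming per-class drop probabilities under overload (the class names and probabilities below are illustrative, not from the SEDA work):

```python
class ClassBasedAdmission:
    """Under overload, reject lower-class requests more aggressively
    than higher-class ones. Class names and drop probabilities here
    are illustrative assumptions."""
    def __init__(self):
        # probability of dropping a request of each class when overloaded
        self.drop_prob = {"gold": 0.0, "silver": 0.5, "bronze": 0.9}

    def admit(self, customer_class, overloaded, rand):
        """rand is a uniform [0, 1) sample, injected for testability
        (in practice, random.random())."""
        if not overloaded:
            return True                  # admit everyone when healthy
        # unknown classes are treated as lowest priority (always dropped)
        return rand >= self.drop_prob.get(customer_class, 1.0)
```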

Design and tuning of efficient control mechanisms: Introducing feedback as a mechanism for overload control raises a number of questions. For example, how should controller parameters be tuned? The authors relied mainly on a heuristic approach to controller design, though more formal, control-theoretic techniques are possible. Control theory provides a valuable framework for designing and evaluating feedback-driven systems, though many of the traditional techniques rely upon good mathematical models of system behavior, which are often unavailable for complex software systems. Another open question is the interaction between multiple levels of control in the system, for example the interplay between queue admission control and the tuning of per-stage thread pool sizes.

Useful Links
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services http://www.eecs.harvard.edu/~mdw/talks/seda-sosp01-talk.pdf
http://www.eecs.harvard.edu/~mdw/proj/seda/
http://en.wikipedia.org/wiki/Staged_event-driven_architecture