Applications built as microservices require virtual machines or containers as shared computing resources. Because these microservices share resources, they must handle the load and multiplex the resources to maintain their Service Level Objectives (SLOs). Sudden changes in the load on a microservice instance, or contention for access to shared resources, can violate the SLOs. When a microservice violates its SLOs, resource management faces a critical case. Recurrent provisioning and autoscaling are the two traditional remedies for SLO violations: more CPUs are allocated to microservice instances using policies generated by algorithms. However, these traditional approaches have two drawbacks. First, they fail to multiplex resources efficiently at a fine granularity, such as cache, memory, and I/O channels. Second, deploying them in a large microservice system requires human experts and training. This paper proposes a solution to these problems: FIRM, a resource management framework that uses machine learning to manage the shared resources of microservices at a fine granularity. It reduces contention for shared resources and improves performance.
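To make the notion of an SLO violation and the traditional CPU-only remedy concrete, here is a minimal sketch. It assumes an SLO expressed as a bound on 99th-percentile latency; the 200 ms bound, the one-CPU-per-window rule, and the names `p99`, `violates_slo`, and `threshold_autoscale` are hypothetical and do not reproduce any specific autoscaler or FIRM itself.

```python
# Hypothetical SLO: the 99th-percentile latency of a window must stay below this bound (ms).
SLO_P99_MS = 200.0


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) using integer arithmetic
    return ordered[rank - 1]                # ranks are 1-based


def violates_slo(latencies_ms, slo_ms=SLO_P99_MS):
    """True when the tail latency of this window exceeds the SLO."""
    return p99(latencies_ms) > slo_ms


def threshold_autoscale(cpus, latencies_ms, max_cpus=8):
    """Coarse-grained baseline: add one CPU per violating window, capped at max_cpus.

    It only ever adjusts CPU count and never touches cache, memory, or I/O,
    which is the granularity gap the review describes.
    """
    if violates_slo(latencies_ms):
        return min(cpus + 1, max_cpus)
    return cpus
```

With 100 samples, two slow requests (450 ms) among 98 fast ones (50 ms) place the nearest-rank p99 at 450 ms, so the rule raises the CPU count by one even if the real bottleneck is, say, cache contention.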
Main Contributions
The paper presents the first proposal for an SLO violation mitigation framework for microservices that uses ML models. FIRM uses machine learning to adapt resource usage and to handle load changes efficiently. It relies on two ML models: Support Vector Machine (SVM)-driven detection and Reinforcement Learning (RL)-driven mitigation. The first model efficiently localizes the microservice instances responsible for an SLO violation, using a form of anomaly detection. The second model, RL-based violation mitigation, handles the large state space of the problem. FIRM controls and analyzes a fine-grained set of shared resources, and it uses RL to improve its resource management policies for long-term reward. According to the paper, RL also overcomes the model reconstruction and retraining problems of existing approaches. The experiments show that FIRM reduces the number of timed-out or dropped user requests by 8x and lowers the overall requested CPU limit by 29-62%. Compared with the two baselines, performance improves by 6x and 11x, meaning that only a few SLO violations occur.
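The SVM-driven detection step can be illustrated with a minimal linear SVM trained by Pegasos-style sub-gradient descent on the L2-regularized hinge loss. This is a sketch under stated assumptions, not FIRM's implementation: the two per-instance features (how closely an instance's latency tracks end-to-end latency, and its tail-to-median latency ratio, both scaled to [0, 1]), the toy training data, and the hyperparameters are all hypothetical, and FIRM's actual features and solver may differ.

```python
from typing import List, Tuple

Vector = List[float]


def augment(x: Vector) -> Vector:
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(samples: List[Tuple[Vector, int]],
                     lam: float = 0.01, epochs: int = 10) -> Vector:
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    Labels are +1 (critical instance) or -1 (not critical). Samples are swept
    in a fixed order instead of drawn at random, so results are reproducible.
    """
    w = [0.0] * (len(samples[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        for features, label in samples:
            t += 1
            x = augment(features)
            eta = 1.0 / (lam * t)                     # Pegasos step size 1 / (lambda * t)
            margin = label * dot(w, x)                # evaluated before the update
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer shrinks the weights
            if margin < 1.0:                          # hinge loss active: step toward sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def is_critical(w: Vector, features: Vector) -> bool:
    """Flag an instance as critical by the sign of the learned decision function."""
    return dot(w, augment(features)) > 0.0


# Hypothetical, normalized features per instance:
# [latency correlation with end-to-end latency, tail-to-median latency ratio]
TRAIN = [
    ([0.9, 0.8], +1),
    ([0.1, 0.2], -1),
    ([0.8, 0.9], +1),
    ([0.2, 0.1], -1),
]

w = train_linear_svm(TRAIN)
```

In this sketch, only instances flagged by `is_critical` would be handed to the mitigation step, which is how a detector narrows the set of instances whose resources an RL agent must reason about.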
Questions for the paper
After FIRM is deployed in a microservice application, will the application provide the same overall performance as before?
Can FIRM's adaptive approach handle network-related problems in microservices?
Limitations
The authors state that FIRM is adaptive and scales up resources when needed, but allocating more resources can harm the overall performance of the microservice application. Also, FIRM does not detect bugs or unpredictable situations that can cause failures. Finally, FIRM's scalability is limited by its centralized graph database.
Significance of the paper
The paper is the first in the industry to provide a solution for resource management in dynamic microservice applications. I would rate this paper's significance as 5. It clearly states the problems with existing resource management and defines an efficient resource management framework, FIRM. FIRM is highly adaptive and uses machine learning to control and analyze the resources that microservices share. Although it has limitations, such as being unable to detect bugs and misconfigurations in microservices, it outperforms the two baselines by 6x and 11x. FIRM precisely detects SLO violations and mitigates resource contention before violations occur.