9.1 S3 Overview
Simple Storage Service (S3) is a web service for storing and retrieving data from anywhere at any time. S3 can store any kind of data, both structured and unstructured. Amazon itself uses S3 as primary storage for many kinds of data such as EC2 AMIs, RDS backups, configuration files, MapReduce scripts and logs. Additionally, S3 can even be used to host a static website. AWS gives the user total control over S3 through a minimalistic interface that promotes simplicity and robustness.
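To illustrate the last point, static website hosting can be enabled on a bucket with a single API call. The following is a minimal sketch using the Python SDK (boto3), shown purely for illustration; the bucket and document names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Enable static-website hosting on an existing bucket (names are placeholders).
s3.put_bucket_website(
    Bucket="example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```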
Data on S3 is organized as objects stored in containers called buckets. Each object is contained in a bucket. This offers many advantages such as simple and user-friendly addressing, organization of namespaces, common access control for the objects as well as common usage reporting. Each bucket must have a unique name and can be created in a specific region. Every time a new object is added to the bucket, it can be accessed with the URL of the bucket and its object key (typically the file name). Additionally, Amazon S3 allows the user to configure a bucket to generate a unique version ID for every added object, providing finer granularity in distinguishing objects. S3 objects are the fundamental entities and contain both object data and metadata. Although S3 does not interpret the object data itself, it stores information about each object in the form of name-value pairs. This information can be anything from the last modification date to the Content-Type. As mentioned before, each object has an associated key that uniquely identifies it. Each object address then takes the form (bucket name).s3.amazonaws.com/(key), which simulates a standard file system. S3 even allows folder nesting by mapping paths to files the way operating systems do.
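To make the addressing scheme concrete, the sketch below uploads an object under a slash-separated key and attaches custom metadata; it uses boto3 purely for illustration, and the bucket name, key and metadata values are placeholders.

```python
import boto3

s3 = boto3.client("s3")

bucket = "example-bucket"
key = "reports/2014/summary.txt"  # slashes simulate a folder hierarchy

# Store the object together with user-defined metadata (name-value pairs).
s3.put_object(
    Bucket=bucket,
    Key=key,
    Body=b"report contents",
    ContentType="text/plain",
    Metadata={"project": "kam"},
)

# The object is then addressable as (bucket name).s3.amazonaws.com/(key).
print(f"https://{bucket}.s3.amazonaws.com/{key}")
```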
Update operations (PUT and DELETE requests) are atomic, even though they might not be instantaneous. This means that the user will never receive corrupted data; however, changes can take a little while to propagate, so a read operation closely following an update might still return old data. Additionally, by default only the account used to create the bucket has access rights to it. The user therefore has to explicitly specify which other accounts can access its content [24].
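One way to grant another account read access is a bucket policy; the boto3 sketch below uses a placeholder account ID and bucket name, and is only one of several possible access-control mechanisms (ACLs being another).

```python
import json
import boto3

s3 = boto3.client("s3")

# Allow a second AWS account (placeholder ID) to read objects from the bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))
```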
The process of bucket creation is relatively straightforward. First the user specifies a bucket name and chooses a bucket region. S3 verifies whether the name is available and, if so, a new bucket is created. Optionally, the user can enable logging for the bucket. If he chooses to do so, he has to specify a target bucket and a target prefix where the logs are to be stored; logs cannot be stored in the same bucket they refer to. Once the bucket is created, it becomes available for manipulation. The user is free to start uploading objects, creating folder structures or granting access rights.
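The same creation steps can be expressed through the API. The boto3 sketch below creates a bucket in a chosen region and enables logging into a separate target bucket; all names and the region are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Create the bucket in a specific region.
s3.create_bucket(
    Bucket="example-bucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Enable access logging; the logs must go to a different bucket under a chosen prefix.
s3.put_bucket_logging(
    Bucket="example-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-log-bucket",
            "TargetPrefix": "logs/example-bucket/",
        }
    },
)
```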
9.2 CloudFront Overview
CloudFront is another web service, one that greatly increases the distribution speed of both static and dynamic content. CloudFront is essentially a content delivery network that uses Amazon data centers spread worldwide, known as edge locations, in order to deliver content with minimal delay. This is achieved by routing each request to the edge location with the lowest possible latency. If the requested content already resides at such an edge location, CloudFront can deliver it immediately. Otherwise it first has to be fetched from the Amazon S3 bucket or HTTP server where the source content is stored.
In order to speed up content delivery using CloudFront, an initial configuration is necessary. The first step is to specify the origin servers. These are the servers containing the actual content: they store the original, definitive version of the content and can be either Amazon S3 buckets or EC2-hosted HTTP servers. After the origin server is specified, CloudFront allows choosing between two delivery methods based on the distribution protocol. The web delivery method, the default option, speeds up static or dynamic content over both HTTP and HTTPS and additionally supports real-time streaming. The RTMP delivery method, on the other hand, provides fast distribution of streaming media using Adobe Flash Media Server's RTMP protocol. Once the user has chosen both the origin server and the delivery method, CloudFront is ready to create a new distribution. The user can first configure additional settings such as allowed HTTP methods, object caching behavior, cookie and query string handling, security certificates and request logging. Once everything is set and the new distribution is built, CloudFront sends its configuration to all of its edge locations. From then on, every object within the distribution is available to the user both under its original URL and under the CloudFront domain name.
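For a web distribution backed by an S3 bucket, the configuration pushed to the edge locations essentially consists of an origin plus a default cache behavior. A minimal boto3 sketch follows; bucket and identifier names are placeholders, and the optional settings mentioned above are left at defaults.

```python
import time
import boto3

cf = boto3.client("cloudfront")

response = cf.create_distribution(
    DistributionConfig={
        # CallerReference must be unique for each distribution request.
        "CallerReference": str(time.time()),
        "Comment": "Web distribution for example-bucket",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "S3-example-bucket",
                "DomainName": "example-bucket.s3.amazonaws.com",
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "S3-example-bucket",
            "ViewerProtocolPolicy": "allow-all",
            "MinTTL": 0,
            "ForwardedValues": {
                "QueryString": False,
                "Cookies": {"Forward": "none"},
            },
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
        },
    }
)

# The CloudFront domain name under which every object becomes reachable.
print(response["Distribution"]["DomainName"])
```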
Once the user submits a request for a specific object, DNS routes the request to the nearest CloudFront edge location in order to maximize delivery performance. If the requested object cannot be found in the edge location's cache, the request is forwarded to the origin server (for example an S3 bucket). The origin server responds by sending the object to the CloudFront edge location. As soon as the data begins to arrive, it is immediately forwarded to the user. The next time the user requests the same object from that edge location, the response is immediate, as the requested file now also resides in the location's internal cache [25].
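Whether a particular request was served from the edge cache or fetched from the origin can be observed in the X-Cache response header that CloudFront adds; a small sketch with the Python requests library is shown below, with a placeholder distribution domain and object key.

```python
import requests

# Placeholder CloudFront domain and object key.
url = "https://d1234example.cloudfront.net/images/logo.png"

# The first request typically reports "Miss from cloudfront" (fetched from the
# origin); subsequent requests served by the same edge location report
# "Hit from cloudfront".
for attempt in range(2):
    response = requests.get(url)
    print(attempt, response.headers.get("X-Cache"))
```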
9.3 S3 and CloudFront in Kentico AWS Manager
Both the S3 and CloudFront management windows are accessible from the main menu of Kentico AWS Manager. Once the user selects either of them, KAM redirects him to a separate view containing a table with the respective service details. In the Simple Storage Service view the user can list all of his buckets belonging to his currently defined region. Similar to RDS, this view shows bucket names and creation dates and allows the user to refresh this information. Once again all the operations are asynchronous and provided by the S3Manager object. This view additionally allows the user to manage his buckets. Selecting the Create Bucket option opens a window that lets the user specify a bucket name. The wizard then verifies that the provided name is unique, as AWS requires, and a new bucket is launched and configured. Additionally, S3Manager allows the user to perform cleanup by deleting user-specified buckets.
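S3Manager itself is part of the KAM code base and is not reproduced here; the boto3 sketch below only approximates the operations the view relies on, namely listing buckets with their creation dates, checking name availability before creation, and deleting a bucket. Function names are illustrative.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def list_buckets():
    """Return (name, creation date) pairs, as displayed in the S3 view."""
    return [(b["Name"], b["CreationDate"]) for b in s3.list_buckets()["Buckets"]]

def bucket_name_available(name):
    """A name is available only if no AWS account already owns it."""
    try:
        s3.head_bucket(Bucket=name)
        return False          # the bucket already exists in this account
    except ClientError as err:
        code = err.response["Error"]["Code"]
        return code == "404"  # 404: free, 403: taken by another account

def create_bucket(name):
    if bucket_name_available(name):
        s3.create_bucket(Bucket=name)

def delete_bucket(name):
    s3.delete_bucket(Bucket=name)  # the bucket must be empty
```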
The CloudFront view is in many respects similar to the S3 view. As mentioned before, CloudFront distributions use S3 buckets as their origin servers and as such they require more configuration and store more information than S3. Accordingly, the user can view the ID, domain name, status, state and origin of every one of his distributions. These are asynchronously provided by KAM's CloudFrontManager and can be refreshed by the user at any time. Since the process of creating a distribution is quite complex and requires several steps, I decided to use an embedded web browser control to handle the creation of new distributions for now. This way the user has a free hand to fully customize his distributions and thus benefit from the rich interface that Amazon provides. Once the distribution is created, the main CloudFront view is updated to reflect the current state of the distributions. Since CloudFront also allows HTTP servers to be configured as origin servers, CloudFrontManager filters the query results and only displays those distributions that have an S3 bucket as their origin; only these can be used by Kentico CMS to store web content. In order to delete a distribution, it has to be disabled first. KAM, through CloudFrontManager, provides an option to disable a distribution, which causes every edge location on the AWS network to ignore all requests for the distribution's content. Once this operation is performed, the distribution can be removed.
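The filtering by S3 origin and the disable-then-delete sequence can again be approximated with boto3 (CloudFrontManager itself is not reproduced here); distribution IDs are placeholders and the wait for deployment is only indicated in a comment.

```python
import boto3

cf = boto3.client("cloudfront")

def s3_backed_distributions():
    """Keep only distributions whose origin is an S3 bucket, as KAM's view does."""
    result = cf.list_distributions().get("DistributionList", {})
    for dist in result.get("Items", []):
        origins = dist["Origins"]["Items"]
        if any("S3OriginConfig" in origin for origin in origins):
            yield dist["Id"], dist["DomainName"], dist["Status"]

def disable_and_delete(distribution_id):
    """A distribution must be disabled before it can be deleted."""
    current = cf.get_distribution_config(Id=distribution_id)
    config = current["DistributionConfig"]
    config["Enabled"] = False
    updated = cf.update_distribution(
        Id=distribution_id,
        IfMatch=current["ETag"],
        DistributionConfig=config,
    )
    # In practice the caller must wait until the disabled configuration has been
    # deployed to all edge locations before the delete call below succeeds.
    cf.delete_distribution(Id=distribution_id, IfMatch=updated["ETag"])
```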
In order to use S3 storage with the CloudFront delivery network, several keys need to be added to the Kentico CMS Web.config. KAM provides the user with an option to associate an S3 bucket with a new EC2 instance of Kentico CMS and optionally specify a CloudFront distribution to serve the bucket's content. Both of these features are available as part of the CMS configuration process. First the user chooses his bucket. Once this is done, both the bucket name and the bucket URL are stored in the session object to be passed to the newly launched instance as user data. The user can then decide to use CloudFront to distribute the content of the bucket instead of serving it directly from the bucket URL. KAM lets the user choose any distribution that has his chosen bucket set as the origin server. Once he chooses the right distribution, its domain name is also stored in the session. Additionally, both access keys are passed to the new instance, as these are required to run the AWS API from Kentico CMS. Once the instance is launched and Kentico CMS is installed, it is KAI's responsibility to create the respective keys with the user-provided values in the CMS Web.config. KAI therefore creates keys for the S3 bucket name, the Amazon endpoint, the access key ID and the secret key, and specifies CMS.AmazonStorage as the provider class to be used by Kentico CMS. After this is done, the CMS possesses everything necessary to use Amazon storage.
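The resulting Web.config entries might look roughly like the sketch below. The exact key names depend on the Kentico CMS version and are shown here only as an illustration; all values are placeholders, and the authoritative set of keys is whatever KAI actually writes during installation.

```xml
<appSettings>
  <!-- Illustrative key names and placeholder values only. -->
  <add key="CMSStorageProviderAssembly" value="CMS.AmazonStorage" />
  <add key="CMSAmazonBucketName" value="example-bucket" />
  <add key="CMSAmazonEndPoint" value="https://d1234example.cloudfront.net" />
  <add key="CMSAmazonAccessKeyID" value="AKIA...EXAMPLE" />
  <add key="CMSAmazonAccessKey" value="secret-key-placeholder" />
</appSettings>
```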