Igor Sokolov's Blog

A holistic view on implementation of a REST API backend using Serverless Framework

Posted at — Aug 26, 2019

Introduction

Many examples available on the Internet demonstrate how quickly and easily anyone can build a REST API backend using Serverless Framework. Based on my experience, the outcomes of those examples can hardly be called a ‘REST API’, because they don’t implement the RESTful principles in full or even violate some basics of the RESTful style. In this article I take one existing example and show how it should be enhanced to step closer to a REST API that follows best practices.

I’ve been watching Serverless Framework for a while. Unfortunately, I didn’t have a chance to get my hands dirty with it. Recently I decided that if the mountain will not come to Mohammed, Mohammed must go to the mountain. So I started looking at examples and found that many of them don’t implement, or sometimes simply violate, the basics of the RESTful style. I understand this could be because their major goal is to give an initial idea of how to use Serverless Framework, but I can’t help thinking that this reminds me of the picture “How to draw a horse in 5 steps”.

Those examples might give a false sense of confidence that everything is easy and quick. But “the devil is in the details”.

One last note before getting to the subject: I’m not a ‘native speaker’ of Node.js and JavaScript. I have pretty good knowledge and broad experience in API design and Amazon Web Services, so the major focus of this article is exactly on those fields.

The chosen example

Usually when I start learning a new topic I use the principle “monkey see, monkey do”. One of the first examples I found was a ToDo application from the Serverless Framework creators. All other examples I looked at later were less attractive for this article because of their additional complexity or the additional knowledge they require from the reader. The ToDo application doesn’t have a complex data model, so ORM frameworks like Amazon DynamoDB DataMapper for JavaScript or Dynamoose are not needed. The To-Do business model includes just five fields, which can be described in Swagger notation as follows:

ToDo:
  type: object
  properties:
    id:
      type: string
      format: uuid
      description: "An ID. UUIDv4"
    text:
      type: string
      description: "Text of the To-Do"
    checked:
      type: boolean
      description: "Completed or not"
    createdAt:
      type: string
      format: timestamp
      description: "When the To-Do was created using unix timestamp format"
    updatedAt:
      type: string
      format: timestamp
      description: "When the To-Do was last time updated using unix timestamp format"
  required:
    - id
    - text
    - checked
    - createdAt
    - updatedAt

It’s worth highlighting that Swagger (a.k.a. OpenAPI v2) doesn’t define a unix timestamp format, but because format is an open-valued field, this is still a valid definition.

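The rest of the article walks through several improvements in the following logical sequence: the REST API semantics of each operation (create, read, update, delete and list), the API documentation, and the request validation on the API Gateway.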

The final source code can be found on GitHub: serverless-rest-api-with-dynamodb.

REST API

One of the key benefits of the RESTful architecture style is its natural focus on interoperability. This is achieved by relying on HTTP standards. The central document that describes the HTTP protocol is RFC 7231 “HTTP/1.1: Semantics and Content”. It defines the PUT method as follows:

4.3.4. PUT

The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload. A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response…

In other words,

  1. the PUT method is supposed to transfer the whole resource state and shouldn’t be used for partial updates. For partial updates there is a PATCH method.
  2. PUT is an idempotent method, which means that each subsequent request with the same body should lead to the same state of the resource.

If we take a look at the implementation of the PUT method in the original update.js, it expects only two fields: text of type string and checked of type boolean. Additionally, each request leads to a different value of the updatedAt field. Both violate the PUT definition cited above.

The thoughts provided in the Stack Overflow thread “should I use PUT method for update, if I also update a timestamp attribute” can be used as a source of a solution to the highlighted problem. Specifically, both timestamps, the creation time and the last update time, are in fact meta-information related to the resource but not a part of the resource itself. Meta-information is usually passed in HTTP using headers. Thus, the suggested approach is to extract the createdAt and updatedAt fields from the body and turn them into custom headers.

It is worth emphasizing a critical assumption: the use case under consideration supposes that the server, not the client, calculates the timestamps. Another possible use case is one where the client controls the creation and modification time and the server should accept the proposed time. For example, a progressive web application (PWA) acting as an eCommerce storefront places an order while the backend server is unavailable, so the PWA preserves the creation time because a promotion depends on the order creation time. In such cases the creation and modification timestamps should be considered part of the resource (i.e. the order).

Using Swagger notation, the desired view can be described as follows:

...

  /todos/{id}:
    get:
      summary: "Get To-Do By id"
      produces:
      - "application/json"
      parameters:
      - name: "id"
        in: "path"
        description: "An ID of To-Do"
        required: true
        type: "string"
      responses:
        200:
          description: "A To-Do was found"
          schema:
            $ref: "#/definitions/ToDo"
          headers:
            Todo-UpdatedAt:
              type: "string"
              description: "Date-time when ToDo was updated."
            Todo-CreatedAt:
              type: "string"
              description: "Date-time when ToDo was created."
...

definitions:
  ToDo:
    type: "object"
    required:
    - "checked"
    - "id"
    - "text"
    properties:
      checked:
        type: "boolean"
      id:
        type: "string"
        format: "uuid"
      text:
        type: "string"

A critical highlight: the proposed custom headers don’t start with the X- prefix. There is still a lot of confusion about it. Historically, there was a recommendation to begin custom headers with X- so that a client could tell whether a specific header is custom or not. RFC 6648 “Deprecating the “X-” Prefix and Similar Constructs in Application Protocols”, published back in 2012, deprecated the usage of the ‘X-’ prefix:

Creators of new parameters to be used in the context of application protocols SHOULD NOT prefix their parameter names with “X-” or similar constructs.

The HTTP protocol already includes a header called Last-Modified. This fact supports the decision to use headers for timestamps. However, because there is no standard header with the ‘Created’ semantic, two custom headers were introduced for the sake of consistency.
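The handlers shown later in the article call common.prepareHeaders and common.convertDynamoItem, whose source is not included here. The following is only a minimal sketch of how such helpers might split a stored item into headers and body; the function bodies, the ISO 8601 header format and the META_FIELDS name are assumptions, not the author's code.

```javascript
// Hypothetical sketch of two helpers from common.js; the bodies are assumptions.

// Fields stored in DynamoDB that describe the resource but are not part of it.
const META_FIELDS = ['createdAt', 'updatedAt', 'etag'];

// Build the response headers that carry the meta-information of a stored item.
// Assumes createdAt/updatedAt are unix timestamps in milliseconds.
function prepareHeaders(item) {
    return {
        'Todo-CreatedAt': new Date(item.createdAt).toISOString(),
        'Todo-UpdatedAt': new Date(item.updatedAt).toISOString(),
        'ETag': item.etag,
    };
}

// Return the resource representation: a copy of the stored item without meta fields.
function convertDynamoItem(item) {
    const resource = { ...item };
    META_FIELDS.forEach((field) => delete resource[field]);
    return resource;
}
```

With this split the response body contains only id, text and checked, which matches the ToDo definition above, while the timestamps travel in the Todo-CreatedAt and Todo-UpdatedAt headers.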

One may be tempted to change the HTTP method of the update operation to PATCH. From experience, I would recommend avoiding this. The PATCH method requires a special patch format for the body. This is described in detail in a brilliant article: Please. Don’t Patch Like An Idiot. Even if JSON Merge Patch is chosen, I would still prefer using the PUT method for updates and passing the whole state of the resource. The reason is the dramatic difference in the amount of work and complexity that PATCH-like requests require. This is critical especially in highly concurrent environments.

Another topic to discuss is error handling when an external system fails. The example code (i.e. the original get.js) uses the error code returned by DynamoDB, or 501 if that one is not available. An API best practice is to expose no information about the internal infrastructure (i.e. which managed service or DB engine is used). Also, the usage of 501 is hardly justified in such a case: 501 Not Implemented clearly doesn’t match the required semantics. When an application depends on an external resource which is temporarily unavailable, it is recommended to use the HTTP 502 status. For example, AWS experts recommend that in one of the articles published in the AWS blog: Error Handling Patterns in Amazon API Gateway and AWS Lambda.
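The handlers in this article delegate such failures to common.handleDynamoDbError, which is not listed here. A minimal sketch of what it might do is given below; the body and the shape of the error payload (a single message field) are assumptions.

```javascript
// Hypothetical sketch of common.handleDynamoDbError; the body is an assumption.
function handleDynamoDbError(error, callback) {
    // Keep the details in the server-side log only.
    console.error('Persistence request failed:', error);

    // Always answer with 502 and a generic message, so that neither the
    // database engine nor its error codes leak to the API client.
    const response = {
        statusCode: 502,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: 'An error happened in an external persistence service' }),
    };
    callback(null, response);
}
```

The Lambda invocation itself still succeeds (the first callback argument is null); only the HTTP response reports the failure.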

The last point is the proposed approach of moving all syntax validation out of the handlers and making AWS API Gateway responsible for it. This lets the handlers focus on the business logic and keeps the validation automatically in sync with the documentation. This is the reason why the validation was removed from the code. More details about API documentation and validation using AWS API Gateway are provided in the corresponding sections below.

Create - POST

201 Created is the response code usually recommended for resource creation in a REST API. A response with this code is expected to carry the Location header with a URL pointing to the just created resource. The location is allowed to be a relative reference (see RFC 7231: 7.1.2. Location), which often simplifies the implementation.

Another typical question about the POST request I have heard is whether the response should return the state of the just created object. Personally, I find an answer on Stack Overflow worth reading, and specifically the statement it quotes from an article makes sense:

To prevent an API consumer from having to hit the API again for an updated representation, have the API return the updated (or created) representation as part of the response.

The last consideration for the create operation is the usage of the ETag header. For such a small application as a ToDo list, the ETag is not so critical for caching, but from the perspective of concurrent modifications the ETag plays a crucial role. The approach to handling concurrent modification with the help of If-Match and ETag is covered in the Update - PUT section below.
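The handlers call common.calculateEtag, which is not listed in this article. One possible way to compute such a value is sketched below; the choice of SHA-256 and of the hashed fields is an assumption made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical sketch of common.calculateEtag; the hashing scheme is an assumption.
function calculateEtag(resource) {
    // Hash only the fields that form the resource state, in a fixed order,
    // so that the same state always produces the same ETag.
    const canonical = JSON.stringify({
        text: resource.text,
        checked: resource.checked,
    });
    const hash = crypto.createHash('sha256').update(canonical).digest('hex');
    // An entity-tag is a quoted string (RFC 7232, section 2.3).
    return '"' + hash + '"';
}
```

Because neither the timestamps nor the key order of the input take part in the hash, repeating a PUT with the same body yields the same ETag, which is in line with the idempotency requirement discussed earlier.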

Given all those considerations, the source code of the create handler is the following (final create.js):

'use strict';

const uuid = require('uuid');
const AWS = require('aws-sdk'); // eslint-disable-line import/no-extraneous-dependencies
const common = require('./common');

const dynamoDb = new AWS.DynamoDB.DocumentClient();

module.exports.create = (event, context, callback) => {
    console.debug('Input event: ' + JSON.stringify(event));
    
    const timestamp = new Date().getTime();
    const data = JSON.parse(event.body);

    const todo = {
        id: uuid.v4(),
        text: data.text,
        checked: false,
    };

    const params = {
        TableName: process.env.DYNAMODB_TABLE,
        Item: {
            ...todo,
            etag: common.calculateEtag(todo),
            updatedAt: timestamp,
            createdAt: timestamp
        },
    };

    // Add an item to the database
    dynamoDb.put(params, (error) => {
        // handle potential errors
        if (error) {
            common.handleDynamoDbError(error, callback);
            return;
        }

        const item = params.Item;

        // create a response
        const response = {
            statusCode: 201,
            headers: {
                'Location': event.resource + "/" + item.id,
                ...common.prepareHeaders(item), 
            },
            body: JSON.stringify(common.convertDynamoItem(item)),
        };
        callback(null, response);
    });
};

Major highlights about the implementation:

Read - GET

Because the DynamoDB client returns an empty object if no item is found by the provided primary key, the GET handler from the example (original get.js) responds with 200 and an empty body. According to best practices, the expected behavior is to return 404 Not Found. From the implementation perspective, an additional check should be added that the Item field exists in the result object from the DynamoDB client.
The final code is presented below (final get.js):

'use strict';

const uuid = require('uuid');
const AWS = require('aws-sdk'); // eslint-disable-line import/no-extraneous-dependencies
const common = require('./common');

const dynamoDb = new AWS.DynamoDB.DocumentClient();

module.exports.get = (event, context, callback) => {
    console.debug('Input event: ' + JSON.stringify(event));
    
    const params = {
        TableName: process.env.DYNAMODB_TABLE,
        Key: {
            id: event.pathParameters.id,
        },
    };

    // fetch an item from the database
    dynamoDb.get(params, (error, result) => {
        // handle potential errors
        if (error) {
            common.handleDynamoDbError(error, callback);
            return;
        }

        // if ToDo is not found
        if (!('Item' in result)) {
            common.handleItemNotFound(event.pathParameters.id, callback);
            return;
        }

        const item = result.Item;

        const response = {
            statusCode: 200,
            body: JSON.stringify(common.convertDynamoItem(item)),
            headers: common.prepareHeaders(item)
        };

        callback(null, response);
    });
};

Update - PUT

The most critical part of any modification operation running in a concurrent environment is the prevention of race conditions. The HTTP/1.1 standard defines a number of precondition headers to provide the optimistic lock capability (see RFC 7232, section 3. Precondition Header Fields). The overall approach is presented well, with examples and sequence diagrams, in the article Optimistic Locking in a REST API. The code in the final update.js implements only the If-Match header; the opposite If-None-Match header can be added following the same example.

From the implementation perspective, DynamoDB condition expressions can be nicely leveraged to determine whether a provided ETag value matches the current state of the resource. So, if the If-Match header is present in the request, the logic in the Node.js handler adds an additional condition to the DynamoDB condition expression and passes the provided ETag value.
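As an illustration of this idea, the condition part of the request could be built by a small pure function that also covers If-None-Match. This is only a sketch under simplifying assumptions: getHeader and buildUpdateCondition are hypothetical names, each header is assumed to carry a single entity-tag (no lists, no "*"), and the ETag is assumed to be stored exactly as it is sent in the header.

```javascript
// Look up a header regardless of the letter case used by the client
// (HTTP header names are case-insensitive).
function getHeader(headers, name) {
    const wanted = name.toLowerCase();
    const key = Object.keys(headers || {}).find((k) => k.toLowerCase() === wanted);
    return key === undefined ? undefined : headers[key];
}

// Hypothetical helper: build the condition part of a DynamoDB UpdateItem request.
function buildUpdateCondition(id, headers) {
    const condition = {
        // The item must already exist (see the upsert discussion in this section).
        ConditionExpression: 'id = :id',
        ExpressionAttributeValues: { ':id': id },
    };

    const ifMatch = getHeader(headers, 'If-Match');
    if (ifMatch !== undefined) {
        // Update only if the stored ETag is the one the client has seen.
        condition.ConditionExpression += ' and etag = :ifMatch';
        condition.ExpressionAttributeValues[':ifMatch'] = ifMatch;
    }

    const ifNoneMatch = getHeader(headers, 'If-None-Match');
    if (ifNoneMatch !== undefined) {
        // Update only if the stored ETag differs from the provided one.
        condition.ConditionExpression += ' and etag <> :ifNoneMatch';
        condition.ExpressionAttributeValues[':ifNoneMatch'] = ifNoneMatch;
    }

    return condition;
}
```

The returned object can be merged into the UpdateItem parameters. The case-insensitive lookup is a deliberate difference from a plain 'If-Match' in event.headers check, which misses a header sent as if-match.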

Another issue with the initial example is that the PUT method inserts a new item into DynamoDB with no createdAt field if the operation is applied to a non-existing ID. This is because the DynamoDB UpdateItem operation works using upsert logic (see From SQL to NoSQL: Modifying Data in a Table):

… UpdateItem behaves like an “upsert” operation: The item is updated if it exists in the table, but if not a new item is added (inserted).

To overcome the highlighted problem, another condition should be added to the DynamoDB request. The condition checks if the updated item has an id field equal to the one from the request (see, for example, a discussion here).

This leads in turn to the following issue: if both conditions are applied in the same DynamoDB request and any of them fails, there is no way to recognize which one caused the error. This is a known limitation of DynamoDB. In this case I was not able to find any better option than to make another attempt to retrieve the item. If the item is not found, then the condition failed exactly because the item does not exist. Even if there is a race condition and the item was present during the first update request, now it is definitely gone and 404 can be returned.

For the PUT method both the 200 and 204 response codes are acceptable. The preference depends on additional details. Particularly, if there were any optional fields with default values or fields calculated on the server side, it would make sense to return 200 and the resource state. Another reason to prefer 200 is the one discussed in the post REST lesson learned: Avoid 204 responses by Mark Seemann. I wouldn’t agree with all the statements made by the author; for instance, according to the description of 204 No Content the server may still return an ETag header with no body. Given that there are no default or calculated fields in the resource state, 204 No Content should work well.

The final view of the update handler is provided below (final update.js):

'use strict';

const uuid = require('uuid');
const AWS = require('aws-sdk'); // eslint-disable-line import/no-extraneous-dependencies
const common = require('./common');

const dynamoDb = new AWS.DynamoDB.DocumentClient();

module.exports.update = (event, context, callback) => {
    console.debug('Input event: ' + JSON.stringify(event));

    const timestamp = new Date().getTime();
    const data = JSON.parse(event.body);

    const item = {
        ...data,
        id: event.pathParameters.id,
        etag: common.calculateEtag(data),
        updatedAt: timestamp
    }

    var params = {
        TableName: process.env.DYNAMODB_TABLE,
        Key: {
            id: item.id,
        },
        UpdateExpression: 'SET #todo_text = :text, checked = :checked, updatedAt = :updatedAt, etag = :newEtag',
        ExpressionAttributeNames: {
            '#todo_text': 'text',
        },
        ConditionExpression: 'id = :id',
        ExpressionAttributeValues: {
            ':id': item.id,
            ':text': item.text,
            ':checked': item.checked,
            ':updatedAt': item.updatedAt,
            ':newEtag': item.etag
          },
        ReturnValues: 'ALL_NEW', // return all after update
    };

    // if there is 'If-Match' header in the request add conditional update to the request
    if (event.headers && 'If-Match' in event.headers) {
        params.ConditionExpression += ' and etag = :etag';
        params.ExpressionAttributeValues[':etag'] = event.headers['If-Match'];
    }

    // update the To-Do in the database
    dynamoDb.update(params, (error, result) => {
        // handle potential errors
        if (error) {
            // if the failure is due to the conditional check
            if (error.code === 'ConditionalCheckFailedException') {
                handleConditionalCheckFailedException(event, item, callback)
            } else { // something else was wrong
                common.handleDynamoDbError(error, callback);
            }

            return;
        }
        
        const newItem = result.Attributes;

        console.debug(JSON.stringify(newItem));
        
        // create a response
        const response = {
            statusCode: 204,
            headers: common.prepareHeaders(newItem),
        };
    
        callback(null, response);

        console.debug(JSON.stringify(response));
    });
};

function handleConditionalCheckFailedException(event, item, callback) {
    // if no 'If-Match' header then clearly the item was not found
    if (!(event.headers && 'If-Match' in event.headers)) {
        common.handleItemNotFound(item.id, callback);
        return;   
    }

    // in another case there is no nice way to check which condition failed 
    // except only to try to retrieve item by ID
    const params = {
            TableName: process.env.DYNAMODB_TABLE,
            Key: {
                id: item.id,
            },
        };
    
    // fetch the To-Do from the database
    dynamoDb.get(params, (error, result) => {
        // handle potential errors
        if (error) {
            common.handleDynamoDbError(error, callback);
            return;
        } 

        // ToDo is not found
        if (!('Item' in result)) {
            common.handleItemNotFound(item.id, callback);
            return;
        }

        console.debug(JSON.stringify(result.Item));

        // the item was found so it was etag condition
        common.handleOptimisticLockFailed(callback);
    });
}
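The two helpers used above for the failed-condition cases, common.handleItemNotFound and common.handleOptimisticLockFailed, are not listed in this article. A minimal sketch of what they might look like follows; the errorResponse helper, the message texts and the payload shape are assumptions, while the 404 and 412 status codes follow the error responses documented later in serverless.yml.

```javascript
// Hypothetical sketches of two helpers from common.js; the bodies are assumptions.

// Build an error response with a small JSON payload.
function errorResponse(statusCode, message) {
    return {
        statusCode: statusCode,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: message }),
    };
}

// The requested To-Do does not exist: 404 Not Found.
function handleItemNotFound(id, callback) {
    callback(null, errorResponse(404, 'To-Do with id ' + id + ' was not found'));
}

// The ETag passed in If-Match is stale: 412 Precondition Failed.
function handleOptimisticLockFailed(callback) {
    callback(null, errorResponse(412, 'The provided ETag does not match the current state of the To-Do'));
}
```

A client that receives 412 is expected to read the resource again, take the fresh ETag and repeat the update.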

Delete - DELETE

A similar issue, a 2xx response when the item is not found, exists in the original delete.js. The cause is again the DynamoDB behavior: if the item doesn’t exist, DynamoDB just returns OK with no warning or error. The fix is to ask DynamoDB to return all fields of the just deleted item (ReturnValues: 'ALL_OLD'). If the item is not found, the Attributes field is not present in the response. In this case the handler returns 404 Not Found.

The DELETE method usually returns no body, so 204 No Content should be fine. For the sake of simplicity, the If-Match header was not implemented. It can be easily added using the same ideas presented for the update operation.

The final code is presented below (final delete.js):

'use strict';

const uuid = require('uuid');
const AWS = require('aws-sdk'); // eslint-disable-line import/no-extraneous-dependencies
const common = require('./common');

const dynamoDb = new AWS.DynamoDB.DocumentClient();

module.exports.delete = (event, context, callback) => {
    console.debug('Input event: ' + JSON.stringify(event));

    const params = {
        TableName: process.env.DYNAMODB_TABLE,
        Key: {
            id: event.pathParameters.id,
        },
        ReturnValues: 'ALL_OLD'
    };

    // delete the todo from the database
    dynamoDb.delete(params, (error, result) => {
        // handle potential errors
        if (error) {
            common.handleDynamoDbError(error, callback);
            return;
        }

        if (!('Attributes' in result)) {
            common.handleItemNotFound(event.pathParameters.id, callback);
            return;
        }
        
        // create a response
        const response = {
            statusCode: 204,
        };
        callback(null, response);
    });
};
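The If-Match check that was left out of the delete handler could be added along the following lines. This is a sketch only: buildDeleteParams is a hypothetical helper that is not part of the original code, and it assumes a single entity-tag stored exactly as it is sent in the header.

```javascript
// Hypothetical helper: build DynamoDB DeleteItem parameters with an optional ETag check.
function buildDeleteParams(tableName, id, headers) {
    const params = {
        TableName: tableName,
        Key: { id: id },
        ReturnValues: 'ALL_OLD',
    };

    // HTTP header names are case-insensitive, so accept any letter case.
    const headerName = Object.keys(headers || {}).find((k) => k.toLowerCase() === 'if-match');
    if (headerName !== undefined) {
        // Delete only if the stored ETag is the one the client has seen.
        params.ConditionExpression = 'etag = :etag';
        params.ExpressionAttributeValues = { ':etag': headers[headerName] };
    }
    return params;
}
```

Note that once the condition is present, deleting a non-existing item fails the condition too (a missing etag attribute never equals the provided value), so the handler would need the same follow-up read as the update handler to tell 404 from 412.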

List - GET

The initial implementation of the list operation (list.js) is a bit useless. The typical purpose of GET applied to a list of resources in a REST API is the retrieval of items using some filtering (and pagination, ideally). For demonstration purposes, filtering by the checked status was added. Now API clients can request all uncompleted ToDos using /todos?checked=false.

Another critical consideration is that, according to the DynamoDB documentation, “… a query and scan operation returns a maximum 1 MB of data in a single operation…”, so if the list is huge, multiple scans should be performed. This can be done using the algorithm described in the article Working with Queries in DynamoDB: Paginating the Results.

'use strict';

const AWS = require('aws-sdk'); // eslint-disable-line import/no-extraneous-dependencies
const common = require('./common');

const dynamoDb = new AWS.DynamoDB.DocumentClient();

module.exports.list = (event, context, callback) => {
    console.debug('Input event: ' + JSON.stringify(event));

    var params = {
        TableName: process.env.DYNAMODB_TABLE,
    };

    if (event.queryStringParameters && event.queryStringParameters.checked) {
        params.FilterExpression = 'checked = :checked';
        params.ExpressionAttributeValues = {':checked' : event.queryStringParameters.checked == 'true'};
    }

    // fetch all To-Dos matching the criteria from the database
    dynamoDb.scan(params, (error, result) => {
        // handle potential errors
        if (error) {
            common.handleDynamoDbError(error, callback);
            return;
        }

        // create a response
        const response = {
            statusCode: 200,
            body: JSON.stringify(result.Items.map(common.convertDynamoItem)),
        };
        
        callback(null, response);
    });
};
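The handler above performs a single scan, so it returns at most one page of results. The multi-scan loop mentioned earlier could be sketched as follows; scanAll is a hypothetical helper, and the page function is injected so that the loop itself does not depend on the AWS SDK.

```javascript
// Hypothetical helper: collect all pages of a DynamoDB scan.
// scanPage is any function that takes scan parameters and returns a promise
// of one result page: { Items: [...], LastEvaluatedKey: {...} | undefined }.
async function scanAll(scanPage, params) {
    let items = [];
    let startKey;
    do {
        // Continue from where the previous page stopped, if there was one.
        const pageParams = startKey ? { ...params, ExclusiveStartKey: startKey } : params;
        const page = await scanPage(pageParams);
        items = items.concat(page.Items || []);
        // DynamoDB omits LastEvaluatedKey on the last page.
        startKey = page.LastEvaluatedKey;
    } while (startKey);
    return items;
}
```

With the DocumentClient of the AWS SDK for JavaScript v2 used in this article, the page function could be (p) => dynamoDb.scan(p).promise(). For a really large table, exposing pagination to the API client would likely be a better choice than collecting everything inside a single Lambda invocation.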

API Documentation

Any good API should come with documentation, whether the API is open or internal. Serverless Framework has a growing number of plugins. One of them, the Serverless AWS Documentation plugin, can be used to attach a specification to AWS API Gateway. The plugin configuration is provided in the same serverless.yml file, in the custom.documentation section. The example serverless.yml file provided by the plugin creator and the documentation give all the information required to quickly add documentation capabilities to the application, if you know Swagger notation pretty well, like me.

A nice trick to remove duplication in the serverless.yml file was suggested by the plugin author. The repeated parts were extracted as additional elements of the custom documentation section and then injected in each necessary place using the ${} directive. This way all error responses and common headers were specified:

custom:
  documentation:
..
    
    commonHeaders:
      TodoCreatedAt:
        name: Todo-CreatedAt
        description: Date-time when ToDo was created.
        type: string
        format: date-time

      TodoUpdatedAt:
        name: Todo-UpdatedAt
        description: Date-time when ToDo was updated.
        type: string
        format: date-time

      ETag: 
        name: ETag
        description: ETag header as it is defined in https://tools.ietf.org/html/rfc7232
        type: string
    commonErrorResponses:
      502Error:
        statusCode: '502'
        description: An error happened in an external persistence service
        responseModels:
          "application/json": ErrorResponce
      400Error:
        statusCode: '400'
        description: A validation error occurred
        responseModels:
          "application/json": ErrorResponce  
      404Error:
        statusCode: '404'
        description: A To-Do was not found
        responseModels:
          "application/json": ErrorResponce
      412Error:
        statusCode: '412'
        description: A conditional update failed
        responseModels:
          "application/json": ErrorResponce
..
functions:
  create:
    handler: todos/create.create
    events:
      - http:
          path: todos
          method: post
          cors: true
          reqValidatorName: 'RequestValidator'
          documentation:
            summary: Create a new To-Do
            requestModels:
               "application/json": ToDoCreate
            methodResponses:
              -
                statusCode: '201'
                description: A resource was created successfully
                responseModels:
                  "application/json": ToDo
                responseHeaders:
                  -
                    name: Location
                    description: Link to the created resource
                    type: string
                  - ${self:custom.documentation.commonHeaders.TodoCreatedAt}
                  - ${self:custom.documentation.commonHeaders.TodoUpdatedAt}
                  - ${self:custom.documentation.commonHeaders.ETag}
              - ${self:custom.documentation.commonErrorResponses.502Error}
              - ${self:custom.documentation.commonErrorResponses.400Error}
..

The following command can be used to generate a Swagger file:

$ serverless downloadDocumentation --outputFileName=swagger.yaml

The file can be published, for example, on a Developer portal.

It is worth noting that there is a known bug. If the CORS configuration is enabled, the plugin generates documentation for the OPTIONS method, but the generated Swagger doesn’t include the required path parameters. The standard Swagger editor shows an error message about that, but it doesn’t prevent it from displaying the documentation.

Validation on the API Gateway

AWS API Gateway has a built-in capability to perform message validation based on an attached JSON schema. That helps to remove the boilerplate validation logic from the handlers and lets them focus on the business logic. Serverless Framework has added a validation capability (see AWS API Gateway: Request Schema Validation), but the configuration follows the same approach as AWS API Gateway. Particularly, each method should include a JSON schema (or a file with it). That makes a developer who uses the Serverless AWS Documentation plugin do double work: first define the models in serverless.yml using a Swagger-like notation and then additionally add a JSON schema description for each method. Luckily, there is another plugin for Serverless Framework that uses the same documentation for validation too. The plugin is called Serverless Reqvalidator Plugin. It requires minimal configuration after the installation:

..
    RequestValidator:  
      Type: "AWS::ApiGateway::RequestValidator"
      Properties:
        Name: 'my-req-validator'
        RestApiId: 
          Ref: ApiGatewayRestApi
        ValidateRequestBody: true
        ValidateRequestParameters: true
..

The configuration requires enabling the request body and parameters validation. After that any method can be marked to use the defined validator:

..
  list:
    handler: todos/list.list
    events:
      - http:
          path: todos
          method: get
          cors: true
          reqValidatorName: 'RequestValidator'
..

Conclusion

The article demonstrates what should be added to a Serverless Framework-based backend for it to be considered a RESTful web service implemented according to best practices. Those steps are only a part of what a production-ready REST API needs. Other topics that are not discussed in the article but should be addressed are the following:

  1. Authentication and authorization. For a serverless application it is preferable to use SaaS products like Amazon Cognito or Auth0.
  2. Other security aspects like spike arrests (see an article for AWS API Gateway called Throttle API Requests for Better Throughput), or the application level protection from common web exploits and here AWS Web Application Firewall can be leveraged (see an article Protecting your API using Amazon API Gateway and AWS WAF — Part I).
  3. Automated REST API testing (for example, using Karate).
  4. API backward compatibility testing - assure that no breaking changes are introduced (see another good article, Backward Compatibility Check for REST APIs).

It should be noted that although the overall number of lines of code has increased, with the help of Serverless Framework the application is still compact, easy to read and extremely easy to deploy. The Serverless Framework plugin system makes it possible to add nice features such as the API documentation and the validation based on it demonstrated above.
