All the Ops you need to know to Dev Serverless Chris Munns – Principal Developer Advocate – AWS Serverless © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A presentation at Serverless Days - Toronto in October 2018 in Toronto, ON, Canada by Chris Munns
About me: Chris Munns - munns@amazon.com, @chrismunns
• Principal Developer Advocate - Serverless
• New Yorker
• Previously:
  • AWS Business Development Manager – DevOps, July ’15 - Feb ’17
  • AWS Solutions Architect, Nov 2011 - Dec 2014
  • Formerly on operations teams @Etsy and @Meetup
  • Little time at a hedge fund, Xerox and a few other startups
• Rochester Institute of Technology: Applied Networking and Systems Administration ’05
• Internet infrastructure geek
Why are we here today? https://secure.flickr.com/photos/mgifford/4525333972
Serverless means…
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
Serverless applications: EVENT SOURCE (changes in data state, requests to endpoints, changes in resource state) → FUNCTION (Node.js, Python, Java, C#, Go) → SERVICES (ANYTHING)
Two common cohorts of new serverless users
• Developers who need to learn operations
• Operations folks who need to learn development
Y-Hack 2013 https://secure.flickr.com/photos/psd/4389135567/
Two common cohorts of new serverless users
• Developers who need to learn operations (I’m more one of these folks!)
• Operations folks who need to learn development
Two common cohorts of new serverless users
• Developers who need to learn operations
• Operations folks who need to learn development (going to focus more today on these folks)
4 key operational areas
• Availability
• Networking
• Security, Governance, Auditing
• Monitoring, Metrics, Logs, Performance
Basic Serverless API technology stack: Mobile/Web apps → Internet → AWS: Amazon API Gateway → AWS Lambda functions → Amazon DynamoDB
Is this application available?
Then our application gets some traffic…
Is this application available? Ok! 100% Available!
Is this application available? For 16 invocations for about 1 second…
Is this application available?
Availability is also a shared responsibility between AWS and you
• If you misconfigure API Gateway and Lambda is fine, what is your availability of Lambda?
• If your downstream services are DDoS’d, what layer’s fault is it?
  • More importantly, where do you resolve it?
• If you run into concurrency limits but everything else is fine, is that an availability issue?
  • Concurrent executions is very much a soft limit!
Concurrency controls
• Concurrency is a shared pool by default
• Separate using per function concurrency settings
  • Acts as reservation
• Also acts as max concurrency per function
  • Especially critical for data sources like RDS
• “Kill switch” – set per function concurrency to zero
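The per function concurrency setting above can be sketched with the AWS SDK for Python. This is a minimal sketch, not a definitive implementation: it assumes `lambda_client` is a boto3 Lambda client (for example `boto3.client("lambda")`), and the helper names and the function name `orders` are hypothetical.

```python
def reserve_concurrency(lambda_client, function_name, limit):
    """Reserve concurrency for one function; the same value is also its cap."""
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


def kill_switch(lambda_client, function_name):
    """Setting the per function concurrency to zero throttles every new invocation."""
    return reserve_concurrency(lambda_client, function_name, 0)


def release_concurrency(lambda_client, function_name):
    """Remove the setting so the function goes back to the shared pool."""
    return lambda_client.delete_function_concurrency(FunctionName=function_name)


# Hypothetical usage, with real credentials and a real function:
#   import boto3
#   client = boto3.client("lambda")
#   reserve_concurrency(client, "orders", 50)   # protect a downstream database
#   kill_switch(client, "orders")               # stop the function during an incident
```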
Is this application available?
Availability means something different in serverless applications than it does for traditional “server-full” applications:
• Availability only exists at the time of invocation, so availability becomes a % of total invocations rather than a time period based metric
• The failure of downstream service(s) needs to be handled/reported in a way that can lead to retries/safe handling
  • Retries can further confuse this (depending on invoke source)
  • i.e. if I fail twice and the second retry succeeds, what’s my availability?
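The retry question above can be made concrete with two ways of counting. This is a sketch under stated assumptions: the `Invocation` record and both function names are hypothetical, and neither number is an official AWS definition of availability.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    request_id: str   # retries of the same event share one id
    succeeded: bool


def per_attempt_availability(invocations):
    """Share of individual attempts that succeeded."""
    if not invocations:
        return None
    return sum(1 for i in invocations if i.succeeded) / len(invocations)


def per_request_availability(invocations):
    """Share of requests that were eventually served by any attempt."""
    if not invocations:
        return None
    served = {}
    for i in invocations:
        served[i.request_id] = served.get(i.request_id, False) or i.succeeded
    return sum(1 for ok in served.values() if ok) / len(served)


# Request "a" fails twice and then succeeds; request "b" succeeds first time.
history = [
    Invocation("a", False),
    Invocation("a", False),
    Invocation("a", True),
    Invocation("b", True),
]
attempts = per_attempt_availability(history)   # 2 of 4 attempts succeeded
requests = per_request_availability(history)   # 2 of 2 requests were served
```

Which of the two numbers to report depends on whether the caller sees the failed attempts, which is why the invoke source matters.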
Networking
Lambda API API provided by the Lambda service SDK clients
Lambda networking (diagram: Lambda function execution environment, AWS Lambda VPC, region)
Lambda networking: invocations can only come in via the AWS Lambda API
Lambda networking: invocations can only come in via the AWS Lambda API. Today that API is available publicly in the region Lambda is running
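Since every invocation goes through the Lambda API, a direct call from an SDK client looks roughly like the sketch below. It assumes `lambda_client` is a boto3 Lambda client; the helper names and the function name are hypothetical.

```python
import json


def invoke_sync(lambda_client, function_name, payload):
    """Synchronous invoke: wait for the function and return its decoded result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


def invoke_async(lambda_client, function_name, payload):
    """Asynchronous invoke: hand the event to Lambda and return the HTTP status."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"]


# Hypothetical usage:
#   import boto3
#   client = boto3.client("lambda")
#   result = invoke_sync(client, "orders", {"order_id": 42})
```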
Lambda networking with a customer configured VPC (diagram: Lambda function execution environment in the AWS Lambda VPC, elastic network interface, Customer VPC, region)
Lambda networking with a customer configured VPC
• Customer VPC: customer configured/managed. Customer controls Security Groups, Network ACLs, Route Tables
• AWS Lambda VPC: completely managed by the Lambda team
Lambda networking with a customer configured VPC: invocations still can only come in via the AWS Lambda API
Lambda networking with a customer configured VPC: invocations still can only come in via the AWS Lambda API, even with a private API Gateway endpoint or a VPC Endpoint provided service
Do I need to put my functions in an Amazon VPC? Putting your functions inside of a VPC provides little extra security benefit to your AWS Lambda functions
Do I need a VPC? Should my Lambda function be in a VPC?
Does my function need to access any specific resources in a VPC?
• No: Don’t put the function in a VPC
• Yes: Does it also need to access resources or services in the public internet?
  • No: Put the function in a private subnet
  • Yes: Put the function in a subnet with a NAT’d route to the internet
Do I need a VPC? Should my Lambda function be in a VPC?
Do I need to restrict outbound access from my function to the internet?
• No: Don’t put the function in a VPC
• Yes: Put the function in a private subnet
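The two decision trees above fit in a few lines of code. A minimal sketch; the function names and the returned strings are hypothetical and only restate the slide wording.

```python
NO_VPC = "Don't put the function in a VPC"
PRIVATE_SUBNET = "Put the function in a private subnet"
NAT_SUBNET = "Put the function in a subnet with a NAT'd route to the internet"


def placement_for_resource_access(needs_vpc_resources, needs_public_internet):
    """First tree: driven by what the function has to reach."""
    if not needs_vpc_resources:
        return NO_VPC
    if needs_public_internet:
        return NAT_SUBNET
    return PRIVATE_SUBNET


def placement_for_outbound_control(must_restrict_outbound):
    """Second tree: driven by whether outbound internet access must be restricted."""
    return PRIVATE_SUBNET if must_restrict_outbound else NO_VPC
```

For example, a function that reads from a database in a VPC and also calls a public third-party API lands on the NAT'd subnet branch.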
Basic VPC Design (diagram: a VPC across Availability Zone A and Availability Zone B; each AZ has a Lambda subnet, a subnet for other resources, and its own VPC NAT gateway)
Basic VPC Design
• ALWAYS configure a minimum of 2 Availability Zones
• Give your Lambda functions their own subnets
• Give your Lambda subnets a large IP range to handle potential scale
• If your functions need to talk to a resource on the internet, you need a NAT!
• ENIs are a pain, we know, we’re working on it!
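A quick way to sanity check the "large IP range" advice is to count usable addresses per Lambda subnet. This is a sketch under stated assumptions: AWS reserves 5 addresses in every subnet, the expected peak number of network interfaces is an estimate you supply, and the rule that the remaining subnets must absorb the peak after losing the largest one is a design choice made here, not something from the slides.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS keeps for itself in each subnet


def usable_ips(cidr):
    """Addresses left for your own network interfaces in one subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def subnets_can_absorb(cidrs, expected_peak_enis):
    """True if at least 2 subnets exist and the peak still fits without the largest one."""
    if len(cidrs) < 2:
        return False
    capacities = [usable_ips(c) for c in cidrs]
    return sum(capacities) - max(capacities) >= expected_peak_enis


# Hypothetical Lambda subnets, one per Availability Zone.
lambda_subnets = ["10.0.0.0/24", "10.0.1.0/24"]
fits = subnets_can_absorb(lambda_subnets, expected_peak_enis=200)
```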
Securing your Serverless Infrastructure Photo by Markus Spiske on Unsplash
Lambda permissions model
Fine grained security controls for both execution and invocation:
Execution policies:
• Define what AWS resources/API calls this function can access via IAM
• Used in streaming invocations
• E.g. “Lambda function A can read from DynamoDB table users”
Function policies:
• Used for sync and async invocations
• E.g. “Actions on bucket X can invoke Lambda function Z”
• Resource policies allow for cross account access
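The two examples above could be written down roughly as follows. A sketch only: the helper names, the statement id, the ARNs and the chosen DynamoDB actions are assumptions for illustration. The second helper returns keyword arguments intended for the boto3 Lambda client's `add_permission` call.

```python
import json


def read_only_table_policy(table_arn):
    """Execution policy: 'Lambda function A can read from DynamoDB table users'."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            }
        ],
    }


def s3_invoke_permission(function_name, bucket_arn):
    """Function policy: 'Actions on bucket X can invoke Lambda function Z'."""
    return {
        "FunctionName": function_name,
        "StatementId": "AllowInvokeFromBucket",  # hypothetical id
        "Action": "lambda:InvokeFunction",
        "Principal": "s3.amazonaws.com",
        "SourceArn": bucket_arn,
    }


# Hypothetical ARNs and names.
policy_json = json.dumps(
    read_only_table_policy("arn:aws:dynamodb:us-east-1:123456789012:table/users"),
    indent=2,
)
permission = s3_invoke_permission("function-z", "arn:aws:s3:::bucket-x")
# With a real client: boto3.client("lambda").add_permission(**permission)
```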
"Action": "s3:*" makes puppies cry Photo by Matthew Henry on Unsplash
Meet SAM!
AWS SAM Templates <-THIS BECOMES THIS-> From: https://github.com/awslabs/aws-serverless-samfarm/blob/master/api/saml.yaml
AWS SAM Policy Templates
    MyQueueFunction:
      Type: AWS::Serverless::Function
      Properties:
        …
        Policies:
          # Gives permissions to poll an SQS Queue
          - SQSPollerPolicy:
              queueName: !Ref MyQueue
        …
    MyQueue:
      Type: AWS::SQS::Queue
      …
SAM Policy Templates: 40+ predefined policies. All found here: https://bit.ly/2xWycnj
IAM + Lambda best practices
• Where/when possible try to leverage the pre-created managed policies that exist today
• If you are doing “service:*” be REALLY REALLY REALLY sure that’s what you should and need to do
• Keep tight lockdown on who/what can invoke functions
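One cheap guard against “service:*” is a check that lists wildcard actions in a policy document before it ships. A minimal sketch; the function name is hypothetical and it only inspects `Allow` statements and their `Action` entries, nothing else.

```python
def wildcard_actions(policy):
    """Return every allowed action that is '*' or ends in ':*'."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):      # a single statement may not be in a list
        statements = [statements]
    found = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):      # a single action may be a plain string
            actions = [actions]
        found.extend(a for a in actions if a == "*" or a.endswith(":*"))
    return found


too_broad = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["dynamodb:GetItem"], "Resource": "*"},
    ],
}
flagged = wildcard_actions(too_broad)
```

A check like this could run in a build pipeline and fail the build when the list is not empty.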
I will turn on CloudTrail, Config, and CloudTrail Data Events
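Turning on CloudTrail Data Events for Lambda can be scripted. This is a sketch under stated assumptions: `cloudtrail_client` is a boto3 CloudTrail client, the trail already exists, the helper names are hypothetical, and this call sets the trail's event selectors, so review the selectors already configured on the trail before running it.

```python
def lambda_data_event_selector():
    """Selector that logs invoke activity for all Lambda functions in the account."""
    return {
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [
            {"Type": "AWS::Lambda::Function", "Values": ["arn:aws:lambda"]},
        ],
    }


def enable_lambda_data_events(cloudtrail_client, trail_name):
    """Apply the selector to an existing trail."""
    return cloudtrail_client.put_event_selectors(
        TrailName=trail_name,
        EventSelectors=[lambda_data_event_selector()],
    )


# Hypothetical usage:
#   import boto3
#   enable_lambda_data_events(boto3.client("cloudtrail"), "my-trail")
```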
BONUS: Hardcoded secrets make fish cry Photo by Julieann Ragojo on Unsplash
Lambda Environment Variables
• Key-value pairs that you can dynamically pass to your function
• Available via standard environment variable APIs such as process.env for Node.js or os.environ for Python
• Can optionally be encrypted via AWS Key Management Service (KMS)
  • Allows you to specify in IAM what roles have access to the keys to decrypt the information
• Useful for creating environments per stage (e.g. dev, testing, production)
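Reading per stage configuration from environment variables in Python might look like this. A minimal sketch; the variable names `STAGE` and `TABLE_NAME` are hypothetical, and passing the environment in as an argument is only there to make the function easy to test.

```python
import os


def load_config(environ=None):
    """Build the function's configuration from environment variables."""
    if environ is None:
        environ = os.environ
    return {
        # Optional setting with a safe default
        "stage": environ.get("STAGE", "dev"),
        # Required setting: fail loudly at cold start if it is missing
        "table_name": environ["TABLE_NAME"],
    }


def lambda_handler(event, context):
    config = load_config()
    return {"stage": config["stage"], "table": config["table_name"]}
```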
AWS Systems Manager – Parameter Store
Centralized store to manage your configuration data
• Supports hierarchies
• Plain-text or encrypted with KMS
• Can send notifications of changes to Amazon SNS/AWS Lambda
• Can be secured with IAM
• Calls recorded in CloudTrail
• Can be tagged
• Available via API/SDK
Useful for: centralized environment variables, secrets control, feature flags

    from __future__ import print_function
    import json
    import boto3

    # Create the client once, outside the handler, so warm invocations reuse it
    ssm = boto3.client('ssm', 'us-east-1')

    def get_parameters():
        response = ssm.get_parameters(
            Names=['LambdaSecureString'], WithDecryption=True
        )
        # Return the value of the first parameter found
        for parameter in response['Parameters']:
            return parameter['Value']

    def lambda_handler(event, context):
        value = get_parameters()
        print("value1 = " + value)
        return value  # Echo back the first key value
AWS Systems Manager – Parameter Store #somuchawesome
Weee eeeee Fun with logs and metrics https://secure.flickr.com/photos/ocarchives/5333790414
Metrics and logging are a universal right!
CloudWatch Metrics:
• 7 Built in metrics for Lambda
  • Invocation Count, Invocation duration, Invocation errors, Throttled Invocation, Iterator Age, DLQ Errors, Concurrency
  • Can call “put-metric-data” from your function code for custom metrics
• 7 Built in metrics for API-Gateway
  • API Calls Count, Latency, 4XXs, 5XXs, Integration Latency, Cache Hit Count, Cache Miss Count
  • Error and Cache metrics support averages and percentiles
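Calling “put-metric-data” from function code could be sketched like this with boto3. It assumes `cloudwatch_client` is a boto3 CloudWatch client; the namespace `MyApp`, the metric name and the helper name are hypothetical.

```python
def put_count_metric(cloudwatch_client, metric_name, value, function_name,
                     namespace="MyApp"):
    """Publish one custom count metric, tagged with the function that produced it."""
    return cloudwatch_client.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {
                "MetricName": metric_name,
                "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                "Value": value,
                "Unit": "Count",
            }
        ],
    )


# Hypothetical usage inside a handler:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   put_count_metric(cloudwatch, "GizmosProcessed", 3, context.function_name)
```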
Metrics and logging are a universal right!
CloudWatch Logs:
• API Gateway Logging
  • 2 Levels of logging, ERROR and INFO
  • Optionally log method request/body content
  • Set globally in stage, or override per method
• Lambda Logging
  • Logging directly from your code with your language’s equivalent of console.log()
  • Basic request information included
• Log Pivots
  • Build metrics based on log filters
  • Jump to logs that generated metrics
  • Export logs to Amazon Elasticsearch Service or S3
    • Explore with Kibana or Athena/QuickSight
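Logging from Python function code, in a shape that log filters can later turn into metrics, might look like the sketch below. The JSON field names and the `items` event key are hypothetical; `aws_request_id` is the request id attribute of the Lambda context object.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def log_line(level, message, **fields):
    """Render one structured log line as JSON so filters can match on fields."""
    record = {"level": level, "message": message}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


def lambda_handler(event, context):
    gizmos = len(event.get("items", []))
    logger.info(log_line(
        "INFO",
        "order processed",
        request_id=getattr(context, "aws_request_id", None),
        gizmos=gizmos,
    ))
    return {"gizmos": gizmos}
```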
Dashboards https://secure.flickr.com/photos/joeross/6544781203
Dashboarding tips
Make all metrics available – Good news, CloudWatch makes it easy!
Focus main landing/“on tv” dashboards on core user/business driven metrics
• “If this metric goes up, does it directly correlate with a user having a problem?”
Make as many metrics available across team/function as possible
• You can now embed CloudWatch “snapshots” in emails and other places!
Tweak your function’s compute power
Lambda exposes only a memory control, with the % of CPU core and network capacity allocated to a function proportionally
Is your code CPU, Network or memory-bound? If so, it could be cheaper to choose more memory.
Smart resource allocation
Match resource allocation (up to 3 GB!) to logic
Stats for Lambda function that calculates 1000 times all prime numbers <= 1000000

    Memory     Duration        Cost
    128 MB     11.722965 sec   $0.024628
    256 MB     6.678945 sec    $0.028035
    512 MB     3.194954 sec    $0.026830
    1024 MB    1.465984 sec    $0.024638

Green==Best Red==Worst
Smart resource allocation (same stats as the previous slide): going from 128 MB to 1024 MB is -10.256981 sec for +$0.00001
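The comparison on the slide is plain arithmetic over the measured rows. The sketch below uses only the numbers from the table; the function names and the idea of an "extra cost budget" are assumptions added for illustration.

```python
# (memory in MB, duration in seconds, cost in USD) as shown on the slide
RESULTS = [
    (128, 11.722965, 0.024628),
    (256, 6.678945, 0.028035),
    (512, 3.194954, 0.026830),
    (1024, 1.465984, 0.024638),
]


def compare(baseline, candidate):
    """Time saved, extra cost and speedup when moving from baseline to candidate."""
    _, base_seconds, base_cost = baseline
    _, cand_seconds, cand_cost = candidate
    return {
        "seconds_saved": base_seconds - cand_seconds,
        "extra_cost": cand_cost - base_cost,
        "speedup": base_seconds / cand_seconds,
    }


def fastest_within_budget(results, max_extra_cost):
    """Fastest configuration whose cost is within max_extra_cost of the cheapest."""
    cheapest = min(cost for _, _, cost in results)
    affordable = [r for r in results if r[2] - cheapest <= max_extra_cost]
    return min(affordable, key=lambda r: r[1])


delta = compare(RESULTS[0], RESULTS[-1])       # 128 MB versus 1024 MB
pick = fastest_within_budget(RESULTS, 0.0001)  # allow a hundredth of a cent extra
```

With these rows, the 1024 MB setting is roughly eight times faster than 128 MB for an extra $0.00001.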
Impact of a memory change: 50% increase in memory, 95th percentile changes from 3s to 2.1s https://blog.newrelic.com/2017/06/20/lambda-functions-xray-traces-custom-serverless-metrics/
Multithreading? Maybe!
• <1.8GB is still single core
  • CPU bound workloads won’t see gains – processes share same resources
• >1.8GB is multi core
  • CPU bound workloads will see gains, but need to multi thread
• I/O bound workloads WILL likely see gains
  • e.g. parallel calculations to return
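For the I/O bound case above, a thread pool is one way to overlap waiting time inside a single invocation. A minimal sketch: `fetch` is a hypothetical stand-in for a network call (here it only sleeps), and the worker count is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Stand-in for an I/O bound call such as an HTTP request or a database read."""
    time.sleep(0.05)   # pretend to wait on the network
    return item * 2


def fetch_all(items, max_workers=8):
    """Run the calls concurrently; results come back in the same order as items."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))


def lambda_handler(event, context):
    return {"results": fetch_all(event.get("items", []))}
```

Because the threads spend their time waiting rather than computing, this can help even on a single core; CPU bound work would need the multi core memory settings described above.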
AWS X-Ray Integration with Serverless
• API Gateway inserts a tracing header into HTTP calls as well as reports data back to X-Ray itself
• Lambda instruments incoming requests for all supported languages and can capture calls made in code

    var AWSXRay = require('aws-xray-sdk-core');
    AWSXRay.middleware.setSamplingRules('sampling-rules.json');
    // Wrap the AWS SDK so calls made through it are captured by X-Ray
    var AWS = AWSXRay.captureAWS(require('aws-sdk'));
    var s3Client = new AWS.S3();
X-Ray Trace Example
How do I figure out what’s wrong?
These tools are here, so use them!
1. Turn on X-Ray now
   1. Look at wrapping your own calls with it via the X-Ray SDKs
2. Don’t underestimate the power of logging in Lambda
   1. Simple “debug: in functionX” statements work great and are easy to find in CloudWatch Logs
3. The most valuable metrics are the ones closest to your customer/use-case
   1. How many gizmos did this function call/create/process/etc
FIN/ACK
4 key operational areas:
• Availability
• Networking
• Security, Governance, Auditing
• Monitoring, Metrics, Logs, Performance
Chris Munns munns@amazon.com @chrismunns https://www.flickr.com/photos/theredproject/3302110152/
? https://secure.flickr.com/photos/dullhunk/202872717/