All the Ops you need to know to Dev Serverless

A presentation at Serverless Days - Toronto in October 2018 in Toronto, ON, Canada by Chris Munns

Slide 1

All the Ops you need to know to Dev Serverless Chris Munns – Principal Developer Advocate – AWS Serverless © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Slide 2

About me: Chris Munns - munns@amazon.com, @chrismunns
• Principal Developer Advocate - Serverless
• New Yorker
• Previously:
  • AWS Business Development Manager – DevOps, July ’15 - Feb ’17
  • AWS Solutions Architect, Nov 2011 - Dec 2014
  • Formerly on operations teams @Etsy and @Meetup
  • Little time at a hedge fund, Xerox and a few other startups
• Rochester Institute of Technology: Applied Networking and Systems Administration ’05
• Internet infrastructure geek

Slide 3

Why are we here today?
https://secure.flickr.com/photos/mgifford/4525333972

Slide 4

Serverless means…
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in

Slide 5

Serverless applications
EVENT SOURCE: Changes in data state, Requests to endpoints, Changes in resource state
FUNCTION: Node.js, Python, Java, C#, Go
SERVICES (ANYTHING)

Slide 6

Two common cohorts of new serverless users
• Developers who need to learn operations
• Operations folks who need to learn development

Slide 7

Y-Hack 2013
https://secure.flickr.com/photos/psd/4389135567/

Slide 8

Two common cohorts of new serverless users
• Developers who need to learn operations
• Operations folks who need to learn development
Callout: “I’m more one of these folks!”

Slide 9

Two common cohorts of new serverless users
• Developers who need to learn operations
• Operations folks who need to learn development
Callout: “Going to focus more today on these folks”

Slide 10

4 key operational areas
• Availability
• Networking
• Security, Governance, Auditing
• Monitoring, Metrics, Logs, Performance

Slide 11

Basic Serverless API technology stack
Diagram labels: Mobile/Web apps, Internet, AWS (Amazon API Gateway, AWS Lambda functions, Amazon DynamoDB)

Slide 12

Is this application available?

Slide 13

Then our application gets some traffic…

Slide 14

Is this application available?
Ok! 100% Available!

Slide 15

Is this application available?
For 16 invocations for about 1 second…

Slide 16

Is this application available?
Availability is also a shared responsibility between AWS and you
• If you misconfigure API Gateway and Lambda is fine, what is your availability of Lambda?
• If your downstream services are DDoS’d, what layer’s fault is it?
  • More importantly, where do you resolve it?
• If you run into concurrency limits but everything else is fine, is that an availability issue?
  • Concurrent executions is very much a soft limit!

Slide 17

Concurrency controls
• Concurrency is a shared pool by default
• Separate using per function concurrency settings
  • Acts as reservation
• Also acts as max concurrency per function
  • Especially critical for data sources like RDS
• “Kill switch” – set per function concurrency to zero
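A rough sketch of the per function concurrency setting and the "kill switch" from this slide. The helpers take a Lambda client as a parameter (in practice that would be something like `boto3.client("lambda")`) and call its `put_function_concurrency` operation. The helper names, the `RecordingClient` stand-in and the function name `orders-api` are hypothetical and exist only so the example runs offline.

```python
def reserve_concurrency(lambda_client, function_name, limit):
    """Reserve `limit` concurrent executions for one function.

    The reservation also caps the function at `limit`, which is what
    protects connection-limited data sources such as RDS.
    """
    if limit < 0:
        raise ValueError("limit must be zero or greater")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


def kill_switch(lambda_client, function_name):
    """Throttle every new invocation by setting concurrency to zero."""
    return reserve_concurrency(lambda_client, function_name, 0)


class RecordingClient:
    """Stand-in for a real Lambda client; it only records the calls."""

    def __init__(self):
        self.calls = []

    def put_function_concurrency(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


client = RecordingClient()
reserve_concurrency(client, "orders-api", 25)  # hypothetical function name
kill_switch(client, "orders-api")
print(client.calls)
```

Passing the client in, rather than creating it inside the helper, keeps the sketch testable without AWS credentials.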

Slide 18

Is this application available?
Availability means something different in serverless applications than it does for traditional “server-full” applications:
• Availability only exists at the time of invocation, so availability becomes a % of total invocations rather than a time-period-based metric
• The failure of downstream service(s) needs to be handled/reported in a way that can lead to retries/safe handling
  • Retries can further confuse this (depending on invoke source)
  • i.e. if I fail twice and the second retry succeeds, what’s my availability?
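The retry question can be made concrete with a small calculation. This is only an illustration with made-up data: it counts availability two ways, per individual attempt and per logical request, for a request that fails twice and succeeds on its second retry. Which definition is "right" depends on the invoke source and on what your users actually experience.

```python
def availability_by_attempt(attempts):
    """Share of individual invocation attempts that succeeded."""
    flat = [ok for request in attempts for ok in request]
    return sum(flat) / len(flat)


def availability_by_request(attempts):
    """Share of requests that eventually succeeded, retries included."""
    return sum(any(request) for request in attempts) / len(attempts)


# Each inner list is one logical request; each bool is one attempt.
# The first request fails twice and succeeds on its second retry.
requests = [
    [False, False, True],
    [True],
    [True],
    [True],
]

print(round(availability_by_attempt(requests), 3))
print(availability_by_request(requests))
```

The same traffic yields two different numbers, which is why it helps to state up front which one a dashboard reports.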

Slide 19

Networking

Slide 20

Lambda API
API provided by the Lambda service

  1. Lambda directly invoked via invoke API
     • Used by all other services that invoke Lambda across all models
     • Supports sync and async
     • Can pass any event payload structure you want
     • Client included in every SDK

Diagram labels: SDK clients, Lambda function
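A minimal sketch of the sync and async models of the invoke API. It assumes a client shaped like boto3's Lambda client, whose `invoke` operation takes `FunctionName`, `InvocationType` and `Payload`; the `EchoClient` stand-in, the helper names and the function name `resize-image` are hypothetical.

```python
import json


def invoke_sync(lambda_client, function_name, event):
    """Synchronous invoke: the caller waits for the function's result."""
    return lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )


def invoke_async(lambda_client, function_name, event):
    """Asynchronous invoke: Lambda queues the event and returns at once."""
    return lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(event).encode("utf-8"),
    )


class EchoClient:
    """Offline stand-in that echoes back the arguments it was given."""

    def invoke(self, **kwargs):
        return kwargs


client = EchoClient()
sync_call = invoke_sync(client, "resize-image", {"key": "cat.png"})
async_call = invoke_async(client, "resize-image", {"key": "cat.png"})
print(sync_call["InvocationType"], async_call["InvocationType"])
```

The payload is whatever JSON structure the caller chooses, which matches the "any event payload structure you want" point on the slide.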

Slide 21

Lambda networking
Diagram labels: region, AWS Lambda VPC, Lambda function execution environment

Slide 22

Lambda networking
Invocations can only come in via the AWS Lambda API
Diagram labels: region, AWS Lambda VPC, Lambda function execution environment

Slide 23

Lambda networking
Invocations can only come in via the AWS Lambda API
Today that API is available publicly in the region Lambda is running
Diagram labels: region, AWS Lambda VPC, Lambda function execution environment

Slide 24

Lambda networking with a customer configured VPC
Diagram labels: region, AWS Lambda VPC, Customer VPC, Lambda function execution environment, elastic network interface

Slide 25

Lambda networking with a customer configured VPC
Customer VPC: Customer configured/managed VPC. Customer controls Security Groups, Network ACLs, Route Tables
AWS Lambda VPC: Completely managed by the AWS Lambda team
Diagram labels: region, AWS Lambda VPC, Customer VPC, Lambda function execution environment, elastic network interface

Slide 26

Lambda networking with a customer configured VPC
Invocations still can only come in via the AWS Lambda API
Diagram labels: region, AWS Lambda VPC, Customer VPC, Lambda function execution environment, elastic network interface

Slide 27

Lambda networking with a customer configured VPC
Invocations still can only come in via the AWS Lambda API
Even with a private API Gateway endpoint or a VPC Endpoint provided service
Diagram labels: region, AWS Lambda VPC, Customer VPC, Lambda function execution environment, elastic network interface

Slide 28

Do I need to put my functions in an Amazon VPC?
Putting your functions inside of a VPC provides little extra security benefit to your AWS Lambda functions

Slide 29

Do I need a VPC?
Should my Lambda function be in a VPC?
Does my function need to access any specific resources in a VPC?
• No: Don’t put the function in a VPC
• Yes: Does it also need to access resources or services in the public internet?
  • No: Put the function in a private subnet
  • Yes: Put the function in a subnet with a NAT’d route to the internet

Slide 30

Do I need a VPC?
Should my Lambda function be in a VPC?
Do I need to restrict outbound access from my function to the internet?
• No: Don’t put the function in a VPC
• Yes: Put the function in a private subnet
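The two decision flows (access to VPC resources, and restricting outbound access) can be written down as one small function. Combining them into a single function is an interpretation, not something the slides state; the function and parameter names are hypothetical.

```python
def vpc_placement(needs_vpc_resources, needs_internet=False,
                  restrict_outbound=False):
    """Return the placement suggested by the two decision flows."""
    if needs_vpc_resources:
        # First flow: the function must reach something inside a VPC.
        if needs_internet:
            return "subnet with a NAT'd route to the internet"
        return "private subnet"
    if restrict_outbound:
        # Second flow: no VPC resources, but outbound access must be limited.
        return "private subnet"
    return "no VPC"


print(vpc_placement(needs_vpc_resources=False))
print(vpc_placement(needs_vpc_resources=True))
print(vpc_placement(needs_vpc_resources=True, needs_internet=True))
print(vpc_placement(needs_vpc_resources=False, restrict_outbound=True))
```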

Slide 31

Basic VPC Design
Diagram: one VPC spanning Availability Zone A and Availability Zone B. Each AZ has its own VPC NAT gateway (NAT per AZ), one subnet for Lambda and one subnet for other resources.

Slide 32

Basic VPC Design
• ALWAYS configure a minimum of 2 Availability Zones
• Give your Lambda functions their own subnets
• Give your Lambda subnets a large IP range to handle potential scale
• If your functions need to talk to a resource on the internet, you need a NAT!
• ENIs are a pain, we know, we’re working on it!
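To get a feel for what "a large IP range" means, the sketch below counts the addresses a subnet can actually hand out. It relies on the fact that AWS reserves five addresses in every subnet; the CIDR blocks are arbitrary examples, not recommendations from the talk.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def usable_addresses(cidr):
    """Addresses left for ENIs and other resources in one subnet."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AWS_RESERVED_PER_SUBNET


# Example CIDR blocks only; pick ranges that fit your own VPC.
for cidr in ("10.0.1.0/28", "10.0.1.0/24", "10.0.16.0/20"):
    print(cidr, usable_addresses(cidr))
```

A /28 leaves very little headroom once functions start scaling, which is the reason for giving Lambda its own, generously sized subnets.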

Slide 33

Securing your Serverless Infrastructure
Photo by Markus Spiske on Unsplash

Slide 34

Lambda permissions model
Fine grained security controls for both execution and invocation:
Execution policies:
• Define what AWS resources/API calls this function can access via IAM
• Used in streaming invocations
• E.g. “Lambda function A can read from DynamoDB table users”
Function policies:
• Used for sync and async invocations
• E.g. “Actions on bucket X can invoke Lambda function Z”
• Resource policies allow for cross account access
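The two examples on this slide could look roughly like the policy documents below, built here as plain Python dictionaries. The ARNs, the account ID `123456789012` and the choice of `dynamodb:GetItem` and `dynamodb:Query` as the "read" actions are illustrative assumptions; the helper names are hypothetical.

```python
import json


def execution_policy_read_table(table_arn):
    """Execution policy: what the function itself may call."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": table_arn,
        }],
    }


def function_policy_allow_s3(function_arn, bucket_arn):
    """Function (resource) policy: who may invoke the function."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "lambda:InvokeFunction",
            "Resource": function_arn,
            "Condition": {"ArnLike": {"AWS:SourceArn": bucket_arn}},
        }],
    }


# Placeholder ARNs for illustration only.
execution = execution_policy_read_table(
    "arn:aws:dynamodb:us-east-1:123456789012:table/users")
invocation = function_policy_allow_s3(
    "arn:aws:lambda:us-east-1:123456789012:function:Z",
    "arn:aws:s3:::X")
print(json.dumps(execution, indent=2))
print(json.dumps(invocation, indent=2))
```

The execution policy attaches to the function's role, while the function policy attaches to the function itself and names the caller.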

Slide 35

“Action”: “s3:*” makes puppies cry
Photo by Matthew Henry on Unsplash

Slide 36

Meet SAM!

Slide 37

AWS SAM Templates
<-THIS BECOMES THIS->
From: https://github.com/awslabs/aws-serverless-samfarm/blob/master/api/saml.yaml

Slide 38

AWS SAM Policy Templates

    MyQueueFunction:
      Type: AWS::Serverless::Function
      Properties:
        …
        Policies:
          # Gives permissions to poll an SQS Queue
          - SQSPollerPolicy:
              queueName: !Ref MyQueue
        …
    MyQueue:
      Type: AWS::SQS::Queue
      …

Slide 39

SAM Policy Templates
40+ predefined policies
All found here: https://bit.ly/2xWycnj

Slide 40

IAM + Lambda best practices
• Where/when possible try to leverage the pre-created managed policies that exist today
• If you are doing “service:*” be REALLY REALLY REALLY sure that’s what you should and need to do
• Keep tight lockdown on who/what can invoke functions
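One way to act on the “service:*” warning is a small check that flags wildcard actions in a policy document before it ships. This is a simplistic, hypothetical lint: it only inspects `Allow` statements and the `Action` key, and it is not a substitute for a real policy analyzer.

```python
def wildcard_actions(policy):
    """Return every allowed action that is `*` or ends with `:*`."""
    found = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid IAM
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        found.extend(a for a in actions if a == "*" or a.endswith(":*"))
    return found


risky = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["dynamodb:GetItem"], "Resource": "*"},
    ],
}
print(wildcard_actions(risky))
```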

Slide 41

I will turn on CloudTrail, Config, and CloudTrail Data Events
(this line repeats to fill the slide)

Slide 47

BONUS: Hardcoded secrets make fish cry
Photo by Julieann Ragojo on Unsplash

Slide 48

Lambda Environment Variables
• Key-value pairs that you can dynamically pass to your function
• Available via standard environment variable APIs such as process.env for Node.js or os.environ for Python
• Can optionally be encrypted via AWS Key Management Service (KMS)
  • Allows you to specify in IAM what roles have access to the keys to decrypt the information
• Useful for creating environments per stage (e.g. dev, testing, production)
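A minimal sketch of reading per-stage settings through `os.environ` in Python. The variable names `STAGE`, `TABLE_NAME` and `DEBUG` and their defaults are hypothetical; the mapping is passed in as a parameter so the function can be exercised without touching the real environment.

```python
import os


def load_config(environ=os.environ):
    """Build a settings dict from environment variables, with defaults."""
    stage = environ.get("STAGE", "dev")
    return {
        "stage": stage,
        "table_name": environ.get("TABLE_NAME", "users-" + stage),
        "debug": environ.get("DEBUG", "false").lower() == "true",
    }


# A plain dict stands in for the Lambda environment here.
print(load_config({"STAGE": "production", "TABLE_NAME": "users"}))
print(load_config({}))
```

Loading configuration once, outside the handler, means the lookups happen at cold start rather than on every invocation.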

Slide 49

AWS Systems Manager – Parameter Store
Centralized store to manage your configuration data
• Supports hierarchies
• Plain-text or encrypted with KMS
• Can send notifications of changes to Amazon SNS/AWS Lambda
• Can be secured with IAM
• Calls recorded in CloudTrail
• Can be tagged
• Available via API/SDK
Useful for: centralized environment variables, secrets control, feature flags

    from __future__ import print_function
    import json
    import boto3

    # Create the client once, outside the handler, so it is reused
    ssm = boto3.client('ssm', 'us-east-1')

    def get_parameters():
        # Fetch the parameter and ask SSM to decrypt it with KMS
        response = ssm.get_parameters(
            Names=['LambdaSecureString'], WithDecryption=True
        )
        for parameter in response['Parameters']:
            return parameter['Value']

    def lambda_handler(event, context):
        value = get_parameters()
        print("value1 = " + value)
        return value  # Echo back the first key value

Slide 50

AWS Systems Manager – Parameter Store
Same content as Slide 49, with the callout: #somuchawesome

Slide 51

Weee eeeee
Fun with logs and metrics
https://secure.flickr.com/photos/ocarchives/5333790414

Slide 52

Metrics and logging are a universal right!
CloudWatch Metrics:
• 7 Built in metrics for Lambda
  • Invocation Count, Invocation duration, Invocation errors, Throttled Invocation, Iterator Age, DLQ Errors, Concurrency
  • Can call “put-metric-data” from your function code for custom metrics
• 7 Built in metrics for API-Gateway
  • API Calls Count, Latency, 4XXs, 5XXs, Integration Latency, Cache Hit Count, Cache Miss Count
  • Error and Cache metrics support averages and percentiles
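A rough sketch of publishing a custom metric from function code. It assumes a client shaped like boto3's CloudWatch client and its `put_metric_data` operation; the namespace `MyApp`, the metric name `GizmosProcessed`, the `Stage` dimension and the `FakeCloudWatch` stand-in are all hypothetical.

```python
def record_gizmos(cloudwatch_client, count, stage="dev"):
    """Publish one data point for a business-level custom metric."""
    return cloudwatch_client.put_metric_data(
        Namespace="MyApp",  # hypothetical namespace
        MetricData=[{
            "MetricName": "GizmosProcessed",  # hypothetical metric name
            "Dimensions": [{"Name": "Stage", "Value": stage}],
            "Value": count,
            "Unit": "Count",
        }],
    )


class FakeCloudWatch:
    """Offline stand-in that stores whatever would have been sent."""

    def __init__(self):
        self.sent = []

    def put_metric_data(self, **kwargs):
        self.sent.append(kwargs)
        return {}


cloudwatch = FakeCloudWatch()
record_gizmos(cloudwatch, 42, stage="production")
print(cloudwatch.sent)
```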

Slide 53

Metrics and logging are a universal right!
CloudWatch Logs:
• API Gateway Logging
  • 2 Levels of logging, ERROR and INFO
  • Optionally log method request/body content
  • Set globally in stage, or override per method
• Lambda Logging
  • Logging directly from your code with your language’s equivalent of console.log()
  • Basic request information included
• Log Pivots
  • Build metrics based on log filters
  • Jump to logs that generated metrics
  • Export logs to Amazon Elasticsearch Service or S3
    • Explore with Kibana or Athena/QuickSight

Slide 54

https://secure.flickr.com/photos/joeross/6544781203

Slide 55

Dashboards
https://secure.flickr.com/photos/joeross/6544781203

Slide 57

Dashboarding tips
• Make all metrics available – Good news, CloudWatch makes it easy!
• Focus main landing/“on tv” dashboards on core user/business driven metrics
  • “If this metric goes up, does it directly correlate with a user having a problem?”
• Make as many metrics available across team/function as possible
  • You can now embed CloudWatch “snapshots” in emails and other places!

Slide 59

Tweak your function’s compute power
Lambda exposes only a memory control, with the % of CPU core and network capacity allocated to a function proportionally.
Is your code CPU, Network or memory-bound? If so, it could be cheaper to choose more memory.

Slide 60

Smart resource allocation
Match resource allocation (up to 3 GB!) to logic
Stats for Lambda function that calculates 1000 times all prime numbers <= 1000000

Memory     Duration         Cost
128 MB     11.722965 sec    $0.024628
256 MB     6.678945 sec     $0.028035
512 MB     3.194954 sec     $0.026830
1024 MB    1.465984 sec     $0.024638

Green==Best Red==Worst
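The cost column can be approximated with a few lines of arithmetic. This sketch assumes 1000 invocations, each running for the listed duration, and the 2018-era list prices of $0.00001667 per GB-second and $0.20 per million requests; it also ignores billing-increment rounding. Under those assumptions the results appear to match the slide's figures to within rounding, but treat the pricing constants as assumptions rather than current prices.

```python
PRICE_PER_GB_SECOND = 0.00001667      # assumed 2018-era list price, USD
PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed 2018-era list price, USD


def run_cost(memory_mb, seconds, invocations=1000):
    """Estimated cost of `invocations` runs of `seconds` each."""
    gb_seconds = (memory_mb / 1024) * seconds * invocations
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST


# Memory setting (MB) mapped to the duration (sec) from the slide.
durations = {128: 11.722965, 256: 6.678945, 512: 3.194954, 1024: 1.465984}
for memory_mb, seconds in durations.items():
    print(memory_mb, "MB", round(run_cost(memory_mb, seconds), 6))
```

Because the price scales with memory while the duration shrinks almost in proportion, the 1024 MB setting finishes about ten seconds sooner for nearly the same cost.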

Slide 61

Smart resource allocation
Match resource allocation (up to 3 GB!) to logic
Same table as Slide 60, with a callout comparing 128 MB to 1024 MB: -10.256981 sec, +$0.00001

Slide 62

Impact of a memory change
50% increase in memory
95th percentile changes from 3s to 2.1s
https://blog.newrelic.com/2017/06/20/lambda-functions-xray-traces-custom-serverless-metrics/

Slide 63

Multithreading? Maybe!
• <1.8GB is still single core
  • CPU bound workloads won’t see gains – processes share same resources
• >1.8GB is multi core
  • CPU bound workloads will see gains, but need to multi thread
• I/O bound workloads WILL likely see gains
  • e.g. parallel calculations to return
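For the I/O bound case, a thread pool is one simple way to overlap waits inside a single invocation. The sketch below uses `time.sleep` as a stand-in for a network call, so the function name `fake_io_call` and the timings are illustrative only. Note that in Python specifically, CPU bound work is limited by the interpreter lock and would more likely need multiple processes, which this sketch does not cover.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(item):
    """Stand-in for a network call; sleeping yields to other threads."""
    time.sleep(0.05)
    return item * 2


def fetch_all(items, max_workers=8):
    """Run the calls concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_io_call, items))


start = time.perf_counter()
results = fetch_all(range(8))
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```

Run serially, the eight calls would take roughly eight times the single-call wait; with the pool they overlap.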

Slide 64

AWS X-Ray Integration with Serverless
• API Gateway inserts a tracing header into HTTP calls as well as reports data back to X-Ray itself
• Lambda instruments incoming requests for all supported languages and can capture calls made in code

    var AWSXRay = require('aws-xray-sdk-core');
    AWSXRay.middleware.setSamplingRules('sampling-rules.json');
    // Wrap the AWS SDK so calls made through it are captured by X-Ray
    var AWS = AWSXRay.captureAWS(require('aws-sdk'));
    var s3Client = new AWS.S3();

Slide 65

X-Ray Trace Example

Slide 66

How do I figure out what’s wrong?
These tools are here, so use them!
1. Turn on X-Ray now
   1. Look at wrapping your own calls with it via the X-Ray SDKs
2. Don’t underestimate the power of logging in Lambda
   1. Simple “debug: in functionX” statements work great and are easy to find in CloudWatch Logs
3. The most valuable metrics are the ones closest to your customer/use-case
   1. How many gizmos did this function call/create/process/etc

Slide 67

FIN/ACK
4 key operational areas:
• Availability
• Networking
• Security, Governance, Auditing
• Monitoring, Metrics, Logs, Performance

Slide 68

Chris Munns
munns@amazon.com
@chrismunns
https://www.flickr.com/photos/theredproject/3302110152/

Slide 69

?
https://secure.flickr.com/photos/dullhunk/202872717/