This document discusses scaling push messaging for millions of Netflix devices. It covers building a push architecture using Zuul servers, operating the push servers, and best practices for auto-scaling the push cluster. Key components include using a push registry like Dynomite to track client connections, Kafka queues to process messages asynchronously, and auto-scaling the server fleet based on open connections.
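The registry-plus-queue flow described above can be sketched as a toy model. This is a hedged illustration, not Netflix's actual code: the dict stands in for a push registry like Dynomite, the in-process queue stands in for Kafka, and all names (`push_registry`, `device-42`, etc.) are invented for the example.

```python
import queue

# Hypothetical in-memory stand-in for a push registry such as Dynomite:
# maps a device/client id to the push server currently holding its connection.
push_registry = {}

# Stand-in for a Kafka queue: producers enqueue messages, and a
# message-processor fleet drains them asynchronously.
message_queue = queue.Queue()

def register_connection(client_id, server_id):
    """Record which push server owns this client's persistent connection."""
    push_registry[client_id] = server_id

def send_push(client_id, payload):
    """Producers never talk to devices directly; they enqueue a message."""
    message_queue.put((client_id, payload))

def process_messages():
    """Message processors look up the owning server in the registry and
    route the payload there (here we just collect the routing decisions)."""
    routed = []
    while not message_queue.empty():
        client_id, payload = message_queue.get()
        server = push_registry.get(client_id)
        if server is not None:  # drop messages for clients with no connection
            routed.append((server, client_id, payload))
    return routed

register_connection("device-42", "push-server-7")
send_push("device-42", "new-episode-available")
send_push("device-99", "dropped: not connected")
print(process_messages())  # [('push-server-7', 'device-42', 'new-episode-available')]
```

The indirection is the point: producers only need the queue, and only the processors consult the registry to find which server holds a given device's open connection.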
An edge gateway is an essential piece of infrastructure for large scale cloud based services. This presentation details the purpose, benefits and use cases for an edge gateway to provide security, traffic management and cloud cross region resiliency. How a gateway can be used to enhance continuous deployment, and help testing of new service versions and get service insights and more are discussed. Philosophical and architectural approaches to what belongs in a gateway vs what should be in services will be discussed. Real examples of how gateway services, built on top of Netflix's Open source project, Zuul, are used in front of nearly all of Netflix's consumer facing traffic will show how gateway infrastructure is used in real highly available, massive scale services.
How can you accelerate the delivery of new, high-quality services? How can you be able to experiment and get feedback quickly from your customers? To get the most out of the agility afforded by serverless and containers, it is essential to build CI/CD pipelines that help teams iterate on code and quickly release features. In this talk, we demonstrate how developers can build effective CI/CD release workflows to manage their serverless or containerized deployments on AWS. We cover infrastructure-as-code (IaC) application models, such as AWS Serverless Application Model (AWS SAM) and new imperative IaC tools. We also demonstrate how to set up CI/CD release pipelines with AWS CodePipeline and AWS CodeBuild, and we show you how to automate safer deployments with AWS CodeDeploy.
With Apache Kafka’s rise for event-driven architectures, developers require a specification to design effective event-driven APIs. AsyncAPI has been developed based on OpenAPI to define the endpoints and schemas of brokers and topics. For Kafka applications, the broker’s design to handle high throughput serialized payloads brings challenges for consumers and producers managing the structure of the message. For this reason, a registry becomes critical to achieve schema governance. Apicurio Registry is an end-to-end solution to store API definitions and schemas for Kafka applications. The project includes serializers, deserializers, and additional tooling. The registry supports several types of artifacts including OpenAPI, AsyncAPI, GraphQL, Apache Avro, Google protocol buffers, JSON Schema, Kafka Connect schema, WSDL, and XML Schema (XSD). It also checks them for validity and compatibility.
In this session, we will be covering the following topics:
● The importance of having a contract-first approach to event-driven APIs
● What is AsyncAPI, and how it helps to define Kafka endpoints and schemas
● The Kafka challenges on message structure when serializing and deserializing
● Introduction to Apicurio Registry and schema management for Kafka
● Examples of how to use Apicurio Registry with popular Java frameworks like Spring and Quarkus
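As a rough illustration of the contract-first approach above, a minimal AsyncAPI document for a Kafka topic might look like the sketch below. The service name, broker address, topic, and record fields are all placeholders invented for the example, not taken from the talk:

```yaml
asyncapi: '2.0.0'
info:
  title: Order Events API        # hypothetical service name
  version: '1.0.0'
servers:
  production:
    url: broker.example.com:9092  # placeholder broker address
    protocol: kafka
channels:
  orders.created:                 # maps to a Kafka topic
    subscribe:
      message:
        schemaFormat: application/vnd.apache.avro+json;version=1.9.0
        payload:
          type: record
          name: OrderCreated
          fields:
            - name: orderId
              type: string
            - name: amount
              type: double
```

A schema like this is the kind of artifact that would be stored in Apicurio Registry so producers and consumers can validate message structure against a shared contract.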
Serverless Architecture - Design Patterns and Best Practices - Amazon Web Services
As serverless architectures become more popular, customers are looking for a framework of patterns to help them identify how they can leverage AWS to deploy their workloads without managing servers or operating systems.
This webinar session describes reusable serverless patterns. For each pattern, operational and security best practices with potential pitfalls and nuances will be described. The patterns involve services including but not limited to AWS Lambda, Amazon API Gateway, Amazon Kinesis Data Streams and Data Firehose, Amazon DynamoDB, Amazon S3, AWS Step Functions, AWS Config, AWS X-Ray, and Amazon Athena.
This session can help the audience recognise candidates for various serverless architectures in an organisation and understand areas of potential savings and increased agility. Examples include: using X-Ray in Lambda for tracing and operational insight; a pattern for high-performance computing (HPC) using Lambda at scale; Step Functions as a way to handle orchestration for both the Automation and Batch patterns; a pattern for security automation using AWS Config rules to detect and automatically remediate violations of security standards; CI/CD pipelines for serverless, including testing, deploying, and versioning (SAM tools); working with AI/ML services; plus tips to optimise Lambda functions for performance and cost-effectiveness.
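As a minimal illustration of the Lambda-plus-API-Gateway pattern mentioned above, here is a hedged sketch of a proxy-integration-style handler. The event shape follows API Gateway's proxy format; the field names and response body are invented for the example:

```python
import json

def lambda_handler(event, context):
    """Minimal API Gateway (proxy integration) style handler: parse the
    JSON body, do a little work, and return a structured HTTP response."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local invocation with a fake event; on AWS, API Gateway supplies the event.
resp = lambda_handler({"body": json.dumps({"name": "serverless"})}, None)
print(resp["statusCode"], resp["body"])
```

Keeping the handler a pure function of its event makes it easy to unit-test locally before wiring it into a CI/CD pipeline.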
The document describes Amazon EKS (Amazon Elastic Container Service for Kubernetes), including an overview of EKS, its architecture, features, and integration with other AWS services. Key points include: EKS manages the Kubernetes control plane while worker nodes are launched in the customer's VPC; EKS supports networking via the AWS VPC CNI plugin; and EKS provides security and access management using IAM roles and policies.
OpenShift is a Platform-as-a-Service that provides development environments on demand using containers. It automates application lifecycles including build, deploy, and retirement. OpenShift uses containers to package applications and dependencies in a portable way. Red Hat addresses concerns around adopting containers at scale through OpenShift, which provides security, scalability, integration, management and certification capabilities. OpenShift runs on a user's choice of infrastructure and orchestrates applications across nodes using Kubernetes.
In this session, we walk through the fundamentals of Amazon VPC. First, we cover build-out and design fundamentals for VPCs, including picking your IP space, subnetting, routing, security, NAT, and much more. We then transition to different approaches and use cases for optionally connecting your VPC to your physical data center with VPN or AWS Direct Connect. This mid-level architecture discussion is aimed at architects, network administrators, and technology decision makers interested in understanding the building blocks that AWS makes available with Amazon VPC. Learn how you can connect VPCs with your offices and current data center footprint.
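The IP-space and subnetting step described above can be sketched with Python's standard `ipaddress` module. The /16 VPC CIDR and the /18 split below are arbitrary example values, not recommendations from the session:

```python
import ipaddress

# Carve a hypothetical VPC CIDR into equal subnets, e.g. one per AZ.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Four /18 subnets, each with 16,384 addresses.
# (AWS reserves 5 addresses per subnet, hence the "usable" estimate.)
subnets = list(vpc.subnets(new_prefix=18))
for s in subnets:
    print(s, "usable ~", s.num_addresses - 5)
```

Working out the subnet plan this way before creating anything helps avoid overlapping CIDRs later, which matters once you start connecting the VPC to a data center over VPN or Direct Connect.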
This document summarizes an upcoming presentation on architecting microservices on AWS. The presentation will:
- Review microservices architecture and how it differs from monolithic and service-oriented architectures.
- Cover key microservices design principles like independent deployment of services that communicate via APIs and using the right tools for each job.
- Provide example design patterns for implementing microservices on AWS using services like EC2, ECS, Lambda, API Gateway and more.
- Include a demo of microservices on AWS.
- Conclude with a question and answer session.
Building Cloud-Native App Series - Part 7 of 11
Microservices Architecture Series
Containers Docker Kind Kubernetes Istio
- Pods
- ReplicaSet
- Deployment (Canary, Blue-Green)
- Ingress
- Service
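The Kubernetes objects listed above fit together in a manifest. The sketch below pairs a Deployment (which manages a ReplicaSet of Pods) with a Service that fronts them; the app name, image, and ports are placeholders invented for the example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app            # hypothetical app name
spec:
  replicas: 3               # managed via a ReplicaSet under the hood
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: demo/app:1.0   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 8080
```

An Ingress would then route external HTTP traffic to the Service, and canary or blue-green rollouts can be built by running two Deployments behind label selectors.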
Containers and workload security: an overview - Krishna-Kumar
Beginner-level talk presented at Bangalore Container Conf 2018: containers and workload security, an overview. Hope it kick-starts your container security journey :-)
The document discusses Cilium and Istio with Gloo Mesh. It provides an overview of Gloo Mesh, an enterprise service mesh for multi-cluster, cross-cluster and hybrid environments based on upstream Istio. Gloo Mesh focuses on ease of use, powerful best practices built in, security, and extensibility. It allows for consistent API for multi-cluster north-south and east-west policy, team tenancy with service mesh as a service, and driving everything through GitOps.
Application Load Balancer and the integration with AutoScaling and ECS - Pop-... - Amazon Web Services
- Elastic Load Balancing automatically distributes application traffic across multiple EC2 instances to improve availability and scalability.
- The Application Load Balancer provides advanced request routing features like path-based routing and integration with containers. It also offers improved security, performance, and monitoring capabilities compared to the Classic Load Balancer.
- Key components of Application Load Balancing include listeners, target groups, targets, rules, health checks, and metrics in CloudWatch. These components work together to route traffic, monitor instances, and scale capacity as needed.
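The listener-rule mechanics above can be illustrated with a toy model. This is not the AWS API, just a sketch of path-based routing: ordered rules map a path prefix to a target group, the first match wins, and unmatched requests go to a default. All names are invented:

```python
# Toy model of ALB path-based routing: ordered listener rules map a path
# prefix to a target group; the first matching rule wins, with a default.
rules = [
    ("/api/",    "tg-api"),
    ("/images/", "tg-static"),
]
default_target_group = "tg-web"

def route(path):
    for prefix, target_group in rules:
        if path.startswith(prefix):
            return target_group
    return default_target_group

print(route("/api/orders"))    # tg-api
print(route("/images/a.png"))  # tg-static
print(route("/checkout"))      # tg-web
```

Health checks and CloudWatch metrics then operate per target group, which is what lets each routed service scale independently.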
This document provides an overview of AWS Fargate. It begins with introductions to containers and microservices. It describes how containers and microservices are well-suited to each other due to their ability to deploy and scale services independently. The document then discusses container orchestration tools and the options on AWS. It focuses on AWS Fargate, describing its key advantages of having no infrastructure to manage, ability to quickly launch and easily scale containers, and resource-based pricing. Fargate allows developers to focus just on their application without having to manage servers or clusters.
- Watch the video: https://www.youtube.com/watch?v=Rq4I57eqIp4
Amazon RDS Proxy is a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that improves application scalability, resilience to database failures, and security. (Launched in the Seoul region in June 2020)
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ... - Amazon Web Services
In this session, we will walk through the fundamentals of Amazon Virtual Private Cloud (VPC). We will discuss core VPC concepts including picking your IP space, subnetting, routing, security, NAT and VPC Endpoints.
This talk explains what Pod Security Policy is and its importance in Kubernetes security. The talk also looks at the current state of Docker Hub's popular images and the Helm charts repository.
This talk stresses that enabling PSP the right way is absolutely necessary for the real security of the cluster.
Link to the demos:
What is Pod Security Policy? https://www.youtube.com/watch?v=nrWRMP94vqc
Kubernetes hostPath exploit thwarted with Pod Security Policy https://www.youtube.com/watch?v=APS0CfD6DsE
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
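The topics/partitions/offsets model listed above can be sketched as a toy commit log. This is a hedged illustration of the concepts, not Kafka's actual implementation; the class and key names are invented:

```python
# Toy model of Kafka's storage: a topic is a set of partitions, each an
# append-only log; consumers track their own offset per partition.
class Topic:
    def __init__(self, partitions):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Same key -> same partition, which preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def consume(self, partition, offset):
        # Reads don't remove messages; the log is retained and re-readable.
        return self.partitions[partition][offset:]

t = Topic(partitions=3)
p, off = t.produce("user-1", "login")
t.produce("user-1", "click")
print(t.consume(p, 0))   # both events, in order, from user-1's partition
print(t.consume(p, 1))   # resuming mid-log from a stored offset
```

Because consumption is just "read from an offset," many consumer groups can read the same log independently, which is a large part of what enables Kafka's throughput and fan-out.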
The document discusses Amazon Web Services container management services and Kubernetes. It provides an overview of AWS services like Amazon ECS, EKS, Fargate, ECR, Cloud Map and App Mesh. It also describes Kubernetes concepts like pods, deployments, services, namespaces and control plane/data plane architecture. Amazon EKS is highlighted as a managed Kubernetes service that makes it easy to run Kubernetes on AWS without operating the control plane.
Managing Container Images with Amazon ECR - AWS Online Tech Talks - Amazon Web Services
The document discusses Amazon EC2 Container Registry (ECR), which is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images. It provides details on what ECR is, how it integrates with other AWS services like ECS, its access control and encryption features, and demos of common user workflows like creating a registry, pushing images, and using images in tasks.
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore - Amazon Web Services
The document discusses continuous integration, delivery, and deployment (CI/CD) using AWS services like CodeCommit, CodeBuild, CodeDeploy, CodePipeline, ECS Fargate, and ECR. It covers building Docker images with CodeBuild, orchestrating deployment pipelines with CodePipeline, and deploying containers to ECS Fargate.
Best Practices for Middleware and Integration Architecture Modernization with... - Claus Ibsen
This document discusses best practices for middleware and integration architecture modernization using Apache Camel. It provides an overview of Apache Camel, including what it is, how it works through routes, and the different Camel projects. It then covers trends in integration architecture like microservices, cloud native, and serverless. Key aspects of Camel K and Camel Quarkus are summarized. The document concludes with a brief discussion of the Camel Kafka Connector and pointers to additional resources.
This document compares Jenkins and AWS CodePipeline for implementing software pipelines. It finds that Jenkins provides more flexibility through plugins and scripting but requires managing infrastructure, while CodePipeline is fully hosted but offers fewer customization options. Both can be combined, with CodePipeline triggering Jenkins jobs or Jenkins deploying code using CodeDeploy. The document concludes that the right solution depends on individual needs and that integrating the tools lets you get the benefits of both.
Kubernetes Clusters Security with Amazon EKS (CON338-R1) - AWS re:Invent 2018 - Amazon Web Services
In this session, we discuss best practices for securing your Kubernetes deployments on AWS. We cover how to use AWS IAM with Kubernetes role-based access control (RBAC) for new or existing Kubernetes deployments, and we dive deep into how Amazon EKS implements secure cluster configuration by default.
[NEW LAUNCH!] How to Architect for Multi-Region Redundancy Using Anycast IPs ... - Amazon Web Services
Deployed globally in multiple edge locations, AWS Global Accelerator helps you manage traffic destined for your multi-regional applications with higher levels of availability and performance. This session covers ways in which Ubiquity helps you build fault-tolerant and highly performant systems across AWS regions using anycast static IP addresses. In this session, you will learn about Global Accelerator's shuffle sharding technique used for its static IPs, the benefits of anycast, and more.
Get the Most out of Your Elastic Load Balancer for Different Workloads (NET31... - Amazon Web Services
Bring your tricky questions and interesting use cases to this session, where we cover topics such as choosing the right load balancer, architectural best practices, load balancing principles, analyzing your application with Amazon CloudWatch metrics, and ELB access logs.
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018 - Amazon Web Services
Cloud computing provides a number of advantages, such as the ability to scale your web application or website on demand. If you have a new web application and want to use cloud computing, you might be asking yourself, "Where do I start?" Join us in this session for best practices on scaling your resources from one to millions of users. We show you how to best combine different AWS services, how to make smarter decisions for architecting your application, and how to scale your infrastructure in the cloud.
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I... - Amazon Web Services
Data and events are the lifeblood of any modern application. By using stateless, loosely coupled microservices communicating through events, developers can build massively scalable systems that can process trillions of requests in seconds. In this talk, we cover design patterns for using Amazon SQS, Amazon SNS, AWS Step Functions, AWS Lambda, and Amazon S3 to build data processing and real-time notification systems with unbounded scale and serverless cost characteristics. We also explore how these approaches apply to practical use cases, such as training machine learning models, media processing, and data cleansing.
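The SNS/SQS fan-out pattern referred to above can be sketched as a toy model. This is an illustration of the pattern, not the AWS SDK; the topic and queue names are invented:

```python
from collections import defaultdict

# Toy model of SNS-style fan-out: one published event is delivered to every
# subscribed queue, and independent workers would then drain each queue.
subscriptions = defaultdict(list)   # topic -> list of queue names
queues = defaultdict(list)          # queue name -> pending messages

def subscribe(topic, queue_name):
    subscriptions[topic].append(queue_name)

def publish(topic, message):
    for q in subscriptions[topic]:  # fan-out: each subscriber gets a copy
        queues[q].append(message)

subscribe("media-uploaded", "thumbnail-queue")
subscribe("media-uploaded", "transcode-queue")
publish("media-uploaded", {"key": "videos/cat.mp4"})

print(queues["thumbnail-queue"])  # [{'key': 'videos/cat.mp4'}]
print(queues["transcode-queue"])  # [{'key': 'videos/cat.mp4'}]
```

Because the publisher knows nothing about the consumers, new processing stages (say, a Lambda that cleans metadata) can be added by subscribing another queue, with no change to the producer.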
In this session, learn from market-leader Vonage how and why they re-architected their QoS-sensitive, highly available and highly performant legacy real-time communications systems to take advantage of Amazon EC2, Enhanced Networking, Amazon S3, ASG, Amazon RDS, Amazon ElastiCache, AWS Lambda, StepFunctions, Amazon SNS, Amazon SQS, Amazon Kinesis, Amazon EFS, and more. We also learn how Aspect, a multinational leader in call center solutions, used AWS Lambda, Amazon API Gateway, Amazon Kinesis, Amazon ElastiCache, Amazon Cognito, and Application Load Balancer with open-source API development tooling from Swagger, to build a comprehensive, microservices-based solution. Vonage and Aspect share their journey to TCO optimization, global outreach, and agility with best practices and insights.
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018 - Amazon Web Services
You've spent the time designing, architecting, setting up, and configuring your Kubernetes cluster. Now, it's on to day two. "Day two" refers to the functions of scaling, optimizing, monitoring, securing, and in general keeping the lights on. In this talk, we discuss the tools that you have available to help you build a reliable and resilient Kubernetes cluster and run workloads in production. We discuss how to control the network, secure your environment using threat detection, scan your containers for vulnerabilities, use monitoring tools, and create scalable containers and clusters.
DevOps Practices for Operating Microservices on AWS and an Introduction to Spinnaker :: 김영욱 :: AWS Summit Seoul 2018 - Amazon Web Services Korea
The document discusses DevOps practices for operating microservices on AWS, including introducing Spinnaker. Some key points discussed include:
- The need for immutable servers, infrastructure as code, release pipelines, deployment strategies like blue/green and canary releases, cluster management, automated testing, monitoring, and log streaming.
- Using tools like Packer, Terraform, Ansible for infrastructure as code and building immutable images.
- Continuous delivery pipelines for automated testing and deployments.
- Spinnaker as a continuous delivery platform supporting deployment strategies and cluster management across multiple clouds.
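The canary strategy in the list above boils down to a gated decision loop: shift a small slice of traffic, check metrics, and either proceed or roll back. The sketch below is a toy version of that logic, not Spinnaker's implementation; the steps, threshold, and metric source are all assumptions:

```python
# Toy canary rollout: shift traffic to the new version in steps, rolling
# back if the canary's observed error rate exceeds a threshold.
def canary_rollout(error_rates, steps=(5, 25, 50, 100), threshold=0.01):
    """error_rates maps a traffic percentage to the measured canary error
    rate at that step (in a real pipeline this would come from monitoring)."""
    for pct in steps:
        if error_rates.get(pct, 0.0) > threshold:
            return "rolled back at {}%".format(pct)
    return "promoted to 100%"

print(canary_rollout({5: 0.002, 25: 0.004, 50: 0.003, 100: 0.005}))
print(canary_rollout({5: 0.002, 25: 0.08}))   # bad metrics at 25% traffic
```

Blue/green is the degenerate case of this loop: a single step that shifts 100% of traffic, with rollback meaning flipping back to the old fleet.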
This document summarizes a presentation on building microservices with AWS. It discusses what microservices are and the benefits of the microservices architecture, such as agility and scalability. It then outlines several AWS services that can be used to build, deploy, and manage microservices, including Amazon EC2, ECS, Lambda, API Gateway, CodeCommit, CodeDeploy, CodeBuild, CodePipeline, Secrets Manager, DynamoDB, CloudWatch, X-Ray, Step Functions, and service discovery. The presentation provides examples of how these services can support microservices development.
Shift-Left SRE: Self-Healing with AWS Lambda Functions (DEV313-S) - AWS re:In... - Amazon Web Services
Even the best continuous delivery and DevOps practices cannot guarantee that there will be no issues in production. The rise of Site Reliability Engineering (SRE) has promoted new ways to automate resilience into your system and applications to circumvent potential problems, but it’s time to “shift-left” this effort into engineering. In this session, learn to leverage AWS Lambda functions as “remediation as code.” We show how to make it part of your continuous delivery process and orchestrate the invocation of Self-Healing Lambda functions in case of unexpected situations impacting the reliability of your system. Gone are the days of traditional operation teams—it’s the rise of “shift-lefters”! This session is brought to you by AWS partner, Dynatrace.
Amazon Elastic Container Service for Kubernetes (Amazon EKS) | AWS Dev Day 2018 - AWS Germany
Containers are an increasingly important way for developers to package and deploy their applications and AWS offers multiple container products to help you deploy, manage, and scale containers in production. In this session we dive deep into Amazon Elastic Container Service for Kubernetes (Amazon EKS), a new managed service for running Kubernetes on AWS. Learn how Amazon EKS works, from provisioning nodes, launching pods, and integrations with AWS services such as Elastic Load Balancing and Auto Scaling.
Learn more about containers here: https://aws.amazon.com/containers/
Building Microservices with the Twelve-Factor App Pattern - SRV346 - Chicago ... - Amazon Web Services
Small monolithic apps are quick to build and fast to implement. But tightly coupled apps can quickly become difficult to operate, maintain, and scale as they grow. In this session, we cover how to properly construct services and distributed microservices systems. We explore how to build twelve-factor apps and discuss the right tools and architectures to implement them on AWS.
Building Microservices with the Twelve Factor App Pattern on AWS - Amazon Web Services
The document discusses building microservices using the Twelve-Factor App methodology on AWS. It introduces the Twelve-Factor App methodology, which provides best practices for building modern, cloud-native applications. It then covers each of the twelve factors in more detail, explaining how to apply them when building microservices on AWS services like EC2, ECS, and S3.
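One of the twelve factors mentioned above, "Config" (factor III), has a very concrete shape in code: deploy-specific settings come from the environment, never from constants or checked-in files. A minimal sketch, with invented variable names and defaults:

```python
import os

# Factor III ("Config"): read deploy-specific settings from the environment
# instead of baking them into the code or a config file in the repo.
def load_config(env=os.environ):
    return {
        "database_url": env.get("DATABASE_URL", "sqlite:///local.db"),
        "port": int(env.get("PORT", "8080")),
        "debug": env.get("DEBUG", "false").lower() == "true",
    }

# Same code, different environments -- only the env vars change.
print(load_config({}))                                # local defaults
print(load_config({"PORT": "80", "DEBUG": "true"}))   # a production-ish env
```

On AWS this maps naturally to task definitions (ECS), Lambda environment variables, or values injected from Secrets Manager, so the same image runs unchanged in every environment.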
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018 - Amazon Web Services
Airbnb is going through tremendous growth internationally, evolving from a home sharing company to a global travel community with many product offerings. The growth driven by the business, increase in traffic, and aggressive hiring created a new challenge for the Production Infrastructure Team. The team has grown from a small team of 10 to a production platform organization with 100 engineers that builds foundational services that support homes, experiences, luxury, and China. We shifted our priority and focus to move away from putting out fires to building a platform that can grow with the company. In this session, we chronicle Airbnb’s architectural evolution that aligns with organizational growth strategy, and review how we overcame different architectural challenges leveraging AWS technologies.
Database Week at the San Francisco Loft: ElastiCache & Redis
Redis is an open source, in-memory data store that delivers sub-millisecond response times enabling millions of requests per second to power real-time applications. It can be used as a fast database, cache, message broker, and queue. Amazon ElastiCache delivers the ease-of-use and power of Redis along with the availability, reliability, scalability, security, and performance suitable for the most demanding applications. We’ll take a close look at Redis and how to use it to power different use cases.
Speaker: Ben Willett - Sr. Solutions Architect, AWS
Redis is an open source, in-memory data store that delivers sub-millisecond response times enabling millions of requests per second to power real-time applications. It can be used as a fast database, cache, message broker, and queue. Amazon ElastiCache delivers the ease-of-use and power of Redis along with the availability, reliability, scalability, security, and performance suitable for the most demanding applications. We’ll take a close look at Redis and how to use it to power different use cases.
Speaker: Samir Karande - Sr. Manager, Solutions Architecture, AWS
ServerlessConf 2018 Keynote - Debunking Serverless Myths (no video / detailed...Tim Wagner
Copy of the keynote with the video removed (for easier downloads onto mobile devices) and with some additional slides that expand on the cost analysis in more detail.
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...Amazon Web Services
This session introduces AWS Global Accelerator, a new global service that enables you to optimally route traffic to your multi-regional endpoints via static Anycast IP addresses that are announced from the expansive AWS edge network. This session walks through the various features and customer use cases for Global Accelerator. Several example use cases demonstrate how you can use Ubiquity to achieve near-zero application downtime and reduce latency for your global applications. We will walk you through the architecture and will also include a demo of the workflow. Attend this session if you are looking at ways to accelerate performance of your global applications, achieve high availability for your mission critical applications or easily manage multiple IP addresses through a static Anycast IP that fronts your applications.
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Amazon Web Services
Migrating enterprise applications to the cloud requires thorough planning and consideration for a number of variables. Should you move your application to a similar infrastructure in the cloud (in a lift-and-shift scenario)? Or should you refactor your application to take advantage of cloud-native services for object storage, serverless, auto-scaling, and so on? In this session, an AWS expert walks through the ten commandments that enterprises should follow when moving applications to the cloud and refactoring them for optimal performance. Then, a representative of Sysco Corporation, a Fortune 50 company, shares how the company migrated mission-critical legacy business systems and modernized them to take advantage of the AWS Cloud. Learn how the company moved its enterprise purchasing system, which processes millions of dollars in sales daily, to the AWS Cloud while achieving a 60% decrease in run costs. Also discover the lessons learned and highlights of the migration, which resulted in 30% increase in performance, 3x improvement in user accessibility, and a significant decrease in order backlogs and outages.
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdfAmazon Web Services
AWS Summit Milano 2018
Come scalare da zero ai tuoi primi 10 milioni di utenti
Speaker: Giorgio Bonfiglio, AWS Technical Account Manager - Enterprise Support
CFD Studio Credentials – Branding, Design & Developmenttrannghia2018
CFD STUDIO is an independent creative studio, specializing in Branding, UX/UI Design, Website & Mobile App Development. We craft high-quality digital experiences for brands and business goals.
Our Mission is to transform ideas into impactful brands by blending creativity, technology, and strategic thinking, delivering solutions that not only captivate but also drive success.
Using speech recognition and natural language processing, Automated Minutes creates an accurately transcribed meeting minutes draft in a near real-time, secure environment.
On March 11th at 2 PM EST OnBoard’s product team, Heather Hansson and Philip Hinz, explored the power of OnBoard’s Automated Minutes.
Using this webinar, you can learn:
Why Automated Minutes? Customizable, Secure, and Governance-Built for Boards
How Automated Minutes works to capture and create an initial draft of your minutes
Personalizing and formatting your Minutes through rich text editing tools
UiPath NY AI Series: Session 1: Introduction to Agentic AI with UiPathDianaGray10
🚀 Embracing the Future: Starting the Course with Agentic AI with UiPath
📢 Event Overview:
Join us for an exciting session on Agentic AI with UiPath! This event is perfect for professionals, tech enthusiasts, and automation leaders eager to learn about autonomous and intelligent digital agents. Discover how UiPath’s Agentic AI is shaping the future of automation! 🤖✨
📅 What You’ll Learn
🔹 UiPath’s Agentic AI Vision - Learn about UiPath’s AI-driven automation future.
🔹 Evolution of UiPath’s Automation - From RPA to AI-powered automation, see the journey! 🚀
🔹 What is Agentic Automation? - Understand how self-adaptive AI is changing workflows.
🔹 Principles of Agentic Automation - Key ideas like autonomy & adaptability.
🔹 Real-World Applications - Success stories & use cases from businesses leveraging AI.
🔹 UiPath’s Agentic AI Architecture - A peek into the technical side of intelligent automation. 🏗️
🔹 Q&A Session
👥 Who Should Attend?
Automation Developers & Tech Enthusiasts 💡
Business Leaders 📊
IT Architects & Tech Innovators 🏗️
UiPath Community Members 🤝
📌 Register now & be part of the future of AI-driven automation! 🔥
Technology use over time and its impact on consumers and businesses.pptxkaylagaze
In this presentation, I explore how technology has changed consumer behaviour and its impact on consumers and businesses. I will focus on internet access, digital devices, how customers search for information and what they buy online, video consumption, and lastly consumer trends.
Why Ivalua: A Relational Acquisition Model (RAM 2025) ComparisonJon Hansen
What makes Jon Hansen’s ProcureTech assessment solution RAM unique?
RAM (short for “Relational Acquisition Model,” based on historical context), stands out due to its pioneering approach to procurement efficiency, developed in the late 1990s and early 2000s. While specific technical details about RAM’s current iteration as of March 1, 2025, are not fully detailed in recent public sources, its uniqueness can be inferred from Hansen’s documented history, writings, and interviews, particularly from Procurement Insights and related discussions.
RAM stands out for its agent-based adaptability, interactive design, early AI intelligence, people-process-tech integration, and proven government success—features ahead of its time in the 1990s and resonant with 2025’s procurement needs. It tackled inefficiencies with a practical, transparent approach, not just tech hype, saving millions and streamlining operations where others failed. While its current form isn’t fully public, its legacy as a ProcureTech pioneer remains unique, blending foresight with results in a way few contemporaries matched then or now.
Today’s ProcureTech solution providers—such as Coupa, GEP, Jaggaer, Sievo, Ivalua—can benefit from the Relational Acquisition Model (RAM) by drawing on its foundational principles and proven strengths, adapting them to enhance their offerings in the context of 2025’s complex procurement landscape. While RAM, developed in the late 1990s, lacks the technological scale of modern platforms, its agent-based design, focus on transparency, and human-centric efficiency offer valuable lessons.
Today’s ProcureTech providers can benefit from RAM by adopting its agent-based adaptability, transparent AI, interactive simplicity, human-tech balance, operational focus, and proven credibility. These could enhance responsiveness (e.g., tariff tweaks), trust (e.g., black box fears), and ROI (e.g., faster savings), potentially lifting efficiency by 10-20% or adoption by 15-30%. RAM’s lessons—distilled from a $12 million success—offer a roadmap to refine, not replace, modern solutions like Ivalua. It’s a legacy worth mining for a market chasing the next big thing.
World Information Architecture Day 2025 - UX at a CrossroadsJoshua Randall
User Experience stands at a crossroads: will we live up to our potential to design a better world? or will we be co-opted by “product management” or another business buzzword?
Looking backwards, this talk will show how UX has repeatedly failed to create a better world, drawing on industry data from Nielsen Norman Group, Baymard, MeasuringU, WebAIM, and others.
Looking forwards, this talk will argue that UX must resist hype, say no more often and collaborate less often (you read that right), and become a true profession — in order to be able to design a better world.
Formal Methods: Whence and Whither? [Martin Fränzle Festkolloquium, 2025]Jonathan Bowen
Alan Turing arguably wrote the first paper on formal methods 75 years ago. Since then, there have been claims and counterclaims about formal methods. Tool development has been slow but aided by Moore’s Law with the increasing power of computers. Although formal methods are not widespread in practical usage at a heavyweight level, their influence as crept into software engineering practice to the extent that they are no longer necessarily called formal methods in their use. In addition, in areas where safety and security are important, with the increasing use of computers in such applications, formal methods are a viable way to improve the reliability of such software-based systems. Their use in hardware where a mistake can be very costly is also important. This talk explores the journey of formal methods to the present day and speculates on future directions.
Data Intelligence Platform Transforming Data into Actionable Insights.pptxLisa Gerard
In today’s data-driven world, a Data Intelligence Platform plays a crucial role in empowering organizations to make informed, strategic decisions. By leveraging advanced analytics, seamless data integration, and robust governance, businesses can transform vast amounts of data into actionable insights.
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog GavraScyllaDB
Learn how Responsive replaced embedded RocksDB with ScyllaDB in Kafka Streams, simplifying the architecture and unlocking massive availability and scale. The talk covers unbundling stream processors, key ScyllaDB features tested, and lessons learned from the transition.
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (平山毅)Tsuyoshi Hirayama
DAO UTokyo 2025
東京大学情報学環 ブロックチェーン研究イニシアティブ
https://utbciii.com/2024/12/12/announcing-dao-utokyo-2025-conference/
Session 1 :DLT mass adoption
IBM Tsuyoshi Hirayama (平山毅)
Future-Proof Your Career with AI OptionsDianaGray10
Learn about the difference between automation, AI and agentic and ways you can harness these to further your career. In this session you will learn:
Introduction to automation, AI, agentic
Trends in the marketplace
Take advantage of UiPath training and certification
In demand skills needed to strategically position yourself to stay ahead
❓ If you have any questions or feedback, please refer to the "Women in Automation 2025" dedicated Forum thread. You can find there extra details and updates.
Transcript: Elements of Indigenous Style: Insights and applications for the b...BookNet Canada
From acquisitions and editorial to marketing and sales teams, every team member plays a role in accurately, respectfully, and ethically championing Indigenous and traditionally underrepresented voices. This session, led by Warren Cariou, Lead Editor of the second edition of Gregory Younging’s Elements of Indigenous Style, is for book industry professionals eager to learn and apply Indigenous teachings to their work.
Using Elements of Indigenous Style as a foundation, this session delves into its mind-opening content, which goes beyond the scope of a traditional style guide. The book advocates for the indigenization of publishing and addresses topics such as culturally appropriate publishing practices; understanding identity and community affiliation; Two-Spirit, trans, and Indigiqueer contexts; practices to support Indigenous linguistic and cultural sovereignty; and emerging issues in the digital environment. Warren provides actionable recommendations and best practices for publishers working on literary projects by or about Indigenous authors, which can be applied more broadly to other underrepresented communities.
Kaitlin Littlechild from the Indigenous Editors Association brings her expertise to the discussion as the moderator.
Link to recording and presentation slides: https://bnctechforum.ca/sessions/elements-of-indigenous-style-insights-and-applications-for-the-book-industry/
Presented by BookNet Canada on February 28, 2025 with support from the Department of Canadian Heritage.
Transform Your Future with Front-End Development TrainingVtechlabs
Kickstart your career in web development with our front-end web development course in Vadodara. Learn HTML, CSS, JavaScript, React, and more through hands-on projects and expert mentorship. Our front-end development course with placement includes real-world training, mock interviews, and job assistance to help you secure top roles like Front-End Developer, UI/UX Developer, and Web Designer.
Join VtechLabs today and build a successful career in the booming IT industry!
UiPath Automation Developer Associate Training Series 2025 - Session 2DianaGray10
In session 2, we will introduce you to Data manipulation in UiPath Studio.
Topics covered:
Data Manipulation
What is Data Manipulation
Strings
Lists
Dictionaries
RegEx Builder
Date and Time
Required Self-Paced Learning for this session:
Data Manipulation with Strings in UiPath Studio (v2022.10) 2 modules - 1h 30m - https://academy.uipath.com/courses/data-manipulation-with-strings-in-studio
Data Manipulation with Lists and Dictionaries in UiPath Studio (v2022.10) 2 modules - 1h - https:/academy.uipath.com/courses/data-manipulation-with-lists-and-dictionaries-in-studio
Data Manipulation with Data Tables in UiPath Studio (v2022.10) 2 modules - 1h 30m - https:/academy.uipath.com/courses/data-manipulation-with-data-tables-in-studio
⁉️ For any questions you may have, please use the dedicated Forum thread. You can tag the hosts and mentors directly and they will reply as soon as possible.
FinTech - US Annual Funding Report - 2024.pptxTracxn
US FinTech 2024, offering a comprehensive analysis of key trends, funding activities, and top-performing sectors that shaped the FinTech ecosystem in the US 2024. The report delivers detailed data and insights into the region's funding landscape and other developments. We believe this report will provide you with valuable insights to understand the evolving market dynamics.
Technology use over time and its impact on consumers and businesses.pptxkaylagaze
In this presentation, I will discuss how technology has changed consumer behaviour and its impact on consumers and businesses. I will focus on internet access, digital devices, how customers search for information and what they buy online, video consumption, and lastly consumer trends.
The Future of Repair: Transparent and Incremental by Botond DénesScyllaDB
Regularly run repairs are essential to keep clusters healthy, yet having a good repair schedule is more challenging than it should be. Repairs often take a long time, preventing running them often. This has an impact on data consistency and also limits the usefulness of the new repair based tombstone garbage collection. We want to address these challenges by making repairs incremental and allowing for automatic repair scheduling, without relying on external tools.
TrustArc Webinar: How to Create a Privacy-First CultureTrustArc
Privacy is no longer just a compliance issue—it’s a cornerstone of trust and a vital element of business success. Yet, many organizations struggle to embed privacy into their culture, leaving them vulnerable to breaches, regulatory action, and damaged reputations. Are your employees equipped to make privacy-conscious decisions? Does your company have the tools and mindset to prioritize data protection at every level?
This webinar brings together a panel of experts to explore why a strong privacy culture is critical and how it can drive both organizational integrity and customer confidence. You’ll learn how to align privacy values with business objectives, foster awareness and accountability among employees, and create policies that empower teams to safeguard sensitive information effectively.
Through engaging discussions and practical insights, we’ll provide actionable strategies for implementing privacy programs that stick. From building leadership support to weaving privacy considerations into daily workflows, you’ll discover what it takes to turn compliance into a competitive advantage and a core part of your company’s identity.
This webinar will review:
- Why your company needs a privacy culture
- Best practices for building a privacy-first culture
- Practical tips for implementing effective privacy programs
#4: Imagine it’s Friday night of this week. The conference is over, you are back home, sitting on your favorite couch, ready to unwind and you start Netflix. At least I hope you do :)
#5: This is the first thing you see as soon as you start Netflix. Interesting thing about this list is that it’s not static or universal. It is personalized to your taste. There are hundred and twenty five million versions of this list. One for each of our hundred and twenty five million members. But this one is mine. Personalized to my taste.
Which I just realized is filled with crime shows. Let’s not read anything specific into that. Moving on…
#6: Raise your hand if you actually start watching something within a minute or two after seeing that list.
Yeah, me neither! Most of us spend a considerable amount of time on this screen scrolling, trying to pick something to watch. This behavior is actually relevant to our discussion.
Let’s say 20 minutes later you are still browsing the list. Meanwhile, our personalization algorithms are continuously running. So during those 20 minutes we could generate a new, better personalized list of shows for you in the cloud. If that does happen, how do we get that new list in front of you? How do we tell our application that a new list is ready for it to download?
Push messaging is a perfect solution for situations like this. Our old app polled our server periodically for new recommendations. It kinda worked, but it was both wasteful and not that great latency-wise. What’s worse, these twin goals of server efficiency and freshness of UI directly contradict each other. If you make the polling interval too short to get the freshest UI, you put more load on your servers, and if you increase the polling interval to help your servers, the freshness of your UI suffers.
Now our server just pushes the new list to the client. Just as one data point, we cut down total number of requests to our website by 12% when we shifted our browser app from polling to push. At more than million requests per second those 12% add up really fast!
So please ignore all push messages on your phones for next 40 minutes because we are going to talk about push messaging now. Push notifications may be terrible for conference speakers but background push messages are awesome for applications.
#7: By the end of this presentation you’ll have a very clear understanding of
#12: My name is Susheel Aroskar. I am a software engineer in the Cloud Gateway team at Netflix. All of the Netflix HTTP API traffic passes through our Cloud Gateway. I have been at Netflix for 9 years now and have worked in three different teams. And somehow it still feels like I’m just browsing the list; the real show is yet to start.
So let’s start by defining push.
What exactly is push? How is it different from the normal request / response paradigm that we all know and love?
#13: This is actually from a motivational poster at my local gym. That’s why I stopped going there. But it turns out to be surprisingly accurate definition for our purpose today.
Push really is different in just two ways:
There is a persistent, always-on connection between the server and the client for the entirety of the client’s lifetime, and
It’s the server that initiates the data transfer. Something does happen on the server and then the server pushes the data to the client instead of the client requesting it
We built our own push messaging system, named Zuul Push, to send background push messages to our app from our servers. Zuul push messages are similar to push messages you get on your smartphones except they work across all sorts of devices not just phones. They work anywhere where Netflix app runs. That includes TVs, game consoles, laptops and smartphones. To achieve this, Zuul Push uses standard, open web protocols like WebSockets and Server Sent Events (SSE) to push messages. Zuul Push server itself is open sourced too and is available today on GitHub.
#14: Zuul push is in fact not a single service but a complete push messaging infrastructure made up of multiple components
#15: There are Zuul Push servers. They sit on the network edge and accept connections from clients.
#16: Clients connect to push servers using either WebSocket or Server Sent Events protocol. Once connected, the client keeps the connection open for its entire lifetime. So these are persistent connections.
#17: Since there are many clients connected to many push servers, we need to track which client is connected to which server. This is the job of the push registry.
#19: On the backend, our push message senders need a simple, robust and high throughput mechanism to send push messages.
But our senders don’t really want to know about all the internal details of our push infrastructure. What they really want is a simple, one-liner method call to send a push message to a given client. Our push library gives them this simple interface by hiding all this complexity behind a single sendMessage() call.
#20: Internally, sendMessage() drops the message into a push message queue. By introducing message queues between senders and receivers we decouple them, making it easy to operate them independently of each other. Message queues also let us absorb wide variations in the number of incoming messages. They act as buffers that absorb big spikes in traffic.
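As a rough sketch of this decoupling, here is a minimal in-process version. The PushSender class, its method names, and the message format are all illustrative (not Zuul Push’s actual API), and a plain BlockingQueue stands in for the Kafka queue used in production:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PushSender {
    // In-process stand-in for the real push message queue (Kafka in production).
    private final BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>();

    // The simple one-liner the senders see; everything else is hidden behind it.
    public boolean sendMessage(String clientId, String payload) {
        // In production this would serialize the message and publish it to Kafka.
        return messageQueue.offer(clientId + "|" + payload);
    }

    // Called by the processing side to pick up the next queued message (null if empty).
    public String takeNext() {
        return messageQueue.poll();
    }
}
```

The queue is what makes this fire-and-forget: sendMessage() returns immediately regardless of how fast the other end is draining, which is exactly the buffering behavior described above.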
#21: Finally our message processor ties all these components together to do the actual push message delivery.
#22: It reads push messages from the push message queue. Each push message is addressed to a specific client.
#23: The message router then looks up in the Push Registry which push server the requested client is connected to.
#24: If the push server is found in the registry, message processor connects to that push server and delivers the push message. The port used by the message processor to connect to the push server is reachable only on the internal 10/24 subnetwork and is guarded by Amazon security groups.
On the other hand, if the push server is not found in the registry, it means the requested client is not connected or online at this time. In such a case, the processor just drops the message on the floor.
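The routing logic just described (look the client up in the registry, deliver if found, drop if not) can be sketched like this; the class and method names are hypothetical, and plain maps stand in for the push registry and for the actual delivery call to the push server:

```java
import java.util.HashMap;
import java.util.Map;

public class MessageRouter {
    private final Map<String, String> pushRegistry = new HashMap<>();  // clientId -> push server
    private final Map<String, Integer> deliveredPerServer = new HashMap<>();
    private int dropped = 0;

    public void registerClient(String clientId, String server) {
        pushRegistry.put(clientId, server);
    }

    // Deliver the message to the client's push server if one is registered;
    // otherwise the client is offline, so drop the message on the floor.
    public boolean route(String clientId, String message) {
        String server = pushRegistry.get(clientId);
        if (server == null) {
            dropped++;
            return false;
        }
        // Stand-in for connecting to the push server's internal port and delivering.
        deliveredPerServer.merge(server, 1, Integer::sum);
        return true;
    }

    public int droppedCount() { return dropped; }
}
```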
Now that we have seen how all Zuul Push components fit together, we can dig a little deeper in each component’s details.
#25: Zuul Push server is probably the biggest piece of the whole infrastructure. Our push cluster today handles tens of millions of concurrent, persistent client connections at peak and is rapidly growing.
Zuul Push server is based on our Zuul cloud gateway and hence shares its name. Zuul cloud gateway fronts all Netflix HTTP API traffic coming into our system. It handles millions of requests per second. It was recently rewritten to use async, non-blocking I/O, so it provided a perfect foundation for building a massively scalable push messaging server like Zuul Push.
#26: But why do we need async I/O?
Many of you are probably familiar with the C10K challenge. The challenge was first coined in 1999. It simply asks how we can support 10,000 concurrent connections on a single server. We have long since blown past the original 10,000 number, but the name stuck.
This capability to support tens of thousands of connections on a single box is crucial for a service like Zuul Push that has to handle millions of mostly idle but always-on persistent connections.
#27: The traditional way of handling multiple connections is to spawn a new thread for each new connection. This thread then does blocking read/write operations on that connection. This model doesn’t scale to meet the C10K challenge.
You would quickly exhaust your memory allocating 10,000 stacks for 10000 threads. It’d also pin your CPU down because of the frequent context switches between those 10,000 threads.
#28: Async I/O follows a different model. It uses the operating system’s I/O multiplexing primitives like epoll or kqueue to register read/write callbacks for all open connections on a single thread. Whenever any socket is ready for I/O, its callbacks get invoked on that same single thread, so now you don’t need thousands and thousands of threads. The trade-off is a somewhat more complex programming model, because now you as the developer are responsible for keeping track of all the state inside your code. You can no longer rely on the thread stack to do it for you, because the same single thread stack is now shared by all the open connections.
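Java’s standard library exposes this same multiplexing model directly through java.nio, which is also what Netty builds on. The sketch below (illustrative, not Zuul Push code) registers a listening channel with a Selector and handles an accept event on a single thread, analogous to registering an epoll/kqueue callback:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    // Waits for one connection using a single selector thread and returns
    // the number of accept events handled (or -1 on I/O error).
    public static int acceptOneConnection() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            // Register interest in "accept" readiness -- the accept callback, in effect.
            server.register(selector, SelectionKey.OP_ACCEPT);

            // A client from the same process connects, to trigger the readiness event.
            try (SocketChannel client = SocketChannel.open(
                    new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()))) {
                int events = 0;
                selector.select(5000); // block until at least one channel is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        ((ServerSocketChannel) key.channel()).accept().close();
                        events++;
                    }
                }
                return events;
            }
        } catch (IOException e) {
            return -1;
        }
    }
}
```

In a real server the select() call runs in a loop and the same thread also dispatches read/write readiness for every registered connection, which is how one box can hold tens of thousands of mostly idle sockets.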
#29:
We use Netty to do async I/O. Netty is a great open source networking library in Java. It is widely used by many popular open source Java projects like Cassandra, Hadoop etc. so it is well tested and battle proven.
We are not going to go into the details of Netty in this talk, but this is what a Netty async I/O program’s structure looks like from 10,000 feet. The Inbound and Outbound channel handlers you see here are analogous to the read and write callbacks we just discussed.
It’s very similar in essence to how Node.js handles multiple connections. If you know Node.js internals you can think of Netty as a libuv counterpart in the Java world.
#30: This is a simplified version of our push server’s Netty pipeline. There is a lot of stuff going on here but I really want to call out your attention to just two highlighted methods
getPushAuthHandler()
getPushRegistrationHandler()
You can override these methods to plug in your own custom authentication and custom push session registration in Zuul Push server.
The rest of the handlers you see here - things like HttpServerCodec or WebSocketServerProtocolHandler - are all off-the-shelf protocol handlers provided by Netty, which is great. Netty is doing most of the heavy lifting of parsing the HTTP and WebSocket protocols here.
#31: Each client connecting to the Zuul push server must identify and authenticate itself before it can start receiving push messages on that connection.
You can plug in your own custom authentication by extending PushAuthHandler and implementing its doAuth() method. doAuth() receives the original HTTP WebSocket connection request as an argument. This allows you to inspect cookies, other headers, and the body of the request inside doAuth(), which you can use to implement your own custom authentication.
#32: As we saw push registry is used to keep track of which client is connected to which Zuul Push server.
#33: Just like custom authentication, Zuul Push lets you plug in a custom datastore of your choice for the push registry.
You’d extend the PushRegistrationHandler class and implement its registerClient() method to do that.
#34: You can use any data store, but for best results it should have the following characteristics
#35: Low read latency is important because you write the registration once, when the client connects, and then look it up multiple times: once every time someone sends a push message to that client.
#36: Support for record expiry is important because in the real world we cannot rely on every single client closing its connection cleanly, every single time. Most of the time they will close it cleanly, which takes care of cleaning up their push registration record from the registry. But sometimes clients crash. Sometimes servers crash. This leaves behind phantom registration records in the registry: records that indicate a particular client is connected to a particular server but are no longer accurate. In such cases we need a way to clean up those phantom registration records automatically. Zuul Push relies on automatic record expiry, or TTL, to do that.
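To illustrate why TTL matters, here is a minimal registry sketch with lazy expiry. The class and its API are hypothetical (in a real deployment the datastore itself, e.g. Redis/Dynomite, enforces the TTL); the point is that an expired phantom record simply reads as "offline":

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TtlRegistry {
    private static final class Entry {
        final String server;
        final long expiresAt;
        Entry(String server, long expiresAt) { this.server = server; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> registrations = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlRegistry(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Record which push server a client is connected to, stamped with an expiry time.
    public void register(String clientId, String server) {
        registrations.put(clientId, new Entry(server, System.currentTimeMillis() + ttlMillis));
    }

    // Look up the client's server; expired (phantom) records read as "offline" (null).
    public String lookup(String clientId) {
        Entry e = registrations.get(clientId);
        if (e == null || System.currentTimeMillis() > e.expiresAt) {
            registrations.remove(clientId); // lazy cleanup of the phantom record
            return null;
        }
        return e.server;
    }
}
```

A live client would keep refreshing its registration by calling register() periodically, well within the TTL, so only records from crashed clients or servers ever age out.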
#37: Beyond these two features then there are the usual suspects for high availability
#39: These are all great choices for push registry. There are probably several more
#40: We use Dynomite. Dynomite is another open source project from Netflix that wraps Redis and augments it with features like auto-sharding, read/write quorums and cross region replication. You can think of it as Amazon Dynamo meets Redis. We chose Dynomite since it supports replication across AWS regions out of the box which is important for our use case. And also because Dynomite is well supported operationally inside Netflix by a central data engineering team!
#41: This component handles backend message queuing, routing and delivery of push messages on behalf of message senders.
#43: Most of our push message senders use fire-and-forget approach to message delivery.
Those who are interested in knowing the final delivery status of the push message can either subscribe to the push delivery status queue or read it from a Hive table.
#44: Netflix runs in three different AWS regions. A backend service trying to send a push message to a particular customer generally has no idea to which region that customer may be connected. Our message routing infrastructure takes care of routing that message to the correct AWS region for them. We use Kafka message queue replication to deliver messages across regions.
#45: In practice, we have found we can use a single push message queue to deliver all sorts of push messages and still stay under our delivery latency SLA. However, our design lets you use different message queues for different message priorities to avoid the “priority inversion” issue. Priority inversion happens when a message with higher priority is kept waiting behind lower-priority messages for delivery because they all share the same queue. Using different message queues for different priorities guarantees this can never happen.
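A tiny sketch of the idea (hypothetical names; a real deployment would use separate Kafka topics rather than in-memory deques): with one queue per priority and a drain order that always checks the high-priority queue first, a high-priority message can never be stuck behind a normal-priority backlog.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PriorityQueues {
    private final Deque<String> highPriority = new ArrayDeque<>();
    private final Deque<String> normalPriority = new ArrayDeque<>();

    public void enqueue(String message, boolean high) {
        (high ? highPriority : normalPriority).addLast(message);
    }

    // High-priority messages are always drained first, so they can never
    // wait behind a backlog of normal-priority ones. Returns null when empty.
    public String next() {
        if (!highPriority.isEmpty()) return highPriority.pollFirst();
        return normalPriority.pollFirst();
    }
}
```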
#46: Our message processor is built on top of Mantis. Mantis is our internal scalable stream processing engine, similar to Apache Flink. It uses the Mesos container management system. This allows us to quickly spin up more message processor instances. It also supports auto-scaling the number of processors based on the number of pending messages in the queue. This makes it very easy for us to meet our delivery SLA under a wide variety of loads while still staying resource-efficient.
#47: At this point I’d like to switch gears and cover some of the operational aspects of running Zuul Push in production at Netflix traffic scale. Zuul Push is a little different from the usual stateless REST services, so it requires a little TLC (tender love and care) when you run it in production.
#48: The first and biggest difference is the long-lived, stable connections. They make Zuul Push somewhat stateful.
#49: Persistent connections are great from client’s point of view because they improve clients’ efficiency dramatically. Unlike plain HTTP, clients don’t have to make and break connections constantly.
That’s why we all rejoiced when WebSockets appeared in browsers and replaced hacks like long poll or Comet.
#50:
But they are terrible from the point of view of anyone operating a server, mainly because they complicate deployments and rollbacks. Let’s say you deploy a new build to fix some urgent issue. Your push clients will still be happily connected to your old cluster, because they open a connection once and then hang on to it for their lifetime. They won’t migrate to the new cluster just because you deployed a new build. You’d have to force them to migrate by killing the old cluster. But if you do that, they will all swarm to the new cluster at the same time like a thundering herd. It’s a lose-lose scenario.
A thundering herd is a large number of clients all trying to connect at the same time. It causes a sudden, large spike in traffic that is orders of magnitude higher than your steady-state traffic. It’s one of the things you have to watch out for when designing a robust, resilient system.
#51:
We found our way out of this pickle by limiting client connection lifetime. We auto-close each client connection after a certain time. Our clients are coded to reconnect whenever they lose their connection to the server. So the client connects back, and each time it does, it will most probably land on a different server. This limits a client’s stickiness to any single server.
We have tuned this connection lifetime carefully to strike a good balance between client efficiency, which we desire, and client stickiness, which we are trying to avoid. Empirically, we have found that somewhere between 25 and 35 minutes is the sweet spot.
#52:
Not only do we limit a connection’s lifetime, we also randomize it within a band every time the client reconnects. This means different clients end up with slightly different connection lifetimes (between 28 and 32 minutes in our case), after which they disconnect and reconnect.
#53: This randomization ensures that a random network-wide blip doesn’t accidentally synchronize millions of connections’ reconnect schedules, causing a thundering herd that then repeats every 30 minutes. The only thing worse than a thundering herd is a recurring thundering herd!
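The randomized-lifetime policy can be sketched as follows. The 28-32 minute band is from the talk; the function name and structure are our illustration:

```python
import random

# Sketch of the randomized reconnect-lifetime policy described above.
# The 28-32 minute band matches the talk; the function name is ours.
def connection_lifetime_seconds(base_minutes=30, jitter_minutes=2):
    """Pick a randomized lifetime so clients don't resynchronize."""
    low = (base_minutes - jitter_minutes) * 60
    high = (base_minutes + jitter_minutes) * 60
    return random.uniform(low, high)

# Each client gets a slightly different lifetime, so even if millions of
# clients reconnect at the same instant, their next reconnects spread out.
lifetimes = [connection_lifetime_seconds() for _ in range(5)]
assert all(28 * 60 <= t <= 32 * 60 for t in lifetimes)
```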
#54:
This is an extra optimization. I said earlier that we auto-close the client connection from the server side, but that’s not entirely accurate. In the latest version, our server sends a special message to the client - over the same push channel - telling it to close the connection from the client side. Because of the way TCP works, the party that closes the connection enters the TCP TIME_WAIT state. This state can tie up that connection’s resources for up to two minutes on Linux. Since our server handles tens of thousands of open connections simultaneously, the server’s file descriptors are far more valuable than the client’s. By having the client close the connection, we conserve the server’s file descriptors.
There is a flip side to this optimization, though: you have to be prepared to handle misbehaving clients that won’t close their connection when told. To handle such clients, we start a timer when we send the CLOSE CONNECTION message and then close the connection forcefully from the server side if the client doesn’t comply within a set time limit.
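The close-handshake-with-fallback pattern might look roughly like this. This is a hedged sketch; the message name, grace period, and class structure are illustrative, not from the Zuul Push source:

```python
import threading

# Hedged sketch of "ask the client to close, force-close if it doesn't".
# The message name and grace period are illustrative.
CLOSE_GRACE_SECONDS = 10.0

class PushConnection:
    def __init__(self):
        self.closed_by = None   # "client" or "server"
        self._timer = None

    def send(self, message):
        pass  # stub: a real server would write to the socket here

    def request_client_close(self, grace=CLOSE_GRACE_SECONDS):
        # Tell the client to close its end (so TIME_WAIT lands on the
        # client), then arm a fallback timer for misbehaving clients.
        self.send("CLOSE_CONNECTION")
        self._timer = threading.Timer(grace, self._force_close)
        self._timer.start()

    def on_client_closed(self):
        # Well-behaved client complied; cancel the fallback timer.
        if self._timer is not None:
            self._timer.cancel()
        self.closed_by = "client"

    def _force_close(self):
        # Grace period expired without the client closing; server closes.
        self.closed_by = "server"
```

A well-behaved client triggers `on_client_closed()` before the timer fires; a misbehaving one is force-closed from the server side when the grace period expires.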
#55: So we took care of the stateful, sticky-connection problem. Next we focused on optimizing our push cluster size. Our big epiphany was that most of the connections were idle most of the time. This meant neither memory nor CPU was under much pressure, even with a large number of open connections.
#56: So we chose a big Amazon instance type for our push servers, carefully tuned its Linux TCP kernel parameters and JVM options, and packed it with as many connections as possible...
ulimit -n 262144
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 87380 16777216"
sysctl -w net.core.somaxconn=65536
sysctl -w net.ipv4.tcp_max_syn_backlog=65536
-Xmx3g -Xms3g
-XX:MaxDirectMemorySize=256m
#58: And we got a visit from our dear old friend - the thundering herd! When that single, heavily packed server was lost, all those thousands and thousands of clients came roaring back with reconnects.
You know you have a problem when the loss of a single server can start a stampede!
#59: So we licked our wounds, learned from our mistake, and tried the “Goldilocks” strategy for the second round: you don’t want to run your servers either too hot or too cold. We found the instance size that’s just right for us:
m4.large
2 vCPU
8 GB
84K connections / box
#60: The main lesson here is that you should optimize for the actual cost of your cluster, not just a low instance count. Stated like that it seems obvious, but it wasn’t obvious to us initially, because we conflated a small push cluster - a low number of push server instances - with efficient operation. In reality, a larger number of cheaper instances is preferable to a smaller number of big instances, cost being equal.
Being able to support millions of connections on a single box is certainly impressive technically, but it will eventually come back to bite you in production.
And even if you don’t have huge traffic volume like Netflix, it may still make sense to use a larger number of smaller servers instead of a few big ones, mostly because smaller servers give you more cost-efficient autoscaling at lower traffic volumes. At low enough traffic volume, you may only need a couple of big servers to handle all your connections. But then you can’t autoscale up and down efficiently to match your traffic, since your step size is big - a single server carrying somewhere between 25 and 50% of all your traffic. You can fit your traffic curve much more efficiently - in terms of autoscaling - with small increment/decrement steps, that is, with small servers.
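A quick back-of-the-envelope calculation shows why step size matters. All numbers here are made up for illustration; they are not Netflix's actual figures:

```python
# Back-of-the-envelope look at autoscaling granularity. All numbers are
# made up for illustration; they are not Netflix's actual figures.
def scaling_step_fraction(connections_per_box, total_connections):
    """Fraction of total capacity added or removed per scaling step."""
    return connections_per_box / total_connections

total = 1_000_000
# Two huge boxes: each scale-up/down step moves half of total capacity.
assert scaling_step_fraction(500_000, total) == 0.5
# Many small boxes: each step moves under 10% of capacity - a much
# finer fit to the traffic curve.
assert scaling_step_fraction(84_000, total) == 0.084
```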
#61: The next problem we ran into was autoscaling. How do we autoscale our push cluster as traffic goes up and down?
#62: Our two go-to strategies for auto-scaling REST services are to autoscale either on RPS - requests per second - or on CPU load average. Both are ineffective for Zuul Push: there is no continuous RPS, thanks to persistent, long-lived connections, and the CPU is mostly idle, as we saw earlier.
So how do you autoscale?
#63: The real limiting factor for a push server is the number of open connections per box. So it makes perfect sense to auto-scale on the average number of open connections per box.
Thankfully, AWS makes it easy to autoscale on anything, as long as you can export it as a custom CloudWatch metric from your app. We export the number of open connections from our server process.
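As a sketch, exporting the open-connection count as a custom CloudWatch metric might look like this. The namespace, metric, and dimension names are illustrative; the actual boto3 `put_metric_data` call is shown commented out so the sketch runs offline:

```python
# Sketch of exporting the open-connection count as a custom CloudWatch
# metric. Namespace, metric, and dimension names are illustrative.
def open_connections_metric(count, server_id):
    """Build a CloudWatch MetricData entry for the connection gauge."""
    return {
        "MetricName": "OpenConnections",
        "Dimensions": [{"Name": "ServerId", "Value": server_id}],
        "Value": float(count),
        "Unit": "Count",
    }

metric = open_connections_metric(84_000, "push-server-1")

# Publishing (requires AWS credentials, so left commented here):
# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="ZuulPush", MetricData=[metric])
```

An autoscaling policy can then target the average of this metric across the fleet instead of CPU or RPS.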
#64: The final problem we had to solve was making Amazon’s Classic Load Balancers play nice with WebSockets. Our push servers sit behind Amazon’s Classic Load Balancers, or CLBs for short.
Unfortunately, CLBs cannot proxy WebSocket connections. When a WebSocket client - like a browser - wants to open a WebSocket, it sends a special HTTP request to the server called a WebSocket upgrade request. If the server supports WebSockets, it returns a special “Switching Protocols” response and upgrades the original HTTP connection to a long-lived WebSocket connection. CLBs do not understand this initial WebSocket upgrade request. They treat it like any other HTTP request and tear down the connection as soon as the server returns the response. So you can’t have persistent WebSocket connections through CLBs.
By the way, this issue is not specific to CLBs. You’d run into similar issues with any reverse proxy or load balancer that does not understand the WebSocket protocol natively.
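For reference, the upgrade handshake a layer-7 proxy must recognize boils down to a couple of headers (per RFC 6455). A minimal, illustrative check:

```python
# Minimal check for the WebSocket upgrade handshake described above,
# following RFC 6455. The header names are real; the function is our
# illustrative sketch, not a full handshake validator.
def is_websocket_upgrade(headers):
    h = {k.lower(): v.lower() for k, v in headers.items()}
    return (h.get("upgrade") == "websocket"
            and "upgrade" in h.get("connection", ""))

request_headers = {
    "Host": "push.example.com",
    "Connection": "Upgrade",
    "Upgrade": "websocket",
    "Sec-WebSocket-Version": "13",
}
assert is_websocket_upgrade(request_headers)
```

A proxy that ignores these headers treats the request as ordinary HTTP - which is exactly the CLB behavior described above.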
#65:
We found a way around this by running our CLBs in TCP load balancing mode. Normally CLBs run as HTTP load balancers and do layer 7 load balancing, but you can configure them to run as TCP load balancers instead, forcing them to do load balancing at layer 4. In this mode they just proxy TCP packets back and forth without trying to parse the layer 7 application protocol - HTTP in this case. This keeps CLBs from mangling the WebSocket upgrade requests they do not understand.
#66:
The good thing about CLBs in TCP mode is that they can still terminate TLS, which means you can still offload SSL handling to them.
#67: The flip side of WebSockets is that they are vulnerable to cross-site request forgery (CSRF) if not properly secured. To guard against CSRF, the web server must verify that the “Origin” header has the correct value before accepting an incoming WebSocket connection. Thankfully, the Zuul Push server already does this for you.
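A minimal sketch of such an Origin check (the allow-list values are illustrative; Zuul Push ships its own implementation):

```python
# Illustrative Origin check that guards WebSocket handshakes against CSRF.
# The allow-list is a made-up example; Zuul Push implements this for you.
ALLOWED_ORIGINS = {"https://www.netflix.com", "https://push.netflix.com"}

def origin_allowed(headers):
    # Reject the handshake unless the Origin header exactly matches
    # a known-good origin. A missing Origin is rejected too.
    return headers.get("Origin") in ALLOWED_ORIGINS

assert origin_allowed({"Origin": "https://www.netflix.com"})
assert not origin_allowed({"Origin": "https://evil.example"})
assert not origin_allowed({})
```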
#68: Deregistering a server from a CLB kills all client connections to that server instantly.
Whenever we deploy a new build, we deregister the old instances from the CLB so that they no longer receive any traffic. What we ideally want is for the CLB to stop sending the old instances new traffic but let the existing connections on those instances live out the rest of their natural lifetime.
However, by default, CLBs kill all connections to an instance as soon as you deregister it.
Fortunately, it is possible to make CLBs behave the way we want. The AWS console has a CLB setting called “connection draining”. Once you enable it and set a high enough timeout value, CLBs will gradually drain client connections from your old, out-of-traffic servers and let them migrate to new servers over time.
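In boto3 terms, enabling connection draining looks roughly like this. The load balancer name is illustrative, and the API call is shown commented out so the sketch runs offline:

```python
# Sketch of enabling connection draining on a Classic Load Balancer.
# The load balancer name is illustrative.
DRAIN_ATTRIBUTES = {
    "ConnectionDraining": {
        "Enabled": True,
        # Set the timeout comfortably above the roughly 30-minute
        # connection lifetime so draining never cuts a connection short.
        "Timeout": 3600,  # seconds; the CLB maximum
    }
}

# Applying it (requires AWS credentials, so left commented here):
# import boto3
# boto3.client("elb").modify_load_balancer_attributes(
#     LoadBalancerName="zuul-push-clb",
#     LoadBalancerAttributes=DRAIN_ATTRIBUTES)
```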
Once you have made all these tweaks, your CLB will handle lots and lots of WebSocket connections happily, no problem.
#69: I do want to note that Amazon has since introduced a new load balancer type - the ALB, short for Application Load Balancer - that does understand the WebSocket protocol. Unfortunately, it came too late for us; by then we had already figured out how to get CLBs to do what we wanted. But if you are starting today, you may want to try ALBs first.
#70: Maybe 20 to 30 minutes. This limits stateful, sticky issues.
#71: To spread out reconnect peaks as time progresses
#72: As long as the final cost stays the same. This helps limit the size of a thundering herd.
#73: Because CPU load and RPS are not good proxies for load on a push cluster.
#74: Most load balancers, like HAProxy, let you do load balancing at layer 4, the TCP level.
Most of these operational best practices are already built into Zuul Push.
#75: Finally, what can you do with this push messaging capability? Now that we have our push hammer in production, we are seeing a lot of nails…
#76: Our recent integration with Alexa is one good example. Suppose a user asks Alexa to play “Stranger Things”. The actual speech recognition of the user’s spoken command happens in the cloud, using the Alexa voice processing service. We then need an ultra-low-latency mechanism to transmit the synthesized command from the cloud to the Netflix application running on the user’s TV. Having the application poll the cloud at fixed intervals clearly won’t do here.
Push messaging to the rescue!
#77: We have even more exciting plans for using push in the future.
For example, we could auto-detect a client that is generating lots of errors and send it a push message asking it to upload its state and any other relevant diagnostics to the cloud.
#78: And if all that diagnostic data still doesn’t help, we could reach for the oldest tool in every software engineer’s toolbox and restart the application - except now we can do it remotely. What could go wrong?
#79: But if something does go wrong, we can now send you a push message, saying “We are sorry”
#80: Hopefully, these examples have already got you thinking about how you can use push messaging to add novel and rich functionality to your applications.
#81: I have been pleading the case for PUSH for the last 40 minutes. Now I have just one last request to make at this point...
#83: Everything we have discussed so far is open source. You can find it in the Zuul project under Netflix OSS on GitHub. It even comes with a toy push server sample that you can start playing with immediately.
So go ahead, give it a spin. File bugs. And if you would be so kind, maybe even send us a pull request or two.