This document discusses scaling push messaging for millions of Netflix devices. It covers building a push architecture using Zuul servers, operating the push servers, and best practices for auto-scaling the push cluster. Key components include using a push registry like Dynomite to track client connections, Kafka queues to process messages asynchronously, and auto-scaling the server fleet based on open connections.
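The registry-plus-queue flow described above can be sketched as a toy model. This is a hedged illustration, not Netflix's actual code: the dict stands in for a push registry like Dynomite, the in-process queue stands in for Kafka, and all names (`push_registry`, `device-42`, etc.) are invented for the example.

```python
import queue

# Hypothetical in-memory stand-in for a push registry such as Dynomite:
# maps a device/client id to the push server currently holding its connection.
push_registry = {}

# Stand-in for a Kafka queue: producers enqueue messages, and a
# message-processor fleet drains them asynchronously.
message_queue = queue.Queue()

def register_connection(client_id, server_id):
    """Record which push server owns this client's persistent connection."""
    push_registry[client_id] = server_id

def send_push(client_id, payload):
    """Producers never talk to devices directly; they enqueue a message."""
    message_queue.put((client_id, payload))

def process_messages():
    """Message processors look up the owning server in the registry and
    route the payload there (here we just collect the routing decisions)."""
    routed = []
    while not message_queue.empty():
        client_id, payload = message_queue.get()
        server = push_registry.get(client_id)
        if server is not None:  # drop messages for clients with no connection
            routed.append((server, client_id, payload))
    return routed

register_connection("device-42", "push-server-7")
send_push("device-42", "new-episode-available")
send_push("device-99", "dropped: not connected")
print(process_messages())  # [('push-server-7', 'device-42', 'new-episode-available')]
```

The indirection is the point: producers only need the queue, and only the processors consult the registry to find which server holds a given device's open connection.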
An edge gateway is an essential piece of infrastructure for large scale cloud based services. This presentation details the purpose, benefits and use cases for an edge gateway to provide security, traffic management and cloud cross region resiliency. How a gateway can be used to enhance continuous deployment, and help testing of new service versions and get service insights and more are discussed. Philosophical and architectural approaches to what belongs in a gateway vs what should be in services will be discussed. Real examples of how gateway services, built on top of Netflix's Open source project, Zuul, are used in front of nearly all of Netflix's consumer facing traffic will show how gateway infrastructure is used in real highly available, massive scale services.
How can you accelerate the delivery of new, high-quality services? How can you be able to experiment and get feedback quickly from your customers? To get the most out of the agility afforded by serverless and containers, it is essential to build CI/CD pipelines that help teams iterate on code and quickly release features. In this talk, we demonstrate how developers can build effective CI/CD release workflows to manage their serverless or containerized deployments on AWS. We cover infrastructure-as-code (IaC) application models, such as AWS Serverless Application Model (AWS SAM) and new imperative IaC tools. We also demonstrate how to set up CI/CD release pipelines with AWS CodePipeline and AWS CodeBuild, and we show you how to automate safer deployments with AWS CodeDeploy.
With Apache Kafka’s rise for event-driven architectures, developers require a specification to design effective event-driven APIs. AsyncAPI has been developed based on OpenAPI to define the endpoints and schemas of brokers and topics. For Kafka applications, the broker’s design to handle high throughput serialized payloads brings challenges for consumers and producers managing the structure of the message. For this reason, a registry becomes critical to achieve schema governance. Apicurio Registry is an end-to-end solution to store API definitions and schemas for Kafka applications. The project includes serializers, deserializers, and additional tooling. The registry supports several types of artifacts including OpenAPI, AsyncAPI, GraphQL, Apache Avro, Google protocol buffers, JSON Schema, Kafka Connect schema, WSDL, and XML Schema (XSD). It also checks them for validity and compatibility.
In this session, we will be covering the following topics:
● The importance of having a contract-first approach to event-driven APIs
● What is AsyncAPI, and how it helps to define Kafka endpoints and schemas
● The Kafka challenges on message structure when serializing and deserializing
● Introduction to Apicurio Registry and schema management for Kafka
● Examples of how to use Apicurio Registry with popular Java frameworks like Spring and Quarkus
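As a rough illustration of the contract-first approach above, a minimal AsyncAPI document for a Kafka topic might look like the sketch below. The service name, broker address, topic, and record fields are all placeholders invented for the example, not taken from the talk:

```yaml
asyncapi: '2.0.0'
info:
  title: Order Events API        # hypothetical service name
  version: '1.0.0'
servers:
  production:
    url: broker.example.com:9092  # placeholder broker address
    protocol: kafka
channels:
  orders.created:                 # maps to a Kafka topic
    subscribe:
      message:
        schemaFormat: application/vnd.apache.avro+json;version=1.9.0
        payload:
          type: record
          name: OrderCreated
          fields:
            - name: orderId
              type: string
            - name: amount
              type: double
```

A schema like this is the kind of artifact that would be stored in Apicurio Registry so producers and consumers can validate message structure against a shared contract.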
Serverless Architecture - Design Patterns and Best Practices - Amazon Web Services
As serverless architectures become more popular, customers are looking for a framework of patterns to help them identify how they can leverage AWS to deploy their workloads without managing servers or operating systems.
This webinar session describes reusable serverless patterns. For each pattern, operational and security best practices with potential pitfalls and nuances will be described. The patterns involve services including but not limited to AWS Lambda, Amazon API Gateway, Amazon Kinesis Data Streams and Data Firehose, Amazon DynamoDB, Amazon S3, AWS Step Functions, AWS Config, AWS X-Ray, and Amazon Athena.
This session can help the audience recognise candidates for various serverless architectures in an organisation and understand areas of potential savings and increased agility. Examples include: using X-Ray in Lambda for tracing and operational insight; a pattern for high-performance computing (HPC) using Lambda at scale; Step Functions as a way to handle orchestration for both the Automation and Batch patterns; a pattern for security automation using AWS Config rules to detect and automatically remediate violations of security standards; CI/CD pipelines for serverless, including testing, deploying, and versioning (SAM tools); working with AI/ML services; plus tips to optimise Lambda functions for performance and cost-effectiveness.
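As a minimal illustration of the Lambda-plus-API-Gateway pattern mentioned above, here is a hedged sketch of a proxy-integration-style handler. The event shape follows API Gateway's proxy format; the field names and response body are invented for the example:

```python
import json

def lambda_handler(event, context):
    """Minimal API Gateway (proxy integration) style handler: parse the
    JSON body, do a little work, and return a structured HTTP response."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local invocation with a fake event; on AWS, API Gateway supplies the event.
resp = lambda_handler({"body": json.dumps({"name": "serverless"})}, None)
print(resp["statusCode"], resp["body"])
```

Keeping the handler a pure function of its event makes it easy to unit-test locally before wiring it into a CI/CD pipeline.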
The document describes Amazon EKS (Amazon Elastic Container Service for Kubernetes), including an overview of EKS, its architecture, features, and integration with other AWS services. Key points include: EKS manages the Kubernetes control plane while worker nodes are launched in the customer's VPC; EKS supports networking via the AWS VPC CNI plugin; and EKS provides security and access management using IAM roles and policies.
OpenShift is a Platform-as-a-Service that provides development environments on demand using containers. It automates application lifecycles including build, deploy, and retirement. OpenShift uses containers to package applications and dependencies in a portable way. Red Hat addresses concerns around adopting containers at scale through OpenShift, which provides security, scalability, integration, management and certification capabilities. OpenShift runs on a user's choice of infrastructure and orchestrates applications across nodes using Kubernetes.
In this session, we walk through the fundamentals of Amazon VPC. First, we cover build-out and design fundamentals for VPCs, including picking your IP space, subnetting, routing, security, NAT, and much more. We then transition to different approaches and use cases for optionally connecting your VPC to your physical data center with VPN or AWS Direct Connect. This mid-level architecture discussion is aimed at architects, network administrators, and technology decision makers interested in understanding the building blocks that AWS makes available with Amazon VPC. Learn how you can connect VPCs with your offices and current data center footprint.
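The IP-space and subnetting step described above can be sketched with Python's standard `ipaddress` module. The /16 VPC CIDR and the /18 split below are arbitrary example values, not recommendations from the session:

```python
import ipaddress

# Carve a hypothetical VPC CIDR into equal subnets, e.g. one per AZ.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Four /18 subnets, each with 16,384 addresses.
# (AWS reserves 5 addresses per subnet, hence the "usable" estimate.)
subnets = list(vpc.subnets(new_prefix=18))
for s in subnets:
    print(s, "usable ~", s.num_addresses - 5)
```

Working out the subnet plan this way before creating anything helps avoid overlapping CIDRs later, which matters once you start connecting the VPC to a data center over VPN or Direct Connect.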
This document summarizes an upcoming presentation on architecting microservices on AWS. The presentation will:
- Review microservices architecture and how it differs from monolithic and service-oriented architectures.
- Cover key microservices design principles like independent deployment of services that communicate via APIs and using the right tools for each job.
- Provide example design patterns for implementing microservices on AWS using services like EC2, ECS, Lambda, API Gateway and more.
- Include a demo of microservices on AWS.
- Conclude with a question and answer session.
Building Cloud-Native App Series - Part 7 of 11
Microservices Architecture Series
Containers Docker Kind Kubernetes Istio
- Pods
- ReplicaSet
- Deployment (Canary, Blue-Green)
- Ingress
- Service
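The Kubernetes objects listed above fit together in a manifest. The sketch below pairs a Deployment (which manages a ReplicaSet of Pods) with a Service that fronts them; the app name, image, and ports are placeholders invented for the example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app            # hypothetical app name
spec:
  replicas: 3               # managed via a ReplicaSet under the hood
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: demo/app:1.0   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 8080
```

An Ingress would then route external HTTP traffic to the Service, and canary or blue-green rollouts can be built by running two Deployments behind label selectors.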
Containers and workload security: an overview - Krishna-Kumar
Beginner-level talk presented at Bangalore Container Conf 2018: containers and workload security, an overview. Hope it kick-starts your container security journey :-)
The document discusses Cilium and Istio with Gloo Mesh. It provides an overview of Gloo Mesh, an enterprise service mesh for multi-cluster, cross-cluster and hybrid environments based on upstream Istio. Gloo Mesh focuses on ease of use, powerful best practices built in, security, and extensibility. It allows for consistent API for multi-cluster north-south and east-west policy, team tenancy with service mesh as a service, and driving everything through GitOps.
Application Load Balancer and the integration with AutoScaling and ECS - Pop-... - Amazon Web Services
- Elastic Load Balancing automatically distributes application traffic across multiple EC2 instances to improve availability and scalability.
- The Application Load Balancer provides advanced request routing features like path-based routing and integration with containers. It also offers improved security, performance, and monitoring capabilities compared to the Classic Load Balancer.
- Key components of Application Load Balancing include listeners, target groups, targets, rules, health checks, and metrics in CloudWatch. These components work together to route traffic, monitor instances, and scale capacity as needed.
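The listener-rule mechanics above can be illustrated with a toy model. This is not the AWS API, just a sketch of path-based routing: ordered rules map a path prefix to a target group, the first match wins, and unmatched requests go to a default. All names are invented:

```python
# Toy model of ALB path-based routing: ordered listener rules map a path
# prefix to a target group; the first matching rule wins, with a default.
rules = [
    ("/api/",    "tg-api"),
    ("/images/", "tg-static"),
]
default_target_group = "tg-web"

def route(path):
    for prefix, target_group in rules:
        if path.startswith(prefix):
            return target_group
    return default_target_group

print(route("/api/orders"))    # tg-api
print(route("/images/a.png"))  # tg-static
print(route("/checkout"))      # tg-web
```

Health checks and CloudWatch metrics then operate per target group, which is what lets each routed service scale independently.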
This document provides an overview of AWS Fargate. It begins with introductions to containers and microservices. It describes how containers and microservices are well-suited to each other due to their ability to deploy and scale services independently. The document then discusses container orchestration tools and the options on AWS. It focuses on AWS Fargate, describing its key advantages of having no infrastructure to manage, ability to quickly launch and easily scale containers, and resource-based pricing. Fargate allows developers to focus just on their application without having to manage servers or clusters.
- Watch the video: https://www.youtube.com/watch?v=Rq4I57eqIp4
Amazon RDS Proxy is a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that improves application scalability, resilience to database failures, and security. (Launched in the Seoul region in June 2020)
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ... - Amazon Web Services
In this session, we will walk through the fundamentals of Amazon Virtual Private Cloud (VPC). We will discuss core VPC concepts including picking your IP space, subnetting, routing, security, NAT and VPC Endpoints.
This talk explains what Pod Security Policy is and its importance in Kubernetes security. The talk also looks at the current state of Docker Hub's popular images and the Helm charts repository.
This talk stresses that enabling PSP the right way is absolutely necessary for the real security of the cluster.
Link to the demos:
What is Pod Security Policy? https://www.youtube.com/watch?v=nrWRMP94vqc
Kubernetes hostPath exploit thwarted with Pod Security Policy https://www.youtube.com/watch?v=APS0CfD6DsE
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
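The topics/partitions/offsets model listed above can be sketched as a toy commit log. This is a hedged illustration of the concepts, not Kafka's actual implementation; the class and key names are invented:

```python
# Toy model of Kafka's storage: a topic is a set of partitions, each an
# append-only log; consumers track their own offset per partition.
class Topic:
    def __init__(self, partitions):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Same key -> same partition, which preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def consume(self, partition, offset):
        # Reads don't remove messages; the log is retained and re-readable.
        return self.partitions[partition][offset:]

t = Topic(partitions=3)
p, off = t.produce("user-1", "login")
t.produce("user-1", "click")
print(t.consume(p, 0))   # both events, in order, from user-1's partition
print(t.consume(p, 1))   # resuming mid-log from a stored offset
```

Because consumption is just "read from an offset," many consumer groups can read the same log independently, which is a large part of what enables Kafka's throughput and fan-out.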
The document discusses Amazon Web Services container management services and Kubernetes. It provides an overview of AWS services like Amazon ECS, EKS, Fargate, ECR, Cloud Map and App Mesh. It also describes Kubernetes concepts like pods, deployments, services, namespaces and control plane/data plane architecture. Amazon EKS is highlighted as a managed Kubernetes service that makes it easy to run Kubernetes on AWS without operating the control plane.
Managing Container Images with Amazon ECR - AWS Online Tech Talks - Amazon Web Services
The document discusses Amazon EC2 Container Registry (ECR), which is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images. It provides details on what ECR is, how it integrates with other AWS services like ECS, its access control and encryption features, and demos of common user workflows like creating a registry, pushing images, and using images in tasks.
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore - Amazon Web Services
The document discusses continuous integration, delivery, and deployment (CI/CD) using AWS services like CodeCommit, CodeBuild, CodeDeploy, CodePipeline, ECS Fargate, and ECR. It covers building Docker images with CodeBuild, orchestrating deployment pipelines with CodePipeline, and deploying containers to ECS Fargate.
Best Practices for Middleware and Integration Architecture Modernization with... - Claus Ibsen
This document discusses best practices for middleware and integration architecture modernization using Apache Camel. It provides an overview of Apache Camel, including what it is, how it works through routes, and the different Camel projects. It then covers trends in integration architecture like microservices, cloud native, and serverless. Key aspects of Camel K and Camel Quarkus are summarized. The document concludes with a brief discussion of the Camel Kafka Connector and pointers to additional resources.
This document compares Jenkins and AWS CodePipeline for implementing software pipelines. It finds that Jenkins provides more flexibility through plugins and scripting but requires managing infrastructure, while CodePipeline is fully hosted but offers fewer customization options. Both can be combined, with CodePipeline triggering Jenkins jobs or Jenkins deploying code using CodeDeploy. The document concludes that the right solution depends on individual needs and that integrating the tools lets you get the benefits of both.
Kubernetes Clusters Security with Amazon EKS (CON338-R1) - AWS re:Invent 2018 - Amazon Web Services
In this session, we discuss best practices for securing your Kubernetes deployments on AWS. We cover how to use AWS IAM with Kubernetes role-based access control (RBAC) for new or existing Kubernetes deployments, and we dive deep into how Amazon EKS implements secure cluster configuration by default.
[NEW LAUNCH!] How to Architect for Multi-Region Redundancy Using Anycast IPs ... - Amazon Web Services
Deployed globally in multiple edge locations, AWS Global Accelerator helps you manage traffic destined for your multi-regional applications with higher levels of availability and performance. This session covers ways in which Ubiquity helps you build fault-tolerant and highly performant systems across AWS regions using anycast static IP addresses. In this session, you will learn about Global Accelerator's shuffle sharding technique used for its static IPs, the benefits of anycast, and more.
Get the Most out of Your Elastic Load Balancer for Different Workloads (NET31... - Amazon Web Services
Bring your tricky questions and interesting use cases to this session, where we cover topics such as choosing the right load balancer, architectural best practices, load balancing principles, analyzing your application with Amazon CloudWatch metrics, and ELB access logs.
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018 - Amazon Web Services
Cloud computing provides a number of advantages, such as the ability to scale your web application or website on demand. If you have a new web application and want to use cloud computing, you might be asking yourself, "Where do I start?" Join us in this session for best practices on scaling your resources from one to millions of users. We show you how to best combine different AWS services, how to make smarter decisions for architecting your application, and how to scale your infrastructure in the cloud.
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I... - Amazon Web Services
Data and events are the lifeblood of any modern application. By using stateless, loosely coupled microservices communicating through events, developers can build massively scalable systems that can process trillions of requests in seconds. In this talk, we cover design patterns for using Amazon SQS, Amazon SNS, AWS Step Functions, AWS Lambda, and Amazon S3 to build data processing and real-time notification systems with unbounded scale and serverless cost characteristics. We also explore how these approaches apply to practical use cases, such as training machine learning models, media processing, and data cleansing.
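The SNS/SQS fan-out pattern referred to above can be sketched as a toy model. This is an illustration of the pattern, not the AWS SDK; the topic and queue names are invented:

```python
from collections import defaultdict

# Toy model of SNS-style fan-out: one published event is delivered to every
# subscribed queue, and independent workers would then drain each queue.
subscriptions = defaultdict(list)   # topic -> list of queue names
queues = defaultdict(list)          # queue name -> pending messages

def subscribe(topic, queue_name):
    subscriptions[topic].append(queue_name)

def publish(topic, message):
    for q in subscriptions[topic]:  # fan-out: each subscriber gets a copy
        queues[q].append(message)

subscribe("media-uploaded", "thumbnail-queue")
subscribe("media-uploaded", "transcode-queue")
publish("media-uploaded", {"key": "videos/cat.mp4"})

print(queues["thumbnail-queue"])  # [{'key': 'videos/cat.mp4'}]
print(queues["transcode-queue"])  # [{'key': 'videos/cat.mp4'}]
```

Because the publisher knows nothing about the consumers, new processing stages (say, a Lambda that cleans metadata) can be added by subscribing another queue, with no change to the producer.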
In this session, learn from market-leader Vonage how and why they re-architected their QoS-sensitive, highly available and highly performant legacy real-time communications systems to take advantage of Amazon EC2, Enhanced Networking, Amazon S3, ASG, Amazon RDS, Amazon ElastiCache, AWS Lambda, StepFunctions, Amazon SNS, Amazon SQS, Amazon Kinesis, Amazon EFS, and more. We also learn how Aspect, a multinational leader in call center solutions, used AWS Lambda, Amazon API Gateway, Amazon Kinesis, Amazon ElastiCache, Amazon Cognito, and Application Load Balancer with open-source API development tooling from Swagger, to build a comprehensive, microservices-based solution. Vonage and Aspect share their journey to TCO optimization, global outreach, and agility with best practices and insights.
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018 - Amazon Web Services
You've spent the time designing, architecting, setting up, and configuring your Kubernetes cluster. Now, it's on to day two. "Day two" refers to the functions of scaling, optimizing, monitoring, securing, and in general keeping the lights on. In this talk, we discuss the tools that you have available to help you build a reliable and resilient Kubernetes cluster and run workloads in production. We discuss how to control the network, secure your environment using threat detection, scan your containers for vulnerabilities, use monitoring tools, and create scalable containers and clusters.
DevOps Practices for Operating Microservices on AWS and an Introduction to Spinnaker :: 김영욱 :: AWS Summit Seoul 2018 - Amazon Web Services Korea
The document discusses DevOps practices for operating microservices on AWS, including introducing Spinnaker. Some key points discussed include:
- The need for immutable servers, infrastructure as code, release pipelines, deployment strategies like blue/green and canary releases, cluster management, automated testing, monitoring, and log streaming.
- Using tools like Packer, Terraform, Ansible for infrastructure as code and building immutable images.
- Continuous delivery pipelines for automated testing and deployments.
- Spinnaker as a continuous delivery platform supporting deployment strategies and cluster management across multiple clouds.
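The canary strategy in the list above boils down to a gated decision loop: shift a small slice of traffic, check metrics, and either proceed or roll back. The sketch below is a toy version of that logic, not Spinnaker's implementation; the steps, threshold, and metric source are all assumptions:

```python
# Toy canary rollout: shift traffic to the new version in steps, rolling
# back if the canary's observed error rate exceeds a threshold.
def canary_rollout(error_rates, steps=(5, 25, 50, 100), threshold=0.01):
    """error_rates maps a traffic percentage to the measured canary error
    rate at that step (in a real pipeline this would come from monitoring)."""
    for pct in steps:
        if error_rates.get(pct, 0.0) > threshold:
            return "rolled back at {}%".format(pct)
    return "promoted to 100%"

print(canary_rollout({5: 0.002, 25: 0.004, 50: 0.003, 100: 0.005}))
print(canary_rollout({5: 0.002, 25: 0.08}))   # bad metrics at 25% traffic
```

Blue/green is the degenerate case of this loop: a single step that shifts 100% of traffic, with rollback meaning flipping back to the old fleet.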
This document summarizes a presentation on building microservices with AWS. It discusses what microservices are and the benefits of the microservices architecture, such as agility and scalability. It then outlines several AWS services that can be used to build, deploy, and manage microservices, including Amazon EC2, ECS, Lambda, API Gateway, CodeCommit, CodeDeploy, CodeBuild, CodePipeline, Secrets Manager, DynamoDB, CloudWatch, X-Ray, Step Functions, and service discovery. The presentation provides examples of how these services can support microservices development.
Shift-Left SRE: Self-Healing with AWS Lambda Functions (DEV313-S) - AWS re:In... - Amazon Web Services
Even the best continuous delivery and DevOps practices cannot guarantee that there will be no issues in production. The rise of Site Reliability Engineering (SRE) has promoted new ways to automate resilience into your system and applications to circumvent potential problems, but it’s time to “shift-left” this effort into engineering. In this session, learn to leverage AWS Lambda functions as “remediation as code.” We show how to make it part of your continuous delivery process and orchestrate the invocation of Self-Healing Lambda functions in case of unexpected situations impacting the reliability of your system. Gone are the days of traditional operation teams—it’s the rise of “shift-lefters”! This session is brought to you by AWS partner, Dynatrace.
Amazon Elastic Container Service for Kubernetes (Amazon EKS) | AWS Dev Day 2018 - AWS Germany
Containers are an increasingly important way for developers to package and deploy their applications and AWS offers multiple container products to help you deploy, manage, and scale containers in production. In this session we dive deep into Amazon Elastic Container Service for Kubernetes (Amazon EKS), a new managed service for running Kubernetes on AWS. Learn how Amazon EKS works, from provisioning nodes, launching pods, and integrations with AWS services such as Elastic Load Balancing and Auto Scaling.
Learn more about containers here: https://aws.amazon.com/containers/
Building Microservices with the Twelve-Factor App Pattern - SRV346 - Chicago ... - Amazon Web Services
Small monolithic apps are quick to build and fast to implement. But tightly coupled apps can quickly become difficult to operate, maintain, and scale as they grow. In this session, we cover how to properly construct services and distributed microservices systems. We explore how to build twelve-factor apps and discuss the right tools and architectures to implement them on AWS.
Building Microservices with the Twelve Factor App Pattern on AWS - Amazon Web Services
The document discusses building microservices using the Twelve-Factor App methodology on AWS. It introduces the Twelve-Factor App methodology, which provides best practices for building modern, cloud-native applications. It then covers each of the twelve factors in more detail, explaining how to apply them when building microservices on AWS services like EC2, ECS, and S3.
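One of the twelve factors mentioned above, "Config" (factor III), has a very concrete shape in code: deploy-specific settings come from the environment, never from constants or checked-in files. A minimal sketch, with invented variable names and defaults:

```python
import os

# Factor III ("Config"): read deploy-specific settings from the environment
# instead of baking them into the code or a config file in the repo.
def load_config(env=os.environ):
    return {
        "database_url": env.get("DATABASE_URL", "sqlite:///local.db"),
        "port": int(env.get("PORT", "8080")),
        "debug": env.get("DEBUG", "false").lower() == "true",
    }

# Same code, different environments -- only the env vars change.
print(load_config({}))                                # local defaults
print(load_config({"PORT": "80", "DEBUG": "true"}))   # a production-ish env
```

On AWS this maps naturally to task definitions (ECS), Lambda environment variables, or values injected from Secrets Manager, so the same image runs unchanged in every environment.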
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018 - Amazon Web Services
Airbnb is going through tremendous growth internationally, evolving from a home sharing company to a global travel community with many product offerings. The growth driven by the business, increase in traffic, and aggressive hiring created a new challenge for the Production Infrastructure Team. The team has grown from a small team of 10 to a production platform organization with 100 engineers that builds foundational services that support homes, experiences, luxury, and China. We shifted our priority and focus to move away from putting out fires to building a platform that can grow with the company. In this session, we chronicle Airbnb’s architectural evolution that aligns with organizational growth strategy, and review how we overcame different architectural challenges leveraging AWS technologies.
Database Week at the San Francisco Loft: ElastiCache & Redis
Redis is an open source, in-memory data store that delivers sub-millisecond response times enabling millions of requests per second to power real-time applications. It can be used as a fast database, cache, message broker, and queue. Amazon ElastiCache delivers the ease-of-use and power of Redis along with the availability, reliability, scalability, security, and performance suitable for the most demanding applications. We’ll take a close look at Redis and how to use it to power different use cases.
Speaker: Ben Willett - Sr. Solutions Architect, AWS
Redis is an open source, in-memory data store that delivers sub-millisecond response times enabling millions of requests per second to power real-time applications. It can be used as a fast database, cache, message broker, and queue. Amazon ElastiCache delivers the ease-of-use and power of Redis along with the availability, reliability, scalability, security, and performance suitable for the most demanding applications. We’ll take a close look at Redis and how to use it to power different use cases.
Speaker: Samir Karande - Sr. Manager, Solutions Architecture, AWS
ServerlessConf 2018 Keynote - Debunking Serverless Myths (no video / detailed...Tim Wagner
Copy of the keynote with the video removed (for easier downloads onto mobile devices) and with some additional slides that expand on the cost analysis in more detail.
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...Amazon Web Services
This session introduces AWS Global Accelerator, a new global service that enables you to optimally route traffic to your multi-regional endpoints via static Anycast IP addresses that are announced from the expansive AWS edge network. This session walks through the various features and customer use cases for Global Accelerator. Several example use cases demonstrate how you can use Ubiquity to achieve near-zero application downtime and reduce latency for your global applications. We will walk you through the architecture and will also include a demo of the workflow. Attend this session if you are looking at ways to accelerate performance of your global applications, achieve high availability for your mission critical applications or easily manage multiple IP addresses through a static Anycast IP that fronts your applications.
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Amazon Web Services
Migrating enterprise applications to the cloud requires thorough planning and consideration for a number of variables. Should you move your application to a similar infrastructure in the cloud (in a lift-and-shift scenario)? Or should you refactor your application to take advantage of cloud-native services for object storage, serverless, auto-scaling, and so on? In this session, an AWS expert walks through the ten commandments that enterprises should follow when moving applications to the cloud and refactoring them for optimal performance. Then, a representative of Sysco Corporation, a Fortune 50 company, shares how the company migrated mission-critical legacy business systems and modernized them to take advantage of the AWS Cloud. Learn how the company moved its enterprise purchasing system, which processes millions of dollars in sales daily, to the AWS Cloud while achieving a 60% decrease in run costs. Also discover the lessons learned and highlights of the migration, which resulted in 30% increase in performance, 3x improvement in user accessibility, and a significant decrease in order backlogs and outages.
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdfAmazon Web Services
AWS Summit Milano 2018
Come scalare da zero ai tuoi primi 10 milioni di utenti
Speaker: Giorgio Bonfiglio, AWS Technical Account Manager - Enterprise Support
CFD Studio Credentials – Branding, Design & Developmenttrannghia2018
CFD STUDIO is an independent creative studio, specializing in Branding, UX/UI Design, Website & Mobile App Development. We craft high-quality digital experiences for brands and business goals.
Our Mission is to transform ideas into impactful brands by blending creativity, technology, and strategic thinking, delivering solutions that not only captivate but also drive success.
Using speech recognition and natural language processing, Automated Minutes creates an accurately transcribed meeting minutes draft in a near real-time, secure environment.
On March 11th at 2 PM EST OnBoard’s product team, Heather Hansson and Philip Hinz, explored the power of OnBoard’s Automated Minutes.
Using this webinar, you can learn:
Why Automated Minutes? Customizable, Secure, and Governance-Built for Boards
How Automated Minutes works to capture and create an initial draft of your minutes
Personalizing and formatting your Minutes through rich text editing tools
UiPath NY AI Series: Session 1: Introduction to Agentic AI with UiPathDianaGray10
🚀 Embracing the Future: Starting the Course with Agentic AI with UiPath
📢 Event Overview:
Join us for an exciting session on Agentic AI with UiPath! This event is perfect for professionals, tech enthusiasts, and automation leaders eager to learn about autonomous and intelligent digital agents. Discover how UiPath’s Agentic AI is shaping the future of automation! 🤖✨
📅 What You’ll Learn
🔹 UiPath’s Agentic AI Vision - Learn about UiPath’s AI-driven automation future.
🔹 Evolution of UiPath’s Automation - From RPA to AI-powered automation, see the journey! 🚀
🔹 What is Agentic Automation? - Understand how self-adaptive AI is changing workflows.
🔹 Principles of Agentic Automation - Key ideas like autonomy & adaptability.
🔹 Real-World Applications - Success stories & use cases from businesses leveraging AI.
🔹 UiPath’s Agentic AI Architecture - A peek into the technical side of intelligent automation. 🏗️
🔹 Q&A Session
👥 Who Should Attend?
Automation Developers & Tech Enthusiasts 💡
Business Leaders 📊
IT Architects & Tech Innovators 🏗️
UiPath Community Members 🤝
📌 Register now & be part of the future of AI-driven automation! 🔥
Technology use over time and its impact on consumers and businesses.pptxkaylagaze
In this presentation, I explore how technology has changed consumer behaviour and its impact on consumers and businesses. I will focus on internet access, digital devices, how customers search for information and what they buy online, video consumption, and lastly consumer trends.
Why Ivalua: A Relational Acquisition Model (RAM 2025) ComparisonJon Hansen
What makes Jon Hansen’s ProcureTech assessment solution RAM unique?
RAM (short for “Relational Acquisition Model,” based on historical context), stands out due to its pioneering approach to procurement efficiency, developed in the late 1990s and early 2000s. While specific technical details about RAM’s current iteration as of March 1, 2025, are not fully detailed in recent public sources, its uniqueness can be inferred from Hansen’s documented history, writings, and interviews, particularly from Procurement Insights and related discussions.
RAM stands out for its agent-based adaptability, interactive design, early AI intelligence, people-process-tech integration, and proven government success—features ahead of its time in the 1990s and resonant with 2025’s procurement needs. It tackled inefficiencies with a practical, transparent approach, not just tech hype, saving millions and streamlining operations where others failed. While its current form isn’t fully public, its legacy as a ProcureTech pioneer remains unique, blending foresight with results in a way few contemporaries matched then or now.
Today’s ProcureTech solution providers—such as Coupa, GEP, Jaggaer, Sievo, Ivalua—can benefit from the Relational Acquisition Model (RAM) by drawing on its foundational principles and proven strengths, adapting them to enhance their offerings in the context of 2025’s complex procurement landscape. While RAM, developed in the late 1990s, lacks the technological scale of modern platforms, its agent-based design, focus on transparency, and human-centric efficiency offer valuable lessons.
Today’s ProcureTech providers can benefit from RAM by adopting its agent-based adaptability, transparent AI, interactive simplicity, human-tech balance, operational focus, and proven credibility. These could enhance responsiveness (e.g., tariff tweaks), trust (e.g., black box fears), and ROI (e.g., faster savings), potentially lifting efficiency by 10-20% or adoption by 15-30%. RAM’s lessons—distilled from a $12 million success—offer a roadmap to refine, not replace, modern solutions like Ivalua. It’s a legacy worth mining for a market chasing the next big thing.
World Information Architecture Day 2025 - UX at a CrossroadsJoshua Randall
User Experience stands at a crossroads: will we live up to our potential to design a better world? or will we be co-opted by “product management” or another business buzzword?
Looking backwards, this talk will show how UX has repeatedly failed to create a better world, drawing on industry data from Nielsen Norman Group, Baymard, MeasuringU, WebAIM, and others.
Looking forwards, this talk will argue that UX must resist hype, say no more often and collaborate less often (you read that right), and become a true profession — in order to be able to design a better world.
Formal Methods: Whence and Whither? [Martin Fränzle Festkolloquium, 2025]Jonathan Bowen
Alan Turing arguably wrote the first paper on formal methods 75 years ago. Since then, there have been claims and counterclaims about formal methods. Tool development has been slow but aided by Moore’s Law with the increasing power of computers. Although formal methods are not widespread in practical usage at a heavyweight level, their influence as crept into software engineering practice to the extent that they are no longer necessarily called formal methods in their use. In addition, in areas where safety and security are important, with the increasing use of computers in such applications, formal methods are a viable way to improve the reliability of such software-based systems. Their use in hardware where a mistake can be very costly is also important. This talk explores the journey of formal methods to the present day and speculates on future directions.
Data Intelligence Platform Transforming Data into Actionable Insights.pptxLisa Gerard
In today’s data-driven world, a Data Intelligence Platform plays a crucial role in empowering organizations to make informed, strategic decisions. By leveraging advanced analytics, seamless data integration, and robust governance, businesses can transform vast amounts of data into actionable insights.
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog GavraScyllaDB
Learn how Responsive replaced embedded RocksDB with ScyllaDB in Kafka Streams, simplifying the architecture and unlocking massive availability and scale. The talk covers unbundling stream processors, key ScyllaDB features tested, and lessons learned from the transition.
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (平山毅)Tsuyoshi Hirayama
DAO UTokyo 2025
東京大学情報学環 ブロックチェーン研究イニシアティブ
https://utbciii.com/2024/12/12/announcing-dao-utokyo-2025-conference/
Session 1 :DLT mass adoption
IBM Tsuyoshi Hirayama (平山毅)
Future-Proof Your Career with AI OptionsDianaGray10
Learn about the difference between automation, AI and agentic and ways you can harness these to further your career. In this session you will learn:
Introduction to automation, AI, agentic
Trends in the marketplace
Take advantage of UiPath training and certification
In demand skills needed to strategically position yourself to stay ahead
❓ If you have any questions or feedback, please refer to the "Women in Automation 2025" dedicated Forum thread. You can find there extra details and updates.
Transcript: Elements of Indigenous Style: Insights and applications for the b...BookNet Canada
From acquisitions and editorial to marketing and sales teams, every team member plays a role in accurately, respectfully, and ethically championing Indigenous and traditionally underrepresented voices. This session, led by Warren Cariou, Lead Editor of the second edition of Gregory Younging’s Elements of Indigenous Style, is for book industry professionals eager to learn and apply Indigenous teachings to their work.
Using Elements of Indigenous Style as a foundation, this session delves into its mind-opening content, which goes beyond the scope of a traditional style guide. The book advocates for the indigenization of publishing and addresses topics such as culturally appropriate publishing practices; understanding identity and community affiliation; Two-Spirit, trans, and Indigiqueer contexts; practices to support Indigenous linguistic and cultural sovereignty; and emerging issues in the digital environment. Warren provides actionable recommendations and best practices for publishers working on literary projects by or about Indigenous authors, which can be applied more broadly to other underrepresented communities.
Kaitlin Littlechild from the Indigenous Editors Association brings her expertise to the discussion as the moderator.
Link to recording and presentation slides: https://bnctechforum.ca/sessions/elements-of-indigenous-style-insights-and-applications-for-the-book-industry/
Presented by BookNet Canada on February 28, 2025 with support from the Department of Canadian Heritage.
Transform Your Future with Front-End Development TrainingVtechlabs
Kickstart your career in web development with our front-end web development course in Vadodara. Learn HTML, CSS, JavaScript, React, and more through hands-on projects and expert mentorship. Our front-end development course with placement includes real-world training, mock interviews, and job assistance to help you secure top roles like Front-End Developer, UI/UX Developer, and Web Designer.
Join VtechLabs today and build a successful career in the booming IT industry!
UiPath Automation Developer Associate Training Series 2025 - Session 2DianaGray10
In session 2, we will introduce you to Data manipulation in UiPath Studio.
Topics covered:
Data Manipulation
What is Data Manipulation
Strings
Lists
Dictionaries
RegEx Builder
Date and Time
Required Self-Paced Learning for this session:
Data Manipulation with Strings in UiPath Studio (v2022.10) 2 modules - 1h 30m - https://academy.uipath.com/courses/data-manipulation-with-strings-in-studio
Data Manipulation with Lists and Dictionaries in UiPath Studio (v2022.10) 2 modules - 1h - https:/academy.uipath.com/courses/data-manipulation-with-lists-and-dictionaries-in-studio
Data Manipulation with Data Tables in UiPath Studio (v2022.10) 2 modules - 1h 30m - https:/academy.uipath.com/courses/data-manipulation-with-data-tables-in-studio
⁉️ For any questions you may have, please use the dedicated Forum thread. You can tag the hosts and mentors directly and they will reply as soon as possible.
FinTech - US Annual Funding Report - 2024.pptxTracxn
US FinTech 2024, offering a comprehensive analysis of key trends, funding activities, and top-performing sectors that shaped the FinTech ecosystem in the US 2024. The report delivers detailed data and insights into the region's funding landscape and other developments. We believe this report will provide you with valuable insights to understand the evolving market dynamics.
Technology use over time and its impact on consumers and businesses.pptxkaylagaze
In this presentation, I will discuss how technology has changed consumer behaviour and its impact on consumers and businesses. I will focus on internet access, digital devices, how customers search for information and what they buy online, video consumption, and lastly consumer trends.
The Future of Repair: Transparent and Incremental by Botond DénesScyllaDB
Regularly run repairs are essential to keep clusters healthy, yet having a good repair schedule is more challenging than it should be. Repairs often take a long time, preventing running them often. This has an impact on data consistency and also limits the usefulness of the new repair based tombstone garbage collection. We want to address these challenges by making repairs incremental and allowing for automatic repair scheduling, without relying on external tools.
TrustArc Webinar: How to Create a Privacy-First CultureTrustArc
Privacy is no longer just a compliance issue—it’s a cornerstone of trust and a vital element of business success. Yet, many organizations struggle to embed privacy into their culture, leaving them vulnerable to breaches, regulatory action, and damaged reputations. Are your employees equipped to make privacy-conscious decisions? Does your company have the tools and mindset to prioritize data protection at every level?
This webinar brings together a panel of experts to explore why a strong privacy culture is critical and how it can drive both organizational integrity and customer confidence. You’ll learn how to align privacy values with business objectives, foster awareness and accountability among employees, and create policies that empower teams to safeguard sensitive information effectively.
Through engaging discussions and practical insights, we’ll provide actionable strategies for implementing privacy programs that stick. From building leadership support to weaving privacy considerations into daily workflows, you’ll discover what it takes to turn compliance into a competitive advantage and a core part of your company’s identity.
This webinar will review:
- Why your company needs a privacy culture
- Best practices for building a privacy-first culture
- Practical tips for implementing effective privacy programs
#4: Imagine it’s Friday night of this week. The conference is over, you are back home, sitting on your favorite couch, ready to unwind and you start Netflix. At least I hope you do :)
#5: This is the first thing you see as soon as you start Netflix. Interesting thing about this list is that it’s not static or universal. It is personalized to your taste. There are hundred and twenty five million versions of this list. One for each of our hundred and twenty five million members. But this one is mine. Personalized to my taste.
Which I just realized is filled with crime shows. Let’s not read anything specific into that. Moving on…
#6: Raise your hand if you actually start watching something within a minute or two after seeing that list.
Yeah, me neither! Most of us spend a considerable amount of time on this screen scrolling, trying to pick something to watch. This behavior is actually relevant to our discussion.
Let’s say 20 minutes later you are still browsing the list. Meanwhile, our personalization algorithms are continuously running. So during those 20 minutes we could generate a new, better personalized list of shows for you in the cloud. If that does happen, how do we get that new list in front of you? How do we tell our application that a new list is ready for it to download?
Push messaging is a perfect solution for situations like this. Our old app polled our server periodically for new recommendations. It kinda worked, but it was both wasteful and not that great latency-wise. What’s worse, these twin goals of server efficiency and freshness of UI directly contradict each other. If you make the polling interval too short to get the freshest UI, you put more load on your servers, and if you increase the polling interval to help your servers, the freshness of your UI suffers.
Now our server just pushes the new list to the client. Just as one data point, we cut down total number of requests to our website by 12% when we shifted our browser app from polling to push. At more than million requests per second those 12% add up really fast!
So please ignore all push messages on your phones for next 40 minutes because we are going to talk about push messaging now. Push notifications may be terrible for conference speakers but background push messages are awesome for applications.
#7: By the end of this presentation you’ll have a very clear understanding of
#12: My name is Susheel Aroskar. I am a software engineer in the Cloud Gateway team at Netflix. All of the Netflix HTTP API traffic passes through our Cloud Gateway. I have been at Netflix for 9 years now and have worked in three different teams. And somehow it still feels like I’m just browsing the list; the real show is yet to start.
So let’s start by defining push.
What exactly is push? How is it different from the normal request / response paradigm that we all know and love?
#13: This is actually from a motivational poster at my local gym. That’s why I stopped going there. But it turns out to be surprisingly accurate definition for our purpose today.
Push really is different in just two ways:
There is a persistent, always-on connection between the server and the client for the entirety of the client’s lifetime, and
It’s the server that initiates the data transfer. Something does happen on the server and then the server pushes the data to the client instead of the client requesting it
We built our own push messaging system, named Zuul Push, to send background push messages to our app from our servers. Zuul push messages are similar to push messages you get on your smartphones except they work across all sorts of devices not just phones. They work anywhere where Netflix app runs. That includes TVs, game consoles, laptops and smartphones. To achieve this, Zuul Push uses standard, open web protocols like WebSockets and Server Sent Events (SSE) to push messages. Zuul Push server itself is open sourced too and is available today on GitHub.
#14: Zuul push is in fact not a single service but a complete push messaging infrastructure made up of multiple components
#15: There are Zuul Push servers. They sit on the network edge and accept connections from clients.
#16: Clients connect to push servers using either WebSocket or Server Sent Events protocol. Once connected, the client keeps the connection open for its entire lifetime. So these are persistent connections.
#17: Since there are many clients connected to many push servers, we need to track which client is connected to which server. This is the job of the push registry.
#19: On the backend, our push message senders need a simple, robust and high throughput mechanism to send push messages.
But our senders don’t really want to know about all the internal details of our push infrastructure. What they really want is a simple, one-liner method call to send a push message to a given client. Our push library gives them this simple interface by hiding all this complexity behind a single sendMessage() call.
#20: Internally, sendMessage() drops the message into a push message queue. By introducing message queues between senders and receivers we decouple them, making it easy to operate them independently of each other. Message queues also let us absorb wide variations in the number of incoming messages. They act as buffers that absorb big spikes in traffic.
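As a rough sketch of this decoupling, here is a minimal in-process version. The PushSender class, its method names, and the message format are all illustrative (not Zuul Push’s actual API), and a plain BlockingQueue stands in for the Kafka queue used in production:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PushSender {
    // In-process stand-in for the real push message queue (Kafka in production).
    private final BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>();

    // The simple one-liner the senders see; everything else is hidden behind it.
    public boolean sendMessage(String clientId, String payload) {
        // In production this would serialize the message and publish it to Kafka.
        return messageQueue.offer(clientId + "|" + payload);
    }

    // Called by the processing side to pick up the next queued message (null if empty).
    public String takeNext() {
        return messageQueue.poll();
    }
}
```

The queue is what makes this fire-and-forget: sendMessage() returns immediately regardless of how fast the other end is draining, which is exactly the buffering behavior described above.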
#21: Finally our message processor ties all these components together to do the actual push message delivery.
#22: It reads push messages from the push message queue. Each push message is addressed to a specific client.
#23: The message router then looks up in the Push Registry which push server the requested client is connected to.
#24: If the push server is found in the registry, message processor connects to that push server and delivers the push message. The port used by the message processor to connect to the push server is reachable only on the internal 10/24 subnetwork and is guarded by Amazon security groups.
On the other hand, if the push server is not found in the registry, it means the requested client is not connected or online at this time. In such a case, the processor just drops the message on the floor.
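The routing logic just described (look the client up in the registry, deliver if found, drop if not) can be sketched like this; the class and method names are hypothetical, and plain maps stand in for the push registry and for the actual delivery call to the push server:

```java
import java.util.HashMap;
import java.util.Map;

public class MessageRouter {
    private final Map<String, String> pushRegistry = new HashMap<>();  // clientId -> push server
    private final Map<String, Integer> deliveredPerServer = new HashMap<>();
    private int dropped = 0;

    public void registerClient(String clientId, String server) {
        pushRegistry.put(clientId, server);
    }

    // Deliver the message to the client's push server if one is registered;
    // otherwise the client is offline, so drop the message on the floor.
    public boolean route(String clientId, String message) {
        String server = pushRegistry.get(clientId);
        if (server == null) {
            dropped++;
            return false;
        }
        // Stand-in for connecting to the push server's internal port and delivering.
        deliveredPerServer.merge(server, 1, Integer::sum);
        return true;
    }

    public int droppedCount() { return dropped; }
}
```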
Now that we have seen how all Zuul Push components fit together, we can dig a little deeper in each component’s details.
#25: Zuul Push server is probably the biggest piece of the whole infrastructure. Our push cluster today handles tens of millions of concurrent, persistent client connections at peak and is rapidly growing.
Zuul Push server is based on our Zuul cloud gateway and hence shares its name. Zuul cloud gateway fronts all Netflix HTTP API traffic coming into our system. It handles millions of requests per second. It was recently rewritten to use async, non-blocking I/O, so it provided a perfect foundation for building a massively scalable push messaging server like Zuul Push.
#26: But why do we need async I/O?
Many of you are probably familiar with the C10K challenge. The challenge was first coined in 1999. It simply asks how we can support 10,000 concurrent connections on a single server. We have long since blown past the original 10,000 number, but the name stuck.
This capability to support tens of thousands of connections on a single box is crucial for a service like Zuul Push that has to handle millions of mostly idle but always-on persistent connections.
#27: The traditional way of handling multiple connections is to spawn a new thread for each new connection. This thread then does blocking read/write operations on that connection. This model doesn’t scale to meet the C10K challenge.
You would quickly exhaust your memory allocating 10,000 stacks for 10000 threads. It’d also pin your CPU down because of the frequent context switches between those 10,000 threads.
#28: Async I/O follows a different model. It uses the operating system’s I/O multiplexing primitives like epoll or kqueue to register read/write callbacks for all open connections on a single thread. Whenever any socket is ready for I/O, its callbacks get invoked on that same single thread, so now you don’t need thousands and thousands of threads. The trade-off is a somewhat more complex programming model, because now you as the developer are responsible for keeping track of all the state inside your code. You can no longer rely on the thread stack to do it for you, because the same single thread stack is now shared by all the open connections.
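Java’s standard library exposes this same multiplexing model directly through java.nio, which is also what Netty builds on. The sketch below (illustrative, not Zuul Push code) registers a listening channel with a Selector and handles an accept event on a single thread, analogous to registering an epoll/kqueue callback:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    // Waits for one connection using a single selector thread and returns
    // the number of accept events handled (or -1 on I/O error).
    public static int acceptOneConnection() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            // Register interest in "accept" readiness -- the accept callback, in effect.
            server.register(selector, SelectionKey.OP_ACCEPT);

            // A client from the same process connects, to trigger the readiness event.
            try (SocketChannel client = SocketChannel.open(
                    new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()))) {
                int events = 0;
                selector.select(5000); // block until at least one channel is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        ((ServerSocketChannel) key.channel()).accept().close();
                        events++;
                    }
                }
                return events;
            }
        } catch (IOException e) {
            return -1;
        }
    }
}
```

In a real server the select() call runs in a loop and the same thread also dispatches read/write readiness for every registered connection, which is how one box can hold tens of thousands of mostly idle sockets.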
#29:
We use Netty to do async I/O. Netty is a great open source networking library in Java. It is widely used by many popular open source Java projects like Cassandra, Hadoop etc. so it is well tested and battle proven.
We are not going to go into the details of Netty in this talk, but this is what a Netty async I/O program’s structure looks like from 10,000 feet. The Inbound and Outbound channel handlers you see here are analogous to the read and write callbacks we just discussed.
It’s very similar in essence to how Node.js handles multiple connections. If you know Node.js internals you can think of Netty as a libuv counterpart in the Java world.
#30: This is a simplified version of our push server’s Netty pipeline. There is a lot of stuff going on here but I really want to call out your attention to just two highlighted methods
getPushAuthHandler()
getPushRegistrationHandler()
You can override these methods to plug in your own custom authentication and custom push session registration in Zuul Push server.
The rest of the handlers you see here - things like HttpServerCodec or WebSocketServerProtocolHandler - are all off-the-shelf protocol handlers provided by Netty, which is great. Netty is doing most of the heavy lifting of parsing the HTTP and WebSocket protocols here.
#31: Each client connecting to the Zuul push server must identify and authenticate itself before it can start receiving push messages on that connection.
You can plug in your own custom authentication by extending PushAuthHandler and implementing its doAuth() method. doAuth() receives the original HTTP WebSocket connection request as an argument. This allows you to inspect cookies, other headers, and the body of the request inside doAuth(), which you can use to implement your own custom authentication.
#32: As we saw push registry is used to keep track of which client is connected to which Zuul Push server.
#33: Just like custom authentication, Zuul Push lets you plug in a custom datastore of your choice for the push registry.
You’d extend the PushRegistrationHandler class and implement its registerClient() method to do that.
#34: You can use any data store, but for best results it should have the following characteristics
#35: Low read latency is important because you write the registration once, when the client connects, and then look it up multiple times: once every time someone sends a push message to that client.
#36: Support for record expiry is important because in the real world we cannot rely on every single client closing its connection cleanly, every single time. Most of the time they will close it cleanly, which takes care of cleaning up their push registration record from the registry. But sometimes clients crash. Sometimes servers crash. This leaves behind phantom registration records in the registry: records that indicate a particular client is connected to a particular server but are no longer accurate. In such cases we need a way to clean up those phantom registration records automatically. Zuul Push relies on automatic record expiry, or TTL, to do that.
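To illustrate why TTL matters, here is a minimal registry sketch with lazy expiry. The class and its API are hypothetical (in a real deployment the datastore itself, e.g. Redis/Dynomite, enforces the TTL); the point is that an expired phantom record simply reads as "offline":

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TtlRegistry {
    private static final class Entry {
        final String server;
        final long expiresAt;
        Entry(String server, long expiresAt) { this.server = server; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> registrations = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlRegistry(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Record which push server a client is connected to, stamped with an expiry time.
    public void register(String clientId, String server) {
        registrations.put(clientId, new Entry(server, System.currentTimeMillis() + ttlMillis));
    }

    // Look up the client's server; expired (phantom) records read as "offline" (null).
    public String lookup(String clientId) {
        Entry e = registrations.get(clientId);
        if (e == null || System.currentTimeMillis() > e.expiresAt) {
            registrations.remove(clientId); // lazy cleanup of the phantom record
            return null;
        }
        return e.server;
    }
}
```

A live client would keep refreshing its registration by calling register() periodically, well within the TTL, so only records from crashed clients or servers ever age out.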
#37: Beyond these two features then there are the usual suspects for high availability
#39: These are all great choices for push registry. There are probably several more
#40: We use Dynomite. Dynomite is another open source project from Netflix that wraps Redis and augments it with features like auto-sharding, read/write quorums and cross region replication. You can think of it as Amazon Dynamo meets Redis. We chose Dynomite since it supports replication across AWS regions out of the box which is important for our use case. And also because Dynomite is well supported operationally inside Netflix by a central data engineering team!
#41: This component handles backend message queuing, routing and delivery of push messages on behalf of message senders.
#43: Most of our push message senders use fire-and-forget approach to message delivery.
Those who are interested in knowing the final delivery status of the push message can either subscribe to the push delivery status queue or read it from a Hive table.
#44: Netflix runs in three different AWS regions. A backend service trying to send a push message to a particular customer generally has no idea to which region that customer may be connected. Our message routing infrastructure takes care of routing that message to the correct AWS region for them. We use Kafka message queue replication to deliver messages across regions.
#45: In practice, we have found we can use a single push message queue to deliver all sorts of push messages and still stay under our delivery latency SLA. However, our design lets you use different message queues for different message priorities to avoid the “priority inversion” issue. Priority inversion happens when a message with higher priority is kept waiting behind lower-priority messages for delivery because they all share the same queue. Using different message queues for different priorities guarantees this can never happen.
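A tiny sketch of the idea (hypothetical names; a real deployment would use separate Kafka topics rather than in-memory deques): with one queue per priority and a drain order that always checks the high-priority queue first, a high-priority message can never be stuck behind a normal-priority backlog.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PriorityQueues {
    private final Deque<String> highPriority = new ArrayDeque<>();
    private final Deque<String> normalPriority = new ArrayDeque<>();

    public void enqueue(String message, boolean high) {
        (high ? highPriority : normalPriority).addLast(message);
    }

    // High-priority messages are always drained first, so they can never
    // wait behind a backlog of normal-priority ones. Returns null when empty.
    public String next() {
        if (!highPriority.isEmpty()) return highPriority.pollFirst();
        return normalPriority.pollFirst();
    }
}
```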
#46: Our message processor is built on top of Mantis. Mantis is our internal scalable stream processing engine, similar to Apache Flink. It uses the Mesos container management system. This allows us to quickly spin up more message processor instances. It also supports auto-scaling the number of processors based on the number of pending messages in the queue. This makes it very easy for us to meet our delivery SLA under a wide variety of loads while still staying resource-efficient.
#47: At this point I’d like to switch gears and cover some of the operational aspects of running Zuul Push in production at Netflix traffic scale. Zuul Push is a little different from the usual stateless REST services, so it requires a little TLC (tender love and care) when you run it in production.
#48: The first and biggest difference is the long-lived, stable connections. They make Zuul Push somewhat stateful.
#49: Persistent connections are great from client’s point of view because they improve clients’ efficiency dramatically. Unlike plain HTTP, clients don’t have to make and break connections constantly.
That’s why we all rejoiced when WebSockets appeared in browsers and replaced hacks like long poll or Comet.
#50:
But they are terrible from the point of view of anyone operating a server, mainly because they complicate deployments and rollbacks. Let’s say you deploy a new build to fix some urgent issue. Your push clients will still be happily connected to your old cluster, because they open a connection once and then hang on to it for their lifetime. They won’t migrate to the new cluster just because you deployed a new build. You’d have to force them to migrate by killing the old cluster. But if you do that, they will all swarm to the new cluster at the same time like a thundering herd. It’s a lose-lose scenario.
A thundering herd is a large number of clients all trying to connect at the same time. It causes a sudden, large spike in traffic that is orders of magnitude higher than your steady-state traffic. It’s one of the things you have to watch out for when designing a robust, resilient system.
#51:
We found our way out of this pickle by limiting client connection lifetime. We auto-close each client connection after a certain time. Our clients are coded to reconnect whenever they lose their connection to the server. So the client connects back, and each time it does, it will most probably land on a different server. This limits a client’s stickiness to any single server.
We have tuned this connection lifetime carefully to strike a good balance between client efficiency, which we desire, and client stickiness, which we are trying to avoid. Empirically, we have found that somewhere between 25 and 35 minutes is the sweet spot.
#52:
Not only do we limit a connection’s lifetime, we also randomize it within a band every time the client reconnects. This means different clients end up with slightly different connection lifetimes (between 28 and 32 minutes in our case), after which they disconnect and reconnect.
#53: This randomization ensures that a random network-wide blip doesn’t accidentally synchronize millions of connections’ reconnect schedules, causing a thundering herd that then repeats every 30 minutes. The only thing worse than a thundering herd is a recurring thundering herd!
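The randomized-lifetime policy can be sketched as follows. The 28-32 minute band is from the talk; the function name and structure are our illustration:

```python
import random

# Sketch of the randomized reconnect-lifetime policy described above.
# The 28-32 minute band matches the talk; the function name is ours.
def connection_lifetime_seconds(base_minutes=30, jitter_minutes=2):
    """Pick a randomized lifetime so clients don't resynchronize."""
    low = (base_minutes - jitter_minutes) * 60
    high = (base_minutes + jitter_minutes) * 60
    return random.uniform(low, high)

# Each client gets a slightly different lifetime, so even if millions of
# clients reconnect at the same instant, their next reconnects spread out.
lifetimes = [connection_lifetime_seconds() for _ in range(5)]
assert all(28 * 60 <= t <= 32 * 60 for t in lifetimes)
```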
#54:
This is an extra optimization. I said earlier that we auto-close the client connection from the server side, but that’s not entirely accurate. In the latest version, our server sends a special message to the client - over the same push channel - telling it to close the connection from the client side. Because of the way TCP works, the party that closes the connection enters the TCP TIME_WAIT state. This state can tie up that connection’s resources for up to two minutes on Linux. Since our server handles tens of thousands of open connections simultaneously, the server’s file descriptors are far more valuable than the client’s. By having the client close the connection, we conserve the server’s file descriptors.
There is a flip side to this optimization, though: you have to be prepared to handle misbehaving clients that won’t close their connection when told. To handle such clients, we start a timer when we send the CLOSE CONNECTION message and then close the connection forcefully from the server side if the client doesn’t comply within a set time limit.
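The close-handshake-with-fallback pattern might look roughly like this. This is a hedged sketch; the message name, grace period, and class structure are illustrative, not from the Zuul Push source:

```python
import threading

# Hedged sketch of "ask the client to close, force-close if it doesn't".
# The message name and grace period are illustrative.
CLOSE_GRACE_SECONDS = 10.0

class PushConnection:
    def __init__(self):
        self.closed_by = None   # "client" or "server"
        self._timer = None

    def send(self, message):
        pass  # stub: a real server would write to the socket here

    def request_client_close(self, grace=CLOSE_GRACE_SECONDS):
        # Tell the client to close its end (so TIME_WAIT lands on the
        # client), then arm a fallback timer for misbehaving clients.
        self.send("CLOSE_CONNECTION")
        self._timer = threading.Timer(grace, self._force_close)
        self._timer.start()

    def on_client_closed(self):
        # Well-behaved client complied; cancel the fallback timer.
        if self._timer is not None:
            self._timer.cancel()
        self.closed_by = "client"

    def _force_close(self):
        # Grace period expired without the client closing; server closes.
        self.closed_by = "server"
```

A well-behaved client triggers `on_client_closed()` before the timer fires; a misbehaving one is force-closed from the server side when the grace period expires.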
#55: So we took care of the stateful, sticky-connection problem. Next we focused on optimizing our push cluster size. Our big epiphany was that most of the connections were idle most of the time. This meant neither memory nor CPU was under much pressure, even with a large number of open connections.
#56: So we chose a big Amazon instance type for our push servers, carefully tuned its Linux TCP kernel parameters and JVM options, and packed it with as many connections as possible...
ulimit -n 262144
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 87380 16777216"
sysctl -w net.core.somaxconn=65536
sysctl -w net.ipv4.tcp_max_syn_backlog=65536
-Xmx3g -Xms3g
-XX:MaxDirectMemorySize=256m
#58: And we got a visit from our dear old friend - the thundering herd! When that single, heavily packed server was lost, all those thousands and thousands of clients came roaring back with reconnects.
You know you have a problem when the loss of a single server can start a stampede!
#59: So we licked our wounds, learned from our mistake, and tried the “Goldilocks” strategy for the second round: you don’t want to run your servers either too hot or too cold. We found the instance size that’s just right for us:
m4.large
2 vCPU
8 GB
84K connections / box
#60: The main lesson here is that you should optimize for the actual cost of your cluster, not just a low instance count. Stated like that it seems obvious, but it wasn’t obvious to us initially, because we conflated a small push cluster - a low number of push server instances - with efficient operation. In reality, a larger number of cheaper instances is preferable to a smaller number of big instances, cost being equal.
Being able to support millions of connections on a single box is certainly impressive technically, but it will eventually come back to bite you in production.
And even if you don’t have huge traffic volume like Netflix, it may still make sense to use a larger number of smaller servers instead of a few big ones, mostly because smaller servers give you more cost-efficient autoscaling at lower traffic volumes. At low enough traffic volume, you may only need a couple of big servers to handle all your connections. But then you can’t autoscale up and down efficiently to match your traffic, since your step size is big - a single server carrying somewhere between 25 and 50% of all your traffic. You can fit your traffic curve much more efficiently - in terms of autoscaling - with small increment/decrement steps, that is, with small servers.
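A quick back-of-the-envelope calculation shows why step size matters. All numbers here are made up for illustration; they are not Netflix's actual figures:

```python
# Back-of-the-envelope look at autoscaling granularity. All numbers are
# made up for illustration; they are not Netflix's actual figures.
def scaling_step_fraction(connections_per_box, total_connections):
    """Fraction of total capacity added or removed per scaling step."""
    return connections_per_box / total_connections

total = 1_000_000
# Two huge boxes: each scale-up/down step moves half of total capacity.
assert scaling_step_fraction(500_000, total) == 0.5
# Many small boxes: each step moves under 10% of capacity - a much
# finer fit to the traffic curve.
assert scaling_step_fraction(84_000, total) == 0.084
```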
#61: The next problem we ran into was autoscaling. How do we autoscale our push cluster as traffic goes up and down?
#62: Our two go-to strategies for auto-scaling REST services are to autoscale either on RPS - requests per second - or on CPU load average. Both are ineffective for Zuul Push: there is no continuous RPS, thanks to persistent, long-lived connections, and the CPU is mostly idle, as we saw earlier.
So how do you autoscale?
#63: The real limiting factor for a push server is the number of open connections per box. So it makes perfect sense to auto-scale on the average number of open connections per box.
Thankfully, AWS makes it easy to autoscale on anything, as long as you can export it as a custom CloudWatch metric from your app. We export the number of open connections from our server process.
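As a sketch, exporting the open-connection count as a custom CloudWatch metric might look like this. The namespace, metric, and dimension names are illustrative; the actual boto3 `put_metric_data` call is shown commented out so the sketch runs offline:

```python
# Sketch of exporting the open-connection count as a custom CloudWatch
# metric. Namespace, metric, and dimension names are illustrative.
def open_connections_metric(count, server_id):
    """Build a CloudWatch MetricData entry for the connection gauge."""
    return {
        "MetricName": "OpenConnections",
        "Dimensions": [{"Name": "ServerId", "Value": server_id}],
        "Value": float(count),
        "Unit": "Count",
    }

metric = open_connections_metric(84_000, "push-server-1")

# Publishing (requires AWS credentials, so left commented here):
# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="ZuulPush", MetricData=[metric])
```

An autoscaling policy can then target the average of this metric across the fleet instead of CPU or RPS.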
#64: The final problem we had to solve was making Amazon’s Classic Load Balancers play nice with WebSockets. Our push servers sit behind Amazon’s Classic Load Balancers, or CLBs for short.
Unfortunately, CLBs cannot proxy WebSocket connections. When a WebSocket client - like a browser - wants to open a WebSocket, it sends a special HTTP request to the server called a WebSocket upgrade request. If the server supports WebSockets, it returns a special “Switching Protocols” response and upgrades the original HTTP connection to a long-lived WebSocket connection. CLBs do not understand this initial WebSocket upgrade request. They treat it like any other HTTP request and tear down the connection as soon as the server returns the response. So you can’t have persistent WebSocket connections through CLBs.
By the way, this issue is not specific to CLBs. You’d run into similar issues with any reverse proxy or load balancer that does not understand the WebSocket protocol natively.
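For reference, the upgrade handshake a layer-7 proxy must recognize boils down to a couple of headers (per RFC 6455). A minimal, illustrative check:

```python
# Minimal check for the WebSocket upgrade handshake described above,
# following RFC 6455. The header names are real; the function is our
# illustrative sketch, not a full handshake validator.
def is_websocket_upgrade(headers):
    h = {k.lower(): v.lower() for k, v in headers.items()}
    return (h.get("upgrade") == "websocket"
            and "upgrade" in h.get("connection", ""))

request_headers = {
    "Host": "push.example.com",
    "Connection": "Upgrade",
    "Upgrade": "websocket",
    "Sec-WebSocket-Version": "13",
}
assert is_websocket_upgrade(request_headers)
```

A proxy that ignores these headers treats the request as ordinary HTTP - which is exactly the CLB behavior described above.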
#65:
We found a way around this by running our CLBs in TCP load balancing mode. Normally CLBs run as HTTP load balancers and do layer 7 load balancing, but you can configure them to run as TCP load balancers instead, forcing them to do load balancing at layer 4. In this mode they just proxy TCP packets back and forth without trying to parse the layer 7 application protocol - HTTP in this case. This keeps CLBs from mangling the WebSocket upgrade requests they do not understand.
#66:
The good thing about CLBs in TCP mode is that they can still terminate TLS, which means you can still offload SSL handling to them.
#67: The flip side of WebSockets is that they are vulnerable to cross-site request forgery (CSRF) if not properly secured. To guard against CSRF, the web server must verify that the “Origin” header has the correct value before accepting an incoming WebSocket connection. Thankfully, the Zuul Push server already does this for you.
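A minimal sketch of such an Origin check (the allow-list values are illustrative; Zuul Push ships its own implementation):

```python
# Illustrative Origin check that guards WebSocket handshakes against CSRF.
# The allow-list is a made-up example; Zuul Push implements this for you.
ALLOWED_ORIGINS = {"https://www.netflix.com", "https://push.netflix.com"}

def origin_allowed(headers):
    # Reject the handshake unless the Origin header exactly matches
    # a known-good origin. A missing Origin is rejected too.
    return headers.get("Origin") in ALLOWED_ORIGINS

assert origin_allowed({"Origin": "https://www.netflix.com"})
assert not origin_allowed({"Origin": "https://evil.example"})
assert not origin_allowed({})
```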
#68: Deregistering a server from a CLB kills all client connections to that server instantly.
Whenever we deploy a new build, we deregister the old instances from the CLB so that they no longer receive any traffic. What we ideally want is for the CLB to stop sending the old instances new traffic but let the existing connections on those instances live out the rest of their natural lifetime.
However, by default, CLBs kill all connections to an instance as soon as you deregister it.
Fortunately, it is possible to make CLBs behave the way we want. The AWS console has a CLB setting called “connection draining”. Once you enable it and set a high enough timeout value, CLBs will gradually drain client connections from your old, out-of-traffic servers and let them migrate to new servers over time.
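In boto3 terms, enabling connection draining looks roughly like this. The load balancer name is illustrative, and the API call is shown commented out so the sketch runs offline:

```python
# Sketch of enabling connection draining on a Classic Load Balancer.
# The load balancer name is illustrative.
DRAIN_ATTRIBUTES = {
    "ConnectionDraining": {
        "Enabled": True,
        # Set the timeout comfortably above the roughly 30-minute
        # connection lifetime so draining never cuts a connection short.
        "Timeout": 3600,  # seconds; the CLB maximum
    }
}

# Applying it (requires AWS credentials, so left commented here):
# import boto3
# boto3.client("elb").modify_load_balancer_attributes(
#     LoadBalancerName="zuul-push-clb",
#     LoadBalancerAttributes=DRAIN_ATTRIBUTES)
```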
Once you have made all these tweaks, your CLB will handle lots and lots of WebSocket connections happily, no problem.
#69: I do want to note that Amazon has since introduced a new load balancer type - the ALB, short for Application Load Balancer - that does understand the WebSocket protocol. Unfortunately, it came too late for us; by then we had already figured out how to get CLBs to do what we wanted. But if you are starting today, you may want to try ALBs first.
#70: Maybe 20 to 30 minutes. This limits stateful, sticky issues.
#71: To spread out reconnect peaks as time progresses
#72: As long as the final cost stays the same. This helps limit the size of a thundering herd.
#73: Because CPU load and RPS are not good proxies for load on a push cluster.
#74: Most load balancers, like HAProxy, let you do load balancing at layer 4, the TCP level.
Most of these operational best practices are already built into Zuul Push.
#75: Finally, what can you do with this push messaging capability? Now that we have our push hammer in production, we are seeing a lot of nails…
#76: Our recent integration with Alexa is one good example. Suppose a user asks Alexa to play “Stranger Things”. The actual speech recognition of the user’s spoken command happens in the cloud, using the Alexa voice processing service. We then need an ultra-low-latency mechanism to transmit the synthesized command from the cloud to the Netflix application running on the user’s TV. Having the application poll the cloud at fixed intervals clearly won’t do here.
Push messaging to the rescue!
#77: We have even more exciting plans for using push in the future.
For example, we could auto-detect a client that is generating lots of errors and send it a push message asking it to upload its state and any other relevant diagnostics to the cloud.
#78: And if all that diagnostic data still doesn’t help, we could reach for the oldest tool in every software engineer’s toolbox and restart the application - except now we can do it remotely. What could go wrong?
#79: But if something does go wrong, we can now send you a push message, saying “We are sorry”
#80: Hopefully, these examples have already got you thinking about how you can use push messaging to add novel and rich functionality to your applications.
#81: I have been pleading the case for PUSH for the last 40 minutes. Now I have just one last request to make at this point...
#83: Everything we have discussed so far is open source. You can find it in the Zuul project under Netflix OSS on GitHub. It even comes with a toy push server sample that you can start playing with immediately.
So go ahead, give it a spin. File bugs. And if you would be so kind, maybe even send us a pull request or two.