WhatsApp Web: How does it work?
WhatsApp has launched a web version of its hugely popular smartphone messaging app.
I was very interested in seeing the architecture of this app because, as far as I knew, they never stored messages on their servers; all the data was stored only on the user's phone. So I started to look under the hood of the web app, and what I saw was a beauty.
First, let me list the frameworks they have used to build this app:
- Velocity.js : Velocity is an animation engine with the same API as jQuery's $.animate(). It works with and without jQuery. It's incredibly fast, and it features color animation, transforms, loops, easings, SVG support, and scrolling. It is the best of jQuery and CSS transitions combined.
These are the major pieces. They use secure WebSockets to communicate with your phone through their server. I wondered why they didn't use a WebRTC data channel there.
Thinking it through, one answer seems clear: only Android would have supported that.
They are using Chrome's FileSystem API, which makes their application Chrome-specific. In that case data channels could have been used as well, which negates the previous argument. I think the real reason for not using a WebRTC data channel is to avoid the difficulty of setting up the initial connection, which WebSockets solve by putting a server in between.
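To make the "server in between" idea concrete, here is a minimal in-memory sketch of a relay. It is not WhatsApp's code: the `Relay` class, the channel id and the peer names are all made up for illustration, and plain queues stand in for real WebSocket connections. The point is only that both sides dial out to the relay, so neither one has to negotiate a direct peer-to-peer connection.

```python
from collections import defaultdict, deque


class Relay:
    """Hypothetical in-memory stand-in for a WebSocket relay server."""

    def __init__(self):
        # channel id -> {peer name -> queue of messages waiting for that peer}
        self.channels = defaultdict(dict)

    def join(self, channel, peer):
        # Each peer only opens an outbound connection to the relay,
        # so neither side has to accept an incoming connection.
        self.channels[channel][peer] = deque()

    def send(self, channel, sender, message):
        # Deliver the message to every other peer on the same channel
        # and report how many peers received it.
        delivered = 0
        for peer, inbox in self.channels[channel].items():
            if peer != sender:
                inbox.append(message)
                delivered += 1
        return delivered

    def receive(self, channel, peer):
        # Pop the oldest pending message for this peer, or None if empty.
        inbox = self.channels[channel][peer]
        return inbox.popleft() if inbox else None


relay = Relay()
relay.join("chan-1", "phone")
relay.join("chan-1", "browser")
relay.send("chan-1", "phone", "hello from phone")
print(relay.receive("chan-1", "browser"))
```

With a data channel, the two peers would first have to exchange connection details through some signalling path anyway, which is the setup cost the relay avoids.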
They seem to be using Google Material Design principles.
So, we see they use a modified form of XMPP in their chat protocol, and they forward the stanzas that the phone receives to the web client. So, to use the web client, the phone has to be on and connected, and every communication that happens on the web client actually goes via your phone. Given the above, we can say the web client is just a proxy UI for your phone.
What does this mean?
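The proxy relationship described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration (the `Phone` and `WebClient` classes, their method names, and stanzas represented as plain strings); it only models the behaviour observed from the outside, not the actual protocol.

```python
class PhoneOfflineError(Exception):
    """Raised when the web client acts while the phone is unreachable."""


class Phone:
    def __init__(self):
        self.online = True
        self.messages = []        # message history lives on the phone only
        self.sent_to_server = []  # stanzas the phone pushed to the chat server
        self.web_client = None    # attached web session, if any

    def attach(self, web_client):
        self.web_client = web_client
        web_client.phone = self

    def on_stanza_from_server(self, stanza):
        # Store the stanza locally, then forward a copy to the web client.
        self.messages.append(stanza)
        if self.web_client is not None:
            self.web_client.display(stanza)

    def send_to_server(self, stanza):
        self.messages.append(stanza)
        self.sent_to_server.append(stanza)


class WebClient:
    def __init__(self):
        self.phone = None
        self.shown = []

    def display(self, stanza):
        self.shown.append(stanza)

    def send(self, stanza):
        # The web client never talks to the chat server directly.
        if self.phone is None or not self.phone.online:
            raise PhoneOfflineError("phone must be on and connected")
        self.phone.send_to_server(stanza)


phone, web = Phone(), WebClient()
phone.attach(web)
phone.on_stanza_from_server("incoming: hi")
web.send("outgoing: hello")
print(web.shown, phone.sent_to_server)
```

If `phone.online` is set to `False`, `web.send(...)` raises, which mirrors the requirement that the phone stays on.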
More data transfer over phone. Check your data usage.
More battery consumption because of data transfer.
Though the WhatsApp web client makes our lives easier, it does come at a cost.
Going forward, I will try to write some sort of Chrome plugin that can pull data out of the Wa JavaScript object and store it on my server. Keeping my fingers crossed. I think I could still write a dumb plugin that parses the HTML to get the data, but let me first attempt a more elegant solution if possible.
Edit: How does the initial handshake take place?
So we open the web client and see a QR code. How does that happen? What happens in the background?
The image below shows the steps:
It first sent the details of the current client: OS, browser and session id. Then it sent the stored session details about the connected client. It got a 401 Unauthorized response from the server, saying that the current session is logged out and a new one needs to be created. I think the third frame is the TTL frame.
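The resume-then-401 exchange can be illustrated with a small sketch. The session table, the `handle_resume` function and the response fields are hypothetical; I only know from the frames that a stale session gets a 401 and that some kind of TTL is involved.

```python
import time

# Hypothetical server-side table: session id -> expiry time (Unix seconds).
ACTIVE_SESSIONS = {}


def handle_resume(session_id, now=None):
    """Decide whether a stored web session may be resumed."""
    now = time.time() if now is None else now
    expires_at = ACTIVE_SESSIONS.get(session_id)
    if expires_at is None or expires_at <= now:
        # Unknown or expired session: the client has to start a new one,
        # which is when the QR code gets shown.
        return {"status": 401, "action": "create_new_session"}
    return {"status": 200, "action": "resume"}


ACTIVE_SESSIONS["session-abc"] = 1000
print(handle_resume("session-abc", now=500))
print(handle_resume("session-abc", now=1500))
```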
When we scan the QR code with the mobile client, it connects over WebSocket to the same channel that the QR code specifies and then sends its initial info.
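Here is a rough sketch of how such a QR pairing could work on the server side. The `PairingServer` class, the random channel reference and the `phone_info` fields are my own assumptions, not something read out of the WhatsApp frames: the browser asks for a channel reference and renders it as a QR code, and the phone proves it scanned the code by joining with the same reference.

```python
import secrets


class PairingServer:
    """Hypothetical server that pairs a browser session with a phone."""

    def __init__(self):
        self.pending = {}  # channel ref -> browser session waiting for a phone
        self.paired = {}   # channel ref -> (browser session, phone info)

    def open_channel(self, browser_session):
        # Unguessable reference; the browser would render it as a QR code.
        ref = secrets.token_urlsafe(16)
        self.pending[ref] = browser_session
        return ref

    def phone_join(self, ref, phone_info):
        # The phone joins with the reference it decoded from the QR code.
        browser_session = self.pending.pop(ref, None)
        if browser_session is None:
            return False  # unknown or already used reference
        self.paired[ref] = (browser_session, phone_info)
        return True


server = PairingServer()
qr_payload = server.open_channel("browser-session-1")
# Hypothetical initial info sent by the phone after scanning.
print(server.phone_join(qr_payload, {"platform": "android"}))
print(server.phone_join(qr_payload, {"platform": "android"}))
```

Popping the reference on first use means a second scan of the same code is rejected, which is one simple way to keep a leaked QR code from being reused.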
The image below shows the frames received on the web client:
In the above image, the selected frame is the most interesting one, as it carries all the data sent from the mobile client to the web client.
As we can see, this frame also includes the battery state and whether the phone is charging.
I hope I answered your question.