This project intends to provide a complete description and re-implementation of the WhatsApp Web API, which will eventually lead to a custom client. WhatsApp Web internally works using WebSockets; this project does as well.
Before you can run the application, make sure that you have the following software installed:
- Node.js (at least version 8, as the `await` syntax is used)
- the CSS preprocessor Sass (which previously required Ruby)
- Python 2.7 with the following packages installed:
  - `git+https://github.com/dpallot/simple-websocket-server.git` for acting as WebSocket server and client
  - `pycrypto` for the encryption stuff
  - `pyqrcode` for QR code generation
Before starting the application for the first time, run `npm install` to install all dependencies. Then, to launch it, just run `npm start`. Using fancy `nodemon` magic, all three local components will be started one after the other, and when you edit a file, the changed module will automatically restart to apply the changes.
The project is organized in the following way. Note the used ports and make sure that they are not in use elsewhere before starting the application.
Login and encryption details
WhatsApp Web encrypts the data using several different algorithms. These include AES 256 CBC, Curve25519 as Diffie-Hellman key agreement scheme, HKDF for generating the extended shared secret and HMAC with SHA256. Starting the WhatsApp Web session happens by just connecting to one of its websocket servers. In the server address, `wss://` means that the websocket connection is secure, and `w[1-8]` means that any number between 1 and 8 can follow the `w`. Also make sure that, when establishing the connection, the HTTP header `Origin: https://web.whatsapp.com` is set; otherwise the connection will be rejected.
When you send messages to a WhatsApp Web websocket, they need to be in a specific format. It is quite simple: a message tag, a comma and the JSON payload, e.g. `1515590796,["data",123]`. Apparently, the message tag can be anything. This application mostly uses the current timestamp as tag, just to be a bit unique. WhatsApp itself often uses message tags like `1234.--0` or similar. Obviously, the message tag must not contain a comma. Additionally, JSON _objects_ are possible as payload as well.
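As a minimal sketch of this framing (the helper names are hypothetical and not taken from the project), a message can be built and split with the Python standard library alone. Splitting at the _first_ comma is what keeps commas inside the JSON payload harmless, and it is also why the tag itself must not contain one.

```python
import json
import time


def make_tag() -> str:
    """Use the current Unix timestamp (in seconds) as a reasonably unique tag."""
    return str(int(time.time()))


def encode_message(tag: str, payload) -> str:
    """Serialize a message as 'tag,JSON'. The tag itself must not contain a comma."""
    if "," in tag:
        raise ValueError("message tag must not contain a comma")
    # Compact separators reproduce the style of the example, e.g. ["data",123].
    return tag + "," + json.dumps(payload, separators=(",", ":"))


def decode_message(frame: str):
    """Split a frame at the first comma into (tag, parsed JSON payload)."""
    tag, _, body = frame.partition(",")
    return tag, json.loads(body)
```

For example, `encode_message("1515590796", ["data", 123])` yields `1515590796,["data",123]`, and `decode_message` reverses it.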
To log in at an open websocket, follow these steps:
- Generate your own `clientId`, which needs to be 16 base64-encoded bytes (i.e. 24 characters, including padding). This application just uses 16 random bytes.
- Decide on a tag for your message, which is more or less arbitrary (see above). This application uses the current timestamp (in seconds) for that. Remember this tag for later.
- The message you send to the websocket looks like this: `messageTag,["admin","init",[0,2,7314],["Long browser description","ShortBrowserDesc"],"clientId",true]`.
  - Obviously, you need to replace `messageTag` and `clientId` by the values you chose before.
  - The `[0,2,7314]` part specifies the current WhatsApp Web version. The last value changes frequently. It should be quite backwards-compatible though.
  - `"Long browser description"` is an arbitrary string that will be shown in the WhatsApp app in the list of registered WhatsApp Web clients after you scan the QR code.
  - `"ShortBrowserDesc"` has not been observed anywhere yet but is arbitrary as well.
- After a few moments, your websocket will receive a message in the specified format with the message tag you chose in step 2. The JSON object of this message has the following attributes:
  - `status`: should be 200
  - `ref`: in the application, this is treated as the server ID; important for the QR generation, see below
  - `ttl`: is 20000, maybe the time after which the QR code becomes invalid
  - `update`: a boolean flag
  - `curr`: the current WhatsApp Web version
  - `time`: the timestamp the server responded at, as floating-point milliseconds
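The login steps above can be sketched in Python as follows. This is an illustration under assumptions, not the application's code: the function names are hypothetical, the version `[0,2,7314]` and the browser descriptions are simply the example values from the text (the version in particular is likely outdated), and actually sending the message over the websocket is left out.

```python
import base64
import json
import os


def make_client_id() -> str:
    # 16 random bytes, base64-encoded: 24 characters including "==" padding.
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_init_message(tag: str, client_id: str,
                       version=(0, 2, 7314),
                       browser=("Long browser description", "ShortBrowserDesc")) -> str:
    """Build the 'admin init' message in the 'tag,JSON' format."""
    payload = ["admin", "init", list(version), list(browser), client_id, True]
    return tag + "," + json.dumps(payload, separators=(",", ":"))


def parse_init_response(frame: str, expected_tag: str) -> dict:
    """Check that the reply belongs to our tag and has status 200, then return its object."""
    tag, _, body = frame.partition(",")
    if tag != expected_tag:
        raise ValueError("response belongs to a different message tag")
    info = json.loads(body)
    if info.get("status") != 200:
        raise RuntimeError(f"init failed with status {info.get('status')}")
    return info  # info["ref"] is treated as the server ID for the QR code
```

`parse_init_response` only checks `status`; the returned dictionary's `ref` value is what the QR code step needs.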
QR code generation
- Generate your own private key with Curve25519.
- Get the public key from your private key.
- Obtain the string later encoded by the QR code by concatenating the following values with a comma:
  - the server ID, i.e. the `ref` attribute from step 4
  - the base64-encoded version of your public key
  - your client ID
- Turn this string into an image (e.g. using `pyqrcode`) and scan it using the WhatsApp app.
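A sketch of assembling the QR code string, assuming you already have the 32-byte Curve25519 public key as raw bytes (generating the key pair requires a Curve25519 library, such as the `curve25519` module the application uses, and is not shown). The function name is hypothetical.

```python
import base64


def build_qr_payload(server_ref: str, public_key: bytes, client_id: str) -> str:
    """Join server ID, base64-encoded Curve25519 public key and client ID with commas."""
    if len(public_key) != 32:
        raise ValueError("a Curve25519 public key is 32 bytes long")
    encoded_key = base64.b64encode(public_key).decode("ascii")
    return ",".join([server_ref, encoded_key, client_id])
```

The resulting string is what gets rendered as a QR image, e.g. with `pyqrcode`.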
After scanning the QR code
- Immediately after you scan the QR code, the websocket receives several important JSON messages that build up the encryption details. These use the specified message format and have a JSON _array_ as payload. Their message tag has no special meaning. The first entry of the JSON array has one of the following values:
  - `Conn`: the array contains a JSON object as second element with connection information, containing the following attributes and many more:
    - `battery`: the current battery percentage of your phone
    - `browserToken` (could be important, but not used by the application yet)
    - `clientToken` (could be important, but not used by the application yet)
    - `phone`: an object with detailed information about your phone
    - `platform`: your phone's OS
    - `pushname`: the name you provided to WhatsApp
    - `serverToken` (could be important, but not used by the application yet)
    - `wid`: your phone number in the chat identification format (see below)
  - `Stream`: the array has four elements in total
  - `Props`: the array contains a JSON object as second element with several properties like `videoMaxEdge` (960) and others
- You are now ready to generate the final encryption keys. Start by decoding the secret contained in the `Conn` object from base64 and storing it as `secret`. This decoded secret will be 144 bytes long.
- Take the _first 32 bytes_ of the decoded secret and use them as a public key. Together with your private key, generate a shared key out of it and call it `sharedSecret`. The application does this using `privateKey.get_shared_key(curve25519.Public(secret[:32]), lambda a:a)`.
- Use a key consisting of 32 null bytes to compute the HMAC SHA256 of the shared secret. Take this value and extend it to 80 bytes using HKDF. Call this value `sharedSecretExpanded`. This is done with `HKDF(HmacSha256("\0"*32, sharedSecret), 80)`.
- This step is optional; it validates the data provided by the server. The method is called HMAC validation. Do it by first calculating `HmacSha256(sharedSecretExpanded[32:64], secret[:32] + secret[64:])`. Compare this value to `secret[32:64]`. If they are not equal, abort the login.
- You now have the encrypted keys: `sharedSecretExpanded[64:] + secret[64:]`.
- The encrypted keys now need to be decrypted using AES with `sharedSecretExpanded[:32]` as key; store the result as `keysDecrypted`. The `keysDecrypted` variable is 64 bytes long and contains two keys, each 32 bytes long. The `encKey` is used for decrypting binary messages sent to you by the WhatsApp Web server or encrypting binary messages you send to the server. The `macKey` is needed to validate the messages sent to you.
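The key derivation steps can be sketched with Python's standard `hmac` and `hashlib` modules. Assumptions: the HKDF expand step uses an empty `info` string, in which case keying the extract HMAC with 32 null bytes makes this identical to RFC 5869 HKDF with its default all-zero salt; the Curve25519 key agreement and the AES decryption need third-party libraries and are not shown; `split_decrypted_keys` assumes `encKey` comes first, in the order the text introduces the two keys. The encrypted keys are 96 bytes (16 from `sharedSecretExpanded[64:]` plus 80 from `secret[64:]`); one plausible reading, not confirmed by the text, is that the first 16 bytes act as the AES-CBC IV and PKCS#7 padding is stripped after decryption, leaving 64 bytes. All function names are hypothetical.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, length: int, info: bytes = b"") -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256: T(i) = HMAC(PRK, T(i-1) | info | i)."""
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac_sha256(prk, block + info + bytes([counter]))
        output += block
        counter += 1
    return output[:length]


def expand_shared_secret(shared_secret: bytes) -> bytes:
    # Extract: HMAC-SHA256 keyed with 32 null bytes; then expand to 80 bytes.
    return hkdf_expand(hmac_sha256(b"\0" * 32, shared_secret), 80)


def secret_is_valid(secret: bytes, expanded: bytes) -> bool:
    """Optional HMAC validation of the 144-byte secret."""
    expected = hmac_sha256(expanded[32:64], secret[:32] + secret[64:])
    return hmac.compare_digest(expected, secret[32:64])


def encrypted_keys(secret: bytes, expanded: bytes) -> bytes:
    return expanded[64:] + secret[64:]


def split_decrypted_keys(keys_decrypted: bytes):
    """Return (encKey, macKey) from the 64 decrypted key bytes."""
    if len(keys_decrypted) != 64:
        raise ValueError("expected 64 bytes of decrypted keys")
    return keys_decrypted[:32], keys_decrypted[32:]
```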
Validating and decrypting messages
Now that you have the two keys, validating and decrypting messages the server sent to you is quite easy. Note that this is only needed for _binary_ messages; all JSON you receive stays plain. The binary messages always start with 32 bytes that contain the HMAC checksum.
- Validate the message by hashing the actual message content with the `macKey` (here, `messageContent` is the _entire_ binary message): `HmacSha256(macKey, messageContent[32:])`. If this value is not equal to `messageContent[:32]`, the message sent to you by the server is invalid and should be discarded.
- Decrypt the message content after the checksum, i.e. `messageContent[32:]`, using AES and the `encKey`.
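Validating a binary message needs only the standard library. The sketch below (hypothetical function name) uses `hmac.compare_digest` for a constant-time comparison, which is a choice of this sketch rather than something the text prescribes, and returns the still-encrypted remainder; the AES decryption with `encKey` is left to a cryptography library.

```python
import hashlib
import hmac


def verify_binary_message(mac_key: bytes, message_content: bytes) -> bytes:
    """Check the leading 32-byte HMAC-SHA256 checksum and return the encrypted payload.

    Raises ValueError if the checksum does not match, i.e. the message should be discarded.
    """
    if len(message_content) < 32:
        raise ValueError("message too short to contain a checksum")
    checksum, payload = message_content[:32], message_content[32:]
    expected = hmac.new(mac_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(checksum, expected):
        raise ValueError("invalid HMAC, discarding message")
    return payload  # still encrypted; decrypt it with encKey next
```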
The data you get in the final step has a binary format which is described in the following. Even though it’s binary, you can still see several strings in it, especially the content of messages you sent is quite obvious there.
Binary message format
The Python script `backend/decoder.py` implements the `MessageParser` class. It is able to create a JSON structure out of binary data in which the data is still organized in a rather messy way. The section about Node Handling below will discuss how the nodes are reorganized afterwards.
`MessageParser` initially just needs some data and then processes it byte by byte, i.e. as a stream. It has a couple of constants and a lot of methods which all build on each other.
- _Tags_ with their respective integer values
  - `LIST_EMPTY`: 0
  - `STREAM_8`: 2
  - `DICTIONARY_0`: 236
  - `DICTIONARY_1`: 237
  - `DICTIONARY_2`: 238
  - `DICTIONARY_3`: 239
  - `LIST_8`: 248
  - `LIST_16`: 249
  - `JID_PAIR`: 250
  - `HEX_8`: 251
  - `BINARY_8`: 252
  - `BINARY_20`: 253
  - `BINARY_32`: 254
  - `NIBBLE_8`: 255
- _Tokens_ are a long list of 151 strings in which the indices matter:
[None,None,None,"200","400","404","500","501","502","action","add", "after","archive","author","available","battery","before","body", "broadcast","chat","clear","code","composing","contacts","count", "create","debug","delete","demote","duplicate","encoding","error", "false","filehash","from","g.us","group","groups_v2","height","id", "image","in","index","invis","item","jid","kind","last","leave", "live","log","media","message","mimetype","missing","modify","name", "notification","notify","out","owner","participant","paused", "picture","played","presence","preview","promote","query","raw", "read","receipt","received","recipient","recording","relay", "remove","response","resume","retry","s.whatsapp.net","seconds", "set","size","status","subject","subscribe","t","text","to","true", "type","unarchive","unavailable","url","user","value","web","width", "mute","read_only","admin","creator","short","update","powersave", "checksum","epoch","block","previous","409","replaced","reason", "spam","modify_tag","message_info","delivery","emoji","title", "description","canonical-url","matched-text","star","unstar", "media_key","filename","identity","unread","page","page_count", "search","media_message","security","call_log","profile","ciphertext", "invite","gif","vcard","frequent","privacy","blacklist","whitelist", "verify","location","document","elapsed","revoke_invite","expiration", "unsubscribe","disable"]
- Unpacking nibbles: Returns the ASCII representation for numbers between 0 and 9, and `.` for 11.
- Unpacking hex values: Returns the ASCII representation for numbers between 0 and 9 or letters between A and F (i.e. uppercase) for numbers between 10 and 15.
- Unpacking bytes: Expects a tag as an additional parameter, namely `NIBBLE_8` or `HEX_8`. Unpacks a nibble or hex value accordingly.
- Byte: A plain ol’ byte.
- Integer with N bytes: Reads N bytes and builds a number out of them. Can be little or big endian; if not specified otherwise, big endian is used. Note that no negative values are possible.
- Int16: An integer with two bytes, read using Integer with N bytes.
- Int20: Consumes three bytes and constructs an integer using the last four bits of the first byte and the entire second and third byte. Is therefore always big endian.
- Int32: An integer with four bytes, read using Integer with N bytes.
- Int64: An integer with eight bytes, read using Integer with N bytes.
- Packed8: Expects a tag as an additional parameter, namely `NIBBLE_8` or `HEX_8`. Returns a string.
  - First reads a byte `n` and does the following `n&127` many times: reads a byte `l` and, for each nibble, adds the result of its _unpacked version_ to the return value (using _unpacking bytes_ with the given tag). Most significant nibble first.
  - If the most significant bit of `n` was set, removes the last character of the return value.
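The number formats and the _Packed8_ string format can be sketched as a small stream reader. This is not the code of `backend/decoder.py`; the class, method and function names are hypothetical. The nibble mappings for 10 (`-`) and 15 (`\0`) are assumptions that the text above does not state; 15 is treated as padding, which the _Packed8_ rule then strips.

```python
NIBBLE_8, HEX_8 = 255, 251  # tag values from the list above


def unpack_nibble(v: int) -> str:
    if 0 <= v <= 9:
        return chr(ord("0") + v)
    if v == 10:
        return "-"  # assumption: not stated in the text above
    if v == 11:
        return "."
    if v == 15:
        return "\0"  # assumption: padding nibble, removed again by Packed8
    raise ValueError(f"invalid nibble: {v}")


def unpack_hex(v: int) -> str:
    if 0 <= v <= 9:
        return chr(ord("0") + v)
    if 10 <= v <= 15:
        return chr(ord("A") + v - 10)  # uppercase A-F
    raise ValueError(f"invalid hex value: {v}")


def unpack_byte(tag: int, v: int) -> str:
    if tag == NIBBLE_8:
        return unpack_nibble(v)
    if tag == HEX_8:
        return unpack_hex(v)
    raise ValueError(f"unknown packed tag: {tag}")


class ByteStream:
    """Reads the number formats described above from a bytes object, front to back."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_byte(self) -> int:
        if self.pos >= len(self.data):
            raise EOFError("unexpected end of data")
        value = self.data[self.pos]
        self.pos += 1
        return value

    def read_int(self, n: int, little_endian: bool = False) -> int:
        """Integer with n bytes (Int16: n=2, Int32: n=4, Int64: n=8); never negative."""
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise EOFError("unexpected end of data")
        self.pos += n
        return int.from_bytes(chunk, "little" if little_endian else "big")

    def read_int20(self) -> int:
        """Low 4 bits of the first byte plus the full second and third byte, big endian."""
        b1, b2, b3 = self.read_byte(), self.read_byte(), self.read_byte()
        return ((b1 & 0x0F) << 16) | (b2 << 8) | b3

    def read_packed8(self, tag: int) -> str:
        n = self.read_byte()
        chars = []
        for _ in range(n & 127):
            packed = self.read_byte()
            chars.append(unpack_byte(tag, packed >> 4))  # most significant nibble first
            chars.append(unpack_byte(tag, packed & 0x0F))
        text = "".join(chars)
        if n & 0x80:  # most significant bit of n set: drop the last character
            text = text[:-1]
        return text
```

For example, the bytes `0x82, 0x49, 0x1F` read as a `NIBBLE_8` _Packed8_ give `"491"`: two packed bytes, and the trailing padding character is removed because the high bit of `0x82` is set.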
Variable length integers
In contrast to the previous number formats, reading a _variable length integer_ (VLI) does _not_ change the current data pointer. First, the length `l` of the VLI is read by reading bytes until a byte with the most significant bit set is encountered, but at most 10 bytes. TODO _Ranged variable length integers_ expect a minimum and a maximum value. If the read _variable length integer_ is less than the minimum or greater than or equal to the maximum, an error is thrown.
- Read bytes: Reads and returns the specified number of bytes.
- Check for list tag: Expects a tag as parameter and returns true if the tag is `LIST_EMPTY`, `LIST_8` or `LIST_16` (i.e. 0, 248 or 249).
- Read list size: Expects a list tag as parameter. Returns 0 for `LIST_EMPTY`, a read byte for `LIST_8` or a read _Int16_ for `LIST_16`.
- Read a string from characters: Expects the string length as parameter, reads this many bytes and returns them as a string.
- Get a token: Expects an index to the array of Tokens, and returns the respective string.
- Get a double token: Expects two integers and gets the token at an index computed from both.
Reading a string needs a _tag_ as parameter. Depending on this tag, different data is read.
- If the tag is between 3 and 235, the _token_ (i.e. a string) with this index is looked up. In one special case, `"c.us"` is returned instead; otherwise the token is returned as is.
- If the tag is between `DICTIONARY_0` and `DICTIONARY_3`, a _double token_ is returned, with `tag-DICTIONARY_0` as first and a read byte as second parameter.
- `LIST_EMPTY`: Nothing is returned.
- `BINARY_8`: A byte is read which is then used to _read a string from characters_ with this length.
- `BINARY_20`: An _Int20_ is read which is then used to _read a string from characters_ with this length.
- `BINARY_32`: An _Int32_ is read which is then used to _read a string from characters_ with this length.
- `JID_PAIR`:
  - First, a byte is read which is then used to _read a string_ `i` with this tag.
  - Second, another byte is read which is then used to _read a string_ `j` with this tag.
  - `i` and `j` are joined together with an `@` sign and the result is returned.
- `NIBBLE_8` or `HEX_8`: A _Packed8_ with this tag is returned.
Reading an attribute list needs the number of attributes to read as parameter. An attribute list is always a JSON object. For each attribute read, the following steps are executed for getting key-value pairs (exactly in this order!):
- Key: A byte is read which is then used to _read a string_ with this tag.
- Value: A byte is read which is then used to _read a string_ with this tag.
A node always consists of a JSON array with exactly three entries: description, attributes and content. The following steps are needed to read a node:
- A _list size_ `a` is read by using a read byte as the tag. The list size 0 is invalid.
- The description tag is read as a byte. The value 2 is invalid for this tag. The description string `descr` is then obtained by _reading a string_ with this tag.
- The attributes object `attrs` is read by _reading an attribute list_ with length `(a-2 + a%2) >> 1`.
- If `a` was odd, this node does not have any content, i.e. `[descr, attrs, None]` is returned.
- For getting the node's content, first a byte, i.e. a tag, is read. Depending on this tag, different types of content emerge:
  - If the tag is a list tag, a _list is read_ using this tag (see below for lists).
  - `BINARY_8`: A byte is read which is then used as length for reading bytes.
  - `BINARY_20`: An _Int20_ is read which is then used as length for reading bytes.
  - `BINARY_32`: An _Int32_ is read which is then used as length for reading bytes.
  - If the tag is something else, a _string is read_ using this tag.
- Finally, `[descr, attrs, content]` is returned.
Reading a list requires a _list tag_ (i.e. `LIST_EMPTY`, `LIST_8` or `LIST_16`). The length of the list is then obtained by _reading a list size_ using this tag. For each list entry, a node is read.
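The string, attribute list, node and list rules above can be combined into a compact recursive reader. This sketch is not `MessageParser` itself and all names in it are hypothetical: it uses only an excerpt of the token table (indices 0 to 20, which keeps those indices correct), decodes binary strings as UTF-8 (an assumption), and deliberately leaves out double tokens, _Packed8_ strings and the `"c.us"` special case, raising errors instead.

```python
# Tag values from the list above.
LIST_EMPTY, STREAM_8 = 0, 2
DICTIONARY_0, DICTIONARY_3 = 236, 239
LIST_8, LIST_16, JID_PAIR = 248, 249, 250
HEX_8, BINARY_8, BINARY_20, BINARY_32, NIBBLE_8 = 251, 252, 253, 254, 255

# Excerpt of the token table (indices 0-20 only); the full table has 151 entries.
TOKENS = [None, None, None, "200", "400", "404", "500", "501", "502", "action",
          "add", "after", "archive", "author", "available", "battery", "before",
          "body", "broadcast", "chat", "clear"]


class NodeReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_bytes(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise EOFError("unexpected end of data")
        self.pos += n
        return chunk

    def read_byte(self) -> int:
        return self.read_bytes(1)[0]

    def read_int(self, n: int) -> int:
        return int.from_bytes(self.read_bytes(n), "big")

    def read_int20(self) -> int:
        b = self.read_bytes(3)
        return ((b[0] & 0x0F) << 16) | (b[1] << 8) | b[2]

    def read_binary_length(self, tag: int) -> int:
        if tag == BINARY_8:
            return self.read_byte()
        if tag == BINARY_20:
            return self.read_int20()
        return self.read_int(4)  # BINARY_32

    @staticmethod
    def is_list_tag(tag: int) -> bool:
        return tag in (LIST_EMPTY, LIST_8, LIST_16)

    def read_list_size(self, tag: int) -> int:
        if tag == LIST_EMPTY:
            return 0
        if tag == LIST_8:
            return self.read_byte()
        if tag == LIST_16:
            return self.read_int(2)
        raise ValueError(f"invalid list tag: {tag}")

    def read_string(self, tag: int):
        if 3 <= tag <= 235:
            if tag >= len(TOKENS):
                raise IndexError(f"token {tag} is not in this excerpt")
            return TOKENS[tag]  # the "c.us" special case is omitted here
        if DICTIONARY_0 <= tag <= DICTIONARY_3 or tag in (NIBBLE_8, HEX_8):
            raise NotImplementedError("double tokens and Packed8 are not covered")
        if tag == LIST_EMPTY:
            return None
        if tag in (BINARY_8, BINARY_20, BINARY_32):
            return self.read_bytes(self.read_binary_length(tag)).decode("utf-8")
        if tag == JID_PAIR:
            i = self.read_string(self.read_byte())
            j = self.read_string(self.read_byte())
            return f"{i}@{j}"
        raise ValueError(f"invalid string tag: {tag}")

    def read_attributes(self, count: int) -> dict:
        attrs = {}
        for _ in range(count):
            key = self.read_string(self.read_byte())  # key first, then value
            attrs[key] = self.read_string(self.read_byte())
        return attrs

    def read_node(self) -> list:
        a = self.read_list_size(self.read_byte())
        if a == 0:
            raise ValueError("invalid node: list size 0")
        descr_tag = self.read_byte()
        if descr_tag == STREAM_8:
            raise ValueError("invalid description tag: 2")
        descr = self.read_string(descr_tag)
        attrs = self.read_attributes((a - 2 + a % 2) >> 1)
        if a % 2 == 1:
            return [descr, attrs, None]  # odd list size: no content
        tag = self.read_byte()
        if self.is_list_tag(tag):
            content = self.read_list(tag)
        elif tag in (BINARY_8, BINARY_20, BINARY_32):
            content = self.read_bytes(self.read_binary_length(tag))
        else:
            content = self.read_string(tag)
        return [descr, attrs, content]

    def read_list(self, tag: int) -> list:
        return [self.read_node() for _ in range(self.read_list_size(tag))]
```

For example, the bytes `248, 3, 19, 17, 252, 2` followed by `hi` decode to `["chat", {"body": "hi"}, None]`: a list size of 3 is odd, so there is one attribute and no content.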
WhatsApp Web API
WhatsApp Web itself has an interesting API as well. You can even try it out directly in your browser. Just log in at the normal https://web.whatsapp.com/, then open the browser development console. Now enter something like the following (see below for details on the chat identification):
window.Store.Wap.profilePicFind("[country code][number]@c.us").then(res => console.log(res));
window.Store.Wap.lastseenFind("[country code][number]@c.us").then(res => console.log(res));
window.Store.Wap.statusFind("[country code][number]@c.us").then(res => console.log(res));
The Chrome developer console is handy for inspecting `window.Store.Wap` and the results of these calls.
The WhatsApp Web API uses the following formats to identify chats with individual users and groups of multiple users.
- Users: `[country code][number]@c.us`. For example, if you are from Germany, the ID starts with the country code `49`, followed by your phone number without its leading zero.
- Groups: `[phone number of group creator]-[timestamp of group creation]@g.us`. For example, the ID of a group that such a user created on November 5 2017 consists of the creator's number in the format above (without `@c.us`), a hyphen, the timestamp of that creation and `@g.us`.
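Building the two identifier formats is simple string assembly. In this sketch the helper names are hypothetical, dropping a single leading `0` is an assumption modeled on German numbers that does not hold for every country, and the creation timestamp is assumed to be in Unix seconds; the numbers used in the example are made up.

```python
def user_jid(country_code: str, phone_number: str) -> str:
    """Build '[country code][number]@c.us'.

    Assumes a national number with a trunk prefix '0' (as in Germany) that is
    dropped in international format; spaces and other non-digits are removed.
    """
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if digits.startswith("0"):
        digits = digits[1:]
    return f"{country_code}{digits}@c.us"


def group_jid(creator_jid: str, created_at: int) -> str:
    """Build '[creator number]-[creation timestamp]@g.us' (timestamp assumed in Unix seconds)."""
    creator_number = creator_jid.split("@", 1)[0]
    return f"{creator_number}-{created_at}@g.us"
```

For example, `user_jid("49", "0123 456789")` returns `"49123456789@c.us"`.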
There are two types of WebSocket messages that are exchanged between server and client: on the one hand, plain JSON that is rather unambiguous (especially for the API calls above), on the other hand, encrypted binary messages. Unfortunately, these binary ones cannot be looked at using the Chrome developer tools. Additionally, the Python backend, which of course also receives these messages, needs to decrypt them, as they contain encrypted data. The section about encryption details discusses how they can be decrypted.
Todo
- Allow sending messages as well. Of course JSON is easy, but how to _write_ the binary message format still needs to be examined.
- Allow reusing the session after successful login. Probably normal cookies are best for this.
- A UI that is not that technical, but rather starts to emulate the actual WhatsApp Web UI.
- The _Node Handling_ section. Could become very long.
- The _Disclaimer_ section. Should contain stuff like “no warranty” and “don’t do bad stuff”.
- Outsource the different documentation parts into their own files, maybe into the
2018-04-09 19:53 +0000