-------------------------------------------------------------------------------
Protocol Buffers

This is a BINARY streaming data format, defined by Google.
  https://developers.google.com/protocol-buffers/

The data is defined as a structure, which is then converted into functions
(for multiple languages) that read and write the I/O streaming format.

Its purpose is to pack information very efficiently in a stream/file format.
However it is also supposed to handle updates, so code with added fields
can continue to work with old data (eg: the new field is just undefined).

-------------------------------------------------------------------------------
An XML data structure could be...

  <person>
    <name>John Doe</name>
    <email>jdoe@example.com</email>
  </person>

A human readable form would be...

  person {
    name: "John Doe"
    email: "jdoe@example.com"
  }

Structure Definition (a .proto file, C-like syntax)
The numbers are 'field numbers', which is what is actually sent.

  message person {
    required string name = 1;
    optional string email = 2;
    optional int32 id = 3;
  }

But it is stored/sent in a binary format (28 bytes)...

Attempt to work out stream requirements (assuming no compression)

  person.             0   (no header)
  name.               1   <- field marker (for field 1)
  John Doe.           9   <- string + length
  email.              1   <- field marker (for field 2)
  jdoe@example.com   17   <- string + length
                     --
                     28

-------------------------------------------------------------------------------
Messages...

Keys are two values in one byte...  field number, and its 'wire type'
The field number is in the upper 5 bits, and its type in the lower 3 bits.
(The key is itself a variable length integer, so field numbers above 15
need more than one byte.)

There is no defined order for sending of fields.
Unknown/ungiven fields should have sensible defaults.
Obsolete fields should be ignored and the field number NOT reused.

wire types...
  0     integer follows (variable length)
  1     64 bit - double float (just an 8 byte lump of data)
  2     length delimited (strings, packed repeated fields, embedded structure)
  3,4   groups (deprecated)
  5     32 bit - float (just a 4 byte lump of data)

---

Integer values are a single byte for values 0-127,
with the MSB used to say more significant values follow.

Value 300 becomes...
  300
    -> 100101100               convert to binary
    -> 000 0010 , 010 1100     break into groups of 7
    -> 010 1100 , 000 0010     least significant group first
    -> 1010 1100 , 0000 0010   add MSB on all but last
    -> 0xAC , 0x02             two bytes to send...

so a message (in hex)
  08 96 01
Which means...
  08      field 1, integer (wire type 0)
  96 01   value 150

Negative signed integers (int32, int64) are always ten bytes long!
(as if the value were an unsigned 64 bit integer)

Unless the ZigZag method is used (sint32, sint64), where positive and
negative values are interleaved so the LSB acts as the sign bit.
A non-negative n is sent as 2n, a negative n as 2|n| - 1.

  value   encoded integer
    0        0
   -1        1
    1        2
   -2        3

Encoding 'n' (32 bit) using bit shifts...
  e = (n << 1) ^ (n >> 31)
That is, the doubled number is XORed with the sign. The right shift is
arithmetic, so (n >> 31) is all ones when n is negative, and zero otherwise.

---

Strings are sent as length, value

EG:
  message Test2 {
    optional string b = 2;
  }

with b set to "testing"  (7 characters)

  12 07 74 65 73 74 69 6e 67
  ^  ^  t  e  s  t  i  n  g
  |  `- length
  `-field 2, string

---

Embedded structures are encoded first, and then packed like a string
(key and length, followed by the encoded bytes).

-------------------------------------------------------------------------------
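Encoding sketch (Python)...

To check the byte counts and hex dumps in these notes, here is a minimal
sketch of the encoding rules above. It is NOT Google's protobuf library; every
function name in it (encode_varint, zigzag32, encode_key, ...) is made up for
illustration, and it only covers varints, ZigZag, keys and strings.

```python
# Hand-rolled sketch of the wire format described in these notes.
# All names here are hypothetical helpers, not part of any protobuf library.

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant first."""
    if value < 0:
        raise ValueError("varint needs a non-negative value")
    out = bytearray()
    while True:
        group = value & 0x7F            # lowest 7 bits
        value >>= 7
        if value:
            out.append(group | 0x80)    # MSB set: more bytes follow
        else:
            out.append(group)           # last byte, MSB clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Return (value, next position) for the varint starting at data[pos]."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def zigzag32(n: int) -> int:
    """ZigZag-map a signed 32 bit integer onto an unsigned one."""
    # Python's >> is an arithmetic shift, so (n >> 31) is 0 or -1 (all ones).
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Key: field number in the upper bits, wire type in the lower 3 bits."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0: key followed by a varint."""
    if value < 0:
        value &= 0xFFFFFFFFFFFFFFFF     # negative: sent as unsigned 64 bit
    return encode_key(field_number, 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2: key, length, then the UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(payload)) + payload


if __name__ == "__main__":
    print(encode_varint(300).hex(" "))                 # ac 02
    print(encode_int_field(1, 150).hex(" "))           # 08 96 01
    print(encode_string_field(2, "testing").hex(" "))  # 12 07 74 65 73 74 69 6e 67
    person = (encode_string_field(1, "John Doe")
              + encode_string_field(2, "jdoe@example.com"))
    print(len(person))                                 # 28
```

The last two lines rebuild the 'person' message from the top of these notes
(name and email only, id not set), which comes out at the 28 bytes worked out
by hand. A negative value passed to encode_int_field() takes the full ten
byte varint, which is why ZigZag exists.

-------------------------------------------------------------------------------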