Reverse Engineering keyboard firmware with Ghidra - Part 1
In March 2019, the NSA (yes, that NSA) released a reverse engineering tool called Ghidra. This is pretty cool, as it’s relatively easy to use, pretty powerful, and free as in both speech and beer (compared to the similar and popular IDA Pro which is not).
Around a year earlier, I’d “upgraded” my non-backlit Ducky One TKL keyboard to a backlit one: The non-backlit version is identical to the backlit version, they just don’t install the LEDs. So, as an aftermarket mod, you can solder in your own LEDs, then flash the backlit version’s firmware and voilà - a backlit keyboard, good as new (and with your own choice of colour - I chose a nice fiery orange).
This modification process is even described in the user manual:
As with most consumer electronics, the firmware update tool from Ducky is a Windows only affair. (Un)Luckily, I do have a Windows machine which runs my VR set-up, but the idea of extracting the firmware from the proprietary updater (to be able to flash it from Linux or anything else) sat lodged in the back of my mind.
It’s worth saying that Sprite_tm went through a similar process to hack his Coolermaster keyboard to add a Snake game, and his write-up is good reading, containing some great tips and ideas to learn from.
Poking at the firmware updater in a hex editor shows that the firmware itself isn’t just stored in plain text anywhere in the .exe file, so it must be obfuscated (“encrypted”) in some way. This is pretty common for firmware update programs (why??).
So, April 2019 came around, I came across some articles about Ghidra, and thought “I’ve never tried reverse engineering a binary before, why don’t we have a play with Ghidra and the Ducky firmware updater?”
And so, here we are. This is Part 1, because I haven’t been entirely successful yet.[1]
Disclaimer:
Before this, I’ve never used any kind of decompiler or reverse engineering tool, and never attempted to reverse an unknown binary. I have no idea what I’m doing, but by writing it down here the internet can tell me what I’m doing wrong, and I can refer back to it later.
1. Getting Ghidra
You can download Ghidra from ghidra-sre.org. At the time of writing the current version is v9.1.2.
I’ve only run it on Linux, which is as simple as executing the ghidraRun script in the download package. There’s a ghidraRun.bat for Windows too, and it’s Java-based so I assume it Just Works™.
2. Import and analyse
Once you’ve created a new project in Ghidra, you can click File -> Import File, and select whatever it is you want to poke around at. It will try to autodetect the type of file (e.g. Portable Executable, x86:LE:default:windows for my Windows .exe), but you can also tell it manually what it’s looking at if you have a raw binary dump or something.
Once the file is imported, double click it in the file list, a dragon will fly towards your face (no, really!), and the main window will open. It will ask you if you want to “analyze it now” - to which you should answer “Yes”, leave the default options, and then let it work its magic.
At this point, Ghidra will work through the binary, disassembling the program and working out functions, symbols, classes, variables and so on as best it can.
Honestly, it feels like magic.
Once it’s done, you’ll have a disassembly in the middle of the window (showing the machine code making up the program), lists of symbols on the left, and an empty decompiler window on the right.
You can scroll through the disassembly and double click on any instruction to open up a C-code decompiled version in the decompiler window. Note that this is just what Ghidra has been able to figure out from the machine code! It turns the raw processor instructions back into higher-level C code for ease of analysis.
3. Get to work
Now you have to put in the manual work of figuring out what the program is doing and where. The decompiled code (probably) doesn’t have (m)any variable or function names, and will be somewhat “strange” compared to what a human would have written in the first place, so figuring out what’s going on does take some effort.
In my case, it’s a firmware update tool, so I can assume that the interesting part of the code is going to be something like:
- Extract the firmware blob from somewhere in the .exe file
- De-obfuscate/“Decrypt” it
- Find the keyboard USB device
- Program the new firmware over USB
My gut feeling was that USB access was going to require some syscalls and I was planning to start by looking for them (aside: my complete lack of knowledge about Windows is going to hamper me here), but to my delight the program is using Holtek’s[2] ISP programming library for actually programming the microcontroller, and that has symbols left in!
This means we can see a function called ISP_WriteProgramB, and that gives us a starting point to work back from: Where does the data which is being written by ISP_WriteProgramB come from?
The XREF section in the disassembly (highlighted) shows us all of the places that ISP_WriteProgramB is called from:
We can double-click any of these to be taken to the call-site. We can also use the Holtek ISP programming guide linked above to figure out what the calls are doing - for instance the ones where the third argument is '\x01' are program operations, and '\0' is a verify.
Working through the XREFs, the 5th and 6th look the most interesting: these are in a function containing a bunch of strings like: "Erasing external....", "Programming external....", "Programming internal...."
So, this looks like it’s the core function which does the programming. We can rename it from FUN_00449470 to something better (xx_do_programming) by right-clicking the function name in any of the windows and picking “Rename Function”. I’ve been using the prefix xx_ so that all the things I have renamed are grouped together for ease-of-finding later.
All references to the function will update, so you can see the new name in the ISP_WriteProgramB XREF list, and anywhere else that function is called.
4. GOTO 3
Now we need to keep figuring out what things do, and giving them better names.
Just after the "Programming internal...." message, there’s a call to CreateThread, with FUN_00448fd0, which calls ISP_WriteProgramB. With those clues it’s probably safe to assume that this is the thing that writes the main program to internal memory. By looking at the Holtek documentation, we know that the last argument - DAT_00470228 - is the variable storing the firmware data to be written (xx_internal_data).
After renaming that, we can find its XREFs, and see that it’s written to (W) in xx_do_programming on line 49:
That looks like an allocation (the function calls operator_new), so there’s a good chance that the argument is the size of the data. The same applies to “external” data (found near the "Programming external..." message), and with those labelled, one function call stands out:
This takes the two data pointers and the two size values - just after the memory for the data is allocated. So, this has to be the function which reads and “decrypts” the firmware.
“Decryption”
Here’s the decompiled FUN_0043cb30/xx_get_fw function in its entirety, with the parameters renamed to what we know already:
void __cdecl
xx_get_fw(size_t internal_size,size_t external_size,
          void *internal_data,void *external_data)
{
  DWORD DVar1;
  uint uVar2;
  uint uVar3;
  FILE *local_10c;
  char local_108 [260];
  uint local_4;
  local_4 = DAT_0046b5f0 ^ (uint)&local_10c;
  DVar1 = GetModuleFileNameA((HMODULE)0x0,local_108,0x104);
  if (DVar1 != 0) {
    _fopen_s(&local_10c,local_108,"rb");
    FUN_0043c9e0();
    _fseek(local_10c,-0x2e4 - internal_size,2);
    _fread(internal_data,1,internal_size,local_10c);
    uVar2 = 0;
    if (0 < (int)internal_size) {
      do {
        uVar3 = uVar2 & 0x80000003;
        if ((int)uVar3 < 0) {
          uVar3 = (uVar3 - 1 | 0xfffffffc) + 1;
        }
        *(byte *)((int)internal_data + uVar2) =
             (&DAT_004700b0)[uVar3] ^
             *(byte *)((int)internal_data + uVar2) ^
             (byte)uVar2;
        uVar2 = uVar2 + 1;
      } while ((int)uVar2 < (int)internal_size);
    }
    if (external_size != 0) {
      _fseek(local_10c,(-0x2e4 - internal_size) - external_size,2);
      _fread(external_data,1,external_size,local_10c);
      uVar2 = 0;
      if (0 < (int)external_size) {
        do {
          uVar3 = uVar2 & 0x80000003;
          if ((int)uVar3 < 0) {
            uVar3 = (uVar3 - 1 | 0xfffffffc) + 1;
          }
          *(byte *)(uVar2 + (int)external_data) =
               (&DAT_004700b0)[uVar3] ^
               *(byte *)(uVar2 + (int)external_data) ^
               (byte)uVar2;
          uVar2 = uVar2 + 1;
        } while ((int)uVar2 < (int)external_size);
      }
    }
    _fclose(local_10c);
    FUN_004273e0();
    return;
  }
  FUN_004273e0();
  return;
}
Because it’s decompiled, it’s a little hard to read. However, it opens a file, seeks to a point near the end, and then reads internal_size bytes into internal_data:
_fopen_s(&filep,filename,"rb");
xx_get_file_size();
_fseek(filep,-0x2e4 - internal_size,2);
_fread(internal_data,1,internal_size,filep);
After that, it does some funny-looking processing on it involving XOR (^). This is a really good sign.
do {
  uVar3 = uVar2 & 0x80000003;
  if ((int)uVar3 < 0) {
    uVar3 = (uVar3 - 1 | 0xfffffffc) + 1;
  }
  *(byte *)((int)internal_data + uVar2) =
       (&DAT_004700b0)[uVar3] ^
       *(byte *)((int)internal_data + uVar2) ^
       (byte)uVar2;
  uVar2 = uVar2 + 1;
} while ((int)uVar2 < (int)internal_size);
Then, it repeats the process for external_data.
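The odd-looking `uVar2 & 0x80000003` dance looks like nothing more than the compiler's expansion of a signed `i % 4`, so the loop boils down to XOR-ing each byte with one of four key bytes and with the low byte of its own position. Here is a minimal sketch of that reading. The names `decompiled_mod4` and `xor_decode` are made up for illustration, and the key is assumed to be four bytes long:

```c
#include <stddef.h>
#include <stdint.h>

/* The index computation as Ghidra shows it, on a 32-bit value.
 * Converting a large uint32_t back to int32_t is implementation-defined,
 * but wraps as expected on the usual two's-complement compilers. */
int32_t decompiled_mod4(int32_t i)
{
    uint32_t v = (uint32_t)i & 0x80000003u;
    if ((int32_t)v < 0) {
        v = ((v - 1u) | 0xfffffffcu) + 1u;
    }
    return (int32_t)v;
}

/* The same loop written plainly: byte i is XOR-ed with key[i % 4] and with
 * the low 8 bits of i. XOR is its own inverse, so running this twice gives
 * back the original bytes. */
void xor_decode(uint8_t *data, size_t len, const uint8_t key[4])
{
    for (size_t i = 0; i < len; i++) {
        data[i] ^= (uint8_t)(key[i % 4] ^ (uint8_t)i);
    }
}
```

For the non-negative indices used in the loop, `decompiled_mod4(i)` is simply `i & 3`; the extra branch only matters for negative values, where it reproduces C's truncating remainder.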
The opening, seeking and reading is getting the “encrypted” firmware data out of somewhere near the end of the .exe file, and all of the XOR-ing is likely to be “decrypting” it. XOR is used very commonly to obfuscate data in these kinds of scenarios, and not very commonly used otherwise - so this function looks pretty much exactly how I expected the “load and decrypt” function to look.
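The seek-from-the-end read is easy to reproduce in portable C. The sketch below is an illustration rather than the updater's actual code: `read_before_tail` is a hypothetical helper, and `tail` stands for the fixed number of bytes (0x2e4 in the decompiled call) that sit between the wanted data and the end of the file.

```c
#include <stddef.h>
#include <stdio.h>

/* Read `len` bytes that end `tail` bytes before the end of the file.
 * Returns 0 on success, -1 if the seek or the read fails. */
int read_before_tail(FILE *fp, long tail, void *buf, size_t len)
{
    /* SEEK_END is the named constant behind the bare `2` in the decompiled
     * _fseek call; a negative offset moves backwards from the end. */
    if (fseek(fp, -tail - (long)len, SEEK_END) != 0) {
        return -1;
    }
    return fread(buf, 1, len, fp) == len ? 0 : -1;
}
```

With this, the internal firmware read would look like `read_before_tail(fp, 0x2e4, internal_data, internal_size)`, once `internal_size` is known.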
The missing piece of the puzzle is what value(s) the data is being XOR-ed with, and that would be DAT_004700b0 - we’ll call this xx_secret_key.
We’re piecing bits of the puzzle together, but the list of things to track down is growing:
- We’ve found where xx_internal_data is coming from
- We don’t know xx_internal_size yet, which we need to find out where the data starts: _fseek(filep,-0x2e4 - internal_size,2);
- We don’t know xx_secret_key, which we need in order to “decrypt” the data
Secret key
xx_secret_key is written to from only one function, which does some very similar-looking stuff: Opens a file, seeks to the end, reads some data, does some XOR-ing:
local_4 = DAT_0046b5f0 ^ (uint)&local_358;
local_34c = 0x48545054;
local_348[0] = 0x53;
local_348[1] = 0x72;
local_348[2] = 0x59;
local_348[3] = 0x47;
local_358 = 0;
local_350 = 0;
DVar1 = GetModuleFileNameA((HMODULE)0x0,local_108,0x104);
if (DVar1 == 0) {
  FUN_004273e0();
  return;
}
_fopen_s(&local_354,local_108,"rb");
xx_get_file_size();
_fseek(local_354,-0x23c,2);
_fread(local_344,1,0x23c,local_354);
_fclose(local_354);
iVar2 = 0;
do {
  *(byte *)((int)local_118 + iVar2) =
       *(byte *)((int)local_118 + iVar2) ^ local_348[iVar2];
  *(byte *)((int)&local_11c + iVar2) =
       *(byte *)((int)&local_11c + iVar2) ^ local_348[iVar2];
  *(byte *)((int)&local_358 + iVar2) =
       local_10c[iVar2] ^ *(byte *)((int)local_118 + iVar2);
  *(byte *)((int)&local_350 + iVar2) =
       local_10c[iVar2] ^ *(byte *)((int)&local_11c + iVar2);
  iVar2 = iVar2 + 1;
} while (iVar2 < 4);
if (local_358 == local_34c) {
  _xx_secret_key = local_118[0];
}
else {
  if (local_350 != local_34c) {
    FUN_004273e0();
    return;
  }
  _xx_secret_key = local_11c;
}
Luckily in this case, everything we need is right here in this function - there are no further “unknowns” which need to be found, and so we can copy it over to a .c file, compile it, and run it to (hopefully) get the secret key.
Stack smashing
The first issue I hit is that the _fread() call is reading 572 bytes (0x23c), but the buffer it’s reading into is only 552 bytes (138 4-byte words) long. This must be a quirk of decompilation.
In the original code, there was a buffer of 572 bytes, but the decompiler has split that into four separate local variables (I think because the later code accesses them separately):
undefined4 local_344 [138];
undefined4 local_11c;
undefined4 local_118 [3];
byte local_10c [4];
The _fread() into local_344 was therefore overrunning the buffer, which was triggering gcc’s stack smashing protector and crashing the program. Once I noticed what was going on, I fixed up the decompiler’s mistake and reworked the declarations a little by making local_344 larger and setting up the other variables to alias it:
undefined4 local_344[143];
#define local_11c local_344[138]
undefined4 *local_118 = &local_344[139];
byte *local_10c = (byte *)&local_344[142];
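Another way to express the same fix, for anyone who would rather avoid the #define, is to describe the whole 572-byte block as a single struct, so the four pieces the decompiler invented become fields of one buffer. This is only a sketch: the struct and field names are hypothetical guesses at what each piece is for, and it assumes the compiler adds no padding, which the static assertions check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 572-byte (0x23c) block, laid out like the four locals Ghidra produced. */
struct tail_block {
    uint8_t  body[552];     /* was: undefined4 local_344[138] */
    uint32_t alt_key;       /* was: undefined4 local_11c      */
    uint32_t key_words[3];  /* was: undefined4 local_118[3]   */
    uint8_t  check[4];      /* was: byte local_10c[4]         */
};

/* Fail at compile time if the layout drifts from the decompiled offsets. */
static_assert(sizeof(struct tail_block) == 0x23c, "unexpected padding");
static_assert(offsetof(struct tail_block, alt_key) == 138 * 4, "alt_key offset");
static_assert(offsetof(struct tail_block, check) == 142 * 4, "check offset");
```

Reading 0x23c bytes into a `struct tail_block` then stays inside one object, so the stack protector has nothing to complain about.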
Pointer size
Also, because I’m running on a 64-bit machine, I did run into a problem with the casts to (int) for the various pointers:
((int)&local_11c + iVar2)
This assumes that a pointer is the same size as an int, which isn’t true (in general) for 64-bit machines where int is 4 bytes and pointers are 8. This can be quickly fixed by either compiling in 32-bit mode, or replacing the casts to int with casts to uintptr_t which is guaranteed to be large enough for a pointer. Without this, the 64-bit pointers get converted to 32-bit ints, dropping the top 32 bits and causing the program to crash.
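To make the difference concrete, here is a small sketch of two portable ways to write that address arithmetic. The helper names are made up. The first keeps the decompiler's integer-style arithmetic but uses uintptr_t; the second does ordinary pointer arithmetic on a byte pointer, which is probably closer to what the original source looked like.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer-style arithmetic as in the decompiled code, but with a type that
 * is wide enough for a pointer on both 32-bit and 64-bit targets. */
uint8_t *byte_at_uintptr(void *base, size_t off)
{
    return (uint8_t *)((uintptr_t)base + off);
}

/* Equivalent plain pointer arithmetic; no integer cast is needed at all. */
uint8_t *byte_at_ptr(void *base, size_t off)
{
    return (uint8_t *)base + off;
}
```

On the usual flat-memory platforms both return the same address, so `*(byte *)((int)&local_11c + iVar2)` can be rewritten as `*byte_at_ptr(&local_11c, iVar2)`.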
Result
What’s slightly weird is that it tries two options for the secret key, either taking local_118[0] or local_11c depending on which of local_358 or local_350 matches local_34c = 0x48545054. I don’t know why it does this. If neither matches, then it throws an exception in FUN_004273e0.
if (local_358 == local_34c) {
  _xx_secret_key = local_118[0];
}
else {
  if (local_350 != local_34c) {
    FUN_004273e0();
    return;
  }
  _xx_secret_key = local_11c;
}
Running the fixed-up code gives us a value for xx_secret_key: 0x52fc9285. There’s no way to tell if it’s correct (yet), but the fact that it didn’t hit the exception path is a good indicator.
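For reference, the whole check boils down to a handful of byte-wise XORs. Below is a cleaned-up sketch of it, written to take the last 20 bytes of the 572-byte block (offsets 0x228 to 0x23b in the "before decryption" dump further down). The function and variable names are made up, and the key is assembled as a little-endian 32-bit value on the assumption that the updater is an x86 program.

```c
#include <stdint.h>

/* Assemble four bytes into a 32-bit value, least significant byte first. */
static uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* tail20 holds the final 20 bytes of the file:
 *   [0..3]   second key candidate (local_11c)
 *   [4..15]  local_118; only its first word is a key candidate
 *   [16..19] check bytes (local_10c)
 * Returns 0 and stores the key on success, -1 if neither candidate passes. */
int derive_key(const uint8_t tail20[20], uint32_t *key_out)
{
    static const uint8_t mask[4]  = { 0x53, 0x72, 0x59, 0x47 }; /* local_348 */
    static const uint8_t magic[4] = { 0x54, 0x50, 0x54, 0x48 }; /* 0x48545054 */
    const int candidate_offset[2] = { 4, 0 }; /* local_118[0] is tried first */

    for (int c = 0; c < 2; c++) {
        uint8_t key[4];
        int ok = 1;
        for (int i = 0; i < 4; i++) {
            /* Unmask the candidate, then test it against the check bytes. */
            key[i] = tail20[candidate_offset[c] + i] ^ mask[i];
            if ((tail20[16 + i] ^ key[i]) != magic[i]) {
                ok = 0;
            }
        }
        if (ok) {
            *key_out = le32(key);
            return 0;
        }
    }
    return -1;
}
```

Fed with the bytes `55 2b 15 ba d6 e0 a5 15 b5 a3 ce 61 b0 a7 ca 65 d1 c2 a8 1a` from the end of the dump, the first candidate passes the check and the function produces 0x52fc9285, the same value as the ported decompiler output.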
This first part of the function is only working on a small section of the data read from the file. Further down, the rest of it is processed - “decrypting” it by XOR-ing with the key and position:
do {
  uVar4 = uVar3 & 0x80000003;
  if ((int)uVar4 < 0) {
    uVar4 = (uVar4 - 1 | 0xfffffffc) + 1;
  }
  *(byte *)((int)local_344 + uVar3) =
       *(byte *)((int)local_344 + uVar3) ^
       (&xx_secret_key)[uVar4] ^
       (byte)uVar3;
  uVar3 = uVar3 + 1;
} while (uVar3 < 0x238);
But what is this extra data? Initially, I thought it would be part of a longer key used to decrypt some stuff later, but if we look at the file data before and after decryption, it becomes obvious that’s not the case.
Before decryption, it’s just noise (with some structure):
0x000000: b5 a7 ba 68 81 97 fa 55 bd aa ce 61 89 9f f2 5d ...h...U...a...]
0x000010: a4 83 ee 41 a1 b3 ae 7c 9d 8b e6 49 a8 be da 75 ...A...|...I...u
0x000020: a5 b3 de 71 90 b7 da 75 fb 8a f8 48 87 8f e3 7d ...q...u...H...}
0x000030: b5 a3 ce 61 b1 a7 ca 65 bd ab c6 69 b9 af c2 6d ...a...e...i...m
0x000040: c5 d3 be 11 c1 d7 ba 15 cd db b6 19 c9 df b2 1d ................
0x000050: d5 c3 ae 01 d1 c7 aa 05 dd cb a6 09 d9 cf a2 0d ................
0x000060: e5 f3 9e 31 e1 f7 9a 35 ed fb 96 39 e9 ff 92 3d ...1...5...9...=
0x000070: f5 e3 8e 21 f1 e7 8a 25 fd eb 86 29 f9 ef 82 2d ...!...%...)...-
0x000080: 05 13 7e d1 01 17 7a d5 0d 1b 76 d9 09 1f 72 dd ..~...z...v...r.
0x000090: 15 03 6e c1 11 07 6a c5 1d 0b 66 c9 19 0f 62 cd ..n...j...f...b.
0x0000a0: 25 33 5e f1 21 37 5a f5 66 79 76 ac 59 58 20 9c %3^.!7Z.fyv.YX .
0x0000b0: 51 46 4e e1 31 27 4a e5 3d 2b 46 e9 39 2f 42 ed QFN.1'J.=+F.9/B.
0x0000c0: 45 53 3e 91 41 57 3a 95 4d 5b 36 99 49 5f 32 9d ES>.AW:.M[6.I_2.
0x0000d0: 55 43 2e 81 51 47 2a 85 5d 4b 26 89 59 4f 22 8d UC..QG*.]K&.YO".
0x0000e0: 65 73 1e b1 61 77 1a b5 6d 7b 16 b9 69 7f 12 bd es..aw..m{..i...
0x0000f0: 75 63 0e a1 71 67 0a a5 7d 6b 06 a9 79 6f 02 ad uc..qg..}k..yo..
0x000100: 85 93 fe 51 81 97 fa 55 8d 9b f6 59 89 9f f2 5d ...Q...U...Y...]
0x000110: 95 83 ee 41 91 87 ea 45 9d 8b e6 49 99 8f e2 4d ...A...E...I...M
0x000120: a5 b3 de 71 a1 b7 da 75 e4 fa 86 59 ff da a0 0e ...q...u...Y....
0x000130: dc cc a0 41 e7 96 e4 55 93 9b c6 69 b9 af c2 6d ...A...U...i...m
0x000140: c5 d3 be 11 c1 d7 ba 15 cd db b6 19 c9 df b2 1d ................
0x000150: d5 c3 ae 01 d1 c7 aa 05 dd cb a6 09 d9 cf a2 0d ................
0x000160: e5 f3 9e 31 e1 f7 9a 35 ed fb 96 39 e9 ff 92 3d ...1...5...9...=
0x000170: f5 e3 8e 21 f1 e7 8a 25 fd eb 86 29 f9 ef 82 2d ...!...%...)...-
0x000180: 05 13 7e d1 01 17 7a d5 0d 1b 76 d9 09 1f 72 dd ..~...z...v...r.
0x000190: 15 03 6e c1 11 07 6a c5 1d 0b 66 c9 19 0f 62 cd ..n...j...f...b.
0x0001a0: 25 33 5e f1 21 37 5a f5 6c 75 05 b0 09 0e 62 c5 %3^.!7Z.lu....b.
0x0001b0: 15 68 2b 98 42 27 4a e5 3d 2b 46 e9 39 2f 42 ed .h+.B'J.=+F.9/B.
0x0001c0: 45 53 3e 91 41 57 3a 95 4d 5b 36 99 49 5f 32 9d ES>.AW:.M[6.I_2.
0x0001d0: 55 43 2e 81 51 47 2a 85 5d 4b 26 89 59 4f 22 8d UC..QG*.]K&.YO".
0x0001e0: 65 73 1e b1 61 77 1a b5 6d 7b 16 b9 69 7f 12 bd es..aw..m{..i...
0x0001f0: 75 63 0e a1 71 67 0a a5 7d 6b 06 a9 79 6f 02 ad uc..qg..}k..yo..
0x000200: 85 93 fe 51 81 97 fa 55 8d 9b f6 59 89 9f f2 5d ...Q...U...Y...]
0x000210: 95 83 ee 41 91 87 ea 45 9d 8b e6 49 99 8f e2 4d ...A...E...I...M
0x000220: a5 b3 de 71 a1 b7 da 75 55 2b 15 ba d6 e0 a5 15 ...q...uU+......
0x000230: b5 a3 ce 61 b0 a7 ca 65 d1 c2 a8 1a ...a...e....
However, after the decrypt:
0x000000: 30 34 44 39 00 00 00 00 30 31 38 38 00 00 00 00 04D9....0188....
0x000010: 31 00 00 00 30 34 44 39 00 00 00 00 31 31 38 38 1...04D9....1188
0x000020: 00 00 00 00 31 00 00 00 56 31 2e 31 2e 30 31 00 ....1...V1.1.01.
0x000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000a0: 00 00 00 00 00 00 00 00 4b 42 20 55 70 67 72 61 ........KB Upgra
0x0000b0: 64 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 de..............
0x0000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000120: 00 00 00 00 00 00 00 00 49 41 50 20 56 65 72 73 ........IAP Vers
0x000130: 69 6f 6e 20 56 31 2e 30 2e 30 00 00 00 00 00 00 ion V1.0.0......
0x000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0001a0: 00 00 00 00 00 00 00 00 41 4e 53 49 20 31 30 38 ........ANSI 108
0x0001b0: 20 4b 65 79 73 00 00 00 00 00 00 00 00 00 00 00 Keys...........
0x0001c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0001d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0001e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x000220: 00 00 00 00 00 00 00 00 ab e2 9a 84 2c 2d 2e 2f ............,-./
It’s nice uniform data containing readable strings! They show the version of the firmware, the type of keyboard (108 keys) and other details which are shown on the main screen of the firmware updater program:
So, the chunk of data we read from the file contains both the “key” and the firmware information.
The fact that the decryption worked (turned the noise into readable strings) means that we can be confident the secret key is correct! Success!
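This can be double-checked without the updater at all, by decoding the first row of the "before" dump by hand with the key. Here is a small sketch, assuming the 32-bit key is stored little-endian in memory (so its bytes are 85 92 fc 52) and using made-up helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR byte i with key byte (i % 4) and with the low 8 bits of i. */
void xor_decode_u32(uint8_t *data, size_t len, uint32_t key)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t k = (uint8_t)(key >> (8 * (i % 4))); /* little-endian byte */
        data[i] ^= (uint8_t)(k ^ (uint8_t)i);
    }
}

/* Decode the first 16 bytes of the "before" dump and compare them with the
 * first row of the "after" dump. Returns 1 on a match, 0 otherwise. */
int first_row_matches(void)
{
    uint8_t row[16] = { 0xb5, 0xa7, 0xba, 0x68, 0x81, 0x97, 0xfa, 0x55,
                        0xbd, 0xaa, 0xce, 0x61, 0x89, 0x9f, 0xf2, 0x5d };
    static const uint8_t expected[16] = { '0', '4', 'D', '9', 0, 0, 0, 0,
                                          '0', '1', '8', '8', 0, 0, 0, 0 };
    xor_decode_u32(row, sizeof row, 0x52fc9285u);
    return memcmp(row, expected, sizeof row) == 0;
}
```

For example, the very first byte works out as 0xb5 ^ 0x85 ^ 0x00 = 0x30, which is the ASCII character '0' at the start of "04D9".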
That’s probably enough for Part 1. In Part 2 we’ll try using the xx_secret_key to start unpicking some more parts of the puzzle!
[1] Of course, someone else has already done this, but I don’t think they shared any details: https://reverseengineering.stackexchange.com/q/13223
[2] Holtek are the manufacturer of the Arm Cortex-M3 microcontroller in the keyboard