## Vulnerability Disclosure: Fusée Gelée

This report documents Fusée Gelée, a coldboot vulnerability that allows full, unauthenticated arbitrary code execution from an early bootROM context via Tegra Recovery Mode (RCM) on NVIDIA's Tegra line of embedded processors. As this vulnerability allows arbitrary code execution on the Boot and Power Management Processor (BPMP) before any lock-outs take effect, this vulnerability compromises the entire root-of-trust for each processor, and allows exfiltration of secrets e.g. burned into device fuses.

Quick vitals:       | |
--------------------|--------------------------------------------------------
*Reporter:*         | Katherine Temkin (@ktemkin)
*Affiliation:*      | ReSwitched (https://reswitched.tech)
*E-mail:*           | k@ktemkin.com
*Affects:*          | Tegra SoCs, independent of software stack
*Versions:*         | believed to affect Tegra SoCs released prior to the T186 / X2
*Impact:*           | early bootROM code execution with no software requirements, which can lead to full compromise of on-device secrets where USB access is possible
*Disclosure:*       | public disclosure planned for June 15th, 2018

#### Vulnerability Summary

The USB software stack provided inside the boot instruction ROM (IROM/bootROM) contains a copy operation whose length can be controlled by an attacker. By carefully constructing a USB control request, an attacker can leverage this vulnerability to copy the contents of an attacker-controlled buffer over the active execution stack, gaining control of the Boot and Power Management Processor (BPMP) before any lock-outs or privilege reductions occur. This execution can then be used to exfiltrate secrets and to load arbitrary code onto the main CPU Complex (CCPLEX) "application processors" at the highest possible level of privilege (typically as the TrustZone Secure Monitor at PL3/EL3).

#### Public Disclosure Notice

This vulnerability is notable due to the significant number and variety of devices affected, the severity of the issue, and the immutability of the relevant code on devices already delivered to end users. This vulnerability report is provided as a courtesy to help aid remediation efforts, guide communication, and minimize impact to users.

As other groups appear to have this or an equivalent exploit-- [including a group who claims they will be selling access to an implementation of such an exploit](http://team-xecuter.com/team-xecuter-coming-to-your-nintendo-switch-console/)-- it is the author and the ReSwitched team's belief that prompt public disclosure best serves the public interest. By minimizing the information asymmetry between the general public and exploit-holders and notifying the public, users will be able to best assess how this vulnerability impacts their personal threat models.

Accordingly, ReSwitched anticipates public disclosure of this vulnerability:
* If another group releases an implementation of the identified vulnerability; or
* On June 15th, 2018,

whichever comes first.

### Vulnerability Details

The core of the Tegra boot process is approximated by the following block of pseudo-code, as obtained by reverse-engineering an IROM extracted from a vulnerable T210 system:

```C
// If this is a warmboot (from "sleep"), restore the saved state from RAM.
if (read_scratch0_bit(1)) {
    restore_warmboot_image(&load_addr);
}
// Otherwise, bootstrap the processor.
else {
    // Allow recovery mode to be forced by a PMC scratch bit or physical straps.
    force_recovery = check_for_rcm_straps() || read_scratch0_bit(2);

    // Determine whether to use USB2 or USB3 for RCM.
    determine_rcm_usb_version(&usb_version);
    usb_ops = set_up_usb_ops(usb_version);
    usb_ops->initialize();

    // If we're not forcing recovery, attempt to load an image from boot media.
    if (!force_recovery) {
        // If we succeeded, don't fall back into recovery mode.
        if (read_boot_configuration_and_images(&load_addr) == SUCCESS) {
            goto boot_complete;
        }
    }

    // In all other conditions, fall back to USB recovery mode (RCM).
    if (read_boot_images_via_usb_rcm(/* ... */, &load_addr) != SUCCESS) {
        /* load address is poisoned here */
    }
}

boot_complete:
    /* apply lock-outs, and boot the program at address load_addr */
```

Tegra processors include a USB Recovery Mode (RCM), which we can observe to be activated under a number of conditions:
* If the processor fails to find a valid Boot Control Table (BCT) + bootloader on its boot media;
* If processor straps are pulled to a particular value e.g. by holding a button combination; or
* If the processor is rebooted after a particular value is written into a power management controller scratch register.

USB recovery mode is present in all devices, including devices that have been production secured. To ensure that USB recovery mode does not allow unauthenticated communications, RCM requires all recovery commands be signed using either RSA or via AES-CMAC.

The bootloader's implementation of the Tegra RCM protocol is simple, and exists to allow loading a small piece of code (called the *miniloader* or *applet*) into the bootloader's local Instruction RAM (IRAM). In a typical application, this *applet* is `nvtboot-recovery`, a stub which allows further USB communications to bootstrap a system or to allow system provisioning.

The RCM process is approximated by the following pseudo-code, again obtained via reverse engineering a dumped IROM from a T210:

```C
// Significantly simplified for clarity, with error checking omitted where unimportant.
while (1) {
    // Repeatedly handle USB standard events on the control endpoint EP0.
    usb_ops->handle_control_requests(current_dma_buffer);

    // Try to send the device ID over the main USB data pipe until we succeed.
    if ( rcm_send_device_id() == USB_NOT_CONFIGURED ) {
        usb_initialized = 0;
    }
    // Once we've made a USB connection, accept RCM commands on EP1.
    else {
        usb_initialized = 1;

        // Read a full RCM command and any associated payload into a global buffer.
        // (Error checking omitted for brevity.)
        rcm_read_command_and_payload();

        // Validate the received RCM command; e.g. by checking for signatures
        // in RSA or AES_CMAC mode, or by trivially succeeding if we're not in
        // a secure mode.
        rc = rcm_validate_command();
        if (rc != VALIDATION_PASS) {
            return rc;
        }

        // Handle the received and validated command.
        // For a "load miniloader" command, this sanity checks the (validated)
        // miniloader image and takes steps to prevent re-use of signed data not
        // intended to be used as an RCM command.
        rcm_handle_command_complete(...);
    }
}
```

It is important to note that a full RCM command *and its associated payload* are read into 1) a global buffer, and 2) the target load address, respectively, before any signature checking is done. This effectively grants the attacker a narrow window in which they control a large region of unvalidated memory.

The largest vulnerability surface area occurs in the `rcm_read_command_and_payload` function, which accepts the RCM command and payload packets via a USB bulk endpoint. For our purposes, this endpoint is essentially a simple pipe for conveyance of blocks of binary data separate from standard USB communications.

The `rcm_read_command_and_payload` function actually contains several issues-- of which exactly one is known to be exploitable:

```C
uint32_t total_rxd   = 0;
uint32_t total_to_rx = 0x400;

// Loop until we've received our full command and payload.
while (total_rxd < total_to_rx) {
    // Switch between two DMA buffers, so the USB is never DMA'ing into the same
    // buffer that we're processing.
    active_buffer = next_buffer;
    next_buffer   = switch_dma_buffers();

    // Start a USB DMA transaction on the RCM bulk endpoint, which will hopefully
    // receive data from the host in the background as we copy.
    usb_ops->start_nonblocking_bulk_read(active_buffer, 0x1000);

    // If we're in the first 680-bytes we're receiving, this is part of the RCM
    // command, and we should read it into the command buffer.
    if ( total_rxd < 680 ) {
        /* copy data from the DMA buffer into the RCM command buffer until we've
           read a full 680-byte RCM command */

        // Once we've received the first four bytes of the RCM command,
        // use that to figure out how much data should be received.
        if ( total_rxd >= 4 ) {
            // validate:
            //  -- the command won't exceed our total RAM
            //     (680 here, 0x30000 in upper IRAM)
            //  -- the command is >= 0x400 bytes
            //  -- the size ends in 8
            if ( rcm_command_buffer[0] >= 0x302A8u
                    || rcm_command_buffer[0] < 0x400u
                    || (rcm_command_buffer[0] & 0xF) != 8 ) {
                return ERROR_INVALID_SIZE;
            } else {
                // Update the loop bound with the length declared by the host.
                total_to_rx = *((uint32_t *)rcm_command_buffer);
            }
        }
    }

    /* copy any data _past_ the command into a separate payload
       buffer at 0x40010000 */
    /* -code omitted for brevity - */

    // Wait for the DMA transaction to complete.
    // [This is, again, simplified to convey concepts.]
    while(!usb_ops->bulk_read_complete()) {
        // While we're blocking, it's still important that we respond to standard
        // USB packets on the control endpoint, so do that here.
        usb_ops->handle_control_requests(next_buffer);
    }
}
```

Astute readers will notice an issue unrelated to the Fusée Gelée exploit: this code fails to properly ensure DMA buffers are being used exclusively for a single operation. This results in an interesting race condition in which a DMA buffer can be simultaneously used to handle a control request and an RCM bulk transfer. This can break the flow of RCM, but as both operations contain untrusted data, this issue poses no security risk.

To find the actual vulnerability, we must delve deeper, into the code that handles standard USB control requests. The core of this code is responsible for responding to USB control requests.
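
Before moving on to that control-request code, the size check in the pseudo-code above can be made concrete. The following is a minimal Python sketch, not code from the bootROM or from `fusee-launcher.py`: the function and constant names are hypothetical, and the three conditions are simply transcribed from the reverse-engineered pseudo-code. It enumerates the accepted range to find the largest length a host could declare.

```python
# Hypothetical names; the numeric values are taken from the pseudo-code above.
RCM_COMMAND_SIZE = 680     # bytes occupied by the RCM command itself
MIN_LENGTH = 0x400         # lengths below this are rejected
LENGTH_LIMIT = 0x302A8     # lengths at or above this are rejected


def rcm_length_is_accepted(length: int) -> bool:
    """Apply the three checks from the pseudo-code's size validation."""
    in_range = MIN_LENGTH <= length < LENGTH_LIMIT
    ends_in_8 = (length & 0xF) == 8
    return in_range and ends_in_8


def largest_accepted_length() -> int:
    """Brute-force the largest length that passes every check."""
    return max(
        n for n in range(MIN_LENGTH, LENGTH_LIMIT) if rcm_length_is_accepted(n)
    )


longest = largest_accepted_length()
print(hex(longest))                     # largest length the check accepts
print(hex(longest - RCM_COMMAND_SIZE))  # payload bytes left after the command
```

Under these assumptions the largest accepted length is `0x30298`, which leaves `0x2FFF0` bytes of payload after the 680-byte command.
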
A *control request* is initiated when the host sends a setup packet, of the following form:

Field     | Size | Description
----------|:----:|-----
direction | 1b   | if '1', the device should respond with data
type      | 2b   | specifies whether this request is of a standard type or not
recipient | 5b   | encodes the context in which this request should be considered; for example, is this about a `DEVICE` or about an `ENDPOINT`?
request   | 8b   | specifies the request number
value     | 16b  | argument to the request
index     | 16b  | argument to the request
length    | 16b  | specifies the maximum amount of data to be transferred

As an example, the host can request the status of a device by issuing a `GET_STATUS` request, at which point the device would be expected to respond with a short (two-byte) data packet. Of particular note is the `length` field of the request, which should *limit* -- but not exclusively determine-- the *maximum* amount of data that should be included in the response. Per the specification, the device should respond with either the *amount of data specified* or the *amount of data available*, whichever is less.

The bootloader's handling of this behavior is conceptually implemented as follows:

```C
// Temporary, automatic variables, located on the stack.
uint16_t status;
void *data_to_tx;

// The amount of data available to transmit.
uint16_t size_to_tx   = 0;

// The amount of data the USB host requested.
uint16_t length_read  = setup_packet.length;

/* Lots of handler cases have been omitted for brevity. */

// Handle GET_STATUS requests.
if (setup_packet.request == REQUEST_GET_STATUS)
{
    // If this is asking for the DEVICE's status, respond accordingly.
    if(setup_packet.recipient == RECIPIENT_DEVICE) {
        status     = get_usb_device_status();
        size_to_tx = sizeof(status);
    }
    // Otherwise, respond with the ENDPOINT status.
    else if (setup_packet.recipient == RECIPIENT_ENDPOINT){
        status     = get_usb_endpoint_status(setup_packet.index);
        size_to_tx = length_read; // <-- This is a critical error!
    }
    else {
        /* ... */
    }

    // Send the status value, which we'll copy from the stack variable 'status'.
    data_to_tx = &status;
}

// Copy the data we have into our DMA buffer for transmission.
// For a GET_STATUS request, this copies data from the stack into our DMA buffer.
memcpy(dma_buffer, data_to_tx, size_to_tx);

// If the host requested less data than we have, only send the amount requested.
// This effectively selects min(size_to_tx, length_read).
if (length_read < size_to_tx) {
    size_to_tx = length_read;
}

// Transmit the response we've constructed back to the host.
respond_to_control_request(dma_buffer, size_to_tx);
```

In most cases, the handler correctly limits the length of the transmitted responses to the amount it has available, per the USB specification. However, in a few notable cases, the length is *incorrectly always set to the amount requested* by the host:
* When issuing a `GET_CONFIGURATION` request with a `DEVICE` recipient.
* When issuing a `GET_INTERFACE` request with an `INTERFACE` recipient.
* When issuing a `GET_STATUS` request with an `ENDPOINT` recipient.

This is a critical security error, as the host can request up to 65,535 bytes per control request. In cases where this is loaded directly into `size_to_tx`, this value directly sets the extent of the `memcpy` that follows-- and thus can copy up to 65,535 bytes into the currently selected `dma_buffer`. As the DMA buffers used for the USB stack are each comparatively short, this can result in a _very_ significant buffer overflow.

To validate that the vulnerability is present on a given device, one can try issuing an oversized request and watch as the device responds. Pictured below is the response generated when sending an oversized `GET_STATUS` control request with an `ENDPOINT` recipient to a T124:

![Reading a chunk of stack memory from a K1](stack_read.png)

A compliant device should generate a two-byte response to a `GET_STATUS` request-- but the affected Tegra responds with a significantly longer response. This is a clear indication that we've run into the vulnerability described above.

To really understand the impact of this vulnerability, it helps to understand the memory layout used by the bootROM.
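
Before looking at that layout, the oversized request described above can be written out as raw bytes. The sketch below is a minimal Python illustration under stated assumptions: the helper names are hypothetical, the field encodings (`0x00` for `GET_STATUS`, recipient code `2` for an endpoint, direction in bit 7) come from chapter 9 of the USB 2.0 specification rather than from this report, and the requested length of `0x1000` is an arbitrary example value, not necessarily the one used by the proof-of-concept.

```python
import struct

# Encodings from the USB 2.0 specification (chapter 9), not from this report.
DIRECTION_IN = 1          # device-to-host
TYPE_STANDARD = 0
RECIPIENT_ENDPOINT = 2
REQUEST_GET_STATUS = 0x00


def build_setup_packet(direction, req_type, recipient, request, value, index, length):
    """Pack the setup-packet fields into their 8-byte little-endian wire form."""
    bm_request_type = (direction << 7) | (req_type << 5) | recipient
    return struct.pack("<BBHHH", bm_request_type, request, value, index, length)


def parse_setup_packet(raw: bytes) -> dict:
    """Split an 8-byte setup packet back into the fields from the table."""
    bm_request_type, request, value, index, length = struct.unpack("<BBHHH", raw)
    return {
        "direction": bm_request_type >> 7,
        "type": (bm_request_type >> 5) & 0x3,
        "recipient": bm_request_type & 0x1F,
        "request": request,
        "value": value,
        "index": index,
        "length": length,
    }


# A GET_STATUS request to an endpoint asking for far more than two bytes.
# The length 0x1000 is an arbitrary illustrative value.
packet = build_setup_packet(
    DIRECTION_IN, TYPE_STANDARD, RECIPIENT_ENDPOINT, REQUEST_GET_STATUS,
    value=0, index=0, length=0x1000,
)
print(packet.hex())
print(parse_setup_packet(packet))
```

Whether a host can actually submit such a request depends on its USB driver stack; the support note in the proof-of-concept section covers this.
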
For our proof-of-concept, we'll consider the layout used by the T210 variant of the affected bootROM:

![Bootrom memory layout](mem_layout.png)

The major memory regions relevant to this vulnerability are as follows:
* The bootROM's *execution stack* grows downward from `0x40010000`; so the execution stack is located in the memory *immediately preceding* that address.
* The DMA buffers used for USB are located at `0x40005000` and `0x40009000`, respectively. Because the USB stack alternates between these two buffers once per USB transfer, the host effectively can control which DMA buffer is in use by sending USB transfers.
* Once the bootloader's RCM code receives a 680-byte command, it begins to store received data in a section of upper IRAM located at address `0x40010000`, and can store up to `0x30000` bytes of payload. This address is notable, as it is immediately past the end of the active execution stack.

Of particular note is the adjacency of the bootROM's *execution stack* and the attacker-controlled *RCM payload*. Consider the behavior of the previous pseudo-code segment on receipt of a `GET_STATUS` request to the `ENDPOINT` with an excessive length.
The resulting memcpy:
* copies *up to* 65,535 bytes total;
* sources data from a region *starting at the status variable on the stack* and extending significantly past the stack -- effectively copying mostly *from the attacker-controllable RCM payload buffer*;
* targets a buffer starting at either `0x40005000` or `0x40009000`, at the attacker's discretion, reaching addresses of up to `0x40014fff` or `0x40018fff`.

This is a powerful copy primitive, as it copies *from attacker controlled memory* and into a region that *includes the entire execution stack*:

![Effect of the vulnerability memcpy](copy_span.png)

This would be a powerful exploit on any platform; but this is a particularly devastating attack in the bootROM environment, which does not:
* Use common attack mitigations such as stack canaries, ostensibly to reduce complexity and save limited IRAM and IROM space.
* Apply memory protections, so the entire stack and all attacker controlled buffers can be read from, written to, and executed from.
* Employ typical 'application-processor' mitigation strategies such as ASLR.

Accordingly, we now have:
1. The capability to load arbitrary payloads into memory via RCM, as RCM only validates command signatures once payload receipt is complete.
2. The ability to copy attacker-controlled values over the execution stack, overwriting return addresses and redirecting execution to a location of our choice.

Together, these two abilities give us a full arbitrary-code execution exploit at a critical point in the Tegra's start-up process. As control flow is hijacked before return from `read_boot_images_via_usb_rcm`, none of the "lock-out" operations that precede normal startup are executed. This means, for example, that the T210 fuses-- and the keydata stored within them-- are accessible from the attack payload, and the bootROM is not yet protected.

#### Exploit Execution

The Fusée Launcher PoC exploits the vulnerability described on the T210 via a careful sequence of interactions:
1. The device is started in RCM mode. Device specifics will differ, but this is often via a key-combination held on startup.
2. A host computer is allowed to enumerate the RCM device normally.
3. The host reads the RCM device's ID by reading 16 bytes from EP1 IN.
4. The host builds an exploit payload, which is comprised of:
    1. An RCM command that includes a maximum length, ensuring that we can send as much payload as possible without completing receipt of the RCM payload. Only the length of this command is used prior to validation; so we can submit an RCM command that starts with a maximum length of 0x30298, but which fills the remaining 676 bytes of the RCM command with any value.
    2. A set of values with which to overwrite the stack. As stack return address locations vary across the series, it's recommended that a large block composed of a single entry-point address be repeated a significant number of times, so one can effectively replace the entire stack with that address.
    3. The program to be executed ("final payload") is appended, ensuring that its position in the binary matches the entry-point from the previous step.
    4. The payload is padded to be evenly divisible by the 0x1000 block size to ensure the active block is not overwritten by the "DMA dual-use" bug described above.
5. The exploit payload is sent to the device over EP1 OUT, tracking the number of 0x1000-byte "blocks" that have been sent to the device. If this number is _even_, the next write will be issued to the lower DMA buffer (`0x40005000`); otherwise, it will be issued to the upper DMA buffer (`0x40009000`).
6. If the next write would target the lower DMA buffer, issue another write of a full 0x1000 bytes to move the target to the upper DMA buffer, reducing the total amount of data to be copied.
7. Trigger the vulnerable memcpy by sending a `GET_STATUS` `IN` control request with an `ENDPOINT` recipient, and a length long enough to smash the desired stack region, and preferably not longer than required.

A simple host program that triggers this vulnerability is included with this report: see `fusee-launcher.py`. Note the restrictions on its function in the following section.

### Proof of Concept

Included with this report is a set of three files:
* `fusee-launcher.py` -- The main proof-of-concept accompanying this report. This Python script is designed to launch a simple binary payload in the described bootROM context via the exploit.
* `intermezzo.bin` -- This small stub is designed to relocate a payload from a higher load address to the standard RCM load address of `0x40010000`. This allows standard RCM payloads (such as `nvtboot-recover.bin`) to be executed.
* `fusee.bin` -- An example payload for the Nintendo Switch, a representative and well-secured device based on a T210. This payload will print information from the device's fuses and protected IROM to the display, demonstrating that early bootROM execution has been achieved.

**Support note:** Many host-OS driver stacks are reluctant to issue unreasonably large control requests. Accordingly, the current proof-of-concept includes code designed to work in the following environments:
* **64-bit Linux via `xhci_hcd`**. The proof-of-concept can manually submit large control requests, but does not work with the common `ehci_hcd` drivers due to driver limitations. A rough rule of thumb is that a connection via a blue / USB3 SuperSpeed port will almost always be handled by `xhci_hcd`.
* **macOS**. The exploit works out of the box with no surprises or restrictions on modern macOS.

Windows support would require addition of a custom kernel module, and thus was beyond the scope of a simple proof-of-concept.

To use this proof-of-concept on a Nintendo Switch:
1. Set up a Linux or macOS environment that meets the criteria above, and which has a working `python3` and `pyusb` installed.
2. Connect the Switch to your host PC with a USB A -> USB C cable.
3. Boot the Switch in RCM mode. There are three ways to do this, but the first-- unseating its eMMC board-- is likely the most straightforward:
    1. Ensure the Switch cannot boot off its eMMC. The most straightforward way to do this is to open the back cover and remove the socketed eMMC board; corrupting the BCT or bootloader on the eMMC boot partition would also work.
    2. Trigger the RCM straps. Hold VOL_UP and short pin 10 on the right JoyCon connector to ground while engaging the power button.
    3. Set bit 2 of PMC scratch register zero. On modern firmwares, this requires EL3 or pre-sleep BPMP execution.
4. Run `fusee-launcher.py` with an argument of `fusee.bin`. (This requires `intermezzo.bin` to be located in the same folder as `fusee-launcher.py`.)

```
sudo python3 ./fusee-launcher.py fusee.bin
```

If everything functions correctly, your Switch should be displaying a collection of fuse and protected-IROM information:

![exploit working](switch_hax.jpg)

### Recommended Mitigations

In this case, the recommended mitigation is to correct the USB control request handler such that it always correctly constrains the length to be transmitted. This has to be handled according to the type of device:
* **For a device already in consumer hands**, no solution is proposed. Unfortunately, access to the fuses needed to configure the device's ipatches was blocked when the ODM_PRODUCTION fuse was burned, so no bootROM update is possible. It is suggested that consumers be made aware of the situation so they can move to other devices, where possible.
* **For new devices**, the correct solution is likely to introduce a new ipatch or new ipatches that limit the size of control request responses.
It seems likely that OEMs producing T210-based devices may move to T214 solutions; it is the hope of the author that the T214's bootROM shares immunity with the T186. If not, patching the above is a recommended modification to the mask ROM and/or ipatches of the T214, as well.
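
The length constraint recommended in this section can be illustrated with a small model. This is a Python sketch of the logic only, with hypothetical function names; it is not the bootROM's code, and it does not represent an actual ipatch or any fix NVIDIA has shipped. It contrasts a handler that trusts the host-supplied length with one that clamps the copy to the data actually available.

```python
STATUS_SIZE = 2        # a GET_STATUS response carries two bytes
MAX_REQUEST = 0xFFFF   # largest value the 16-bit length field can hold


def flawed_copy_length(available: int, requested: int) -> int:
    """Model of the affected cases: the host's length sets the copy extent."""
    return requested


def clamped_copy_length(available: int, requested: int) -> int:
    """Copy (and send) the lesser of what is available and what was requested."""
    return min(available, requested)


# Compare both behaviours for a short, an exact, and an oversized request.
for requested in (1, STATUS_SIZE, MAX_REQUEST):
    print(
        requested,
        flawed_copy_length(STATUS_SIZE, requested),
        clamped_copy_length(STATUS_SIZE, requested),
    )
```

In the handler pseudo-code earlier in this report, the `min` selection is applied only after the `memcpy`. For a fix along these lines to prevent the overflow, the clamp would have to be applied before the copy rather than only before transmission.
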