USB Basics
How is a key press processed and displayed?
Modern computers are built upon many layers of abstraction. Most programs do not need to worry about hardware details, because everything has been standardized and abstracted away. Let’s break down these abstractions and see how a program like GNOME Terminal reads a key press from the user on Linux.
Physical Layer
Assuming a standard USB keyboard, pressing down on a key connects a GPIO (General Purpose Input/Output) pin of the keyboard's microcontroller to ground or VCC. In most microcontrollers, this input pin passes through a Schmitt trigger, which turns the noisy, slowly changing voltage into a clean logic level before it is latched into a data register. The keyboard firmware then either receives an interrupt, which makes it jump to an interrupt handler, or periodically polls the state of the keys a few hundred or a few thousand times a second. In either case, the firmware ends up with a snapshot of the state of the keyboard. All that is left is to communicate this state to the computer over USB.
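The polling approach can be sketched in C. This is a minimal sketch assuming each key sits on its own active-low GPIO input with a pull-up resistor (real keyboards usually scan a row/column matrix to save pins). The `levels` array stands in for a hypothetical read of the GPIO input registers, and `snapshot_keys`, `keys_pressed`, and `keys_released` are illustrative names, not part of any firmware library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: up to 32 keys, each on its own active-low GPIO input
 * (pull-up resistor, switch to ground), so a level of 0 means "pressed". */
#define MAX_KEYS 32

/* Build a bitmap snapshot: bit i is set when key i is pressed. */
uint32_t snapshot_keys(const uint8_t *levels, size_t n_keys)
{
    assert(n_keys <= MAX_KEYS);
    uint32_t state = 0;
    for (size_t i = 0; i < n_keys; i++) {
        if (levels[i] == 0)              /* pulled to ground: key is down */
            state |= UINT32_C(1) << i;
    }
    return state;
}

/* Compare two consecutive polls to find keys that went down or came up. */
uint32_t keys_pressed(uint32_t prev, uint32_t cur)  { return ~prev & cur; }
uint32_t keys_released(uint32_t prev, uint32_t cur) { return prev & ~cur; }
```

Comparing consecutive snapshots is what lets the firmware notice changes; debouncing, which real firmware also needs, is left out of this sketch.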
USB Layer
USB is a host-driven bus, which means that there is exactly one host on the entire bus. In modern amd64 computers, the USB host controller is either integrated into the CPU package or is part of a separate chip on the motherboard (the PCH, or Platform Controller Hub) that is connected to the CPU over a PCIe or DMI link. The host controller handles the lower levels of the USB stack, including physical signaling, polling, USB packets, and USB transactions. This is done in hardware as part of implementing the xHCI (eXtensible Host Controller Interface) standard. The operating system then only needs to worry about the higher levels of the USB stack: namely, the USB transfer layer.
Before discussing the USB transfer layer, we first need to discuss the USB transaction layer. A USB transaction is built from up to three packets, one of each of the three main packet types: token, data, and handshake. A token packet contains the address of the device the host is communicating with and the endpoint number; its packet ID (PID) tells the device whether this is an OUT (host writes), IN (host reads), or SETUP transaction. The data packet contains a variable-length payload, depending on the device and driver, followed by a 16-bit CRC. Up to 1,024 bytes of data can be sent in a single data packet (on USB high-speed interrupt and isochronous endpoints). Finally, a handshake packet signals the result of a transaction: ACK (the data was received successfully), NAK (the device is busy or has no data, so the host should retry later), or STALL (the endpoint is halted or the request is not supported, so retrying will not help until the host clears the condition).
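The 16-bit CRC at the end of a data packet can be computed as below. This is a bit-by-bit sketch of the variant commonly catalogued as CRC-16/USB (polynomial 0x8005 processed in reflected form, initial value 0xFFFF, result inverted); the function name `usb_crc16` is illustrative, and in practice the host controller and the device's USB hardware compute this for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16 over a data packet payload: polynomial x^16 + x^15 + x^2 + 1
 * (0x8005), processed LSB first (reflected constant 0xA001), initial
 * value 0xFFFF, and the final value inverted. */
uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc >>= 1;
        }
    }
    return (uint16_t)~crc;
}
```

A zero-length data packet (which is valid, as the 0–1,024 byte range suggests) ends up with a CRC of 0x0000, since the initial 0xFFFF is simply inverted.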
The token, data, and handshake packets are combined into USB transactions. There are three main types of USB transactions: OUT, IN, and SETUP. Every transaction begins with a token packet sent by the host; USB devices are never allowed to initiate a transaction and can only respond to the host.
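The start of every transaction, the token packet, can be modeled as follows. The PID values are the 4-bit packet IDs from the USB 2.0 specification; on the wire, each PID byte carries the ID in its low nibble and its bitwise complement in the high nibble so that corruption can be detected. The function names are hypothetical, and the 5-bit CRC that ends a token packet is deliberately left out of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4-bit packet IDs (USB 2.0). */
enum usb_pid {
    PID_OUT   = 0x1, /* token */
    PID_IN    = 0x9, /* token */
    PID_SETUP = 0xD, /* token */
    PID_DATA0 = 0x3, /* data */
    PID_DATA1 = 0xB, /* data */
    PID_ACK   = 0x2, /* handshake */
    PID_NAK   = 0xA, /* handshake */
    PID_STALL = 0xE, /* handshake */
};

/* Low nibble: PID. High nibble: complement of the PID. */
uint8_t pid_byte(enum usb_pid pid)
{
    return (uint8_t)((pid & 0x0F) | ((~pid & 0x0F) << 4));
}

/* A received PID byte is valid only if the two nibbles are complements. */
bool pid_byte_valid(uint8_t b)
{
    return ((b >> 4) ^ (b & 0x0F)) == 0x0F;
}

/* Token payload: 7-bit device address followed by a 4-bit endpoint
 * number (11 bits, sent LSB first). The trailing CRC5 is omitted here. */
uint16_t token_fields(uint8_t addr, uint8_t endp)
{
    assert(addr < 128 && endp < 16);
    return (uint16_t)(addr | (endp << 7));
}
```

For example, an IN token to endpoint 1 of the device at address 5 starts with the PID byte `pid_byte(PID_IN)` (0x69), followed by the 11 bits from `token_fields(5, 1)`.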
An OUT transaction is used to write data to device X on endpoint Y (data leaves the host). It consists of three packets:
sequenceDiagram
    Host->>Device: Token Packet with type "OUT"
    Note over Host,Device: Device at address X and endpoint Y begins listening
    Host->>Device: Data packet
    Note over Host,Device: 0–1,024 bytes long + checksum
    Device->>Host: Handshake Packet (ACK, NAK, or STALL)
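An IN transaction asks the device at address X to send the data waiting in endpoint Y to the host. It is very similar to the OUT transaction above, except that the data flows from the device to the host, so the roles in the last two stages are reversed. If the device has nothing to send, it answers the token with a NAK (or a STALL if the endpoint is halted) instead of a data packet.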
sequenceDiagram
    Host->>Device: Token Packet with type "IN"
    Note over Host,Device: Device at address X begins sending data from endpoint Y
    Device->>Host: Data packet (or a NAK/STALL handshake if it cannot send data)
    Note over Device,Host: 0–1,024 bytes long + checksum
    Host->>Device: Handshake Packet (ACK)
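The NAK and STALL cases of an IN transaction can be illustrated with a toy model. Everything here (`struct in_endpoint`, `device_handle_in`, `host_poll`) is hypothetical and heavily simplified: a single endpoint with an 8-byte buffer, no data toggles, and no CRC or timeout handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one IN endpoint on a device. */
struct in_endpoint {
    bool halted;               /* endpoint stalled, e.g. after an error */
    size_t pending_len;        /* bytes waiting to be sent, 0 = none */
    unsigned char pending[8];
};

enum in_reply { REPLY_DATA, REPLY_NAK, REPLY_STALL };

/* Device side: answer an IN token. */
enum in_reply device_handle_in(const struct in_endpoint *ep,
                               unsigned char *out, size_t *out_len)
{
    if (ep->halted)
        return REPLY_STALL;    /* host must clear the halt first */
    if (ep->pending_len == 0)
        return REPLY_NAK;      /* nothing to send, host retries later */
    assert(ep->pending_len <= sizeof ep->pending);
    memcpy(out, ep->pending, ep->pending_len);
    *out_len = ep->pending_len;
    return REPLY_DATA;
}

/* The device keeps its data until it sees the host's ACK, so a lost
 * handshake would lead to the same data being sent again. */
void device_handle_ack(struct in_endpoint *ep)
{
    ep->pending_len = 0;
}

/* Host side: one complete IN transaction. */
enum in_reply host_poll(struct in_endpoint *ep, unsigned char *buf, size_t *len)
{
    enum in_reply r = device_handle_in(ep, buf, len);
    if (r == REPLY_DATA)
        device_handle_ack(ep); /* host ACKs a good data packet */
    return r;
}
```

Note that the host never answers with NAK or STALL in an IN transaction; those handshakes only ever come from the device.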
Finally, the SETUP transaction starts a request on a device's control endpoint (endpoint 0). The host uses it to gather information from a newly attached device and to initialize it: for example, to set the device's address, check the status of a particular endpoint, or read information about the device. The data packet of a SETUP transaction always carries an 8-byte request. A newly attached USB device does not respond to any packets until the host signals a USB reset on its port; after the reset, it listens on the default address 0. Because the host resets only one device at a time, only one device is ever listening on address 0. The host sends multiple SETUP transactions while initializing a new device. Again, the transaction has three stages:
sequenceDiagram
    Host->>Device: Token Packet with type "SETUP"
    Note over Host,Device: Device at address X prepares control endpoint 0 for a request
    Host->>Device: DATA0 packet containing the setup request
    Note over Host,Device: Always 8 bytes + checksum
    Device->>Host: Handshake Packet (ACK)
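The 8-byte request carried by a SETUP transaction has a fixed layout of five fields, with multi-byte fields sent little-endian. The sketch below encodes two standard requests used during initialization: GET_DESCRIPTOR for the device descriptor and SET_ADDRESS. The struct and function names are hypothetical; Linux's own equivalent structure is `struct usb_ctrlrequest` in `include/uapi/linux/usb/ch9.h`.

```c
#include <assert.h>
#include <stdint.h>

/* The 8-byte request in the data packet of a SETUP transaction. */
struct setup_packet {
    uint8_t  bmRequestType; /* bit 7: direction (1 = device-to-host) */
    uint8_t  bRequest;      /* e.g. GET_DESCRIPTOR = 6, SET_ADDRESS = 5 */
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;       /* bytes expected in the data stage */
};

/* Serialize in wire order; 16-bit fields are little-endian. */
void setup_serialize(const struct setup_packet *s, uint8_t out[8])
{
    out[0] = s->bmRequestType;
    out[1] = s->bRequest;
    out[2] = (uint8_t)(s->wValue & 0xFF);
    out[3] = (uint8_t)(s->wValue >> 8);
    out[4] = (uint8_t)(s->wIndex & 0xFF);
    out[5] = (uint8_t)(s->wIndex >> 8);
    out[6] = (uint8_t)(s->wLength & 0xFF);
    out[7] = (uint8_t)(s->wLength >> 8);
}

/* GET_DESCRIPTOR(DEVICE): descriptor type 1 in the high byte of wValue;
 * a device descriptor is 18 bytes long. */
struct setup_packet get_device_descriptor(void)
{
    struct setup_packet s = { 0x80, 6, 0x0100, 0, 18 };
    return s;
}

/* SET_ADDRESS: the new address goes in wValue; there is no data stage. */
struct setup_packet set_address(uint8_t addr)
{
    assert(addr >= 1 && addr <= 127);
    struct setup_packet s = { 0x00, 5, addr, 0, 0 };
    return s;
}
```

Serializing `get_device_descriptor()` yields the bytes `80 06 00 01 00 00 12 00`.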
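These USB transactions are in turn the building blocks of the USB transfer layer. The USB protocol defines four types of transfers: control, interrupt, bulk, and isochronous. Control transfers are used to configure devices, while keyboards send their key reports using interrupt transfers.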
A control transfer, such as the ones used to set up a device, is a series of transactions. It starts with a SETUP transaction (the setup stage), is followed by an optional data stage of IN or OUT transactions that carry any requested data, and ends with a status stage: a zero-length transaction in the opposite direction that confirms the whole request has completed.
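The order of stages in a control transfer depends only on the direction bit of the request and on whether it carries any data. A minimal sketch with hypothetical names that lists the stages for a request; a long data stage would really be split into several transactions of at most the endpoint's maximum packet size, which this sketch collapses into a single entry.

```c
#include <assert.h>
#include <stdint.h>

enum stage {
    STAGE_SETUP,
    STAGE_DATA_IN,
    STAGE_DATA_OUT,
    STAGE_STATUS_IN,
    STAGE_STATUS_OUT
};

/* Write the stages of a control transfer into out[] and return how many
 * there are (2 or 3). The data stage direction follows bit 7 of
 * bmRequestType; the status stage is a zero-length packet in the opposite
 * direction, or IN when there is no data stage. */
int control_stages(uint8_t bmRequestType, uint16_t wLength, enum stage out[3])
{
    int n = 0;
    int dev_to_host = (bmRequestType & 0x80) != 0;

    out[n++] = STAGE_SETUP;
    if (wLength > 0) {
        out[n++] = dev_to_host ? STAGE_DATA_IN : STAGE_DATA_OUT;
        out[n++] = dev_to_host ? STAGE_STATUS_OUT : STAGE_STATUS_IN;
    } else {
        out[n++] = STAGE_STATUS_IN;
    }
    return n;
}
```

A GET_DESCRIPTOR request (bmRequestType 0x80) therefore runs SETUP, data IN, status OUT, while SET_ADDRESS (no data) runs just SETUP, status IN.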
The USB bus consists of three different types of devices. The host is the leader of all USB communication on the bus and initializes new devices; it is the only USB device that can initiate communication with any other USB device. USB hubs are simple repeaters that forward packets from the host down to the connected devices and from those devices back up towards the host. USB devices are the actual things most people interact with, such as keyboards, mice, or flash drives.
When a new keyboard (or any other USB device) is plugged into a USB port, the host begins initializing it. The host continuously monitors its ports and detects when a new device has been added to the bus; the root hub inside the host controller reports this as a port status change. Once the host detects the new device, it signals a USB reset on that port. The newly attached device resets to a default state with USB address 0, and the host can then initialize it by sending USB packets to address 0. Since the host resets only one newly connected device at a time, and devices never transmit on their own, there is only ever one device at address 0 and no address conflicts can occur.
If the device is attached to a USB hub instead, the hub detects the connection and flags a port status change, which it reports on its own interrupt endpoint. Once the host reads the hub's endpoint and is notified that a new device was attached, the host asks the hub to reset that port and then assigns the device a unique address. In all other USB transactions, hubs act as transparent repeaters and can be ignored for our purposes.
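Handing out unique addresses can be pictured as a small allocator. This is a hypothetical host-side sketch that assumes software picks the address (with xHCI, the controller itself selects the address when the driver issues an Address Device command). Address 0 is never handed out because it is reserved for a device that has just been reset, and the 7-bit address field allows at most 127 devices per bus.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-bus address table; index = USB address. */
struct usb_bus_addrs {
    bool used[128];
};

/* Return the lowest free address in 1..127, or -1 if the bus is full. */
int alloc_address(struct usb_bus_addrs *bus)
{
    for (int a = 1; a < 128; a++) {
        if (!bus->used[a]) {
            bus->used[a] = true;
            return a;
        }
    }
    return -1;
}

/* Release an address when its device is unplugged. */
void free_address(struct usb_bus_addrs *bus, int a)
{
    assert(a >= 1 && a < 128);
    bus->used[a] = false;
}
```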
While assigning the new address, the host also determines what kind of device was actually attached. It sends a GET_DESCRIPTOR request for the device descriptor, which reports the device class, subclass, and protocol, as well as the vendor and product IDs. The host then requests the configuration descriptor, which includes the interface descriptors (each with its own class, subclass, and protocol) and the endpoint descriptors for every endpoint an interface uses. For a USB keyboard, the HID class is usually declared in the interface descriptor rather than in the device descriptor, and the HID interface contains an interrupt IN endpoint over which the keyboard sends its key reports. The host uses this data to decide which driver to load for the device.
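Finding a keyboard's endpoint amounts to walking the configuration descriptor, which is a packed sequence of descriptors that each begin with their length and type. The sketch assumes the interface advertises the HID boot protocol (interface class 3, subclass 1, protocol 1 for a keyboard); `find_keyboard_endpoint` is a hypothetical helper. Descriptors it does not care about, such as the HID class descriptor (type 0x21) that sits between the interface and endpoint descriptors, are simply skipped using their length byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DT_INTERFACE 0x04
#define DT_ENDPOINT  0x05
#define CLASS_HID    0x03

/* Walk a configuration descriptor blob and return the address of the
 * first interrupt IN endpoint of a HID boot-keyboard interface, or -1. */
int find_keyboard_endpoint(const uint8_t *buf, size_t len)
{
    int in_keyboard_iface = 0;
    size_t off = 0;

    while (off + 2 <= len) {
        uint8_t blen = buf[off];
        uint8_t type = buf[off + 1];
        if (blen < 2 || off + blen > len)
            break;                                   /* malformed data */
        if (type == DT_INTERFACE && blen >= 9) {
            in_keyboard_iface = buf[off + 5] == CLASS_HID && /* class */
                                buf[off + 6] == 1 &&         /* boot */
                                buf[off + 7] == 1;           /* keyboard */
        } else if (type == DT_ENDPOINT && blen >= 7 && in_keyboard_iface) {
            uint8_t addr = buf[off + 2];             /* bEndpointAddress */
            uint8_t attrs = buf[off + 3];            /* bmAttributes */
            if ((addr & 0x80) && (attrs & 0x03) == 0x03) /* IN, interrupt */
                return addr;
        }
        off += blen;
    }
    return -1;
}
```

The returned value includes the direction bit, so endpoint 1 IN is reported as 0x81.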
Linux HID Driver Initialization
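The packet- and transaction-level details covered so far are handled by the host controller hardware. Enumeration itself (resetting the port, assigning the address, and reading the descriptors) is driven by the kernel's USB core, which issues control transfers through the host controller driver. Now that the new USB device has been enumerated on the USB bus, a device driver must be bound to it to make use of the device. Each USB device driver defines a struct usb_driver, declared in include/linux/usb.h, and registers it with the USB core, typically by calling usb_register() when the driver module is loaded: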
/* Excerpt; the exact fields vary between kernel versions. */
struct usb_driver {
    const char *name;                /* driver name, e.g. "usbhid" */
    /* asked whether the driver will manage an interface matched by
     * id_table; returning 0 binds the driver to that interface */
    int (*probe) (struct usb_interface *intf,
                  const struct usb_device_id *id);
    /* the interface is gone: device unplugged or driver unloading */
    void (*disconnect) (struct usb_interface *intf);
    int (*unlocked_ioctl) (struct usb_interface *intf, unsigned int code,
                           void *buf);
    /* power management and reset callbacks */
    int (*suspend) (struct usb_interface *intf, pm_message_t message);
    int (*resume) (struct usb_interface *intf);
    int (*reset_resume)(struct usb_interface *intf);
    int (*pre_reset)(struct usb_interface *intf);
    int (*post_reset)(struct usb_interface *intf);
    /* devices/interfaces this driver supports */
    const struct usb_device_id *id_table;
    const struct attribute_group **dev_groups;
    struct usb_dynids dynids;
    struct usbdrv_wrap drvwrap;
    unsigned int no_dynamic_id:1;
    unsigned int supports_autosuspend:1;
    unsigned int disable_hub_initiated_lpm:1;
    unsigned int soft_unbind:1;
};
The generic USB HID driver in drivers/hid/usbhid/hid-core.c creates an instance of this usb_driver struct whose id_table matches every USB interface of class USB_INTERFACE_CLASS_HID. This includes mice, joysticks, and keyboards. When a new interface appears, the USB core compares its descriptors against the id_table of each registered driver and calls the probe() callback of a matching driver; if probe() returns 0, that driver is bound to the interface.
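The matching step can be illustrated with a userspace model. The types here are simplified, hypothetical stand-ins for the kernel's `struct usb_device_id` and `struct usb_driver`; `MATCH_INT_CLASS` loosely mirrors the kernel's `USB_DEVICE_ID_MATCH_INT_CLASS` flag, and the vendor ID 0x1234 in the example table is made up. In this model, a driver whose id_table matches but whose probe() declines the interface is skipped, and the next matching driver gets a chance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MATCH_VENDOR    0x1
#define MATCH_INT_CLASS 0x2

struct id_entry {               /* stand-in for struct usb_device_id */
    unsigned match_flags;       /* 0 terminates a table */
    uint16_t idVendor;
    uint8_t  bInterfaceClass;
};

struct iface_info {             /* what the core knows about an interface */
    uint16_t idVendor;
    uint8_t  bInterfaceClass;
};

struct driver {                 /* stand-in for struct usb_driver */
    const char *name;
    const struct id_entry *id_table;
    int (*probe)(const struct iface_info *);
};

static int entry_matches(const struct id_entry *id, const struct iface_info *in)
{
    if ((id->match_flags & MATCH_VENDOR) && id->idVendor != in->idVendor)
        return 0;
    if ((id->match_flags & MATCH_INT_CLASS) &&
        id->bInterfaceClass != in->bInterfaceClass)
        return 0;
    return 1;
}

/* Bind the first driver that has a matching id_table entry and whose
 * probe() returns 0; return NULL if no driver takes the interface. */
const struct driver *bind_interface(const struct driver *drivers, size_t n,
                                    const struct iface_info *in)
{
    for (size_t i = 0; i < n; i++) {
        const struct id_entry *id = drivers[i].id_table;
        while (id->match_flags && !entry_matches(id, in))
            id++;
        if (id->match_flags && drivers[i].probe(in) == 0)
            return &drivers[i];
    }
    return NULL;
}

/* Example drivers: a vendor-specific one that declines the interface and
 * a generic one that accepts any interface of class 3 (HID). */
static int probe_accept(const struct iface_info *in)  { (void)in; return 0; }
static int probe_decline(const struct iface_info *in) { (void)in; return -1; }

static const struct id_entry vendor_ids[] = { { MATCH_VENDOR, 0x1234, 0 }, { 0 } };
static const struct id_entry hid_ids[]    = { { MATCH_INT_CLASS, 0, 0x03 }, { 0 } };

static const struct driver example_drivers[] = {
    { "vendor-specific", vendor_ids, probe_decline },
    { "generic-hid",     hid_ids,    probe_accept  },
};
```

With these tables, a HID interface from vendor 0x1234 first reaches the vendor-specific driver, which declines it, and ends up bound to `generic-hid`.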
The probe function is also where the driver creates any structures it needs, stores the endpoint information for later use, and performs any other initialization. The driver then connects to, initializes, and opens the device. In the case of the USB HID driver, hid-core.c encapsulates the USB I/O and the USB HID transport protocol and registers a generic HID device with the kernel's HID core. This way, the same HID and keyboard handling code can be used whether the keyboard is connected over USB, Bluetooth, or I2C. Linux also exposes the raw HID reports of each HID device to userspace as /dev/hidrawX, where X is a number that depends on the order in which HID devices were detected.
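The host controller polls the keyboard's interrupt IN endpoint at the interval requested in its endpoint descriptor, and each poll returns the current state of the keys (or a NAK when the keyboard has no new report). When a report arrives, the host controller raises an interrupt; the host controller driver completes the pending USB request and hands the report to the USB HID driver. The HID core then parses the report using the device's report descriptor and maps the raw HID usages to Linux input events. The unmodified report is also made available to userspace through /dev/hidrawX.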
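The report arriving on the keyboard's interrupt endpoint can be decoded by hand when the keyboard uses the HID boot protocol, whose 8-byte layout is fixed: a modifier byte, a reserved byte, and up to six usage IDs of the keys currently held down. This sketch assumes the boot protocol (in report protocol the layout comes from the report descriptor) and converts only letters and digits to ASCII. The kernel itself maps usages to keycodes such as KEY_A rather than to characters, so this illustrates the report format, not the kernel's mapping; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift bits in byte 0 (modifiers) of a boot keyboard report. */
#define MOD_LSHIFT 0x02
#define MOD_RSHIFT 0x20

/* HID Keyboard/Keypad usage ID -> ASCII for letters and digits only.
 * Shift only affects letters in this sketch; returns 0 otherwise. */
char usage_to_ascii(uint8_t usage, int shift)
{
    if (usage >= 0x04 && usage <= 0x1D)            /* a..z */
        return (char)((shift ? 'A' : 'a') + (usage - 0x04));
    if (usage >= 0x1E && usage <= 0x26)            /* 1..9 */
        return (char)('1' + (usage - 0x1E));
    if (usage == 0x27)                             /* 0 */
        return '0';
    return 0;
}

/* Bytes 2..7 hold usage IDs of pressed keys (0 = empty slot).
 * Return the first character this sketch can translate, or 0. */
char first_char_in_report(const uint8_t report[8])
{
    int shift = (report[0] & (MOD_LSHIFT | MOD_RSHIFT)) != 0;
    for (int i = 2; i < 8; i++) {
        char c = usage_to_ascii(report[i], shift);
        if (c)
            return c;
    }
    return 0;
}
```

For example, the report `02 00 1D 00 00 00 00 00` (left Shift held, usage 0x1D pressed) decodes to `'Z'`.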
Input Drivers
These HID events need to be translated into actual characters. The HID core driver discussed above is generic over many different HID devices and provides a framework for various HID transport and HID device protocols, covering devices such as touchscreens, backlights, mice, or even drawing tablets. These HID devices can be accessed over USB, I2C, or Bluetooth.
This conversion is handled by an input driver. For HID devices, the hid-input part of the HID core registers an input device for each keyboard and maps HID usages to Linux keycodes such as KEY_A. The kernel's evdev handler then exposes every input device as a character device under /dev/input/ (for example /dev/input/event3) that can be used with the open(), read(), write(), close(), etc. syscalls. Each read() returns one or more struct input_event records, and user-mode programs often go through the libevdev library instead of issuing these syscalls directly. On a console-only system, the kernel's keyboard handler translates keycodes into characters using the current keymap and feeds them to the active virtual console (/dev/ttyN). The foreground process on that terminal then receives these key presses by reading from its standard input.
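Reading evdev events directly can be sketched with plain read() instead of libevdev. Each struct input_event from `<linux/input.h>` carries a type (EV_KEY for keys), a code (such as KEY_A), and a value (1 for press, 0 for release, 2 for autorepeat). The helper names `read_events` and `count_presses` are hypothetical, and opening a real /dev/input/eventX node usually requires root or membership in a group with read access to it.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <linux/input.h>

/* Read up to `max` whole input_event records from an evdev file
 * descriptor. Returns the number of events read, or -1 on error. */
ssize_t read_events(int fd, struct input_event *ev, size_t max)
{
    assert(ev != NULL);
    ssize_t n = read(fd, ev, max * sizeof *ev);
    if (n < 0)
        return -1;
    return n / (ssize_t)sizeof *ev;
}

/* Count presses (value 1) of one key; 0 is release, 2 is autorepeat. */
size_t count_presses(const struct input_event *ev, size_t n, unsigned short code)
{
    size_t presses = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].type == EV_KEY && ev[i].code == code && ev[i].value == 1)
            presses++;
    }
    return presses;
}
```

On a real system, `fd` would come from `open("/dev/input/eventX", O_RDONLY)` with X replaced by the keyboard's event number; each key change is followed by an EV_SYN/SYN_REPORT event marking the end of that batch.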
For graphical systems such as X, the X server uses libevdev to read the input device files located in /dev/input/. This way, the X server is responsible for all input on the system. Since all X client programs receive input and draw to the screen through the X Window System protocol, the X server controls which program gets which input. However, a program running as root can still bypass the X server and read the input devices directly.