warren.hu

USB Basics

How is a key press processed and displayed?

Modern computers are built upon many layers of abstraction. Most programs people encounter do not need to worry about hardware details because everything has been standardized and abstracted away. Let’s break down the abstractions and understand how a program like GNOME Terminal reads a keypress from the user on Linux.

Physical Layer

Assuming a standard USB keyboard, pressing down on a key will connect a GPIO (General Purpose Input/Output) pin inside of the keyboard microcontroller to ground or VCC. In most microcontrollers, this input pin will be connected to a Schmitt trigger to provide a strong signal to some data register. The keyboard microcontroller will then either trigger an interrupt, which causes the keyboard’s firmware to jump to an interrupt handler, or the keyboard firmware will periodically poll the state of the keys a few hundred or a few thousand times a second. In either case, the keyboard firmware now has a snapshot of the state of the keyboard. Now, the keyboard firmware simply needs to somehow communicate this keyboard state to the computer through USB.
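
As a rough illustration of what "a snapshot of the state of the keyboard" can mean in firmware, the sketch below compares two successive polls and reports which keys changed. The 16-key bitmap layout and the names `key_changes` and `diff_snapshots` are assumptions made for this example, not taken from any particular keyboard firmware, and debouncing is left out.

```c
#include <stdint.h>

/* Hypothetical 16-key keyboard: bit i is 1 while key i is held down. */
typedef struct {
    uint16_t pressed;  /* keys that went down since the previous scan */
    uint16_t released; /* keys that came up since the previous scan */
} key_changes;

/* Compare two snapshots taken by successive polls of the input pins. */
static key_changes diff_snapshots(uint16_t previous, uint16_t current)
{
    uint16_t changed = (uint16_t)(previous ^ current);
    key_changes out;

    out.pressed = (uint16_t)(changed & current);
    out.released = (uint16_t)(changed & previous);
    return out;
}
```

A polling firmware loop would call `diff_snapshots` after every scan and only prepare a new report for the host when either field is non-zero.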

USB Layer

USB is a host-controlled bus, which means that there can only be one host on the entire bus. In modern amd64 computers, the USB host device is either integrated directly with the CPU or is a separate component located on the motherboard (PCH) and connected over a PCI or DMI bus to the CPU. The USB host device handles the lower levels of the USB stack including physical signaling, polling, USB device packets, and USB transactions. This is usually done in hardware as part of implementing the xHCI standard. The operating system then only needs to worry about the higher levels of the USB stack, namely the USB transfer layer.

Before discussing the USB transfer layer, we first need to discuss the USB transaction layer. A USB transaction consists of a stream of 3 main types of packets: Token, Data, and Handshake packets. A token packet contains the address of the device that the USB host is communicating with as well as the endpoint and read/write direction. The data packet contains variable-length data, depending on the device and drivers, followed by a 16-bit CRC. Up to 1024 bytes of data can be sent in a single data packet (USB high-speed). Finally, a handshake packet signals the result of a transaction and contains either an ACK (success), NAK (not ready, so the host should retry later), or STALL (unretryable failure).
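
To make the packet types a little more concrete, here is a minimal sketch of how a packet identifier (PID) and the address/endpoint fields of a token could be encoded. The PID values follow the USB 2.0 specification; the function names are invented for this example, and the token's 5-bit CRC is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4-bit packet identifiers as defined by the USB 2.0 specification. */
enum usb_pid {
    PID_OUT = 0x1, PID_IN = 0x9, PID_SETUP = 0xD,  /* token packets */
    PID_DATA0 = 0x3, PID_DATA1 = 0xB,              /* data packets */
    PID_ACK = 0x2, PID_NAK = 0xA, PID_STALL = 0xE  /* handshake packets */
};

/* On the wire, the PID is followed by its bitwise complement as a check. */
static uint8_t usb_pid_byte(enum usb_pid pid)
{
    uint8_t low = (uint8_t)(pid & 0xF);

    return (uint8_t)(low | ((low ^ 0xF) << 4));
}

/* A receiver can reject a PID byte whose two halves do not match. */
static bool usb_pid_byte_is_valid(uint8_t byte)
{
    return ((byte >> 4) ^ 0xF) == (byte & 0xF);
}

/* 7-bit device address in the low bits, 4-bit endpoint above it.
 * A real token packet appends a 5-bit CRC, which is omitted here. */
static uint16_t usb_token_fields(uint8_t address, uint8_t endpoint)
{
    return (uint16_t)((address & 0x7F) | ((endpoint & 0xF) << 7));
}
```

For example, `usb_pid_byte(PID_SETUP)` yields `0x2D`, the byte that opens every SETUP token.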

These packets are used in the USB transaction layer. There are three main types of USB transactions: OUT, IN, and SETUP. All transactions must begin with a packet sent by the host. USB devices are never allowed to initiate any transactions and can only respond to the host.

An OUT transaction is used to write data to device X on endpoint Y (data leaves the host). It consists of three packets:

sequenceDiagram
    Host->>Device: Token Packet with type "OUT"
    Note over Host,Device: Device at address X and endpoint Y begins listening
    Host->>Device: Data packet
    Note over Host,Device: 0–1,024 bytes long + checksum
    Device->>Host: Handshake Packet (ACK, NAK, or STALL)
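
The way a host might react to each handshake of an OUT transaction can be sketched as a small retry loop. This is a simplified model under stated assumptions: the device is represented by a hypothetical callback (`device_fn`), the retry limit is arbitrary, and real host controllers implement this logic in hardware with timing and scheduling rules that are not shown.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { HS_ACK, HS_NAK, HS_STALL } handshake;
typedef enum { XFER_OK, XFER_GAVE_UP, XFER_STALLED } xfer_result;

/* Hypothetical device model: receives the data packet, answers with a handshake. */
typedef handshake (*device_fn)(void *ctx, const uint8_t *data, size_t len);

/* Repeat the OUT transaction until the device ACKs, STALLs, or we give up. */
static xfer_result out_transaction(device_fn dev, void *ctx,
                                   const uint8_t *data, size_t len,
                                   int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        switch (dev(ctx, data, len)) {
        case HS_ACK:
            return XFER_OK;       /* data accepted */
        case HS_STALL:
            return XFER_STALLED;  /* retrying will not help */
        case HS_NAK:
            break;                /* device busy, try again */
        }
    }
    return XFER_GAVE_UP;
}

/* Example device that is busy for a fixed number of attempts. */
typedef struct { int naks_left; } busy_device;

static handshake busy_device_receive(void *ctx, const uint8_t *data, size_t len)
{
    busy_device *dev = ctx;

    (void)data;
    (void)len;
    if (dev->naks_left > 0) {
        dev->naks_left--;
        return HS_NAK;
    }
    return HS_ACK;
}
```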

An IN transaction is used to signal to a device at address X to write the data stored in endpoint Y to the host. It is very similar to the above OUT transaction except the data flows from the device into the host.

sequenceDiagram
    Host->>Device: Token Packet with type "IN"
    Note over Host,Device: Device at address X begins sending data from endpoint Y
    Device->>Host: Data packet
    Note over Device,Host: 0–1,024 bytes long + checksum
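    Note over Host,Device: If the device has no data ready or the endpoint is halted, it answers with NAK or STALL instead of a Data packet
    Host->>Device: Handshake Packet (ACK)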

Finally, the SETUP transaction is used by the host to gather information from a newly attached device and begin device initialization. The host device writes a setup packet, which is always 8 bytes long, to the control endpoint (endpoint 0) of a new device. These setup transactions are used for setting the address of the USB device, checking the status of a particular endpoint, or gathering information about the device. Since the device has not yet been assigned an address, newly attached USB devices will not listen for USB packets until the host sends a USB reset command. The host will only reset a single device at a time so only one device will be listening on the default address 0. The host will send multiple SETUP transactions in the process of initializing a new device. There are again three stages to the transaction:

sequenceDiagram
    Host->>Device: Token Packet with type "SETUP"
    Note over Host,Device: Device at address X enters setup mode
    Host->>Device: SETUP packet
    Note over Host,Device: Eight-byte setup packet + checksum
    Device->>Host: Handshake Packet (ACK)
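
The payload of a SETUP transaction has a fixed 8-byte layout in the USB 2.0 specification. The sketch below builds two of the standard requests used during initialization, assigning an address and asking for the device descriptor. The field names come from the specification; the helper names (`setup_to_bytes`, `make_set_address`, `make_get_device_descriptor`) are made up for this example.

```c
#include <stdint.h>

typedef struct {
    uint8_t  bmRequestType; /* direction, request type, and recipient bits */
    uint8_t  bRequest;      /* which request is being made */
    uint16_t wValue;        /* request-specific parameter */
    uint16_t wIndex;        /* request-specific index */
    uint16_t wLength;       /* number of bytes in the following data stage */
} usb_setup_packet;

enum {
    REQ_SET_ADDRESS    = 0x05,
    REQ_GET_DESCRIPTOR = 0x06,
    DESC_TYPE_DEVICE   = 0x01
};

/* Serialize to the 8 bytes sent on the bus (multi-byte fields are little-endian). */
static void setup_to_bytes(const usb_setup_packet *p, uint8_t out[8])
{
    out[0] = p->bmRequestType;
    out[1] = p->bRequest;
    out[2] = (uint8_t)(p->wValue & 0xFF);
    out[3] = (uint8_t)(p->wValue >> 8);
    out[4] = (uint8_t)(p->wIndex & 0xFF);
    out[5] = (uint8_t)(p->wIndex >> 8);
    out[6] = (uint8_t)(p->wLength & 0xFF);
    out[7] = (uint8_t)(p->wLength >> 8);
}

/* Host-to-device request that moves the device off default address 0. */
static usb_setup_packet make_set_address(uint8_t address)
{
    usb_setup_packet p = { 0x00, REQ_SET_ADDRESS, address, 0, 0 };
    return p;
}

/* Device-to-host request for the 18-byte device descriptor. */
static usb_setup_packet make_get_device_descriptor(void)
{
    usb_setup_packet p = { 0x80, REQ_GET_DESCRIPTOR, DESC_TYPE_DEVICE << 8, 0, 18 };
    return p;
}
```

Serializing `make_get_device_descriptor()` gives the bytes `80 06 00 01 00 00 12 00`, a sequence that shows up near the start of most USB enumeration traces.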

These USB transactions are used in the USB transfer layer. The USB protocol allows for four possible types of USB transfers: control, bulk, interrupt, and isochronous.

To set up a device, the USB host sends a series of transactions. It first performs a SETUP transaction, followed by an optional IN transaction as the device responds with any requested data, and finishes with a dummy zero-length transaction that signals the end of the setup process.

The USB bus consists of three different types of devices. The Host device is the leader of all USB communications on the bus and will initialize new devices. The Host is the only USB device that can initiate communication with any other USB device. USB Hubs are simple repeaters that will repeat packets from connected downstream USB devices upwards towards the root device and vice-versa. The USB Devices are the actual things most people interact with such as keyboards, mice, or flash drives.

When a new keyboard (or any new USB device) is plugged into a USB port, the host will begin initializing it. The host is constantly polling for new devices (at around 1000 Hz) and can detect when a new device was added to the bus. Once the host detects a new device, the host will signal a USB reset command to the new device. The newly attached device will reset to a default state with USB address 0. The host can then initialize the new device by sending USB packets to address 0. Since the host is the only device that can initiate communication, and it resets only one new device at a time, it is guaranteed that there will only be one device at a time with address 0 and no address conflicts will occur.
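
After the reset, the host has to pick an address that no other device on the bus is using. A USB address is 7 bits wide and 0 is reserved as the default address, which leaves 1 to 127. The allocator below is only a sketch of that bookkeeping; the `address_pool` type and function names are hypothetical and do not mirror how any particular host stack stores this state.

```c
#include <stdbool.h>

#define USB_MAX_ADDRESS 127 /* 7-bit address space, 0 is the default address */

typedef struct {
    bool in_use[USB_MAX_ADDRESS + 1];
} address_pool;

/* Return the lowest free address, or -1 if all 127 are taken. */
static int allocate_address(address_pool *pool)
{
    for (int address = 1; address <= USB_MAX_ADDRESS; address++) {
        if (!pool->in_use[address]) {
            pool->in_use[address] = true;
            return address;
        }
    }
    return -1;
}

/* Called when a device is unplugged so its address can be reused. */
static void release_address(address_pool *pool, int address)
{
    if (address >= 1 && address <= USB_MAX_ADDRESS)
        pool->in_use[address] = false;
}
```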

If the device was attached to a USB hub, then the hub itself actually handles initial setup. The hub will then flag that a new device was connected. Once the host reads the hub endpoint and is notified that a new device was attached, the host proceeds to reset the device and assign it a unique address. In all other USB transactions, hubs act as transparent repeaters and can be ignored for our purposes.

While assigning the new address, the host will also determine what device was actually attached. The USB host will send a Get Descriptor request for the device descriptor, which tells the host the USB class of the device, the device protocol, and the vendor and product IDs. The host will additionally enumerate the interface descriptors, which contain information about the supported USB endpoints and the class of each interface. For a USB keyboard, the class is USB HID and the device will contain a HID interface endpoint. The host will use the above data to determine what driver to load to use the device.
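
The device descriptor returned by the device is an 18-byte structure whose layout is fixed by the USB 2.0 specification. The following sketch pulls out the fields mentioned above. The `device_info` type and the `parse_device_descriptor` function are names chosen for this example; a real host stack would keep more fields and validate more carefully.

```c
#include <stddef.h>
#include <stdint.h>

#define DEVICE_DESCRIPTOR_LENGTH 18
#define DEVICE_DESCRIPTOR_TYPE   0x01

typedef struct {
    uint16_t bcdUSB;          /* USB version, e.g. 0x0200 for USB 2.0 */
    uint8_t  bDeviceClass;
    uint8_t  bDeviceSubClass;
    uint8_t  bDeviceProtocol;
    uint8_t  bMaxPacketSize0; /* max packet size of endpoint 0 */
    uint16_t idVendor;
    uint16_t idProduct;
    uint8_t  bNumConfigurations;
} device_info;

/* Descriptor fields are little-endian on the wire. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is not a device descriptor. */
static int parse_device_descriptor(const uint8_t *buf, size_t len, device_info *out)
{
    if (len < DEVICE_DESCRIPTOR_LENGTH ||
        buf[0] != DEVICE_DESCRIPTOR_LENGTH ||
        buf[1] != DEVICE_DESCRIPTOR_TYPE)
        return -1;

    out->bcdUSB = read_le16(buf + 2);
    out->bDeviceClass = buf[4];
    out->bDeviceSubClass = buf[5];
    out->bDeviceProtocol = buf[6];
    out->bMaxPacketSize0 = buf[7];
    out->idVendor = read_le16(buf + 8);
    out->idProduct = read_le16(buf + 10);
    out->bNumConfigurations = buf[17];
    return 0;
}
```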

Linux HID Driver Initialization

Everything before this point is usually implemented in silicon. The operating system does not need to handle these low-level details. However, now that the new USB device has been initialized on the USB bus, a device driver must be loaded and used to initialize the device. During system initialization, each USB device driver creates a struct usb_driver defined in include/linux/usb.h.

struct usb_driver {
   const char *name;

   int (*probe) (struct usb_interface *intf,
             const struct usb_device_id *id);

   void (*disconnect) (struct usb_interface *intf);

   int (*unlocked_ioctl) (struct usb_interface *intf, unsigned int code,
           void *buf);

   int (*suspend) (struct usb_interface *intf, pm_message_t message);
   int (*resume) (struct usb_interface *intf);
   int (*reset_resume)(struct usb_interface *intf);

   int (*pre_reset)(struct usb_interface *intf);
   int (*post_reset)(struct usb_interface *intf);

   const struct usb_device_id *id_table;
   const struct attribute_group **dev_groups;

   struct usb_dynids dynids;
   struct usbdrv_wrap drvwrap;
   unsigned int no_dynamic_id:1;
   unsigned int supports_autosuspend:1;
   unsigned int disable_hub_initiated_lpm:1;
   unsigned int soft_unbind:1;
}; 

The generic USB HID device driver located in drivers/hid/usbhid/hid-core.c creates an instance of this usb_driver struct which supports all USB interfaces of class USB_INTERFACE_CLASS_HID. This includes mice, joysticks, and keyboards. When the Linux kernel is looking for the correct USB driver to load, it calls the probe function callback to get more information about the driver. It then picks the most specific driver for the attached device. This probe function is also a place for the driver to begin creating any required structures, storing the endpoint information for later use, or any other initialization. The driver will then connect to the device, initialize the device, and open the device. In the case of the USB HID driver, the hid-core driver encapsulates the USB I/O and USB-HID transport protocol. The raw HID events are then exposed as a special HID device. This way the same generic keyboard driver can be used even if it’s a USB, Bluetooth, or PS/2 keyboard. For a USB keyboard, Linux exposes this raw HID device under /dev/usb/hiddevX where X is a number that changes depending on the number of HID devices.
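
The idea of matching an attached interface against each driver's table of supported IDs can be modeled in a few lines of userspace C. This is a toy model, not kernel code: `toy_device_id`, `toy_driver`, and `find_driver` are hypothetical stand-ins, and the real `usb_device_id` entries can also match on vendor, product, subclass, and protocol. The value 0x03 is the HID interface class, which is what `USB_INTERFACE_CLASS_HID` stands for.

```c
#include <stddef.h>
#include <stdint.h>

#define CLASS_HID          0x03 /* same value as USB_INTERFACE_CLASS_HID */
#define CLASS_MASS_STORAGE 0x08

/* Toy stand-in for an ID table entry; class 0 terminates a table. */
typedef struct {
    uint8_t interface_class;
} toy_device_id;

/* Toy stand-in for a driver: a name plus the IDs it supports. */
typedef struct {
    const char *name;
    const toy_device_id *id_table;
} toy_driver;

/* Return the first registered driver whose table lists the interface class. */
static const toy_driver *find_driver(const toy_driver *drivers, size_t count,
                                     uint8_t interface_class)
{
    for (size_t i = 0; i < count; i++) {
        const toy_device_id *id;

        for (id = drivers[i].id_table; id->interface_class != 0; id++) {
            if (id->interface_class == interface_class)
                return &drivers[i];
        }
    }
    return NULL;
}
```

In this model, a keyboard interface reporting class 0x03 would be handed to whichever toy driver lists `CLASS_HID` in its table, and an interface with no matching entry gets no driver at all.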

Each time the USB host scans the USB keyboard, it will be able to read the current state of the keys. This triggers an interrupt in the USB driver that then passes it to the USB HID driver. The USB HID driver will then map the raw USB HID events to Linux specific HID events. These events are then exposed as raw HID events to userspace under /dev/hid/.
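
To give a feel for what "the current state of the keys" looks like on the wire, the sketch below assumes the keyboard uses the 8-byte HID boot protocol report: one modifier byte, one reserved byte, and up to six key usage IDs. Real keyboards may describe a different layout in their report descriptor, so treat this format as an assumption. The function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_REPORT_SIZE  8
#define FIRST_KEY_SLOT    2    /* byte 0: modifiers, byte 1: reserved */
#define FIRST_REAL_USAGE  0x04 /* 0x00-0x03 mean "no key" or error states */

/* Is the given usage ID listed among the six key slots of a report? */
static bool report_contains(const uint8_t report[BOOT_REPORT_SIZE], uint8_t usage)
{
    for (size_t i = FIRST_KEY_SLOT; i < BOOT_REPORT_SIZE; i++) {
        if (report[i] == usage)
            return true;
    }
    return false;
}

/* Collect usages present in the current report but absent from the previous
 * one, i.e. keys pressed since the last poll. Returns how many were found. */
static size_t newly_pressed(const uint8_t previous[BOOT_REPORT_SIZE],
                            const uint8_t current[BOOT_REPORT_SIZE],
                            uint8_t out[BOOT_REPORT_SIZE - FIRST_KEY_SLOT])
{
    size_t count = 0;

    for (size_t i = FIRST_KEY_SLOT; i < BOOT_REPORT_SIZE; i++) {
        uint8_t usage = current[i];

        if (usage >= FIRST_REAL_USAGE && !report_contains(previous, usage))
            out[count++] = usage;
    }
    return count;
}
```

With usage 0x04 ("a") held in both reports and usage 0x05 ("b") appearing only in the newer one, `newly_pressed` reports a single new key, 0x05.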

Input Drivers

These HID events need to be translated into actual characters. The HID core driver discussed above is generic over many different HID devices and provides a framework for various HID transport and HID device protocols including touchscreens, backlights, mice, or even drawing tablets. These HID devices can be accessed over USB, I2C, or Bluetooth.

This conversion is handled by an input driver. These input drivers are loaded by the kernel and create special devices in /dev/input/ that can be interacted with using the read(), write(), open(), close(), etc. syscalls. In the case of a keyboard, the input driver handles the conversion between HID events and keycodes. These drivers usually implement the evdev interface so that usermode programs can interact with these input files through the libevdev library. For a console-only system, the kernel forwards these events to the appropriate pseudo-tty. The process that is currently running on the pseudo-tty can then receive these key presses by reading the stdin file by default.
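
Without libevdev, a program can read the evdev interface directly: each read() on an input device node returns one or more `struct input_event` records from `<linux/input.h>`. The sketch below skips everything except key events. It assumes `fd` is an already opened input device node under /dev/input/ (which normally requires suitable permissions); `read_key_event` and `key_state_name` are helper names made up for this example.

```c
#include <linux/input.h>
#include <unistd.h>

/* Read events from fd until a key event arrives.
 * Returns 1 and fills *ev on success, 0 on end of file, error, or short read. */
static int read_key_event(int fd, struct input_event *ev)
{
    for (;;) {
        ssize_t n = read(fd, ev, sizeof *ev);

        if (n != (ssize_t)sizeof *ev)
            return 0;
        if (ev->type == EV_KEY)
            return 1; /* ev->code is the keycode, e.g. KEY_A */
        /* Other event types, such as EV_SYN markers, are skipped. */
    }
}

/* For EV_KEY events, the value field encodes the key state. */
static const char *key_state_name(int value)
{
    switch (value) {
    case 0:
        return "released";
    case 1:
        return "pressed";
    case 2:
        return "repeated";
    default:
        return "unknown";
    }
}
```

A small monitoring tool could call `read_key_event` in a loop and print `ev.code` together with `key_state_name(ev.value)` for every key event it receives.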

For graphical systems such as X, libevdev is used to read the input device file located in /dev/input/. This way, the X server is responsible for all inputs on the system. Since all X client programs get input and draw to the screen through the X Window System API, the X server is able to control which program gets which input. However, if a program is running as root, it is still possible to bypass the X server and just read the input device directly.