On Mon, Nov 21, 2022 at 12:02 AM Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> wrote:
On 11/19/2022 1:44 PM, Oded Gabbay wrote:
Add an introduction section for the accel subsystem. Most of the
relevant data is in the DRM documentation, so the introduction only
presents the why of the new subsystem, how are the compute accelerators
exposed to user-space and what changes need to be done in a standard
DRM driver to register it to the new accel subsystem.
Signed-off-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
---
Documentation/accel/index.rst | 17 +++++
Documentation/accel/introduction.rst | 109 +++++++++++++++++++++++++++
Documentation/subsystem-apis.rst | 1 +
MAINTAINERS | 1 +
4 files changed, 128 insertions(+)
create mode 100644 Documentation/accel/index.rst
create mode 100644 Documentation/accel/introduction.rst
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
new file mode 100644
index 000000000000..2b43c9a7f67b
--- /dev/null
+++ b/Documentation/accel/index.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Compute Accelerators
+====================
+
+.. toctree::
+ :maxdepth: 1
+
+ introduction
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/accel/introduction.rst b/Documentation/accel/introduction.rst
new file mode 100644
index 000000000000..5a3963eae973
--- /dev/null
+++ b/Documentation/accel/introduction.rst
@@ -0,0 +1,109 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Introduction
+============
+
+The Linux compute accelerators subsystem is designed to expose compute
+accelerators in a common way to user-space and provide a common set of
+functionality.
+
+These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
+Although these devices are typically designed to accelerate Machine-Learning
+and/or Deep-Learning computations, the accel layer is not limited to handling
You use "DL" later on as a short form for Deep-Learning. It would be
good to introduce that here.
+these types of accelerators.
+
+typically, a compute accelerator will belong to one of the following
Typically
+categories:
+
+- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
+ or an IP inside a SoC (e.g. laptop web camera). These devices
+ are typically configured using registers and can work with or without DMA.
+
+- Inference data-center - single/multi user devices in a large server. This
+ type of device can be stand-alone or an IP inside a SoC or a GPU. It will
+ have on-board DRAM (to hold the DL topology), DMA engines and
+ command submission queues (either kernel or user-space queues).
+ It might also have an MMU to manage multiple users and might also enable
+ virtualization (SR-IOV) to support multiple VMs on the same device. In
+ addition, these devices will usually have some tools, such as profiler and
+ debugger.
+
+- Training data-center - Similar to Inference data-center cards, but typically
+ have more computational power and memory b/w (e.g. HBM) and will likely have
+ a method of scaling-up/out, i.e. connecting to other training cards inside
+ the server or in other servers, respectively.
+
+All these devices typically have different runtime user-space software stacks,
+that are tailored-made to their h/w. In addition, they will also probably
+include a compiler to generate programs to their custom-made computational
+engines. Typically, the common layer in user-space will be the DL frameworks,
+such as PyTorch and TensorFlow.
+
+Sharing code with DRM
+=====================
+
+Because this type of devices can be an IP inside GPUs or have similar
+characteristics as those of GPUs, the accel subsystem will use the
+DRM subsystem's code and functionality. i.e. the accel core code will
+be part of the DRM subsystem and an accel device will be a new type of DRM
+device.
+
+This will allow us to leverage the extensive DRM code-base and
+collaborate with DRM developers that have experience with this type of
+devices. In addition, new features that will be added for the accelerator
+drivers can be of use to GPU drivers as well.
+
+Differentiation from GPUs
+=========================
+
+Because we want to prevent the extensive user-space graphic software stack
+from trying to use an accelerator as a GPU, the compute accelerators will be
+differentiated from GPUs by using a new major number and new device char files.
+
+Furthermore, the drivers will be located in a separate place in the kernel
+tree - drivers/accel/.
+
+The accelerator devices will be exposed to the user space with the dedicated
+261 major number and will have the following convention:
+
+- device char files - /dev/accel/accel*
+- sysfs - /sys/class/accel/accel*/
+- debugfs - /sys/kernel/debug/accel/accel*/
+
+Getting Started
+===============
+
+First, read the DRM documentation. Not only it will explain how to write a new
How about a link to the DRM documentation?
+DRM driver but it will also contain all the information on how to contribute,
+the Code Of Conduct and what is the coding style/documentation. All of that
+is the same for the accel subsystem.
+
+Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
+
+To expose your device as an accelerator, two changes are needed to
+be done in your driver (as opposed to a standard DRM driver):
+
+- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
+ driver_features field. It is important to note that this driver feature is
+ mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
I don't remember seeing code that validates a driver with
DRIVER_COMPUTE_ACCEL does not also have DRIVER_MODESET. What am I missing?
Look at drm_dev_init() (patch 3/4):
if (drm_core_check_feature(dev, DRIVER_COMPUTE_ACCEL) &&
(drm_core_check_feature(dev, DRIVER_RENDER) ||
drm_core_check_feature(dev, DRIVER_MODESET))) {
DRM_ERROR("DRM driver can't be both a compute acceleration
and graphics driver\n");
return -EINVAL;
}