# discord-cluster-manager

## Creating a Python Leaderboard

This section describes how to create Python-based leaderboards, which expect Python submissions (though submissions can still inline-compile CUDA code). To create leaderboards on a Discord server, the Discord bot expects you to have a Leaderboard Admin or Leaderboard Creator role; these can be assigned by admins / owners of the server. That said, this section is also useful for participants who want to understand how their submissions are evaluated.

As we’ve mentioned before, each leaderboard specifies a set of GPUs to evaluate on, chosen by the leaderboard creator. You can think of each (task, GPU) pair as essentially its own independent leaderboard; for example, a softmax kernel written for an NVIDIA T4 may perform very differently on an NVIDIA H100. We therefore give leaderboard creators the option to select which GPUs they care about; for example, they may only care about NVIDIA A100 and NVIDIA H100 performance for their leaderboard.

To create a leaderboard, you can run:

```
/leaderboard create {leaderboard_name: str} {deadline: str} {task_zip: .zipped folder}
```

After running this, similar to leaderboard submissions, a UI window will pop up asking which GPUs the leaderboard creator wants to enable submissions on. In the rest of this section, we detail how the unzipped `task_zip` folder should be structured. Examples of these folders can be found here.
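For example, a creator setting up the identity-py leaderboard described below might invoke something like the following. The argument values here are purely illustrative; use your own leaderboard name, deadline format, and zipped folder:

```
/leaderboard create leaderboard_name: identity-py deadline: 2025-12-31 task_zip: identity-py.zip
```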

### The task.yml specification

When a user submits a kernel, it is launched inside a leaderboard-specific evaluation harness, and we provide several copy-able examples of leaderboard folders in our GitHub. The relevant files are defined in a task.yml. For example, in the identity-py leaderboard, the YAML looks as follows:

```yaml title="task.yml"
# What files are involved in leaderboard evaluation
files:

# Leaderboard language
lang: "py"

# Description of leaderboard task
description: Identity kernel in Python.

# Compilation flag for what to target as main
config:
  main: "eval.py"

# An example to provide to participants for writing a leaderboard submission
templates:
  Python: "template.py"

tests:

benchmarks:
```

This config file controls all the relevant details of how participants will interact with the leaderboard. We will discuss each parameter in detail below, starting with the simpler keys.

### Required files in the leaderboard .zip

Other than `task.yml`, the `files` key controls the list of files that the evaluation harness expects. The leaderboard creator has to include all of these files, but we provide examples to make this much easier. The `name` key is how a file is imported locally, and the `source` key is the name of the actual file in the folder.
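For illustration, each `files` entry pairs these two keys. The sketch below is hypothetical (the exact list depends on your harness), so copy the file list from one of the example leaderboards in our GitHub:

```yaml
# Hypothetical sketch of a files list; the entries shown are illustrative
files:
  - {"name": "eval.py", "source": "eval.py"}
  - {"name": "task.py", "source": "task.py"}
  - {"name": "reference.py", "source": "reference.py"}
```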

In short, most leaderboard creators will only have to edit `task.py` and `reference.py`, and we go over how to edit these in more detail below.

### A simple task.py and reference.py example

To keep this simple, a leaderboard creator really only needs to specify:

  1. The input / output types of the desired leaderboard kernel.
  2. A generator that generates input data with specific properties.
  3. An actual example reference kernel that serves as ground truth.
  4. A comparison function to check for correctness of a user submitted kernel against the reference. We allow leaderboard creators full flexibility to specify things like margin of error.

We recommend following our examples for simplicity, but our task definition allows leaderboard creators to fully modify their evaluation harness. In the remaining sections, we will go over how to use our pre-defined examples. In all of our examples, the `task.py` file handles (1) and part of (2), while the `reference.py` file handles (2), (3), and (4). Below, we provide the `task.py` for the identity-py leaderboard.

```python title="task.py"
from typing import TypedDict

import torch

# Define input / output types for kernel
input_t = torch.Tensor
output_t = input_t

# Define modifiable arguments for input data generation
class TestSpec(TypedDict):
    size: int
    seed: int
```


The example above specifies aliases for the input (`input_t`) and output (`output_t`) types of the kernel task. It also specifies
a struct called `TestSpec`, which defines **what arguments are passed into the input data generator** at runtime. We distinguish
between `tests` cases and `benchmarks` cases: the former are smaller cases used to check correctness and help users debug their code,
while the latter are the larger cases that are timed for the leaderboard. Using this `TestSpec` specification, we provide the test
cases in the `task.yml` by filling in these arguments, as shown below:


```yaml title="task.yml"
...
tests:
  - {"size": 128, "seed": 5236}
  - {"size": 129, "seed": 1001}
  - {"size": 256, "seed": 5531}

benchmarks:
  - {"size": 1024, "seed": 54352}
  - {"size": 4096, "seed": 6256}
  - {"size": 16384, "seed": 6252}
  - {"size": 65536, "seed": 125432}

Finally, we fill in details for the input data generator, reference kernel, and correctness checker for identity-py below:

```python title="reference.py"
import torch

from task import input_t, output_t
from utils import verbose_allclose

# Input data generator. Arguments must match TestSpec in task.py
def generate_input(size: int, seed: int) -> input_t:
    gen = torch.Generator(device='cuda')
    gen.manual_seed(seed)
    data = torch.empty(size, device='cuda', dtype=torch.float16)
    data.uniform_(0, 1, generator=gen)
    return data

# Reference kernel. Must take input_t and produce output_t
def ref_kernel(data: input_t) -> output_t:
    return data

# Correctness check. Returns any errors (empty string if none)
def check_implementation(data, output) -> str:
    expected = ref_kernel(data)
    reasons = verbose_allclose(output, expected)
    if len(reasons) > 0:
        return "Mismatch found! custom implementation doesn't match reference: " + reasons[0]
    return ''
```


As mentioned earlier, based on `task.yml` and `task.py`, each test case will pass a specified set
of arguments to `generate_input(...)` to produce the input data for that test case. We recommend specifying
a seed argument so that inputs are randomized in a reproducible manner. Furthermore, `check_implementation` returns
a string to give leaderboard creators the flexibility to provide error messages that help participants debug.
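For instance, a creator who wants an explicit margin of error could replace the `verbose_allclose` check with a tolerance-based comparison along these lines (a hypothetical sketch; the `rtol`/`atol` values are purely illustrative):

```python
import torch

from task import input_t, output_t

def check_implementation(data: input_t, output: output_t) -> str:
    # ref_kernel is the reference kernel defined above in reference.py
    expected = ref_kernel(data)
    # Compare against the reference with an explicit margin of error (illustrative tolerances)
    if not torch.allclose(output, expected, rtol=1e-3, atol=1e-5):
        max_err = (output - expected).abs().max().item()
        return f"Mismatch found! Max absolute error: {max_err}"
    return ""
```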

**Remark.** Leaderboard creators have the flexibility to edit the logic in `eval.py`, which uses all of these functions
to evaluate and measure the user-specified kernels. The examples above assume the use of our `eval.py` implementation, but
it can be modified if desired.
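For intuition, the core of such a harness can be pictured roughly as below. This is a simplified sketch rather than the actual `eval.py`; in particular, the `custom_kernel` entry-point name and the omission of timing and benchmarking logic are assumptions.

```python
# Simplified sketch of how an eval.py-style harness could tie these pieces together.
# The real eval.py also handles benchmarking, timing, and result reporting.
from reference import check_implementation, generate_input
from submission import custom_kernel  # hypothetical name for the participant's entry point

def run_tests(test_specs: list[dict]) -> None:
    for spec in test_specs:
        data = generate_input(**spec)    # build inputs from a task.yml test case
        output = custom_kernel(data)     # run the participant's kernel
        error = check_implementation(data, output)
        if error:
            print(f"Test {spec} failed: {error}")
        else:
            print(f"Test {spec} passed")
```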

## Deleting a Leaderboard
If you have sufficient permissions on the server, you can also delete leaderboards with:

<center>

```
/leaderboard delete {leaderboard_name: str}
```

</center>

This command will display a UI window with a list of available leaderboards. Select the leaderboard you want to delete from the list. Once confirmed, the leaderboard and all associated submissions will be permanently removed. Please use this command with caution, as it also deletes the leaderboard's history.

## Existing Leaderboard Examples

We provide examples of leaderboards here that can be quickly copied and modified for new tasks. Most leaderboard creators should only need to modify these files.