nuImages devkit tutorial

Welcome to the nuImages tutorial. This demo assumes the database itself is available at /data/sets/nuimages, and loads a mini version of the dataset.

A Gentle Introduction to nuImages

In this part of the tutorial, let us go through a top-down introduction of our database. Our dataset is structured as a relational database with tables, tokens and foreign keys. The tables are the following:

  1. log - Log from which the sample was extracted.
  2. sample - An annotated camera image with an associated timestamp and past and future images and pointclouds.
  3. sample_data - An image or pointcloud associated with a sample.
  4. ego_pose - The vehicle ego pose and timestamp associated with a sample_data.
  5. sensor - General information about a sensor, e.g. CAM_BACK_LEFT.
  6. calibrated_sensor - Calibration information of a sensor in a log.
  7. category - Taxonomy of object and surface categories (e.g. vehicle.car, flat.driveable_surface).
  8. attribute - Property of an object that can change while the category remains the same.
  9. object_ann - Bounding box and mask annotation of an object (e.g. car, adult).
  10. surface_ann - Mask annotation of a surface (e.g. flat.driveable_surface and vehicle.ego).

For more information, including a visualization of the database schema, see the schema page.

Google Colab (optional)


If you are running this notebook in Google Colab, you can uncomment the cell below and run it; everything will be set up nicely for you. Otherwise, set everything up manually.
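
A sketch of such a setup cell is shown below. The download URL and archive name are assumptions based on the usual nuScenes download location and may not match the current release.

    # (Hypothetical setup cell -- uncomment to run in Colab.)
    # !mkdir -p /data/sets/nuimages                                # create the expected dataroot
    # !wget https://www.nuscenes.org/data/nuimages-v1.0-mini.tgz   # download the mini split (URL is an assumption)
    # !tar -xf nuimages-v1.0-mini.tgz -C /data/sets/nuimages       # unpack into the dataroot
    # !pip install nuscenes-devkit                                 # the devkit ships the nuimages package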

Initialization

To initialize the dataset class, we run the code below. We can change the dataroot parameter if the dataset is installed in a different folder, or omit it to use the default setup.
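
A minimal initialization, assuming the mini split is installed at the default location:

    from nuimages import NuImages

    nuim = NuImages(dataroot='/data/sets/nuimages', version='v1.0-mini', verbose=True, lazy=True)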

Tables

As described above, the NuImages class holds several tables. Each table is a list of records, and each record is a dictionary. For example the first record of the category table is stored at:
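
    nuim.category[0]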

To see the list of all tables, simply refer to the table_names variable:
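
    nuim.table_names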

Indexing

Since all tables are lists of dictionaries, we can use standard Python operations on them. A very common operation is to retrieve a particular record by its token. Since this operation takes linear time, we precompute an index that helps to access a record in constant time.

Let us select the first image in this dataset version and split:
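
    sample_idx = 0
    sample = nuim.sample[sample_idx]
    sample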

We can also get the sample record from a sample token:
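
    sample = nuim.get('sample', sample['token'])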

Under the hood, this looks up the index of the record. We can verify that it is the same index we used in the first place:
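
    sample_idx_check = nuim.getind('sample', sample['token'])
    assert sample_idx == sample_idx_check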

From the sample, we can directly access the corresponding keyframe sample data. This will be useful further below.
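
The sample record stores this token in its key_camera_token field:

    key_camera_token = sample['key_camera_token']
    print(key_camera_token)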

Lazy loading

Initializing the NuImages instance above was very fast, as we did not actually load the tables. Instead, the class implements lazy loading by overriding the internal __getattr__() function, which loads a table if it is not already in memory. The moment we accessed category, we could see the table being loaded from disk. To disable such notifications, just set verbose=False when initializing the NuImages object. Furthermore, lazy loading can be disabled entirely with lazy=False.

Rendering

To render an image we use the render_image() function. We can see the boxes and masks for each object category, as well as the surface masks for the ego vehicle and the driveable surface, each rendered in a distinct color.

At the top left corner of each box, we see the name of the object category (if with_category=True). We can also set with_attributes=True to print the attributes of each object (note that this only works in combination with with_category=True). In addition, we can specify whether to render surfaces and objects, only surfaces, only objects, or neither by setting the annotation_type argument to all, surfaces, objects or none, respectively.
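
A minimal sketch of such a call, using the key_camera_token from above:

    nuim.render_image(key_camera_token, annotation_type='all',
                      with_category=True, with_attributes=True)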

Let us make the image bigger for better visibility by setting render_scale=2. We can also change the line width of the boxes using box_line_width. By setting it to -1, the line width adapts to the render_scale. Finally, we can render the image to disk using out_path.
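
Putting these options together (the output path below is purely illustrative):

    import os

    nuim.render_image(key_camera_token, annotation_type='all',
                      with_category=True, with_attributes=True,
                      render_scale=2,      # render at twice the original resolution
                      box_line_width=-1,   # -1 adapts the line width to render_scale
                      out_path=os.path.expanduser('~/Downloads/nuimages_sample.png'))  # hypothetical path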

Let us find out which annotations are in that image.
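
The list_anns() method prints the annotations of a sample; a minimal sketch, assuming it returns the object and surface annotation tokens as two lists:

    object_tokens, surface_tokens = nuim.list_anns(sample['token'])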

We can see the object_ann and surface_ann tokens. Let's again render the image, but only focus on the first object and the first surface annotation. We can use the object_tokens and surface_tokens arguments as shown below. We see that only one car and the driveable surface are rendered.
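
    nuim.render_image(key_camera_token, with_category=True,
                      object_tokens=[object_tokens[0]],    # only the first object annotation
                      surface_tokens=[surface_tokens[0]])  # only the first surface annotation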

To get the raw data (i.e. the segmentation masks, both semantic and instance) of the above, we can use get_segmentation().
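
A short sketch that retrieves both masks and displays the semantic one (matplotlib is used here only for display):

    import matplotlib.pyplot as plt

    semantic_mask, instance_mask = nuim.get_segmentation(key_camera_token)

    plt.figure(figsize=(16, 9))
    plt.imshow(semantic_mask)
    plt.show()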

Every annotated image (keyframe) comes with up to 6 past and 6 future images, spaced evenly at 500ms ± 250ms. However, a small percentage of the samples have fewer sample_datas, either because they were at the beginning or end of a log, or due to delays or dropped data packets. list_sample_content() shows all the sample_datas associated with a sample.
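
For the sample from above:

    nuim.list_sample_content(sample['token'])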

Besides the annotated images, we can also render the 6 previous and 6 future images, which are not annotated. Let's select the next image, which is taken around 0.5s after the annotated image. We can either manually copy the token from the list above or use the next pointer of the sample_data.
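
Using the next pointer:

    next_camera_token = nuim.get('sample_data', key_camera_token)['next']
    print(next_camera_token)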

Now that we have the next token, let's render it. Note that we cannot render the annotations, as they don't exist.

Note: If you did not download the non-keyframes (sweeps), this will throw an error! We make sure to catch it here.
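
A sketch of the guarded call:

    try:
        nuim.render_image(next_camera_token, annotation_type='none')
    except Exception as e:
        print('Unable to render: %s' % e)  # e.g. the sweeps were not downloaded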

In this section we have presented a number of rendering functions. For convenience we also provide a script render_images.py that runs one or all of these rendering functions on a random subset of the 93k samples in nuImages. To run it, simply execute the following line in your command line. This will save image, depth, pointcloud and trajectory renderings of the front camera to the specified folder.

>> python nuimages/scripts/render_images.py --mode all --cam_name CAM_FRONT --out_dir ~/Downloads/nuImages --out_type image

Instead of rendering the annotated keyframe, we can also render a video of the 13 individual images, spaced at 2 Hz.

>> python nuimages/scripts/render_images.py --mode all --cam_name CAM_FRONT --out_dir ~/Downloads/nuImages --out_type video

Poses and CAN bus data

The ego_pose provides the translation, rotation, rotation_rate, acceleration and speed measurements closest to each sample_data. We can visualize the trajectories of the ego vehicle throughout the 6s clip of each annotated keyframe. Here the red x indicates the start of the trajectory and the green o the position at the annotated keyframe. We can set rotation_yaw to have the driving direction at the time of the annotated keyframe point "upwards" in the plot. We can also set rotation_yaw to None to use the default orientation (upwards pointing North). To get the raw data of this plot, use get_ego_pose_data() or get_trajectory().
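
A minimal call, assuming the render_trajectory() method of the devkit:

    nuim.render_trajectory(sample['token'], rotation_yaw=0)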

Statistics

The list_*() methods are useful to get an overview of the dataset dimensions. Note that these statistics are always for the current split that we initialized the NuImages instance with, rather than the entire dataset.

list_categories() lists the category frequencies, as well as the category name and description. Each category is either an object or a surface, but not both.
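
For example:

    nuim.list_categories()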

We can also specify a sample_tokens parameter for list_categories() to get the category statistics for a particular set of samples.
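
For instance, restricting the statistics to the sample from above:

    nuim.list_categories(sample_tokens=[sample['token']])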

list_attributes() shows the frequency, name and description of all attributes:
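
    nuim.list_attributes()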

list_cameras() shows us how many camera entries and samples there are for each channel, such as the front camera. Each camera uses slightly different intrinsic parameters, which will be provided in a future release.
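
Again using the nuim instance from above:

    nuim.list_cameras()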

list_sample_data_histogram() shows a histogram of the number of images per annotated keyframe. Note that there are at most 13 images per keyframe. For the mini split shown here, all keyframes have 13 images.
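
The corresponding call:

    nuim.list_sample_data_histogram()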