ParlAI contains many utilities, roughly organized by function.

Thread Utilities

Provides utilities useful for multiprocessing.

This includes a SharedTable.

class parlai.utils.thread.SharedTable(init_dict=None)[source]

Bases: collections.abc.MutableMapping

Provides a simple shared-memory table of integers, floats, or strings.

Use this class as follows:

tbl = SharedTable({'cnt': 0})
with tbl.get_lock():
    tbl['startTime'] = time.time()
for i in range(10):
    with tbl.get_lock():
        tbl['cnt'] += 1

Create a shared memory version of each element of the initial dictionary.

If no initial dictionary is provided, creates empty arrays, which extend automatically as keys are added.

Each supported type (int, float, and str) has its own array. For each key, we store an index into the appropriate array as well as the type of the value stored for that key.


get_lock()[source]

Return the lock.


Return whether an object is a torch Tensor, without importing torch.

Torch Utilities

Utility methods for dealing with torch code.

parlai.utils.torch.neginf(dtype: torch.dtype) → float[source]

Return a representable finite number near -inf for a dtype.

parlai.utils.torch.atomic_save(state_dict: Any, path: str) → None[source]

Like torch.save, but atomic.

Useful for preventing corruption if the process is pre-empted or killed while writing to disk. Works by writing to a temporary file and then renaming it to the final name.
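
The write-then-rename pattern can be sketched with the standard library alone; the helper below is a simplified illustration (the real function serializes via torch), relying on os.replace being atomic on POSIX filesystems:

```python
import os
import tempfile

def atomic_write(data: bytes, path: str) -> None:
    """Write data to path atomically: write a temp file, then rename it.

    A crash mid-write leaves any existing file at path intact, because
    os.replace swaps the file in with a single atomic rename.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic rename over the target
    except Exception:
        os.unlink(tmp_path)  # clean up the partial temp file
        raise
```

Note that the temporary file is created in the target directory, so the rename never crosses a filesystem boundary.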

parlai.utils.torch.padded_tensor(items: List[Union[List[int], torch.LongTensor]], pad_idx: int = 0, use_cuda: bool = False, left_padded: bool = False, max_len: Optional[int] = None, fp16friendly: bool = False, device: int = -1) → Tuple[torch.LongTensor, List[int]][source]

Create a padded matrix from an uneven list of lists.

Returns (padded, lengths), where padded is the padded matrix, and lengths is a list containing the lengths of each row.

Matrix is right-padded (filled to the right) by default, but is left-padded if left_padded is True.

Matrix can also be placed on cuda automatically.

  • items (list[iter[int]]) – List of items

  • pad_idx (int) – the value to use for padding

  • use_cuda (bool) – if true, places padded on GPU

  • left_padded (bool) – if true, pads to the left instead of the right

  • max_len (int) – if None, the max length is the maximum item length

  • fp16friendly (bool) – if True, pads the time dimension to be a multiple of 4.

  • device (int) – GPU device.


Returns

(padded, lengths) tuple

Return type

(Tensor[int64], list[int])
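
The padding logic can be illustrated with a pure-Python sketch on plain lists (the real function returns a torch.LongTensor and additionally handles CUDA placement and fp16-friendly padding); pad_lists is a hypothetical name used only for this example:

```python
from typing import List, Optional, Tuple

def pad_lists(
    items: List[List[int]],
    pad_idx: int = 0,
    left_padded: bool = False,
    max_len: Optional[int] = None,
) -> Tuple[List[List[int]], List[int]]:
    """Pad an uneven list of lists to a rectangle, returning (padded, lengths)."""
    lengths = [len(row) for row in items]
    width = max_len if max_len is not None else max(lengths)
    padded = []
    for row in items:
        fill = [pad_idx] * (width - len(row))
        # fill goes on the left or the right depending on left_padded
        padded.append(fill + row if left_padded else row + fill)
    return padded, lengths
```

For example, pad_lists([[1, 2, 3], [4]]) yields ([[1, 2, 3], [4, 0, 0]], [3, 1]).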

parlai.utils.torch.padded_3d(tensors: List[torch.LongTensor], pad_idx: int = 0, use_cuda: bool = False, dtype: Optional[torch.dtype] = torch.int64, fp16friendly: bool = False)[source]

Make 3D padded tensor for list of lists of 1D tensors or lists.

  • tensors – list of lists of 1D tensors (or lists)

  • pad_idx – padding to fill tensor with

  • use_cuda – whether to call cuda() before returning

  • dtype – dtype of the returned tensor

  • fp16friendly (bool) – if True, pads the final dimension to be a multiple of 8.


3D tensor with the maximum dimensions of the inputs

parlai.utils.torch.concat_without_padding(text_idx, cand_idx, use_cuda, null_idx=0)[source]

Concatenate two right-padded tensors and move padding to the right.

For example,

if text_idx = [[1, 2, 3, 4, 0, 0]] and cand_idx = [[5, 6, 7, 8, 0, 0]], then result = (tokens, segments) where

tokens = [[1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0]]
segments = [[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]]
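
The behavior above can be sketched on plain lists (the real function operates on tensors); this hypothetical helper strips trailing padding from each row, concatenates, and re-pads on the right:

```python
from typing import List, Tuple

def concat_without_padding_lists(
    text_idx: List[List[int]],
    cand_idx: List[List[int]],
    null_idx: int = 0,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Concatenate right-padded rows so the padding ends up on the right."""
    width = max(len(t) for t in text_idx) + max(len(c) for c in cand_idx)
    tokens, segments = [], []
    for t, c in zip(text_idx, cand_idx):
        # strip trailing pad tokens from each row
        while t and t[-1] == null_idx:
            t = t[:-1]
        while c and c[-1] == null_idx:
            c = c[:-1]
        row = t + c
        seg = [0] * len(t) + [1] * len(c)  # 0 = text side, 1 = candidate side
        pad = width - len(row)
        tokens.append(row + [null_idx] * pad)
        segments.append(seg + [0] * pad)
    return tokens, segments
```

Running it on the example above reproduces the tokens and segments shown.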

parlai.utils.torch.argsort(keys: List[Any], *lists: List[List[Any]], descending: bool = False)[source]

Reorder each list in lists by the (descending) sorted order of keys.

  • keys (iter) – Keys to order by.

  • lists (list[list]) – Lists to reorder by keys' order. Correctly handles lists and 1-D tensors.

  • descending (bool) – Use descending order if true.


The reordered items.
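
The reordering can be sketched in pure Python (the real helper also handles 1-D tensors); argsort_lists is a hypothetical name for this illustration:

```python
from typing import Any, List

def argsort_lists(keys: List[Any], *lists: List[Any], descending: bool = False):
    """Reorder each list in lists by the sorted order of keys."""
    # indices of keys in sorted (or reverse-sorted) order
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=descending)
    return [[lst[i] for i in order] for lst in lists]
```

A typical use is sorting a batch by sequence length while keeping parallel lists (texts, labels) aligned.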

parlai.utils.torch.compute_grad_norm(parameters, norm_type=2.0)[source]

Compute norm over gradients of model parameters.

  • parameters – the model parameters for gradient norm calculation. Iterable of Tensors or single Tensor

  • norm_type – type of p-norm to use


the computed gradient norm

class parlai.utils.torch.IdentityLayer[source]

Bases: torch.nn.modules.module.Module

Identity layer module.

Useful for decoder-only Torch Generator agents.



parlai.utils.torch.total_parameters(model: torch.nn.modules.module.Module) → int[source]

Count the total number of parameters in the model.


model – the model whose parameters we wish to count.


total number of parameters in the model.

parlai.utils.torch.trainable_parameters(model: torch.nn.modules.module.Module) → int[source]

Count the total number of trainable parameters in the model.


model – the model whose parameters we wish to count.


total number of trainable parameters in the model.
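
Both counts amount to summing element counts over model.parameters(), with the trainable variant filtering on requires_grad. A pure-Python sketch, using a hypothetical FakeParam stand-in for torch parameters (which expose numel() and requires_grad):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FakeParam:
    """Hypothetical stand-in for a torch parameter."""
    n: int
    requires_grad: bool = True

    def numel(self) -> int:
        return self.n

def total_parameters(params: List[FakeParam]) -> int:
    """Count all parameters (sum of element counts)."""
    return sum(p.numel() for p in params)

def trainable_parameters(params: List[FakeParam]) -> int:
    """Count only parameters with requires_grad set."""
    return sum(p.numel() for p in params if p.requires_grad)
```

The real functions take a torch Module and iterate its parameters() the same way.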

class parlai.utils.torch.PipelineWorkItem(chunk_idx, layer_nos, next_device)

Bases: tuple

property chunk_idx

Alias for field number 0

property layer_nos

Alias for field number 1

property next_device

Alias for field number 2

class parlai.utils.torch.PipelineHelper[source]

Bases: object

PipelineHelper assists with implementing pipelining in model parallelism.

For a tutorial on model parallelism, as it’s implemented in parts of ParlAI, see https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html.

Usage:

>>> my_model = PipelineHelper().make_parallel(my_model)

Note that you will need to manually implement logic which handles the moved layers.



make_parallel(model: torch.nn.modules.module.Module) → torch.nn.modules.module.Module[source]

Allocate specific layers in a model to be ModelParallel.

Limited to only ModuleLists within the model. Uses some heuristics to attempt to evenly distribute layers across GPUs, in order to balance memory usage. They are:

  • Assume the 0th GPU will host the optimizer, word embeddings, etc.

  • Assume activation memory is linear with the number of parameters.

  • Assume all layers are approximately equal in size.

static guess_split_size(item: Chunk, num_gpus: Optional[int] = None, dim=0) → int[source]

Estimate the number of chunks we should split the batch into via heuristics.

static split(item: Chunk, split_size: Optional[int] = None, dim=0) → List[Chunk][source]

Split a tensor or group of tensors into smaller chunks of the same type.

  • item – The item being split. May be a Tensor, a tuple of Tensors, or a dictionary mapping str -> Tensor.

  • split_size – The maximum size of each output chunk. If None, we will guess using heuristics.

  • dim – The dimension to split along.

static join(items: List[Chunk], dim=0) → Chunk[source]

Join chunks back together, the inverse of split.

  • items – All the output chunks. Each chunk may be a tensor or a group of tensors.

  • dim – The dimension to join along.

static chunk_to(chunk: Chunk, device: str) → Chunk[source]

Move the chunk to the device.

Handles chunks which are groups of tensors.

static schedule_work_items(layers: torch.nn.modules.container.ModuleList, chunks: List[Chunk])[source]

Iterate through chunks and layers that should be pipelined.

Each iteration of this generator yields the following properties:

  • layer_nos: a list of indices of layers for you to forward through

  • chunk_idx: the index of the chunk we are manipulating. Use this if you need to update chunk representations.

  • next_device: where the chunk should be moved to AFTER the layer computation is done.

Uncategorized Utils

File for miscellaneous utility functions and constants.

parlai.utils.misc.maintain_dialog_history(history, observation, reply='', historyLength=1, useReplies='label_else_model', dict=None, useStartEndIndices=True, splitSentences=False)[source]

Keep track of dialog history, up to a truncation length.

Includes replies from the labels, from the model, or neither, depending on the useReplies parameter.


parlai.utils.misc.load_cands(path, lines_have_ids=False, cands_are_replies=False)[source]

Load global fixed set of candidate labels that the teacher provides.

Every example will include these as candidates. The true labels for a specific example are also added to this set, so that it’s possible to get the right answer.

class parlai.utils.misc.Timer[source]

Bases: object

Computes elapsed time.


Initialize timer.


Reset timer to zero.


Resume timer.


Pause timer.


Get current timer time.
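
A resumable timer with this interface can be sketched as follows; this illustrates the behavior described above and is not necessarily ParlAI's exact implementation:

```python
import time

class SimpleTimer:
    """Minimal sketch of a resumable elapsed-time timer."""

    def __init__(self):
        self.running = True
        self.total = 0.0
        self.start = time.time()

    def reset(self):
        """Reset accumulated time to zero and restart."""
        self.running = True
        self.total = 0.0
        self.start = time.time()
        return self

    def stop(self):
        """Pause the timer, banking the elapsed time so far."""
        if self.running:
            self.running = False
            self.total += time.time() - self.start
        return self

    def resume(self):
        """Resume a paused timer."""
        if not self.running:
            self.running = True
            self.start = time.time()
        return self

    def time(self):
        """Return total elapsed time, excluding paused intervals."""
        if self.running:
            return self.total + time.time() - self.start
        return self.total
```

While stopped, time() returns a frozen value; resume() continues accumulating from there.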

class parlai.utils.misc.TimeLogger[source]

Bases: object

Class for logging time progress against a goal.


Set up timer.


Return time elapsed at last log call.


Return current timer time.

log(done, total, report=None)[source]

Log report, time elapsed, and percentage progress towards goal.

  • done – number of examples completed so far

  • total – total number of elements to be completed. if total > 0, calculates the time remaining and percentage complete.

  • report – dict of pairs to log


A (log string, log dict) tuple. The log string contains the time elapsed and a string representation of the log dict; the log dict contains all pairs to log, including percentage complete and projected time remaining if total > 0.

class parlai.utils.misc.AttrDict(*args, **kwargs)[source]

Bases: dict

Helper class to have a dict-like object with dot access.

For example, instead of d = {'key': 'value'} use d = AttrDict(key='value'). To access keys, instead of d['key'] use d.key.

While this places some limitations on the possible keys (for example, do not set the key 'items' or you will lose access to the items() method), it can make some code clearer.

__init__(*args, **kwargs)[source]

Initialize AttrDict using input dict.
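
One common way to implement such a class is to alias the instance's attribute dictionary to the dict itself; a minimal sketch, not necessarily ParlAI's exact implementation:

```python
class AttrDictSketch(dict):
    """Minimal dict subclass with attribute-style access."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Point attribute lookup at the dict's own storage,
        # so d.key and d['key'] read and write the same entry.
        self.__dict__ = self
```

With this trick, attribute reads and writes go straight through to the underlying dict entries.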

class parlai.utils.misc.NoLock[source]

Bases: object

Empty lock.

Does nothing when you enter or exit.

parlai.utils.misc.float_formatter(f: Union[float, int]) → str[source]

Format a float as a pretty string.

parlai.utils.misc.nice_report(report) → str[source]

Render an agent Report as a beautiful string.

If pandas is installed, we will use it to render as a table, with multitask metrics shown one per row.

If pandas is not available, we will use a dict with like-metrics placed next to each other.

parlai.utils.misc.round_sigfigs(x: Union[float, torch.Tensor], sigfigs=4) → float[source]

Round value to specified significant figures.

  • x – input number

  • sigfigs – number of significant figures to return


float number rounded to specified sigfigs
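
Rounding to significant figures can be sketched via the base-10 magnitude of the input; a hypothetical pure-float version (the real function also accepts tensors):

```python
import math

def round_sigfigs_sketch(x: float, sigfigs: int = 4) -> float:
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    # position of the leading digit, e.g. 5 for 123456, -3 for 0.0012
    magnitude = math.floor(math.log10(abs(x)))
    # round so that exactly `sigfigs` digits remain significant
    return round(x, -magnitude + sigfigs - 1)
```

For example, round_sigfigs_sketch(123456, 4) gives 123500.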


Build a nolock for other classes to use for no-op locking.

parlai.utils.misc.clip_text(text, max_len)[source]

Clip text to a maximum length, adding an ellipsis.
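
A minimal sketch of the clipping behavior (the exact marker and boundary handling in ParlAI may differ):

```python
def clip_text_sketch(text: str, max_len: int) -> str:
    """Truncate text to max_len characters, marking the cut with an ellipsis."""
    if len(text) <= max_len:
        return text
    return text[:max_len] + '...'
```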

parlai.utils.misc.display_messages(msgs: List[Dict[str, Any]], prettify: bool = False, ignore_fields: str = '', max_len: int = 1000, verbose: bool = False) → Optional[str][source]

Return a string describing the set of messages provided.

If prettify is true, candidates are displayed using prettytable. ignore_fields provides a list of fields in the msgs which should not be displayed.

parlai.utils.misc.str_to_msg(txt, ignore_fields='')[source]

Convert formatted string to ParlAI message dict.

  • txt – formatted string to convert. String format is tab-separated fields, with colon separating field name and contents.

  • ignore_fields – (default ‘’) comma-separated field names to not include in the msg dict even if they’re in the string.
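
The tab-and-colon format can be illustrated with a simplified parser; this sketch ignores escaping and list-valued fields, which the real converter also handles, and parse_msg_line is a hypothetical name:

```python
def parse_msg_line(txt: str, ignore_fields: str = '') -> dict:
    """Parse tab-separated 'name:value' fields into a message dict."""
    ignore = set(ignore_fields.split(',')) if ignore_fields else set()
    msg = {}
    for field in txt.split('\t'):
        # split on the first colon only; values may contain colons
        name, _, value = field.partition(':')
        if name not in ignore:
            msg[name] = value
    return msg
```

For example, 'text:hello\tlabels:hi there' parses to a dict with text and labels fields.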

parlai.utils.misc.msg_to_str(msg, ignore_fields='')[source]

Convert ParlAI message dict to string.

  • msg – dict to convert into a string.

  • ignore_fields – (default ‘’) comma-separated field names to not include in the string even if they’re in the msg dict.

parlai.utils.misc.set_namedtuple_defaults(namedtuple, default=None)[source]

Set all of the fields for a given namedtuple to a single value.

Additionally removes the default docstring for each field. Modifies the tuple in place, but returns it anyway.

More info: https://stackoverflow.com/a/18348004

  • namedtuple – A constructed collections.namedtuple

  • default – The default value to set.


the modified namedtuple

parlai.utils.misc.warn_once(msg: str) → None[source]

Log a warning, but only once.


msg (str) – Message to display

parlai.utils.misc.error_once(msg: str) → None[source]

Log an error, but only once.


msg (str) – Message to display

parlai.utils.misc.recursive_getattr(obj, attr, *args)[source]

Recursive call to getattr for nested attributes.
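
Nested attribute access of this kind is commonly implemented by folding getattr over a dotted path; a hedged sketch, not necessarily ParlAI's exact implementation:

```python
import functools

def recursive_getattr_sketch(obj, attr: str, *args):
    """Resolve a dotted attribute path, e.g. 'a.b.c', via repeated getattr.

    An optional extra argument acts as a default, as with plain getattr.
    """
    def _get(o, name):
        return getattr(o, name, *args)
    return functools.reduce(_get, attr.split('.'), obj)
```

So recursive_getattr_sketch(model, 'encoder.layers') behaves like model.encoder.layers.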