Skip to content

Documentation for average_pool

Functionality

The average_pool function applies average pooling on neural network hidden states. It uses an attention mask to ignore tokens that are not of interest, ensuring that only the specified tokens contribute to the pooling operation.

Parameters

  • last_hidden_states: A tensor of shape [batch_size, sequence_length, hidden_size] containing the last hidden states of the model.
  • attention_mask: A tensor of the same sequence dimension marking tokens to attend (1) and ignore (0).

Usage

  • Purpose: To aggregate token-level representations into a single vector by averaging over tokens specified by the attention mask.

Example

Suppose you have model outputs stored in hidden_states and an attention_mask. You can obtain the pooled output via:

pooled = average_pool(hidden_states, attention_mask)