rle_array package

Module contents

class RLEArray(data: numpy.ndarray, positions: numpy.ndarray)

Bases: pandas.core.arrays.base.ExtensionArray

Run-length encoded array.

  • data – Data for each run. Must be one-dimensional. All Pandas-supported dtypes are supported.

  • positions – End-positions for each run. Must be one-dimensional and must have same length as data. dtype must be POSITIONS_DTYPE.
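The relationship between data and positions can be illustrated with a minimal pure-Python sketch. It assumes that each entry of positions is the exclusive end index of its run, so the last entry equals the length of the decoded array. The helper names rle_encode and rle_decode are hypothetical and are not part of rle_array.

```python
from typing import Any, List, Sequence, Tuple


def rle_encode(values: Sequence[Any]) -> Tuple[List[Any], List[int]]:
    """Encode values as (data, positions).

    Assumption: positions holds the exclusive end index of every run.
    """
    data: List[Any] = []
    positions: List[int] = []
    for i, value in enumerate(values):
        if data and data[-1] == value:
            # Same value as the current run: extend the run's end.
            positions[-1] = i + 1
        else:
            # New value: start a new run that ends right after index i.
            data.append(value)
            positions.append(i + 1)
    return data, positions


def rle_decode(data: Sequence[Any], positions: Sequence[int]) -> List[Any]:
    """Expand (data, positions) back into the full list of values."""
    out: List[Any] = []
    start = 0
    for value, end in zip(data, positions):
        out.extend([value] * (end - start))
        start = end
    return out


data, positions = rle_encode([1, 1, 1, 2, 2, 7])
decoded = rle_decode(data, positions)
```

Under this assumption, data and positions always have the same length, which matches the constraint stated for the constructor arguments.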

all(axis: Optional[int] = 0, out: Optional[Any] = None) → bool
any(axis: Optional[int] = 0, out: Optional[Any] = None) → bool
astype(dtype: Any, copy: bool = True, casting: str = 'unsafe') → Any

Cast to a NumPy array with ‘dtype’.

  • dtype (str or dtype) – Typecode or data-type to which the array is cast.

  • copy (bool, default True) – Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.


array – NumPy ndarray with ‘dtype’ for its dtype.

Return type



copy()

Return a copy of the array.


Return type



dropna()

Return ExtensionArray without NA values.



Return type


property dtype

An instance of ‘ExtensionDtype’.

factorize(na_sentinel: int = -1) → Tuple[numpy.ndarray, rle_array.array.RLEArray]

Encode the extension array as an enumerated type.


na_sentinel (int, default -1) – Value to use in the codes array to indicate missing values.


  • codes (ndarray) – An integer NumPy array that’s an indexer into the original ExtensionArray.

  • uniques (ExtensionArray) – An ExtensionArray containing the unique values of self.


    uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.
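The meaning of codes, uniques and na_sentinel can be shown with a small pure-Python sketch. It is not the RLEArray implementation: the function factorize_sketch is hypothetical, it treats None as the missing value, and it assumes the values are hashable.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def factorize_sketch(
    values: Sequence[Optional[Any]], na_sentinel: int = -1
) -> Tuple[List[int], List[Any]]:
    """Return (codes, uniques) in order of first appearance.

    Assumption: None marks a missing value.
    """
    codes: List[int] = []
    uniques: List[Any] = []
    seen: Dict[Any, int] = {}
    for value in values:
        if value is None:
            # Missing values get the sentinel and no entry in uniques.
            codes.append(na_sentinel)
            continue
        if value not in seen:
            seen[value] = len(uniques)
            uniques.append(value)
        codes.append(seen[value])
    return codes, uniques


codes, uniques = factorize_sketch(["b", "a", None, "b"])
```

Every non-sentinel code indexes into uniques, so uniques[code] recovers the original value.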

See also


Top-level factorize method that dispatches here.


pandas.factorize() offers a sort keyword as well.

fillna(value: Optional[Any] = None, method: Optional[str] = None, limit: Optional[int] = None) → rle_array.array.RLEArray

Fill NA/NaN values using the specified method.

  • value (scalar, array-like) – If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like ‘value’ can be given. It’s expected that the array-like have the same length as ‘self’.

  • method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) – Method to use for filling holes in reindexed Series. pad / ffill: propagate the last valid observation forward to the next valid one. backfill / bfill: use the next valid observation to fill the gap.

  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.


With NA/NaN filled.
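The interaction between method and limit can be sketched in pure Python for the forward-fill case. This is an illustration of the documented semantics only: ffill_sketch is a hypothetical helper, and None stands in for NA.

```python
from typing import Any, List, Optional, Sequence


def ffill_sketch(
    values: Sequence[Optional[Any]], limit: Optional[int] = None
) -> List[Optional[Any]]:
    """Forward-fill None entries, filling at most `limit` per gap."""
    result = list(values)
    last_valid: Optional[Any] = None
    have_valid = False
    filled_in_gap = 0
    for i, value in enumerate(result):
        if value is not None:
            # A valid observation ends the current gap.
            last_valid = value
            have_valid = True
            filled_in_gap = 0
        elif have_valid and (limit is None or filled_in_gap < limit):
            result[i] = last_valid
            filled_in_gap += 1
    return result


partially_filled = ffill_sketch([1, None, None, None, 4, None], limit=2)
fully_filled = ffill_sketch([1, None, None, None, 4, None])
```

With limit=2 the gap of three missing values is only partially filled, while leading missing values are never filled because there is no earlier observation to propagate.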

Return type



isna()

A 1-D array indicating if each value is missing.


na_values – In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Return type

Union[np.ndarray, ExtensionArray]


If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

kurt(skipna: bool = True) → Any
max(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
mean(skipna: bool = True, dtype: Optional[Any] = None, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
median(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
min(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
property nbytes

The number of bytes needed to store this object in memory.

prod(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
round(decimals: int = 0) → rle_array.array.RLEArray
shift(periods: int = 1, fill_value: Optional[object] = None) → rle_array.array.RLEArray

Shift values by desired number.

Newly introduced missing values are filled with self.dtype.na_value.

New in version 0.24.0.

  • periods (int, default 1) – The number of periods to shift. Negative values are allowed for shifting backwards.

  • fill_value (object, optional) –

    The scalar value to use for newly introduced missing values. The default is self.dtype.na_value.

    New in version 0.24.0.



Return type



If self is empty or periods is 0, a copy of self is returned.

If periods > len(self), then an array of size len(self) is returned, with all values filled with self.dtype.na_value.
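The shift rules above can be restated as a short pure-Python sketch. The function shift_sketch is hypothetical and operates on a plain list, with None standing in for self.dtype.na_value.

```python
from typing import Any, List, Optional, Sequence


def shift_sketch(
    values: Sequence[Any], periods: int = 1, fill_value: Optional[Any] = None
) -> List[Any]:
    """Shift values by `periods`, padding with `fill_value`."""
    n = len(values)
    if n == 0 or periods == 0:
        # Nothing to shift: return a copy.
        return list(values)
    if abs(periods) >= n:
        # Every original value is shifted out of the array.
        return [fill_value] * n
    pad = [fill_value] * abs(periods)
    if periods > 0:
        # Shift forward: pad at the front, drop values from the end.
        return pad + list(values[:-periods])
    # Shift backward: drop values from the front, pad at the end.
    return list(values[-periods:]) + pad


forward = shift_sketch([1, 2, 3], periods=1)
backward = shift_sketch([1, 2, 3], periods=-1)
overflow = shift_sketch([1, 2, 3], periods=5, fill_value=0)
```

The result always has the same length as the input, regardless of the sign or magnitude of periods.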

skew(skipna: bool = True) → Any
std(skipna: bool = True, ddof: int = 1, dtype: Optional[Any] = None, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
sum(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
take(indices: Sequence[int], allow_fill: bool = False, fill_value: Optional[Any] = None) → rle_array.array.RLEArray

Take elements from an array.

  • indices (sequence of int) – Indices to be taken.

  • allow_fill (bool, default False) –

    How to handle negative values in indices.

    • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

    • True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.

  • fill_value (any, optional) –

    Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

    For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.


Return type


  • IndexError – When the indices are out of bounds for the array.

  • ValueError – When indices contains negative values other than -1 and allow_fill is True.


ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.


Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.api.extensions import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
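The two index conventions selected by allow_fill can also be shown without pandas. The following pure-Python sketch mirrors the documented behavior on a plain list; take_sketch is a hypothetical helper and not part of rle_array or pandas.

```python
from typing import Any, List, Optional, Sequence


def take_sketch(
    values: Sequence[Any],
    indices: Sequence[int],
    allow_fill: bool = False,
    fill_value: Optional[Any] = None,
) -> List[Any]:
    """Pick elements by position, following the documented rules."""
    n = len(values)
    result: List[Any] = []
    for i in indices:
        if allow_fill:
            if i == -1:
                # -1 marks a missing value and is replaced by fill_value.
                result.append(fill_value)
                continue
            if i < -1:
                raise ValueError("negative indices other than -1 are invalid")
            if i >= n:
                raise IndexError("index out of bounds")
        elif not -n <= i < n:
            # Without filling, negative indices count from the right.
            raise IndexError("index out of bounds")
        result.append(values[i])
    return result


from_right = take_sketch([10, 20, 30], [0, -1])
with_fill = take_sketch([10, 20, 30], [0, -1], allow_fill=True, fill_value=0)
```

The same indices therefore give different results depending on allow_fill: the last element in the first call, the fill value in the second.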

unique()

Compute the ExtensionArray of unique values.



Return type


value_counts(dropna: bool = True) → pandas.core.series.Series
var(skipna: bool = True, ddof: int = 1, dtype: Optional[Any] = None, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
view(dtype: Optional[Any] = None) → Any

Return a view on the array.


dtype (str, np.dtype, or ExtensionDtype, optional) – Default None.


A view on the ExtensionArray’s data.

Return type

ExtensionArray or np.ndarray

class RLEDtype(dtype: Any)

Bases: pandas.core.dtypes.base.ExtensionDtype

classmethod construct_array_type() → Callable[[numpy.ndarray, numpy.ndarray], rle_array.array.RLEArray]

Return the array type associated with this dtype.


Return type


classmethod construct_from_string(string: str) → rle_array.dtype.RLEDtype

Strict construction from a string, raise a TypeError if not possible.

property kind

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also


property name

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

property type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

auto_convert_to_rle(df: pandas.core.frame.DataFrame, threshold: Optional[float] = None) → pandas.core.frame.DataFrame

Auto-convert given DataFrame to RLE compressed DataFrame.


Datetime columns are currently not compressed because pandas does not support them.

Please note that RLE can, under some circumstances, require MORE memory than the uncompressed data. It is not advisable to set threshold to a value larger than 1 except for testing purposes.

  • df – Input DataFrame, may already contain RLE columns. This input data MIGHT not be copied!

  • threshold

    Compression threshold, e.g.:

    • None: compress all.

    • 1.0: compress only if RLE does not take up more space.

    • 0.5: compress only if at least 50% of the memory is saved.

    • 0.0: do not compress at all.


ValueError – If threshold is negative.
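One way to read the threshold values is as an upper bound on the ratio between the compressed and the uncompressed size. The sketch below encodes that reading. It is an assumption about the decision rule, not the library's actual code, and should_compress with its byte-count arguments is a hypothetical helper.

```python
from typing import Optional


def should_compress(
    dense_nbytes: int, rle_nbytes: int, threshold: Optional[float] = None
) -> bool:
    """Decide whether a column should be stored run-length encoded.

    Assumption: compress when rle_nbytes <= threshold * dense_nbytes.
    """
    if threshold is None:
        # No threshold given: always compress.
        return True
    if threshold < 0:
        raise ValueError("threshold must not be negative")
    # With threshold 0.0 this is False for any non-empty encoding.
    return rle_nbytes <= threshold * dense_nbytes


always = should_compress(800, 900, threshold=None)
not_larger = should_compress(800, 900, threshold=1.0)
half_saved = should_compress(800, 400, threshold=0.5)
```

Under this reading, a threshold above 1.0 would accept encodings that are larger than the original data, which is why such values are only useful for testing.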