rle_array.array module

class RLEArray(data: numpy.ndarray, positions: numpy.ndarray)

Bases: pandas.core.arrays.base.ExtensionArray

Run-length encoded array.

Parameters
  • data – Data for each run. Must be one-dimensional. All dtypes supported by pandas are allowed.

  • positions – End positions for each run. Must be one-dimensional and must have the same length as data. Its dtype must be POSITIONS_DTYPE.
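To make the data/positions layout concrete, here is a minimal NumPy sketch of encoding and decoding. It assumes each entry of positions is the exclusive end index of its run (so positions[-1] equals the array length) and uses int64 for positions. Both the end-index convention and the int64 choice are assumptions about POSITIONS_DTYPE and the layout, and rle_encode / rle_decode are hypothetical helpers, not part of rle_array.

```python
import numpy as np


def rle_encode(values):
    """Hypothetical helper: compress a 1-D array into (data, positions)."""
    values = np.asarray(values)
    if len(values) == 0:
        return values[:0], np.empty(0, dtype=np.int64)
    # A new run starts wherever an element differs from its predecessor.
    # Note: NaN != NaN, so this sketch gives every NaN its own run.
    starts = np.flatnonzero(values[1:] != values[:-1]) + 1
    # Each run ends (exclusively) where the next one starts, or at len(values).
    positions = np.append(starts, len(values)).astype(np.int64)
    # The value of a run is the value of its last element.
    data = values[positions - 1]
    return data, positions


def rle_decode(data, positions):
    """Hypothetical helper: expand (data, positions) back into a dense array."""
    # Run lengths are the gaps between consecutive end positions.
    lengths = np.diff(positions, prepend=0)
    return np.repeat(data, lengths)


data, positions = rle_encode([1, 1, 1, 2, 2, 3])
print(data, positions)              # [1 2 3] [3 5 6]
print(rle_decode(data, positions))  # [1 1 1 2 2 3]
```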

all(axis: Optional[int] = 0, out: Optional[Any] = None) → bool
any(axis: Optional[int] = 0, out: Optional[Any] = None) → bool
astype(dtype: Any, copy: bool = True, casting: str = 'unsafe') → Any

Cast to a NumPy array with ‘dtype’.

Parameters
  • dtype (str or dtype) – Typecode or data-type to which the array is cast.

  • copy (bool, default True) – Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.

Returns

array – NumPy ndarray with ‘dtype’ for its dtype.

Return type

ndarray
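Because every element of a run shares one value, a cast only has to touch data, one entry per run. A cast can also make neighbouring runs equal (for example through float-to-int truncation), in which case they can be merged. The sketch below assumes positions holds exclusive run ends; astype_runs is a hypothetical helper, and whether RLEArray.astype merges runs this way is not stated on this page.

```python
import numpy as np


def astype_runs(data, positions, dtype):
    """Hypothetical helper: cast run values and merge runs that became equal."""
    cast = np.asarray(data).astype(dtype)
    positions = np.asarray(positions)
    if len(cast) == 0:
        return cast, positions
    # Keep a run only if the next run holds a different value (or it is the last run);
    # a merged run ends where the last of its original runs ended.
    keep = np.append(cast[1:] != cast[:-1], True)
    return cast[keep], positions[keep]


# 1.2 and 1.4 both truncate to 1, so their runs collapse into one.
data, positions = astype_runs(np.array([1.2, 1.4, 2.0]), np.array([2, 5, 6]), np.int64)
```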

copy() → rle_array.array.RLEArray

Return a copy of the array.

Returns

A copy of the array.

Return type

ExtensionArray

dropna() → rle_array.array.RLEArray

Return ExtensionArray without NA values.

Returns

valid

Return type

ExtensionArray
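Dropping missing values in run-length form means dropping whole runs and rebuilding the end positions from the surviving run lengths. This is a minimal sketch assuming float data with NaN as the missing value and exclusive run ends in positions; dropna_runs is hypothetical, not RLEArray's implementation.

```python
import numpy as np


def dropna_runs(data, positions):
    """Hypothetical helper: drop NaN runs from a float run-length array."""
    data = np.asarray(data, dtype=float)
    lengths = np.diff(positions, prepend=0)
    keep = ~np.isnan(data)
    # Removing whole runs shortens the array, so rebuild the end positions
    # from the lengths of the surviving runs.
    new_positions = np.cumsum(lengths[keep])
    # Neighbouring surviving runs may now hold equal values; they are not merged here.
    return data[keep], new_positions


data, positions = dropna_runs([1.0, np.nan, 2.0], [2, 5, 6])
```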

property dtype

An instance of ‘ExtensionDtype’.

factorize(na_sentinel: int = -1) → Tuple[numpy.ndarray, rle_array.array.RLEArray]

Encode the extension array as an enumerated type.

Parameters

na_sentinel (int, default -1) – Value to use in the codes array to indicate missing values.

Returns

  • codes (ndarray) – An integer NumPy array that’s an indexer into the original ExtensionArray.

  • uniques (ExtensionArray) – An ExtensionArray containing the unique values of self.

    Note

    uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.

See also

factorize

Top-level factorize method that dispatches here.

Notes

pandas.factorize() offers a sort keyword as well.
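Factorizing can be done once per run and only then expanded to one code per element. The sketch below orders uniques by first appearance, as pandas.factorize does without sort=True. It assumes exclusive run ends in positions and data without missing values, so na_sentinel never comes into play; factorize_runs is a hypothetical helper, not RLEArray's implementation.

```python
import numpy as np


def factorize_runs(data, positions):
    """Hypothetical helper: (codes, uniques) for a run-length array without NAs."""
    data = np.asarray(data)
    lengths = np.diff(positions, prepend=0)
    sorted_uniques, first, inverse = np.unique(
        data, return_index=True, return_inverse=True
    )
    # Reorder the sorted uniques by the run where each value first occurs.
    order = np.argsort(first)
    uniques = sorted_uniques[order]
    # rank maps a sorted-unique index to its first-appearance code.
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    run_codes = rank[inverse]
    # One code per run, repeated for every element of that run.
    return np.repeat(run_codes, lengths), uniques


codes, uniques = factorize_runs([3, 1, 3, 2], [1, 3, 4, 6])
```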

fillna(value: Optional[Any] = None, method: Optional[str] = None, limit: Optional[int] = None) → rle_array.array.RLEArray

Fill NA/NaN values using the specified method.

Parameters
  • value (scalar, array-like) – If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like ‘value’ can be given. It’s expected that the array-like have the same length as ‘self’.

  • method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) – Method to use for filling holes in reindexed Series. pad / ffill: propagate the last valid observation forward to the next valid one. backfill / bfill: use the next valid observation to fill the gap.

  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.

Returns

The array with NA/NaN values filled.

Return type

ExtensionArray
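Forward filling never changes run lengths, so in run-length form it can operate on data alone, one entry per run. This is a minimal sketch of method='ffill' assuming float data with NaN as the missing value; it ignores limit (which counts elements, not runs) and does not merge runs that become equal. ffill_runs is a hypothetical helper, not RLEArray's implementation.

```python
import numpy as np


def ffill_runs(data):
    """Hypothetical helper: forward-fill NaN runs (method='ffill', limit ignored)."""
    data = np.asarray(data, dtype=float)
    valid = ~np.isnan(data)
    # For each run, the index of the latest valid run at or before it;
    # 0 is a placeholder that only matters while no valid run has been seen yet.
    idx = np.where(valid, np.arange(len(data)), 0)
    np.maximum.accumulate(idx, out=idx)
    # Leading NaN runs map to run 0, which is itself NaN, so they stay missing.
    return data[idx]


# Filling never changes run lengths, so the positions array can be reused as-is.
filled = ffill_runs([np.nan, 1.0, np.nan, 2.0, np.nan])
```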

isna() → rle_array.array.RLEArray

A 1-D array indicating if each value is missing.

Returns

na_values – In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Return type

Union[np.ndarray, ExtensionArray]

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented
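Since isna here is annotated to return an RLEArray, the mask can itself stay run-length encoded: one boolean per run, with the positions reused unchanged. A minimal sketch, assuming float data with NaN as the missing value and exclusive run ends in positions; the variable names are illustrative only.

```python
import numpy as np

data = np.array([1.0, np.nan, 2.0])
positions = np.array([2, 5, 6], dtype=np.int64)

# In run form, the mask reuses the positions unchanged: one bool per run.
mask_data = np.isnan(data)
# Expanding to a dense mask is only needed when a plain ndarray is required.
dense_mask = np.repeat(mask_data, np.diff(positions, prepend=0))
```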

kurt(skipna: bool = True) → Any
max(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
mean(skipna: bool = True, dtype: Optional[Any] = None, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
median(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
min(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
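Reductions like these can be computed from the runs without expanding the array: min and max only look at the run values, while the mean weights each value by its run length. A minimal sketch assuming float data without NaNs (so skipna has no effect) and exclusive run ends in positions; run_min_max_mean is hypothetical, not RLEArray's implementation.

```python
import numpy as np


def run_min_max_mean(data, positions):
    """Hypothetical helper: min, max and mean of a run-length array without NaNs."""
    data = np.asarray(data, dtype=float)
    lengths = np.diff(positions, prepend=0)
    # min and max depend only on which values occur, so one pass over the runs suffices.
    lo, hi = data.min(), data.max()
    # The mean weights each run value by the number of elements in the run.
    mean = np.dot(data, lengths) / lengths.sum()
    return lo, hi, mean


# Dense equivalent: [1.0, 1.0, 1.0, 4.0]
lo, hi, mean = run_min_max_mean([1.0, 4.0], [3, 4])
```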
property nbytes

The number of bytes needed to store this object in memory.
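A rough illustration of why run-length encoding saves memory for long runs. It assumes the storage is essentially the two backing arrays, data and positions; that is an assumption for illustration, not a statement of how nbytes is computed for RLEArray.

```python
import numpy as np

n = 1_000_000
dense = np.zeros(n, dtype=np.float64)
dense[n // 2:] = 1.0  # two long runs

# Hypothetical run-length form of `dense`: two run values and two end positions.
data = np.array([0.0, 1.0])
positions = np.array([n // 2, n], dtype=np.int64)

rle_bytes = data.nbytes + positions.nbytes  # assumption: storage is just these two arrays
dense_bytes = dense.nbytes
```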

prod(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
round(decimals: int = 0) → rle_array.array.RLEArray
shift(periods: int = 1, fill_value: Optional[object] = None) → rle_array.array.RLEArray

Shift values by desired number.

Newly introduced missing values are filled with self.dtype.na_value.

New in version 0.24.0.

Parameters
  • periods (int, default 1) – The number of periods to shift. Negative values are allowed for shifting backwards.

  • fill_value (object, optional) –

    The scalar value to use for newly introduced missing values. The default is self.dtype.na_value.

    New in version 0.24.0.

Returns

The shifted array.

Return type

ExtensionArray

Notes

If self is empty or periods is 0, a copy of self is returned.

If periods > len(self), then an array of size len(self) is returned, with all values filled with self.dtype.na_value.
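In run-length form, a shift moves every run end by periods, clips runs at the array boundary, and adds one fill run at the vacated end. The sketch below assumes float data with NaN as the default fill value and exclusive run ends in positions. It follows the rules above (a copy for an empty array or periods == 0, an all-missing result when abs(periods) >= len(self)). shift_runs is hypothetical, and it does not merge the fill run with a neighbouring run of equal value.

```python
import numpy as np


def shift_runs(data, positions, periods, fill_value=np.nan):
    """Hypothetical helper: shift a float run-length array by `periods`."""
    data = np.asarray(data, dtype=float)
    positions = np.asarray(positions, dtype=np.int64)
    n = int(positions[-1]) if len(positions) else 0
    if n == 0 or periods == 0:
        return data.copy(), positions.copy()
    k = min(abs(periods), n)  # number of newly introduced missing values
    if periods > 0:
        # Move every run right by k and keep only runs that still start before n.
        starts = np.concatenate(([0], positions[:-1])) + k
        keep = starts < n
        new_data = np.concatenate(([fill_value], data[keep]))
        new_pos = np.concatenate(([k], np.minimum(positions[keep] + k, n)))
    else:
        # Move every run left by k and drop runs that now end at or before 0.
        moved = positions - k
        keep = moved > 0
        new_data = np.concatenate((data[keep], [fill_value]))
        new_pos = np.concatenate((moved[keep], [n]))
    return new_data, new_pos


# Dense [1, 1, 1, 2, 2, 3] shifted by 2 becomes [nan, nan, 1, 1, 1, 2].
data, positions = shift_runs([1.0, 2.0, 3.0], [3, 5, 6], 2)
```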

skew(skipna: bool = True) → Any
std(skipna: bool = True, ddof: int = 1, dtype: Optional[Any] = None, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
sum(skipna: bool = True, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
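Sums and second moments also reduce to weighted sums over runs, with each run value weighted by its length and ddof applied to the element count as in NumPy. A minimal sketch assuming float data without NaNs and exclusive run ends in positions; run_sum_var is hypothetical, not RLEArray's implementation.

```python
import numpy as np


def run_sum_var(data, positions, ddof=1):
    """Hypothetical helper: sum, variance and standard deviation over runs."""
    data = np.asarray(data, dtype=float)
    lengths = np.diff(positions, prepend=0)
    n = lengths.sum()
    # Each run value counts once per element in its run.
    total = np.dot(data, lengths)
    mean = total / n
    # Length-weighted sum of squared deviations, divided by n - ddof.
    var = np.dot(lengths, (data - mean) ** 2) / (n - ddof)
    return total, var, np.sqrt(var)


# Dense equivalent: [1.0, 1.0, 1.0, 4.0]
total, var, std = run_sum_var([1.0, 4.0], [3, 4])
```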
take(indices: Sequence[int], allow_fill: bool = False, fill_value: Optional[Any] = None) → rle_array.array.RLEArray

Take elements from an array.

Parameters
  • indices (sequence of int) – Indices to be taken.

  • allow_fill (bool, default False) –

    How to handle negative values in indices.

    • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

    • True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.

  • fill_value (any, optional) –

    Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

    For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.

Returns

The elements at the given indices.

Return type

ExtensionArray

Raises
  • IndexError – When the indices are out of bounds for the array.

  • ValueError – When indices contains negative values other than -1 and allow_fill is True.

Notes

ExtensionArray.take is called by Series.__getitem__, .loc and .iloc when indices is a sequence of values. Additionally, it's called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.api.extensions import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
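The example above casts to object dtype first. For a run-length array, one possible alternative is to map each requested index to its run with numpy.searchsorted and read the run value, without expanding the array. This is a sketch under stated assumptions, not necessarily how RLEArray.take works. It assumes exclusive run ends in positions, covers only allow_fill=False, and returns a dense ndarray rather than an RLEArray; take_runs is hypothetical.

```python
import numpy as np


def take_runs(data, positions, indices):
    """Hypothetical helper: look up dense indices in a run-length array."""
    positions = np.asarray(positions)
    n = int(positions[-1]) if len(positions) else 0
    indices = np.asarray(indices, dtype=np.int64)
    if np.any((indices >= n) | (indices < -n)):
        raise IndexError("indices out of bounds")
    # allow_fill=False semantics: negative indices count from the right.
    indices = np.where(indices < 0, indices + n, indices)
    # Element i lives in the first run whose exclusive end is greater than i.
    run = np.searchsorted(positions, indices, side="right")
    return np.asarray(data)[run]


# Dense equivalent: [10, 10, 10, 20, 20, 30]
values = take_runs(np.array([10, 20, 30]), np.array([3, 5, 6]), [0, 2, 3, 5, -1])
```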

unique() → rle_array.array.RLEArray

Compute the ExtensionArray of unique values.

Returns

uniques

Return type

ExtensionArray
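Every element's value is the value of some run, and runs are stored in element order, so the unique values in order of first appearance can be read off data alone. A minimal sketch; unique_runs is hypothetical, and NaN handling is not addressed.

```python
import numpy as np


def unique_runs(data):
    """Hypothetical helper: unique run values in order of first appearance."""
    data = np.asarray(data)
    # np.unique sorts; return_index gives the first run holding each value,
    # and sorting those indices restores first-appearance order.
    _, first = np.unique(data, return_index=True)
    return data[np.sort(first)]


uniques = unique_runs([3, 1, 3, 2])
```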

value_counts(dropna: bool = True) → pandas.core.series.Series
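Counting values in run-length form means summing run lengths per distinct run value. This is a minimal sketch assuming exclusive run ends in positions and no missing values. It returns plain arrays sorted by value, whereas a pandas value_counts result is a Series (sorted by count by default); value_counts_runs is hypothetical.

```python
import numpy as np


def value_counts_runs(data, positions):
    """Hypothetical helper: (values, counts) for a run-length array without NAs."""
    lengths = np.diff(positions, prepend=0)
    values, inverse = np.unique(np.asarray(data), return_inverse=True)
    # Each run contributes its length to the count of its value.
    counts = np.bincount(inverse, weights=lengths).astype(np.int64)
    return values, counts


# Dense equivalent: [1, 1, 2, 1, 1, 1, 1]
values, counts = value_counts_runs([1, 2, 1], [2, 3, 7])
```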
var(skipna: bool = True, ddof: int = 1, dtype: Optional[Any] = None, axis: Optional[int] = 0, out: Optional[Any] = None) → Any
view(dtype: Optional[Any] = None) → Any

Return a view on the array.

Parameters

dtype (str, np.dtype, or ExtensionDtype, optional) – Default None.

Returns

A view on the ExtensionArray’s data.

Return type

ExtensionArray or np.ndarray