Base

`tablite.base`

Attributes

`tablite.base.log = logging.getLogger(name)` `module-attribute`

`tablite.base.file_registry = set()` `module-attribute`

Classes

`tablite.base.SimplePage(id, path, len, py_dtype)`

Bases: object

Source code in tablite/base.py

def __init__(self, id, path, len, py_dtype) -> None:
    self.path = Path(path) / "pages" / f"{id}.npy"
    self.len = len
    self.dtype = py_dtype

    self._incr_refcount()

Attributes

`tablite.base.SimplePage.ids = count(start=1)` `class-attribute` `instance-attribute`

`tablite.base.SimplePage.refcounts = {}` `class-attribute` `instance-attribute`

`tablite.base.SimplePage.autocleanup = True` `class-attribute` `instance-attribute`

`tablite.base.SimplePage.path = Path(path) / 'pages' / f'{id}.npy'` `instance-attribute`

`tablite.base.SimplePage.len = len` `instance-attribute`

`tablite.base.SimplePage.dtype = py_dtype` `instance-attribute`

Functions

`tablite.base.SimplePage.__setstate__(state)`

When an object is unpickled, for example during multiprocessing, `object.__setstate__(state)` is called instead of `__init__`. The page refcount must therefore be updated as if the constructor had been called.

Source code in tablite/base.py

def __setstate__(self, state):
    """
    when an object is unpickled, say in a case of multi-processing,
    object.__setstate__(state) is called instead of __init__, this means
    we need to update page refcount as if constructor had been called
    """
    self.__dict__.update(state)

    self._incr_refcount()
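
The effect of this override can be illustrated with a minimal sketch. `CountedPage` and the body of `_incr_refcount` are hypothetical stand-ins (the private helper is not shown on this page), and unpickling is mimicked by hand: an instance is created without `__init__` and then handed its state.

```python
class CountedPage:
    """Hypothetical stand-in for SimplePage: tracks references per path."""

    refcounts = {}  # shared by all instances, keyed by path

    def __init__(self, path):
        self.path = path
        self._incr_refcount()

    def _incr_refcount(self):
        # assumed behaviour of the private helper: one more reference to this path
        self.refcounts[self.path] = self.refcounts.get(self.path, 0) + 1

    def __setstate__(self, state):
        # restoring state bypasses __init__, so the reference is counted here
        self.__dict__.update(state)
        self._incr_refcount()


page = CountedPage("pages/1.npy")

# what unpickling does: allocate without __init__, then restore the state
clone = CountedPage.__new__(CountedPage)
clone.__setstate__(dict(page.__dict__))

print(CountedPage.refcounts)  # {'pages/1.npy': 2}
```

Without the extra increment, the restored copy would not be counted and the file could be removed while the copy still points at it.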

`tablite.base.SimplePage.next_id(path)` `classmethod`

Source code in tablite/base.py

@classmethod
def next_id(cls, path):
    path = Path(path)

    while True:
        _id = f"{os.getpid()}-{next(cls.ids)}"
        _path = path / "pages" / f"{_id}.npy"

        if not _path.exists():
            break  # make sure we don't override existing pages if they are created outside of main thread

    return _id
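
A rough standalone sketch of the same idea, using a temporary directory in place of the real working directory. The function name `next_free_id` is an assumption made for illustration.

```python
import os
import tempfile
from itertools import count
from pathlib import Path

ids = count(start=1)  # process-wide counter, like SimplePage.ids


def next_free_id(workdir):
    """Return the first '<pid>-<n>' id whose page file does not exist yet."""
    workdir = Path(workdir)
    while True:
        candidate = f"{os.getpid()}-{next(ids)}"
        if not (workdir / "pages" / f"{candidate}.npy").exists():
            return candidate


with tempfile.TemporaryDirectory() as tmp:
    pages = Path(tmp) / "pages"
    pages.mkdir()
    # pretend page number 1 was already written by someone else
    (pages / f"{os.getpid()}-1.npy").touch()
    first = next_free_id(tmp)   # skips the taken id
    second = next_free_id(tmp)  # the counter keeps advancing
```

Prefixing the counter with the process id keeps ids from different processes apart, and the existence check covers files created within the same process.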

`tablite.base.SimplePage.__len__()`

Source code in tablite/base.py

def __len__(self):
    return self.len

`tablite.base.SimplePage.__repr__() -> str`

Source code in tablite/base.py

def __repr__(self) -> str:
    try:
        return f"{self.__class__.__name__}({self.path}, {self.get()})"
    except FileNotFoundError as e:
        return f"{self.__class__.__name__}({self.path}, <{type(e).__name__}>)"
    except Exception as e:
        return f"{self.__class__.__name__}({self.path}, <{e}>)"

`tablite.base.SimplePage.__hash__() -> int`

Source code in tablite/base.py

def __hash__(self) -> int:
    return hash(self.path)

`tablite.base.SimplePage.owns()`

Source code in tablite/base.py

def owns(self):
    parts = self.path.parts

    return all((p in parts for p in Path(Config.pid).parts))
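
The check can be tried in isolation as below. The directory names are made up for illustration; the actual value of `Config.pid` is not shown on this page.

```python
from pathlib import Path


def owns(page_path, pid_dir):
    """True if every part of pid_dir occurs among the parts of page_path."""
    parts = Path(page_path).parts
    return all(p in parts for p in Path(pid_dir).parts)


mine = owns("/tmp/tablite/pid-4711/pages/4711-1.npy", "pid-4711")
theirs = owns("/tmp/tablite/pid-9999/pages/9999-1.npy", "pid-4711")
```

Note that this is a membership test on path components, not a prefix match.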

`tablite.base.SimplePage.__del__()`

When Python's reference count for an object is 0, Python uses its garbage collector to remove the object and free the memory. As tablite tables have columns, columns have pages, and pages have data stored on disk, the space on disk must be freed up as well. This `__del__` override assures the cleanup of stored data.

Source code in tablite/base.py

def __del__(self):
    """When Python's reference count for an object is 0, Python uses
    its garbage collector to remove the object and free the memory.
    As tablite tables have columns and columns have pages and pages have
    data stored on disk, the space on disk must be freed up as well.
    This __del__ override assures the cleanup of stored data.
    """
    if not self.owns():
        return

    refcount = self.refcounts[self.path] = max(
        self.refcounts.get(self.path, 0) - 1, 0
    )

    if refcount > 0:
        return

    if self.autocleanup:
        self.path.unlink(True)

    del self.refcounts[self.path]
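
The following sketch shows the cleanup pattern in isolation, assuming CPython, where `del` on the last reference runs `__del__` immediately. `TempPage` is a hypothetical stand-in and leaves out the ownership and `autocleanup` checks.

```python
import gc
import tempfile
from pathlib import Path


class TempPage:
    """Hypothetical stand-in for a page: one file on disk, shared by handles."""

    refcounts = {}

    def __init__(self, path):
        self.path = Path(path)
        self.refcounts[self.path] = self.refcounts.get(self.path, 0) + 1

    def __del__(self):
        remaining = max(self.refcounts.get(self.path, 0) - 1, 0)
        if remaining > 0:
            self.refcounts[self.path] = remaining
            return
        self.path.unlink(missing_ok=True)  # last handle gone: free the disk space
        self.refcounts.pop(self.path, None)


with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "1.npy"
    file.touch()
    a, b = TempPage(file), TempPage(file)
    del a
    gc.collect()
    exists_with_one_handle = file.exists()   # b still refers to the file
    del b
    gc.collect()
    exists_with_no_handles = file.exists()   # removed with the last handle
```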

`tablite.base.SimplePage.get()`

loads stored data

RETURNS	DESCRIPTION
	np.ndarray: stored data.

Source code in tablite/base.py

def get(self):
    """loads stored data

    Returns:
        np.ndarray: stored data.
    """
    array = load_numpy(self.path)
    return MetaArray(array, array.dtype, py_dtype=self.dtype)

`tablite.base.Page(path, array)`

Bases: SimplePage

PARAMETER	DESCRIPTION
`path`	working directory. TYPE: `Path`
`array`	data TYPE: `array`

Source code in tablite/base.py

def __init__(self, path, array) -> None:
    """
    Args:
        path (Path): working directory.
        array (np.array): data
    """
    _id = self.next_id(path)

    type_check(array, np.ndarray)

    if Config.DISK_LIMIT <= 0:
        pass
    else:
        _, _, free = shutil.disk_usage(path)
        if free - array.nbytes < Config.DISK_LIMIT:
            msg = "\n".join(
                [
                    f"Disk limit reached: Config.DISK_LIMIT = {Config.DISK_LIMIT:,} bytes.",
                    f"array requires {array.nbytes:,} bytes, but only {free:,} bytes are free.",
                    "To disable this check, use:",
                    ">>> from tablite.config import Config",
                    ">>> Config.DISK_LIMIT = 0",
                    "To free space, clean up Config.workdir:",
                    f"{Config.workdir}",
                ]
            )
            raise OSError(msg)

    _len = len(array)
    # type_check(array, MetaArray)
    if not hasattr(array, "metadata"):
        raise ValueError
    _dtype = array.metadata["py_dtype"]

    super().__init__(_id, path, _len, _dtype)

    np.save(self.path, array, allow_pickle=True, fix_imports=False)
    log.debug(f"Page saved: {self.path}")
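
The disk-space guard at the top of the constructor can be sketched on its own with `shutil.disk_usage`. The function name `ensure_disk_space` and its parameters are assumptions for illustration; `disk_limit` plays the role of `Config.DISK_LIMIT`.

```python
import shutil


def ensure_disk_space(path, nbytes, disk_limit):
    """Raise OSError if writing nbytes would leave less than disk_limit free.

    A disk_limit of 0 or less disables the check, as in Page.__init__.
    """
    if disk_limit <= 0:
        return
    free = shutil.disk_usage(path).free
    if free - nbytes < disk_limit:
        raise OSError(
            f"Disk limit reached: {nbytes:,} bytes needed, "
            f"{free:,} bytes free, limit {disk_limit:,} bytes."
        )


ensure_disk_space(".", nbytes=1_000, disk_limit=0)  # check disabled, never raises
```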

Attributes

`tablite.base.Page.ids = count(start=1)` `class-attribute` `instance-attribute`

`tablite.base.Page.refcounts = {}` `class-attribute` `instance-attribute`

`tablite.base.Page.autocleanup = True` `class-attribute` `instance-attribute`

`tablite.base.Page.path = Path(path) / 'pages' / f'{id}.npy'` `instance-attribute`

`tablite.base.Page.len = len` `instance-attribute`

`tablite.base.Page.dtype = py_dtype` `instance-attribute`

Functions

`tablite.base.Page.__setstate__(state)`

When an object is unpickled, for example during multiprocessing, `object.__setstate__(state)` is called instead of `__init__`. The page refcount must therefore be updated as if the constructor had been called.

Source code in tablite/base.py

def __setstate__(self, state):
    """
    when an object is unpickled, say in a case of multi-processing,
    object.__setstate__(state) is called instead of __init__, this means
    we need to update page refcount as if constructor had been called
    """
    self.__dict__.update(state)

    self._incr_refcount()

`tablite.base.Page.next_id(path)` `classmethod`

Source code in tablite/base.py

@classmethod
def next_id(cls, path):
    path = Path(path)

    while True:
        _id = f"{os.getpid()}-{next(cls.ids)}"
        _path = path / "pages" / f"{_id}.npy"

        if not _path.exists():
            break  # make sure we don't override existing pages if they are created outside of main thread

    return _id

`tablite.base.Page.__len__()`

Source code in tablite/base.py

def __len__(self):
    return self.len

`tablite.base.Page.__repr__() -> str`

Source code in tablite/base.py

def __repr__(self) -> str:
    try:
        return f"{self.__class__.__name__}({self.path}, {self.get()})"
    except FileNotFoundError as e:
        return f"{self.__class__.__name__}({self.path}, <{type(e).__name__}>)"
    except Exception as e:
        return f"{self.__class__.__name__}({self.path}, <{e}>)"

`tablite.base.Page.__hash__() -> int`

Source code in tablite/base.py

def __hash__(self) -> int:
    return hash(self.path)

`tablite.base.Page.owns()`

Source code in tablite/base.py

def owns(self):
    parts = self.path.parts

    return all((p in parts for p in Path(Config.pid).parts))

`tablite.base.Page.__del__()`

When Python's reference count for an object is 0, Python uses its garbage collector to remove the object and free the memory. As tablite tables have columns, columns have pages, and pages have data stored on disk, the space on disk must be freed up as well. This `__del__` override assures the cleanup of stored data.

Source code in tablite/base.py

def __del__(self):
    """When Python's reference count for an object is 0, Python uses
    its garbage collector to remove the object and free the memory.
    As tablite tables have columns and columns have pages and pages have
    data stored on disk, the space on disk must be freed up as well.
    This __del__ override assures the cleanup of stored data.
    """
    if not self.owns():
        return

    refcount = self.refcounts[self.path] = max(
        self.refcounts.get(self.path, 0) - 1, 0
    )

    if refcount > 0:
        return

    if self.autocleanup:
        self.path.unlink(True)

    del self.refcounts[self.path]

`tablite.base.Page.get()`

loads stored data

RETURNS	DESCRIPTION
	np.ndarray: stored data.

Source code in tablite/base.py

def get(self):
    """loads stored data

    Returns:
        np.ndarray: stored data.
    """
    array = load_numpy(self.path)
    return MetaArray(array, array.dtype, py_dtype=self.dtype)

`tablite.base.Column(path, value=None)`

Bases: object

Create Column

PARAMETER	DESCRIPTION
`path`	path of table.yml (defaults: Config.pid_dir) TYPE: `Path`
`value`	Data to store. Defaults to None. TYPE: `Iterable` DEFAULT: `None`

Source code in tablite/base.py

def __init__(self, path, value=None) -> None:
    """Create Column

    Args:
        path (Path): path of table.yml (defaults: Config.pid_dir)
        value (Iterable, optional): Data to store. Defaults to None.
    """
    self.path = path
    self.pages = []  # keeps pointers to instances of Page
    if value is not None:
        self.extend(value)

Attributes

`tablite.base.Column.path = path` `instance-attribute`

`tablite.base.Column.pages = []` `instance-attribute`

Functions

`tablite.base.Column.__len__()`

Source code in tablite/base.py

def __len__(self):
    return sum(len(p) for p in self.pages)

`tablite.base.Column.__repr__()`

Source code in tablite/base.py

def __repr__(self):
    return f"{self.__class__.__name__}({self.path}, {self[:]})"

`tablite.base.Column.repaginate()`

resizes pages to Config.PAGE_SIZE

Source code in tablite/base.py

def repaginate(self):
    """resizes pages to Config.PAGE_SIZE"""
    from tablite.nimlite import repaginate as _repaginate

    _repaginate(self)

`tablite.base.Column.extend(value)`

extends the column.

PARAMETER	DESCRIPTION
`value`	data TYPE: `ndarray`

Source code in tablite/base.py

def extend(self, value):  # USER FUNCTION.
    """extends the column.

    Args:
        value (np.ndarray): data
    """
    if isinstance(value, Column):
        self.pages.extend(value.pages[:])
        return
    elif isinstance(value, np.ndarray):
        pass
    elif isinstance(value, (list, tuple)):
        value = list_to_np_array(value)
    else:
        raise TypeError(f"Cannot extend Column with {type(value)}")
    type_check(value, np.ndarray)
    for array in self._paginate(value):
        self.pages.append(Page(path=self.path, array=array))

`tablite.base.Column.clear()`

clears the column. Like list().clear()

Source code in tablite/base.py

def clear(self):
    """
    clears the column. Like list().clear()
    """
    self.pages.clear()

`tablite.base.Column.getpages(item)`

public non-user function to identify any pages + slices of data to be retrieved given a slice (item)

PARAMETER	DESCRIPTION
`item`	target slice of data TYPE: `(int, slice)`

RETURNS	DESCRIPTION
	list of pages/np.ndarrays.

Example: [Page(1), Page(2), np.ndarray([4,5,6], int64)] This helps, for example when creating a copy, as the copy can reference the pages 1 and 2 and only need to store the np.ndarray that is unique to it.

Source code in tablite/base.py

def getpages(self, item):
    """public non-user function to identify any pages + slices
    of data to be retrieved given a slice (item)

    Args:
        item (int,slice): target slice of data

    Returns:
        list of pages/np.ndarrays.

    Example: [Page(1), Page(2), np.ndarray([4,5,6], int64)]
    This helps, for example when creating a copy, as the copy
    can reference the pages 1 and 2 and only need to store
    the np.ndarray that is unique to it.
    """
    # internal function
    if isinstance(item, int):
        if item < 0:
            item = len(self) + item
        item = slice(item, item + 1, 1)

    type_check(item, slice)
    is_reversed = False if (item.step is None or item.step > 0) else True

    length = len(self)
    scan_item = slice(*item.indices(length))
    range_item = range(*item.indices(length))

    pages = []
    start, end = 0, 0
    for page in self.pages:
        start, end = end, end + page.len
        if is_reversed:
            if start > scan_item.start:
                break
            if end < scan_item.stop:
                continue
        else:
            if start > scan_item.stop:
                break
            if end < scan_item.start:
                continue
        ro = intercept(range(start, end), range_item)
        if len(ro) == 0:
            continue
        elif len(ro) == page.len:  # share the whole immutable page
            pages.append(page)
        else:  # fetch the slice and filter it.
            search_slice = slice(ro.start - start, ro.stop - start, ro.step)
            np_arr = load_numpy(page.path)
            match = np_arr[search_slice]
            pages.append(match)

    if is_reversed:
        pages.reverse()
        for ix, page in enumerate(pages):
            if isinstance(page, SimplePage):
                data = page.get()
                pages[ix] = np.flip(data)
            else:
                pages[ix] = np.flip(page)

    return pages
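
To make the page bookkeeping easier to follow, here is a simplified sketch that works on page lengths only. It handles forward slices with step 1 and returns a plan instead of data; the name `plan_pages` is an assumption, and the real method relies on the `intercept` helper to cover steps and reversed slices.

```python
def plan_pages(page_lengths, item):
    """Map a forward, step-1 slice onto pages.

    Returns a list of (page_number, local_slice) tuples, where local_slice is
    None when the whole page is covered and can be shared as-is.
    """
    total = sum(page_lengths)
    start, stop, step = item.indices(total)
    if step != 1:
        raise ValueError("this sketch only handles step 1")

    plan = []
    page_start = page_end = 0
    for number, length in enumerate(page_lengths):
        page_start, page_end = page_end, page_end + length
        lo, hi = max(start, page_start), min(stop, page_end)
        if lo >= hi:
            continue  # no overlap with this page
        if hi - lo == length:
            plan.append((number, None))  # reuse the whole page
        else:
            plan.append((number, slice(lo - page_start, hi - page_start)))
    return plan


plan = plan_pages([3, 3, 3], slice(2, 7))
print(plan)  # [(0, slice(2, 3, None)), (1, None), (2, slice(0, 1, None))]
```

Only the first and last pages need to be read and cut; the middle page is referenced whole, which is what lets copies share pages.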

`tablite.base.Column.iter_by_page()`

iterates over the column, page by page. This method minimizes the number of reads.

RETURNS	DESCRIPTION
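	generator of tuples (start: int, end: int, page): the row offsets covered by each page and the page itself. The source below yields the page object rather than an array; call page.get() to load its data as np.ndarray.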

Source code in tablite/base.py

def iter_by_page(self):
    """iterates over the column, page by page.
    This method minimizes the number of reads.

    Returns:
        generator of tuple:
            start: int
            end: int
            data: np.ndarray
    """
    start, end = 0, 0
    for page in self.pages:
        start, end = end, end + page.len
        yield start, end, page
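
The running offsets are plain cumulative sums of the page lengths. A minimal sketch over lengths alone (the name `iter_offsets` is hypothetical):

```python
def iter_offsets(page_lengths):
    """Yield the (start, end) row range covered by each page."""
    start = end = 0
    for length in page_lengths:
        start, end = end, end + length
        yield start, end


offsets = list(iter_offsets([3, 3, 2]))
print(offsets)  # [(0, 3), (3, 6), (6, 8)]
```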

`tablite.base.Column.__getitem__(item)`

gets numpy array.

PARAMETER	DESCRIPTION
`item`	slice of column TYPE: `int OR slice`

RETURNS	DESCRIPTION
	np.ndarray: results as numpy array.

Remember:

>>> R = np.array([0,1,2,3,4,5])
>>> R[3]
3
>>> R[3:4]
array([3])

Source code in tablite/base.py

def __getitem__(self, item):  # USER FUNCTION.
    """gets numpy array.

    Args:
        item (int OR slice): slice of column

    Returns:
        np.ndarray: results as numpy array.

    Remember:
    ```
    >>> R = np.array([0,1,2,3,4,5])
    >>> R[3]
    3
    >>> R[3:4]
    array([3])
    ```
    """
    result = []
    for element in self.getpages(item):
        if isinstance(element, SimplePage):
            result.append(element.get())
        else:
            result.append(element)

    if result:
        arr = np_type_unify(result)
    else:
        arr = np.array([])

    if isinstance(item, int):
        if len(arr) == 0:
            raise IndexError(
                f"index {item} is out of bounds for axis 0 with size {len(self)}"
            )
        return numpy_to_python(arr[0])
    else:
        return arr

`tablite.base.Column.__setitem__(key, value)`

sets values.

PARAMETER	DESCRIPTION
`key`	selector TYPE: `(int, slice)`
`value`	values to insert TYPE: `any`

RAISES	DESCRIPTION
`KeyError`	Following normal slicing rules

Source code in tablite/base.py

def __setitem__(self, key, value):  # USER FUNCTION.
    """sets values.

    Args:
        key (int,slice): selector
        value (any): values to insert

    Raises:
        KeyError: Following normal slicing rules
    """
    if isinstance(key, int):
        self._setitem_integer_key(key, value)

    elif isinstance(key, slice):
        if not isinstance(value, np.ndarray):
            value = list_to_np_array(value)
        type_check(value, np.ndarray)

        if key.start is None and key.stop is None and key.step in (None, 1):
            self._setitem_replace_all(key, value)
        elif key.start is not None and key.stop is None and key.step in (None, 1):
            self._setitem_extend(key, value)
        elif key.stop is not None and key.start is None and key.step in (None, 1):
            self._setitem_prextend(key, value)
        elif (
            key.step in (None, 1) and key.start is not None and key.stop is not None
        ):
            self._setitem_insert(key, value)
        elif key.step not in (None, 1):
            self._setitem_update(key, value)
        else:
            raise KeyError(f"bad key: {key}")
    else:
        raise KeyError(f"bad key: {key}")
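
The dispatch on the shape of the key can be summarised with a small classifier. This is a sketch only: `classify_key` is a hypothetical helper that returns the name of the private handler the source above would call, without performing the update.

```python
def classify_key(key):
    """Return the name of the handler chosen for an int or slice key."""
    if isinstance(key, int):
        return "_setitem_integer_key"
    if not isinstance(key, slice):
        raise KeyError(f"bad key: {key}")
    if key.step not in (None, 1):
        return "_setitem_update"       # col[::2] = ...
    if key.start is None and key.stop is None:
        return "_setitem_replace_all"  # col[:] = ...
    if key.stop is None:
        return "_setitem_extend"       # col[3:] = ...
    if key.start is None:
        return "_setitem_prextend"     # col[:3] = ...
    return "_setitem_insert"           # col[2:5] = ...


print(classify_key(slice(3, None)))  # _setitem_extend
```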

`tablite.base.Column.__delitem__(key)`

deletes items selected by key

PARAMETER	DESCRIPTION
`key`	selector TYPE: `(int, slice)`

RAISES	DESCRIPTION
`KeyError`	following normal slicing rules.

Source code in tablite/base.py

def __delitem__(self, key):  # USER FUNCTION
    """deletes items selected by key

    Args:
        key (int,slice): selector

    Raises:
        KeyError: following normal slicing rules.
    """
    if isinstance(key, int):
        self._del_by_int(key)
    elif isinstance(key, slice):
        self._del_by_slice(key)
    else:
        raise KeyError(f"bad key: {key}")

`tablite.base.Column.get_by_indices(indices: Union[List[int], np.ndarray]) -> np.ndarray`

retrieves values from column given a set of indices.

PARAMETER	DESCRIPTION
`indices`	targets TYPE: `array`

This method uses np.take, which is faster than iterating over rows. Examples:

>>> indices = np.array(list(range(3,700_700, 426)))
>>> arr = np.array(list(range(2_000_000)))
Pythonic:
>>> [v for i,v in enumerate(arr) if i in indices]
Numpyionic:
>>> np.take(arr, indices)

Source code in tablite/base.py

def get_by_indices(self, indices: Union[List[int], np.ndarray]) -> np.ndarray:
    """retrieves values from column given a set of indices.

    Args:
        indices (np.array): targets

    This method uses np.take, is faster than iterating over rows.
    Examples:
    ```
    >>> indices = np.array(list(range(3,700_700, 426)))
    >>> arr = np.array(list(range(2_000_000)))
    Pythonic:
    >>> [v for i,v in enumerate(arr) if i in indices]
    Numpyionic:
    >>> np.take(arr, indices)
    ```
    """
    type_check(indices, np.ndarray)

    dtypes = set()
    values = np.empty(
        indices.shape, dtype=object
    )  # placeholder for the indexed values.

    for start, end, page in self.iter_by_page():
        range_match = np.asarray(((indices >= start) & (indices < end)) | (indices == -1)).nonzero()[0]
        if len(range_match):
            # only fetch the data if there's a range match!
            data = page.get() 
            sub_index = np.take(indices, range_match)
            # sub_index2 otherwise will raise index error where len(data) > (-1 - start)
            # so the clause below is required:
            if len(data) > (-1 - start):
                sub_index = np.where(sub_index == -1, -1, sub_index - start)
            arr = np.take(data, sub_index)
            dtypes.add(arr.dtype)
            np.put(values, range_match, arr)

    if len(dtypes) == 1:  # simplify the datatype
        dtype = next(iter(dtypes))
        values = np.array(values, dtype=dtype)
    return values
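
The page-wise gathering can be sketched without NumPy. The version below uses plain lists as pages, assumes non-negative indices, and leaves out the special handling of `-1` seen in the source; `gather` is a hypothetical name.

```python
def gather(pages, indices):
    """Pick values at `indices` from the concatenation of `pages`.

    pages: list of lists; indices: non-negative positions.
    """
    values = [None] * len(indices)
    start = end = 0
    for page in pages:
        start, end = end, end + len(page)
        # positions in `indices` whose target row lives on this page
        hits = [pos for pos, ix in enumerate(indices) if start <= ix < end]
        if not hits:
            continue  # the real method skips loading the page in this case
        for pos in hits:
            values[pos] = page[indices[pos] - start]
    return values


picked = gather([[10, 11, 12], [13, 14], [15]], [5, 0, 3, 3])
print(picked)  # [15, 10, 13, 13]
```

Each page is visited once regardless of the order of the indices, which keeps the number of reads at most equal to the number of pages.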

`tablite.base.Column.__iter__()`

Source code in tablite/base.py

def __iter__(self):  # USER FUNCTION.
    for page in self.pages:
        data = page.get()
        for value in data:
            yield value

`tablite.base.Column.__eq__(other)`

compares two columns. Like list1 == list2

Source code in tablite/base.py

def __eq__(self, other):  # USER FUNCTION.
    """
    compares two columns. Like `list1 == list2`
    """
    if len(self) != len(other):  # quick cheap check.
        return False

    if isinstance(other, (list, tuple)):
        return all(a == b for a, b in zip(self[:], other))

    elif isinstance(other, Column):
        if self.pages == other.pages:  # special case.
            return True

        # are the pages of same size?
        if len(self.pages) == len(other.pages):
            if [p.len for p in self.pages] == [p.len for p in other.pages]:
                for a, b in zip(self.pages, other.pages):
                    if not (a.get() == b.get()).all():
                        return False
                return True
        # too bad. Element comparison it is then:
        for a, b in zip(iter(self), iter(other)):
            if a != b:
                return False
        return True

    elif isinstance(other, np.ndarray):
        start, end = 0, 0
        for p in self.pages:
            start, end = end, end + p.len
            if not (p.get() == other[start:end]).all():
                return False
        return True
    else:
        raise TypeError(f"Cannot compare {self.__class__} with {type(other)}")

`tablite.base.Column.__ne__(other)`

compares two columns. Like list1 != list2

Source code in tablite/base.py

def __ne__(self, other):  # USER FUNCTION
    """
    compares two columns. Like `list1 != list2`
    """
    if len(self) != len(other):  # quick cheap check.
        return True

    if isinstance(other, (list, tuple)):
        return any(a != b for a, b in zip(self[:], other))

    elif isinstance(other, Column):
        if self.pages == other.pages:  # special case.
            return False

        # are the pages of same size?
        if len(self.pages) == len(other.pages):
            if [p.len for p in self.pages] == [p.len for p in other.pages]:
                for a, b in zip(self.pages, other.pages):
                    if not (a.get() == b.get()).all():
                        return True
                return False
        # too bad. Element comparison it is then:
        for a, b in zip(iter(self), iter(other)):
            if a != b:
                return True
        return False

    elif isinstance(other, np.ndarray):
        start, end = 0, 0
        for p in self.pages:
            start, end = end, end + p.len
            if (p.get() != other[start:end]).any():
                return True
        return False
    else:
        raise TypeError(f"Cannot compare {self.__class__} with {type(other)}")

`tablite.base.Column.copy()`

returns a copy of the Column. The copy has its own list of pages but shares the underlying Page objects with the original.

RETURNS	DESCRIPTION
	Column

Source code in tablite/base.py

def copy(self):
    """returns deep=copy of Column

    Returns:
        Column
    """
    cp = Column(path=self.path)
    cp.pages = self.pages[:]
    return cp

`tablite.base.Column.__copy__()`

see copy

Source code in tablite/base.py

def __copy__(self):
    """see copy"""
    return self.copy()

`tablite.base.Column.__imul__(other)`

Repeats instance of column N times. Like list() * N

Example:

>>> one = Column(path, value=[1,2])
>>> one *= 5
>>> one
[1,2, 1,2, 1,2, 1,2, 1,2]

Source code in tablite/base.py

def __imul__(self, other):
    """
    Repeats instance of column N times. Like list() * N

    Example:
    ```
    >>> one = Column(path, value=[1,2])
    >>> one *= 5
    >>> one
    [1,2, 1,2, 1,2, 1,2, 1,2]
    ```
    """
    if not (isinstance(other, int) and other > 0):
        raise TypeError(
            f"a column can be repeated an integer number of times, not {type(other)} number of times"
        )
    self.pages = self.pages[:] * other
    return self

`tablite.base.Column.__mul__(other)`

Repeats instance of column N times. Like list() * N

Example:

>>> one = Column(path, value=[1,2])
>>> two = one * 5
>>> two
[1,2, 1,2, 1,2, 1,2, 1,2]

Source code in tablite/base.py

def __mul__(self, other):
    """
    Repeats instance of column N times. Like list() * N

    Example:
    ```
    >>> one = Column(path, value=[1,2])
    >>> two = one * 5
    >>> two
    [1,2, 1,2, 1,2, 1,2, 1,2]
    ```
    """
    if not isinstance(other, int):
        raise TypeError(
            f"a column can be repeated an integer number of times, not {type(other)} number of times"
        )
    cp = self.copy()
    cp *= other
    return cp
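
Repetition works on the list of page references, not on the data. A small sketch with a hypothetical in-memory `FakePage` shows that the repeated column points at the same page object several times:

```python
class FakePage:
    """Hypothetical stand-in for Page: holds its data in memory."""

    def __init__(self, data):
        self.data = data


pages = [FakePage([1, 2])]
repeated = pages[:] * 5  # same expression as in __imul__

values = [v for page in repeated for v in page.data]
shared = all(page is pages[0] for page in repeated)

print(values)  # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
print(shared)  # True
```

No page data is duplicated, so repeating a column costs only a longer list of references.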

`tablite.base.Column.__iadd__(other)`

Source code in tablite/base.py

def __iadd__(self, other):
    if isinstance(other, (list, tuple)):
        other = list_to_np_array(other)
        self.extend(other)
    elif isinstance(other, Column):
        self.pages.extend(other.pages[:])
    else:
        raise TypeError(f"{type(other)} not supported.")
    return self

`tablite.base.Column.__contains__(item)`

Determines if item is in the Column, similar to `'x' in ['a','b','c']`. Returns a boolean.

PARAMETER	DESCRIPTION
`item`	value to search for TYPE: `any`

RETURNS	DESCRIPTION
`bool`	True if item exists in column.

Source code in tablite/base.py

def __contains__(self, item):
    """determines if item is in the Column.
    Similar to `'x' in ['a','b','c']`
    returns boolean

    Args:
        item (any): value to search for

    Returns:
        bool: True if item exists in column.
    """
    for page in set(self.pages):
        if item in page.get():  # `x in arr` is evaluated like (arr == x).any()
            return True
    return False

`tablite.base.Column.remove_all(*values)`

removes all occurrences of the given `values`

Source code in tablite/base.py

def remove_all(self, *values):
    """
    removes all occurrences of the given `values`
    """
    type_check(values, tuple)
    if isinstance(values[0], tuple):
        values = values[0]
    to_remove = list_to_np_array(values)
    for index, page in enumerate(self.pages):
        data = page.get()
        bitmask = np.isin(data, to_remove)  # identify elements to remove.
        if bitmask.any():
            bitmask = np.invert(bitmask)  # turn bitmask around to keep.
            new_data = np.compress(bitmask, data)
            new_page = Page(self.path, new_data)
            self.pages[index] = new_page

`tablite.base.Column.replace(mapping)`

replaces values using a mapping.

PARAMETER	DESCRIPTION
`mapping`	{value to replace: new value, ...} TYPE: `dict`

Example:

>>> t = Table(columns={'A': [1,2,3,4]})
>>> t['A'].replace({2:20,4:40})
>>> t['A'][:]
np.ndarray([1,20,3,40])

Source code in tablite/base.py

def replace(self, mapping):
    """
    replaces values using a mapping.

    Args:
        mapping (dict): {value to replace: new value, ...}

    Example:
    ```
    >>> t = Table(columns={'A': [1,2,3,4]})
    >>> t['A'].replace({2:20,4:40})
    >>> t['A'][:]
    np.ndarray([1,20,3,40])
    ```
    """
    type_check(mapping, dict)
    to_replace = np.array(list(mapping.keys()))
    for index, page in enumerate(self.pages):
        data = page.get()
        bitmask = np.isin(data, to_replace)  # identify elements to replace.
        if bitmask.any():
            warray = np.compress(bitmask, data)
            py_dtype = page.dtype
            for ix, v in enumerate(warray):
                old_py_val = numpy_to_python(v)
                new_py_val = mapping[old_py_val]
                old_dt = type(old_py_val)
                new_dt = type(new_py_val)

                warray[ix] = new_py_val

                py_dtype[new_dt] = py_dtype.get(new_dt, 0) + 1
                py_dtype[old_dt] = py_dtype.get(old_dt, 0) - 1

                if py_dtype[old_dt] <= 0:
                    del py_dtype[old_dt]

            data[bitmask] = warray
            self.pages[index] = Page(path=self.path, array=data)
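
The type bookkeeping inside the loop can be sketched on a plain list. `replace_values` is a hypothetical helper; `type_counts` plays the role of the page's `py_dtype` dictionary.

```python
def replace_values(data, type_counts, mapping):
    """Replace values in place and keep the per-type counts in step."""
    for ix, old in enumerate(data):
        if old not in mapping:
            continue
        new = mapping[old]
        data[ix] = new

        old_dt, new_dt = type(old), type(new)
        type_counts[new_dt] = type_counts.get(new_dt, 0) + 1
        type_counts[old_dt] = type_counts.get(old_dt, 0) - 1
        if type_counts[old_dt] <= 0:
            del type_counts[old_dt]  # no values of the old type remain
    return data, type_counts


data, counts = replace_values([1, 2, 3, 4], {int: 4}, {2: "two", 4: 4.0})
print(data)    # [1, 'two', 3, 4.0]
print(counts)  # {<class 'int'>: 2, <class 'str'>: 1, <class 'float'>: 1}
```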

`tablite.base.Column.types()`

returns dict with python datatypes

RETURNS	DESCRIPTION
`dict`	frequency of occurrence of python datatypes

Source code in tablite/base.py

def types(self):
    """
    returns dict with python datatypes

    Returns:
        dict: frequency of occurrence of python datatypes
    """
    d = Counter()
    for page in self.pages:
        assert isinstance(page.dtype, dict)
        d += page.dtype
    return dict(d)
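
Merging the per-page type counts is a `Counter` sum. A minimal sketch with made-up page metadata:

```python
from collections import Counter

# hypothetical py_dtype dictionaries of two pages
page_dtypes = [{int: 3, str: 1}, {int: 2, float: 2}]

total = Counter()
for dtype_counts in page_dtypes:
    total += dtype_counts  # in-place add accepts a plain dict

merged = dict(total)
print(merged)  # {<class 'int'>: 5, <class 'str'>: 1, <class 'float'>: 2}
```

In-place addition on a `Counter` discards entries whose count is zero or negative, so only types that actually occur are reported.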

`tablite.base.Column.index()`

returns dict with { unique entry : list of indices }

example:

>>> c = Column(path, value=['a','b','a','c','b'])
>>> c.index()
{'a':[0,2], 'b': [1,4], 'c': [3]}

Source code in tablite/base.py

def index(self):
    """
    returns dict with { unique entry : list of indices }

    example:
    ```
    >>> c = Column(path, value=['a','b','a','c','b'])
    >>> c.index()
    {'a':[0,2], 'b': [1,4], 'c': [3]}
    ```
    """
    d = defaultdict(list)
    for ix, v in enumerate(self.__iter__()):
        d[v].append(ix)
    return dict(d)

`tablite.base.Column.unique()`

returns unique list of values.

example:

>>> c = Column(path, value=['a','b','a','c','b'])
>>> c.unique()
['a','b','c']

Source code in tablite/base.py

def unique(self):
    """
    returns unique list of values.

    example:
    ```
    >>> c = Column(path, value=['a','b','a','c','b'])
    >>> c.unique()
    ['a','b','c']
    ```
    """
    arrays = []
    for page in set(self.pages):
        try:  # when it works, numpy is fast...
            arrays.append(np.unique(page.get()))
        except TypeError:  # ...but np.unique cannot handle Nones.
            arrays.append(multitype_set(page.get()))
    union = np_type_unify(arrays)
    try:
        return np.unique(union)
    except MemoryError:
        return np.array(set(union))
    except TypeError:
        return multitype_set(union)

`tablite.base.Column.histogram()`

returns two lists: the unique elements and the count of each element

example:

>>> c = Column(path, value=['a','b','a','c','b'])
>>> c.histogram()
(['a', 'b', 'c'], [2, 2, 1])

Source code in tablite/base.py

def histogram(self):
    """
    returns two lists: the unique elements and the count of each element

    example:
    ```
    >>> c = Column(path, value=['a','b','a','c','b'])
    >>> c.histogram()
    (['a', 'b', 'c'], [2, 2, 1])
    ```
    """
    d = defaultdict(int)
    for page in self.pages:
        try:
            uarray, carray = np.unique(page.get(), return_counts=True)
        except TypeError:
            uarray = page.get()
            carray = repeat(1, len(uarray))

        for i, c in zip(uarray, carray):
            v = numpy_to_python(i)
            d[(type(v), v)] += numpy_to_python(c)
    u = [v for _, v in d.keys()]
    c = list(d.values())
    return u, c  # unique, counts
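
The reason for keying the counts on `(type(v), v)` is that Python treats `1`, `True` and `1.0` as equal dictionary keys. A pure-Python sketch of the same keying (`typed_histogram` is a hypothetical name):

```python
from collections import defaultdict


def typed_histogram(values):
    """Count values, keeping equal values of different types apart."""
    counts = defaultdict(int)
    for v in values:
        counts[(type(v), v)] += 1
    unique = [v for _, v in counts]
    return unique, list(counts.values())


unique, counts = typed_histogram([1, True, 1.0, 1, "a"])
print(unique)  # [1, True, 1.0, 'a']
print(counts)  # [2, 1, 1, 1]
```

With the value alone as key, the first four entries would collapse into a single bucket.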

`tablite.base.Column.statistics()`

provides summary statistics.

RETURNS	DESCRIPTION
`dict`	returns dict with:
	min (int/float, length of str, date)
	max (int/float, length of str, date)
	mean (int/float, length of str, date)
	median (int/float, length of str, date)
	stdev (int/float, length of str, date)
	mode (int/float, length of str, date)
	distinct (int/float, length of str, date)
	iqr (int/float, length of str, date)
	sum (int/float, length of str, date)
	histogram (see .histogram)

Source code in tablite/base.py

def statistics(self):
    """provides summary statistics.

    Returns:
        dict: returns dict with:
        - min (int/float, length of str, date)
        - max (int/float, length of str, date)
        - mean (int/float, length of str, date)
        - median (int/float, length of str, date)
        - stdev (int/float, length of str, date)
        - mode (int/float, length of str, date)
        - distinct (int/float, length of str, date)
        - iqr (int/float, length of str, date)
        - sum (int/float, length of str, date)
        - histogram (see .histogram)
    """
    values, counts = self.histogram()
    return summary_statistics(values, counts)

`tablite.base.Column.count(item)`

counts appearances of item in column.

Note that in python, True == 1 and False == 0, whereby the following difference occurs:

in python:

>>> L = [1, True]
>>> L.count(True)
2

in tablite:

>>> t = Table({'L': [1,True]})
>>> t['L'].count(True)
1

PARAMETER	DESCRIPTION
`item`	target item TYPE: `Any`

RETURNS	DESCRIPTION
`int`	number of occurrences of item.

Source code in tablite/base.py

def count(self, item):
    """counts appearances of item in column.

    Note that in python, `True == 1` and `False == 0`,
    whereby the following difference occurs:

    in python:
    ```
    >>> L = [1, True]
    >>> L.count(True)
    2
    ```
    in tablite:
    ```
    >>> t = Table({'L': [1,True]})
    >>> t['L'].count(True)
    1
    ```

    Args:
        item (Any): target item

    Returns:
        int: number of occurrences of item.
    """
    result = 0
    for page in self.pages:
        data = page.get()
        if data.dtype != "O":
            result += np.nonzero(page.get() == item)[0].shape[0]
            # what happens here ---^ below:
            # arr = page.get()
            # >>> arr
            # array([1,2,3,4,3], int64)
            # >>> (arr == 3)
            # array([False, False,  True, False,  True])
            # >>> np.nonzero(arr==3)
            # (array([2,4], dtype=int64), )  <-- tuple!
            # >>> np.nonzero(page.get() == item)[0]
            # array([2,4])
            # >>> np.nonzero(page.get() == item)[0].shape
            # (2, )
            # >>> np.nonzero(page.get() == item)[0].shape[0]
            # 2
        else:
            result += sum(1 for i in data if type(i) == type(item) and i == item)
    return result

`tablite.base.BaseTable(columns: [dict, None] = None, headers: [list, None] = None, rows: [list, None] = None, _path: [Path, None] = None)`

Bases: object

creates Table

PARAMETER	DESCRIPTION
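`columns`	dict with column names as keys, values as lists. Use either `columns` or `headers` with `rows`, not both. Example: t = Table(columns={"a": [1, 2], "b": [3, 4]}) TYPE: `dict` DEFAULT: `None`
`headers`	list of column names. TYPE: `list of strings` DEFAULT: `None`
`rows`	values for columns. Example: t = Table(headers=["a", "b"], rows=[[1,3], [2,4]]) TYPE: `list of tuples or lists` DEFAULT: `None`
`_path`	path to main process working directory. TYPE: `Path` DEFAULT: `None`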
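
When `headers` and `rows` are given, the constructor rotates the rows into columns before storing them. The rotation can be sketched in plain Python as follows (the variable names are only illustrative):

```python
headers = ["a", "b"]
rows = [[1, 3], [2, 4]]

# zip(*rows) transposes the rows into one tuple per column
rotated = list(zip(*rows))
columns = {name: list(values) for name, values in zip(headers, rotated)}
print(columns)
```

The result is the same mapping that could have been passed directly as `columns`.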

Source code in tablite/base.py

def __init__(
    self,
    columns: [dict, None] = None,
    headers: [list, None] = None,
    rows: [list, None] = None,
    _path: [Path, None] = None,
) -> None:
    """creates Table

    Args:
        EITHER:
            columns (dict, optional): dict with column names as keys, values as lists.
            Example: t = Table(columns={"a": [1, 2], "b": [3, 4]})
        OR
            headers (list of strings, optional): list of column names.
            rows (list of tuples or lists, optional): values for columns
            Example: t = Table(headers=["a", "b"], rows=[[1,3], [2,4]])

        _path (pathlib.Path, optional): path to main process working directory.
    """
    if _path is None:
        if self._pid_dir is None:
            self._pid_dir = Path(Config.workdir) / Config.pid
            if not self._pid_dir.exists():
                self._pid_dir.mkdir()
                (self._pid_dir / "pages").mkdir()
            register(self._pid_dir)

        _path = Path(self._pid_dir)
        # if path exists under the given PID it will be overwritten.
        # this can only happen if the process previously was SIGKILLed.
    type_check(_path, Path)
    self.path = _path  # filename used during multiprocessing.
    self.columns = {}  # maps column names to instances of Column.

    # user friendly features.
    if columns and any((headers, rows)):
        raise ValueError("Either columns as dict OR headers and rows. Not both.")

    if headers and rows:
        rotated = list(zip(*rows))
        columns = {k: v for k, v in zip(headers, rotated)}

    if columns:
        type_check(columns, dict)
        for k, v in columns.items():
            self.__setitem__(k, v)

Attributes

`tablite.base.BaseTable.path = _path` `instance-attribute`

`tablite.base.BaseTable.columns = {}` `instance-attribute`

`tablite.base.BaseTable.rows` `property`

enables row based iteration in python types.

Example:

for row in Table.rows:
    print(row)

Yields: tuple: values in same order as columns.

Functions

`tablite.base.BaseTable.__str__()`

Source code in tablite/base.py

def __str__(self):  # USER FUNCTION.
    return f"{self.__class__.__name__}({len(self.columns):,} columns, {len(self):,} rows)"

`tablite.base.BaseTable.__repr__()`

Source code in tablite/base.py

def __repr__(self):
    return self.__str__()

`tablite.base.BaseTable.nbytes()`

finds the total bytes of the table on disk

RETURNS	DESCRIPTION
`tuple`	two ints: real bytes used on disk, and total bytes used if flattened
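
The two numbers differ when the same page is referenced more than once, for example after repeating a table. A minimal sketch of the bookkeeping, using plain page ids and a size lookup instead of real page files (both are assumptions for illustration):

```python
def nbytes_sketch(columns, page_sizes):
    """Hypothetical helper.

    columns:    column name -> list of page ids (a page id may repeat)
    page_sizes: page id -> size in bytes
    """
    real = {}   # each distinct page counted once
    total = 0   # every reference counted
    for pages in columns.values():
        for page in pages:
            real[page] = page_sizes[page]
            total += page_sizes[page]
    return sum(real.values()), total


sizes = {"p1": 100, "p2": 50, "p3": 10}
real_bytes, flat_bytes = nbytes_sketch({"a": ["p1", "p1", "p2"], "b": ["p3"]}, sizes)
print(real_bytes, flat_bytes)
```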

Source code in tablite/base.py

def nbytes(self):  # USER FUNCTION.
    """finds the total bytes of the table on disk

    Returns:
        tuple:
            int: real bytes used on disk
            int: total bytes used if flattened
    """
    real = {}
    total = 0
    for column in self.columns.values():
        for page in set(column.pages):
            real[page] = page.path.stat().st_size
        for page in column.pages:
            total += real[page]
    return sum(real.values()), total

`tablite.base.BaseTable.items()`

returns table as dict items

RETURNS	DESCRIPTION
`dict_items`	items view of the table as dict `{column_name: [values], ...}`

Source code in tablite/base.py

def items(self):  # USER FUNCTION.
    """returns table as dict

    Returns:
        dict: Table as dict `{column_name: [values], ...}`
    """
    return {
        name: column[:].tolist() for name, column in self.columns.items()
    }.items()

`tablite.base.BaseTable.__delitem__(key)`

Examples:

>>> del table['a']  # removes column 'a'
>>> del table[-3:]  # removes last 3 rows from all columns.

Source code in tablite/base.py

def __delitem__(self, key):  # USER FUNCTION.
    """
    Examples:
    ```
    >>> del table['a']  # removes column 'a'
    >>> del table[-3:]  # removes last 3 rows from all columns.
    ```
    """
    if isinstance(key, (int, slice)):
        for column in self.columns.values():
            del column[key]
    elif key in self.columns:
        del self.columns[key]
    else:
        raise KeyError(f"Key not found: {key}")

`tablite.base.BaseTable.__setitem__(key, value)`

table behaves like a dict.

PARAMETER	DESCRIPTION
`key`	column name TYPE: `str or hashable`
`value`	list, tuple or nd.array with values. TYPE: `iterable`

As Table now accepts the keyword columns as a dict:

>>> t = Table(columns={'b':[4,5,6], 'c':[7,8,9]})

and the headers/rows combination:

>>> t = Table(headers=['b','c'], rows=[[4,7],[5,8],[6,9]])

This has the side-benefit that tuples now can be used as headers.

Source code in tablite/base.py

def __setitem__(self, key, value):  # USER FUNCTION
    """table behaves like a dict.
    Args:
        key (str or hashable): column name
        value (iterable): list, tuple or nd.array with values.

    As Table now accepts the keyword `columns` as a dict:
    ```
    >>> t = Table(columns={'b':[4,5,6], 'c':[7,8,9]})
    ```
    and the headers/rows combination:
    ```
    >>> t = Table(headers=['b','c'], rows=[[4,7],[5,8],[6,9]])
    ```
    This has the side-benefit that tuples now can be used as headers.
    """
    if value is None:
        self.columns[key] = Column(self.path, value=None)
    elif isinstance(value, (list, tuple)):
        value = list_to_np_array(value)
        self.columns[key] = Column(self.path, value)
    elif isinstance(value, (np.ndarray)):
        self.columns[key] = Column(self.path, value)
    elif isinstance(value, Column):
        self.columns[key] = value
    else:
        raise TypeError(f"{type(value)} not supported.")

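`tablite.base.BaseTable.__getitem__(keys)`

Enables selection of columns and rows

PARAMETER	DESCRIPTION
`keys`	TYPE: `column name, integer or slice`

Examples:

>>> table['a']                        selects column 'a'
>>> table[3]                          selects row 3 as a tuple.
>>> table[:10]                        selects first 10 rows from all columns
>>> table['a','b', slice(3,20,2)]     selects a slice from columns 'a' and 'b'
>>> table['b', 'a', 'a', 'c', 2:20:3] selects column 'b' and 'c' and 'a' twice for a slice.
>>> table[('b', 'a', 'a', 'c')]       selects columns 'b', 'a', 'a', and 'c' using a tuple.

RAISES	DESCRIPTION
`KeyError`	if key is not found.
`TypeError`	if key is not a string, integer or slice.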

RETURNS	DESCRIPTION
`Table`	returns columns in same order as selection.
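
Mixed selections work because Python hands everything between the square brackets to `__getitem__` as one tuple, which can then be sorted by type. The sketch below shows only that sorting step; `KeyProbe` and `split_keys` are hypothetical names, not part of tablite.

```python
class KeyProbe:
    """Returns whatever Python passes to __getitem__, so the raw keys can be inspected."""

    def __getitem__(self, keys):
        return keys


def split_keys(keys):
    """Normalise keys to a tuple and sort them into names, integers and slices."""
    if not isinstance(keys, tuple):
        keys = tuple(keys) if isinstance(keys, list) else (keys,)
    names = [k for k in keys if isinstance(k, str)]
    integers = [k for k in keys if isinstance(k, int)]
    slices = [k for k in keys if isinstance(k, slice)]
    return names, integers, slices


probe = KeyProbe()
names, integers, slices = split_keys(probe["b", "a", "a", "c", 2:20:3])
print(names, integers, slices)
```

A single string ends up as one column name, a single integer as one row number, and a bare slice leaves the list of names empty, which the source below treats as "all columns".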

Source code in tablite/base.py

def __getitem__(self, keys):  # USER FUNCTION
    """
    Enables selection of columns and rows

    Args:
        keys (column name, integer or slice):
        Examples:
        ```
        >>> table['a']                        selects column 'a'
        >>> table[3]                          selects row 3 as a tuple.
        >>> table[:10]                        selects first 10 rows from all columns
        >>> table['a','b', slice(3,20,2)]     selects a slice from columns 'a' and 'b'
        >>> table['b', 'a', 'a', 'c', 2:20:3] selects column 'b' and 'c' and 'a' twice for a slice.
        >>> table[('b', 'a', 'a', 'c')]       selects columns 'b', 'a', 'a', and 'c' using a tuple.
        ```
    Raises:
        KeyError: if key is not found.
        TypeError: if key is not a string, integer or slice.

    Returns:
        Table: returns columns in same order as selection.
    """

    if not isinstance(keys, tuple):
        if isinstance(keys, list):
            keys = tuple(keys)
        else:
            keys = (keys,)
    if isinstance(keys[0], tuple):
        keys = tuple(list(chain(*keys)))

    integers = [i for i in keys if isinstance(i, int)]
    if len(integers) == len(keys) == 1:  # return a single tuple.
        keys = [slice(keys[0])]

    column_names = [i for i in keys if isinstance(i, str)]
    column_names = list(self.columns) if not column_names else column_names
    not_found = [name for name in column_names if name not in self.columns]
    if not_found:
        raise KeyError(f"keys not found: {', '.join(not_found)}")

    slices = [i for i in keys if isinstance(i, slice)]
    slc = slice(0, len(self)) if not slices else slices[0]

    if (
        len(slices) == 0 and len(column_names) == 1
    ):  # e.g. tbl['a'] or tbl['a'][:10]
        col = self.columns[column_names[0]]
        if slices:
            return col[slc]  # return slice from column as list of values
        else:
            return col  # return whole column

    elif len(integers) == 1:  # return a single tuple.
        row_no = integers[0]
        slc = slice(row_no, row_no + 1)
        return tuple(self.columns[name][slc].tolist()[0] for name in column_names)

    elif not slices:  # e.g. new table with N whole columns.
        return self.__class__(
            columns={name: self.columns[name] for name in column_names}
        )

    else:  # e.g. new table from selection of columns and slices.
        t = self.__class__()
        for name in column_names:
            column = self.columns[name]

            new_column = Column(t.path)  # create new Column.
            for item in column.getpages(slc):
                if isinstance(item, np.ndarray):
                    new_column.extend(item)  # extend subslice (expensive)
                elif isinstance(item, SimplePage):
                    new_column.pages.append(item)  # extend page (cheap)
                else:
                    raise TypeError(f"Bad item: {item}")

            # below:
            # set the new column directly on t.columns.
            # Do not use t[name] as that triggers __setitem__ again.
            t.columns[name] = new_column

        return t

`tablite.base.BaseTable.__len__()`

Source code in tablite/base.py

def __len__(self):  # USER FUNCTION.
    if not self.columns:
        return 0
    return max(len(c) for c in self.columns.values())

`tablite.base.BaseTable.__eq__(other) -> bool`

Determines if two tables have identical content.

PARAMETER	DESCRIPTION
`other`	table for comparison TYPE: `Table`

RETURNS	DESCRIPTION
`bool`	True if tables are identical. TYPE: `bool`

Source code in tablite/base.py

def __eq__(self, other) -> bool:  # USER FUNCTION.
    """Determines if two tables have identical content.

    Args:
        other (Table): table for comparison

    Returns:
        bool: True if tables are identical.
    """
    if isinstance(other, dict):
        return self.items() == other.items()
    if not isinstance(other, BaseTable):
        return False
    if id(self) == id(other):
        return True
    if len(self) != len(other):
        return False
    if len(self) == len(other) == 0:
        return True
    if self.columns.keys() != other.columns.keys():
        return False
    for name, col in self.columns.items():
        if not (col == other.columns[name]):
            return False
    return True

`tablite.base.BaseTable.clear()`

clears the table. Like dict().clear()

Source code in tablite/base.py

def clear(self):  # USER FUNCTION.
    """clears the table. Like dict().clear()"""
    self.columns.clear()

`tablite.base.BaseTable.save(path, compression_method=zipfile.ZIP_DEFLATED, compression_level=1)`

saves table to compressed tpz file.

PARAMETER	DESCRIPTION
`path`	file destination. TYPE: `Path`
`compression_method`	See zipfile compression methods. Defaults to ZIP_DEFLATED. DEFAULT: `ZIP_DEFLATED`
`compression_level`	See zipfile compression levels. Defaults to 1. DEFAULT: `1`

The file format is as follows: .tpz is a zip archive with table metadata captured as table.yml and the necessary set of pages saved as .npy files.

The zip contains table.yml which provides an overview of the data:

--------------------------------------
%YAML 1.2                              yaml version
columns:                               start of columns section.
    name: “列 1”                       name of column 1.
        pages: [p1b1, p1b2]            list of pages in column 1.
    name: “列 2”                       name of column 2
        pages: [p2b1, p2b2]            list of pages in column 2.
----------------------------------------
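
To make the archive layout concrete, the following sketch builds a tiny .tpz-like archive in memory with the standard `zipfile` module and lists its members. The YAML text is written by hand in the shape that the source below produces (`columns` mapping each column name to its `pages`), and the page payload is a placeholder rather than a real .npy file; the column and page names are made up.

```python
import io
import zipfile

# hand-written metadata: one column "a" stored on one page
table_yml = "columns:\n  a:\n    pages: [1-1.npy]\n"

buffer = io.BytesIO()
with zipfile.ZipFile(
    buffer, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=1
) as archive:
    archive.writestr("table.yml", table_yml)
    archive.writestr("1-1.npy", b"placeholder bytes, not a real .npy payload")

# reading it back: table.yml first, then one entry per distinct page
with zipfile.ZipFile(buffer, "r") as archive:
    members = archive.namelist()
    metadata_text = archive.read("table.yml").decode("utf-8")

print(members)
```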

Source code in tablite/base.py

def save(
    self, path, compression_method=zipfile.ZIP_DEFLATED, compression_level=1
):  # USER FUNCTION.
    """saves table to compressed tpz file.

    Args:
        path (Path): file destination.
        compression_method: See zipfile compression methods. Defaults to ZIP_DEFLATED.
        compression_level: See zipfile compression levels. Defaults to 1.
        The default settings produce 80% compression at 10% slowdown.

    The file format is as follows:
    .tpz is a zip archive with table metadata captured as table.yml
    and the necessary set of pages saved as .npy files.

    The zip contains table.yml which provides an overview of the data:
    ```
    --------------------------------------
    %YAML 1.2                              yaml version
    columns:                               start of columns section.
        name: “列 1”                       name of column 1.
            pages: [p1b1, p1b2]            list of pages in column 1.
        name: “列 2”                       name of column 2
            pages: [p2b1, p2b2]            list of pages in column 2.
    ----------------------------------------
    ```
    """
    if isinstance(path, str):
        path = Path(path)
    type_check(path, Path)
    if path.is_dir():
        raise TypeError(f"filename needed: {path}")
    if path.suffix != ".tpz":
        path = path.parent / (path.parts[-1] + ".tpz")

    # create yaml document
    _page_counter = 0
    d = {}
    cols = {}
    for name, col in self.columns.items():
        type_check(col, Column)
        cols[name] = {"pages": [p.path.name for p in col.pages]}
        _page_counter += len(col.pages)
    d["columns"] = cols
    yml = yaml.safe_dump(
        d, sort_keys=False, allow_unicode=True, default_flow_style=None
    )

    _file_counter = 0
    with zipfile.ZipFile(
        path, "w", compression=compression_method, compresslevel=compression_level
    ) as f:
        log.debug(f"writing .tpz to {path} with\n{yml}")
        f.writestr("table.yml", yml)
        for name, col in self.columns.items():
            for page in set(
                col.pages
            ):  # set of pages! remember t *= 1000 repeats t 1000x
                with open(page.path, "rb", buffering=0) as raw_io:
                    f.writestr(page.path.name, raw_io.read())
                _file_counter += 1
                log.debug(f"adding Page {page.path}")

        _fields = len(self) * len(self.columns)
        _avg = _fields // _page_counter
        log.debug(
            f"Wrote {_fields:,} on {_page_counter:,} pages in {_file_counter} files: {_avg} fields/page"
        )

`tablite.base.BaseTable.load(path, tqdm=_tqdm)` `classmethod`

loads a table from .tpz file. See also Table.save for details on the file format.

PARAMETER	DESCRIPTION
`path`	source file TYPE: `Path`

RETURNS	DESCRIPTION
`Table`	table in read-only mode.

Source code in tablite/base.py

@classmethod
def load(cls, path, tqdm=_tqdm):  # USER FUNCTION.
    """loads a table from .tpz file.
    See also Table.save for details on the file format.

    Args:
        path (Path): source file

    Returns:
        Table: table in read-only mode.
    """
    path = Path(path)
    log.debug(f"loading {path}")
    with zipfile.ZipFile(path, "r") as f:
        yml = f.read("table.yml")
        metadata = yaml.safe_load(yml)
        t = cls()

        page_count = sum([len(c["pages"]) for c in metadata["columns"].values()])

        with tqdm(
            total=page_count,
            desc=f"loading '{path.name}' file",
            disable=Config.TQDM_DISABLE,
        ) as pbar:
            for name, d in metadata["columns"].items():
                column = Column(t.path)
                for page in d["pages"]:
                    bytestream = io.BytesIO(f.read(page))
                    data = np.load(bytestream, allow_pickle=True, fix_imports=False)
                    column.extend(data)
                    pbar.update(1)
                t.columns[name] = column
    update_access_time(path)
    return t

`tablite.base.BaseTable.copy()`

Source code in tablite/base.py

def copy(self):
    cls = type(self)
    t = cls()
    for name, column in self.columns.items():
        new = Column(t.path)
        new.pages = column.pages[:]
        t.columns[name] = new
    return t

`tablite.base.BaseTable.__imul__(other)`

Repeats instance of table N times.

Like list: t = t * N

PARAMETER	DESCRIPTION
`other`	multiplier TYPE: `int`

Source code in tablite/base.py

def __imul__(self, other):
    """Repeats instance of table N times.

    Like list: `t = t * N`

    Args:
        other (int): multiplier
    """
    if not (isinstance(other, int) and other > 0):
        raise TypeError(
            f"a table can be repeated an integer number of times, not {type(other)} number of times"
        )
    for col in self.columns.values():
        col *= other
    return self

`tablite.base.BaseTable.__mul__(other)`

Repeat table N times. Like list: new = old * N

PARAMETER	DESCRIPTION
`other`	multiplier TYPE: `int`

RETURNS	DESCRIPTION
	Table

Source code in tablite/base.py

def __mul__(self, other):
    """Repeat table N times.
    Like list: `new = old * N`

    Args:
        other (int): multiplier

    Returns:
        Table
    """
    new = self.copy()
    return new.__imul__(other)

`tablite.base.BaseTable.__iadd__(other)`

Concatenates tables with same column names.

Like list: table_1 += table_2

RAISES	DESCRIPTION
`ValueError`	If column names don't match.

RETURNS	DESCRIPTION
`Table`	self, updated in place.
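
Concatenation is cheap because only page references are appended; no values are copied. A minimal sketch of the same checks and the extend step, using plain dicts of page ids in place of tables (the helper name `concat_pages` is hypothetical):

```python
def concat_pages(left, right):
    """Hypothetical helper: `left` and `right` map column name -> list of page ids."""
    for name in left:
        if name not in right:
            raise ValueError(f"{name} not in other")
    for name in right:
        if name not in left:
            raise ValueError(f"{name} missing from self")
    for name, pages in left.items():
        pages.extend(right[name][:])  # append page references only
    return left


table_1 = {"a": ["p1"], "b": ["p2"]}
table_2 = {"a": ["p3"], "b": ["p4"]}
combined = concat_pages(table_1, table_2)
print(combined)
```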

Source code in tablite/base.py

def __iadd__(self, other):
    """Concatenates tables with same column names.

    Like list: `table_1 += table_2`

    Args:
        other (Table)

    Raises:
        ValueError: If column names don't match.

    Returns:
        Table: self, updated in place.
    """
    type_check(other, BaseTable)
    for name in self.columns.keys():
        if name not in other.columns:
            raise ValueError(f"{name} not in other")
    for name in other.columns.keys():
        if name not in self.columns:
            raise ValueError(f"{name} missing from self")

    for name, column in self.columns.items():
        other_col = other.columns.get(name, None)
        column.pages.extend(other_col.pages[:])
    return self

`tablite.base.BaseTable.__add__(other)`

Concatenates tables with same column names.

Like list: table_3 = table_1 + table_2

RAISES	DESCRIPTION
`ValueError`	If column names don't match.

RETURNS	DESCRIPTION
	Table

Source code in tablite/base.py

def __add__(self, other):
    """Concatenates tables with same column names.

    Like list: `table_3 = table_1 + table_2`

    Args:
        other (Table)

    Raises:
        ValueError: If column names don't match.

    Returns:
        Table
    """
    type_check(other, BaseTable)
    cp = self.copy()
    cp += other
    return cp

`tablite.base.BaseTable.add_rows(*args, **kwargs)`

it's more efficient to add many rows at once.

if both args and kwargs, then args are added first, followed by kwargs.

supported cases:

>>> t = Table()
>>> t.add_columns('row','A','B','C')
>>> t.add_rows(1, 1, 2, 3)                              # (1) individual values as args
>>> t.add_rows([2, 1, 2, 3])                            # (2) list of values as args
>>> t.add_rows((3, 1, 2, 3))                            # (3) tuple of values as args
>>> t.add_rows(*(4, 1, 2, 3))                           # (4) unpacked tuple becomes arg like (1)
>>> t.add_rows(row=5, A=1, B=2, C=3)                    # (5) kwargs
>>> t.add_rows(**{'row': 6, 'A': 1, 'B': 2, 'C': 3})    # (6) dict / json interpreted as kwargs
>>> t.add_rows((7, 1, 2, 3), (8, 4, 5, 6))              # (7) two (or more) tuples as args
>>> t.add_rows([9, 1, 2, 3], [10, 4, 5, 6])             # (8) two or more lists as args
>>> t.add_rows(
    {'row': 11, 'A': 1, 'B': 2, 'C': 3},
    {'row': 12, 'A': 4, 'B': 5, 'C': 6}
    )                                                   # (9) two (or more) dicts as args - roughly comma sep'd json.
>>> t.add_rows( *[
    {'row': 13, 'A': 1, 'B': 2, 'C': 3},
    {'row': 14, 'A': 1, 'B': 2, 'C': 3}
    ])                                                  # (10) list of dicts as args
>>> t.add_rows(row=[15,16], A=[1,1], B=[2,2], C=[3,3])  # (11) kwargs with lists as values
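
Adding many rows at once is cheaper because the rows are first gathered into one list per column, so each column is extended a single time. A plain-Python sketch of that gathering step for list, tuple and dict rows (the helper name `rows_to_columns` is an assumption for illustration):

```python
def rows_to_columns(column_names, *rows):
    """Hypothetical helper: collect row-wise input into one list per column."""
    collected = {name: [] for name in column_names}
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError(
                f"row {row!r} has {len(row)} values, expected {len(column_names)}"
            )
        if isinstance(row, dict):
            # dict rows are matched to columns by key
            for name, value in row.items():
                collected[name].append(value)
        else:
            # list or tuple rows are matched to columns by position
            for name, value in zip(column_names, row):
                collected[name].append(value)
    return collected


batch = rows_to_columns(["row", "A"], (1, 10), [2, 20], {"row": 3, "A": 30})
print(batch)
```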

Source code in tablite/base.py

def add_rows(self, *args, **kwargs):
    """it's more efficient to add many rows at once.

    if both args and kwargs, then args are added first, followed by kwargs.

    supported cases:
    ```
    >>> t = Table()
    >>> t.add_columns('row','A','B','C')
    >>> t.add_rows(1, 1, 2, 3)                              # (1) individual values as args
    >>> t.add_rows([2, 1, 2, 3])                            # (2) list of values as args
    >>> t.add_rows((3, 1, 2, 3))                            # (3) tuple of values as args
    >>> t.add_rows(*(4, 1, 2, 3))                           # (4) unpacked tuple becomes arg like (1)
    >>> t.add_rows(row=5, A=1, B=2, C=3)                    # (5) kwargs
    >>> t.add_rows(**{'row': 6, 'A': 1, 'B': 2, 'C': 3})    # (6) dict / json interpreted as kwargs
    >>> t.add_rows((7, 1, 2, 3), (8, 4, 5, 6))              # (7) two (or more) tuples as args
    >>> t.add_rows([9, 1, 2, 3], [10, 4, 5, 6])             # (8) two or more lists as args
    >>> t.add_rows(
        {'row': 11, 'A': 1, 'B': 2, 'C': 3},
        {'row': 12, 'A': 4, 'B': 5, 'C': 6}
        )                                                   # (9) two (or more) dicts as args - roughly comma sep'd json.
    >>> t.add_rows( *[
        {'row': 13, 'A': 1, 'B': 2, 'C': 3},
        {'row': 14, 'A': 1, 'B': 2, 'C': 3}
        ])                                                  # (10) list of dicts as args
    >>> t.add_rows(row=[15,16], A=[1,1], B=[2,2], C=[3,3])  # (11) kwargs with lists as values
    ```

    """
    if not BaseTable._add_row_slow_warning:
        warnings.warn(
            "add_rows is slow. Consider using add_columns and then assigning values to the columns directly."
        )
        BaseTable._add_row_slow_warning = True

    if args:
        if not all(isinstance(i, (list, tuple, dict)) for i in args):  # 1,4
            args = [args]

        if all(isinstance(i, (list, tuple, dict)) for i in args):  # 2,3,7,8
            # 1. turn the data into columns:

            d = {n: [] for n in self.columns}
            for arg in args:
                if len(arg) != len(self.columns):
                    raise ValueError(
                        f"len({arg})== {len(arg)}, but there are {len(self.columns)} columns"
                    )

                if isinstance(arg, dict):
                    for k, v in arg.items():  # 7,8
                        d[k].append(v)

                elif isinstance(arg, (list, tuple)):  # 2,3
                    for n, v in zip(self.columns, arg):
                        d[n].append(v)

                else:
                    raise TypeError(f"{arg}?")
            # 2. extend the columns
            for n, values in d.items():
                col = self.columns[n]
                col.extend(list_to_np_array(values))

    if kwargs:
        if isinstance(kwargs, dict):
            if all(isinstance(v, (list, tuple)) for v in kwargs.values()):
                for k, v in kwargs.items():
                    col = self.columns[k]
                    col.extend(list_to_np_array(v))
            else:
                for k, v in kwargs.items():
                    col = self.columns[k]
                    col.extend(np.array([v]))
        else:
            raise ValueError(f"format not recognised: {kwargs}")

    return

`tablite.base.BaseTable.add_columns(*names)`

Adds column names to table.

Source code in tablite/base.py

def add_columns(self, *names):
    """Adds column names to table."""
    for name in names:
        self.columns[name] = Column(self.path)

`tablite.base.BaseTable.add_column(name, data=None)`

verbose alias for table[name] = data, that checks if name already exists

PARAMETER	DESCRIPTION
`name`	column name TYPE: `str`
`data`	values. Defaults to None. TYPE: `(list, tuple)` DEFAULT: `None`

RAISES	DESCRIPTION
`TypeError`	name isn't string
`ValueError`	name already exists

Source code in tablite/base.py

def add_column(self, name, data=None):
    """verbose alias for table[name] = data, that checks if name already exists

    Args:
        name (str): column name
        data ((list,tuple), optional): values. Defaults to None.

    Raises:
        TypeError: name isn't string
        ValueError: name already exists
    """
    if not isinstance(name, str):
        raise TypeError("expected name as string")
    if name in self.columns:
        raise ValueError(f"{name} already in {self.columns}")
    self.__setitem__(name, data)

`tablite.base.BaseTable.stack(other)`

returns the joint stack of tables with overlapping column names. Example:

| Table A|  +  | Table B| = |  Table AB |
| A| B| C|     | A| B| D|   | A| B| C| -|
                            | A| B| -| D|
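
The diagram can be reproduced with plain dicts of lists: columns missing on one side are padded with `None` for that side's rows. This is a sketch under the assumption that all columns within a table have the same length; `stack_sketch` is a hypothetical helper, not tablite's implementation.

```python
def stack_sketch(a, b):
    """Hypothetical helper: stack two {column name: list of values} tables."""
    rows_a = max((len(v) for v in a.values()), default=0)
    rows_b = max((len(v) for v in b.values()), default=0)
    names = list(a) + [name for name in b if name not in a]
    return {
        name: a.get(name, [None] * rows_a) + b.get(name, [None] * rows_b)
        for name in names
    }


table_a = {"A": [1], "B": [2], "C": [3]}
table_b = {"A": [4], "B": [5], "D": [6]}
table_ab = stack_sketch(table_a, table_b)
print(table_ab)
```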

Source code in tablite/base.py

def stack(self, other):
    """
    returns the joint stack of tables with overlapping column names.
    Example:
    ```
    | Table A|  +  | Table B| = |  Table AB |
    | A| B| C|     | A| B| D|   | A| B| C| -|
                                | A| B| -| D|
    ```
    """
    if not isinstance(other, BaseTable):
        raise TypeError(f"stack only works for Table, not {type(other)}")

    cp = self.copy()
    for name, col2 in other.columns.items():
        if name not in cp.columns:
            cp[name] = [None] * len(self)
        cp[name].pages.extend(col2.pages[:])

    for name in self.columns:
        if name not in other.columns:
            if len(cp) > 0:
                cp[name].extend(np.array([None] * len(other)))
    return cp

`tablite.base.BaseTable.types()`

returns nested dict of data types in the form: {column name: {python type class: number of instances }, ... }

example:

>>> t.types()
{
    'A': {<class 'str'>: 7},
    'B': {<class 'int'>: 7}
}
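
The same nested shape can be produced for plain lists with `collections.Counter`, which may help when reading the output. This is only a sketch of the result format (`types_sketch` is a hypothetical name); tablite's `Column.types()` works on pages rather than on Python lists.

```python
from collections import Counter


def types_sketch(columns):
    """Hypothetical helper: {column name: {type: number of instances}}."""
    return {
        name: dict(Counter(type(value) for value in values))
        for name, values in columns.items()
    }


type_counts = types_sketch({"A": ["x", "y"], "B": [1, 2], "C": [1, None, "z"]})
print(type_counts)
```

A column with more than one key in its inner dict, such as `C` here, holds mixed types.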

Source code in tablite/base.py

def types(self):
    """
    returns nested dict of data types in the form:
    `{column name: {python type class: number of instances }, ... }`

    example:
    ```
    >>> t.types()
    {
        'A': {<class 'str'>: 7},
        'B': {<class 'int'>: 7}
    }
    ```
    """
    d = {}
    for name, col in self.columns.items():
        assert isinstance(col, Column)
        d[name] = col.types()
    return d

`tablite.base.BaseTable.display_dict(slice_=None, blanks=None, dtype=False)`

helper for creating dict for display.

PARAMETER	DESCRIPTION
`slice_`	python slice. Defaults to None. TYPE: `slice` DEFAULT: `None`
`blanks`	fill value for `None`. Defaults to None. TYPE: `optional` DEFAULT: `None`
`dtype`	Adds datatype to each column. Defaults to False. TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`TypeError`	slice_ must be None or slice.

RETURNS	DESCRIPTION
`dict`	from Table.
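
The returned dict starts with a row-number column whose header must not clash with an existing column name. The source below tries `#`, `~` and `*`, then doubled versions, and so on. A minimal sketch of that search (the helper name `pick_row_tag` and the final `ValueError` are assumptions for illustration):

```python
from itertools import product


def pick_row_tag(column_names):
    """Return the first candidate tag that is not already a column name."""
    for n, tag in product(range(1, 6), ["#", "~", "*"]):
        candidate = n * tag  # "#", "~", "*", "##", "~~", ...
        if candidate not in column_names:
            return candidate
    # assumption for this sketch: give up if all 15 candidates are taken
    raise ValueError("no free row tag found")


print(pick_row_tag({"a", "b"}))
print(pick_row_tag({"#", "~", "*"}))
```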

Source code in tablite/base.py

def display_dict(self, slice_=None, blanks=None, dtype=False):
    """helper for creating dict for display.

    Args:
        slice_ (slice, optional): python slice. Defaults to None.
        blanks (optional): fill value for `None`. Defaults to None.
        dtype (bool, optional): Adds datatype to each column. Defaults to False.

    Raises:
        TypeError: slice_ must be None or slice.

    Returns:
        dict: from Table.
    """
    if not self.columns:
        print("Empty Table")
        return

    def datatype(col):  # PRIVATE
        """creates label for column datatype."""
        types = col.types()
        if len(types) == 0:
            typ = "empty"
        elif len(types) == 1:
            dt, _ = types.popitem()
            typ = dt.__name__
        else:
            typ = "mixed"
        return typ

    row_count_tags = ["#", "~", "*"]
    cols = set(self.columns)
    for n, tag in product(range(1, 6), row_count_tags):
        if n * tag not in cols:
            tag = n * tag
            break

    if not isinstance(slice_, (slice, type(None))):
        raise TypeError(f"slice_ must be None or slice, not {type(slice_)}")
    if isinstance(slice_, slice):
        slc = slice_
    if slice_ is None:
        if len(self) <= 20:
            slc = slice(0, 20, 1)
        else:
            slc = None

    n = len(self)
    if slc:  # either we want slc or we want everything.
        row_no = list(range(*slc.indices(len(self))))
        data = {tag: [f"{i:,}".rjust(2) for i in row_no]}
        for name, col in self.columns.items():
            data[name] = list(chain(iter(col), repeat(blanks, times=n - len(col))))[
                slc
            ]
    else:
        data = {}
        j = int(math.ceil(math.log10(n)) / 3) + len(str(n))
        row_no = (
            [f"{i:,}".rjust(j) for i in range(7)]
            + ["..."]
            + [f"{i:,}".rjust(j) for i in range(n - 7, n)]
        )
        data = {tag: row_no}

        for name, col in self.columns.items():
            if len(col) == n:
                row = col[:7].tolist() + ["..."] + col[-7:].tolist()
            else:
                empty = [blanks] * 7
                head = (col[:7].tolist() + empty)[:7]
                tail = (col[n - 7 :].tolist() + empty)[-7:]
                row = head + ["..."] + tail
            data[name] = row

    if dtype:
        for name, values in data.items():
            if name in self.columns:
                col = self.columns[name]
                values.insert(0, datatype(col))
            else:
                values.insert(0, "row")

    return data

`tablite.base.BaseTable.to_ascii(slice_=None, blanks=None, dtype=False)`

returns ascii view of table as string.

PARAMETER	DESCRIPTION
`slice_`	slice to determine table snippet. TYPE: `slice` DEFAULT: `None`
`blanks`	value for whitespace. Defaults to None. TYPE: `str` DEFAULT: `None`
`dtype`	adds subheader with datatype for column. Defaults to False. TYPE: `bool` DEFAULT: `False`

Source code in tablite/base.py

def to_ascii(self, slice_=None, blanks=None, dtype=False):
    """returns ascii view of table as string.

    Args:
        slice_ (slice, optional): slice to determine table snippet.
        blanks (str, optional): value for whitespace. Defaults to None.
        dtype (bool, optional): adds subheader with datatype for column. Defaults to False.
    """

    def adjust(v, length):  # PRIVATE FUNCTION
        """whitespace justifies field values based on datatype"""
        if v is None:
            return str(blanks).ljust(length)
        elif isinstance(v, str):
            return v.ljust(length)
        else:
            return str(v).rjust(length)

    if not self.columns:
        return str(self)

    d = {}
    for name, values in self.display_dict(
        slice_=slice_, blanks=blanks, dtype=dtype
    ).items():
        as_text = [str(v) for v in values] + [str(name)]
        width = max(len(i) for i in as_text)
        new_name = name.center(width, " ")
        if dtype:
            values[0] = values[0].center(width, " ")
        d[new_name] = [adjust(v, width) for v in values]

    rows = dict_to_rows(d)
    s = []
    s.append("+" + "+".join(["=" * len(n) for n in rows[0]]) + "+")
    s.append("|" + "|".join(rows[0]) + "|")  # column names
    start = 1
    if dtype:
        s.append("|" + "|".join(rows[1]) + "|")  # datatypes
        start = 2

    s.append("+" + "+".join(["-" * len(n) for n in rows[0]]) + "+")
    for row in rows[start:]:
        s.append("|" + "|".join(row) + "|")
    s.append("+" + "+".join(["=" * len(n) for n in rows[0]]) + "+")

    if len(set(len(c) for c in self.columns.values())) != 1:
        warning = f"Warning: Columns have different lengths. {blanks} is used as fill value."
        s.append(warning)

    return "\n".join(s)
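The layout rules used above (strings left-justified, other values right-justified, column names centered, `=` rules around the table and a `-` rule under the header) can be tried without tablite. The following is a minimal sketch on a plain dict of equal-length lists; `adjust` and `ascii_table` are hypothetical stand-ins, not the library's functions, and the `blanks` and `dtype` options are left out.

```python
def adjust(value, width):
    """Left-justify strings and right-justify every other type."""
    text = str(value)
    return text.ljust(width) if isinstance(value, str) else text.rjust(width)


def ascii_table(columns):
    """Render a dict of equal-length lists as a bordered ASCII table."""
    headers, cells = [], []
    for name, values in columns.items():
        # Column width is the longest rendered value, header included.
        width = max(len(str(v)) for v in [name, *values])
        headers.append(str(name).center(width))
        cells.append([adjust(v, width) for v in values])

    def rule(char):
        # One run of `char` per column, joined by "+" corners.
        return "+" + "+".join(char * len(h) for h in headers) + "+"

    lines = [rule("="), "|" + "|".join(headers) + "|", rule("-")]
    lines += ["|" + "|".join(row) + "|" for row in zip(*cells)]
    lines.append(rule("="))
    return "\n".join(lines)


text = ascii_table({"#": [0, 1], "a": [1, 200], "name": ["x", "yy"]})
print(text)
# +=+===+====+
# |#| a |name|
# +-+---+----+
# |0|  1|x   |
# |1|200|yy  |
# +=+===+====+
```

Computing the width from the header as well as the values keeps a long column name from overflowing its cell.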

`tablite.base.BaseTable.show(slice_=None, blanks=None, dtype=False)`

Prints an ASCII view of the table.

PARAMETER	DESCRIPTION
`slice_`	slice to determine table snippet. TYPE: `slice` DEFAULT: `None`
`blanks`	fill value shown for `None` and for cells missing from shorter columns. Defaults to None. TYPE: `str` DEFAULT: `None`
`dtype`	adds subheader with datatype for column. Defaults to False. TYPE: `bool` DEFAULT: `False`

Source code in tablite/base.py

def show(self, slice_=None, blanks=None, dtype=False):
    """prints ascii view of table.

    Args:
        slice_ (slice, optional): slice to determine table snippet.
        blanks (str, optional): value for whitespace. Defaults to None.
        dtype (bool, optional): adds subheader with datatype for column. Defaults to False.
    """
    print(self.to_ascii(slice_=slice_, blanks=blanks, dtype=dtype))

`tablite.base.BaseTable.to_dict(columns=None, slice_=None)`

columns: list of column names. Default is None == all columns.

slice_: slice. Default is None == all rows.

returns: dict with columns as keys and lists of values.

Example:

>>> t.show()
+===+===+===+
| # | a | b |
|row|int|int|
+---+---+---+
| 0 |  1|  3|
| 1 |  2|  4|
+===+===+===+
>>> t.to_dict()
{'a':[1,2], 'b':[3,4]}

Source code in tablite/base.py

def to_dict(self, columns=None, slice_=None):
    """
    columns: list of column names. Default is None == all columns.
    slice_: slice. Default is None == all rows.

    returns: dict with columns as keys and lists of values.

    Example:
    ```
    >>> t.show()
    +===+===+===+
    | # | a | b |
    |row|int|int|
    +---+---+---+
    | 0 |  1|  3|
    | 1 |  2|  4|
    +===+===+===+
    >>> t.to_dict()
    {'a':[1,2], 'b':[3,4]}
    ```

    """
    if slice_ is None:
        slice_ = slice(0, len(self))
    assert isinstance(slice_, slice)

    if columns is None:
        columns = list(self.columns.keys())
    if not isinstance(columns, list):
        raise TypeError("expected columns as list of strings")

    return {name: list(self.columns[name][slice_]) for name in columns}
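The selection rules in `to_dict` are easy to check on ordinary data. The sketch below mirrors them on a plain dict of lists; `columns_to_dict` is a hypothetical helper written for illustration and is not part of tablite.

```python
def columns_to_dict(table, columns=None, slice_=None):
    """Select columns (by name) and rows (by slice) from a dict of lists."""
    if slice_ is None:
        slice_ = slice(None)  # all rows
    if not isinstance(slice_, slice):
        raise TypeError("expected slice_ as a slice")
    if columns is None:
        columns = list(table)  # all columns, in insertion order
    if not isinstance(columns, list):
        raise TypeError("expected columns as list of strings")
    return {name: list(table[name][slice_]) for name in columns}


table = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}

everything = columns_to_dict(table)
subset = columns_to_dict(table, columns=["b"], slice_=slice(1, 3))
print(subset)  # {'b': [6, 7]}
```

As in the source above, `columns` must be a list: passing a tuple or a single string raises `TypeError` rather than being coerced.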

`tablite.base.BaseTable.as_json_serializable(row_count='row id', start_on=1, columns=None, slice_=None)`

provides a JSON compatible format of the table.

PARAMETER	DESCRIPTION
`row_count`	Label for row counts. Defaults to "row id". TYPE: `str` DEFAULT: `'row id'`
`start_on`	value of the first row count. Defaults to 1. TYPE: `int` DEFAULT: `1`
`columns`	Column names. Defaults to None which returns all columns. TYPE: `list of str` DEFAULT: `None`
`slice_`	selector. Defaults to None which returns [:] TYPE: `slice` DEFAULT: `None`

RETURNS	DESCRIPTION
	JSON serializable dict: All python datatypes have been converted to JSON compliant data.

Source code in tablite/base.py

def as_json_serializable(
    self, row_count="row id", start_on=1, columns=None, slice_=None
):
    """provides a JSON compatible format of the table.

    Args:
        row_count (str, optional): Label for row counts. Defaults to "row id".
        start_on (int, optional): row counts starts by default on 1.
        columns (list of str, optional): Column names.
            Defaults to None which returns all columns.
        slice_ (slice, optional): selector. Defaults to None which returns [:]

    Returns:
        JSON serializable dict: All python datatypes have been converted to JSON compliant data.
    """
    if slice_ is None:
        slice_ = slice(0, len(self))

    assert isinstance(slice_, slice)
    new = {"columns": {}, "total_rows": len(self)}
    if row_count is not None:
        new["columns"][row_count] = [
            i + start_on for i in range(*slice_.indices(len(self)))
        ]

    d = self.to_dict(columns, slice_=slice_)
    for k, data in d.items():
        new_k = unique_name(
            k, new["columns"]
        )  # used to avoid overwriting the `row id` key.
        new["columns"][new_k] = [
            DataTypes.to_json(v) for v in data
        ]  # deal with non-json datatypes.
    return new
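The row labels come from `slice.indices`, which resolves `None` and negative bounds against the table length before `start_on` is added. A minimal sketch of that step, assuming a hypothetical helper named `row_ids` and a hand-built payload in the same shape as the return value:

```python
import json


def row_ids(length, slice_=None, start_on=1):
    """Row labels for the selected rows, offset by start_on."""
    if slice_ is None:
        slice_ = slice(0, length)
    # slice.indices() returns concrete (start, stop, step) for this length.
    return [i + start_on for i in range(*slice_.indices(length))]


all_rows = row_ids(5)                                      # [1, 2, 3, 4, 5]
middle = row_ids(5, slice(1, 4))                           # [2, 3, 4]
last_two = row_ids(5, slice(-2, None))                     # [4, 5]
zero_based = row_ids(5, slice(None, None, 2), start_on=0)  # [0, 2, 4]

# The payload mirrors the structure built above: columns plus total_rows.
payload = {"columns": {"row id": row_ids(3), "a": [10, 20, 30]}, "total_rows": 3}
encoded = json.dumps(payload)
print(encoded)
```

Note that `total_rows` in the source above is the full table length, not the length of the selected slice.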

`tablite.base.BaseTable.index(*args)`

param: *args: column names

returns: multikey index on the columns as d[(key tuple, )] = [index1, index2, ...]

Examples:

>>> table6 = Table()
>>> table6['A'] = ['Alice', 'Bob', 'Bob', 'Ben', 'Charlie', 'Ben','Albert']
>>> table6['B'] = ['Alison', 'Marley', 'Dylan', 'Affleck', 'Hepburn', 'Barnes', 'Einstein']

>>> table6.index('A')  # single key.
{('Alice',): [0],
 ('Bob',): [1, 2],
 ('Ben',): [3, 5],
 ('Charlie',): [4],
 ('Albert',): [6]}

>>> table6.index('A', 'B')  # multiple keys.
{('Alice', 'Alison'): [0],
 ('Bob', 'Marley'): [1],
 ('Bob', 'Dylan'): [2],
 ('Ben', 'Affleck'): [3],
 ('Charlie', 'Hepburn'): [4],
 ('Ben', 'Barnes'): [5],
 ('Albert', 'Einstein'): [6]}

Source code in tablite/base.py

def index(self, *args):
    """
    param: *args: column names
    returns multikey index on the columns as d[(key tuple, )] = [index1, index2, ...]

    Examples:
        ```
        >>> table6 = Table()
        >>> table6['A'] = ['Alice', 'Bob', 'Bob', 'Ben', 'Charlie', 'Ben','Albert']
        >>> table6['B'] = ['Alison', 'Marley', 'Dylan', 'Affleck', 'Hepburn', 'Barnes', 'Einstein']
        ```

        ```
        >>> table6.index('A')  # single key.
        {('Alice',): [0],
         ('Bob',): [1, 2],
         ('Ben',): [3, 5],
         ('Charlie',): [4],
         ('Albert',): [6]}
        ```

        ```
        >>> table6.index('A', 'B')  # multiple keys.
        {('Alice', 'Alison'): [0],
         ('Bob', 'Marley'): [1],
         ('Bob', 'Dylan'): [2],
         ('Ben', 'Affleck'): [3],
         ('Charlie', 'Hepburn'): [4],
         ('Ben', 'Barnes'): [5],
         ('Albert', 'Einstein'): [6]}
        ```

    """
    idx = defaultdict(list)
    iterators = [iter(self.columns[c]) for c in args]
    for ix, key in enumerate(zip(*iterators)):
        key = tuple(numpy_to_python(k) for k in key)
        idx[key].append(ix)
    return idx
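The same grouping can be reproduced on plain lists, which makes the shape of the result easy to inspect. This is a sketch only: `multikey_index` is a hypothetical function operating on a dict of lists, and it skips the `numpy_to_python` conversion because the inputs are already Python objects.

```python
from collections import defaultdict


def multikey_index(columns, *names):
    """Map each key tuple to the list of row numbers where it occurs."""
    idx = defaultdict(list)
    # zip walks the chosen columns in lockstep, yielding one key tuple per row.
    for ix, key in enumerate(zip(*(columns[n] for n in names))):
        idx[key].append(ix)
    return idx


data = {
    "A": ["Alice", "Bob", "Bob", "Ben", "Charlie", "Ben", "Albert"],
    "B": ["Alison", "Marley", "Dylan", "Affleck", "Hepburn", "Barnes", "Einstein"],
}

by_a = multikey_index(data, "A")
by_ab = multikey_index(data, "A", "B")
print(by_a[("Bob",)])            # [1, 2]
print(by_ab[("Ben", "Barnes")])  # [5]
```

Because the result is a `defaultdict`, looking up a missing key with `idx[key]` silently inserts an empty list; use `key in idx` or `idx.get(key)` to test for membership without changing the index.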

`tablite.base.BaseTable.unique_index(*args, tqdm=_tqdm)`

Generates the indices of the first occurrence of each unique row, given a list of column names.

PARAMETER	DESCRIPTION
`*args`	column names TYPE: `any` DEFAULT: `()`
`tqdm`	progress bar. Defaults to _tqdm. TYPE: `tqdm` DEFAULT: `_tqdm`

RETURNS	DESCRIPTION
	np.array(int64): indices of unique records.

Source code in tablite/base.py

def unique_index(self, *args, tqdm=_tqdm):
    """generates the index of unique rows given a list of column names

    Args:
        *args (any): columns names
        tqdm (tqdm, optional): Defaults to _tqdm.

    Returns:
        np.array(int64): indices of unique records.
    """
    if not args:
        raise ValueError("*args (column names) is required")
    seen = set()
    unique = set()
    iterators = [iter(self.columns[c]) for c in args]
    for ix, key in tqdm(enumerate(zip(*iterators)), disable=Config.TQDM_DISABLE):
        key_hash = hash(tuple(numpy_to_python(k) for k in key))
        if key_hash in seen:
            continue
        else:
            seen.add(key_hash)
            unique.add(ix)
    return np.array(sorted(unique))
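A first-occurrence scan like the one above can be sketched on plain lists. The hypothetical `unique_rows` below differs from the source in two labeled ways: it returns a Python list instead of a NumPy array, and it stores the key tuples themselves rather than their hashes. Storing only `hash(key)` treats two different keys with the same hash as duplicates; in CPython, for instance, `hash((-1,)) == hash((-2,))`.

```python
def unique_rows(columns, *names):
    """Row numbers of the first occurrence of each distinct key tuple."""
    if not names:
        raise ValueError("column names are required")
    seen = set()
    first = []
    for ix, key in enumerate(zip(*(columns[n] for n in names))):
        if key not in seen:  # compares the tuples, not just their hashes
            seen.add(key)
            first.append(ix)
    return first  # already ascending, since rows are visited in order


data = {"A": ["x", "y", "x", "y"], "B": [1, 2, 1, 3]}

on_both = unique_rows(data, "A", "B")
on_a = unique_rows(data, "A")
negatives = unique_rows({"n": [-1, -2, -1]}, "n")
print(on_both, on_a, negatives)  # [0, 1, 3] [0, 1] [0, 1]
```

Keeping whole tuples costs more memory than keeping integer hashes, which is presumably the trade-off the source makes.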

Functions

`tablite.base.register(path)`

Registers `path` in `file_registry`.

Table calls this function during init, when the working directory path is set, so that Python can clean up all temporary files at exit.

PARAMETER	DESCRIPTION
`path`	typically tmp/tablite-tmp/PID-{os.getpid()} TYPE: `Path`

Source code in tablite/base.py

def register(path):
    """registers path in file_registry

    The method is used by Table during init when the working directory path
    is set, so that python can clean all temporary files up at exit.

    Args:
        path (Path): typically tmp/tablite-tmp/PID-{os.getpid()}
    """
    global file_registry
    file_registry.add(path)

`tablite.base.shutdown()`

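Cleans up temporary files; triggered at shutdown. Only registered paths that contain `Config.pid` are removed, as a safeguard against deleting unrelated directories.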

Source code in tablite/base.py

def shutdown():
    """method to clean up temporary files triggered at shutdown."""
    for path in file_registry:
        if Config.pid in str(path):  # safety feature to prevent rm -rf /
            log.debug(f"shutdown: running rmtree({path})")
            shutil.rmtree(path)
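The register-then-clean-up pattern can be sketched with the standard library alone. Everything named here is hypothetical (`registry`, `PID_TAG`, `cleanup`), and wiring the cleanup to `atexit` is an assumption: the listing above does not show how `shutdown` is hooked up. The `exists()` check is also an addition, so that a second call is harmless.

```python
import atexit
import os
import shutil
import tempfile
from pathlib import Path

registry = set()
PID_TAG = f"pid-{os.getpid()}"  # stands in for Config.pid in the source


def register(path):
    """Remember a working directory for later removal."""
    registry.add(Path(path))


def cleanup():
    """Remove registered directories, but only those tagged with this process id."""
    removed = []
    for path in registry:
        # The tag check guards against deleting a path registered by mistake.
        if PID_TAG in str(path) and path.exists():
            shutil.rmtree(path)
            removed.append(path)
    return removed


atexit.register(cleanup)  # assumption: run the cleanup when the interpreter exits

base = Path(tempfile.mkdtemp())
workdir = base / PID_TAG
unrelated = base / "unrelated"
workdir.mkdir()
unrelated.mkdir()
register(workdir)
register(unrelated)

removed = cleanup()
print(removed == [workdir], unrelated.exists())  # True True
```

The untagged directory survives even though it was registered, which is the point of the substring check.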
