Imputation
tablite.imputation
Classes
Functions
tablite.imputation.imputation(T, targets, missing=None, method='carry forward', sources=None, tqdm=_tqdm, pbar=None)
In statistics, imputation is the process of replacing missing data with substituted values.
See more: https://en.wikipedia.org/wiki/Imputation_(statistics)
PARAMETER | DESCRIPTION |
---|---|
T | Source table. |
targets | Column names in which to find and replace missing values. |
missing | Values to be replaced. |
method | Method to be used for replacement. Options: 'carry forward': takes the previous value and carries it forward into fields where values are missing (+: quick, realistic on time series; -: can produce strange outliers). 'mean': calculates the column mean (excluding `missing` values). 'mode': calculates the column mode (excluding `missing` values). 'nearest neighbour': calculates the normalised distance between items in the source columns, selects the nearest neighbour and copies its value as the replacement (+: works for any datatype; -: computationally intensive, i.e. slow). |
sources | Nearest neighbour only: column names to be used during imputation. If None or empty, all columns are used. |
RETURNS | DESCRIPTION |
---|---|
table | Table with replaced values. |
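The 'nearest neighbour' option is the least obvious of the four, so a rough plain-Python sketch of the idea may help. This is not tablite's implementation: the function name `nearest_neighbour_fill`, the list-of-dicts row layout, and the min-max scaling with summed absolute differences are all assumptions made for illustration. It also assumes the source columns are numeric and contain no missing values.

```python
def nearest_neighbour_fill(rows, target, sources, missing=(None,)):
    """Hypothetical helper: copy `target` from the closest row that has a value."""
    # Range of each source column, used to put distances on a common scale.
    spans = {}
    for col in sources:
        column = [row[col] for row in rows]
        lo, hi = min(column), max(column)
        spans[col] = (hi - lo) or 1  # constant column: avoid dividing by zero

    def distance(a, b):
        # Sum of range-normalised absolute differences over the source columns.
        return sum(abs(a[col] - b[col]) / spans[col] for col in sources)

    # Only rows that already have a target value can donate one.
    donors = [row for row in rows if row[target] not in missing]

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if row[target] in missing and donors:
            nearest = min(donors, key=lambda d: distance(row, d))
            new_row[target] = nearest[target]
        filled.append(new_row)
    return filled


rows = [
    {"height": 150, "weight": 50, "size": "S"},
    {"height": 180, "weight": 85, "size": "L"},
    {"height": 178, "weight": 80, "size": None},
]
result = nearest_neighbour_fill(rows, target="size", sources=["height", "weight"])
print(result[2]["size"])  # L
```

Every row with a missing target is compared against every donor row, which is why this approach gets slow as the table grows.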
Source code in tablite/imputation.py
tablite.imputation.carry_forward(T, targets, missing, tqdm=_tqdm, pbar=None)
Source code in tablite/imputation.py
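As an illustration of what 'carry forward' means (not the library's source), the following is a minimal sketch on a plain Python list. The name `carry_forward_sketch` is hypothetical, and leaving leading gaps unchanged is an assumption of this sketch; the library may treat them differently.

```python
def carry_forward_sketch(values, missing=(None,)):
    """Hypothetical helper: replace missing entries with the last seen value."""
    result = []
    have_last = False
    last = None
    for v in values:
        if v in missing:
            # A gap before any real value has nothing to carry, so it is kept.
            result.append(last if have_last else v)
        else:
            last, have_last = v, True
            result.append(v)
    return result


readings = [None, 20.5, None, None, 21.0, None]
filled = carry_forward_sketch(readings)
print(filled)  # [None, 20.5, 20.5, 20.5, 21.0, 21.0]
```

A single pass over the column is enough, which matches the "quick" trade-off described for this method.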
tablite.imputation.stats_method(T, targets, missing, method, tqdm=_tqdm, pbar=None)
Source code in tablite/imputation.py
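To illustrate the 'mean' and 'mode' options (again, not the library's source), here is a minimal sketch using Python's standard `statistics` module on a plain list. The name `fill_with_statistic` is hypothetical, and returning the input unchanged when every value is missing is an assumption of this sketch.

```python
from statistics import mean, mode


def fill_with_statistic(values, missing=(None,), method="mean"):
    """Hypothetical helper: replace missing entries with the column mean or mode."""
    # The statistic is computed over the observed (non-missing) values only.
    observed = [v for v in values if v not in missing]
    if not observed:
        return list(values)  # nothing to compute a statistic from
    if method == "mean":
        replacement = mean(observed)
    elif method == "mode":
        replacement = mode(observed)
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return [replacement if v in missing else v for v in values]


print(fill_with_statistic([1.0, None, 3.0], method="mean"))       # [1.0, 2.0, 3.0]
print(fill_with_statistic(["a", "b", "a", None], method="mode"))  # ['a', 'b', 'a', 'a']
```

The mean only makes sense for numeric columns, whereas the mode also works on text.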