leaf_engine.etl.flag
Functions in this module flag anomalous shipments based on their weight, spend, rate per mile, and distance. The module exports a flag_pipeline function that combines the individual flagging functions into a single function suitable for use with pandas' pd.DataFrame.pipe.
Functions
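- _flag_distances_by_company_vs_pcmiler_miles(df): Checks distance provided by data supplier (i.e., company) versus distance determined by PCMiler.
- _flag_distances_by_params(df)
- _flag_rates(df): Flags rate per mile outside of rate per mile range specified in run parameters.
- _flag_spend(df): Flags spend outside of spend range specified in run parameters.
- _flag_value(value, min_value, max_value[, flag_na]): Checks if a value is within range (exclusive at both ends).
- _flag_weight(df): Flags weight outside of weight range specified in run parameters.
- flag_pipeline(df): Public flagging pipeline. Pipe shipments through this function to augment them with flagging columns.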
Module Contents
- leaf_engine.etl.flag._flag_distances_by_company_vs_pcmiler_miles(df: pandas.DataFrame) -> pandas.DataFrame
Checks distance provided by data supplier (i.e., company) versus distance determined by PCMiler. Checks are both relative (PCMiler miles / supplier miles) and absolute (PCMiler miles - supplier miles). Thresholds are NOT currently exposed through a public API.
- Parameters:
df (pd.DataFrame) – Shipments DataFrame.
- Returns:
Input DataFrame with additional delta_miles boolean column indicating PCMiler and supplier distances are not similar.
- Return type:
pd.DataFrame
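The relative and absolute checks described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name, the column names (company_miles, pcmiler_miles), the threshold values, and the rule that a shipment is flagged only when both checks fail are all assumptions, since the real thresholds are not exposed.

```python
import pandas as pd

# Hypothetical thresholds; the module does not expose its real values.
MAX_RELATIVE_DEVIATION = 0.25  # ratio may differ from 1.0 by at most 25%
MAX_ABSOLUTE_DEVIATION = 50.0  # miles


def flag_distance_mismatch(df: pd.DataFrame) -> pd.DataFrame:
    """Add a boolean delta_miles column marking dissimilar distances."""
    # Column names pcmiler_miles and company_miles are assumptions.
    ratio = df["pcmiler_miles"] / df["company_miles"]
    difference = df["pcmiler_miles"] - df["company_miles"]
    relative_off = (ratio - 1.0).abs() > MAX_RELATIVE_DEVIATION
    absolute_off = difference.abs() > MAX_ABSOLUTE_DEVIATION
    # Assumption: flag only when both the relative and absolute checks fail.
    return df.assign(delta_miles=relative_off & absolute_off)


shipments = pd.DataFrame(
    {
        "company_miles": [100.0, 500.0, 1000.0],
        "pcmiler_miles": [110.0, 800.0, 1020.0],
    }
)
flagged = flag_distance_mismatch(shipments)
print(flagged)
```

Combining the two checks with a logical AND keeps short hauls with a large ratio but a small mileage gap from being flagged; an OR would be stricter.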
- leaf_engine.etl.flag._flag_distances_by_params(df: pandas.DataFrame) -> pandas.DataFrame
- Parameters:
df (pandas.DataFrame)
- Return type:
pandas.DataFrame
- leaf_engine.etl.flag._flag_rates(df: pandas.DataFrame) -> pandas.DataFrame
Flags rate per mile outside of rate per mile range specified in run parameters.
- Parameters:
df (pd.DataFrame) – Shipments DataFrame.
- Returns:
Input DataFrame with additional rpm_flagged column indicating shipment rate per mile outside of run rate per mile range.
- Return type:
pd.DataFrame
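A range-based flag of this kind might look like the sketch below. The function name, the rpm column name, and the bounds are hypothetical; the real bounds come from the run parameters, and treating the boundary values themselves as flagged is an assumption borrowed from the exclusive range documented for _flag_value.

```python
import pandas as pd

# Hypothetical run parameters; the real values come from the run configuration.
RPM_MIN, RPM_MAX = 0.5, 10.0


def flag_rates_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Add an rpm_flagged column for rates outside the open interval (RPM_MIN, RPM_MAX)."""
    # Column name rpm is an assumption.
    rpm = df["rpm"]
    # Comparisons against NaN evaluate to False, so missing rates stay unflagged here.
    outside = (rpm <= RPM_MIN) | (rpm >= RPM_MAX)
    return df.assign(rpm_flagged=outside)


shipments = pd.DataFrame({"rpm": [0.2, 2.5, 10.0, float("nan")]})
flagged = flag_rates_sketch(shipments)
print(flagged)
```

If the spend and weight flags are implemented along the same lines, only the column name and bounds would change.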
- leaf_engine.etl.flag._flag_spend(df: pandas.DataFrame) -> pandas.DataFrame
Flags spend outside of spend range specified in run parameters.
- Parameters:
df (pd.DataFrame) – Shipments DataFrame.
- Returns:
Input DataFrame with additional spend_flagged column indicating shipment spend outside of run spend range.
- Return type:
pd.DataFrame
- leaf_engine.etl.flag._flag_value(value: int | float | pandas._libs.missing.NAType, min_value: int | float, max_value: int | float, flag_na: bool = False) -> bool
Checks if a value is within range (exclusive at both ends).
- Parameters:
value (Number) – Value to check.
min_value (Number) – Exclusive lower end of range.
max_value (Number) – Exclusive upper end of range.
flag_na (bool, optional) – Whether NA values return True. Defaults to False. NA values include np.nan, pd.NA, and None.
- Returns:
Boolean value of check.
- Return type:
bool
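The documented behavior can be read as the following minimal sketch. The name check_value is hypothetical, and the sketch takes the description literally: True means the value lies strictly inside the range, and NA values return whatever flag_na is set to. Whether the real function uses True to mean "in range" or "flagged" is not stated here.

```python
import pandas as pd


def check_value(value, min_value, max_value, flag_na=False):
    """Return True when min_value < value < max_value (both ends exclusive)."""
    # pd.isna treats np.nan, pd.NA, and None as missing for scalar inputs.
    if pd.isna(value):
        return flag_na
    return bool(min_value < value < max_value)


print(check_value(5, 0, 10))
print(check_value(10, 0, 10))
print(check_value(None, 0, 10))
print(check_value(float("nan"), 0, 10, flag_na=True))
```

Checking for NA first avoids comparing pd.NA with a number, which would yield pd.NA rather than a plain boolean.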
- leaf_engine.etl.flag._flag_weight(df: pandas.DataFrame) -> pandas.DataFrame
Flags weight outside of weight range specified in run parameters.
- Parameters:
df (pd.DataFrame) – Shipments DataFrame.
- Returns:
Input DataFrame with additional weight_flagged column indicating shipment weight outside of run weight range.
- Return type:
pd.DataFrame
- leaf_engine.etl.flag.flag_pipeline(df: pandas.DataFrame) -> pandas.DataFrame
Public flagging pipeline. Pipe shipments through this function to augment them with flagging columns.
- Parameters:
df (pd.DataFrame) – Shipments DataFrame.
- Returns:
Input DataFrame with additional flagging columns.
- Return type:
pd.DataFrame
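To illustrate how such a pipeline composes with pd.DataFrame.pipe, here is a self-contained sketch. It does not import leaf_engine; every function name ending in _sketch, the column names, and the bounds are stand-ins invented for the example, and the real pipeline may chain more steps in a different order.

```python
import pandas as pd


def flag_spend_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical bounds and column name, for illustration only.
    return df.assign(spend_flagged=(df["spend"] <= 0) | (df["spend"] >= 50_000))


def flag_weight_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical bounds and column name, for illustration only.
    return df.assign(weight_flagged=(df["weight"] <= 0) | (df["weight"] >= 80_000))


def flag_pipeline_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Chain the individual flagging steps with DataFrame.pipe."""
    return df.pipe(flag_spend_sketch).pipe(flag_weight_sketch)


shipments = pd.DataFrame({"spend": [1200.0, 0.0], "weight": [42_000.0, 95_000.0]})
result = shipments.pipe(flag_pipeline_sketch)
print(result)
```

Because each step takes a DataFrame and returns a DataFrame, the composed function can itself be passed to pipe inside a longer ETL chain.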