leaf_engine.etl.flag

Functions in this module flag anomalous shipments with respect to their weight, spend, rate per mile, and distance. The module exports a flag_pipeline function that chains the individual flagging functions into a single function for use with the pandas DataFrame.pipe method.
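The pipe-based composition described above can be sketched as follows. This is a minimal illustration, not the module's implementation. flag_spend, flag_weight, the spend and weight column names, and the numeric ranges are all hypothetical stand-ins for _flag_spend, _flag_weight, and the run parameters.

```python
import pandas as pd


def flag_spend(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for _flag_spend with a hard-coded, exclusive (0, 10000) range.
    in_range = df["spend"].between(0, 10_000, inclusive="neither")
    return df.assign(spend_flagged=~in_range)


def flag_weight(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for _flag_weight with a hard-coded, exclusive (0, 45000) range.
    in_range = df["weight"].between(0, 45_000, inclusive="neither")
    return df.assign(weight_flagged=~in_range)


def flag_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    # Each step returns a new DataFrame with one extra boolean column,
    # so the steps can be chained with DataFrame.pipe.
    return df.pipe(flag_spend).pipe(flag_weight)


shipments = pd.DataFrame({"spend": [500.0, 25_000.0], "weight": [10_000.0, 0.0]})
flagged = shipments.pipe(flag_pipeline)
print(flagged[["spend_flagged", "weight_flagged"]])
```

Because every step takes and returns a DataFrame, adding another check means appending one more .pipe(...) call.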

Functions

_flag_distances_by_company_vs_pcmiler_miles(...)

Checks distance provided by data supplier (i.e., company) versus distance determined by PCMiler.

_flag_distances_by_params(→ pandas.DataFrame)

_flag_rates(→ pandas.DataFrame)

Flags rate per mile outside of rate per mile range specified in run parameters.

_flag_spend(→ pandas.DataFrame)

Flags spend outside of spend range specified in run parameters.

_flag_value(→ bool)

Checks if a value is within range (exclusive at both ends).

_flag_weight(→ pandas.DataFrame)

Flags weight outside of weight range specified in run parameters.

flag_pipeline(→ pandas.DataFrame)

Public flagging pipeline. Pipe shipments through this function to augment them with flagging columns.

Module Contents

leaf_engine.etl.flag._flag_distances_by_company_vs_pcmiler_miles(df: pandas.DataFrame) → pandas.DataFrame

Checks the distance provided by the data supplier (i.e., the company) against the distance determined by PCMiler. The checks are both relative (PCMiler miles / supplier miles) and absolute (PCMiler miles - supplier miles). Thresholds are NOT currently exposed through a public API.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Returns:

Input DataFrame with an additional boolean delta_miles column that is True where the PCMiler and supplier distances are not similar.

Return type:

pd.DataFrame
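A minimal sketch of a relative-plus-absolute distance check. It assumes hypothetical company_miles and pcmiler_miles columns and made-up thresholds, since the real thresholds are not public. Requiring both checks to fail before setting delta_miles is also an assumption. That choice keeps short hauls with a large ratio but a small absolute gap, and long hauls with a small ratio but a large absolute gap, from being flagged.

```python
import pandas as pd

# Hypothetical thresholds; the module's real thresholds are not public.
MAX_RATIO = 1.25       # ratio pcmiler/company must lie within [1/1.25, 1.25]
MAX_ABS_DELTA = 100.0  # absolute difference must not exceed 100 miles


def flag_distance_mismatch(df: pd.DataFrame) -> pd.DataFrame:
    ratio = df["pcmiler_miles"] / df["company_miles"]  # relative check
    delta = df["pcmiler_miles"] - df["company_miles"]  # absolute check
    relative_bad = (ratio > MAX_RATIO) | (ratio < 1 / MAX_RATIO)
    absolute_bad = delta.abs() > MAX_ABS_DELTA
    # Assumption: flag only when both the relative and the absolute check fail.
    return df.assign(delta_miles=relative_bad & absolute_bad)


shipments = pd.DataFrame(
    {"company_miles": [10.0, 1000.0, 500.0], "pcmiler_miles": [20.0, 1150.0, 1000.0]}
)
print(flag_distance_mismatch(shipments)["delta_miles"].tolist())
```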

leaf_engine.etl.flag._flag_distances_by_params(df: pandas.DataFrame) → pandas.DataFrame

Parameters:

df (pandas.DataFrame) – Shipments DataFrame.

Return type:

pandas.DataFrame

leaf_engine.etl.flag._flag_rates(df: pandas.DataFrame) → pandas.DataFrame

Flags rate per mile outside of rate per mile range specified in run parameters.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Returns:

Input DataFrame with an additional rpm_flagged column indicating whether the shipment's rate per mile falls outside the run's rate per mile range.

Return type:

pd.DataFrame
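One way to derive and flag rate per mile, sketched under stated assumptions. Rate per mile is computed as spend / miles from hypothetical spend and miles columns. The run parameters are modeled by a hypothetical RunParams dataclass passed as an extra argument, which the documented _flag_rates signature does not have.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class RunParams:
    # Hypothetical container for run parameters; field names are illustrative only.
    min_rpm: float
    max_rpm: float


RUN_PARAMS = RunParams(min_rpm=0.5, max_rpm=10.0)


def flag_rates(df: pd.DataFrame, params: RunParams = RUN_PARAMS) -> pd.DataFrame:
    # Rate per mile = spend / miles; a zero-mile shipment yields inf and is flagged.
    rpm = df["spend"] / df["miles"]
    in_range = rpm.between(params.min_rpm, params.max_rpm, inclusive="neither")
    return df.assign(rpm_flagged=~in_range)


shipments = pd.DataFrame({"spend": [1200.0, 50.0, 300.0], "miles": [400.0, 200.0, 0.0]})
print(flag_rates(shipments)["rpm_flagged"].tolist())
```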

leaf_engine.etl.flag._flag_spend(df: pandas.DataFrame) → pandas.DataFrame

Flags spend outside of spend range specified in run parameters.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Returns:

Input DataFrame with an additional spend_flagged column indicating whether the shipment's spend falls outside the run's spend range.

Return type:

pd.DataFrame
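A vectorized sketch of the spend check, assuming a hypothetical spend column and a made-up (0, 5000) run range. pandas' Series.between with inclusive="neither" gives exclusive bounds. Because between returns False for missing values, the sketch explicitly leaves NA spend unflagged unless a hypothetical flag_na argument is set. That behavior is an assumption modeled on _flag_value's default.

```python
import numpy as np
import pandas as pd

# Hypothetical run-parameter spend range, exclusive at both ends.
MIN_SPEND, MAX_SPEND = 0.0, 5_000.0


def flag_spend(df: pd.DataFrame, flag_na: bool = False) -> pd.DataFrame:
    spend = df["spend"]
    outside = ~spend.between(MIN_SPEND, MAX_SPEND, inclusive="neither")
    # between() is False for missing values, so NA rows would look "outside";
    # leave them unflagged unless flag_na is requested.
    if not flag_na:
        outside = outside & spend.notna()
    return df.assign(spend_flagged=outside)


shipments = pd.DataFrame({"spend": [0.0, 100.0, 5_000.0, np.nan]})
print(flag_spend(shipments)["spend_flagged"].tolist())
print(flag_spend(shipments, flag_na=True)["spend_flagged"].tolist())
```

Values equal to either bound (0.0 and 5000.0 here) are flagged because the range is exclusive.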

leaf_engine.etl.flag._flag_value(value: int | float | pandas._libs.missing.NAType, min_value: int | float, max_value: int | float, flag_na: bool = False) → bool

Checks if a value is within range (exclusive at both ends).

Parameters:
  • value (Number) – Value to check.

  • min_value (Number) – Exclusive lower end of range.

  • max_value (Number) – Exclusive upper end of range.

  • flag_na (bool, optional) – Whether NA values return True. Defaults to False. NA values include np.nan, pd.NA, and None.

Returns:

Boolean value of check.

Return type:

bool
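A minimal sketch of a scalar range check in the spirit of _flag_value. The documentation does not say which outcome True represents. This sketch assumes True means the value should be flagged because it lies outside the open interval, which is consistent with flag_na=True making NA values return True. The name flag_value is hypothetical.

```python
from typing import Any

import numpy as np
import pandas as pd


def flag_value(value: Any, min_value: float, max_value: float, flag_na: bool = False) -> bool:
    """Return True if value should be flagged (assumed semantics: outside the open range)."""
    # pd.isna treats None, np.nan, and pd.NA alike as missing.
    if pd.isna(value):
        return flag_na
    # Exclusive at both ends: a value equal to either bound is flagged.
    return not (min_value < value < max_value)


print(flag_value(5, 0, 10))                     # inside (0, 10)
print(flag_value(10, 0, 10))                    # on the upper bound
print(flag_value(np.nan, 0, 10))                # NA with flag_na=False
print(flag_value(pd.NA, 0, 10, flag_na=True))   # NA with flag_na=True
```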

leaf_engine.etl.flag._flag_weight(df: pandas.DataFrame) → pandas.DataFrame

Flags weight outside of weight range specified in run parameters.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Returns:

Input DataFrame with an additional weight_flagged column indicating whether the shipment's weight falls outside the run's weight range.

Return type:

pd.DataFrame
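Because _flag_value works on a single value, a DataFrame-level flagger could apply such a check element by element. The sketch below does that for weight using Series.map and functools.partial. The flag_value helper, the weight column name, and the (0, 45000) range are hypothetical.

```python
import functools
from typing import Any

import numpy as np
import pandas as pd


def flag_value(value: Any, min_value: float, max_value: float, flag_na: bool = False) -> bool:
    # Hypothetical scalar helper: True when value lies outside the open range (min, max).
    if pd.isna(value):
        return flag_na
    return not (min_value < value < max_value)


# Hypothetical run-parameter weight range, in the same units as the weight column.
MIN_WEIGHT, MAX_WEIGHT = 0, 45_000


def flag_weight(df: pd.DataFrame) -> pd.DataFrame:
    check = functools.partial(flag_value, min_value=MIN_WEIGHT, max_value=MAX_WEIGHT)
    # Series.map calls check once per element (including NaN, since na_action=None).
    return df.assign(weight_flagged=df["weight"].map(check).astype(bool))


shipments = pd.DataFrame({"weight": [0.0, 20_000.0, 50_000.0, np.nan]})
print(flag_weight(shipments)["weight_flagged"].tolist())
```

Series.map invokes the Python function once per element, so on large frames a vectorized comparison such as Series.between is typically faster.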

leaf_engine.etl.flag.flag_pipeline(df: pandas.DataFrame) → pandas.DataFrame

Public flagging pipeline. Pipe shipments through this function to augment them with flagging columns.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Returns:

Input DataFrame with additional flagging columns.

Return type:

pd.DataFrame
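Downstream code will often want to separate clean shipments from flagged ones. This is a minimal sketch. It assumes the output carries some subset of the flag columns named in the Returns sections of this module, although which of them flag_pipeline actually adds is an assumption. split_flagged and FLAG_COLUMNS are hypothetical, and the sample frame stands in for real pipeline output.

```python
import pandas as pd

# Flag column names taken from the Returns sections of this module;
# whether flag_pipeline adds all of them is an assumption.
FLAG_COLUMNS = ["weight_flagged", "spend_flagged", "rpm_flagged", "delta_miles"]


def split_flagged(flagged: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    # Only consider flag columns that are actually present.
    present = [c for c in FLAG_COLUMNS if c in flagged.columns]
    any_flag = flagged[present].any(axis=1)
    return flagged[~any_flag], flagged[any_flag]


# Stand-in for the output of shipments.pipe(flag_pipeline).
flagged = pd.DataFrame(
    {
        "shipment_id": [1, 2, 3],
        "weight_flagged": [False, True, False],
        "spend_flagged": [False, False, False],
        "rpm_flagged": [False, False, True],
        "delta_miles": [False, False, False],
    }
)
clean, suspect = split_flagged(flagged)
print(clean["shipment_id"].tolist(), suspect["shipment_id"].tolist())
```

If the pipeline does add these columns, the helper could be called as split_flagged(shipments.pipe(flag_pipeline)).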