leaf_engine.etl.flag
====================

.. py:module:: leaf_engine.etl.flag

.. autoapi-nested-parse::

   Shipment flagging pipeline
   ==========================

   Functions in this module flag anomalous shipments with respect to their weight, spend, rate per mile, and distance.

   This module exports a ``flag_pipeline`` function that chains the individual flagging functions into a single function that can be used with the pandas ``pandas.DataFrame.pipe`` method.



Functions
---------

.. autoapisummary::

   leaf_engine.etl.flag._flag_distances_by_company_vs_pcmiler_miles
   leaf_engine.etl.flag._flag_distances_by_params
   leaf_engine.etl.flag._flag_rates
   leaf_engine.etl.flag._flag_spend
   leaf_engine.etl.flag._flag_value
   leaf_engine.etl.flag._flag_weight
   leaf_engine.etl.flag.flag_pipeline



Module Contents
---------------

.. py:function:: _flag_distances_by_company_vs_pcmiler_miles(df: pandas.DataFrame) -> pandas.DataFrame

   Checks the distance provided by the data supplier (i.e., the company) against the distance determined by PCMiler.

   Checks are both relative (PCMiler miles / supplier miles) and absolute (PCMiler miles - supplier miles). Thresholds are **not** currently exposed through a public API.

   :param df: Shipments DataFrame.
   :type df: pd.DataFrame

   :returns: Input DataFrame with an additional ``delta_miles`` boolean column indicating that the PCMiler and supplier distances are not similar.
   :rtype: pd.DataFrame


.. py:function:: _flag_distances_by_params(df: pandas.DataFrame) -> pandas.DataFrame


.. py:function:: _flag_rates(df: pandas.DataFrame) -> pandas.DataFrame

   Flags rate per mile outside of the rate-per-mile range specified in the run parameters.

   :param df: Shipments DataFrame.
   :type df: pd.DataFrame

   :returns: Input DataFrame with an additional ``rpm_flagged`` column indicating shipments whose rate per mile falls outside the run rate-per-mile range.
   :rtype: pd.DataFrame


.. py:function:: _flag_spend(df: pandas.DataFrame) -> pandas.DataFrame

   Flags spend outside of the spend range specified in the run parameters.

   :param df: Shipments DataFrame.
   :type df: pd.DataFrame

   :returns: Input DataFrame with an additional ``spend_flagged`` column indicating shipments whose spend falls outside the run spend range.
   :rtype: pd.DataFrame


.. py:function:: _flag_value(value: Union[int, float, pandas._libs.missing.NAType], min_value: Union[int, float], max_value: Union[int, float], flag_na: bool = False) -> bool

   Checks whether a value is within a range (exclusive at both ends).

   :param value: Value to check.
   :type value: Number
   :param min_value: Exclusive lower end of the range.
   :type min_value: Number
   :param max_value: Exclusive upper end of the range.
   :type max_value: Number
   :param flag_na: Whether NA values return ``True``. Defaults to ``False``. NA values include ``np.nan``, ``pd.NA``, and ``None``.
   :type flag_na: bool, optional

   :returns: Boolean result of the check.
   :rtype: bool


.. py:function:: _flag_weight(df: pandas.DataFrame) -> pandas.DataFrame

   Flags weight outside of the weight range specified in the run parameters.

   :param df: Shipments DataFrame.
   :type df: pd.DataFrame

   :returns: Input DataFrame with an additional ``weight_flagged`` column indicating shipments whose weight falls outside the run weight range.
   :rtype: pd.DataFrame


.. py:function:: flag_pipeline(df: pandas.DataFrame) -> pandas.DataFrame

   Public flagging pipeline. Pipe shipments through this function to augment them with flagging columns.

   :param df: Shipments DataFrame.
   :type df: pd.DataFrame

   :returns: Input DataFrame with additional flagging columns.
   :rtype: pd.DataFrame
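
This reference does not show how the flagging helpers are implemented. The following is a minimal sketch of how per-column range checks can be composed with ``pandas.DataFrame.pipe``. It assumes that ``_flag_value`` returns ``True`` ("flagged") when a value is *not* strictly inside the open range, and it uses hypothetical names throughout: the helpers ``flag_value``, ``flag_weight``, ``flag_spend`` and ``flag_pipeline``, the input columns ``weight`` and ``spend``, and the ``WEIGHT_RANGE``/``SPEND_RANGE`` values stand in for the real run parameters and are not the module's actual code.

```python
from typing import Optional, Union

import pandas as pd

Number = Union[int, float]

# HYPOTHETICAL run parameters; the real ranges come from the run configuration.
WEIGHT_RANGE = (0, 45_000)
SPEND_RANGE = (0, 100_000)


def flag_value(value: Optional[Number], min_value: Number, max_value: Number,
               flag_na: bool = False) -> bool:
    """Hypothetical stand-in for ``_flag_value``.

    Assumes True means "flagged": the value is NOT strictly between
    min_value and max_value (both ends exclusive).
    """
    # pd.isna treats None, float("nan"), np.nan and pd.NA as missing.
    if pd.isna(value):
        return flag_na
    return not (min_value < value < max_value)


def flag_weight(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for ``_flag_weight``; assumes a ``weight`` column."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out["weight_flagged"] = out["weight"].apply(flag_value, args=WEIGHT_RANGE)
    return out


def flag_spend(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for ``_flag_spend``; assumes a ``spend`` column."""
    out = df.copy()
    out["spend_flagged"] = out["spend"].apply(flag_value, args=SPEND_RANGE)
    return out


def flag_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline: chain each flagging step with DataFrame.pipe."""
    return df.pipe(flag_weight).pipe(flag_spend)


shipments = pd.DataFrame({
    "weight": [500.0, 0.0, 50_000.0, float("nan")],
    "spend": [1_200.0, 300.0, -5.0, 800.0],
})
flagged = shipments.pipe(flag_pipeline)
print(flagged["weight_flagged"].tolist())  # [False, True, True, False]
print(flagged["spend_flagged"].tolist())   # [False, False, True, False]
```

Because both ends are exclusive, a weight of exactly ``0.0`` is flagged, while the missing weight is not flagged because ``flag_na`` defaults to ``False``. Each step copies its input so the caller's DataFrame is left unchanged; this reference does not state whether the real functions copy or modify their input.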