leaf_engine

Subpackages

Submodules

Functions

cluster_pipeline(→ pandas.DataFrame)

concat_pipeline(→ pandas.DataFrame)

filter_pipeline(df)

flag_pipeline(→ pandas.DataFrame)

Public flagging pipeline. Pipe shipments through this function to augment them with flagging columns.

fsc_pipeline(→ pandas.DataFrame)

fuel_pipeline(df)

geocode_pipeline(→ pandas.DataFrame)

load_pipeline(→ pandas.DataFrame)

map_pipeline(df, dataset_params[, lane_level])

output_pipeline(df[, lane_level])

pcmiler_pipeline(df)

pull_lighthouse_data(→ List[pandas.DataFrame])

Pulls lighthouse shipper data for internal run. Filters down to only latest batch.

read_csv(→ pandas.DataFrame)

read_data(→ Dict[str, pandas.DataFrame])

Reads all data specified in company_params.

read_dataset(→ pandas.DataFrame | Dict[Any, ...)

read_drive(→ pandas.DataFrame)

Reads CSV, Excel, or Google Spreadsheet file from Google Drive either by url or path.

read_params(→ dict)

resolve_geocoding(→ pandas.DataFrame)

resolve_miles(→ pandas.DataFrame)

setup(→ None)

to_drive(→ str)

Write a DataFrame to a file on Google Drive, at the specified path.

upload_to_drive(→ str)

Uploads file to Google Drive.

uuid_pipeline(df[, set_shipment_uuid, set_lane_uuid])

validate_pipeline(df[, params])

Package Contents

leaf_engine.cluster_pipeline(df: pandas.DataFrame) → pandas.DataFrame
Parameters:

df (pandas.DataFrame) –

Return type:

pandas.DataFrame

leaf_engine.concat_pipeline(input_dfs: List[pandas.DataFrame], lane_level=False) → pandas.DataFrame
Parameters:

input_dfs (List[pandas.DataFrame]) –

Return type:

pandas.DataFrame

leaf_engine.filter_pipeline(df)

leaf_engine.flag_pipeline(df: pandas.DataFrame) → pandas.DataFrame

Public flagging pipeline. Pipe shipments through this function to augment them with flagging columns.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Returns:

Input DataFrame with additional flagging columns.

Return type:

pd.DataFrame
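
The "pipe" usage described above can be sketched with DataFrame.pipe. This is only an illustration of the calling pattern: toy_flag_pipeline and the flag_missing_miles column are hypothetical stand-ins, since the flagging columns that flag_pipeline actually adds are not documented here.

```python
import pandas as pd


def toy_flag_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for flag_pipeline: input columns plus one flag column."""
    # assign() returns a new DataFrame, so the caller's frame is left untouched.
    return df.assign(flag_missing_miles=df["miles"].isna())


# Made-up shipments data, for illustration only.
shipments = pd.DataFrame(
    {"shipment_id": [1, 2, 3], "miles": [120.0, None, 455.5]}
)

# DataFrame.pipe passes the frame as the first positional argument,
# which lets several pipeline steps be chained left to right.
flagged = shipments.pipe(toy_flag_pipeline)

print(list(flagged.columns))
```

With the real package the call would presumably read shipments.pipe(leaf_engine.flag_pipeline), assuming the input carries the columns that pipeline expects.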

leaf_engine.fsc_pipeline(df: pandas.DataFrame, lane_level: bool = False) → pandas.DataFrame
Parameters:
  • df (pandas.DataFrame) –

  • lane_level (bool) –

Return type:

pandas.DataFrame

leaf_engine.fuel_pipeline(df)

leaf_engine.geocode_pipeline(df: pandas.DataFrame) → pandas.DataFrame
Parameters:

df (pandas.DataFrame) –

Return type:

pandas.DataFrame

leaf_engine.load_pipeline(df: pandas.DataFrame, run_type: str, dry_run: bool = False, overwrite: bool = False) → pandas.DataFrame
Parameters:
  • df (pandas.DataFrame) –

  • run_type (str) –

  • dry_run (bool) –

  • overwrite (bool) –

Return type:

pandas.DataFrame

leaf_engine.map_pipeline(df, dataset_params, lane_level=False)

leaf_engine.output_pipeline(df, lane_level=False)

leaf_engine.pcmiler_pipeline(df)

leaf_engine.pull_lighthouse_data(params) → List[pandas.DataFrame]

Pulls lighthouse shipper data for internal run. Filters down to only latest batch.

Parameters:

params (dict) – Adapt params.

Returns:

DataFrames containing lighthouse lanes.

Return type:

List[pandas.DataFrame]

leaf_engine.read_csv(input_path, **kwargs) → pandas.DataFrame
Return type:

pandas.DataFrame

leaf_engine.read_data(company_params: dict) → Dict[str, pandas.DataFrame]

Reads all data specified in company_params.

Returns:

Dictionary mapping dataset labels to DataFrames that can be passed to pd.concat directly to create one DataFrame.

Return type:

Dict[str, pd.DataFrame]

Parameters:

company_params (dict) –
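
The note about passing the result to pd.concat can be sketched as follows. The datasets dictionary is a hypothetical stand-in for the return value of read_data; its labels, columns, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for read_data(company_params): labels -> DataFrames.
datasets = {
    "dataset_a": pd.DataFrame({"origin": ["ATL", "DFW"], "miles": [780, 1200]}),
    "dataset_b": pd.DataFrame({"origin": ["ORD"], "miles": [300]}),
}

# When pd.concat receives a dict, the keys become the outer index level,
# so every row keeps a reference to the dataset it came from.
combined = pd.concat(datasets)

# Optional: turn the dataset label into an ordinary column.
flat = combined.droplevel(1).rename_axis("dataset").reset_index()

print(flat)
```

Keeping the label in the index is convenient for selecting one dataset again with combined.loc["dataset_a"]; flattening it into a column is usually easier for downstream grouping.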

leaf_engine.read_dataset(dataset_params) → pandas.DataFrame | Dict[Any, pandas.DataFrame]
Return type:

pandas.DataFrame | Dict[Any, pandas.DataFrame]

leaf_engine.read_drive(url: str | None = None, path: str | None = None, cache: bool = True, **kwargs) → pandas.DataFrame

Reads CSV, Excel, or Google Spreadsheet file from Google Drive either by url or path.

Parameters:
  • url (Optional[str], optional) – File URL. Can be copied from browser navigation bar. Defaults to None.

  • path (Optional[str], optional) – File path. First part needs to be drive name. Defaults to None.

  • cache (bool, optional) – Whether to cache the result to disk. Defaults to True. This makes subsequent calls to read_drive faster.

  • **kwargs – Keyword arguments passed to pandas.read_csv or pandas.read_excel.

Examples:

>>> df = read_drive(url="https://drive.google.com/file/d/XXYYZZ/view?usp=sharing")
>>> df = read_drive(path="Data Science/folder1/folder2/file.csv")

Raises:
  • LeafGoogleDriveException – If neither url nor path is given, or if both are given.

  • LeafGoogleDriveException – If file is not found on Google Drive.

  • LeafGoogleDriveException – If unable to download file from Google Drive.

  • ValueError – If neither pd.read_csv nor pd.read_excel can read the file.

Returns:

DataFrame read from file.

Return type:

pd.DataFrame
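
The "exactly one of url or path" rule from the Raises list can be sketched as a small standalone check. check_source is a hypothetical helper, not part of leaf_engine, and it raises ValueError only as a stand-in for LeafGoogleDriveException, whose import path is not documented here.

```python
from typing import Optional


def check_source(url: Optional[str] = None, path: Optional[str] = None) -> str:
    """Return which argument identifies the file, requiring exactly one of them."""
    # (url is None) == (path is None) is True when both are missing
    # and also when both are provided; either case is rejected.
    if (url is None) == (path is None):
        # Stand-in for LeafGoogleDriveException (assumption, see lead-in).
        raise ValueError("Provide exactly one of url or path.")
    return "url" if url is not None else "path"


print(check_source(path="Data Science/folder1/folder2/file.csv"))
```

Callers that build the arguments dynamically may find it easier to run such a check up front than to catch the exception after the call.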

leaf_engine.read_params(params_path: str | pathlib.Path) → dict
Parameters:

params_path (Union[str, pathlib.Path]) –

Return type:

dict

leaf_engine.resolve_geocoding(df: pandas.DataFrame) → pandas.DataFrame
Parameters:

df (pandas.DataFrame) –

Return type:

pandas.DataFrame

leaf_engine.resolve_miles(df: pandas.DataFrame) → pandas.DataFrame
Parameters:

df (pandas.DataFrame) –

Return type:

pandas.DataFrame

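leaf_engine.setup(input_params: dict | str | pathlib.Path, log_file_name: str | None = None, enable_log_git: bool = True, enable_log_params: bool = True) → None
Parameters:
  • input_params (Union[dict, str, pathlib.Path]) –

  • log_file_name (Optional[str]) –

  • enable_log_git (bool) –

  • enable_log_params (bool) –

Return type: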

None

leaf_engine.to_drive(df: pandas.DataFrame, path: str, overwrite: bool = False, **kwargs) → str

Write a DataFrame to a file on Google Drive, at the specified path.

Parameters:
  • df (pd.DataFrame) – DataFrame to write.

  • path (str) – Path to write to. First part of path is the drive name.

  • overwrite (bool, optional) – Overwrite existing file. Defaults to False.

Examples:

>>> df.pipe(gdrive.to_drive, "drive_name/folder1/folder2/file_name.csv")
>>> df.pipe(gdrive.to_drive, "drive_name/folder1/folder2/file_name.xlsx")

Raises:
  • LeafGoogleDriveException – Raised if overwrite is False and file exists.

  • LeafGoogleDriveException – Raised if path is invalid.

  • LeafGoogleDriveException – Raised if write response does not contain file ID.

Returns:

Google Drive file URL.

Return type:

str

leaf_engine.upload_to_drive(local_path: str | pathlib.Path, drive_path: str, overwrite: bool = False) → str

Uploads file to Google Drive.

Parameters:
  • local_path (Union[str, Path]) – Local path of file to upload.

  • drive_path (str) – Google Drive path to upload to. First part of path is the drive name. Must include file name: "drive_name/folder1/folder2/file_name.zip".

  • overwrite (bool) – Overwrite existing file. Defaults to False.

Raises:
  • LeafGoogleDriveException – Raised if overwrite is False and file exists.

  • LeafGoogleDriveException – Raised if unable to upload file.

Returns:

Google Drive URL of uploaded file.

Return type:

str
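
The Drive path convention used by to_drive and upload_to_drive (drive name first, file name last) can be sketched as below. split_drive_path and DrivePath are hypothetical helpers written for illustration; how leaf_engine actually parses and validates paths is not documented here.

```python
from typing import List, NamedTuple


class DrivePath(NamedTuple):
    """Parts of a path such as "drive_name/folder1/folder2/file_name.zip"."""

    drive_name: str
    folders: List[str]
    file_name: str


def split_drive_path(drive_path: str) -> DrivePath:
    """Split a Drive path into drive name, intermediate folders, and file name."""
    # Drop empty segments so leading or trailing slashes do not create blanks.
    parts = [part for part in drive_path.split("/") if part]
    if len(parts) < 2:
        # A usable path needs at least a drive name and a file name.
        raise ValueError(f"Expected 'drive_name/.../file_name', got {drive_path!r}")
    return DrivePath(drive_name=parts[0], folders=parts[1:-1], file_name=parts[-1])


print(split_drive_path("drive_name/folder1/folder2/file_name.zip"))
```

A check like this can catch a missing drive name or file name locally, before any request is sent to Google Drive.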

leaf_engine.uuid_pipeline(df, set_shipment_uuid=True, set_lane_uuid=False)
leaf_engine.validate_pipeline(df, params=None)