leaf_engine.etl.uuid

This module generates UUIDs for all entities in the shipment data: locations, lanes, and shipments. These UUIDs are used when inserting entities into the analytics database and should be preserved throughout our data processing pipelines so that we can cross-reference entities across data stores.

For instance, links between the analytics DB and the platform DB will make use of the UUIDs set here.

The Location entity can be of two types:

- POINT location
- CLUSTER location

A CLUSTER location is a collection of one or more POINT locations. A POINT location has a CLUSTER as its parent (i.e., it has a parent ID referencing a CLUSTER location).

The Lane entity can be of two types:

- PTP lane
- POWER lane

A POWER lane is a collection of one or more PTP lanes. A PTP lane has a POWER lane as its parent (i.e., it has a parent ID referencing a POWER lane).
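The parent/child relationships described above can be pictured with a small data model. This is only an illustrative sketch: the class names, field names, and example IDs below are hypothetical and are not taken from the module itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LocationType(Enum):
    POINT = "POINT"
    CLUSTER = "CLUSTER"


class LaneType(Enum):
    PTP = "PTP"
    POWER = "POWER"


@dataclass(frozen=True)
class Location:
    id: str
    type: LocationType
    parent_id: Optional[str] = None  # assumed: only POINT locations carry a parent


@dataclass(frozen=True)
class Lane:
    id: str
    type: LaneType
    origin_location_id: str
    destination_location_id: str
    parent_id: Optional[str] = None  # assumed: only PTP lanes carry a parent


# A CLUSTER groups POINT locations; each POINT references its CLUSTER.
cluster = Location(id="cluster-1", type=LocationType.CLUSTER)
point = Location(id="point-1", type=LocationType.POINT, parent_id=cluster.id)

# A POWER lane groups PTP lanes; each PTP lane references its POWER lane.
power = Lane("power-1", LaneType.POWER, "cluster-1", "cluster-2")
ptp = Lane("ptp-1", LaneType.PTP, "point-1", "point-2", parent_id=power.id)
```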

UUIDs are set by comparing the shipper-provided data with existing data in the analytics DB. For instance, POINT locations are compared using their geometries. If a shipper’s location is geocoded to the same geometry as a location in the DB (i.e., a previously loaded location for the same shipper), its UUID is set to the DB ID of the matching location. CLUSTER locations are matched in the same way.
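The geometry-based matching for POINT locations could look roughly like the sketch below. The column names (`geometry`, `id`), the WKT-style geometry strings, and the fallback of generating a fresh UUID for unmatched locations are assumptions made for illustration; the module's actual behaviour may differ.

```python
import uuid

import pandas as pd

# Previously loaded locations for this shipper (hypothetical columns and IDs).
db_points = pd.DataFrame(
    {
        "geometry": ["POINT (-87.63 41.88)", "POINT (-74.01 40.71)"],
        "id": ["db-id-chicago", "db-id-new-york"],
    }
)

# Newly geocoded shipper locations: the first matches the DB, the second does not.
shipper_points = pd.DataFrame(
    {"geometry": ["POINT (-87.63 41.88)", "POINT (-118.24 34.05)"]}
)

geometry_to_id = dict(zip(db_points["geometry"], db_points["id"]))

# Reuse the DB ID where the geometry matches; otherwise generate a new UUID
# (assumed fallback).
shipper_points["id"] = [
    geometry_to_id.get(geometry) or str(uuid.uuid4())
    for geometry in shipper_points["geometry"]
]
```

Matching on the geometry string keeps a location's ID stable across repeated loads, as long as geocoding yields the same geometry each time.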

Lane IDs are set by comparing origin-destination location IDs. If a lane already has both origin and destination UUIDs, its ID is looked up in the API lane records for that shipper. If the lane does not have both origin and destination UUIDs, its ID is generated here. Similarly, power lane IDs are set by checking cluster IDs.

Functions

_get_lane_id_map(→ Dict[str, str])

Returns a mapping from API origin-destination location IDs to API lane IDs.

_get_location_id_map(→ Dict[str, str])

Returns a mapping from API location geometry strings to API location IDs.

_set_lane_uuids(→ pandas.DataFrame)

Sets internal lane ID to row hash values.

_set_or_get_uuid(→ str)

_set_point_location_uuids(→ pandas.DataFrame)

Sets point location ID for points where it is not already set.

_set_power_lane_uuids(→ pandas.DataFrame)

_set_ptp_lane_uuids(→ pandas.DataFrame)

_set_routing_uuids(→ pandas.DataFrame)

Sets routing route/lane IDs for power lanes where available (based on power_lane_id), and generates ULIDs for routing routes/lanes where IDs are not available.

_set_shipment_uuids(→ pandas.DataFrame)

Sets internal shipment ID to row hash values.

compute_row_hash(df, column_name, hash_columns)

uuid_pipeline(df[, set_shipment_uuid, set_lane_uuid])

Module Contents

leaf_engine.etl.uuid._get_lane_id_map(api_lanes_df: pandas.DataFrame) → Dict[str, str]

Returns a mapping from API origin-destination location IDs to API lane IDs.

Each key in the returned dictionary is a tuple of the form (origin_location_id, destination_location_id); the corresponding value is the API ID of the lane.

Parameters:

api_lanes_df (pandas.DataFrame) –

Return type:

Dict[str, str]
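As a rough illustration of the mapping described above, the sketch below builds a dictionary keyed by origin-destination tuples. The function name `build_lane_id_map` is hypothetical, and it assumes the API lanes DataFrame exposes `origin_location_id`, `destination_location_id`, and `id` as columns.

```python
from typing import Dict, Tuple

import pandas as pd


def build_lane_id_map(api_lanes_df: pd.DataFrame) -> Dict[Tuple[str, str], str]:
    """Map (origin_location_id, destination_location_id) to the API lane ID."""
    keys = zip(
        api_lanes_df["origin_location_id"],
        api_lanes_df["destination_location_id"],
    )
    return dict(zip(keys, api_lanes_df["id"]))


# Example API lane records (hypothetical values).
api_lanes_df = pd.DataFrame(
    {
        "id": ["lane-1", "lane-2"],
        "origin_location_id": ["loc-a", "loc-b"],
        "destination_location_id": ["loc-b", "loc-a"],
    }
)
lane_id_map = build_lane_id_map(api_lanes_df)
```

Because the key is an ordered tuple, a lane from `loc-a` to `loc-b` and its reverse resolve to different lane IDs.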

leaf_engine.etl.uuid._get_location_id_map(api_locations_df: pandas.DataFrame) → Dict[str, str]

Returns a mapping from API location geometry strings to API location IDs.

Parameters:

api_locations_df (pandas.DataFrame) –

Return type:

Dict[str, str]

leaf_engine.etl.uuid._set_lane_uuids(df: pandas.DataFrame) → pandas.DataFrame

Sets internal lane ID to row hash values.

There is an ID associated with each lane in the id column. This ID can be set in the mapping process to the shipper-provided TMS ID, but is sometimes set to a random UUID. If lane_id values are set to UUIDs, a ValueError is raised indicating that the mappings should be updated to use row hashing instead of UUIDs where unique IDs are needed.

Only the internal_lane_id column set below is used when loading lanes into the DB.

Parameters:

df (pd.DataFrame) – lanes DF.

Raises:
  • ValueError – Raised when the lane_id column contains UUIDs. These UUIDs should be replaced with row hashes at the mapping stage.

Returns:

lanes DF with internal_lane_id column.

Return type:

pd.DataFrame
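The UUID guard described above might be implemented along the following lines. This is a sketch under assumptions: the helper names `is_uuid` and `check_no_uuids` are hypothetical, and the module may detect UUIDs differently (for example, with a regular expression).

```python
import uuid
from typing import Iterable


def is_uuid(value: object) -> bool:
    """Return True if value is a string in canonical hyphenated UUID form."""
    if not isinstance(value, str):
        return False
    try:
        # uuid.UUID also accepts braces and unhyphenated hex, so compare
        # against the canonical rendering to accept only the standard form.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def check_no_uuids(values: Iterable[object], column: str) -> None:
    """Raise ValueError if any value in the column looks like a UUID."""
    offending = [value for value in values if is_uuid(value)]
    if offending:
        raise ValueError(
            f"Column {column!r} contains {len(offending)} UUID value(s); "
            "update the mappings to use row hashes instead."
        )


# Shipper-provided TMS IDs pass the check.
check_no_uuids(["TMS-001", "TMS-002"], "lane_id")
```

Rejecting random UUIDs at this stage makes sense because they change on every load, whereas a row hash stays the same for identical input rows.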

leaf_engine.etl.uuid._set_or_get_uuid(value: str | tuple, type: str, id_map: Dict[Hashable, str]) → str
Parameters:
  • value (Union[str, tuple]) –

  • type (str) –

  • id_map (Dict[Hashable, str]) –

Return type:

str
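This function is undocumented, so the following is only a guess at the general "look up, else generate" pattern its name and signature suggest. The sketch omits the `type` parameter, whose role is not described, and it assumes that newly generated IDs are stored back into `id_map` so that repeated values resolve to the same ID.

```python
import uuid
from typing import Dict, Hashable, Union


def set_or_get_uuid(value: Union[str, tuple], id_map: Dict[Hashable, str]) -> str:
    """Return the known ID for value, or generate and remember a new UUID."""
    existing = id_map.get(value)
    if existing is not None:
        return existing
    new_id = str(uuid.uuid4())
    id_map[value] = new_id  # assumed: cache so repeated values share one ID
    return new_id


# A lane keyed by (origin, destination) that already exists in the API records.
id_map = {("loc-a", "loc-b"): "lane-1"}
known = set_or_get_uuid(("loc-a", "loc-b"), id_map)

# An unseen key gets a fresh UUID, which is then reused on later lookups.
fresh = set_or_get_uuid(("loc-a", "loc-c"), id_map)
```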

leaf_engine.etl.uuid._set_point_location_uuids(df: pandas.DataFrame, api_points_df: pandas.DataFrame) → pandas.DataFrame

Sets point location ID for points where it is not already set.

Parameters:
  • df (pandas.DataFrame) –

  • api_points_df (pandas.DataFrame) –

Return type:

pandas.DataFrame

leaf_engine.etl.uuid._set_power_lane_uuids(df: pandas.DataFrame, api_lanes_df: pandas.DataFrame) → pandas.DataFrame
Parameters:
  • df (pandas.DataFrame) –

  • api_lanes_df (pandas.DataFrame) –

Return type:

pandas.DataFrame

leaf_engine.etl.uuid._set_ptp_lane_uuids(df: pandas.DataFrame, api_lanes_df: pandas.DataFrame) → pandas.DataFrame
Parameters:
  • df (pandas.DataFrame) –

  • api_lanes_df (pandas.DataFrame) –

Return type:

pandas.DataFrame

leaf_engine.etl.uuid._set_routing_uuids(df: pandas.DataFrame, api_lanes_df: pandas.DataFrame) → pandas.DataFrame

Sets routing route/lane IDs for power lanes where available (based on power_lane_id), and generates ULIDs for routing routes/lanes where IDs are not available.

Parameters:
  • df (pd.DataFrame) – Shipments DataFrame.

  • api_lanes_df (pd.DataFrame) – Lanes DataFrame from API.

Returns:

Shipments DataFrame with routing route/lane IDs set.

Return type:

pd.DataFrame

leaf_engine.etl.uuid._set_shipment_uuids(df: pandas.DataFrame) → pandas.DataFrame

Sets internal shipment ID to row hash values.

There is an ID associated with each shipment in the shipment_id column. This ID can be set in the mapping process to the shipper-provided TMS ID, but is sometimes set to a random UUID. If shipment_id values are set to UUIDs, a ValueError is raised indicating that the mappings should be updated to use row hashing instead of UUIDs where unique IDs are needed.

Only the internal_shipment_id column set below is used when loading shipments into the DB.

Parameters:

df (pd.DataFrame) – Shipments DF.

Raises:
  • ValueError – Raised when the shipment_id column contains UUIDs. These UUIDs should be replaced with row hashes at the mapping stage.

Returns:

Shipments DF with internal_shipment_id column.

Return type:

pd.DataFrame

leaf_engine.etl.uuid.compute_row_hash(df, column_name, hash_columns)
leaf_engine.etl.uuid.uuid_pipeline(df, set_shipment_uuid=True, set_lane_uuid=False)
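
Neither compute_row_hash nor uuid_pipeline is documented here. As a hedged illustration of what row hashing over a set of columns can look like, the sketch below writes a SHA-256 digest of the selected columns into a new column. The function name `row_hash_sketch`, the choice of SHA-256, the `|` separator, and the example column names are all assumptions; the module's actual hashing scheme is not stated.

```python
import hashlib
from typing import List

import pandas as pd


def row_hash_sketch(
    df: pd.DataFrame, column_name: str, hash_columns: List[str]
) -> pd.DataFrame:
    """Return a copy of df with column_name set to a digest of hash_columns."""

    def _hash(row: pd.Series) -> str:
        # Join the stringified values with a separator before hashing.
        joined = "|".join(str(row[col]) for col in hash_columns)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    out = df.copy()
    out[column_name] = out.apply(_hash, axis=1)
    return out


# Hypothetical shipments: rows 0 and 2 are identical, so they hash identically.
shipments = pd.DataFrame(
    {
        "shipment_id": ["S1", "S2", "S1"],
        "pickup_date": ["2024-01-01", "2024-01-02", "2024-01-01"],
    }
)
hashed = row_hash_sketch(
    shipments, "internal_shipment_id", ["shipment_id", "pickup_date"]
)
```

Unlike a random UUID, a hash of the row's contents is deterministic, so reloading the same data produces the same internal IDs.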