leaf_engine.etl.uuid
====================

.. py:module:: leaf_engine.etl.uuid

.. autoapi-nested-parse::

   Entity UUID generation pipeline.
   ================================

   This module generates UUIDs for all entities in the shipment data: locations,
   lanes, and shipments. These UUIDs are used when inserting entities into the
   analytics database and should be preserved throughout our data processing
   pipelines so that we can cross-reference entities across any data store. For
   instance, links between the analytics DB and the platform DB will make use of
   the UUIDs set here.

   The Location entity can be of two types:

   - POINT location
   - CLUSTER location

   A CLUSTER location is a collection of one or more POINT locations. A POINT
   location has a CLUSTER as its parent (i.e., it has a parent ID referencing a
   CLUSTER location).

   The Lane entity can be of two types:

   - PTP lane
   - POWER lane

   A POWER lane is a collection of one or more PTP lanes. A PTP lane has a POWER
   lane as its parent (i.e., it has a parent ID referencing a POWER lane).

   UUIDs are set by comparing the shipper-provided data with existing data in
   the analytics DB. For instance, POINT locations are compared using their
   geometries. If a shipper's location is geocoded to the same geometry as a
   location in the DB (i.e., a previously loaded location for the same shipper),
   its UUID is set to the DB ID of the matching location. CLUSTER locations are
   handled in the same way.

   Lane IDs are set by comparing origin-destination location IDs. If a lane
   already has both origin and destination UUIDs, its ID is looked up in the API
   lane records for that shipper. If the lane does not have both origin and
   destination UUIDs, its ID is generated here. Similarly, power lane IDs are
   set by checking cluster IDs.


Functions
---------

.. autoapisummary::

   leaf_engine.etl.uuid._get_lane_id_map
   leaf_engine.etl.uuid._get_location_id_map
   leaf_engine.etl.uuid._set_lane_uuids
   leaf_engine.etl.uuid._set_or_get_uuid
   leaf_engine.etl.uuid._set_point_location_uuids
   leaf_engine.etl.uuid._set_power_lane_uuids
   leaf_engine.etl.uuid._set_ptp_lane_uuids
   leaf_engine.etl.uuid._set_routing_uuids
   leaf_engine.etl.uuid._set_shipment_uuids
   leaf_engine.etl.uuid.compute_row_hash
   leaf_engine.etl.uuid.uuid_pipeline


Module Contents
---------------

.. py:function:: _get_lane_id_map(api_lanes_df: pandas.DataFrame) -> Dict[str, str]

   Returns a mapping from API origin-destination location IDs to API lane IDs.

   Each key in the returned dictionary is a tuple of the form
   `(origin_location_id, destination_location_id)`; the corresponding value is
   the `id` (API ID) of the lane.


.. py:function:: _get_location_id_map(api_locations_df: pandas.DataFrame) -> Dict[str, str]

   Returns a mapping from API location geometry strings to API location IDs.


.. py:function:: _set_lane_uuids(df: pandas.DataFrame) -> pandas.DataFrame

   Sets internal lane ID to row hash values.

   There is an ID associated with each lane in the `id` column. This ID can be
   set in the mapping process to the shipper-provided TMS ID, but is sometimes
   set to a random UUID. If `lane_id` values are set to UUIDs, raises a
   ValueError indicating that the mappings should be updated to use row hashing
   instead of UUIDs where unique IDs are needed.

   Only the `internal_lane_id` column set below is used when loading lanes into
   the DB.

   :param df: Lanes DF.
   :type df: pd.DataFrame

   :raises ValueError: Raised when the `lane_id` column contains UUIDs. These
       UUIDs should be replaced with row hashes at the mapping stage.

   :returns: Lanes DF with `internal_lane_id` column.
   :rtype: pd.DataFrame


.. py:function:: _set_or_get_uuid(value: Union[str, tuple], type: str, id_map: Dict[Hashable, str]) -> str

.. py:function:: _set_point_location_uuids(df: pandas.DataFrame, api_points_df: pandas.DataFrame) -> pandas.DataFrame

   Sets point location ID for points where it is not already set.


.. py:function:: _set_power_lane_uuids(df: pandas.DataFrame, api_lanes_df: pandas.DataFrame) -> pandas.DataFrame

.. py:function:: _set_ptp_lane_uuids(df: pandas.DataFrame, api_lanes_df: pandas.DataFrame) -> pandas.DataFrame

.. py:function:: _set_routing_uuids(df: pandas.DataFrame, api_lanes_df: pandas.DataFrame) -> pandas.DataFrame

   Sets routing route/lane IDs for power lanes where available (based on
   power_lane_id), and generates ULIDs for routing routes/lanes where IDs are
   not available.

   :param df: Shipments DataFrame.
   :type df: pd.DataFrame
   :param api_lanes_df: Lanes DataFrame from API.
   :type api_lanes_df: pd.DataFrame

   :returns: Shipments DataFrame with routing route/lane IDs set.
   :rtype: pd.DataFrame


.. py:function:: _set_shipment_uuids(df: pandas.DataFrame) -> pandas.DataFrame

   Sets internal shipment ID to row hash values.

   There is an ID associated with each shipment in the `shipment_id` column.
   This ID can be set in the mapping process to the shipper-provided TMS ID, but
   is sometimes set to a random UUID. If `shipment_id` values are set to UUIDs,
   raises a ValueError indicating that the mappings should be updated to use row
   hashing instead of UUIDs where unique IDs are needed.

   Only the `internal_shipment_id` column set below is used when loading
   shipments into the DB.

   :param df: Shipments DF.
   :type df: pd.DataFrame

   :raises ValueError: Raised when the `shipment_id` column contains UUIDs.
       These UUIDs should be replaced with row hashes at the mapping stage.

   :returns: Shipments DF with `internal_shipment_id` column.
   :rtype: pd.DataFrame


.. py:function:: compute_row_hash(df, column_name, hash_columns)

.. py:function:: uuid_pipeline(df, set_shipment_uuid=True, set_lane_uuid=False)
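
The "look up an existing ID, otherwise generate one" behaviour described in the module overview can be illustrated with a minimal sketch. This is not the module's implementation: ``get_or_create_id``, the sample geometry strings, and the sample IDs are all hypothetical, and the sketch assumes that newly generated IDs are random UUIDs stored back into the map so that repeated keys resolve to the same ID.

```python
import uuid
from typing import Dict, Hashable


def get_or_create_id(key: Hashable, id_map: Dict[Hashable, str]) -> str:
    """Return the ID already recorded for ``key``, or generate and record one.

    Hypothetical helper for illustration only; the real ``_set_or_get_uuid``
    may differ in signature and behaviour.
    """
    if key not in id_map:
        # Assumption: unseen entities receive a fresh random UUID.
        id_map[key] = str(uuid.uuid4())
    return id_map[key]


# Hypothetical existing records: locations keyed by geometry string,
# lanes keyed by an (origin_location_id, destination_location_id) tuple.
location_ids = {"POINT (-87.63 41.88)": "loc-chicago"}
lane_ids = {("loc-chicago", "loc-dallas"): "lane-chi-dal"}

# A geometry that is already known resolves to its existing ID.
origin = get_or_create_id("POINT (-87.63 41.88)", location_ids)

# An unseen geometry gets a newly generated ID.
destination = get_or_create_id("POINT (-96.80 32.78)", location_ids)

# The lane is keyed by the two location IDs, so an unseen pair gets a new
# lane ID, and looking the same pair up again returns that same ID.
lane = get_or_create_id((origin, destination), lane_ids)
repeat = get_or_create_id((origin, destination), lane_ids)
```

Keying lanes by the pair of location IDs rather than by raw shipper fields means a lane can only match an existing record once both of its endpoints have been matched, which mirrors the ordering described above: locations first, then lanes.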