Address Transformers

Extract, validate, and standardize address components from unstructured text.

Usage


| address                                       | street_number   | street_name   | city        | state   | zip   |
| 123 Main St,   New York, NY 10001             | 123             | Main          | New York    | NY      | 10001 |
| 456 oak ave apt 5b, los angeles, ca 90001     | 456             | Oak           | Los Angeles | CA      | 90001 |
| 789 ELM STREET CHICAGO IL  60601              | 789             | Elm           | Chicago     | IL      | 60601 |
| 321 pine rd. suite 100,, boston massachusetts | 321             | Pine          | Boston      | MA      | null  |
| PO Box 789, Atlanta, GA 30301                 | null            | null          | Atlanta     | GA      | 30301 |

Installation

datacompose add addresses

API Reference

Extract Functions

addresses.extract_street_number

Extract street/house number from address. Extracts the numeric portion at the beginning of an address. Handles various formats: 123, 123A, 123-125, etc.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |
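To make the accepted formats concrete, here is a minimal plain-Python sketch of the matching rule described above. It is an illustration only, not the library's implementation: the name `extract_street_number_sketch`, the regular expression, and the choice to return `None` when no leading number is found are all assumptions.

```python
import re
from typing import Optional

# Leading digits, optionally a hyphenated range (123-125) and/or one
# trailing letter (123A). Hypothetical pattern, not taken from the library.
_STREET_NUMBER = re.compile(r"^\s*(\d+(?:-\d+)?[A-Za-z]?)\b")


def extract_street_number_sketch(address: Optional[str]) -> Optional[str]:
    """Return the number at the start of an address, or None if absent."""
    if not address:
        return None
    match = _STREET_NUMBER.match(address)
    return match.group(1) if match else None
```

An address that starts with text rather than digits, such as a PO Box line, yields `None` in this sketch.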

addresses.extract_street_prefix

Extract directional prefix from street address. Extracts directional prefixes like N, S, E, W, NE, NW, SE, SW.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_street_name

Extract street name from address. Extracts the main street name, excluding number, prefix, and suffix.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_street_suffix

Extract street type/suffix from address. Extracts street type like Street, Avenue, Road, Boulevard, etc.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_full_street

Extract complete street address (number + prefix + name + suffix). Extracts everything before apartment/suite and city/state/zip.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_apartment_number

Extract apartment/unit number from address. Extracts apartment, suite, unit, or room numbers including: Apt 5B, Suite 200, Unit 12, #4A, Rm 101, etc.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |
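The unit formats listed above (Apt 5B, Suite 200, Unit 12, #4A, Rm 101) can be approximated with one regular expression. The following plain-Python sketch is an assumption about how such matching might work, not the library's code; `extract_apartment_number_sketch`, the keyword list, and the `None` return for no match are hypothetical.

```python
import re
from typing import Optional

# A unit keyword (optionally followed by "." and "#"), or a bare "#",
# followed by the unit identifier. Hypothetical pattern for illustration.
_UNIT = re.compile(
    r"(?:\b(?:apartment|apt|suite|ste|unit|room|rm)\b\.?\s*#?\s*|#\s*)"
    r"([A-Za-z0-9-]+)",
    re.IGNORECASE,
)


def extract_apartment_number_sketch(address: Optional[str]) -> Optional[str]:
    """Return the identifier that follows a unit keyword or '#', if any."""
    if not address:
        return None
    match = _UNIT.search(address)
    return match.group(1) if match else None
```

The sketch returns the identifier exactly as written in the input and does not change its case.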

addresses.extract_floor

Extract floor number from address. Extracts floor information like: 5th Floor, Floor 2, Fl 3, Level 4, etc.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_building

Extract building name or identifier from address. Extracts building information like: Building A, Tower 2, Complex B, Block C, etc.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_unit_type

Extract the type of unit (Apt, Suite, Unit, etc.) from address.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_secondary_address

Extract complete secondary address information (unit type + number). Combines unit type and number into standard format: "Apt 5B", "Ste 200", "Unit 12", etc.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.extract_zip_code

Extract US ZIP code (5-digit or ZIP+4 format) from text. Returns empty string for null/invalid inputs.
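As a rough illustration of the behavior described above, the following plain-Python sketch finds a 5-digit or ZIP+4 code and returns an empty string for null or unmatched input. It is not the library's implementation: the name `extract_zip_code_sketch`, the pattern, and the decision to prefer the last match in the text (so that a 5-digit house number is not mistaken for the ZIP) are assumptions.

```python
import re
from typing import Optional

# Five digits, optionally followed by "-" and four more digits.
_ZIP = re.compile(r"\b[0-9]{5}(?:-[0-9]{4})?\b")


def extract_zip_code_sketch(text: Optional[str]) -> str:
    """Return the last ZIP-like token in the text, or '' if there is none."""
    if not text:
        return ""
    matches = _ZIP.findall(text)
    return matches[-1] if matches else ""
```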

addresses.extract_city

Extract city name from US address text. Extracts city by finding text before state abbreviation or ZIP code. Handles various formats including comma-separated and multi-word cities.

Parameters

| Property                 | Type   | Description                                                       |
| col (required)           | Column | Column containing address text                                    |
| custom_cities (optional) | list   | Optional list of custom city names to recognize (case-insensitive) |

addresses.extract_state

Extract and standardize state to 2-letter abbreviation. Handles both full state names and abbreviations, case-insensitive. Returns standardized 2-letter uppercase abbreviation.

Parameters

| Property                 | Type   | Description                                            |
| col (required)           | Column | Column containing address text with state information  |
| custom_states (optional) | dict   | Optional dict mapping full state names to abbreviations |
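The sketch below shows one way the lookup described above could work in plain Python, including how a `custom_states` dict might extend the built-in table. It is illustrative only: `extract_state_sketch`, the deliberately tiny `_STATE_NAMES` table, the abbreviation-first search order, and the empty-string return for no match are assumptions, not the library's behavior.

```python
import re
from typing import Dict, Optional

# Tiny illustrative subset; a real table would cover every state and territory.
_STATE_NAMES = {
    "new york": "NY",
    "california": "CA",
    "illinois": "IL",
    "massachusetts": "MA",
    "georgia": "GA",
}


def extract_state_sketch(
    address: Optional[str], custom_states: Optional[Dict[str, str]] = None
) -> str:
    """Return a 2-letter uppercase state code found in the address, or ''."""
    if not address:
        return ""
    names = dict(_STATE_NAMES)
    if custom_states:
        names.update({k.lower(): v.upper() for k, v in custom_states.items()})
    abbreviations = set(names.values())

    # First look for a known abbreviation, scanning from the end of the text.
    for token in reversed(re.findall(r"\b[A-Za-z]{2}\b", address)):
        if token.upper() in abbreviations:
            return token.upper()

    # Otherwise look for a full state name, longest names first.
    lowered = address.lower()
    for name in sorted(names, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            return names[name]
    return ""
```

Passing `custom_states={"Ontario": "ON"}` makes both "Ontario" and "ON" recognizable in this sketch.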

addresses.extract_country

Extract country from address. Extracts country names from addresses, handling common variations and abbreviations. Returns standardized country name.

Parameters

| Property       | Type   | Description                                         |
| col (required) | Column | Column containing address text with potential country |

addresses.extract_po_box

Extract PO Box number from address. Extracts PO Box, P.O. Box, POB, Post Office Box numbers. Handles various formats including with/without periods and spaces.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |
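The PO Box variants listed above (PO Box, P.O. Box, POB, Post Office Box) can be covered by a single alternation. This plain-Python sketch is a guess at the general approach rather than the library's code; `extract_po_box_sketch`, the pattern, and the `None` return when nothing matches are assumptions.

```python
import re
from typing import Optional

# "PO Box" with optional periods and spaces, "Post Office Box", or "POB",
# followed by the box identifier. Hypothetical pattern for illustration.
_PO_BOX = re.compile(
    r"\b(?:P\.?\s*O\.?\s*Box|Post\s+Office\s+Box|POB\b)\s*#?\s*(\w+)",
    re.IGNORECASE,
)


def extract_po_box_sketch(address: Optional[str]) -> Optional[str]:
    """Return the PO Box identifier, or None if the address has no PO Box."""
    if not address:
        return None
    match = _PO_BOX.search(address)
    return match.group(1) if match else None
```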

addresses.extract_private_mailbox

Extract private mailbox (PMB) number from address. Extracts PMB or Private Mail Box numbers, commonly used with commercial mail receiving agencies (like UPS Store).

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

Transform Functions

addresses.standardize_street_prefix

Standardize street directional prefixes to abbreviated form. Converts all variations to standard USPS abbreviations: North/N/N. → N, South/S/S. → S, etc.

Parameters

| Property                   | Type   | Description                                                |
| col (required)             | Column | Column containing street prefix                            |
| custom_mappings (optional) | dict   | Optional dict of custom prefix mappings (case-insensitive) |

addresses.standardize_street_suffix

Standardize street type/suffix to USPS abbreviated form. Converts all variations to standard USPS abbreviations per the config: Street/St/St. → St, Avenue/Ave/Av → Ave, Boulevard → Blvd, etc.

Parameters

| Property                   | Type   | Description                                                |
| col (required)             | Column | Column containing street suffix                            |
| custom_mappings (optional) | dict   | Optional dict of custom suffix mappings (case-insensitive) |
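The mapping behavior described above, including case-insensitive custom overrides, can be sketched in plain Python as a dictionary lookup. This is an illustration under assumptions: `standardize_street_suffix_sketch`, the small `_SUFFIXES` table, and the choice to pass unknown suffixes through unchanged are hypothetical and may differ from the library.

```python
from typing import Dict, Optional

# Small illustrative subset of suffix variations (keys are lowercase).
_SUFFIXES = {
    "street": "St", "st": "St",
    "avenue": "Ave", "ave": "Ave", "av": "Ave",
    "boulevard": "Blvd", "blvd": "Blvd",
    "road": "Rd", "rd": "Rd",
}


def standardize_street_suffix_sketch(
    suffix: Optional[str], custom_mappings: Optional[Dict[str, str]] = None
) -> str:
    """Map a suffix variation to its abbreviated form; custom mappings win."""
    if not suffix:
        return ""
    cleaned = suffix.strip()
    key = cleaned.rstrip(".").lower()
    if custom_mappings:
        lookup = {k.lower(): v for k, v in custom_mappings.items()}
        if key in lookup:
            return lookup[key]
    # Unknown suffixes are returned as given (an assumption of this sketch).
    return _SUFFIXES.get(key, cleaned)
```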

addresses.standardize_unit_type

Standardize unit type to common abbreviations. Converts all variations to standard abbreviations: Apartment/Apt. → Apt, Suite → Ste, Room → Rm, etc.

Parameters

| Property                   | Type   | Description                               |
| col (required)             | Column | Column containing unit type               |
| custom_mappings (optional) | dict   | Optional dict of custom unit type mappings |

addresses.standardize_zip_code

Standardize ZIP code format.

- Removes extra spaces
- Ensures proper dash placement for ZIP+4
- Returns empty string for invalid formats

Parameters

| Property       | Type   | Description                                |
| col (required) | Column | Column containing ZIP codes to standardize |
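The three steps above can be illustrated with a short plain-Python sketch. It is not the library's implementation; `standardize_zip_code_sketch` is a hypothetical name, and treating nine bare digits as a ZIP+4 that is missing its dash is an assumption.

```python
import re
from typing import Optional


def standardize_zip_code_sketch(zip_code: Optional[str]) -> str:
    """Return 'NNNNN' or 'NNNNN-NNNN', or '' when the input is not a ZIP."""
    if not zip_code:
        return ""
    # Drop whitespace and dashes, then rebuild the canonical form.
    digits = re.sub(r"[\s-]", "", zip_code)
    if re.fullmatch(r"[0-9]{5}", digits):
        return digits
    if re.fullmatch(r"[0-9]{9}", digits):
        return f"{digits[:5]}-{digits[5:]}"
    return ""
```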

addresses.standardize_city

Standardize city name formatting.

- Trims whitespace
- Normalizes internal spacing
- Applies title case (with special handling for common patterns)
- Optionally applies custom city name mappings

Parameters

| Property                   | Type   | Description                                            |
| col (required)             | Column | Column containing city names to standardize            |
| custom_mappings (optional) | dict   | Optional dict for city name corrections/standardization |
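A minimal plain-Python sketch of these steps follows. It is illustrative only: `standardize_city_sketch` is a hypothetical name, custom mappings are matched case-insensitively by assumption, and the "special handling for common patterns" mentioned above is not reproduced because the source does not specify it.

```python
from typing import Dict, Optional


def standardize_city_sketch(
    city: Optional[str], custom_mappings: Optional[Dict[str, str]] = None
) -> str:
    """Trim, collapse internal whitespace, apply custom mappings, title-case."""
    if not city:
        return ""
    # split() with no arguments trims the ends and collapses runs of spaces.
    collapsed = " ".join(city.split())
    if custom_mappings:
        lookup = {k.lower(): v for k, v in custom_mappings.items()}
        if collapsed.lower() in lookup:
            return lookup[collapsed.lower()]
    return collapsed.title()
```

For example, `standardize_city_sketch("NYC", {"nyc": "New York"})` returns the mapped name instead of title-casing the input.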

addresses.standardize_state

Convert state to standard 2-letter format. Converts full names to abbreviations and ensures uppercase.

Parameters

| Property       | Type   | Description                                   |
| col (required) | Column | Column containing state names or abbreviations |

addresses.standardize_country

Standardize country name to consistent format. Converts various country representations to standard names.

Parameters

| Property                   | Type   | Description                                   |
| col (required)             | Column | Column containing country name or abbreviation |
| custom_mappings (optional) | dict   | Optional dict of custom country mappings      |

addresses.standardize_po_box

Standardize PO Box format to consistent representation. Converts various PO Box formats to standard "PO Box XXXX" format.

Parameters

| Property       | Type   | Description                   |
| col (required) | Column | Column containing PO Box text |

Validation Functions

addresses.has_apartment

Check if address contains apartment/unit information.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.validate_zip_code

Validate if a ZIP code is in correct US format. Validates:

- 5-digit format (e.g., "12345")
- ZIP+4 format (e.g., "12345-6789")
- Not all zeros (except "00000" which is technically valid)
- Within valid range (00001-99999 for base ZIP)

Parameters

| Property       | Type   | Description                             |
| col (required) | Column | Column containing ZIP codes to validate |
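The format checks above can be sketched in plain Python as follows. This is not the library's implementation. Because the description leaves the treatment of "00000" open to interpretation, the sketch exposes it as a hypothetical `allow_all_zeros` flag; the function name and the default of rejecting "00000" are assumptions.

```python
import re
from typing import Optional

# Exactly five digits, optionally followed by "-" and four digits.
_ZIP_FORMAT = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def validate_zip_code_sketch(
    zip_code: Optional[str], allow_all_zeros: bool = False
) -> bool:
    """Return True for a well-formed 5-digit or ZIP+4 code."""
    if not zip_code:
        return False
    candidate = zip_code.strip()
    if not _ZIP_FORMAT.fullmatch(candidate):
        return False
    # The base ZIP is the first five characters of a well-formed code.
    return allow_all_zeros or candidate[:5] != "00000"
```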

addresses.is_valid_zip_code

Alias for validate_zip_code for consistency.

Parameters

| Property       | Type   | Description                             |
| col (required) | Column | Column containing ZIP codes to validate |

addresses.validate_city

Validate if a city name appears valid. Validates:

- Not empty/null
- Within reasonable length bounds
- Contains valid characters (letters, spaces, hyphens, apostrophes, periods)
- Optionally: matches a list of known cities

Parameters

| Property                | Type   | Description                                       |
| col (required)          | Column | Column containing city names to validate          |
| known_cities (optional) | list   | Optional list of valid city names to check against |
| min_length (required)   | int    | Minimum valid city name length                    |
| max_length (required)   | int    | Maximum valid city name length                    |
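The checks above combine naturally into one predicate. The plain-Python sketch below is an illustration, not the library's code: `validate_city_sketch`, the default bounds of 2 and 50, the ASCII-only character class, and the case-insensitive comparison against `known_cities` are all assumptions.

```python
import re
from typing import Iterable, Optional

# Letters, spaces, hyphens, apostrophes, and periods (ASCII only here).
_CITY_CHARS = re.compile(r"[A-Za-z .'\-]+")


def validate_city_sketch(
    city: Optional[str],
    known_cities: Optional[Iterable[str]] = None,
    min_length: int = 2,   # hypothetical default
    max_length: int = 50,  # hypothetical default
) -> bool:
    """Return True if the city name passes length, character, and list checks."""
    if city is None:
        return False
    name = city.strip()
    if not (min_length <= len(name) <= max_length):
        return False
    if not _CITY_CHARS.fullmatch(name):
        return False
    if known_cities is not None:
        return name.lower() in {c.lower() for c in known_cities}
    return True
```

Names with non-ASCII letters are rejected by this sketch; a fuller version would need a wider character class.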

addresses.validate_state

Validate if state code is a valid US state abbreviation. Checks against list of valid US state abbreviations including territories.

Parameters

| Property       | Type   | Description                               |
| col (required) | Column | Column containing state codes to validate |

addresses.has_country

Check if address contains country information.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.has_po_box

Check if address contains PO Box.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.is_po_box_only

Check if address is ONLY a PO Box (no street address).

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

Utility Functions

addresses.remove_secondary_address

Remove apartment/suite/unit information from address. Removes secondary address components to get clean street address.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.get_zip_code_type

Determine the type of ZIP code.

Parameters

| Property       | Type   | Description                 |
| col (required) | Column | Column containing ZIP codes |

addresses.split_zip_code

Split ZIP+4 code into base and extension components.

Parameters

| Property       | Type   | Description                 |
| col (required) | Column | Column containing ZIP codes |
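Splitting a ZIP+4 code amounts to separating the text on its dash. The plain-Python sketch below illustrates the idea; `split_zip_code_sketch` is a hypothetical name, and returning a `(base, extension)` tuple with an empty extension for plain 5-digit codes is an assumption, since the source does not state the return shape.

```python
from typing import Optional, Tuple


def split_zip_code_sketch(zip_code: Optional[str]) -> Tuple[str, str]:
    """Split 'NNNNN-NNNN' into (base, extension); extension may be ''."""
    if not zip_code:
        return ("", "")
    # partition() returns the text before and after the first dash.
    base, _, extension = zip_code.strip().partition("-")
    return (base, extension)
```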

addresses.get_state_name

Convert state abbreviation to full name.

Parameters

| Property       | Type   | Description                                  |
| col (required) | Column | Column containing 2-letter state abbreviations |

addresses.remove_country

Remove country from address. Removes country information from the end of addresses.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |

addresses.remove_po_box

Remove PO Box from address. Removes PO Box information while preserving other address components.

Parameters

| Property       | Type   | Description                    |
| col (required) | Column | Column containing address text |