Datetimes Transformers
Transform and validate datetimes data.
Usage
| datetime_text | standardized | date_only | time_only | year | month | quarter |
| 2024-01-15T14:30:00Z | 2024-01-15 14:30:00 | 2024-01-15 | 14:30:00 | 2024 | 1 | 1 |
| 01/15/2024 2:30 PM | 2024-01-15 14:30:00 | 2024-01-15 | 14:30:00 | 2024 | 1 | 1 |
| January 15, 2024 | 2024-01-15 00:00:00 | 2024-01-15 | 00:00:00 | 2024 | 1 | 1 |
| 15-Jan-2024 | 2024-01-15 00:00:00 | 2024-01-15 | 00:00:00 | 2024 | 1 | 1 |
| Q3 2024 | 2024-07-01 | 2024-07-01 | null | 2024 | 7 | 3 |
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from transformers.pyspark.datetimes import datetimes
# Initialize Spark
spark = SparkSession.builder.appName("DataCleaning").getOrCreate()
# Create sample data
data = [
("Meeting scheduled for 2024-01-15T14:30:00Z",),
("Invoice date: 01/15/2024",),
("Event on January 15, 2024 at 10:00 AM",),
("Report due 15-Jan-2024",),
]
df = spark.createDataFrame(data, ["text"])
# Extract and standardize datetime components
result_df = df.select(
F.col("text"),
datetimes.extract_datetime_from_text(F.col("text")).alias("extracted"),
datetimes.standardize_iso(F.col("text")).alias("standardized"),
datetimes.standardize_date(F.col("text")).alias("date_only"),
datetimes.extract_year(F.col("text")).alias("year"),
datetimes.extract_month(F.col("text")).alias("month"),
)
# Show results
result_df.show(truncate=False)
# Filter to valid dates
valid_dates = result_df.filter(datetimes.is_valid_date(F.col("text")))Installation
datacompose add datetimes API Reference
Extract Functions
datetimes.extract_datetime_from_text
Extract first datetime mention from free text. Supports ISO, US, EU formats, named months, natural language, and more.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing text with potential datetime |
datetimes.extract_year
Extract year from datetime string.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.extract_month
Extract month from datetime string (1-12).
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.extract_day
Extract day from datetime string (1-31).
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.extract_quarter
Extract quarter from datetime string (1-4).
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.extract_week_of_year
Extract week number from datetime string (1-53).
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.extract_day_of_week
Extract day of week from datetime string. Returns string name (Monday, Tuesday, etc).
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
Transform Functions
datetimes.standardize_iso
Convert datetime strings to ISO 8601 format (YYYY-MM-DD HH:MM:SS). Attempts to parse common formats and standardize them.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.standardize_date
Extract and standardize just the date portion to YYYY-MM-DD format.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.standardize_time
Extract and standardize just the time portion to HH:MM:SS format.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
Validation Functions
datetimes.is_valid_date
Check if a string represents a valid date. Validates month (1-12), day (valid for month), and handles leap years.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text to validate |
datetimes.is_valid_datetimes
Check if a string represents a valid datetime. More comprehensive than is_valid_date, includes time validation.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text to validate |
datetimes.is_business_day
Check if a date falls on a business day (Monday-Friday).
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text |
datetimes.is_future_date
Check if a date is in the future.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text |
reference_date optional | Column | Reference date column |
datetimes.is_past_date
Check if a date is in the past.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text |
reference_date optional | Column | Reference date column |
Utility Functions
datetimes.detect_format
Detect the datetime format of a string. Returns format string that can be used with to_timestamp().
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.parse_flexible
Parse datetime strings with multiple possible formats. Handles ISO, US, EU, named months, quarters, fiscal years, and more.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.parse_natural_language
Parse natural language date expressions.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing natural language date text |
reference_date optional | Column | Reference date for relative calculations |
datetimes.remove_timezone
Remove timezone information, keeping local time.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing timezone-aware datetime text |
datetimes.format_date
Format date according to specified pattern. Uses Java SimpleDateFormat patterns.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date/datetime text |
format required | Column | Format pattern string |
datetimes.to_unix_timestamp
Convert datetime to Unix timestamp (seconds since epoch).
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing datetime text |
datetimes.from_unix_timestamp
Convert Unix timestamp to datetime string.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing Unix timestamp |
datetimes.start_of_month
Get the first day of the month for a given date.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text |
datetimes.end_of_month
Get the last day of the month for a given date. Handles leap years correctly.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text |
datetimes.start_of_quarter
Get the first day of the quarter for a given date.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text |
datetimes.end_of_quarter
Get the last day of the quarter for a given date.
Parameters
| Property | Type | Description |
|---|---|---|
col required | Column | Column containing date text |
datetimes.format_duration
Format duration in seconds to human-readable string.
Parameters
| Property | Type | Description |
|---|---|---|
seconds_col required | Column | Column containing duration in seconds |