CLI Reference
The DataCompose CLI provides commands to initialize projects, add transformers, and manage your data transformation pipeline.
Installation
pip install datacompose
Commands Overview
Initialize a Project
datacompose init [--yes] [--force]
Creates a datacompose.json configuration file with default settings.
Options:
--yes, -y: Auto-accept all defaults
--force: Overwrite existing configuration file
Add Transformers
datacompose add <transformer> [--output OUTPUT] [--verbose] [--force]
Generates production-ready transformation code for the specified transformer.
Arguments:
<transformer>: Name of the transformer to add (e.g., emails, addresses, phone_numbers)
Options:
--output, -o: Output directory (default: ./transformers/pyspark)
--verbose, -v: Enable verbose output
--force: Overwrite existing files
Examples:
# Add email transformers
datacompose add emails
# Add address transformers to custom directory
datacompose add addresses --output ./custom/path
# Add phone transformers with verbose output
datacompose add phone_numbers --verbose
# Force overwrite existing transformers
datacompose add emails --force
List Available Resources
datacompose list transformers
datacompose list generators
Displays the available transformers and code generators.
Show Version
datacompose --version
Get Help
datacompose --help
datacompose <command> --help
Configuration File
The datacompose.json file controls DataCompose behavior:
{
  "version": "1.0.0",
  "targets": {
    "pyspark": {
      "output": "./transformers/pyspark",
      "generator": "SparkPandasUDFGenerator"
    }
  },
  "templates": {
    "directory": "src/transformers/templates"
  }
}
Configuration Options
- version: DataCompose configuration version
- targets: Platform-specific settings, keyed by target name (e.g., pyspark)
  - output: Where to generate code
  - generator: Which code generator to use
- templates: Custom template settings
  - directory: Path to custom templates
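To make the nesting of these options concrete, here is a minimal Python sketch that reads a file shaped like the example above and looks up the output directory for one target. The helper name target_output is hypothetical and not part of DataCompose; the sketch uses only the standard library and assumes the file follows the structure shown.

```python
import json
import tempfile
from pathlib import Path

# Sample configuration mirroring the documented example.
SAMPLE_CONFIG = {
    "version": "1.0.0",
    "targets": {
        "pyspark": {
            "output": "./transformers/pyspark",
            "generator": "SparkPandasUDFGenerator",
        }
    },
    "templates": {"directory": "src/transformers/templates"},
}


def target_output(config_path, target):
    """Return the output directory configured for `target` (hypothetical helper)."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config["targets"][target]["output"]


def demo():
    # Write the sample to a temporary datacompose.json, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "datacompose.json"
        path.write_text(json.dumps(SAMPLE_CONFIG, indent=2), encoding="utf-8")
        return target_output(path, "pyspark")


if __name__ == "__main__":
    print(demo())
```

The lookup raises KeyError if the target is missing, which is usually the behavior you want in a build script: a misspelled target name fails loudly instead of silently falling back.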
Project Structure
After running datacompose add, your project will have:
project/
├── datacompose.json             # Configuration file
├── transformers/
│   └── pyspark/
│       ├── emails.py            # Email transformation primitives
│       ├── addresses.py         # Address transformation primitives
│       ├── phone_numbers.py     # Phone number transformation primitives
│       └── utils.py             # Core framework and PrimitiveRegistry
Update Strategies
When updating transformers, you have several options:
- Regenerate: Use datacompose add --force to overwrite existing files
- Merge: Generate to a temporary location and manually merge changes
- Extend: Create wrapper functions that call generated code
- Fork: Copy and rename for complete independence
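As an illustration of the Extend strategy, the sketch below wraps a generated function instead of editing it, so regenerating the file later does not discard project-specific behavior. The function generated_clean is only a placeholder: the actual functions exposed by the generated modules are not described in this reference, so in a real project you would import one from your generated file instead.

```python
from typing import Optional


def generated_clean(value: str) -> str:
    # Placeholder standing in for a function from a generated module;
    # the real generated API is not described in this reference.
    return value.strip().lower()


def clean_or_default(value: Optional[str], default: str = "") -> str:
    """Project-specific wrapper: handle missing values, then delegate."""
    if value is None:
        return default
    return generated_clean(value)


if __name__ == "__main__":
    print(clean_or_default("  User@Example.COM "))
    print(repr(clean_or_default(None)))
```

Because the wrapper lives in your own module, a later regeneration with --force only touches the generated file, and the wrapper keeps working as long as the wrapped function keeps its name and signature.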
Best Practice: Always use version control and review changes before merging.
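One way to review changes before merging is to list which files differ between a backup and the regenerated output. The sketch below uses Python's standard filecmp module; the helper name changed_files and the directory names in the demo are illustrative assumptions, not part of DataCompose.

```python
import filecmp
import tempfile
from pathlib import Path


def changed_files(backup_dir, current_dir):
    """Return sorted relative paths that differ or exist on only one side."""
    changed = []

    def walk(cmp, prefix):
        # diff_files: present in both directories but with different content.
        # left_only / right_only: present in just one of the two directories.
        for name in cmp.diff_files + cmp.left_only + cmp.right_only:
            changed.append(str(prefix / name))
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(backup_dir, current_dir), Path(""))
    return sorted(changed)


def demo():
    # Build a tiny backup/current pair so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "pyspark.backup"
        current = Path(tmp) / "pyspark"
        backup.mkdir()
        current.mkdir()
        (backup / "emails.py").write_text("# old version\n")
        (current / "emails.py").write_text("# regenerated version, longer\n")
        (backup / "utils.py").write_text("# unchanged\n")
        (current / "utils.py").write_text("# unchanged\n")
        return changed_files(backup, current)


if __name__ == "__main__":
    print(demo())
```

Note that filecmp.dircmp compares files shallowly by default, treating files with the same type, size, and modification time as equal. For a line-by-line view of what changed, version control or diff -r remains the better tool.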
Environment Variables
DataCompose respects the following environment variables:
DATACOMPOSE_CONFIG: Path to configuration file (default: ./datacompose.json)
DATACOMPOSE_OUTPUT: Default output directory
DATACOMPOSE_VERBOSE: Enable verbose output by default
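If you wrap DataCompose in your own build script, you may want to read the same variables. The following is a minimal sketch of doing so in Python. How DataCompose itself parses these values is not specified here, so the set of values treated as "verbose enabled" is an assumption, and resolve_settings is a hypothetical helper.

```python
import os


def resolve_settings(env=None):
    """Read DataCompose-related variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    verbose_raw = env.get("DATACOMPOSE_VERBOSE", "")
    return {
        # Documented default: ./datacompose.json
        "config": env.get("DATACOMPOSE_CONFIG", "./datacompose.json"),
        # No fallback is applied in this sketch, so None means "not set".
        "output": env.get("DATACOMPOSE_OUTPUT"),
        # Assumption: "1", "true", and "yes" (case-insensitive) mean enabled.
        "verbose": verbose_raw.strip().lower() in {"1", "true", "yes"},
    }


if __name__ == "__main__":
    print(resolve_settings({"DATACOMPOSE_VERBOSE": "1"}))
```

Passing the environment as a parameter keeps the function easy to test, since a plain dictionary can stand in for os.environ.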
Common Workflows
Starting a New Project
# Initialize DataCompose
datacompose init --yes
# Or force overwrite existing config
datacompose init --force
# Add all common transformers
datacompose add emails
datacompose add addresses
datacompose add phone_numbers
Updating Existing Transformers
# Backup existing code
cp -r transformers/pyspark transformers/pyspark.backup
# Regenerate transformers
datacompose add emails --force
# Compare changes
diff -r transformers/pyspark.backup transformers/pyspark
Custom Output Locations
# Generate to specific locations
datacompose add emails --output src/transformers/email
datacompose add addresses --output src/transformers/address
datacompose add phone_numbers --output src/transformers/phone
Troubleshooting
Command Not Found
If datacompose is not found after installation:
# Check if it's in your PATH
which datacompose
# Or run directly with Python
python -m datacompose init
Permission Errors
If you encounter permission errors:
# Install in user space
pip install --user datacompose
# Or use a virtual environment
python -m venv venv
source venv/bin/activate
pip install datacompose
Configuration Issues
To reset configuration:
# Remove existing config
rm datacompose.json
# Reinitialize
datacompose init
Next Steps
- Getting Started Guide - Learn the basics
- Transformers Documentation - Explore available transformers
- API Reference - Detailed API documentation