Versioning and Deprecation Policy
Zalando is committed to maintaining stability for your integrations while continuously improving our shared datasets. This document outlines how we handle dataset changes and what you can expect when updates are released.
Understanding Dataset Changes
When datasets are updated, changes fall into two categories:
Non-Breaking Updates
These updates are applied directly to your current dataset and do not require you to change your integration. Non-breaking updates do not create a new versioned table, so the reference to the shared object remains the same. Non-breaking updates include:
- New Columns: Additional metrics or dimensions are added.
- Documentation Improvements: Column descriptions and documentation are enhanced for clarity.
- Historical Data Expansion: Data retention is extended with efficient filtering capabilities.
For non-breaking changes, we document the updates in the changelog on the dataset page so that you stay informed. You are responsible for ensuring that your data processing pipelines can handle non-breaking changes. For more details, please see the best practices.
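One way to absorb new columns is to select the columns your pipeline depends on explicitly, instead of relying on column position or selecting every column. The following is a minimal Python sketch, assuming rows arrive as dictionaries; the column names in `EXPECTED_COLUMNS` and the function `project_rows` are hypothetical and only illustrate the idea.

```python
from typing import Any, Iterable, Iterator, Mapping

# Hypothetical columns that this example pipeline depends on.
EXPECTED_COLUMNS = ("product_id", "date", "impressions")


def project_rows(
    rows: Iterable[Mapping[str, Any]],
    expected: tuple[str, ...] = EXPECTED_COLUMNS,
) -> Iterator[dict[str, Any]]:
    """Yield only the expected columns from each row.

    Extra columns (for example, ones added by a non-breaking update) are
    ignored. A missing expected column raises KeyError, which points to a
    breaking change such as a removed or renamed column.
    """
    for row in rows:
        missing = [column for column in expected if column not in row]
        if missing:
            raise KeyError(f"missing expected columns: {missing}")
        yield {column: row[column] for column in expected}
```

For example, a row that also contains a newly added `new_metric` column is reduced to `product_id`, `date`, and `impressions`, so downstream steps see the same shape as before the update.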
Breaking Changes
These updates require you to update your integration and migrate to a new dataset version with a different table reference (see Version Naming below). Breaking changes include:
- Schema Changes: Columns are removed, renamed, or their data types are changed.
- Calculation Logic Changes: KPI formulas or calculation methods are significantly modified, affecting reporting results.
- Structural Changes: The dataset structure changes (for example, moving from a single table to multiple tables, or changing table/share names).
- Partition Changes: Table partition columns are added, modified, or removed.
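Some of these categories can be checked mechanically by comparing two schemas. The sketch below is a hypothetical helper, assuming each schema is given as a mapping from column name to type name (the Spark-style type strings are illustrative). It flags removed, renamed, and retyped columns and partition changes as breaking, while added columns alone are not. Calculation logic changes and structural changes cannot be detected from the schema and are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    retyped: list[str] = field(default_factory=list)
    partitions_changed: bool = False

    @property
    def is_breaking(self) -> bool:
        # Removed (or renamed) columns, type changes and partition changes
        # are breaking; new columns on their own are not.
        return bool(self.removed or self.retyped or self.partitions_changed)


def diff_schemas(
    old: dict[str, str],
    new: dict[str, str],
    old_partitions: tuple[str, ...] = (),
    new_partitions: tuple[str, ...] = (),
) -> SchemaDiff:
    """Compare two column->type mappings (hypothetical helper)."""
    # A rename appears as one removed column plus one added column.
    return SchemaDiff(
        added=sorted(set(new) - set(old)),
        removed=sorted(set(old) - set(new)),
        retyped=sorted(c for c in set(old) & set(new) if old[c] != new[c]),
        partitions_changed=tuple(old_partitions) != tuple(new_partitions),
    )
```

For instance, adding a `clicks` column yields a diff with `is_breaking == False`, while changing `impressions` from `bigint` to `double` is reported as breaking.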
How We Manage Versioning
We follow semantic versioning for all dataset releases. Version numbers are structured as MAJOR.MINOR.PATCH:
- MAJOR: Incremented when breaking changes are introduced. This results in a new dataset table reference with an updated version suffix (e.g., _v2, _v3). Major changes are visible in the Databricks share and table descriptions.
- MINOR: Incremented when non-breaking changes are added (e.g., new columns, improved documentation). These changes do not affect the table or share name and are applied directly to your current dataset.
- PATCH: Incremented for non-breaking fixes and updates to existing data. Patch releases may happen regularly, and we do not publish a changelog entry for every patch update.
For information about minor and patch changes, refer to the dataset documentation and changelog on the dataset page.
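Because only a MAJOR increment changes the table reference, a client can decide whether a migration is needed by comparing the major components alone. This is a minimal Python sketch; `DatasetVersion` and `requires_migration` are hypothetical names, and version strings are assumed to be plain `MAJOR.MINOR.PATCH` without pre-release tags.

```python
from typing import NamedTuple


class DatasetVersion(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "DatasetVersion":
        # Assumes exactly three dot-separated integers, e.g. "2.1.0".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def requires_migration(current: str, latest: str) -> bool:
    """True if moving from `current` to `latest` crosses a MAJOR version,
    i.e. a breaking change that comes with a new table reference."""
    return DatasetVersion.parse(latest).major > DatasetVersion.parse(current).major
```

Parsing into integers matters: as tuples, `1.10.0` correctly compares greater than `1.9.3`, which a plain string comparison would get wrong. Moving from `1.4.2` to `1.5.0` or `1.4.3` needs no migration, while `2.0.0` does.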
When a breaking change is necessary, we release it as a new version of the dataset alongside your current version, giving you time to prepare.
Migration Timeline
- Parallel Availability: The new version is published alongside the previous version, giving you time to update your pipelines and test the new structure. The old version will remain available for two months after the new version is released, unless otherwise specified in our communications.
- Communication: You will be notified in advance of any breaking changes and migration timelines.
Version Naming
When breaking changes are introduced, datasets use a standardized naming convention to make versions easy to identify:
- Format: <share_object_name>.<schema_name>.<table_name{_vN}>
- Version Numbers: The version suffix _vN (where N is the version number) is appended to the table name for versions higher than 1.
- Example: product_perf_metrics_config_daily_share.direct_data_sharing.product_perf_metrics_config_daily_v2
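If your pipelines build table references from configuration, the naming rule can be expressed in a few lines. This Python sketch is illustrative: `table_reference` is a hypothetical helper, it assumes you store the base table name without any suffix, and it treats N as the major version, following the dotted form of the example above.

```python
def table_reference(share: str, schema: str, table: str, major_version: int = 1) -> str:
    """Build share.schema.table, appending _vN to the table for versions above 1."""
    if major_version < 1:
        raise ValueError("major_version must be >= 1")
    # Version 1 carries no suffix; later major versions get _v2, _v3, ...
    suffix = f"_v{major_version}" if major_version > 1 else ""
    return f"{share}.{schema}.{table}{suffix}"
```

Calling it with the share, schema, and base table from the example and `major_version=2` reproduces the example reference, while `major_version=1` returns the unsuffixed table name.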
Dataset Deprecation
If a dataset version is deprecated without a replacement, the deprecated dataset will remain available for four months after the deprecation announcement, unless otherwise specified in our communications.
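To plan ahead, you can turn the default availability windows (two months after a replacement is released, four months after a deprecation announcement without replacement) into concrete dates. The Python sketch below is illustrative only: it assumes the windows are counted in calendar months and clamps the day to the end of a shorter target month. That interpretation is an assumption, and any dates stated in our communications take precedence. The function names are hypothetical.

```python
import calendar
from datetime import date

# Default windows from this policy; communications may specify otherwise.
PARALLEL_AVAILABILITY_MONTHS = 2  # old version after its replacement is released
DEPRECATION_MONTHS = 4            # deprecated version without a replacement


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def availability_end(start: date, replaced: bool) -> date:
    """Estimate when an old or deprecated version stops being available.

    `start` is the release date of the new version (replaced=True) or the
    deprecation announcement date (replaced=False).
    """
    months = PARALLEL_AVAILABILITY_MONTHS if replaced else DEPRECATION_MONTHS
    return add_months(start, months)
```

For example, a new version released on 15 January 2024 would leave the old version available until about 15 March 2024, and a deprecation announced on 31 October 2024 would run until 28 February 2025 under this clamping rule.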
Contact Support