DataHub Releases

Summary

| Version | Release Date | Links |
| --- | --- | --- |
| v0.12.0 | 2023-10-25 | Release Notes, View on GitHub |
| v0.11.0 | 2023-09-08 | Release Notes, View on GitHub |
| v0.10.5 | 2023-08-02 | Release Notes, View on GitHub |
| v0.10.4 | 2023-06-09 | View on GitHub |
| v0.10.3 | 2023-05-25 | View on GitHub |
| v0.10.2 | 2023-04-13 | View on GitHub |
| v0.10.1 | 2023-03-23 | View on GitHub |
| v0.10.0 | 2023-02-07 | View on GitHub |
| v0.9.6.1 | 2023-01-31 | View on GitHub |
| v0.9.6 | 2023-01-13 | View on GitHub |
| v0.9.5 | 2022-12-23 | View on GitHub |
| v0.9.4 | 2022-12-20 | View on GitHub |
| v0.9.3 | 2022-11-30 | View on GitHub |
| v0.9.2 | 2022-11-04 | View on GitHub |
| v0.9.1 | 2022-10-31 | View on GitHub |
| v0.9.0 | 2022-10-11 | View on GitHub |
| v0.8.45 | 2022-09-23 | View on GitHub |
| v0.8.44 | 2022-09-01 | View on GitHub |
| v0.8.43 | 2022-08-09 | View on GitHub |
| v0.8.42 | 2022-08-03 | View on GitHub |
| v0.8.41 | 2022-07-15 | View on GitHub |
| v0.8.40 | 2022-06-30 | View on GitHub |
| v0.8.39 | 2022-06-24 | View on GitHub |
| v0.8.38 | 2022-06-09 | View on GitHub |
| v0.8.37 | 2022-06-09 | View on GitHub |
| v0.8.36 | 2022-06-02 | View on GitHub |
| v0.8.35 | 2022-05-18 | View on GitHub |
| v0.8.34 | 2022-05-04 | View on GitHub |
| v0.8.33 | 2022-04-15 | View on GitHub |
| v0.8.32 | 2022-04-04 | View on GitHub |

v0.12.0

Released on 2023-10-25 by @pedro93.

v0.12.0 Release Highlights

User Experience

Nested Domains

Nested Domains are here! This provides flexibility in organizing your entities within Domains to match the unique organizational structure of your company. <img width="1209" alt="CleanShot 2023-10-27 at 14 30 43@2x" src="https://github.com/datahub-project/datahub/assets/15873986/07e6754c-95cd-4552-8120-50bb2d3fa9ce">

DataHub Chrome Extension Improvements

The Acryl DataHub Chrome extension now supports PowerBI! This is a super powerful way for your business users to gain DataHub-specific insights directly in the BI tools they use most. Additionally, we now support making edits back to DataHub Entities directly from the Chrome extension.

Access Management Tab for Datasets

Shoutout to @Ramendra761 from the PayPal Team for contributing a new Access Management tab in Dataset Entity pages! The aim of this feature is to enable users to view the required roles for accessing the Dataset, as defined by Roles and/or Policies in the organization’s Access Management System. It also introduces the ability to request access directly from the page. <img width="912" alt="CleanShot 2023-10-27 at 14 09 51@2x" src="https://github.com/datahub-project/datahub/assets/15873986/29d7bdda-864f-4cf8-bd7a-5be46413bba8">

Metadata Ingestion

Miscellaneous Improvements

  • Sampling-Based Profiling: You can now configure sampling-based profiling to address query performance concerns in Snowflake and BigQuery
  • Kafka Connect > Snowflake: We now support automatically defining lineage between the two platforms
  • Athena: Support for complex and nested schemas

Column-Level Lineage

We are incubating CLL support for the following:

  • Airflow plugin v2 now supports automatic extraction of CLL for certain operators, removing the need to annotate DAGs
  • dbt
  • Redshift
  • PowerBI (support for Column-Level Lineage for M-Query)

Incubating Sources

  • MLflow
  • Teradata
  • Unity Catalog Notebooks
  • DynamoDB

Developer Experience

  • Data Contracts: v0.12.0 introduces underlying models and CLI; UI support to follow
  • We now support creating custom models without requiring a fork of the main DataHub project
  • Updates to support OpenSearch 2.x and alternate Postgres db in postgres-setup

Other Notable Changes

  • Session token configuration has changed: all previously created session tokens will be invalid and users will be prompted to log in. Expiration time has also been shortened, which may result in more login prompts with the default settings. There should be no other interruption due to this change.

Breaking Changes

Find full details here

  • #9044 - GraphQL APIs for adding ownership now expect either an ownershipTypeUrn referencing a custom ownership type or a (deprecated) type. Adding ownership without a concrete type was previously allowed but no longer is. For simplicity, you can use the type parameter, which is translated internally to a custom ownership type if one exists for the type being added.
  • #9010 - In the Redshift source's config, incremental_lineage now defaults to off.
  • #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
  • #8942 - Removed urn:li:corpuser:datahub owner for the Measure, Dimension and Temporal tags emitted by Looker and LookML source connectors.
  • #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
  • #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include pip install 'acryl-datahub-airflow-plugin[plugin-v2]'. To continue using the v1 plugin, set the DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN environment variable to true.
  • #8943 - The Unity Catalog ingestion source has a new option include_metastore, which will cause all urns to be changed when disabled. This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future. If stateful ingestion is enabled, simply setting include_metastore: false will perform all required cleanup. Otherwise, we recommend soft deleting all databricks data via the DataHub CLI: datahub delete --platform databricks --soft and then reingesting with include_metastore: false.
  • #8846 - Changed enum values in resource filters used by policies: RESOURCE_TYPE became TYPE and RESOURCE_URN became URN. Any existing policies using these filters (i.e. defined for particular urns or types such as dataset) need to be upgraded manually, for example by retrieving their respective dataHubPolicyInfo aspect and changing the part that uses the filter, i.e. changing

    "resources": {
      "filter": {
        "criteria": [
          {
            "field": "RESOURCE_TYPE",
            "condition": "EQUALS",
            "values": [
              "dataset"
            ]
          }
        ]
      }
    }

into

    "resources": {
      "filter": {
        "criteria": [
          {
            "field": "TYPE",
            "condition": "EQUALS",
            "values": [
              "dataset"
            ]
          }
        ]
      }
    }

and writing the aspect back, for example using the datahub put command. Policies can also be removed and re-created via the UI.
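
The manual rewrite described above can be scripted. The following is a minimal sketch, assuming the policy's dataHubPolicyInfo aspect has already been retrieved and parsed into a Python dict. The helper name migrate_policy_filter is hypothetical and not part of DataHub, and writing the updated aspect back is left out.

```python
import copy

# Old enum value -> new enum value, as described in #8846.
FIELD_RENAMES = {"RESOURCE_TYPE": "TYPE", "RESOURCE_URN": "URN"}


def migrate_policy_filter(policy_info: dict) -> dict:
    """Return a copy of a dataHubPolicyInfo dict with renamed filter fields.

    Hypothetical helper (not a DataHub API): it only rewrites the "field"
    of each criterion under resources.filter.criteria.
    """
    migrated = copy.deepcopy(policy_info)
    resources = migrated.get("resources") or {}
    resource_filter = resources.get("filter") or {}
    for criterion in resource_filter.get("criteria") or []:
        old_field = criterion.get("field")
        if old_field in FIELD_RENAMES:
            criterion["field"] = FIELD_RENAMES[old_field]
    return migrated
```

The function returns a modified copy rather than editing in place, so the original aspect can be kept for comparison before anything is written back.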

  • #9077 - The BigQuery ingestion source by default sets match_fully_qualified_names: true. This means that any dataset_pattern or schema_pattern specified will be matched on the fully qualified dataset name, i.e. <project_name>.<dataset_name>. We attempt to support the old pattern format by prepending .*\\. to dataset patterns lacking a period, so in most cases this should not cause any issues. However, if you have a complex dataset pattern, we recommend you manually convert it to the fully qualified format to avoid any potential issues.
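
To make the BigQuery pattern change concrete, here is a rough sketch of the compatibility rule as the note describes it: a dataset pattern with no period gets a prefix that matches any project name. The function name to_fully_qualified_pattern is hypothetical, and the connector's actual logic may handle more cases than this literal reading.

```python
import re


def to_fully_qualified_pattern(pattern: str) -> str:
    """Prefix a dataset pattern that contains no period.

    Assumption: "lacking a period" is taken literally, so any pattern that
    already contains "." (even as a regex wildcard) is returned unchanged.
    """
    if "." in pattern:
        return pattern
    return r".*\." + pattern


# An old-style pattern now matches the fully qualified name.
converted = to_fully_qualified_pattern("sales")
matches = re.match(converted, "my-project.sales") is not None
```

Under this reading, a pattern such as sales.* already contains a period (as a wildcard) and is left untouched, which is one reason complex patterns are better converted by hand.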

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.11.0...v0.12.0

v0.11.0

Released on 2023-09-08 by @iprentic.

Release Highlights

Potential Downtime

This release introduces substantial improvements to search ranking which require reindexing indices.

During the reindexing:

  • a system-update job will set indices to read-only and create a backup/clone of each index
  • new components will be prevented from start-up until the reindex completes
  • Helm deployments will go into read-only mode and new ingestion runs will fail

This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, please expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion run to re-run any that did not complete.
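
As a worked version of the estimate above, the following sketch turns the quoted rate of 1 hour per 2.3 million entities into a small calculation. The function name is hypothetical, and the result is only the rough linear estimate from these notes, not a measured figure.

```python
ENTITIES_PER_HOUR = 2_300_000  # rough rate quoted in the release notes


def estimate_reindex_hours(entity_count: int) -> float:
    """Estimate reindex duration in hours, assuming linear scaling."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    return entity_count / ENTITIES_PER_HOUR


# 4.6 million entities works out to about 2 hours.
hours = estimate_reindex_hours(4_600_000)
```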

User Experience

New Search and Browse Experience

We have some really exciting improvements to the DataHub user experience in this release! The new search and browse experience, which was first made available in the previous release behind a feature flag, is now on by default. Check out our release notes for v0.10.5 to get more information and documentation on this new Browse experience.

<div> <a href="https://www.loom.com/share/10a5de90e7084e98b3a84fa1dc83a825"> <p> Learn all about the new Search and Browse experience! </p> </a> <a href="https://www.loom.com/share/10a5de90e7084e98b3a84fa1dc83a825"> <img style="max-width:300px;" src="https://cdn.loom.com/sessions/thumbnails/10a5de90e7084e98b3a84fa1dc83a825-with-play.gif"> </a> </div>

In addition to the ranking changes mentioned above, this release changes how matched search results are highlighted, so you can see why they match your query. You can also sort your results alphabetically or by last-updated time, in addition to relevance. We now also suggest a correction if your query contains a typo.

<div> <a href="https://www.loom.com/share/97abf74703d04457b96da3fed041089d"> <p>See the Search improvements in action!</p> </a> <a href="https://www.loom.com/share/97abf74703d04457b96da3fed041089d"> <img style="max-width:300px;" src="https://cdn.loom.com/sessions/thumbnails/97abf74703d04457b96da3fed041089d-1693606777695-with-play.gif"> </a> </div>

Manage Home Page Posts

In this release we now enable you to create and delete pinned announcements on your DataHub homepage! If you have the “Manage Home Page Posts” platform privilege you’ll see a new section in settings called “Home Page Posts” where you can create and delete text posts and link posts that your users see on the home page.

OpenAPI Endpoints Expanded

The OpenAPI entity and aspect endpoints have been expanded to improve the developer experience when using this API. Additional aspects will be added in the near future.

Metadata ingestion

Added support for the Confluent S3 Sink Connector, for extracting stored procedures and jobs from MSSQL, and for Snowflake shares. Additionally, the SQL parsing source now converts query logs into CLL and usage.

Developer Experience

The CLI now supports recursive deletes.

Versioned documentation

Starting from this release, we support versioned documentation on the DataHub docs site! Select the version you’re on and browse the docs for that specific version.

Performance Improvements

  • Batching of default aspects on initial ingestion (SQL)
  • Improvements to multi-threading. Ingestion recipes that were previously reduced to 1 thread can be restored to the 15-thread default.
  • Gradle 7 upgrade moderately improves build speed
  • DataHub Ingestion slim images reduced in size by 2GB+

Important Bug Fixes

  • Glue Schema Registry fixed

Deprecation Notice

  • MAE Events are no longer produced. MAE events have been deprecated for over a year.

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.5...v0.11.0

v0.10.5

Released on 2023-08-02 by @david-leifker.

Release Highlights

NEW: Unified Search and Browse Experience

It’s here, it’s here! We are incredibly excited to roll out our re-designed, streamlined Search and Browse experience. End-users now have a one-stop-shop to search for specific data entities and browse across systems, making it easier than ever to find the most relevant and meaningful resources within DataHub.

Check out the screenshot below and get a full walk-through in this video!

<img width="1041" alt="CleanShot 2023-08-03 at 14 47 55@2x" src="https://github.com/datahub-project/datahub/assets/15873986/2f47d033-6c2b-483a-951d-e6d6b807f0d0">

User Experience

  • Column-Level Lineage (CLL) visualization update: you can now visualize CLL relationships through DataJobs (i.e. Airflow DAGs)
  • Unique Glossary Terms: We now prevent creating duplicate Glossary Term names within a Term Group
  • Domains: You can now configure the Documentation tab to be the default landing page within a Domain
  • Formatting updates to Row Count to make large numbers more human readable (e.g. 3283337 is shown as 3.2M)
  • Stats Tab: Y-axis scale now dynamically set to reflect the minimum & maximum values, improving readability
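
The Row Count formatting mentioned above can be illustrated with a short sketch. This is not the DataHub UI code: the function name abbreviate_count is hypothetical, and truncating (rather than rounding) to one decimal place is an assumption chosen because it reproduces the 3.2M example in these notes.

```python
def abbreviate_count(value: int) -> str:
    """Shorten a non-negative count, truncating to one decimal place."""
    units = (("B", 1_000_000_000), ("M", 1_000_000), ("K", 1_000))
    for suffix, unit in units:
        if value >= unit:
            whole = value // unit
            # Integer arithmetic avoids floating-point rounding surprises.
            tenths = (value % unit) * 10 // unit
            return f"{whole}.{tenths}{suffix}"
    return str(value)


label = abbreviate_count(3283337)
```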

Metadata ingestion

Ingestion Enhancements:

  • BigQuery: Set platform_instance using project_id
  • PowerBI: Ingest datasets not used in visualizations (tiles/pages)
  • Kafka Connect: Ability to set platform_instance
  • Nifi: Support for basic auth
  • Presto on Hive: Extract all table properties from Hive Metastore
  • Elasticsearch: Support for basic profiling
  • Add advanced configuration for LDAP manager ingestion

Lineage Improvements:

  • Schema-aware SQL parsing to derive column-level lineage
  • Column-level lineage support for BigQuery, Tableau, and Snowflake View definitions
  • Snowflake: Extract Snowpipe S3 lineage

Developer Experience

  • Fine-grained ownership policies
  • PATCH support for DataJob Inputs/Outputs
  • New endpoints to extract size of time-series indices and truncate/cleanup time-series indices in Elasticsearch; support for bulk-deletes
  • Initial support for exception reporting via Sentry
  • New OpenAPI endpoint to get Task Status
  • SDK: Easily generate container URNs

Docs

  • Improvements to our File-Based Lineage doc, specifically focused on Fine-Grained Lineage config components (link)
  • Code examples of how to manage Posts within DataHub (link)
  • Guide to generating custom browse paths for the new search experience (link)

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.4...v0.10.5

v0.10.4

Released on 2023-06-09 by @pedro93.

View the release notes for v0.10.4 on GitHub.

v0.10.3

Released on 2023-05-25 by @iprentic.

View the release notes for v0.10.3 on GitHub.

DataHub v0.10.2

Released on 2023-04-13 by @iprentic.

View the release notes for DataHub v0.10.2 on GitHub.

DataHub v0.10.1

Released on 2023-03-23 by @aditya-radhakrishnan.

View the release notes for DataHub v0.10.1 on GitHub.

DataHub v0.10.0

Released on 2023-02-07 by @david-leifker.

View the release notes for DataHub v0.10.0 on GitHub.

DataHub v0.9.6.1

Released on 2023-01-31 by @david-leifker.

View the release notes for DataHub v0.9.6.1 on GitHub.

DataHub v0.9.6

Released on 2023-01-13 by @maggiehays.

View the release notes for DataHub v0.9.6 on GitHub.

DataHub v0.9.5

Released on 2022-12-23 by @jjoyce0510.

View the release notes for DataHub v0.9.5 on GitHub.

[Known Issues] DataHub v0.9.4

Released on 2022-12-20 by @maggiehays.

View the release notes for [Known Issues] DataHub v0.9.4 on GitHub.

DataHub v0.9.3

Released on 2022-11-30 by @maggiehays.

View the release notes for DataHub v0.9.3 on GitHub.

DataHub v0.9.2

Released on 2022-11-04 by @maggiehays.

View the release notes for DataHub v0.9.2 on GitHub.

DataHub v0.9.1

Released on 2022-10-31 by @maggiehays.

View the release notes for DataHub v0.9.1 on GitHub.

DataHub v0.9.0

Released on 2022-10-11 by @szalai1.

View the release notes for DataHub v0.9.0 on GitHub.

DataHub v0.8.45

Released on 2022-09-23 by @gabe-lyons.

View the release notes for DataHub v0.8.45 on GitHub.

DataHub v0.8.44

Released on 2022-09-01 by @jjoyce0510.

View the release notes for DataHub v0.8.44 on GitHub.

DataHub v0.8.43

Released on 2022-08-09 by @maggiehays.

View the release notes for DataHub v0.8.43 on GitHub.

v0.8.42

Released on 2022-08-03 by @gabe-lyons.

View the release notes for v0.8.42 on GitHub.

v0.8.41

Released on 2022-07-15 by @anshbansal.

View the release notes for v0.8.41 on GitHub.

v0.8.40

Released on 2022-06-30 by @gabe-lyons.

View the release notes for v0.8.40 on GitHub.

v0.8.39

Released on 2022-06-24 by @maggiehays.

View the release notes for v0.8.39 on GitHub.

[!] DataHub v0.8.38

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.38 on GitHub.

[!] DataHub v0.8.37

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.37 on GitHub.

DataHub V0.8.36

Released on 2022-06-02 by @treff7es.

View the release notes for DataHub V0.8.36 on GitHub.

[!] DataHub v0.8.35

Released on 2022-05-18 by @dexter-mh-lee.

View the release notes for [!] DataHub v0.8.35 on GitHub.

v0.8.34

Released on 2022-05-04 by @maggiehays.

View the release notes for v0.8.34 on GitHub.

DataHub v0.8.33

Released on 2022-04-15 by @dexter-mh-lee.

View the release notes for DataHub v0.8.33 on GitHub.

DataHub v0.8.32

Released on 2022-04-04 by @dexter-mh-lee.

View the release notes for DataHub v0.8.32 on GitHub.