-
Notifications
You must be signed in to change notification settings - Fork 35
Add save_results_in_table function #319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
def save_results_in_table( | ||
self, | ||
quarantine_df: DataFrame | None = None, | ||
output_df: DataFrame | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add overwrite: bool = True
option so that new data can also be appended.
assume_user: bool = True, | ||
): | ||
""" | ||
Save quarantine and output data to the `quarantine_table` and `output_table` mentioned in the installation config file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Save quarantine and output data to the `quarantine_table` and `output_table` mentioned in the installation config file. | |
Save quarantine and output data to the `quarantine_table` and `output_table` defined in the installation config file. |
@@ -618,6 +618,38 @@ def save_checks_in_installation( | |||
) | |||
installation.upload(run_config.checks_file, yaml.safe_dump(checks).encode('utf-8')) | |||
|
|||
def save_results_in_table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update documentation and tool demo accordingly.
@@ -0,0 +1,76 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this to unit tests (tests/unit
folder). I would generally avoid extensive mocking as it makes refactoring later difficult and tests hard to understand. Also please don't relay on private methods for testing.
For integration testing please setup a proper dataframe and make sure the data is actually saved in the tables as defined in the config. We have plenty of test examples where we setup and use a config.
mock_run_config = MagicMock() | ||
mock_run_config.quarantine_table = "quarantine_table" | ||
mock_run_config.output_table = None | ||
mock_engine._load_run_config.return_value = mock_run_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use private methods in tests. It breaks encapsulation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds a new save_results_in_table
method to DQEngine
that writes quarantine and output DataFrames to Delta tables based on the run configuration, and includes integration tests to verify its behavior.
- Introduces
save_results_in_table
inengine.py
with parameters for quarantine and output DataFrames, run config name, product, and install assumption. - Writes DataFrames to Delta tables in overwrite or append mode, guarded by presence of corresponding table names in the config.
- Adds integration tests covering scenarios: quarantine only, output only, both, and neither.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/databricks/labs/dqx/engine.py | Added save_results_in_table method with Delta writes |
tests/integration/test_save_results_in_table.py | Added integration tests for the new method |
Comments suppressed due to low confidence (1)
tests/integration/test_save_results_in_table.py:67
- This test checks config loading but does not verify that no write operations occur on DataFrames. Consider adding assertions that neither
quarantine_df.write
noroutput_df.write
is called when both are None.
def test_save_results_in_table_no_data(mock_engine):
@@ -0,0 +1,76 @@ | |||
import pytest | |||
from unittest.mock import MagicMock, patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The patch
import is never used in this file. Consider removing it to clean up unused imports.
from unittest.mock import MagicMock, patch | |
from unittest.mock import MagicMock |
Copilot uses AI. Check for mistakes.
if quarantine_df is not None and run_config.quarantine_table: | ||
logger.info( | ||
f"Saving quarantine data to {run_config.quarantine_table} table" | ||
) | ||
quarantine_df.write.format("delta").mode("overwrite").saveAsTable(run_config.quarantine_table) | ||
|
||
if output_df is not None and run_config.output_table: | ||
logger.info( | ||
f"Saving output data to {run_config.output_table} table" | ||
) | ||
output_df.write.format("delta").mode("append").saveAsTable(run_config.output_table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The write logic for quarantine and output DataFrames is nearly identical except for mode and table name. Consider extracting a helper function to reduce duplication and improve readability.
if quarantine_df is not None and run_config.quarantine_table: | |
logger.info( | |
f"Saving quarantine data to {run_config.quarantine_table} table" | |
) | |
quarantine_df.write.format("delta").mode("overwrite").saveAsTable(run_config.quarantine_table) | |
if output_df is not None and run_config.output_table: | |
logger.info( | |
f"Saving output data to {run_config.output_table} table" | |
) | |
output_df.write.format("delta").mode("append").saveAsTable(run_config.output_table) | |
def _save_dataframe(self, df: DataFrame, table_name: str, mode: str, description: str): | |
""" | |
Helper method to save a DataFrame to a Delta table. | |
:param df: The DataFrame to save | |
:param table_name: The name of the Delta table | |
:param mode: The write mode (e.g., "overwrite", "append") | |
:param description: Description of the data being saved (for logging purposes) | |
""" | |
logger.info(f"Saving {description} to {table_name} table") | |
df.write.format("delta").mode(mode).saveAsTable(table_name) | |
if quarantine_df is not None and run_config.quarantine_table: | |
self._save_dataframe( | |
df=quarantine_df, | |
table_name=run_config.quarantine_table, | |
mode="overwrite", | |
description="quarantine data" | |
) | |
if output_df is not None and run_config.output_table: | |
self._save_dataframe( | |
df=output_df, | |
table_name=run_config.output_table, | |
mode="append", | |
description="output data" | |
) |
Copilot uses AI. Check for mistakes.
Changes
Provide functionality to write output and quarantine data to tables as defined in the config.
Linked issues
Resolves #167
Tests