Should append sometimes or always create UNION ALL BY NAME? #5165

Open
kgutwin opened this issue Feb 27, 2025 · 0 comments
Labels
needs-discussion Undecided dilemma

kgutwin (Collaborator) commented Feb 27, 2025

What's up?

Currently, a PRQL append transform compiles to UNION ALL:

from tbl_a
append tbl_b

gives

SELECT * FROM tbl_a
UNION ALL
SELECT * FROM tbl_b
-- Generated by PRQL compiler version:0.13.3-39-ge393ab4d (https://prql-lang.org)

However, this results in issues like #4724, #2680, and #3184, where the underlying cause is that UNION ALL is interpreted by the database as "unify by column position" rather than "unify by column name". See the DuckDB docs:

Traditional set operations unify queries by column position, and require the to-be-combined queries to have the same number of input columns. If the columns are not of the same type, casts may be added. The result will use the column names from the first query.
DuckDB also supports UNION [ALL] BY NAME, which joins columns by name instead of by position. UNION BY NAME does not require the inputs to have the same number of columns. NULL values will be added in case of missing columns.
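To make the positional behavior concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for "a database with traditional UNION ALL"; the column layouts and row values are made up for illustration. The two tables hold the same columns in a different order, and the positional union pairs them up by position without complaint:

```python
import sqlite3

# Hypothetical tables: same column names, different column order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id INTEGER, name TEXT);
    CREATE TABLE tbl_b (name TEXT, id INTEGER);
    INSERT INTO tbl_a VALUES (1, 'alice');
    INSERT INTO tbl_b VALUES ('bob', 2);
""")

# The SQL that `from tbl_a | append tbl_b` currently compiles to.
cur = conn.execute("SELECT * FROM tbl_a UNION ALL SELECT * FROM tbl_b")
columns = [d[0] for d in cur.description]  # names come from the first query
rows = cur.fetchall()

print(columns)
for row in rows:
    print(row)  # the tbl_b row has 'bob' under id and 2 under name
```

SQLite's loose typing lets the mismatch through silently; a stricter database may instead raise a type error or insert casts, as the DuckDB docs describe.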

Questions:

  1. Should append always behave as UNION ALL BY NAME to simplify semantics from the user perspective, and also make the compiler's job easier? This would resolve all of the linked issues above without needing to dive into compiler details, but would be a breaking change for users expecting traditional UNION ALL behavior.
  2. If the answer to the above is "no", can we add a by:name or by:position argument to append so that users can opt into UNION ALL BY NAME when that makes sense for their use case?
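For dialects without BY NAME, the same result could in principle be emulated with an explicit projection on each side, provided the column names of both inputs are known. The sketch below is only an illustration of that idea, not how the PRQL compiler works: union_all_by_name is a hypothetical helper, the output column order (left columns first, then right-only columns) is an assumption, and identifiers are not quoted or escaped.

```python
import sqlite3

def union_all_by_name(left, left_cols, right, right_cols):
    """Build a positional UNION ALL whose two sides are aligned by column name."""
    # Assumed output order: left's columns, then columns only the right side has.
    out_cols = list(left_cols) + [c for c in right_cols if c not in left_cols]

    def projection(table, cols):
        # Select the column if the table has it, otherwise pad with NULL.
        items = [c if c in cols else f"NULL AS {c}" for c in out_cols]
        return f"SELECT {', '.join(items)} FROM {table}"

    return f"{projection(left, left_cols)} UNION ALL {projection(right, right_cols)}"

# Hypothetical tables: different column order, and tbl_b has an extra column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id INTEGER, name TEXT);
    CREATE TABLE tbl_b (name TEXT, id INTEGER, city TEXT);
    INSERT INTO tbl_a VALUES (1, 'alice');
    INSERT INTO tbl_b VALUES ('bob', 2, 'Oslo');
""")

sql = union_all_by_name("tbl_a", ["id", "name"], "tbl_b", ["name", "id", "city"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

The catch is that this needs the column lists at compile time, which is not available for a bare `from tbl_a` without schema information; a native UNION ALL BY NAME pushes that resolution to the database.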
kgutwin added the needs-discussion (Undecided dilemma) label Feb 27, 2025