Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structure matching syntax #95

Open
wants to merge 44 commits into
base: master
Choose a base branch
from
Open

Conversation

dphblox
Copy link

@dphblox dphblox commented Jan 29, 2025

Gathering consensus on the destructuring syntax inside of the braces (i.e. punting on other issues explicitly), so we can align on this with the community.

Rendered.

@jackdotink
Copy link
Contributor

I think that we should leave the proposed unpack syntax and other similar syntaxes out of this RFC. At the moment, they would not serve much use, and unpack already exists as a global function for that use. I think it will streamline the process of getting this RFC done and accepted or rejected if that is left for a later time.

@dphblox
Copy link
Author

dphblox commented Jan 29, 2025

From my POV, the thing that sunk the previous RFCs were exactly the questions that the unpack syntax and similar features answer. From Andy's prior comments, it's clear that the Luau team need to feel confident that destructuring can actually serve as a strong, intuitive, forwards-compatible language feature before its accepted, so I disagree that less detail is needed here. In fact, we should be proving to satisfaction that we can avoid all of the pitfalls beyond a "just about sufficient" spec.

That's the reason this is not an implementation RFC, by the way. The intention is that we align on overall direction first, and then implement in parts, as needed.

To quote Andy from the previous thread:

One of the difficult things about syntax extensions to programming languages is that reversing course after the fact is essentially impossible. This RFC does an admirable job of navigating constraints passed down to us from Lua and presenting a least-worst solution to this specific problem, but it doesn't handle assignments or array destructuring and there doesn't seem to be any path at all to get there.

We need the bar to be higher than least-worst if we're going to successfully maintain Luau's identity as a small, predictable language.

It's worth noting that this RFC already punts on many of the integration problems we found, such as how destructuring assignments should work (or if they should work, given myself and Hunter don't know of any languages that support this). If we punt on substantially more than we do already, we'll just be back to the old RFC, at which point I don't know what different result we expect to find.

So we need to decide on what our approaches will be for those facets of the feature, hence all of the extra syntax above what we already agreed upon. Whether we decide to support each feature or drop them from the spec, the decision must be documented, not deferred.

@dphblox
Copy link
Author

dphblox commented Jan 29, 2025

Dropping the unpack syntax from the spec in favour of basic indexing.

@rihok
Copy link

rihok commented Jan 30, 2025

Coming into this a bit late I guess, but wanted to just ask some questions. Is there some issue with copying the JavaScript syntax wholesale? It's quite well thought out and tried and tested. For example one question I have is, why is it necessary to have a dot prefix for the destructured keys instead of:

-- keyed table desctructuring
local { foo, bar } = t

-- indexed table destructuring
local a, b, ...rest = t
-- or
local [a, b, ...rest] = t

@dphblox
Copy link
Author

dphblox commented Jan 30, 2025

I don't entirely disagree with that idea personally, but IIRC others shot this down already.

The main trouble comes from the fact that array literals would look the same as keyed table destructuring:

-- this would not work as expected, despite looking visually similar
local { foo, bar } = { "foo", "bar" }

This was considered a showstopper for that syntax in particular.

Beyond that, the square brackets idea has been floated a few times, but each time there's generally some disapproval since the only other time Luau uses square brackets is for indexing; we dont use [] to declare arrays like JS does.

I think JS has a well designed syntax for JS, but in the context of Luau there's multiple arguments against some of its choices.

As for:

local foo, bar, ...rest = t

This would be backwards incompatible, as Luau supports multiple returns. Today, this runs:

local foo, bar, baz = { "hello" }

print(foo, bar, baz) --> { [1] = "hello" } nil nil

@surfbryce
Copy link

surfbryce commented Jan 31, 2025

I very much prefer something along the lines of what @rihok suggested at the very bottom of the code sample, possibly something that outlines similarly to:

local KeyedTable = {
    Hello = true;
    [5] = false;
    [Instance] = 5;
}
local PackedTable = table.pack(1, 2)
local NestedTable = {
    Nested = {
        World = true;
    };
}

local [ ["Hello"] = A, [5] = B, [Instance] = C ] = KeyedTable
local [ [1] = D, [2] = E, ["n"] = F ] = PackedTable
local [ ["Nested"] = [ ["World"] = G ] ] = NestedTable

I assume square brackets here would be an understandable extension of the Luau language given that square brackets are only used for indexing (a point illustrated by @dphblox) and table destructuring is simply an abstraction of that.

A few things to touch on surrounding this:

  1. Most of the time destructuring will be done on string literal indices, so potentially string literals can be written without the index syntax to appear as:
local [ Hello = A, [5] = B, [Instance] = C ] = KeyedTable
  1. There is concern about whether or not the equals sign could indicate assignment. Another alternative is to use colons instead, this would align with the ideology that table types use where we understand it as mapping a property to something else (in the case of table types, another type): so - in this case - we would be mapping a property to a variable name.
local [ Nested: [ World: G ] ] = NestedTable
  1. When used for global variable instantiation we run into the issue where it can be confused for actual indexing by the parser:
local something = someTable
[ SomeIndex: GlobalVariable ] = someTable

This has the same behavior as parentheses do in the language and results in roughly the same error contextually. This shouldn't be seen as a drawback and instead a reinforcement of the languages already well established behavior.

I don't know what the implications are on parser complexity in regards to this or whether or not this fits within the purview/scope of this RFC. But I believe it's a fun and reasonable suggestion purely in terms of syntax.

@dphblox
Copy link
Author

dphblox commented Feb 7, 2025

I assume square brackets here would be an understandable extension of the Luau language given that square brackets are only used for indexing (a point illustrated by @dphblox) and table destructuring is simply an abstraction of that.

I would be somewhat OK with this, but it's a little weird that we would use square brackets both to surround the keys and to surround the whole matcher.

Perhaps it would help with the "array problem":

local [ these, are, keys ] = data
local { these, are, consecutive, values } = data

But this would still be inverse of JS, which could be a stumbling block. IIRC, this is part of what caused the original RFC to fail, hence the dot prefixes. The older RFC has more details on that whole thing.

There is concern about whether or not the equals sign could indicate assignment. Another alternative is to use colons instead

Colons are used for type annotations, so it would be almost certainly inappropriate to use them here.

@dphblox
Copy link
Author

dphblox commented Feb 7, 2025

Here, let me throw out a few crazy syntax ideas, specifically trying to figure out how best to make array/dict difference obvious:

local {in these, are, keys} = data
local {...these, are, consecutive, values} = data
local in {these, are, keys} = data
local ...{these, are, consecutive, values} = data

@dphblox
Copy link
Author

dphblox commented Feb 7, 2025

I think I'll try and write down my full chain of thoughts at the moment around the array and reassignment comments from the old RFC, as well as some syntax comments more generally. Take all this syntax with a pinch of salt - it's just to illustrate.

Unlike JS, Luau allows you to shuttle around multiple values at once, making unpacking the natural paradigm for array destructuring:

local these, are, consecutive, values = table.unpack(data, 1, data.n)

So if we were to suggest any syntax there, it really feels like it ought to be expression-side, not assignment-side:

local these, are, consecutive, values = ...data

Naturally I would be inclined to extend that to keys (thus making this RFC redundant), but that introduces the problem of having to specify the keys twice in almost all situations:

local these, are, keys = ...data["these", "are", "keys"]

That's the grounds upon which I think the original destructuring RFC was made, and probably what we should focus on solving. That also aligns with the discussions we've had above. Trying to make this work for arrays seems misguided and I don't reckon it's even relevant.

Now onto syntax. One of my earlier explorations was having a "destructuring assignment", though I was concerned about it looking too much like a compound assignment, or like it's unpacking values positionally:

local these, are, keys ...= data

In my eyes, the problem there is that throughout the rest of Luau, the names of those identifiers are not significant - only their positions are. Therefore, if the names are going to be significant, it's worthwhile to alter how those names are presented to make that clear. That's what the original RFC did.

I'd say dot prefixes seem like a sensible and low-syntax way to indicate those names are keys.

local .foo, .bar, .baz = data

Of course, without the local indicating this is a new variable declaration, this would be ambiguous syntax. I personally only believe there is one valid use case for that, though, and that would be forward declaration of uninitialised variables:

local thing

local function usesThing()
    task.delay(1, function()
        print(thing)
    end)
end

local otherThings, foo, bar
.thing, .otherThings, .foo, .bar = getThings()

Forward declaration is a little bit of a code smell IMO (though not always!), it feels unintuitive for people to destructure in this position, and JS doesn't support destructuring when assigning - only when declaring - so I would be personally OK dropping this requirement, though I'm not the person who brought it up in the first place.

So all of that is my bias coming into this. I'm not saying we should do any of the above, but that's the general direction I lean in philosophically.

My aim is to try and enmesh that with the previous RFC and the syntax everyone seemed to be OK with at the time.

@dphblox
Copy link
Author

dphblox commented Mar 3, 2025

I cleaned up the proposal - I would consider this to be the apex of what an agreeable 100%-brace-based syntax would look like.

To summarise, here's what's left at this point:

  • Identifiers on the left, fully qualified paths on the right: { foo = [1], bar = [2] }
  • Dot keys can be used instead of string indexes: { foo = .red, bar = .blue }
  • Both kinds of key can be chained: { foo = [1].bar.baz }
  • If a dot key is used as the final index, the name can be inferred: { .foo, .bar[2].baz }

Things that have been explicitly rejected;

  • Positional array/tuple unpacking.
    • This was mentioned as a potential showstopper last RFC, but almost everyone I've talked to since then seems to have changed their mind. It's clear that nobody on the Luau team has the appetite to ever support { foo, bar } even if we did want to do array unpacking, and that alternate syntax ({ unpack foo, bar } / { foo = [1], bar = [2] }) or just standard assignment (local foo, bar = table.unpack(data)) more than suffices for both arrays and tuple-style tables.
  • Nested matchers.
    • As it stands in this proposal, the only way to match nested structure is to chain together keys (.foo.bar.baz). Matchers cannot be put inside of other matchers - this was deemed to be too confusing especially when considering questions around whether those nested matchers would introduce a member to the namespace. This can always be revisited later, but for now there's not enough motivation to include it.

There's some other suggestions in this thread for syntaxes based on brackets (e.g. [ foo, bar ]) that haven't really been discussed in as much depth, so I think that's the next topic to tackle here. If we're not interested in incorporating brackets, I think we're about as far as we'll get with substantial design changes - the rest will be bikeshedding.

@dphblox
Copy link
Author

dphblox commented Mar 3, 2025

All this being said, I'm personally less convinced that we even need everything that's in this RFC as it stands. Given the direction the discussions have drifted towards, combined with the practical examples that've arose, I struggle to see much of the use case beyond simple named destructuring. There's an argument to be made that we should elide much of the complex concerns and stick to something much simpler to retain the spirit of the Luau language.

Much of what's in here is my attempt at serving as many use cases as possible, but if we wanted to get most of the value with less complexity, we could just do something like

local .foo, .bar in data

and accept that we'll never have a full destructuring system, primarily because we'll never need one. Array destructuring has unpack and variadics going for it already, and nested structures are rare - the primary missing use case is for named destructuring, and I feel like we should be careful not to lose sight of that.

@hgoldstein
Copy link

Identifiers on the left, fully qualified paths on the right ...

Check out the conversation about that (. We shouldn't do this because, AFACT, all other languages that implement some sort of pattern matching opt to mirror their construction / destruction syntax.

... the primary missing use case is for named destructuring, and I feel like we should be careful not to lose sight of that.

I chatted about this offline with some folks (mainly @aatxe): destructuring syntax / pattern matching in general is a power user feature. The unfortunate part is that it's also the principled way to do named imports without adding a feature for it, such as JavaScript imports.

I wonder if we should support exactly one syntax to start:

local { a, b, c = foo } = { a = 3, b = 4, c = 5 }
print(a, b, foo) -- prints 3 4 5

No nested syntax, no support for custom indexers, no support for array-like members. As-is I think the local { prefix should be unambiguous, you just need one token of lookahead (which we already have, pretty sure): I don't think taking this as a first step would preclude us from expanding the syntax later. The thing we would lose would be committing to a potentially ambiguous syntax. I've been convinced that the existence of table.pack / table.unpack means that the syntax above won't be confused for an array-like de-structuring, and type checking can hopefully push folks in the right direction.

@dphblox
Copy link
Author

dphblox commented Mar 3, 2025

Identifiers on the left, fully qualified paths on the right ...

We shouldn't do this because, AFACT, all other languages that implement some sort of pattern matching opt to mirror their construction / destruction syntax.

Yeah this is part of the bikeshedding we need to figure out still. I just tried to match the mood of some other comments but don't personally have a strong position on which I'd prefer.

I will say that this feels unclear to me:

local { foo = bar } = getData()

It isn't obvious which side is the identifier and which side is the key. Since this isn't an ambiguous position, a contextual keyword would help clarify this and give a natural "orientation" to the statement. I also think it's more "Luau-like" to use keywords over symbols.

-- for demonstration purposes
local { foo as bar } = getData()
local { foo in bar } = getData()

But the ambiguity could get resolved by symbols too, just more noisily:

-- for demonstration purposes
local { foo -> bar } = getData()
local { foo = .bar } = getData()
local { .foo = bar } = getData()
local { [foo] = bar } = getData()
-- etc...

I think I'll just defer to the better judgement of the team for this one, but my personal preference would be as or in since they naturally give that "orientation" without introducing line noise or potentially easy-to-miss symbols.

@dphblox
Copy link
Author

dphblox commented Mar 3, 2025

I wonder if we should support exactly one syntax to start:

Yeah this is the way I'm leaning at this point too. I might pare down the RFC to literally just this.

I'm happy we spent the time to explore the other options if only so we could figure out why not to pursue them.

@dphblox
Copy link
Author

dphblox commented Mar 26, 2025

Alright, I've whittled down the RFC to just the basic feature as suggested. Happy to integrate further thoughts if desired :)

@deviaze
Copy link

deviaze commented Mar 26, 2025

Nice whittle! Could we also note syntax like

local .foo, .bar in data
local .foo, .bar = data
local foo, bar in data

in the alternatives section?

In drawbacks, maybe note that local { foo, bar, meow } = data syntax might not be as intuitive to users of only/mainly Luau, because it doesn't convey Luau indexing as clearly as alternatives like local .foo, .bar in data do.

As an another alternative, we could generalize .foo, .bar in data into a multiret expression like data.foo, data.bar,
with the syntactic sugar of local .foo, .bar in data being similar sugar to it as local function foo() end is to local function initialization and assignment.

Since local function sugars local assignment without requiring an =, I think a destructuring assignment syntax without = is perfectly reasonable. Any ambiguity in the expression context (nested .x in y exprs) should be resolvable with parens.

For example:

local x = .x in somevector
assert(x == somevector.x)
-- w/ syntax sugar:
local .x in somevector
local .x, .y in somevector

type Profile = {
    ip: string,
    hostname: string,
    last_seen: DateTimeUtc,
}

type DateTimeUtc = {
    unix_timestamp: number,
    format: (self: DateTimeUtc, fmtstring: string) -> string,
}

local profile: Profile

local ip, hostname, timestamp = .ip, .hostname, .last_seen.unix_timestamp in profile

-- w/ sugar
local .ip, .hostname, .timestamp.unix_timestamp as timestamp in profile -- for demonstration purposes; `as` might not be the right choice here `as` usual

@dphblox
Copy link
Author

dphblox commented Mar 28, 2025

I'm not against the idea of a braceless syntax, but I do have one or two concerns with it.

Firstly, in the case of something like React props, is it nice to format long lists of identifiers when there's no block implied by braces?

local
    position,
    size,
    bgColor,
    textColor
    onClick,
    onHover,
    onFocus in props

versus

local {
    position,
    size,
    bgColor,
    textColor
    onClick,
    onHover,
    onFocus
} in props

Secondly is that it can get weird if we use keywords for both renaming and assignment:

local foo, bar in baz in data

versus

local { foo, bar in baz } in data

And thirdly is that I wonder how scalable the pattern is to other places in Luau where matching structure may be desirable, for example the original RFC mentions function arguments:

local function foo(
    foo, bar in baz
)

versus

local function foo(
    { foo, bar } in baz
)

or even, if we allow anonymous parameters:

local function foo(
    foo, bar
)

versus

local function foo(
   { foo, bar }
)

So for the time being I'd rather stick to a brace-based syntax for the benefits of block delineation. Open to discuss whether = or in is appropriate for a destructuring assignment but that semi-escapes the scope of this RFC; we can decide that when it's time to implement, after we have the basic format of structure matching agreed upon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants