API review #7

achingbrain · 2018-11-29T10:14:48Z

Proposal

To change the return type and codify the behaviour of this module.

Current API

const exporter = require('ipfs-unixfs-exporter')
const pull = require('pull-stream')
const through = require('pull-stream/throughs/through')
const onEnd = require('pull-stream/sinks/on-end')

pull(
  exporter('QmFoo.../bar/baz.txt', ipld)
  through(file => {
    console.info(file)

  // {
  //   depth: 2,
  //   name: 'baz.txt',
  //   path: 'QmFoo.../bar/baz.txt',
  //   size: ...
  //   hash: Buffer
  //   content: <Pull stream>
  //   type: 'file'
  // }

  }),
  onEnd(...)
)

Observations

If the user wants any extra information about the node, they must turn the hash into a CID then pass it to an IPLD resolver, but we already had the CID and DAGNodes during the graph traversal so it's just unnecessary work
Using external libraries dictates how the user should structure their code and their choice of dependencies. They came here for the files, just give them the files

Proposal

Emit entries with object versions of CIDs and nodes
Use core JS types, e.g. async iterators

const exporter = require('ipfs-unixfs-exporter')

for await (const entry of exporter('QmFoo.../bar/baz.txt', ipld)) {
  console.info(entry)

  // {
  //   depth: 2,
  //   name: 'baz.txt',
  //   path: 'QmFoo.../bar/baz.txt',
  //   cid: CID
  //   node: DAGNode (cid.codec === 'dag-pb') ||  Object (cid.codec === 'dag-cbor') || Buffer (cid.codec === 'raw')
  //   content: Function // returns an async iterable of file or directory contents
  //   type: 'file' // or 'directory'
  // }

  for await (const content of entry.content()) {
    // content is either a buffer (when entry.type === 'file') or an entry (when entry.type === 'directory')
  }
}

Options

{
  offset: Number,  // If a file is at the end of the path `.content()` will start at this offset
  length: Number, // If a file is at the end of the path `.content()` will only return this many bytes
  fullPath: Boolean, // Emit entries for every path component
  maxDepth: Boolean // Only descend this number of components into the path
}

These would remain the same.

Sharding

Sharded directories are treated like normal directories. That is, they have a type of directory.

Behaviour

When requesting a path: /ipfs/Qmfoo/bar/baz, if baz resolves to a directory, the contents of the directory will be emitted, otherwise baz itself will be emitted.

If maxDepth is specified and matches the zero-indexed number of path components (excluding /ipfs), baz will be emitted even if it is a directory.

If fullPath is specified, each path component encountered will be emitted (excluding /ipfs).

Examples

When `baz` is a directory

exporter('/ipfs/Qmfoo/bar/baz')

// emits the files contained in `baz`

exporter('/ipfs/Qmfoo/bar/baz', {
  fullPath: true
})

// emits `Qmfoo`, `bar`, `baz`, then the files contained in `baz`

exporter('/ipfs/Qmfoo/bar/baz', {
  maxDepth: 2
})

// emits`baz`

exporter('/ipfs/Qmfoo/bar/baz', {
  fullPath: true,
  maxDepth: 2
})

// emits `Qmfoo`, `bar`, `baz`

When `baz` is a file

exporter('/ipfs/Qmfoo/bar/baz')

// emits `baz`

exporter('/ipfs/Qmfoo/bar/baz', {
  fullPath: true
})

// emits `Qmfoo`, `bar`, `baz`

exporter('/ipfs/Qmfoo/bar/baz', {
  maxDepth: 2
})

// emits`baz`

exporter('/ipfs/Qmfoo/bar/baz', {
  fullPath: true,
  maxDepth: 2
})

// emits `Qmfoo`, `bar`, `baz`

The text was updated successfully, but these errors were encountered:

achingbrain · 2018-11-29T10:15:06Z

cc @ipfs/javascript-team

alanshaw · 2018-11-29T12:58:30Z

When entry is a directory, does content() get the files in the directory?

achingbrain · 2018-11-29T13:13:03Z

I haven't come across a situation where I needed that, but I can see how it'd be useful, will add that in.

achingbrain · 2018-11-29T13:16:14Z

offset and length in that case might mean slices of files?

Maybe they should be passed into the content() function?

E.g.

File:

const exporter = require('ipfs-unixfs-exporter')

for await (const entry of exporter('QmFoo.../bar/baz.txt', ipld)) {
  for await (const content of entry.content({
    offset: 0,
    length: 10
  })) {
    // content is a buffer with the first 10 bytes of `baz.txt`
  }
}

Dir:

const exporter = require('ipfs-unixfs-exporter')

for await (const entry of exporter('QmFoo.../bar', ipld)) {
  for await (const entry of entry.content({
    offset: 0,
    length: 10
  })) {
    // entry will be one of (up to) the first 10 files in the directory 
  }
}

vmx · 2018-11-29T13:46:49Z

When baz is a directory

What happens if the maxDepth is greater than the depth?

exporter('/ipfs/Qmfoo/bar/baz', {
  maxDepth: 3
})

Will it "emit baz" or "the files contained in baz"?

If it "emit baz"s, then you could use maxDepth: Infinity to always "emit baz".

lidel · 2018-11-30T00:02:33Z

Sharded directories are treated like normal directories. That is, they have a type of directory.

Just to ensure I got this right: the idea is to hide HAMT structures and always expose flattened one, indistinguishable from regular directory, right? ("They came here for the files, just give them the files")

achingbrain · 2018-11-30T08:01:19Z

What happens if the maxDepth is greater than the depth?

The intention behind maxDepth is to allow halting the graph traversal early. My thinking is if maxDepth is greater than the possible depth, it'll just do what it would have done if you hadn't specified maxDepth, which is the same as the current behaviour. From your example it would emit the files contained in baz.

The idea is to hide HAMT structures and always expose flattened one, indistinguishable from regular directory, right?

Yes, I just meant that the user wouldn't have to do anything different to extract files from a sharded directory than from a normal one, similar to what it does now, see dir-hamt-sharded.js and dir-flat.js.

If people think it's useful we can include a sharded flag or similar, alternatively you can do ipfs.files.stat(path) and it'll come back with a type of hamt-sharded-directory (works for both MFS and IPFS paths).

achingbrain · 2018-11-30T08:19:39Z

Actually, you also have the node property, which if it's a dag-pb node, the data from which can be passed through UnixFS.unmarshal(buf) and it'll tell you if it's a sharded directory or not.

I guess we could include the unmarshaled UnixFS entry too, since we've unmarshaled it during our traversal already?

achingbrain · 2018-11-30T08:48:04Z

I guess actually if we include the unmarshaled UnixFS entry we can omit some of the other fields like type and size 🤔

alanshaw · 2018-11-30T09:02:39Z

Is maxDepth necessary? You can just break out of the loop after the nth iteration right?

achingbrain · 2018-11-30T10:23:56Z

Good point. Thinking a bit more about this:

FSEntry

{
  depth: 2,
  name: 'baz.txt',
  path: 'QmFoo.../bar/baz.txt',
  cid: CID
  node: DAGNode (cid.codec === 'dag-pb') ||  Object (cid.codec === 'dag-cbor') || Buffer (cid.codec === 'raw')
  content: Function // returns an async iterable of file or directory contents
  entry: UnixFS // or null if cid.codec === 'raw' https://github.com/ipfs/js-ipfs-unixfs
}

API

Export a node

const exporter = require('ipfs-unixfs-exporter')
const result = await exporter('/ipfs/Qmfoo/bar', ipld)

console.info(`Exported a ${result.entry.type}`)

for await (const content of result.content({ offset, length })) {
  // if `entry` was a file, `content` is a buffer, if it was a directory, `content` is an FSEntry
}

Export all nodes on the path to a node

const exporter = require('ipfs-unixfs-exporter')

for await (const fsEntry of exporter.path('/ipfs/Qmfoo/bar', ipld)) {
  // ...
}

🤖 Future tech zone

Wildcards?

const exporter = require('ipfs-unixfs-exporter')

for await (const fsEntry of exporter('/ipfs/Qmfoo/bar/**/*.js', ipld)) {
  // ...
}

Search?

const exporter = require('ipfs-unixfs-exporter')

for await (const fsEntry of exporter('/ipfs/Qmfoo/bar', {
  query: 'some search string'
}, ipld)) {
  // ... can be combined with globs, above
}

vmx · 2018-11-30T11:05:57Z

I like symmetric/consistent APIs. Having

const result = await exporter('/ipfs/Qmfoo/bar', ipld)
for await (const content of result.content({ offset, length })) {}

but on the other hand

for await (const fsEntry of exporter.path('/ipfs/Qmfoo/bar', ipld)) {}

seems strange. Without thinking about how to implement it, I'd rather do either

const result = await exporter('/ipfs/Qmfoo/bar', ipld)
for await (const content of result.content({ offset, length })) {}
for await (const fsEntry of result.path() {}

or

for await (const content of exporter.content('/ipfs/Qmfoo/bar', ipld, { offset, length })) {}
for await (const fsEntry of exporter.path('/ipfs/Qmfoo/bar', ipld)) {}

or something similar. I guess you got the point.

achingbrain · 2018-11-30T11:24:03Z

I think the second option is ok:

for await (const content of exporter.content('/ipfs/Qmfoo/bar', ipld, { offset, length })) {}
for await (const fsEntry of exporter.path('/ipfs/Qmfoo/bar', ipld)) {}

This will be expensive as you'd have to do the traversal twice, once to return the initial result and once to return the path, or you store it in memory as you go which doesn't sound great either:

const result = await exporter('/ipfs/Qmfoo/bar', ipld)
for await (const content of result.content({ offset, length })) {}
for await (const fsEntry of result.path() {}

I think you'd end up doing both:

const result = await exporter('/ipfs/Qmfoo/bar', ipld) // allows inspection of `result` properties
for await (const content of result.content({ offset, length })) {}

// shortcut for the above, just give me the content!
for await (const content of exporter.content('/ipfs/Qmfoo/bar', ipld, { offset, length })) {}

// also this, i want to know how i got to '/ipfs/Qmfoo/bar'
for await (const fsEntry of result.path('/ipfs/Qmfoo/bar', ipld) {}

vmx · 2018-11-30T11:56:09Z

This will be expensive as you'd have to do the traversal twice, once to return the initial result and once to return the path, or you store it in memory as you go which doesn't sound great either:

That's not what I meant. This shouldn't be read as one program, but as two cases. I want either the content or the path.

// also this, i want to know how i got to '/ipfs/Qmfoo/bar'
for await (const fsEntry of result.path('/ipfs/Qmfoo/bar', ipld) {}

Wouldn't this be just result.path()?

alanshaw · 2018-11-30T13:11:20Z

FSEntry

{
  depth: 2,
  name: 'baz.txt',
  path: 'QmFoo.../bar/baz.txt',
  cid: CID
  node: DAGNode (cid.codec === 'dag-pb') ||  Object (cid.codec === 'dag-cbor') || Buffer (cid.codec === 'raw')
  content: Function // returns an async iterable of file or directory contents
  entry: UnixFS // or null if cid.codec === 'raw' https://github.com/ipfs/js-ipfs-unixfs
}

API

Export a node

const exporter = require('ipfs-unixfs-exporter')
const result = await exporter('/ipfs/Qmfoo/bar', ipld)

console.info(`Exported a ${result.entry.type}`)

for await (const content of result.content({ offset, length })) {
  // if `entry` was a file, `content` is a buffer, if it was a directory, `content` is an FSEntry
}

Export all nodes on the path to a node

const exporter = require('ipfs-unixfs-exporter')

for await (const fsEntry of exporter.path('/ipfs/Qmfoo/bar', ipld)) {
  // ...
}

I really like all of this.

Would the exporter.content function provide any significant performance benefit (or any other benefit) other than saving 1 line of code? If not I'd probably drop it.

My only worry is your future tech section, where exporter() is returning an iterable not a promise - how would we handle that in the future? Right now you'll always get one entry, so it makes sense to return a promise not an iterable.

alanshaw · 2018-11-30T13:15:49Z

@vmx

Wouldn't this be just result.path()?

Yeah I think so I'm assuming that's a copy paste error. However Alex's point on it being expensive still stands.

achingbrain · 2018-11-30T14:07:04Z

This shouldn't be read as one program, but as two cases
Wouldn't this be just result.path()?

Yes, I also meant that as two cases and not one program. I think we're on the same page here 😄

vmx · 2018-11-30T14:50:33Z

I don't understand the expensiveness problem of

for await (const content of result.content({ offset, length })) {}
for await (const fsEntry of result.path() {}

Couldn't the exporter be lazy and just initialise without doing any actual work?

achingbrain · 2018-11-30T15:25:13Z

Calling .content() on a FSEntry is cheap as we've already resolved the FSEntry's CID during the initial traversal and use that to export the content.

Calling .path() on an FSEntry will mean either making the traversal again or caching it in memory just in case the user calls .path().

I think if we put .path() on FSEntry it encourages the user to call it in a loop which may result in surprise when their program does not perform as expected.

For that reason I think we should stick .path() on the exporter root only:

for await (const fsEntry of exporter.path('/ipfs/Qmfoo/bar', ipld)) {}

They can obviously still call it in a loop, but it's not a natural thing to do as it doesn't get given to you by an iterable.

achingbrain · 2018-11-30T15:30:48Z

Would the exporter.content function provide any significant performance benefit?

No, it's just syntactic sugar.

My only worry is your future tech section, where exporter() is returning an iterable not a promise - how would we handle that in the future?

I was thinking to look at the string and return a promise that resolved to an iterable if a wildcard character was seen or a search query was passed.

We might end up having them as proper methods on the exporter root instead. The other solution would be making it be an iterator all the time which is kind of what we have now (in that it's always a pull stream) which doesn't feel very ergonomic.

Anyway, it's all future tech, we can solve that problem on a separate issue.

vmx · 2018-11-30T15:36:19Z

@achingbrain thanks for making things more explicit. At #7 (comment) I meant that exporter('/ipfs/Qmfoo/bar', ipld) wouldn't return an FSEntry, but some "Exporter object". You would then call either content() or path() on it.

achingbrain · 2019-10-14T14:01:09Z

This has been implemented.

alanshaw mentioned this issue Nov 29, 2018

Awesome Endeavour: Async Iterators ipfs/js-ipfs#1670

Closed

achingbrain mentioned this issue Nov 29, 2018

RFC: Do not return string CIDs from core ipfs-inactive/interface-js-ipfs-core#394

Closed

11 tasks

achingbrain mentioned this issue Dec 3, 2018

perf: only descend into hamt subshard that has the target entry #10

Merged

achingbrain mentioned this issue Apr 25, 2019

feat: convert to async/await #21

Merged

3 tasks

achingbrain closed this as completed Oct 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API review #7

API review #7

achingbrain commented Nov 29, 2018 •

edited

Loading

achingbrain commented Nov 29, 2018

alanshaw commented Nov 29, 2018

achingbrain commented Nov 29, 2018

achingbrain commented Nov 29, 2018 •

edited

Loading

vmx commented Nov 29, 2018

lidel commented Nov 30, 2018

achingbrain commented Nov 30, 2018

achingbrain commented Nov 30, 2018

achingbrain commented Nov 30, 2018

alanshaw commented Nov 30, 2018

achingbrain commented Nov 30, 2018 •

edited

Loading

vmx commented Nov 30, 2018

achingbrain commented Nov 30, 2018

vmx commented Nov 30, 2018

alanshaw commented Nov 30, 2018

FSEntry

API

Export a node

Export all nodes on the path to a node

alanshaw commented Nov 30, 2018

achingbrain commented Nov 30, 2018

vmx commented Nov 30, 2018

achingbrain commented Nov 30, 2018

achingbrain commented Nov 30, 2018

vmx commented Nov 30, 2018

achingbrain commented Oct 14, 2019

API review #7

API review #7

Comments

achingbrain commented Nov 29, 2018 • edited Loading

Proposal

Current API

Observations

Proposal

Options

Sharding

Behaviour

Examples

When baz is a directory

When baz is a file

achingbrain commented Nov 29, 2018

alanshaw commented Nov 29, 2018

achingbrain commented Nov 29, 2018

achingbrain commented Nov 29, 2018 • edited Loading

vmx commented Nov 29, 2018

lidel commented Nov 30, 2018

achingbrain commented Nov 30, 2018

achingbrain commented Nov 30, 2018

achingbrain commented Nov 30, 2018

alanshaw commented Nov 30, 2018

achingbrain commented Nov 30, 2018 • edited Loading

FSEntry

API

Export a node

Export all nodes on the path to a node

🤖 Future tech zone

Wildcards?

Search?

vmx commented Nov 30, 2018

achingbrain commented Nov 30, 2018

vmx commented Nov 30, 2018

alanshaw commented Nov 30, 2018

FSEntry

API

Export a node

Export all nodes on the path to a node

alanshaw commented Nov 30, 2018

achingbrain commented Nov 30, 2018

vmx commented Nov 30, 2018

achingbrain commented Nov 30, 2018

achingbrain commented Nov 30, 2018

vmx commented Nov 30, 2018

achingbrain commented Oct 14, 2019

achingbrain commented Nov 29, 2018 •

edited

Loading

When `baz` is a directory

When `baz` is a file

achingbrain commented Nov 29, 2018 •

edited

Loading

achingbrain commented Nov 30, 2018 •

edited

Loading