FileNotFoundError: Unable to resolve remote path #1120

Open
0x2b3bfa0 opened this issue May 27, 2025 · 4 comments · Fixed by #1121
Labels
bug Something isn't working

Comments

@0x2b3bfa0
Member

0x2b3bfa0 commented May 27, 2025

Description

Apparently, we can't index an empty S3 bucket; depending on where the query runs, it fails with one of the following exceptions:

  • FileNotFoundError: Unable to resolve remote path: when running locally.
  • PermissionError: No AWSAccessKey was presented. when running from Studio (see footnote 1).

Query

import datachain

datachain.read_storage("s3://example-empty-bucket/").save("index-example-empty-bucket")

Traceback (Local)

Traceback (most recent call last):                                 
  File "/.../main.py", line 3, in <module>
    datachain.read_storage("s3://example-empty-bucket/").save("index-example-empty-bucket")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/dc/datachain.py", line 481, in save
    query=self._query.save(
        name=name,
    ...<4 lines>...
        **kwargs,
    )
  File "/.../datachain/src/datachain/query/dataset.py", line 1707, in save
    query = self.apply_steps()
  File "/.../datachain/src/datachain/query/dataset.py", line 1222, in apply_steps
    self.listing_fn()
    ~~~~~~~~~~~~~~~^^
  File "/.../datachain/src/datachain/lib/dc/storage.py", line 157, in <lambda>
    lambda ds_name=list_ds_name, lst_uri=list_uri: lst_fn(ds_name, lst_uri)
                                                   ~~~~~~^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/dc/storage.py", line 153, in lst_fn
    .save(ds_name, listing=True, version=version)
     ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/dc/datachain.py", line 481, in save
    query=self._query.save(
        name=name,
    ...<4 lines>...
        **kwargs,
    )
  File "/.../datachain/src/datachain/query/dataset.py", line 1707, in save
    query = self.apply_steps()
  File "/.../datachain/src/datachain/query/dataset.py", line 1251, in apply_steps
    result = step.apply(
        result.query_generator, self.temp_table_names
    )  # a chain of steps linked by results
  File "/.../datachain/src/datachain/query/dataset.py", line 614, in apply
    self.populate_udf_table(udf_table, query)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/query/dataset.py", line 532, in populate_udf_table
    process_udf_outputs(
    ~~~~~~~~~~~~~~~~~~~^
        warehouse,
        ^^^^^^^^^^
    ...<3 lines>...
        cb=generated_cb,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/.../datachain/src/datachain/query/dataset.py", line 343, in process_udf_outputs
    for row in udf_output:
               ^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/udf.py", line 477, in _process_row
    for result_obj in result_objs:
                      ^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/listing.py", line 56, in list_func
    for entries in iter_over_async(client.scandir(path.rstrip("/")), get_loop()):
                   ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/asyn.py", line 280, in iter_over_async
    done, obj = asyncio.run_coroutine_threadsafe(get_next(), loop).result()
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/.../lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/.../lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/.../datachain/src/datachain/asyn.py", line 273, in get_next
    obj = await ait.__anext__()
          ^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/client/fsspec.py", line 247, in scandir
    await main_task
  File "/.../datachain/src/datachain/client/s3.py", line 133, in _fetch_default
    await self._fetch_flat(start_prefix, result_queue)
  File "/.../datachain/src/datachain/client/s3.py", line 124, in _fetch_flat
    await consumer
  File "/.../datachain/src/datachain/client/s3.py", line 98, in process_pages
    raise FileNotFoundError(f"Unable to resolve remote path: {prefix}")
FileNotFoundError: Unable to resolve remote path: 
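
The last frames suggest that the listing consumer raises when it sees no entries at all, which is exactly what an empty bucket produces. Below is a minimal sketch of that behavior, not datachain's actual code: `process_pages` here is a simplified stand-in, and `empty_listing`, `list_prefix`, and the `allow_empty_root` flag are hypothetical names introduced only for illustration. It does not claim to reproduce what #1121 changed.

```python
import asyncio


async def process_pages(pages, prefix: str, *, allow_empty_root: bool = False):
    """Simplified stand-in for a paginated listing consumer (assumption)."""
    found = False
    entries = []
    async for page in pages:
        for entry in page:
            found = True
            entries.append(entry)
    # With an empty prefix (bucket root), "nothing found" may simply mean
    # the bucket is empty, so the hypothetical flag skips the error.
    if not found and not (allow_empty_root and prefix == ""):
        raise FileNotFoundError(f"Unable to resolve remote path: {prefix}")
    return entries


async def empty_listing():
    # An empty bucket: a single page that contains no entries.
    yield []


def list_prefix(prefix: str, allow_empty_root: bool = False):
    """Run the consumer against the simulated empty bucket."""
    return asyncio.run(
        process_pages(empty_listing(), prefix, allow_empty_root=allow_empty_root)
    )
```

With the flag off, `list_prefix("")` raises a `FileNotFoundError` whose message ends right after the colon, because the prefix is empty, which matches the truncated message in the traceback above. With the flag on, the same call returns an empty list.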

Traceback (Studio)

Traceback (most recent call last):
  File "/.../site-packages/s3fs/core.py", line 114, in _error_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/aiobotocore/client.py", line 412, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: No AWSAccessKey was presented.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/.../site-packages/datachain/lib/listing.py", line 144, in _reraise_as_client_error
    yield
  File "/usr/local/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/datachain/fs/utils.py", line 28, in isfile
    return not _isdir(fs, path)
               ^^^^^^^^^^^^^^^^
  File "/.../site-packages/datachain/fs/utils.py", line 10, in _isdir
    info = fs.info(path)
           ^^^^^^^^^^^^^
  File "/.../site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/.../site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/.../site-packages/s3fs/core.py", line 1471, in _info
    out = await self._call_s3(
          ^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/s3fs/core.py", line 371, in _call_s3
    return await _error_wrapper(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/s3fs/core.py", line 146, in _error_wrapper
    raise err
PermissionError: No AWSAccessKey was presented.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "/.../site-packages/datachain/lib/dc/storage.py", line 150, in read_storage
    list_ds_name, list_uri, list_path, list_ds_exists = get_listing(
                                                        ^^^^^^^^^^^^
  File "/.../site-packages/datachain/lib/listing.py", line 173, in get_listing
    if not glob.has_magic(uri) and not uri.endswith("/") and isfile(client.fs, uri):
                                                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 80, in inner
    with self._recreate_cm():
         ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/.../site-packages/datachain/lib/listing.py", line 146, in _reraise_as_client_error
    raise ClientError(message=str(e), error_code=getattr(e, "code", None)) from e
datachain.error.ClientError: No AWSAccessKey was presented.
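
The three chained tracebacks come from `raise ... from e` being applied at each layer. A minimal sketch of the last hop, assuming a wrapper shaped like the `_reraise_as_client_error` frame shown above: the `ClientError` class below is a hypothetical stand-in for `datachain.error.ClientError`, and `probe` is an illustrative name, not part of the library.

```python
from contextlib import contextmanager


class ClientError(Exception):
    """Hypothetical stand-in for datachain.error.ClientError."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.error_code = error_code


@contextmanager
def reraise_as_client_error():
    """Convert any exception raised in the body into a ClientError."""
    try:
        yield
    except Exception as e:
        # `from e` keeps the original exception as __cause__, which Python
        # prints as "The above exception was the direct cause of ...".
        raise ClientError(str(e), error_code=getattr(e, "code", None)) from e


def probe():
    # Simulates the storage layer surfacing the S3 permission failure.
    with reraise_as_client_error():
        raise PermissionError("No AWSAccessKey was presented.")
```

Calling `probe()` raises a `ClientError` that carries the same message, with the original `PermissionError` still reachable through `__cause__`, so the underlying S3 failure is not lost when debugging.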

Version Info

0.18.4
Python 3.12.10

Footnotes

  1. This one is especially obscure, and I guess it's the result of attempting anonymous (?) access after the first attempt fails with an exception?

@0x2b3bfa0 0x2b3bfa0 added the bug Something isn't working label May 27, 2025
@shcheklein
Member

shcheklein commented May 27, 2025

Reproduced both locally and in Studio. I get the same result in both:

FileNotFoundError: Unable to resolve remote path

Could it be because I'm not using an OpenID-connected team (demo-1)? 🤔

@shcheklein shcheklein mentioned this issue May 27, 2025
3 tasks
@shcheklein
Member

https://github.com/iterative/datachain/pull/1121/files - fixes the FileNotFoundError

@0x2b3bfa0 where / how did you run it to get the No AWSAccessKey was presented error?

@shcheklein shcheklein reopened this May 28, 2025
@shcheklein shcheklein self-assigned this May 28, 2025
@shcheklein
Member

The first part of the fix is merged; I'm now looking into the Unable to resolve remote path: part.

@shcheklein
Member

Closing this as we were not able to reproduce the last piece here after all the redeployments. @0x2b3bfa0 feel free to destroy the cluster if it's still running.

2 participants