Files missing during dry-run #2762

Closed
Ajmal-1301 opened this issue Mar 8, 2025 · 23 comments

@Ajmal-1301

Your name

Ajmal

Your affiliation

North Carolina State University

Please provide a clear and concise description of your question or discussion topic.

Hi,
I was downloading data for a dry run in GCClassic v14.5, and the following files are missing. I tried changing the download mirror, but these files still cannot be downloaded. Any help would be appreciated.

Thanks

Image

@Ajmal-1301 Ajmal-1301 added the category: Question Further information is requested label Mar 8, 2025
@yantosca
Contributor

Thanks for writing @Ajmal-1301. I can confirm that the files in question are present both on geoschemdata.wustl.edu and s3://geos-chem.

Image

Image

@yuyao-cyber: Would you have any ideas why the file transfer failed?

@yantosca yantosca added the topic: Input Data Related to input data label Mar 10, 2025
@yuyao-cyber

Hi @Ajmal-1301, I can also confirm that the permissions on these files are correct. Since the error says "No such file or directory", I believe at least one directory in the destination path is missing:

/rsl/researchers/n/nmeskhi/Ajmal_files/GC_InputData/gegrid/ExtData/CHEM_INPUTS/CLOUD_J/v2024-09

Please check if these folders all exist. Please let me know if it doesn't work out. Thank you.
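One way to see which part of the destination path is missing is to walk it from the root down. The snippet below is only a minimal sketch and is not part of download_data.py; the function name first_missing_component is made up for illustration.

```python
from pathlib import Path
from typing import Optional


def first_missing_component(path: str) -> Optional[Path]:
    """Return the shallowest directory in `path` that does not exist, or None."""
    target = Path(path)
    # Path.parents runs from the immediate parent up to the root, so reverse
    # it to walk from the root down, then check the target itself last.
    for candidate in [*reversed(target.parents), target]:
        if not candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    # Destination path copied from the comment above.
    dest = (
        "/rsl/researchers/n/nmeskhi/Ajmal_files/GC_InputData/gegrid/"
        "ExtData/CHEM_INPUTS/CLOUD_J/v2024-09"
    )
    missing = first_missing_component(dest)
    if missing is None:
        print("All folders exist")
    else:
        print(f"First missing folder: {missing}")
```

If a folder turns out to be missing and you have write permission on its parent, os.makedirs(dest, exist_ok=True) will create the whole chain.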

@Ajmal-1301
Author

Hi @yuyao-cyber and @yantosca ,
Thank you for replying. I checked, and the folders you mentioned do not exist. I thought they would be created during the dry-run simulation. Should I create the directory myself at that location?

@yuyao-cyber

Hi @yantosca , I am not too familiar with the dry run simulation process yet... Do you have an idea on this?

@yantosca
Contributor

@yuyao-cyber I think the problem is in the file transfer. It doesn't seem like it's an issue on the WashU end.

@Ajmal-1301: Would you be able to attach your log.dryrun file to this issue? (You'll need to rename it to log.dryrun.txt). Also you can try downloading from the s3://geos-chem bucket via anonymous login. Go to the download_data.yml file in your run directory and look for the entry:

---
#
# Configuration file for the download_data.py script.
# You should not have to modify this file unless a new data portal
# comes online, or the default restart files are updated.
#
#
# GEOS-Chem data portals
portals:

  # GEOS-Chem Input Data portal, download via AWS CLI
  geoschem+aws:
    short_name: ga
    s3_bucket: True
    remote: s3://geos-chem
    command: 'aws s3 cp '
    quote: ""

Then edit the command line to read command: 'aws s3 cp --no-sign-request ', which will allow you to download data from s3://geos-chem even if you do not already have an Amazon account.
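If you would rather script that edit than make it by hand, here is a rough sketch. It works on the file's text with a regular expression instead of a YAML parser, and it assumes the entry looks exactly like the excerpt above (single-quoted, starting with aws s3 cp). The function name add_no_sign_request is hypothetical.

```python
import re

FLAG = "--no-sign-request"

# Match a line such as "    command: 'aws s3 cp " that is not already
# followed by the flag, so the edit is safe to apply more than once.
_COMMAND = re.compile(
    r"^([ \t]*command:[ \t]*'aws s3 cp )(?!--no-sign-request)",
    re.MULTILINE,
)


def add_no_sign_request(yaml_text: str) -> str:
    """Return yaml_text with --no-sign-request added to the aws s3 cp command."""
    return _COMMAND.sub(lambda m: m.group(1) + FLAG + " ", yaml_text)


if __name__ == "__main__":
    sample = (
        "  geoschem+aws:\n"
        "    short_name: ga\n"
        "    s3_bucket: True\n"
        "    remote: s3://geos-chem\n"
        "    command: 'aws s3 cp '\n"
        '    quote: ""\n'
    )
    print(add_no_sign_request(sample))
```

To apply it to a real run directory you would read download_data.yml, pass its contents through the function, and write the result back, ideally after making a backup copy.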

Best,
Bob Y.

@Ajmal-1301
Author

Hi @yantosca ,
I have uploaded the log.dryrun file. Also, I was watching your tutorial video, and from what I understood, downloading from AWS has charges associated with it (it is a paid service if I am running my simulation on a university HPC). Or did I understand that incorrectly?

Thanks
Ajmal

log.dryrun_spinup2.txt

@yantosca
Contributor

Thanks for the update @Ajmal-1301. The data on the GEOS-Chem Input Data portal (s3://geos-chem) is free to download since it is covered by the Amazon Sustainability Data Initiative. You can download it using the AWS command-line interface (AWSCLI) tool.

It is true that if you try to open an AWS account, they will ask you for payment information. But you can still download the data for free by using the AWSCLI command aws s3 cp --no-sign-request, which will use anonymous login instead.

We have recently updated our documentation on this, see:

@yantosca
Contributor

@Ajmal-1301: A version of the AWSCLI tool may already be installed on your cluster, perhaps as a loadable software module. That would save you from having to install it yourself.
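A quick way to check whether the aws executable is already on your PATH is sketched below. This is only an illustration; the helper name find_executable is made up, and on clusters that use environment modules the tool may only appear after loading a site-specific module.

```python
import shutil
from typing import Optional


def find_executable(name: str = "aws") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_executable("aws")
    if location:
        print(f"AWS CLI found at {location}")
    else:
        print("AWS CLI not found on PATH; try loading a module or installing it")
```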

@Ajmal-1301
Author

Hi @yantosca
Thank you for the reply. I used awscli and tried downloading from the AWS server. I got the restart file, but the CloudJ files are still missing. I have also attached the new log file.

Thanks
Ajmal

Image

log.dryrun_spinup3.txt

@yantosca
Contributor

Thanks for your patience. We have been having some issues with the data sync to the AWS s3://geos-chem bucket.

Also note the dry run bug in issue #2771 and the fix in PR #2772.

@yantosca yantosca self-assigned this Mar 14, 2025
@Ajmal-1301
Author

Hi @yantosca ,
I tried changing download_data.py as shown in PR #2772, but I am still getting the same error.

@yantosca
Contributor

Thanks for your patience @Ajmal-1301. @yuyao-cyber just fixed some issues with the data sync to AWS. Try to see if you can download the data now.

@Ajmal-1301
Author

Hi @yantosca , @yuyao-cyber ,
Unfortunately, I still face the same issue when using the geoschem+aws portal: "error 404 when calling the headobject operation..."

Thanks
Ajmal

@Ajmal-1301
Author

Hi,
Just an update: I tried running the command "aws s3 cp --no-sign-request s3://geos-chem/CHEM_INPUTS/CLOUD_J ./ --recursive" from inside the GEOS-Chem input directory (at the prompt "(base) [arashee2@login02 CHEM_INPUTS]$") to see if I could copy the files, but it fails with a "permission denied" error.

Thanks

@yantosca
Contributor

Thanks @Ajmal-1301. I was able to run the command above without issue.

[holylogin06 data]$  aws s3 cp --no-sign-request s3://geos-chem/CHEM_INPUTS/CLOUD_J ./ --recursive
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/FJX_j2j.dat to v2023-05/FJX_j2j.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/CJ77_inp.dat to v2023-05/CJ77_inp.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/README~ to v2023-05/README~
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/FJX_scat-cld.dat to v2023-05/FJX_scat-cld.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/FJX_scat-aer.dat to v2023-05/FJX_scat-aer.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/FJX_spec.dat to v2023-05/FJX_spec.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/FJX_scat-ssa.dat to v2023-05/FJX_scat-ssa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/FJX_scat-geo.dat to v2023-05/FJX_scat-geo.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/README to v2023-05/README
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/FJX_scat-UMa.dat to v2023-05/FJX_scat-UMa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/h2so4.dat to v2023-05/h2so4.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/brc.dat to v2023-05/brc.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/dust.dat to v2023-05/dust.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/ssc.dat to v2023-05/ssc.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/ssa.dat to v2023-05/ssa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/org.dat to v2023-05/org.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/soot.dat to v2023-05/soot.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/so4.dat to v2023-05/so4.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/atmos_h2och4.dat to v2023-05/atmos_h2och4.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/atmos_std.dat to v2023-05/atmos_std.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/CJ77_inp.dat to v2023-11-Hg/CJ77_inp.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_j2j.dat to v2023-11-Hg/FJX_j2j.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_scat-aer.dat to v2023-11-Hg/FJX_scat-aer.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_scat-geo.dat to v2023-11-Hg/FJX_scat-geo.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_scat-ssa.dat to v2023-11-Hg/FJX_scat-ssa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-05/atmos_geomip.dat to v2023-05/atmos_geomip.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_spec.dat to v2023-11-Hg/FJX_spec.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_spec.dat~ to v2023-11-Hg/FJX_spec.dat~
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_scat-UMa.dat to v2023-11-Hg/FJX_scat-UMa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/README to v2023-11-Hg/README
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_scat-cld.dat to v2023-11-Hg/FJX_scat-cld.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/brc.dat to v2023-11-Hg/brc.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/h2so4.dat to v2023-11-Hg/h2so4.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/jv_spec_mie.dat to v2023-11-Hg/jv_spec_mie.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/org.dat to v2023-11-Hg/org.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/dust.dat to v2023-11-Hg/dust.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/FJX_spec.dat_ to v2023-11-Hg/FJX_spec.dat_
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/atmos_geomip.dat to v2023-11-Hg/atmos_geomip.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/ssa.dat to v2023-11-Hg/ssa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/atmos_std.dat to v2023-11-Hg/atmos_std.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/so4.dat to v2023-11-Hg/so4.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/ssc.dat to v2023-11-Hg/ssc.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/atmos_h2och4.dat to v2023-11-Hg/atmos_h2och4.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2023-11-Hg/soot.dat to v2023-11-Hg/soot.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09-Hg/FJX_j2j.dat to v2024-09-Hg/FJX_j2j.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09-Hg/FJX_scat-ssa.dat to v2024-09-Hg/FJX_scat-ssa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09-Hg/FJX_spec.dat to v2024-09-Hg/FJX_spec.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09-Hg/FJX_scat-cld.dat to v2024-09-Hg/FJX_scat-cld.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09-Hg/FJX_scat-aer.dat to v2024-09-Hg/FJX_scat-aer.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_scat-aer.dat to v2024-09/FJX_scat-aer.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09-Hg/README to v2024-09-Hg/README
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_scat-aer.txt to v2024-09/FJX_scat-aer.txt
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_j2j.dat to v2024-09/FJX_j2j.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_scat-ssa.dat to v2024-09/FJX_scat-ssa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_spec.dat to v2024-09/FJX_spec.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09/README to v2024-09/README
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01-Hg/FJX_j2j.dat to v2025-01-Hg/FJX_j2j.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_scat-cld.dat to v2024-
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01-Hg/README to v2025-01-Hg/R
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01-Hg/FJX_spec.dat to v2025-01
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01/FJX_scat-cld.dat to v2025-01
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01/FJX_spec.dat to v2025-01/FJX_spec.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01/FJX_scat-ssa.dat to v2025-01/FJX_scat-ssa.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01/FJX_scat-aer.dat to v2025-01/FJX_scat-aer.dat
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01/FJX_j2j.dat to v2025-01/FJX_j2j.dat 
download: s3://geos-chem/CHEM_INPUTS/CLOUD_J/v2025-01/README to v2025-01/README

@yuyao-cyber: I wonder if there is some kind of connection issue that prevents download from @Ajmal-1301's server. What do you think?

@yuyao-cyber

Hi @Ajmal-1301 @yantosca, I think that if you are inside CHEM_INPUTS, you may not have permission to write or copy files into it. Running the command from a folder where you have write access might help.

@Ajmal-1301
Author

Hi @yuyao-cyber
Yes, the command works in a separate folder, but how do I transfer those files to the CHEM_INPUTS folder?

Thanks

@yuyao-cyber

@Ajmal-1301, I believe in this case you don't have write permission to make changes in the CHEM_INPUTS folder. You can run ls -ld . in CHEM_INPUTS. It should show that only the owner or the owning group has write permission. Please let me know if this answers your question.
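The same check can be scripted. The sketch below reports the permission string that ls -ld would print and whether the current user can actually create files in the folder. The function name describe_dir is hypothetical, and the example assumes a Linux or other POSIX system.

```python
import os
import stat


def describe_dir(path: str = ".") -> dict:
    """Summarize ownership and permissions of a directory."""
    info = os.stat(path)
    return {
        # Same permission string that `ls -ld` shows, e.g. 'drwxr-xr-x'.
        "mode": stat.filemode(info.st_mode),
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        # Creating files in a directory needs both write and execute bits.
        "writable_by_me": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Run this from inside CHEM_INPUTS, or pass that folder's path instead of ".".
    for key, value in describe_dir(".").items():
        print(f"{key}: {value}")
```

If writable_by_me is False, aws s3 cp will fail in that folder no matter which data portal you use.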

@yantosca
Contributor

Thanks @Ajmal-1301 and @yuyao-cyber for clarifying. If the folder where you keep your shared GEOS-Chem data is managed by e.g. your sysadmin, you will need to ask them to copy it there for you.

@Ajmal-1301
Author

Ajmal-1301 commented Mar 18, 2025

Hi @yantosca , @yuyao-cyber
It shows that the owner is another PhD student from our group. Unfortunately, she graduated a year ago, and I am not sure whether she still has access to the university servers. Can I create a new, empty input folder and change my default input directory?

@yantosca
Contributor

Now I understand, @Ajmal-1301. Yes, you can definitely download the data to a folder that you own and then change the paths in your geoschem_config.yml files.
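For anyone who wants to script the path change, here is a minimal sketch that swaps one data root for another in a config file's text. It is an illustration only: the function name retarget_data_root, the key names, and the paths in the sample are all made up, and you should review the result before overwriting a real configuration file.

```python
import re


def retarget_data_root(config_text: str, old_root: str, new_root: str) -> str:
    """Replace old_root with new_root wherever it appears as a path prefix."""
    old = old_root.rstrip("/")
    new = new_root.rstrip("/")
    # Only match when the old root is followed by "/", whitespace, a quote,
    # or the end of the text, so /old/ExtData does not match /old/ExtData2.
    pattern = re.compile(re.escape(old) + r"(?=/|\s|['\"]|$)")
    return pattern.sub(lambda _match: new, config_text)


if __name__ == "__main__":
    # Hypothetical config excerpt; real key names may differ.
    sample = (
        "example_data_dir: /shared/old_owner/ExtData\n"
        "example_met_dir: /shared/old_owner/ExtData/GEOS_4x5\n"
    )
    print(retarget_data_root(sample, "/shared/old_owner/ExtData", "/home/me/ExtData"))
```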

In the long run, if the GEOS-Chem input data is going to be shared by more than one person in your group, you might want to see if you can get your sysadmin staff to change the permissions of that folder to be group-readable.

@Ajmal-1301
Author

Hi @yantosca ,
I decided against a shared directory because I am still learning and did not want to accidentally overwrite or corrupt files that she might be using. But I copied her ExtData folders into my new directory, which I guess is what is causing the ownership issues. I will try changing the path I gave when I registered as a GEOS-Chem user in the ".geoschem/config" file and try again.

Thanks

@yantosca
Contributor

No worries @Ajmal-1301. Glad we could figure it out. Thanks for your patience.
