Python wrapper for the Internet Archive's insufferable CLI tool

harrypm/IA-Interact

Internet Archive Interact

An interactive command-line tool for managing Internet Archive repositories.

Use this script to list files, upload files, delete files, move files, and create new repositories with detailed metadata input.

Features

  • Interactive Menu: Choose options to list files, upload files, delete or move files, or create a new repository.
  • Test Mode & Permanent Mode: Run in simulation (Test Mode, where no changes are made) or execute actual changes (Permanent Mode).
  • Metadata Support: Input metadata including title, description, creator, date, language, license URL, collection, subject tags, and test item status.
  • Collection Options: Supports collections such as community, opensource, texts, movies, audio, image, etree, folksoundomy, games, and software.
  • Progress Bars: Uses tqdm to display file upload progress.
  • S3 Authentication: Uses S3 access keys (set as environment variables) for secure communication with the Internet Archive.

Prerequisites

Before you begin, ensure you have:

  • A Linux environment (Ubuntu, Debian, etc.)
  • Python 3 installed
  • S3 Access Keys from Internet Archive
  • An active internet connection
  • The Internet Archive Python tool (the internetarchive package, installed below)

Installation

1. Preparing Your System

Update Your System:

sudo apt update && sudo apt upgrade -y

Install Python 3 and pip:

sudo apt install -y python3 python3-pip

Install the Internet Archive CLI via pipx (pipx itself must already be installed):

pipx install internetarchive

2. Setting Up the Script

Download the script into your home folder or your Internet Archive working folder, or create it by hand:

  1. Open your text editor and create a file named ia-interact.py.
  2. Paste the full script code into the file, then save and exit.

(Optional) Make the Script Executable:

chmod +x ia-interact.py

3. Installing Python Libraries

Install Required Libraries:

pip3 install requests tqdm

Verify the Library Installation:

pip3 show requests tqdm

4. Configuring Environment Variables

Obtain Your S3 Access Keys:

Retrieve your S3 keys from your Internet Archive account page.

Set Up Environment Variables:

Edit your shell configuration file (e.g., ~/.bashrc or ~/.zshrc) and add:

export S3_ACCESS_KEY="your-access-key"
export S3_SECRET_KEY="your-secret-key"

Replace "your-access-key" and "your-secret-key" with your actual keys.

Reload the Configuration:

source ~/.bashrc

Test the Environment Variables:

echo $S3_ACCESS_KEY
echo $S3_SECRET_KEY
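
The same check can be done from Python. The sketch below is not taken from the script: the function names are hypothetical, and the "LOW" header format is assumed from the Internet Archive's S3 documentation rather than from the script's source.

```python
import os


def load_s3_keys(env=os.environ):
    """Return (access_key, secret_key), or raise if either variable is unset."""
    access = env.get("S3_ACCESS_KEY")
    secret = env.get("S3_SECRET_KEY")
    pairs = (("S3_ACCESS_KEY", access), ("S3_SECRET_KEY", secret))
    missing = [name for name, value in pairs if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return access, secret


def auth_header(access, secret):
    # Assumption: the "LOW access:secret" scheme from the IA S3 documentation.
    return {"authorization": f"LOW {access}:{secret}"}
```

Failing early with a clear message is usually easier to debug than a rejected request later on.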

Usage

Running the Script

To execute the script, run:

python3 ia-interact.py

To interact with a repository, copy its web link, which has this form:

https://archive.org/details/xxxxxxxxxx

Script Options

When the script runs, it displays an interactive menu with the following options:

  • List Files: Display the contents of an existing repository.
  • Upload Files: Add files to a repository.
  • Delete/Move Files: Manage files within a repository.
  • Create a New Repository: Upload an entire folder and configure repository metadata.

During repository creation, you will be prompted to:

  • Input metadata (title, description, creator, date, language, license URL).
  • Select a collection from the provided list.
  • Enter subject tags (e.g., music, history).
  • Specify if the repository is a test item (Note: Test items are automatically deleted after 30 days).
  • Choose between Test Mode (simulate actions without an actual upload) and Permanent Mode (execute actual uploads).

Troubleshooting

  • Missing Libraries:
    If you encounter errors about missing libraries, run:

    pip3 install requests tqdm

  • Environment Variables Not Set:
    Ensure your environment variables are defined in your shell configuration file and reload it:

    source ~/.bashrc

  • API or Network Issues:
    Verify that your S3 keys are correct and that your internet connection is stable.

  • Logging:
    To help with debugging, you can redirect output to a log file:

    python3 ia-interact.py > script_output.log 2>&1

Full Breakdown and Feature Notes

Overview

"IA Interact" is an interactive command-line tool designed to manage repositories on the Internet Archive. It supports operations such as:

  • Uploading Files: Upload individual files or entire folders to an Internet Archive repository.
  • Listing Repository Contents: Retrieve and display the contents of a repository using the metadata API.
  • Deleting Files: Remove specified files from a repository.
  • Moving Files: Change a file’s location within a repository by copying it and then deleting the original.
  • Creating a New Repository: Upload a folder as a new repository and submit metadata.
  • User Interaction: Offers an interactive menu with a help option, test mode (simulation) vs. permanent mode, and filtering to avoid showing files from ".thumbs" directories.

This script uses the Internet Archive’s S3-compatible interface and Metadata API, and it includes robust file upload logic (with chunking, progress bars, and retry strategies).


Detailed Breakdown by Function

1. get_repo_identifier(repo_link)

  • Purpose:
    Extracts the repository identifier from a full Internet Archive URL.
  • How It Works:
    Uses a regular expression to capture the pattern /details/{identifier}. If the URL is invalid, it alerts the user and returns None.
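
A minimal sketch of that extraction, assuming the identifier ends at the next slash, query string, or fragment (the script's exact pattern is not shown here):

```python
import re

# Assumption: an identifier runs until the next "/", "?", "#", or whitespace.
_DETAILS_RE = re.compile(r"/details/([^/?#\s]+)")


def get_repo_identifier(repo_link):
    """Return the identifier from an archive.org details URL, or None."""
    match = _DETAILS_RE.search(repo_link)
    if match is None:
        print("Invalid Internet Archive URL:", repo_link)
        return None
    return match.group(1)
```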

2. upload_file_with_progress(identifier, file_path, directory)

  • Purpose:
    Uploads a file to a specified directory within a repository.
  • Key Features:
    • Chunking: Reads the file in 2MB chunks.
    • Progress Tracking: Uses the tqdm library to display a real-time progress bar.
    • Retry Logic: Implements a retry strategy (5 retries) using an HTTPAdapter.
    • S3 Authentication: Reads S3 keys from environment variables and includes them in the request headers.
  • Note:
    The function sends HTTP PUT requests to the URL https://s3.us.archive.org/{identifier}/{directory}/{filename} to perform the upload.
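
The chunking and progress parts can be sketched with the standard library alone. The helper names and the on_progress callback below are hypothetical; in the script, the role of the callback would presumably be played by a tqdm progress bar.

```python
CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, matching the chunk size described above


def upload_url(identifier, directory, filename):
    """Build the S3 URL that the PUT request is sent to."""
    return f"https://s3.us.archive.org/{identifier}/{directory}/{filename}"


def read_in_chunks(file_path, on_progress=None, chunk_size=CHUNK_SIZE):
    """Yield the file's bytes chunk by chunk, reporting each chunk's size."""
    with open(file_path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            if on_progress is not None:
                on_progress(len(chunk))  # e.g. a progress bar's update method
            yield chunk
```

Reading lazily keeps memory use bounded by the chunk size, regardless of how large the file is.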

3. list_repository_files(identifier)

  • Purpose:
    Retrieves and lists the files in a repository.
  • Key Features:
    • Metadata API: Sends a GET request to https://archive.org/metadata/{identifier} to fetch repository metadata (in JSON).
    • Filtering: Excludes any files that reside in directories with names ending in ".thumbs" (i.e. if any component of the path ends with ".thumbs").
    • Display: Prints out a numbered list of the filtered file names.
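
The filtering step might look like the sketch below. It assumes the metadata JSON has already been fetched and parsed into a dict with a "files" list of entries that each have a "name" key, and it checks directory components only, which is one reading of the rule above. The function names are hypothetical.

```python
def visible_files(metadata):
    """Return file names, skipping anything inside a '*.thumbs' directory."""
    names = []
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        directories = name.split("/")[:-1]  # path components before the file name
        if any(part.endswith(".thumbs") for part in directories):
            continue
        names.append(name)
    return names


def print_numbered(names):
    """Print the names as a numbered list, starting at 1."""
    for index, name in enumerate(names, start=1):
        print(f"{index}. {name}")
```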

4. delete_file(identifier, file_path)

  • Purpose:
    Deletes a file from the repository.
  • Key Features:
    • HTTP DELETE: Sends a DELETE request to the S3 endpoint https://s3.us.archive.org/{identifier}/{file_path}.
    • S3 Authentication: Utilizes S3 credentials stored in environment variables.
    • Feedback: Notifies the user whether the file deletion succeeded (checks for HTTP 200 or 204).

5. move_file(identifier, file_name, source_dir, target_dir)

  • Purpose:
    Moves a file from one location in the repository to another.
  • Key Features:
    • Copy-Delete Approach:
      1. Copy: Uses an HTTP PUT request with the x-amz-copy-source header to copy the file to the target directory.
      2. Delete: If the copy is successful, deletes the original file.
    • S3 Authentication: Requires S3 keys from environment variables.
    • Error Handling: Provides error messages if the copy or delete fails.
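
The ordering matters: the original is deleted only after the copy succeeds. A sketch of that control flow follows, with the HTTP calls passed in as hypothetical copy and delete callables so the logic can be read (and tested) without network access. The format of the x-amz-copy-source value is an assumption.

```python
def move_file(identifier, file_name, source_dir, target_dir, copy, delete):
    """Copy a file to target_dir, then delete the original. Returns True on success.

    copy(identifier, target_key, headers) and delete(identifier, source_key)
    are stand-ins for the real HTTP calls; each returns True on success.
    """
    source_key = f"{source_dir}/{file_name}"
    target_key = f"{target_dir}/{file_name}"
    # Assumption: the copy source is given as "/<identifier>/<key>".
    headers = {"x-amz-copy-source": f"/{identifier}/{source_key}"}
    if not copy(identifier, target_key, headers):
        print("Copy failed; the original file was left in place.")
        return False
    if not delete(identifier, source_key):
        print("Copy succeeded, but the original could not be deleted.")
        return False
    return True
```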

6. create_rules_file(folder_path)

  • Purpose:
    Ensures that a local folder contains a _rules.conf file.
  • Key Features:
    • Default Rules: If _rules.conf does not exist, it is created with the default content CAT.ALL.
    • Usage: This file can help control file visibility during repository uploads.
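
A minimal sketch of that behavior, assuming the file holds exactly the text CAT.ALL and that an existing file is never overwritten:

```python
from pathlib import Path


def create_rules_file(folder_path):
    """Ensure folder_path contains _rules.conf; return its path."""
    rules = Path(folder_path) / "_rules.conf"
    if not rules.exists():
        rules.write_text("CAT.ALL")  # default content described above
    return rules
```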

7. prompt_metadata()

  • Purpose:
    Collects metadata from the user required to create a new repository.
  • Collected Metadata Includes:
    • Basic Information: Title, description, creator, date, language, license URL.
    • Collection: The user selects one from a predefined list (e.g., community, opensource, texts, movies, audio, image, etree, folksoundomy, games, software).
    • Subject Tags: A comma-separated list, such as "music, history".
    • Test Item Flag: A flag to indicate if the repository is a test item (if "yes" is entered, it sends "true"; if "no", the field is omitted).
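
How the answers might be turned into a metadata dictionary is sketched below. The dictionary keys, including test_item, and the function name are illustrative assumptions; the script's actual field names are not shown here.

```python
def build_metadata(answers, subjects, is_test_item):
    """Combine prompt answers into one metadata dict.

    answers:      dict of basic fields (title, description, creator, ...)
    subjects:     comma-separated tags, e.g. "music, history"
    is_test_item: the user's "yes"/"no" answer
    """
    metadata = {key: value for key, value in answers.items() if value}
    tags = [tag.strip() for tag in subjects.split(",") if tag.strip()]
    if tags:
        metadata["subject"] = tags
    # The flag is sent as "true" for "yes" and omitted entirely for "no".
    if is_test_item.strip().lower() in ("yes", "y"):
        metadata["test_item"] = "true"  # hypothetical key name
    return metadata
```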

8. initialize_repository(folder_path, identifier, metadata, mode)

  • Purpose:
    Uploads all files from a specified folder as a new repository.
  • Key Features:
    • Mode Selection:
      • Test Mode: Simulates the upload process without transferring any files.
      • Permanent Mode: Uploads each file via HTTP PUT requests.
    • Recursive Upload: Iterates through all files in the folder (using os.walk).
    • Metadata Submission: After file uploads, sends repository metadata via a POST request.
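
The walk-and-upload loop can be sketched as follows. This simplified version leaves out the metadata submission, takes the upload step as a hypothetical upload callable, and uses the strings "test" and "permanent" for the two modes, all of which are assumptions.

```python
import os


def plan_uploads(folder_path):
    """Return (local_path, remote_key) pairs for every file under folder_path."""
    planned = []
    for root, _dirs, files in os.walk(folder_path):
        for name in sorted(files):
            local = os.path.join(root, name)
            # Remote keys always use "/" regardless of the local separator.
            remote = os.path.relpath(local, folder_path).replace(os.sep, "/")
            planned.append((local, remote))
    return planned


def initialize_repository(folder_path, identifier, mode, upload):
    """Upload every planned file, or only report it when mode is "test"."""
    uploaded = []
    for local, remote in plan_uploads(folder_path):
        if mode == "test":
            print(f"[TEST] would upload {local} -> {identifier}/{remote}")
            continue
        upload(identifier, local, remote)
        uploaded.append(remote)
    return uploaded
```

Separating the plan from the upload means Test Mode and Permanent Mode walk exactly the same file list.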

9. print_help()

  • Purpose:
    Displays a help message describing each menu option and its usage.
  • Features:
    Provides detailed instructions for each operation and references the official Internet Archive CLI documentation for further details.

10. main()

  • Purpose:
    Serves as the entry point of the script with an interactive menu.
  • Key Features:
    • Main Menu: Displays options for uploading, listing, deleting, moving files, creating a repository, or viewing help.
    • Conditional Prompts:
      • For Existing Repositories (options 1–4): Prompts for the repository URL after the option selection.
      • For Folder-based Repository Creation (option 5): Gathers folder path, mode, and metadata.
    • Action Dispatch: Calls the corresponding function based on the user’s selection.
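
One common way to write this kind of dispatch is a dictionary that maps menu choices to functions. The sketch below follows the option numbering described above, but the names and the string keys are assumptions, not the script's own code.

```python
EXISTING_REPO_OPTIONS = {"1", "2", "3", "4"}  # operate on an existing repository
CREATE_REPO_OPTION = "5"                      # folder-based repository creation


def prompts_for(choice):
    """Return which inputs the menu needs to collect for a given choice."""
    if choice in EXISTING_REPO_OPTIONS:
        return ["repository URL"]
    if choice == CREATE_REPO_OPTION:
        return ["folder path", "mode", "metadata"]
    return []


def dispatch(choice, actions):
    """Call the function registered for choice; report unknown choices."""
    action = actions.get(choice)
    if action is None:
        print("Invalid option:", choice)
        return None
    return action()
```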

Creator Notes

This script was built using Microsoft Copilot and 5 hours of Harry Munday's lifespan. Enjoy.
