Developer Guide
The Discourse Analysis Tool Suite (DATS) is developed as a client-server application and is deployed via docker-compose. Since backend and frontend development are very different, this guide is split into two parts.
The setup described here uses docker-compose to run auxiliary software like PostgreSQL, Redis, etc., while running the backend and/or frontend outside of docker in a development environment.
You will need a Linux machine with Docker, ideally with the NVIDIA Container Toolkit for GPU support. Other operating systems are not supported!
- Install VSCode for development. We curated a list of extensions that are required for developing DATS and wrote several launch configurations that help with development.
- Clone this repository: `git clone git@github.com:uhh-lt/dats.git`
- Install the recommended VSCode plugins. Once you open the dats folder, a popup asks if the recommended extensions should be installed automatically. Answer with yes.
- Install the Python environment. We use Micromamba to manage Python versions.
  - See the micromamba install instructions here. We use these commands: `curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba`, then `./bin/micromamba shell init -s bash -r ~/micromamba`. Make sure you install micromamba in your home directory (`cd ~`). After installing, restart the terminal; otherwise, the new `micromamba` command is not recognized.
  - We recommend adding an alias to your `.bashrc` (use `nano ~/.bashrc` to edit the file): `alias mm='micromamba'`
  - Install the environment: First navigate to the backend with `cd backend`, then install the dats environment with `micromamba env create -f environment.yml`
  - Activate the environment with `micromamba activate dats`
  - Install additional dependencies from our ray_model_worker service: `pip install -r src/app/preprocessing/ray_model_worker/requirements.txt`
  - Finally, install the ray dependency: `pip install ray==2.32.0`
  - Set the newly created dats Python environment as the Python Interpreter in VS Code. Use the shortcut CTRL/CMD + SHIFT + P and type "Select Interpreter". From the dropdown, select the micromamba dats environment.
- Install the Node.js environment. We use the node version manager (nvm).
  - See the nvm install instructions here. We use this command: `curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash`. Make sure you install nvm in your home directory (`cd ~`). After installation, restart the terminal; otherwise, the new `nvm` command is not recognized.
  - Install Node 20.17.0: `nvm install 20.17.0`
  - Install the environment: First navigate to the frontend with `cd frontend`, then install the dependencies with `npm install`
- Install pre-commit. We use pre-commit for automatic code linting of the backend. (Prettier is used for the frontend.)
  - Install the pre-commit dependency with `pip install pre-commit`
  - Initialize pre-commit with `pre-commit install`
- Set up the environment files. We use `env.example` files in the backend, frontend, and docker directories to configure various settings, including, for example, port mappings, user IDs, or the project name. To ease the process, we provide a simple script: `./bin/setup-envs.sh --project_name <user_name>-dats --port_prefix <3 digits>`. Please adjust the port_prefix accordingly and take care to prevent conflicts with other services running on your machine.
Done! Now you have everything installed to get started with DATS development.
We start by setting up the services running inside docker containers:
- Configure the docker containers. The `.env` file was created automatically by the previously executed `setup-envs.sh` script. You can check that file and make adjustments.
  - Adjust `COMPOSE_PROFILES` to change which services will be started. For development, we recommend removing the backend and frontend from this list; we won't need them because we will start the backend and frontend ourselves outside of docker. Add `development` to the list.
  - We recommend using `RAY_CONFIG=config_cpu.yaml` for development.
- Run the script `./setup-folders.sh` to automatically create directories for storing and caching ML models as well as uploaded documents.
- Run `docker compose up -d` to start all docker containers. Use `docker compose ps` to check that all containers are running. On the first start, this will take quite a while, as the containers will download some large AI models.
For a full restart of the docker containers, we recommend the following procedure:
- `docker compose down -v` - Stop all containers and remove the volumes.
- `rm -r backend_repo` - Remove all uploaded user data.
- `../bin/setup-folders.sh` - Create the backend_repo folder again.
- `docker compose pull` - Download the latest docker images of DATS.
- `docker compose up -d` - Restart all docker containers.
The backend is written in Python and is organized into directories, each responsible for different tasks.
- `backend/src/api` -> REST API endpoints consumed by the frontend (or other clients)
- `backend/src/app/core` -> core backend logic such as the data model, internal service modules, and search and analysis functionality
- `backend/src/app/docprepro` -> document preprocessing logic
- `backend/src/test` -> unit, integration, and e2e tests
- `backend/src/configs` -> configuration files to customize the backend behavior (handle with care!)
- Configure the backend. The `.env` file was created automatically by the previously executed `setup-envs.sh` script. You can check that file and make adjustments.
- In VSCode, navigate to 'Run and Debug' and launch 'backend'.
This section describes deploying the backend API to production.
One container image is built for both the backend API and the celery background-jobs worker, as they share the same dependencies.
The startup command (entrypoint) decides whether the container serves the API or runs as a worker.
The different entrypoints can be found at `backend/src/*_entrypoint.sh`, where every script corresponds to precisely one worker type.
The backend source code is copied into the container, and so are the config files in `backend/src/configs`, which can be altered to configure various settings.
- Build docker container: `docker build -f Dockerfile -t uhhlt/dats_backend:{version} .`
- Push docker container: `docker push uhhlt/dats_backend:{version}`
The Data Model (DM) (located at `backend/src/app/core/data/`) is the core of DATS and represents all of its entities. The DM is based on sqlalchemy and pydantic. Although sqlalchemy is (mostly) database agnostic, the intended database for the DATS DM is PostgreSQL.
The entities (i.e. database tables) are based on `sqlalchemy` and are located in `backend/src/app/core/data/orm` -- ORM stands for Object-Relational Mapping and defines the bridge between Python classes and objects and the database. To perform Create-Read-Update-Delete (CRUD) operations, we use the interfaces defined in `backend/src/app/core/data/crud`. To transfer entities between the frontend and the backend (and often also internally within the backend), we make use of DTOs (Data Transfer Objects) based on `pydantic`, which are located in `backend/src/app/core/data/dto`.
In this guide, you will learn all steps necessary to extend the DATS Data Model.
Please always have a look at implementations of other existing entities for examples and coding/naming conventions!
To define the ORM, first create a new file in `backend/src/app/core/data/orm`. For examples and further help, have a look at other files defined in this directory and the `sqlalchemy` documentation.
If you need an ObjectHandle (basically a pointer) for that entity (e.g. to enable Memos for that entity), you also have to extend the ObjectHandleORM accordingly.
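As a minimal sketch, a new ORM file could look like the following, assuming the sqlalchemy 2.0 declarative style. The `Tag` entity, its columns, and the `ORMBase` import path are invented for illustration; check the existing ORM files for the project's actual base class and conventions:

```python
# backend/src/app/core/data/orm/tag.py -- hypothetical example entity
from sqlalchemy.orm import Mapped, mapped_column

from app.core.data.orm.orm_base import ORMBase  # hypothetical import of the declarative base


class TagORM(ORMBase):
    __tablename__ = "tag"

    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    name: Mapped[str] = mapped_column(index=True)
```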
To register the ORM, import the class in the `import_all_orms` module (`backend/src/app/core/data/db/import_all_orms.py`). Then tell alembic to look at that file and generate a migration for models that are not present in the database yet: `alembic revision --autogenerate -m "short description"`
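Continuing the hypothetical `Tag` example, registering the ORM boils down to a single import (the module path mirrors the sketch above and is illustrative):

```python
# backend/src/app/core/data/db/import_all_orms.py
from app.core.data.orm.tag import TagORM  # noqa: F401 -- hypothetical entity from above
```

The subsequent `alembic revision --autogenerate` run then detects the new table and generates a migration for it.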
To define the DTOs, first create a new file in `backend/src/app/core/data/dto` with the same filename as the ORM file. In that file, add the DTOs as separate classes that inherit from each other (if possible).
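A minimal sketch of such a DTO file, assuming pydantic v2 (with pydantic v1, `orm_mode` and `from_orm` would take the place of `from_attributes`); the `Tag` DTOs and their fields are invented for illustration:

```python
# backend/src/app/core/data/dto/tag.py -- hypothetical example entity
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class TagBase(BaseModel):
    name: str = Field(description="Name of the Tag")


class TagCreate(TagBase):
    pass


class TagUpdate(BaseModel):
    name: Optional[str] = Field(default=None, description="Name of the Tag")


class TagRead(TagBase):
    # from_attributes lets pydantic build this DTO directly from an ORM instance
    model_config = ConfigDict(from_attributes=True)
    id: int = Field(description="ID of the Tag")
```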
To implement the CRUD object, first create a new file in `backend/src/app/core/data/crud` with the same filename as the ORM and DTO files. In that file, create a CRUD class that inherits from `CRUDBase` with the DTOs from the previous step. This already provides basic CRUD operations. If you need to customize or add operations, implement the respective methods in this class.
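A minimal sketch of such a CRUD file, assuming `CRUDBase` is generic over the ORM class and the create/update DTOs; the exact generics and constructor of the project's `CRUDBase` may differ, so check the existing classes in `backend/src/app/core/data/crud` for the real signature:

```python
# backend/src/app/core/data/crud/tag.py -- hypothetical example entity
from typing import Optional

from sqlalchemy.orm import Session

from app.core.data.crud.crud_base import CRUDBase  # import path is illustrative
from app.core.data.dto.tag import TagCreate, TagUpdate
from app.core.data.orm.tag import TagORM


class CRUDTag(CRUDBase[TagORM, TagCreate, TagUpdate]):
    # CRUDBase already provides the basic create/read/update/delete operations;
    # add custom queries here, e.g. a lookup by name:
    def read_by_name(self, db: Session, name: str) -> Optional[TagORM]:
        return db.query(TagORM).filter(TagORM.name == name).first()


crud_tag = CRUDTag(TagORM)
```

The module-level `crud_tag` instance follows the common pattern of exposing one shared CRUD object per entity.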
The frontend is written in TypeScript using React and Vite. It is organized into directories, each responsible for different tasks.
- `frontend/src/api` -> custom TanStack Query hooks to communicate with the backend via the automatically generated API client in `openapi`.
- `frontend/src/components` -> reusable UI components
- `frontend/src/features` -> components that implement logic and access the API to build a feature. These features are used across the app and are not specific to a certain route.
- `frontend/src/layouts` -> different page layouts used by the routing library.
- `frontend/src/plugins` -> configuration files to customize the behavior of various plugins.
- `frontend/src/router` -> configuration of the React Router.
- `frontend/src/store` -> configuration of the global store - Redux Toolkit.
- `frontend/src/view` -> the main directory of the app. Every subfolder corresponds to a route. All features specific only to the corresponding route are implemented here.
- Configure the frontend. The `.env` file was created automatically by the previously executed `setup-envs.sh` script. You can check that file and make adjustments.
- Run `npm run dev` to start the frontend. Check `https://localhost:{PORT}` to see if everything is working.
- Run `npm run tsc-watch` in a separate terminal to watch for errors during development.
When deploying the frontend, the React code is first bundled and then served via an NGINX web server.
In production mode, the NGINX web server also acts as a reverse proxy for communication with the backend.
`frontend/.env.production` is configured to make the backend API available at `/api` and the content server available at `/content`.
The matching configuration of the NGINX web server is located at `/docker/nginx.conf`.
- Build docker container: `docker build -f Dockerfile -t uhhlt/dats_frontend:{version} .`
- (optional) Push docker container: `docker push uhhlt/dats_frontend:{version}`
The backend uses FastAPI to serve an accessible API that follows the OpenAPI standard. We consume this API by automatically generating an OpenAPI client with OpenAPI Typescript Codegen and building hooks around it with TanStack Query, which can be conveniently used in all components.
In case the backend is updated and offers new API endpoints, the following steps must be performed to make them available in the frontend:
- `npm run update-api` - download the new OpenAPI specification of the backend.
- `npm run generate-api` - generate the new client. This command deletes everything in `frontend/src/api/openapi`, generates new code, and formats it with prettier.
- `npm run dev` - start the frontend again.
- Implement a new hook in `frontend/src/api`.
We use Semantic Versioning as explained below:
Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes
- MINOR version when you add functionality in a backward compatible manner
- PATCH version when you make backward compatible bug fixes
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
For reference, see https://semver.org/.
In the root of the repository, run: `bin/release.sh 1.0.3`
This will update all necessary files, create a commit and matching tag, and push it. The GitHub action will then take care of building containers and creating a GitHub release.