Developer guide#
The SIRA application consists of:
- Flask (Python) backend application
- React (Javascript) frontend application
- Postgresql database
- Python daemon for observing the database changes and reacting accordingly
Deployment and development are supported by Docker containers, so some basic knowledge of managing Docker containers is required. The container configuration is done through docker-compose.
Docker and docker-compose are also the only software requirements.
Due to the nature of the application, the hardware recommendations are:
- 16GB RAM
- 2 CPUs
- 30GB disk space
High RAM usage is a result of loading the word-vectors model, used for text representation, into memory, and of the way Docker image building works.
High disk space usage is a result of the word-vectors model, the data that is loaded into the database, and the generated Docker images. If less disk space is available, frequent cleaning of unused Docker images, containers and volumes is necessary.
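On machines with little free disk space, the standard Docker prune commands can be collected into a small cleanup helper. This is a sketch, not part of the repository; the script name is illustrative:

```shell
# Hypothetical helper collecting the standard Docker prune commands.
# Each command only removes *unused* objects; running containers and the
# images/volumes they use are left alone.
cat > docker-cleanup.sh <<'EOF'
#!/bin/sh
docker container prune --force    # remove stopped containers
docker image prune --all --force  # remove images not used by any container
docker volume prune --force       # remove volumes not used by any container
EOF
chmod +x docker-cleanup.sh
```

Note that pruning also removes the downloaded application images, so the next docker-compose up will rebuild or re-pull them.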
Running development version#
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml up -d
It is recommended to use Ctrl+F5 to reload the page during
development. Otherwise the static files are likely to be loaded
from the cache.
Backend#
All the application logic is located in the ./email_app directory. Dependency
and virtual-environment management is done with the poetry tool.
The tests for the backend logic are located in the ./tests directory and split further
by module. pytest is used as the testing framework.
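A minimal sketch of running the backend tests, assuming poetry is available (e.g. inside the app container); the helper script name is illustrative:

```shell
# Illustrative helper: install dependencies with poetry, then run the
# pytest suite in ./tests. Both commands are the standard poetry/pytest CLI.
cat > run-backend-tests.sh <<'EOF'
#!/bin/sh
poetry install            # create/update the virtual environment
poetry run pytest tests   # run the backend test suite in ./tests
EOF
chmod +x run-backend-tests.sh
```

Individual modules can be run by passing their path to pytest, or filtered with pytest's -k expression flag.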
Frontend#
All JavaScript code is located in the ./email_app_frontend
directory. The main entry point is index.js.
Stylesheet and other static content location#
While all the JavaScript files are located in ./email_app_frontend, other
static content is located in ./email_app/src/dist. This simplifies
the deployment.
The ./email_app/src/dist/main.js file is generated by webpack and
should not be edited.
npm is used as the package manager.
Database#
PostgreSQL is used for the database. Python access is provided through the
sqlalchemy package.
To support migrations, the alembic package is used.
The migration-related scripts are located in the ./migrations directory.
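As a sketch, assuming the standard alembic CLI run from the directory containing the alembic configuration (e.g. inside the app container), a typical migration cycle looks like this; the script name and revision message are illustrative:

```shell
# Illustrative helper wrapping the standard alembic workflow.
cat > new-migration.sh <<'EOF'
#!/bin/sh
# generate a new revision script (placed under the migrations directory)
alembic revision --autogenerate -m "describe the schema change"
# apply all pending migrations up to the newest revision
alembic upgrade head
EOF
chmod +x new-migration.sh
```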
See Databases section for more information on databases.
Scripts#
During development, several useful scripts were written to ease everyday project management.
Scripts are located in the ./scripts directory and further divided into:
- Local scripts: used for managing application through docker commands
- Docker scripts: used for managing the application within docker
Data#
Some of the initial data used to fill the database is downloaded
into the ./db_data directory before the application starts.
See Databases for additional information.
Another large piece of data used for running the application is the word vectors, located in the word_vectors
directory. The gzipped word-vectors file amounts to nearly 4GB and is downloaded before the application starts.
After inflating the gzipped file, a nearly 7GB file is obtained.
Because of these data-heavy requirements, the initial application run may take some time and a significant amount of disk space.
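The inflation step is the ordinary gzip workflow; a sketch on a tiny stand-in file (the real word-vectors file is of course much larger):

```shell
# Demonstrate the decompression step on a tiny stand-in for the word-vectors
# file. -d decompresses; -k keeps the .gz archive so it is not re-downloaded.
printf 'stand-in vector data\n' > word_vectors.txt
gzip -f word_vectors.txt        # produces word_vectors.txt.gz
gzip -dk word_vectors.txt.gz    # inflates back to word_vectors.txt
inflated=$(cat word_vectors.txt)
```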
Cleaning the dev setup#
Sometimes you may have to completely clean your dev setup. You can do this in two ways:
All-in-one clean#
The docker-compose tool comes with a command to quickly clean everything related to the services specified in the docker-compose files.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml down --rmi 'all' --volumes
One-by-one clean#
If problems occur using the all-in-one approach, then follow these steps.
Stop containers#
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml down
Remove containers and images#
The names could be different; check which containers were just stopped.
You can list the stopped containers with docker ps -a.
docker rm nicelabel-email-app-app nicelabel-email-app-db
docker rmi email-app_app
Remove volumes#
You have to remove the volumes used by the service containers.
docker volume rm email-app_nicelable-email-app-data email-app_nicelabel-runtime-data
The name could be different on your machine. Check the docker volumes created with
docker volume ls
and search for the appropriate name.
Rebuild services#
If you have not logged in to the image registry yet, first run docker login registry-dis.ijs.si and enter your credentials.
To rebuild the app without cache use the --no-cache and --pull flags.
Due to the unpredictable nature of the Internet and package providers, you may have to issue the command multiple times.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml build --no-cache --pull
Start the services#
It is useful to start the services using the -d flag, which runs the services in
the background. This way, you cannot accidentally stop them.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml up -d
View the logs in another terminal#
If you have started the services with the -d flag as suggested above,
then you can view the logs in another terminal using the following command.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml logs -f
Add --tail NUM_LINES option to only see the last NUM_LINES lines of the log.
If you want to view the logs of a particular service only, specify the service at the end of the command:
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml logs -f app
Populate the DB with query columns#
When the server starts, the DB should already be populated with the columns used
for querying. This is done in the events container and may take some time.
If that is not the case, you can populate them manually with the following steps:
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml exec app bash
poetry install
python scripts/docker/run_clean_all.py
This will take some time. Search across all issues will not work properly until every issue has been processed.
Example of issue id with results: 76746
Development images in CI/CD#
If you modify the Docker images and would like to perform tests with them using the CI/CD, you can do the following.
First, build and push the images to the remote image repository.
./scripts/local/db-build-push.sh <YOUR-TAG>
./scripts/local/app-build-push.sh <YOUR-TAG>
YOUR-TAG should not be latest.
Use the latest tag only just before merging the merge request; otherwise
the CI/CD may break.
Second, update your .gitlab-ci.yml file to use the newly pushed images.
Just before the merge request is successfully merged, one should also issue the following commands:
./scripts/local/app-build-push.sh latest
./scripts/local/db-build-push.sh latest
These commands update the latest docker images in the docker image repository.
Databases#
Development Database#
The database will be downloaded from the OwnCloud on first start and put into the ./db_data directory.
If you want a smaller database for faster set-up, you can use the test database.
See ./run-test.sh for inspiration on how to download that smaller data.
Test Database#
The database will be downloaded from the OwnCloud during the test start-up and put into the correct place.
Test Database generation#
In order to make a test database, which is a subsample of the dev database, you first have to have the dev database running.
After that, exec into the database container and use psql to access the database:
docker-compose exec db psql -U postgres
The following commands were issued in order to retrieve a subsample of the data:
- In order to subsample issues:
copy (SELECT id from issues OFFSET 250 LIMIT 50) to '/tmp/issues_small.csv' DELIMITER ',' CSV HEADER;
- In order to subsample related emails:
copy (select id,issue_id,sender,subject,full_text,sent_time from emails WHERE issue_id IN (SELECT id from issues OFFSET 250 LIMIT 50)) to '/tmp/emails_small.csv' DELIMITER ',' CSV HEADER;
Now get the data out of the container (cd into the directory where you want the data to be placed):
docker cp nicelabel-email-app-db:/tmp/issues_small.csv issues_small.csv
docker cp nicelabel-email-app-db:/tmp/emails_small.csv emails_small.csv
And update the owner (cd into the directory with the data):
sudo chown $USER:$USER *
Now you have to replace the first line (header) in the emails_small.csv file:
-original_id,issue_id,sender,subject,full_text,sent_time
+original_id,issue_id,sender,subject,full_text,time
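One way to do the replacement, assuming GNU sed; the two-line sample file below stands in for the real dump:

```shell
# Rewrite the CSV header: rename the sent_time column to time.
# A small sample file stands in for the real emails_small.csv.
printf '%s\n' 'original_id,issue_id,sender,subject,full_text,sent_time' \
    '1,42,alice@example.com,Hello,Body text,2020-01-01' > emails_small.csv
sed -i '1s/,sent_time$/,time/' emails_small.csv
new_header=$(head -n 1 emails_small.csv)
# new_header is now: original_id,issue_id,sender,subject,full_text,time
```

On macOS/BSD sed, use sed -i '' instead of sed -i.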
Random notes#
Frontend#
s2p means "map state to properties". Functions that start with this
prefix take the current value of the store, usually referred to as state, and
return an object whose properties are accessible through the
this.props object in the component the function is connected to.
When components are defined, things often look like:
class Component extends React.Component ...
function s2pComponent ...
const ComponentC = connect(s2pComponent)(Component)
The ComponentC in this case is the connected component and the one
to be used in other components; the C at the end stands for
connected. When a component is exported as default, the last line
above is replaced with
export default connect(s2pComponent)(Component);
and no ComponentC is defined.
When e is an argument to any function, it stands for a JavaScript
event.
React + Redux overview#
React uses components that are capable of rerendering when their
properties change. The method called to render a component is called
render.
Redux is a tool to keep all of the application state in a single place
called the store. To change the state, use store.dispatch(action),
where action can be any object with a type property. This action is
then actualized through a "reducer" (a function that takes the old
state and an action and returns the new state), which has the final
effect of updating the state. Components see the state through "map
state to properties" functions. The role of these functions is to take
the state and produce the properties of a component that are
changeable and should be synchronized from the state.
In order to enable asynchronous actions, we are using the thunk
middleware. Basically, this allows us to pass functions into
store.dispatch. The function passed should accept two arguments:
- dispatch: a copy of the store.dispatch function, passed so it can be used in callbacks to change the state after the async action is finished
- getState: a function that, called without arguments, returns the current state
This is only ever used for AJAX.
Releases#
Releases can be tracked at the Releases page.
Making a release#
In order to make a release, the following steps are required:
- Make a new merge request titled Release YYYY.MM.DD.
- Update the CHANGELOG.md file by renaming the section Current into the release version YYYY.MM.DD.
- Generate docker image artifacts from the source code using the following script: ./scripts/local/release-build-publish.sh YYYY.MM.DD
- Commit and push the changes to the remote repository.
- Accept the merge request.
- Go to the Releases page and click New release.
- Set the Tag name to the release name YYYY.MM.DD and copy the related changelog from the CHANGELOG.md to the Release notes field.
- Click Create tag and the release is done.
Troubleshooting#
Send an email to one of the JSI - DIS contact points for help with troubleshooting.
FAQ#
This section will list frequently asked questions and issues, with provided answers.