Developer guide#
The SIRA application consists of:
- Flask (Python) backend application
- React (Javascript) frontend application
- Postgresql database
- Python daemon for observing the database changes and reacting accordingly
Deployment and development are supported by Docker containers, so some basic knowledge of managing Docker containers is required. The container configuration is done through docker-compose.
Docker and docker-compose are also the only software requirements.
Due to the nature of the application, the hardware recommendations are:
- 16GB RAM
- 2 CPUs
- 30GB disk space
High RAM usage is a result of loading the word-vectors model, used for text representation, into memory, and of the way Docker image building works.
High disk space usage is a result of the word-vectors model, the data that is loaded into the database, and the generated Docker images. If less disk space is available, frequent cleaning of unused Docker images, containers and volumes is necessary.
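On machines with little free disk space, the standard Docker prune commands can be collected into a small cleanup helper. This is a sketch, not part of the repository; the script name is illustrative:

```shell
# Hypothetical helper collecting the standard Docker prune commands.
# Each command only removes *unused* objects; running containers and the
# images/volumes they use are left alone.
cat > docker-cleanup.sh <<'EOF'
#!/bin/sh
docker container prune --force    # remove stopped containers
docker image prune --all --force  # remove images not used by any container
docker volume prune --force       # remove volumes not used by any container
EOF
chmod +x docker-cleanup.sh
```

Note that pruning also removes the downloaded application images, so the next docker-compose up will rebuild or re-pull them.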
Running development version#
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml up -d
It is recommended to use Ctrl+F5 to reload the page during
development. Otherwise the static files are likely to be loaded
from the cache.
Backend#
All the application logic is located in the ./email_app directory. Dependency
and virtual-environment management is done with the poetry tool.
The tests for the backend logic are located in the ./tests directory and split further
by module. pytest is used as the testing framework.
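A minimal sketch of running the backend tests, assuming poetry is available (e.g. inside the app container); the helper script name is illustrative:

```shell
# Illustrative helper: install dependencies with poetry, then run the
# pytest suite in ./tests. Both commands are the standard poetry/pytest CLI.
cat > run-backend-tests.sh <<'EOF'
#!/bin/sh
poetry install            # create/update the virtual environment
poetry run pytest tests   # run the backend test suite in ./tests
EOF
chmod +x run-backend-tests.sh
```

Individual modules can be run by passing their path to pytest, or filtered with pytest's -k expression flag.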
Frontend#
All JavaScript code is located in the ./email_app_frontend
directory. The main entry point is index.js.
Stylesheet and other static content location#
While all the JavaScript files are located in ./email_app_frontend, other
static content is located in ./email_app/src/dist. This simplifies
the deployment.
The ./email_app/src/dist/main.js file is generated by webpack and
should not be edited.
npm is used as the package manager.
Database#
PostgreSQL is used for the database. Python access is provided through the
sqlalchemy package.
To support migrations, the alembic package is used.
The migration-related scripts are located in the ./migrations directory.
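As a sketch, assuming the standard alembic CLI run from the directory containing the alembic configuration (e.g. inside the app container), a typical migration cycle looks like this; the script name and revision message are illustrative:

```shell
# Illustrative helper wrapping the standard alembic workflow.
cat > new-migration.sh <<'EOF'
#!/bin/sh
# generate a new revision script (placed under the migrations directory)
alembic revision --autogenerate -m "describe the schema change"
# apply all pending migrations up to the newest revision
alembic upgrade head
EOF
chmod +x new-migration.sh
```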
See Databases section for more information on databases.
Scripts#
During development, several useful scripts were written to ease everyday project management.
Scripts are located in the ./scripts directory and further divided into:
- Local scripts: used for managing application through docker commands
- Docker scripts: used for managing the application within docker
Data#
Some of the initial data used to fill the database is downloaded
into the ./db_data directory before the application starts.
See Databases for additional information.
Another large piece of data used for running the application is the word vectors, located in the word_vectors
directory. The gzipped word-vectors file amounts to nearly 4GB and is downloaded before the application starts.
After inflating the gzipped file, a nearly 7GB file is obtained.
Because of these data-heavy requirements, the initial application run may take some time and a significant amount of disk space.
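The inflation step is the ordinary gzip workflow; a sketch on a tiny stand-in file (the real word-vectors file is of course much larger):

```shell
# Demonstrate the decompression step on a tiny stand-in for the word-vectors
# file. -d decompresses; -k keeps the .gz archive so it is not re-downloaded.
printf 'stand-in vector data\n' > word_vectors.txt
gzip -f word_vectors.txt        # produces word_vectors.txt.gz
gzip -dk word_vectors.txt.gz    # inflates back to word_vectors.txt
inflated=$(cat word_vectors.txt)
```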
Cleaning the dev setup#
Sometimes you may have to completely clean your dev setup. You can do this in two ways:
All-in-one clean#
The docker-compose tool comes with a command to quickly clean everything related to the services specified in the docker-compose files.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml down --rmi 'all' --volumes
One-by-one clean#
If problems occur using the all-in-one approach, then follow these steps.
Stop containers#
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml down
Remove containers and images#
The names could be different; check which containers were just stopped.
You can list the stopped containers with docker ps -a.
docker rm nicelabel-email-app-app nicelabel-email-app-db
docker rmi email-app_app
Remove volumes#
You have to remove the volumes used by the service containers.
docker volume rm email-app_nicelable-email-app-data email-app_nicelabel-runtime-data
The name could be different on your machine. Check the docker volumes created with
docker volume ls
and search for the appropriate name.
Rebuild services#
If you have not logged in to the image registry yet, first run docker login registry-dis.ijs.si and enter your credentials.
To rebuild the app without cache use the --no-cache and --pull flags.
Due to the unpredictable nature of the Internet and package providers, you may have to issue the command multiple times.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml build --no-cache --pull
Start the services#
It is useful to start the services using the -d flag, which runs the services in
the background. This way, you cannot accidentally stop them.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml up -d
View the logs in another terminal#
If you have started the services with the -d flag as suggested above,
then you can view the logs in another terminal using the following command.
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml logs -f
Add --tail NUM_LINES option to only see the last NUM_LINES lines of the log.
If you want to view the logs of a particular service only, specify the service at the end of the command:
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml logs -f app
Populate the DB with query columns#
When the server starts, the DB should already be populated with the columns used
for querying. This is done in the events container and may take some time.
If that is not the case, you can populate them manually with the following steps:
docker-compose -f docker-compose.yaml -f docker-compose-dev.yaml exec app bash
poetry install
python scripts/docker/run_clean_all.py
This will take some time. Search across all issues will not work properly until every issue has been processed.
Example of issue id with results: 76746
Development images in CI/CD#
If you modify the Docker images and would like to perform tests with them using the CI/CD, you can do the following.
First, build and push the images to the remote image repository.
./scripts/local/db-build-push.sh <YOUR-TAG>
./scripts/local/app-build-push.sh <YOUR-TAG>
YOUR-TAG should not be latest.
Use the latest tag only just before merging the merge request; otherwise
the CI/CD may break.
Second, update your .gitlab-ci.yml file to use the newly pushed images.
Just before the merge request is successfully merged, one should also issue the following commands:
./scripts/local/app-build-push.sh latest
./scripts/local/db-build-push.sh latest
These commands update the latest docker images in the docker image repository.
Databases#
Development Database#
The database will be downloaded from the OwnCloud on first start and put into the ./db_data directory.
If you want a smaller database for faster set-up, you can use the test database.
See ./run-test.sh for inspiration on how to download that smaller data.
Test Database#
The database will be downloaded from the OwnCloud during the test start-up and put into the correct place.
Test Database generation#
In order to make a test database, which is a subsample of the dev database, you first have to have the dev database running.
After that, exec into the database container and use psql to access the database:
docker-compose exec db psql -U postgres
The following commands were issued in order to retrieve a subsample of the data:
- In order to subsample issues:
copy (SELECT id from issues OFFSET 250 LIMIT 50) to '/tmp/issues_small.csv' DELIMITER ',' CSV HEADER;
- In order to subsample related emails:
copy (select id,issue_id,sender,subject,full_text,sent_time from emails WHERE issue_id IN (SELECT id from issues OFFSET 250 LIMIT 50)) to '/tmp/emails_small.csv' DELIMITER ',' CSV HEADER;
Now get the data out of the container (cd into the directory where you want the data to be placed):
docker cp nicelabel-email-app-db:/tmp/issues_small.csv issues_small.csv
docker cp nicelabel-email-app-db:/tmp/emails_small.csv emails_small.csv
And update the owner (cd into the directory with the data):
sudo chown $USER:$USER *
Now you have to replace the first line (header) in the emails_small.csv file:
-original_id,issue_id,sender,subject,full_text,sent_time
+original_id,issue_id,sender,subject,full_text,time
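One way to do the replacement, assuming GNU sed; the two-line sample file below stands in for the real dump:

```shell
# Rewrite the CSV header: rename the sent_time column to time.
# A small sample file stands in for the real emails_small.csv.
printf '%s\n' 'original_id,issue_id,sender,subject,full_text,sent_time' \
    '1,42,alice@example.com,Hello,Body text,2020-01-01' > emails_small.csv
sed -i '1s/,sent_time$/,time/' emails_small.csv
new_header=$(head -n 1 emails_small.csv)
# new_header is now: original_id,issue_id,sender,subject,full_text,time
```

On macOS/BSD sed, use sed -i '' instead of sed -i.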
Random notes#
Frontend#
s2p means "map state to properties". Functions that start with this
prefix take the current value of the store, usually referred to as state, and
return an object whose properties are accessible through the
this.props object in the component the function is connected to.
When components are defined, things often look like:
class Component extends React.Component ...
function s2pComponent ...
const ComponentC = connect(s2pComponent)(Component)
The ComponentC in this case is the connected component and the one
to be used in other components; the C at the end stands for
connected. When a component is exported as default, the last line
above is replaced with
export default connect(s2pComponent)(Component);
and no ComponentC is defined.
When e is an argument to any function, it stands for a JavaScript
event.
React + Redux overview#
React uses components that are capable of rerendering when their
properties change. The method called to render a component is called
render.
Redux is a tool to keep all of the application state in a single place
called the store. To change the state, use store.dispatch(action),
where action can be any object with a type property. This action is
then actualized through a "reducer" (a function that takes the old
state and an action and returns the new state), which has the final
effect of updating the state. Components see the state through "map
state to properties" functions. The role of these functions is to take
the state and produce the properties of a component that are
changeable and should be synchronized from the state.
In order to enable asynchronous actions, we are using the thunk
middleware. Basically, this allows us to pass functions into
store.dispatch. The function passed should accept two arguments:
- dispatch: a copy of the store.dispatch function, passed so it can be used in callbacks to change the state after the async action is finished
- getState: a function that, called without arguments, returns the current state
This is only ever used for AJAX.
Releases#
Releases can be tracked at the Releases page.
Making a release#
In order to make a release, the following steps are required:
- Make a new merge request titled Release YYYY.MM.DD.
- Update the CHANGELOG.md file by renaming the section Current into the release version YYYY.MM.DD.
- Generate docker image artifacts from the source code using the following script: ./scripts/local/release-build-publish.sh YYYY.MM.DD
- Commit and push the changes to the remote repository.
- Accept the merge request.
- Go to the Releases page and click New release.
- Set the Tag name to the release name YYYY.MM.DD and copy the related changelog from the CHANGELOG.md to the Release notes field.
- Click Create tag and the release is done.
Troubleshooting#
Send an email to one of the JSI - DIS contact points for help with troubleshooting.
FAQ#
This section will list frequently asked questions and issues, with provided answers.