While working with the Zooniverse, I co-led the development of the back-end API for the ALICE Text Editor Tool. ALICE allows researchers to review and update transcription data from projects such as Scribes of the Cairo Geniza and the Anti-Slavery Manuscripts. Using ALICE’s interface, researchers can view the collection of transcriptions made for each line of text in a document, see the consensus percentage for each line, and choose which transcription will be used in the final version. ALICE also provides an option to export the final transcription (or all transcriptions in a project) once the data has been reviewed.
The API is built with Ruby on Rails and uses PostgreSQL 11 as its database. My work focused mainly on the data exports and model locking functionality, along with smaller features such as sorting, filtering, and serializing transcriptions. The data export work covered the following:
- Generate a set of export files for a transcription when the transcription status changes to “approved” (when a transcription is approved, users are no longer allowed to make changes). Export files are created as temporary files, then uploaded to Blob Storage using Rails Active Storage.
- Remove the export files from Blob Storage when a transcription’s status changes from “approved” to another status, such as “in progress.”
- Add an option to download export files by project, workflow, or transcription group. All relevant files are downloaded from Blob Storage into a temporary directory, zipped, and then made available to the client.
- Restrict exporting to users with an editor or admin role on the given project.
- See data exports code here.
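The status-driven part of the export flow described above can be sketched in plain Ruby. This is a minimal illustration, not the project's actual code: `FakeBlobStore` is a hypothetical in-memory stand-in for Blob Storage (the real API uploads through Active Storage), and the `Transcription` class, the `update_status` method, the key layout, and the export file names are all assumptions made for the example. Zipping and role checks are left out.

```ruby
require "tempfile"
require "json"

# Hypothetical in-memory stand-in for Blob Storage (assumption for this sketch).
class FakeBlobStore
  def initialize
    @blobs = {}
  end

  # Store the contents of an IO object under the given key.
  def upload(key, io)
    @blobs[key] = io.read
  end

  # Remove every blob whose key starts with the given prefix.
  def delete_prefix(prefix)
    @blobs.delete_if { |key, _body| key.start_with?(prefix) }
  end

  def keys
    @blobs.keys.sort
  end
end

# Hypothetical model holding only the fields this sketch needs.
class Transcription
  attr_reader :id, :status, :lines

  def initialize(id:, lines:, store:)
    @id = id
    @lines = lines
    @store = store
    @status = "in_progress"
  end

  # Create exports on the transition into "approved"; remove them on the way out.
  def update_status(new_status)
    old_status = @status
    @status = new_status
    if new_status == "approved" && old_status != "approved"
      upload_exports
    elsif old_status == "approved" && new_status != "approved"
      @store.delete_prefix(prefix)
    end
  end

  private

  def prefix
    "transcriptions/#{id}/"
  end

  # Illustrative export contents; the file names are assumptions.
  def export_files
    {
      "consensus.txt" => lines.join("\n"),
      "metadata.json" => JSON.generate(id: id, line_count: lines.size)
    }
  end

  # Write each export to a temporary file, then upload it to the store.
  def upload_exports
    export_files.each do |name, body|
      Tempfile.create(name) do |file|
        file.write(body)
        file.rewind
        @store.upload(prefix + name, file)
      end
    end
  end
end

store = FakeBlobStore.new
transcription = Transcription.new(id: 7, lines: ["first line", "second line"], store: store)
transcription.update_status("approved")    # export files now exist in the store
transcription.update_status("in_progress") # export files are removed again
```

Tying file creation and removal to the status transition keeps the stored exports consistent with what is currently approved, which is the behaviour the bullets above describe.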
Model locking functionality was also implemented so that only one user at a time could edit a transcription. This prevents changes made by one user from being overwritten by another user who had the same transcription open at the same time.
- Add attributes to save the username of the user currently updating the transcription (“locked by” user) as well as a “lock timeout” timestamp, which is set by default to 3 hours from the point when the transcription is initially locked.
- Expose an API method to lock a transcription. This method gets called by the front-end whenever a user opens a transcription for editing, and only locks the transcription if the user has editor permission.
- Expose an API method to unlock a transcription. This method gets called by the front-end when a user navigates away from a transcription or closes the web browser.
- Serialize the “locked by” user so that the username is displayed on the transcription list view page and team members know whom to contact when a transcription is locked.
- See model locking code here.
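The locking rules listed above can be sketched as a small plain-Ruby class. This is only an illustration under stated assumptions: the class name `LockableTranscription`, the method names, and the `editor:` flag are hypothetical, and treating a lock as expired once the timeout has passed is an assumption, since the description only says that the timestamp is stored.

```ruby
# Hypothetical model; names and expiry behaviour are assumptions for this sketch.
class LockableTranscription
  LOCK_DURATION = 3 * 60 * 60 # three hours, in seconds

  attr_reader :locked_by, :lock_timeout

  def initialize
    @locked_by = nil
    @lock_timeout = nil
  end

  # A lock counts as active only while its timeout lies in the future (assumption).
  def locked?(now: Time.now)
    !@locked_by.nil? && now < @lock_timeout
  end

  # Lock for the given user. Returns false for non-editors and when
  # another user currently holds an active lock.
  def lock(username, editor:, now: Time.now)
    return false unless editor

    if locked?(now: now)
      # The holder keeps the original timeout; everyone else is refused.
      return @locked_by == username
    end

    @locked_by = username
    @lock_timeout = now + LOCK_DURATION
    true
  end

  # Only the user holding the lock may release it.
  def unlock(username)
    return false unless @locked_by == username

    @locked_by = nil
    @lock_timeout = nil
    true
  end
end

transcription = LockableTranscription.new
transcription.lock("alice", editor: true) # alice opens the transcription for editing
transcription.lock("bob", editor: true)   # refused while alice holds the lock
transcription.unlock("alice")             # alice navigates away
```

Storing a timeout alongside the username means a lock left behind by a closed laptop or a failed unlock request does not block the transcription indefinitely.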
Check out a demo video of the application.
- Code: github.com/zooniverse/tove
- Demo: https://vimeo.com/454881046
- Tools: Ruby on Rails, PostgreSQL, Azure Blob Storage, Docker