A non-exhaustive list of our open-source software:

ADAMS

A flexible workflow engine written in Java, for quickly building and maintaining data-driven, reactive workflows, easily integrated into business processes.

More...

Docker images

Most of the software that we employ for our projects is open source. Therefore, the Docker images that we maintain for these projects (including extensions/enhancements) are publicly available and ready for you to use.

More...

audio-dataset-converter

Training speech-to-text (STT) models requires data to be available in particular formats. Check out the examples for using our Python library for converting and processing audio dataset formats or even extracting speech from audio files.

More...

image-dataset-converter

Thinking about building image classification, object detection or image segmentation models? Then have a look at the examples for using our Python library for converting and processing image dataset formats. You can even incorporate (pre-)trained models to generate annotations that you can then refine further.

More...

llm-dataset-converter

Compiling quality data for a large language model (LLM) training run can be challenging when it comes from diverse sources (plain text, PDF, MS Word, Parquet, etc.). Our Python library for converting and processing LLM datasets makes this much easier, and you can do it straight from the command line.

More...

GitHub

GitHub organizations that encompass our code repositories, including libraries and Docker images:

waikato-datamining, waikato-llm