image-dataset-converter release

Based on lessons learned from our wai-annotations library, we simplified and streamlined the design of a new data processing library (though limited to just image datasets). Of course, it makes use of the latest seppl version, which also simplifies how plugins are located at runtime and during development.

The new kid on the block is called image-dataset-converter and its code is located here:

https://github.com/waikato-datamining/image-dataset-converter

Whilst it is based on wai-annotations, it already contains additional functionality.

And, of course, we also have resources demonstrating how to use the new library:

https://www.data-mining.co.nz/image-dataset-converter-examples/

XTuner Docker images available

Docker images for XTuner 0.1.18 are now available:

  • In-house registry:

    • public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-xtuner:0.1.18_cuda11.7

  • Docker hub:

    • waikatodatamining/pytorch-xtuner:0.1.18_cuda11.7

XTuner 0.1.18 now supports the just-released Llama 3 models (e.g., Meta-Llama-3-8B-Instruct).

XTuner Docker images available

XTuner is an efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM), released under the Apache 2.0 license. The advantage of this framework is that it is not tied to a specific LLM architecture, but supports multiple architectures out of the box. With the just-released version 0.2.0 of our llm-dataset-converter Python library, you can read and write the XTuner JSON format (and apply the usual filtering, of course).
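
To give an idea of the data involved, the snippet below sketches the single-turn conversation layout used by XTuner's JSON format. It is based on the XTuner documentation; treat the exact field names as an assumption and double-check them against the XTuner version you are running.

    # Sketch of XTuner's single-turn JSON layout (field names assumed from the
    # XTuner docs; verify against your XTuner version before relying on them).
    import json

    records = [
        {
            "conversation": [
                {
                    "system": "You are a helpful assistant.",
                    "input": "What is the capital of New Zealand?",
                    "output": "Wellington.",
                }
            ]
        }
    ]

    # write a tiny dataset file in that layout
    with open("xtuner_dataset.json", "w") as f:
        json.dump(records, f, indent=2)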

Here are the newly added image tags:

  • In-house registry:

    • public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-xtuner:2024-02-19_cuda11.7

  • Docker hub:

    • waikatodatamining/pytorch-xtuner:2024-02-19_cuda11.7

Of course, you can use these Docker images in conjunction with our gifr Python library for gradio interfaces as well (gifr-textgen). We have just released version 0.0.4 of the library, which is more flexible with regard to text generation: it can now send and receive the conversation history and also parse JSON responses.

Text classification support

Large language models (LLMs) for chatbots are all the rage at the moment, but there is plenty of scope for simpler tasks like text classification. Requiring fewer resources and being a lot faster is a nice bonus as well.

We turned the HuggingFace example for sequence classification into a docker image to make it easy to build such classification models.

  • In-house registry:

    • public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-huggingface-transformers:4.36.0_cuda11.7_classification

  • Docker hub:

    • waikatodatamining/pytorch-huggingface-transformers:4.36.0_cuda11.7_classification
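
For reference, fine-tuning a sequence classification model with the Hugging Face transformers library looks roughly like the minimal sketch below. The base model and dataset names are just common public examples and are not tied to the Docker images above, which are built against transformers 4.36.0.

    # Minimal fine-tuning sketch for sequence classification with transformers.
    # Model and dataset are stock public examples, not part of the Docker images.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "distilbert-base-uncased"   # example base model
    dataset = load_dataset("imdb")           # example binary classification dataset

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def tokenize(batch):
        # tokenize the raw text; the label column is handled by the Trainer
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=256)

    encoded = dataset.map(tokenize, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    )

    trainer = Trainer(
        model=model,
        args=args,
        # a small subset keeps the example quick to run
        train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
    )
    trainer.train()
    trainer.save_model("out/final")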

Our gifr Python library for gradio received an interface for text classification (gifr-textclass) in version 0.0.3.

The llm-dataset-converter library gained native support for text classification formats in version 0.1.1.