| Abstract: |
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of general-purpose tasks. However, their effectiveness in specialized domains remains limited by outdated knowledge, insufficient domain-specific precision, and significant computational requirements. This project explores the application of fine-tuning and Retrieval-Augmented Generation (RAG) techniques to develop specialized agents tailored to the Portuguese legal domain. It presents the design and implementation of a fine-tuning (FT) pipeline based on the Phi-1.5 model using Low-Rank Adaptation (LoRA), trained on curated legal texts extracted from Diário da República, the official gazette of Portugal. The development environment was stabilized by aligning the versions of key machine learning libraries, including PyTorch, Transformers, PEFT, and Accelerate, to ensure reproducibility and compatibility across the training workflow. Preliminary training results show a consistent reduction in loss values throughout the fine-tuning process, supporting the technical feasibility of the proposed methodology. Although the RAG component remains future work, the current results establish a reproducible foundation for fine-tuning LLMs in resource-constrained environments, particularly for Portuguese legal texts. In addition, this work identifies and discusses key challenges and opportunities associated with adapting LLMs to high-stakes professional domains, such as the legal sector. |