This paper presents a text-conditioned face generation framework that fine-tunes a pretrained diffusion model using Low-Rank Adaptation (LoRA). The proposed method uses the CelebA dataset, whose annotated facial attributes are converted into descriptive captions to enable controlled image synthesis. Stable Diffusion v1.5 is fine-tuned using LoRA to achieve efficient domain adaptation with reduced computational overhead. The model is evaluated using CLIPScore to measure semantic alignment and Fréchet Inception Distance (FID) to assess visual realism. Experimental results show that the model achieves a mean CLIPScore of approximately 0.32, indicating strong correspondence between textual prompts and generated images. However, the FID score of 107 suggests a gap in distributional similarity with real images. The study highlights the trade-off between semantic alignment and realism, as well as the importance of training dynamics in diffusion-based fine-tuning.
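The attribute-to-caption step the abstract describes can be sketched as follows. This is an illustrative assumption, not the paper's actual template: the attribute names match the CelebA annotation file (which marks each attribute as +1 or -1 per image), but the phrasing and the `attributes_to_caption` helper are hypothetical.

```python
# Hypothetical sketch: turning CelebA binary attribute flags into a
# descriptive caption for text-conditioned fine-tuning. Attribute names
# follow the CelebA annotation file; the caption template is an assumption.

ATTRIBUTE_PHRASES = {
    "Smiling": "smiling",
    "Eyeglasses": "wearing eyeglasses",
    "Black_Hair": "with black hair",
    "Blond_Hair": "with blond hair",
}

def attributes_to_caption(attrs):
    """Build a caption such as 'a photo of a man, smiling, with black hair'
    from a dict mapping CelebA attribute names to +1/-1 flags."""
    subject = "a man" if attrs.get("Male", -1) == 1 else "a woman"
    phrases = [phrase for name, phrase in ATTRIBUTE_PHRASES.items()
               if attrs.get(name, -1) == 1]
    return "a photo of " + ", ".join([subject] + phrases)

caption = attributes_to_caption({"Male": 1, "Smiling": 1, "Black_Hair": 1})
# → "a photo of a man, smiling, with black hair"
```

Captions produced this way can be paired with the corresponding CelebA images to form the (image, text) training set for LoRA fine-tuning.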
@article{c1552026ijcatr15051001,
  author  = {Ceena Mathews},
  title   = {Text-Guided Face Generation Using {LoRA} Fine-Tuned Stable Diffusion on the {CelebA} Dataset},
  journal = {International Journal of Computer Applications Technology and Research (IJCATR)},
  volume  = {15},
  number  = {5},
  pages   = {1--6},
  year    = {2026}
}