WebCraft - Where words create webpages

Natural Language Processing, LLM, Fine-tuning, Optimization, HuggingFace Transformers, Prompt Engineering, PyTorch, Chatbot

Developed an NLP-driven tool, built on a fine-tuned large language model, that automatically generates HTML and CSS webpages from natural language prompts.

Overview

WebCraft generates complete webpages directly from natural language prompts, making webpage creation accessible to users without coding expertise. The system is built by fine-tuning several large language models (LLMs), primarily CodeLlama, to output structured HTML and CSS that is ready for browser rendering.

Technical Highlights

  • Fine-Tuned Language Models: Applied efficient fine-tuning methods such as LoRA (Low-Rank Adaptation) on models including GPT-2, Falcon, Llama-2, and particularly CodeLlama for optimized HTML generation.
  • Dataset Acquisition and Preprocessing: Collected over 1.5 million URLs and parsed the pages with BeautifulSoup to extract clean HTML/CSS content, excluding JavaScript, comments, and meta tags.
  • Structured Prompt Generation: Paired the extracted HTML content with standardized prompts to form consistent training examples for the LLMs.
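
The preprocessing and prompt-pairing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function names `clean_html` and `build_example`, the instruction/response template, and the idea that a text description is available for each page are all assumptions made for the example. Only `<script>` elements, `<meta>` tags, and HTML comments are removed here; inline event handlers are left untouched.

```python
from bs4 import BeautifulSoup, Comment


def clean_html(raw_html: str) -> str:
    """Return the page markup with scripts, meta tags, and comments removed."""
    soup = BeautifulSoup(raw_html, "html.parser")

    # Drop JavaScript and <meta> elements entirely; <style> is kept for the CSS.
    for tag in soup.find_all(["script", "meta"]):
        tag.decompose()

    # HTML comments are exposed by BeautifulSoup as Comment text nodes.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()

    return str(soup)


def build_example(description: str, raw_html: str) -> dict:
    """Pair a description with cleaned markup (hypothetical prompt template)."""
    prompt = (
        "### Instruction:\n"
        f"Generate a webpage: {description}\n\n"
        "### Response:\n"
    )
    return {"prompt": prompt, "completion": clean_html(raw_html)}
```

Calling `build_example("a landing page for a bakery", html_text)` would yield one prompt/completion record; a fixed template like this keeps every training example in the same shape.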

Challenges and Solutions

  • Handling Extensive HTML Length: Split long HTML documents into chunks and preprocessed them to keep each training input at a manageable length.
  • Model Optimization: Used LoRA and 8-bit quantization to reduce the memory and compute cost of fine-tuning.
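
The two techniques above might look something like the sketch below. It is illustrative only: the windowing strategy in `chunk_tokens`, the helper name `load_lora_model`, the checkpoint id `codellama/CodeLlama-7b-hf`, and every hyperparameter (rank, alpha, dropout, target modules) are assumptions, not the settings used in the project. The second function assumes `transformers`, `peft`, `accelerate`, and `bitsandbytes` are installed and a GPU is available.

```python
from typing import List


def chunk_tokens(token_ids: List[int], max_len: int, overlap: int = 0) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens."""
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        # Stop once a window has reached the end of the sequence.
        if start + max_len >= len(token_ids):
            break
    return chunks


def load_lora_model(model_name: str = "codellama/CodeLlama-7b-hf"):
    """Load a causal LM in 8-bit and attach LoRA adapters (assumed settings)."""
    # Imported lazily so chunk_tokens stays usable without the GPU stack.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=8,                                   # assumed adapter rank
        lora_alpha=16,                         # assumed scaling factor
        lora_dropout=0.05,                     # assumed dropout
        target_modules=["q_proj", "v_proj"],   # assumed attention projections
        task_type="CAUSAL_LM",
    )
    # Only the small adapter matrices are trainable; base weights stay frozen.
    return get_peft_model(model, lora_config)
```

With an overlap, consecutive chunks share a few tokens, so markup that straddles a boundary is seen in context at least once. Keeping the base weights frozen in 8-bit while training only the adapters is what makes fine-tuning a multi-billion-parameter model feasible on limited hardware.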

Technologies and Tools

  • Python, PyTorch, Hugging Face Transformers
  • BeautifulSoup, LoRA, CodeLlama

My Role

I led the fine-tuning and training of the Llama-2 model, one of the models the project fine-tuned for HTML generation, and contributed to the functionality of the overall system.

Key Results

The fine-tuned CodeLlama model generated complete, accurate HTML and CSS webpages and reached low training and validation losses, making it practical to use.