🚀 Train Your Own Language Model with Transformers 🤖
🔍 Finding a Dataset
First, embark on an adventure to find a corpus of text in Esperanto. No worries, we'll make it fun!
- 🔍 Explore the Esperanto portion of the OSCAR corpus from INRIA.
- 📚 Concatenate with the Esperanto sub-corpus of the Leipzig Corpora Collection.
Voilà! Merge the two dumps into ./eo_data/ (a quick sketch follows) and you've got yourself a dataset to train your model!
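Assuming you've already downloaded the two dumps as plain text (the input filenames below are placeholders for wherever you saved them), here's a minimal sketch for merging them into a single one-sentence-per-line file under ./eo_data/:
# Merge the downloaded corpora into one plain-text file under ./eo_data/
# (the input filenames are placeholders)
from pathlib import Path

raw_files = ["oscar_eo.txt", "leipzig_eo.txt"]  # hypothetical download names
out_dir = Path("./eo_data")
out_dir.mkdir(exist_ok=True)

with open(out_dir / "eo_all.txt", "w", encoding="utf-8") as combined:
    for name in raw_files:
        with open(name, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # keep one non-empty sentence/segment per line
                    combined.write(line + "\n")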
# Find a dataset
# Import necessary libraries
from pathlib import Path
# Define paths to text files
paths = [str(x) for x in Path("./eo_data/").glob("**/*.txt")]
# Display paths
print(paths)
🧠 Training a Tokenizer
Let's whip up a byte-level Byte-Pair Encoding (BPE) tokenizer, the same kind GPT-2 and RoBERTa use, like a wizard casting spells!
⚡️ Train it with ByteLevelBPETokenizer from the tokenizers library, customize the vocabulary size and special tokens, and save it to disk.
🔥 Wow, that was fast! Now you have a powerful tokenizer ready to go!
# Train a tokenizer
# Import necessary libraries
from tokenizers import ByteLevelBPETokenizer
# Initialize a tokenizer
tokenizer = ByteLevelBPETokenizer()
# Customize training: 52,000-token vocabulary and RoBERTa-style special tokens
tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])
# Save vocab.json and merges.txt to disk in a dedicated directory
Path("esperberto").mkdir(exist_ok=True)
tokenizer.save_model("esperberto")
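A quick sanity check, using the files we just saved: load the tokenizer back, add RoBERTa-style post-processing so every sequence gets wrapped in <s> ... </s>, and encode a sample sentence.
# Reload the trained tokenizer and try it on a sample sentence
from tokenizers import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing

tokenizer = ByteLevelBPETokenizer(
    "esperberto/vocab.json",
    "esperberto/merges.txt",
)
# Wrap every encoded sequence in <s> ... </s>, as RoBERTa expects
tokenizer._tokenizer.post_processor = BertProcessing(
    ("</s>", tokenizer.token_to_id("</s>")),
    ("<s>", tokenizer.token_to_id("<s>")),
)
tokenizer.enable_truncation(max_length=512)

encoded = tokenizer.encode("Mi estas Julien.")
print(encoded.tokens)  # subword tokens, wrapped in <s> ... </s>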
🎓 Train a Language Model from Scratch
Time to unleash the magic of training a language model!
🚂 Choo-choo! Use the run_language_modeling.py script from transformers and experiment with the hyperparameters, or wire up the Trainer API yourself (a sketch follows the code below).
🔥🔥🔥 Let the training begin! Watch your model learn and grow!
# Train a language model from scratch
# Import necessary libraries
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaTokenizer
# Load the tokenizer we just trained; cap sequences at 512 tokens to match the model's positions
tokenizer = RobertaTokenizer.from_pretrained("esperberto", model_max_length=512)
# Configure model
config = RobertaConfig(
vocab_size=52_000,
max_position_embeddings=514,
num_attention_heads=12,
num_hidden_layers=6,
type_vocab_size=1,
)
# Initialize model
model = RobertaForMaskedLM(config=config)
# Print model architecture
print(model)
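The snippet above only builds an untrained model. If you'd rather drive training from Python than call run_language_modeling.py, here is a rough sketch with the Trainer API; the combined-corpus filename, output directory, and hyperparameter values are illustrative rather than prescriptive. Saving to ./models/EsperBERTo-small keeps things consistent with the fill-mask example further down.
# Train the masked language model with the Trainer API (illustrative settings)
from transformers import (
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

# One sentence per line, truncated to 512 tokens
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="./eo_data/eo_all.txt",  # hypothetical combined corpus file
    block_size=512,
)

# Dynamic masking: 15% of tokens are masked on the fly for the MLM objective
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="./models/EsperBERTo-small",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=16,
    save_steps=10_000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)

trainer.train()

# Save model and tokenizer where the fill-mask example below expects them
trainer.save_model("./models/EsperBERTo-small")
tokenizer.save_pretrained("./models/EsperBERTo-small")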
🔍 Checking Your LM
Is your language model actually learning something cool?
🔍 Peek into the FillMaskPipeline to see what your LM can do!
😃 Have fun with simple and complex prompts to see the magic unfold!
# Check the LM
# Import necessary libraries
from transformers import pipeline
# Create FillMaskPipeline
fill_mask = pipeline(
"fill-mask",
model="./models/EsperBERTo-small",
tokenizer="./models/EsperBERTo-small"
)
# Test the LM: the pipeline fills in the <mask> token ("La suno <mask>." = "The sun <mask>.")
result = fill_mask("La suno <mask>.")
print(result)
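Each candidate returned by the pipeline is a dict whose keys (in recent transformers releases) include sequence, score, token, and token_str, so a quick way to skim the top guesses:
# Print each predicted fill with its probability score
for pred in result:
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")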
🎯 Fine-tuning Your LM
Time to fine-tune your LM on a downstream task! Let's level up!
⚙️ Fine-tune your model for part-of-speech tagging with RobertaForTokenClassification. Easy peasy! (A dataset-prep sketch follows just below.)
🔥 Your model is evolving! Watch the training loss converge!
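The fine-tuning code below assumes that train_dataset and eval_dataset already exist. Here is a minimal, hypothetical sketch of how they could be built for POS tagging with a fast tokenizer; the PosTaggingDataset class, the label_names tag set, and the toy sentences are placeholders for a real annotated Esperanto corpus (e.g. one in CoNLL format).
# Build toy token-classification datasets (placeholder tag set and sentences)
import torch
from torch.utils.data import Dataset
from transformers import RobertaTokenizerFast

label_names = ["ADJ", "ADV", "DET", "NOUN", "PRON", "VERB"]  # hypothetical tag set
label2id = {tag: i for i, tag in enumerate(label_names)}

# RoBERTa's fast tokenizer needs add_prefix_space=True for pre-tokenized input
tokenizer = RobertaTokenizerFast.from_pretrained(
    "./models/EsperBERTo-small", add_prefix_space=True
)

class PosTaggingDataset(Dataset):
    """Pre-tokenized sentences paired with one POS tag per word."""

    def __init__(self, sentences, tags):
        self.encodings = tokenizer(
            sentences,
            is_split_into_words=True,
            truncation=True,
            padding="max_length",
            max_length=128,
        )
        # Align word-level tags to subword tokens; special and continuation
        # subwords get -100 so the loss ignores them.
        self.labels = []
        for i, word_tags in enumerate(tags):
            word_ids = self.encodings.word_ids(batch_index=i)
            previous, aligned = None, []
            for word_id in word_ids:
                if word_id is None or word_id == previous:
                    aligned.append(-100)
                else:
                    aligned.append(label2id[word_tags[word_id]])
                previous = word_id
            self.labels.append(aligned)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Toy examples only -- swap in your real annotated Esperanto data
train_sentences = [["La", "suno", "brilas"]]
train_tags = [["DET", "NOUN", "VERB"]]
train_dataset = PosTaggingDataset(train_sentences, train_tags)
eval_dataset = PosTaggingDataset(train_sentences, train_tags)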
# Fine-tune the LM
# Import necessary libraries
from transformers import RobertaForTokenClassification, Trainer, TrainingArguments
# Define fine-tuning arguments
training_args = TrainingArguments(
output_dir="./models/EsperBERTo-finetuned",
num_train_epochs=3,
per_device_train_batch_size=8,
per_device_eval_batch_size=16,
warmup_steps=500,
weight_decay=0.01,
logging_dir="./logs",
)
# Initialize model for fine-tuning: pretrained LM weights plus a fresh token-classification head
# (num_labels must match your POS tag set; label_names comes from the sketch above)
model = RobertaForTokenClassification.from_pretrained(
    "./models/EsperBERTo-small", num_labels=len(label_names)
)
# Initialize trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
)
# Start fine-tuning
trainer.train()
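Once training finishes, a quick evaluation on the held-out split reports the evaluation loss (pass a compute_metrics function to Trainer if you also want accuracy or F1):
# Evaluate the fine-tuned model on eval_dataset
metrics = trainer.evaluate()
print(metrics)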
🌟 Share Your Model
🙂 Congratulations! You've created a masterpiece! It's time to share it with the world!
📦 Upload your model using the CLI (or the Python helpers sketched below) and write a cool README.md, a.k.a. your model card!
🎉 TADA! Your model has a page on huggingface.co/models for everyone to enjoy!
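If you'd rather stay in Python than use the CLI, the push_to_hub helpers in transformers do the same job; the repo name below is just an example, and you need to be logged in to the Hub first (e.g. via huggingface-cli login).
# Upload the fine-tuned model and tokenizer to the Hugging Face Hub
# (illustrative repo name; requires a prior Hub login)
model.push_to_hub("EsperBERTo-small")
tokenizer.push_to_hub("EsperBERTo-small")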