Fashion-MNIST Baseline Classifier
Model Details
- Model Name: fashion-mnist-base
- Framework: Custom implementation in Python
- Version: 0.1
- License: Apache-2.0
Model Description
This is a fully connected neural network implemented from scratch to classify images from the Fashion-MNIST dataset. The dataset comprises 70,000 28x28 grayscale images, each labeled with one of 10 classes: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.
Intended Use
This model is intended for educational purposes and as a baseline for more complex implementations. It can be used by students and AI enthusiasts to understand the workings of neural networks and their application in image classification.
Training Data
The model was trained on the Fashion-MNIST dataset, which contains 60,000 training images and 10,000 test images. Each image is 28x28 pixels, grayscale, associated with one of 10 classes representing different types of clothing and accessories.
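For reference, the same 60,000/10,000 split can be loaded with `torchvision`. This snippet is purely illustrative; the card's custom training pipeline may load the data differently.

```python
# Minimal sketch: load Fashion-MNIST with torchvision (illustrative only).
import torchvision
import torchvision.transforms as transforms

to_tensor = transforms.ToTensor()  # converts images to [0, 1] float tensors of shape (1, 28, 28)

train_set = torchvision.datasets.FashionMNIST(
    root="data", train=True, download=True, transform=to_tensor
)  # 60,000 training images
test_set = torchvision.datasets.FashionMNIST(
    root="data", train=False, download=True, transform=to_tensor
)  # 10,000 test images

print(len(train_set), len(test_set))  # 60000 10000
print(train_set.classes)              # the 10 clothing/accessory categories
```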
Architecture Details:
- Input layer: 784 neurons (flattened 28x28 image)
- Hidden layer 1: 256 neurons, ReLU activation, Dropout
- Hidden layer 2: 64 neurons, ReLU activation, Dropout
- Output layer: 10 neurons, logits
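As a concrete reference, an equivalent network in PyTorch would look like the sketch below. The dropout probability is not stated in this card, so the value used here (0.2) is an assumption, and the actual model is a from-scratch implementation rather than a `torch.nn` module.

```python
# Illustrative PyTorch equivalent of the described architecture.
# NOTE: the dropout probability (0.2) is an assumption; it is not stated in this card.
import torch.nn as nn

class FashionMNISTBase(nn.Module):
    def __init__(self, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),          # 28x28 image -> 784-dim vector
            nn.Linear(784, 256),   # hidden layer 1
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, 64),    # hidden layer 2
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, 10),     # output layer: raw logits, one per class
        )

    def forward(self, x):
        return self.net(x)
```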
Hyperparameters:
- Learning rate: 0.005
- Batch size: 32
- Epochs: 25
The model uses a self-implemented stochastic gradient descent (SGD) optimizer.
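The optimizer code itself is not included in this card; a plain (momentum-free) SGD update of the kind described simply subtracts the learning-rate-scaled gradient from each parameter, as in the minimal NumPy sketch below (parameter names and shapes are illustrative).

```python
# Minimal sketch of a plain SGD step using the listed learning rate.
# The parameter/gradient dictionaries are illustrative; the real implementation is custom.
import numpy as np

def sgd_step(params: dict, grads: dict, lr: float = 0.005) -> None:
    """In-place update: theta <- theta - lr * dL/dtheta for every parameter."""
    for name in params:
        params[name] -= lr * grads[name]

# Dummy parameters shaped like the first hidden layer (784 -> 256).
params = {"W1": np.random.randn(784, 256) * 0.01, "b1": np.zeros(256)}
grads = {"W1": np.ones((784, 256)), "b1": np.ones(256)}
sgd_step(params, grads)
```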
Evaluation Results
The model achieved the following performance on the test set:
- Accuracy: 86.7%
- Precision, Recall, and F1-Score:
| Label | Precision | Recall | F1-score |
| --- | --- | --- | --- |
| T-shirt/Top | 0.847514 | 0.767 | 0.805249 |
| Trouser | 0.982618 | 0.961 | 0.971689 |
| Pullover | 0.800000 | 0.748 | 0.773127 |
| Dress | 0.861868 | 0.886 | 0.873767 |
| Coat | 0.776278 | 0.805 | 0.790378 |
| Sandal | 0.957958 | 0.957 | 0.957479 |
| Shirt | 0.638587 | 0.705 | 0.670152 |
| Sneaker | 0.935743 | 0.932 | 0.933868 |
| Bag | 0.952381 | 0.960 | 0.956175 |
| Ankle-Boot | 0.944554 | 0.954 | 0.949254 |
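Per-class metrics of this kind can be reproduced from test-set predictions with, for example, scikit-learn's `classification_report`. The card does not state how the table above was produced, so the snippet below is only an illustration; the labels here are random placeholders.

```python
# Illustrative computation of per-class precision/recall/F1.
# Replace the random placeholders with the real test labels and model predictions.
import numpy as np
from sklearn.metrics import classification_report

class_names = [
    "T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle-Boot",
]

rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=10000)  # placeholder ground-truth labels
y_pred = rng.integers(0, 10, size=10000)  # placeholder predicted labels

print(classification_report(y_true, y_pred, target_names=class_names, digits=3))
```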
Limitations and Biases
Due to the nature of the training dataset, the model may not capture the full complexity of fashion items in diverse real-world scenarios. In practice, we found that it is sensitive to background colors and to the proportions of the article within the image.
How to Use
```python
import torch
import torchvision.transforms as transforms
from PIL import Image

# Load the serialized model object.
model = torch.load('fashion-mnist-base.pt')

# Input images need to be converted to the Fashion-MNIST format:
# 28x28, grayscale, light article on a dark background.
transform = transforms.Compose(
    [
        transforms.Resize((28, 28)),
        transforms.Grayscale(),
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),        # normalization
        transforms.Lambda(lambda x: 1.0 - x),        # invert colors to match the dataset convention
        transforms.Lambda(lambda x: x[0]),           # drop the channel dimension: (1, 28, 28) -> (28, 28)
        transforms.Lambda(lambda x: x.unsqueeze(0)), # add a batch dimension: (28, 28) -> (1, 28, 28)
    ]
)

img = Image.open('fashion/dress.png')
img = transform(img)
print(model.predictions(img))  # returns per-class confidences, see Sample Output below
```
Sample Output
```python
{'Dress': 84.437744,
 'Coat': 7.631796,
 'Pullover': 4.2272186,
 'Shirt': 1.297625,
 'T-shirt/Top': 1.2237197,
 'Bag': 0.9053432,
 'Trouser/Jeans': 0.27268794,
 'Sneaker': 0.0031491981,
 'Ankle-Boot': 0.00063403655,
 'Sandal': 8.5103806e-05}
```