Neural Computing Project: Food Classification CNN
As part of the Neural Computing course, we engineered a full deep learning pipeline that recognises 91 distinct food categories from a dataset of roughly 67 000 labelled photographs. The project focused on designing a strong baseline trained from scratch (no ImageNet pretraining), exploring how far careful data processing, architectural refinement, and modern optimisation could push a classic convolutional network. Beyond the core classifier, we connected the model to Google’s Gemini LLM API, which turns the classifier’s label probabilities into personalised food profiles: short narrative descriptions of culinary preferences.
Pipeline Overview
Our system revisits AlexNet, updated with contemporary training practices to maximise performance on a moderately sized, imbalanced dataset.
Key components include (illustrative code sketches follow the list):
- Data pipeline: 80/20 train–validation split with torchvision.transforms augmentations (random crops, flips, rotation, colour jitter) to increase training-set variability.
- Architecture: a modified AlexNet backbone with seven convolutional layers, batch normalisation, and ReLU activations, followed by global average pooling and a single linear classifier over the 91 classes.
- Loss and optimisation: focal loss to counter class imbalance, with AdamW and a cosine annealing learning-rate schedule to improve convergence.
- Training: 40 epochs on GPU hardware with checkpointing and best-model selection based on validation accuracy.
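The following is a minimal sketch of the augmentation and split stage. The dataset path, crop size, jitter strength, and rotation range are illustrative assumptions, not the project's exact settings, and in practice the validation subset would use deterministic (non-augmented) transforms.

```python
import torch
from torchvision import datasets, transforms

# Training-time augmentations: random crops, flips, rotation, colour jitter.
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumes an ImageFolder-style layout with one directory per food class.
full_ds = datasets.ImageFolder("data/food_images", transform=train_tfms)

# 80/20 train-validation split with a fixed seed for reproducibility.
n_train = int(0.8 * len(full_ds))
train_ds, val_ds = torch.utils.data.random_split(
    full_ds, [n_train, len(full_ds) - n_train],
    generator=torch.Generator().manual_seed(42))
```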
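The backbone itself could look roughly like the sketch below: seven convolutional blocks with batch normalisation and ReLU, global average pooling, and a single linear classifier. Channel widths and kernel sizes here are assumptions for illustration.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, **kw):
    # Conv -> BatchNorm -> ReLU, the repeating unit of the backbone.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, bias=False, **kw),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FoodAlexNet(nn.Module):
    """AlexNet-style backbone with seven conv layers and a GAP head."""
    def __init__(self, num_classes=91):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64, kernel_size=11, stride=4, padding=2),
            nn.MaxPool2d(3, stride=2),
            conv_block(64, 192, kernel_size=5, padding=2),
            nn.MaxPool2d(3, stride=2),
            conv_block(192, 384, kernel_size=3, padding=1),
            conv_block(384, 384, kernel_size=3, padding=1),
            conv_block(384, 256, kernel_size=3, padding=1),
            conv_block(256, 256, kernel_size=3, padding=1),
            conv_block(256, 256, kernel_size=3, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.classifier(x)
```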
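The loss and training loop could be wired together as in the sketch below, which continues from the previous two sketches. The focal-loss gamma, learning rate, weight decay, and the `train_loader`, `val_loader`, and `evaluate` helpers are assumptions introduced only for illustration.

```python
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    """Cross entropy with easy examples down-weighted (focal loss)."""
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction="none")
        pt = torch.exp(-ce)                        # probability assigned to the true class
        return ((1.0 - pt) ** self.gamma * ce).mean()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = FoodAlexNet(num_classes=91).to(device)     # backbone from the sketch above
criterion = FocalLoss(gamma=2.0)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=40)

best_acc = 0.0
for epoch in range(40):
    model.train()
    for images, labels in train_loader:            # assumed DataLoader over train_ds
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

    val_acc = evaluate(model, val_loader)          # hypothetical validation-accuracy helper
    if val_acc > best_acc:                         # keep the checkpoint with best val accuracy
        best_acc = val_acc
        torch.save(model.state_dict(), "best_model.pt")
```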
The resulting network reached 52.35 % top-1 test accuracy, improving from about 4 % at the first epoch and exceeding earlier course baselines achieved with similar computational budgets.
Gemini-Powered Food Profiles
Predicted food labels from the CNN are compiled into a text prompt, which Gemini expands into a short food-preference profile highlighting likely cuisines, flavour patterns, and sample dishes; a minimal sketch of this step follows.
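The sketch below assumes the google-generativeai Python client; the model name, API-key placeholder, and prompt wording are illustrative rather than the project's exact choices.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder credential
llm = genai.GenerativeModel("gemini-1.5-flash")    # assumed model choice

def food_profile(top_labels):
    """Turn the CNN's top predicted food labels into a short narrative profile."""
    prompt = (
        "These foods were recognised in a user's photos: "
        + ", ".join(top_labels)
        + ". Write a short food-preference profile covering likely cuisines, "
          "flavour patterns, and a few dishes they might enjoy."
    )
    return llm.generate_content(prompt).text

print(food_profile(["ramen", "gyoza", "matcha cake"]))
```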
This hybrid workflow illustrates how discriminative vision models and large language models can collaborate. The CNN grounds the system in visual evidence, while the LLM translates predictions into human-readable insights suitable for consumer-facing recommendation or nutrition tools.