Building a final year project on machine learning-based classification of emphysema from chest X-ray images #196088

OJ-tech542 · 2026-05-16T22:05:03Z

🏷️ Discussion Type

Bug

Body

I'm a beginner and would like step-by-step guidance on building a project by myself on the topic of machine learning-based classification of emphysema from chest X-ray images. The goal is to train a machine learning model that detects emphysema from chest X-rays.

Khanz9664 · 2026-05-17T17:00:41Z

Hi @OJ-tech542,

This is a great choice for a final year project. Medical image classification can feel overwhelming at first, but it gets much easier once you break the pipeline down. I actually worked on respiratory disease classification using deep learning, and the biggest piece of advice I can give you starting out is to skip building a custom CNN from scratch and leverage transfer learning right away.

For chest X-rays, look into using DenseNet121 or VGG16 as your backbone architecture, initializing them with weights pretrained on ImageNet. DenseNet architectures are highly effective for medical imaging because their feature-reuse mechanism is excellent at capturing the subtle, diffuse structural variations typical of emphysema. You can pull these models directly from torchvision.models or tf.keras.applications.

For your data, I'd highly recommend starting with a clean benchmark like the NIH ChestX-ray14 dataset or looking up curated lung disease sets on Kaggle. To keep things manageable as a beginner, filter the dataset down to a clean binary classification task: Normal vs. Emphysema.
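
Here is a rough sketch of that filtering step in plain Python. It assumes the NIH metadata file (`Data_Entry_2017.csv` in the copies I have seen) has `Image Index`, `Finding Labels`, and `Patient ID` columns, with findings separated by `|` and `No Finding` marking normal scans; check those names against your download. `to_binary_label` and `load_binary_rows` are hypothetical helper names, not part of any library.

```python
import csv


def to_binary_label(finding_labels):
    """Map a pipe-separated label string to 1 (emphysema), 0 (normal), or None (drop)."""
    labels = {part.strip() for part in finding_labels.split("|")}
    if "Emphysema" in labels:
        return 1
    if labels == {"No Finding"}:
        return 0
    return None  # only other findings: excluded from the binary task


def load_binary_rows(csv_path):
    """Read the metadata CSV and keep only Normal / Emphysema rows.

    Column names are assumptions about the NIH metadata layout.
    """
    kept = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            label = to_binary_label(row["Finding Labels"])
            if label is not None:
                kept.append((row["Image Index"], row["Patient ID"], label))
    return kept
```

Note the design choice: a scan with emphysema plus other findings still counts as positive here. If you want "pure" emphysema cases only, tighten the first condition.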

When setting up your data pipeline, keep a few critical engineering steps in mind:

Preprocessing: Resize your X-rays to a standard dimension like 224 × 224 pixels and normalize the pixel intensity distribution based on your dataset's mean and standard deviation.
Data Leakage: This is the most common trap in medical ML. If your dataset contains multiple X-rays from the same patient, ensure all images from a single patient stay strictly within the same split (either entirely in train, validation, or test). Splitting by image rather than by patient ID will artificially inflate your accuracy and ruin your testing validity.
Class Imbalance: Emphysema cases will likely be heavily outnumbered by normal scans. Standard accuracy won't mean much here. You'll want to track Sensitivity (Recall) to ensure you aren't missing true positives, alongside your AUC-ROC score and a Confusion Matrix.
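
To make the data leakage point concrete, here is a minimal sketch of a patient-level split using only the standard library. It assumes each record is a tuple of `(image_name, patient_id, label)`; the function name `split_by_patient` and the default fractions are placeholders for illustration.

```python
import random
from collections import defaultdict


def split_by_patient(records, val_frac=0.15, test_frac=0.15, seed=42):
    """Split records into train/val/test so no patient appears in two splits.

    records: iterable of (image_name, patient_id, label) tuples.
    """
    by_patient = defaultdict(list)
    for record in records:
        by_patient[record[1]].append(record)

    patient_ids = sorted(by_patient)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(patient_ids)

    # Fractions apply to the number of patients, not the number of images.
    n_test = int(len(patient_ids) * test_frac)
    n_val = int(len(patient_ids) * val_frac)
    groups = {
        "test": patient_ids[:n_test],
        "val": patient_ids[n_test:n_test + n_val],
        "train": patient_ids[n_test + n_val:],
    }
    return {name: [r for pid in ids for r in by_patient[pid]]
            for name, ids in groups.items()}
```

If you already use scikit-learn, its `GroupShuffleSplit` gives the same grouping behavior with the patient ID as the group.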

Lastly, since this is a final year project, review panels care immensely about model reliability. Once your binary classifier is training well, I strongly suggest implementing Grad-CAM. It generates a visual heatmap over the original X-ray, allowing you to audit whether the model is actually focusing on the lung fields or just picking up on edge artifacts, text labels, or scanner noise.
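
If it helps to see what Grad-CAM actually computes, here is the core step in plain Python: average each channel's gradient to get a weight, take the weighted sum of the activation maps, and apply ReLU. This is only a sketch of the arithmetic. It assumes you have already captured the last convolutional layer's activations and the gradients of the class score with respect to them (for example via hooks, or by letting a Grad-CAM library handle it); `grad_cam_map` is a hypothetical name.

```python
def grad_cam_map(activations, gradients):
    """Compute a Grad-CAM heatmap from one image's feature maps.

    activations, gradients: nested lists shaped [channels][height][width].
    Returns a [height][width] heatmap scaled to the 0..1 range.
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight = spatial average of that channel's gradient.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]

    # Weighted sum of activation maps, then ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(w * act[y][x] for w, act in zip(weights, activations)))
            for x in range(width)]
           for y in range(height)]

    # Scale so the strongest location is 1.0 (skip if the map is all zeros).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

The resulting map is at feature-map resolution, so it still needs to be upsampled to the X-ray's size before you overlay it.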

I'd suggest setting up a notebook in Google Colab to take advantage of the free GPU, and focus your first week entirely on getting your images cleanly loaded into a PyTorch DataLoader or TensorFlow Dataset.

Let me know when you get the data pipeline running, or if you run into any formatting bottlenecks early on. Good luck with the project!

zippynx · 2026-05-20T15:22:46Z

Hi @OJ-tech542,

Good choice of topic. Medical image classification is a solid final year project, and it’s very doable if you break it down properly.

Since you’re a beginner, I’d strongly suggest not trying to over-engineer anything at the start. Just focus on getting a working baseline first, then improve step by step.

Here’s a practical way to approach it:

Start simple (don’t overthink the ML yet)

Before anything else, just make sure you’re comfortable with:

basic Python
working with notebooks (Colab is fine)
loading images + labels

For framework, either PyTorch or Keras is fine. If you’re unsure, Keras will get you moving faster.

Dataset (this is where most people mess up)

Use something public like NIH ChestX-ray14 or Kaggle datasets.

Keep it simple:

Normal
Emphysema

Don’t try multi-class yet.

Also important: if the dataset has patient IDs, make sure you split by patient, not by image. Otherwise your accuracy will look fake.

Baseline model first (don’t start from scratch)

Just use transfer learning.

Good starting points:

DenseNet121 (very common in medical imaging)
ResNet50

Freeze most layers at first, just train the classifier head. Get something working before tuning anything.
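
A minimal PyTorch sketch of that "freeze the backbone, train the head" setup, in case it is useful. It assumes torchvision 0.13 or newer (for the `weights=` argument) and a single-logit output for Normal vs. Emphysema; `build_emphysema_classifier` and `trainable_parameters` are hypothetical helper names. Calling the builder downloads the ImageNet weights the first time.

```python
def build_emphysema_classifier(freeze_backbone=True):
    """DenseNet121 with ImageNet weights and a one-logit head (binary task)."""
    # Imported lazily so this file still loads on machines without PyTorch.
    import torch.nn as nn
    from torchvision import models

    model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)

    if freeze_backbone:
        for param in model.parameters():
            param.requires_grad = False  # pretrained layers stay fixed at first

    # The new head is created after freezing, so it remains trainable.
    model.classifier = nn.Linear(model.classifier.in_features, 1)
    return model


def trainable_parameters(model):
    """Parameters to hand to the optimizer (only the head when frozen)."""
    return [p for p in model.parameters() if p.requires_grad]
```

Because the head outputs a raw logit, pair it with `torch.nn.BCEWithLogitsLoss` and pass `trainable_parameters(model)` to `torch.optim.Adam`.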

Training setup (keep it basic)

You don’t need anything fancy:

Adam optimizer
Binary Cross Entropy loss
~10–20 epochs
batch size 16 or 32

If it overfits, worry about augmentation later.

Evaluation (important for defense)

Don’t just show accuracy; that’s not enough for medical stuff.

At minimum:

confusion matrix
recall (very important here)
precision / F1-score
ROC-AUC if you can

Recall matters more because missing a sick patient is worse than a false alarm.
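
To show how these numbers relate, here is a small standard-library sketch that derives them from the confusion matrix counts. It assumes labels are 0 (normal) and 1 (emphysema) and that predictions are already thresholded; `binary_metrics` is a hypothetical name.

```python
def binary_metrics(y_true, y_pred):
    """Confusion matrix counts plus recall, precision, F1, and accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is missing.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}
```

A model that predicts "normal" for every scan gets high accuracy on an imbalanced set but a recall of 0, which is exactly why accuracy alone is misleading here.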

If you have time: Grad-CAM

This is a nice “bonus” that examiners usually like.

It helps show:

“the model is actually looking at lung regions, not random artifacts”

Not mandatory, but it makes your project look more serious.

Simple demo app (optional but recommended)

If you want extra marks, wrap it in something simple:

Streamlit is easiest

Flow:
upload X-ray → prediction → probability → (optional) heatmap

Reality check

The hardest part of this project is usually not the model. It’s:

dataset cleaning
avoiding data leakage
interpreting results properly

A simple model that is correct is way better than a complex one you can’t explain.

secureml-au · 2026-05-20T15:35:56Z

@OJ-tech542 Both answers above nailed the technical side. Since this is final year, here's what examiners actually dock marks for:

1. The "why this matters" section: Emphysema is underdiagnosed on X-rays. Your model isn't replacing radiologists - it's a triage tool for rural clinics. Say that.

2. Dataset license check: NIH ChestX-ray14 is public but read the usage terms. Some Kaggle datasets prohibit commercial use. Cite sources properly or you fail plagiarism checks.

3. Deploy in 10 min for demo day:

pip install streamlit grad-cam torch
