✅ Part 1: Matplotlib (Step-by-Step)
📌 Step 1: Install and Import
pip install matplotlib
import matplotlib.pyplot as plt
📌 Step 2: Line Chart
x = [1, 2, 3, 4]
y = [10, 20, 30, 25]
plt.plot(x, y)
plt.title("Line Chart Example")
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.show()
📈 Used For: Show progress over time (e.g. sales growth)
📌 Step 3: Bar Chart
subjects = ['Math', 'Science', 'English']
marks = [85, 90, 78]
plt.bar(subjects, marks)
plt.title("Student Marks")
plt.xlabel("Subjects")
plt.ylabel("Marks")
plt.show()
📊 Used For: Compare items like subject-wise marks or sales.
📌 Step 4: Pie Chart
fruits = ['Apple', 'Banana', 'Orange']
quantities = [40, 35, 25]
plt.pie(quantities, labels=fruits, autopct='%1.1f%%')
plt.title("Fruit Distribution")
plt.show()
🥧 Used For: Show percentage distribution (e.g. market share)
📌 Step 5: Histogram
ages = [18, 22, 22, 25, 26, 28, 28, 30, 35, 35]
plt.hist(ages, bins=5)
plt.title("Age Group Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
📚 Used For: See how data is spread out (e.g. age of people)
📌 Step 6: Scatter Plot
hours = [1, 2, 3, 4, 5]
marks = [40, 50, 65, 75, 85]
plt.scatter(hours, marks)
plt.title("Study Time vs Marks")
plt.xlabel("Hours Studied")
plt.ylabel("Marks")
plt.show()
📌 Used For: Relationship between two things (e.g. effort vs result)
✅ Part 2: Data Cleaning with Pandas (Step-by-Step)
📌 Step 1: Import Pandas
import pandas as pd
📌 Step 2: Check Missing Values
df = pd.read_csv("students.csv")
print(df.isnull()) # Shows True/False
print(df.isnull().sum()) # Shows total missing per column
📌 Step 3: Drop Missing Rows
df_clean = df.dropna()
🗑️ Removes rows with any missing value.
📌 Step 4: Fill Missing Values
df.fillna(0, inplace=True) # Fill missing with 0
df['Marks'].fillna(df['Marks'].mean(), inplace=True) # Fill with average
📌 Step 5: Remove Duplicate Rows
df = df.drop_duplicates()
🧹 Removes repeated rows in data.
📌 Step 6: Change Data Type
df['Age'] = df['Age'].astype(int)
🧠 Convert from float or string to int.
📌 Step 7: Rename Columns
df.rename(columns={'Full Name': 'Name'}, inplace=True)
📌 Step 8: Clean Strings
df['Name'] = df['Name'].str.strip().str.title()
✍️ Clean unwanted spaces and format properly.
✅ Part 3: Excel Integration with Pandas (Step-by-Step)
📌 Step 1: Install Required Library
pip install openpyxl
(openpyxl is needed for Excel support)
📌 Step 2: Read Excel File
df = pd.read_excel("students.xlsx")
📥 Load Excel file into Pandas.
📌 Step 3: Read Specific Sheet
df = pd.read_excel("students.xlsx", sheet_name='Marks')
📄 Only read one sheet by name.
📌 Step 4: Write to Excel
df.to_excel("output.xlsx", index=False)
📤 Save DataFrame to Excel file.
📌 Step 5: Save Multiple Sheets
with pd.ExcelWriter("multi_sheet.xlsx") as writer:
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
📚 Save multiple reports in one Excel file.
✅ Real-Life Use Cases
| Feature | Real-Life Use |
|---|---|
| Line Chart | Daily/Monthly Sales Growth |
| Bar Chart | Compare performance |
| Pie Chart | Show percentage of expenses |
| Excel Reading | Read business reports or logs |
| Data Cleaning | Fix incomplete or wrong entries |
✅ Interview Questions & Answers (Matplotlib + Pandas Data Cleaning + Excel Integration)
+ Hands-on Practice Tasks for Students
🎯 Section 1: Interview Questions & Answers
🔹 1. What is Matplotlib?
Answer:
Matplotlib is a Python library used to create visualizations like line charts, bar graphs, pie charts, histograms, and scatter plots.
🔹 2. How do you create a bar chart in Matplotlib?
Answer:
You use the bar() function:
import matplotlib.pyplot as plt
plt.bar(['A', 'B'], [10, 20])
plt.show()
🔹 3. What is the use of plt.show()?
Answer:plt.show() displays the graph or plot in a new window.
🔹 4. What is the difference between plot() and scatter()?
Answer:
plot()is used for line charts (connected data).scatter()is for individual data points (used to find patterns).
🔹 5. What is Pandas?
Answer:
Pandas is a Python library used to store and analyze data in table-like formats using DataFrames.
🔹 6. How do you handle missing data in Pandas?
Answer:
You can:
- Use
dropna()to remove missing rows. - Use
fillna()to fill missing values.
🔹 7. How do you find missing values in a DataFrame?
Answer:
Use:
df.isnull()
df.isnull().sum()
🔹 8. How to remove duplicate values in a dataset?
Answer:
Use df.drop_duplicates().
🔹 9. How do you read and write Excel files in Pandas?
Answer:
- Read:
pd.read_excel("file.xlsx") - Write:
df.to_excel("output.xlsx", index=False)
🔹 10. What is the use of ExcelWriter in Pandas?
Answer:
It allows saving multiple DataFrames into one Excel file with multiple sheets.
🧪 Section 2: Hands-on Examples (Student Practice)
✅ 1. Create a Line Chart for Weekly Sales
import matplotlib.pyplot as plt
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
sales = [100, 120, 90, 150, 130]
plt.plot(days, sales)
plt.title("Weekly Sales")
plt.xlabel("Days")
plt.ylabel("Sales")
plt.show()
✅ 2. Clean Student Data (CSV)
CSV Example:
Name,Age,Marks
John,21,85
Sara,,78
,20,90
Anna,22,
John,21,85
Python Code:
import pandas as pd
df = pd.read_csv("students.csv")
# Step 1: Show missing values
print(df.isnull().sum())
# Step 2: Fill missing values
df['Name'].fillna('Unknown', inplace=True)
df['Age'].fillna(df['Age'].mean(), inplace=True)
df['Marks'].fillna(df['Marks'].mean(), inplace=True)
# Step 3: Remove duplicates
df = df.drop_duplicates()
print(df)
✅ 3. Read Excel and Show Subject-wise Marks
import pandas as pd
df = pd.read_excel("marks.xlsx", sheet_name="Sheet1")
print("Average marks per subject:")
print(df.mean())
✅ 4. Save Cleaned Data to Excel
df.to_excel("cleaned_students.xlsx", index=False)
✅ 5. Practice Task: Fruit Pie Chart
Create a pie chart for this data:
| Fruit | Quantity |
|---|---|
| Apple | 40 |
| Banana | 30 |
| Mango | 20 |
| Orange | 10 |
Code:
import matplotlib.pyplot as plt
fruits = ['Apple', 'Banana', 'Mango', 'Orange']
quantities = [40, 30, 20, 10]
plt.pie(quantities, labels=fruits, autopct='%1.1f%%')
plt.title("Fruit Sale Distribution")
plt.show()
📘 Summary for Students
| Task | Skill Learned |
|---|---|
| Line Chart | Plotting trends |
| Cleaning Missing Data | Data preprocessing |
| Reading Excel | Real-world file handling |
| Pie/Bar Chart | Data visualization |
| Remove Duplicates | Data integrity |
