Matplotlib, Data Cleaning with Pandas, and Excel Integration using Pandas with examples and outputs.



โœ… Part 1: Matplotlib (Step-by-Step)

๐Ÿ“Œ Step 1: Install and Import

pip install matplotlib

import matplotlib.pyplot as plt


๐Ÿ“Œ Step 2: Line Chart

x = [1, 2, 3, 4]
y = [10, 20, 30, 25]

plt.plot(x, y)
plt.title("Line Chart Example")
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.show()

๐Ÿ“ˆ Used For: Show progress over time (e.g. sales growth)


๐Ÿ“Œ Step 3: Bar Chart

subjects = ['Math', 'Science', 'English']
marks = [85, 90, 78]

plt.bar(subjects, marks)
plt.title("Student Marks")
plt.xlabel("Subjects")
plt.ylabel("Marks")
plt.show()

๐Ÿ“Š Used For: Compare items like subject-wise marks or sales.


๐Ÿ“Œ Step 4: Pie Chart

fruits = ['Apple', 'Banana', 'Orange']
quantities = [40, 35, 25]

plt.pie(quantities, labels=fruits, autopct='%1.1f%%')
plt.title("Fruit Distribution")
plt.show()

๐Ÿฅง Used For: Show percentage distribution (e.g. market share)


๐Ÿ“Œ Step 5: Histogram

ages = [18, 22, 22, 25, 26, 28, 28, 30, 35, 35]

plt.hist(ages, bins=5)
plt.title("Age Group Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()

๐Ÿ“š Used For: See how data is spread out (e.g. age of people)


๐Ÿ“Œ Step 6: Scatter Plot

hours = [1, 2, 3, 4, 5]
marks = [40, 50, 65, 75, 85]

plt.scatter(hours, marks)
plt.title("Study Time vs Marks")
plt.xlabel("Hours Studied")
plt.ylabel("Marks")
plt.show()

๐Ÿ“Œ Used For: Relationship between two things (e.g. effort vs result)


โœ… Part 2: Data Cleaning with Pandas (Step-by-Step)

๐Ÿ“Œ Step 1: Import Pandas

import pandas as pd


๐Ÿ“Œ Step 2: Check Missing Values

df = pd.read_csv("students.csv")
print(df.isnull())          # Shows True/False
print(df.isnull().sum())    # Shows total missing per column


๐Ÿ“Œ Step 3: Drop Missing Rows

df_clean = df.dropna()

๐Ÿ—‘๏ธ Removes rows with any missing value.


๐Ÿ“Œ Step 4: Fill Missing Values

df.fillna(0, inplace=True)  # Fill missing with 0
df['Marks'].fillna(df['Marks'].mean(), inplace=True)  # Fill with average


๐Ÿ“Œ Step 5: Remove Duplicate Rows

df = df.drop_duplicates()

๐Ÿงน Removes repeated rows in data.


๐Ÿ“Œ Step 6: Change Data Type

df['Age'] = df['Age'].astype(int)

๐Ÿง  Convert from float or string to int.


๐Ÿ“Œ Step 7: Rename Columns

df.rename(columns={'Full Name': 'Name'}, inplace=True)


๐Ÿ“Œ Step 8: Clean Strings

df['Name'] = df['Name'].str.strip().str.title()

โœ๏ธ Clean unwanted spaces and format properly.


โœ… Part 3: Excel Integration with Pandas (Step-by-Step)

๐Ÿ“Œ Step 1: Install Required Library

pip install openpyxl

(openpyxl is needed for Excel support)


๐Ÿ“Œ Step 2: Read Excel File

df = pd.read_excel("students.xlsx")

๐Ÿ“ฅ Load Excel file into Pandas.


๐Ÿ“Œ Step 3: Read Specific Sheet

df = pd.read_excel("students.xlsx", sheet_name='Marks')

๐Ÿ“„ Only read one sheet by name.


๐Ÿ“Œ Step 4: Write to Excel

df.to_excel("output.xlsx", index=False)

๐Ÿ“ค Save DataFrame to Excel file.


๐Ÿ“Œ Step 5: Save Multiple Sheets

with pd.ExcelWriter("multi_sheet.xlsx") as writer:
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')

๐Ÿ“š Save multiple reports in one Excel file.


โœ… Real-Life Use Cases

FeatureReal-Life Use
Line ChartDaily/Monthly Sales Growth
Bar ChartCompare performance
Pie ChartShow percentage of expenses
Excel ReadingRead business reports or logs
Data CleaningFix incomplete or wrong entries


โœ… Interview Questions & Answers (Matplotlib + Pandas Data Cleaning + Excel Integration)

+ Hands-on Practice Tasks for Students


๐ŸŽฏ Section 1: Interview Questions & Answers

๐Ÿ”น 1. What is Matplotlib?

Answer:
Matplotlib is a Python library used to create visualizations like line charts, bar graphs, pie charts, histograms, and scatter plots.


๐Ÿ”น 2. How do you create a bar chart in Matplotlib?

Answer:
You use the bar() function:

import matplotlib.pyplot as plt
plt.bar(['A', 'B'], [10, 20])
plt.show()


๐Ÿ”น 3. What is the use of plt.show()?

Answer:
plt.show() displays the graph or plot in a new window.


๐Ÿ”น 4. What is the difference between plot() and scatter()?

Answer:

  • plot() is used for line charts (connected data).
  • scatter() is for individual data points (used to find patterns).

๐Ÿ”น 5. What is Pandas?

Answer:
Pandas is a Python library used to store and analyze data in table-like formats using DataFrames.


๐Ÿ”น 6. How do you handle missing data in Pandas?

Answer:
You can:

  • Use dropna() to remove missing rows.
  • Use fillna() to fill missing values.

๐Ÿ”น 7. How do you find missing values in a DataFrame?

Answer:
Use:

df.isnull()
df.isnull().sum()


๐Ÿ”น 8. How to remove duplicate values in a dataset?

Answer:
Use df.drop_duplicates().


๐Ÿ”น 9. How do you read and write Excel files in Pandas?

Answer:

  • Read: pd.read_excel("file.xlsx")
  • Write: df.to_excel("output.xlsx", index=False)

๐Ÿ”น 10. What is the use of ExcelWriter in Pandas?

Answer:
It allows saving multiple DataFrames into one Excel file with multiple sheets.


๐Ÿงช Section 2: Hands-on Examples (Student Practice)


โœ… 1. Create a Line Chart for Weekly Sales

import matplotlib.pyplot as plt

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
sales = [100, 120, 90, 150, 130]

plt.plot(days, sales)
plt.title("Weekly Sales")
plt.xlabel("Days")
plt.ylabel("Sales")
plt.show()


โœ… 2. Clean Student Data (CSV)

CSV Example:

Name,Age,Marks
John,21,85
Sara,,78
,20,90
Anna,22,
John,21,85

Python Code:

import pandas as pd

df = pd.read_csv("students.csv")

# Step 1: Show missing values
print(df.isnull().sum())

# Step 2: Fill missing values
df['Name'].fillna('Unknown', inplace=True)
df['Age'].fillna(df['Age'].mean(), inplace=True)
df['Marks'].fillna(df['Marks'].mean(), inplace=True)

# Step 3: Remove duplicates
df = df.drop_duplicates()

print(df)


โœ… 3. Read Excel and Show Subject-wise Marks

import pandas as pd

df = pd.read_excel("marks.xlsx", sheet_name="Sheet1")

print("Average marks per subject:")
print(df.mean())


โœ… 4. Save Cleaned Data to Excel

df.to_excel("cleaned_students.xlsx", index=False)


โœ… 5. Practice Task: Fruit Pie Chart

Create a pie chart for this data:

FruitQuantity
Apple40
Banana30
Mango20
Orange10

Code:

import matplotlib.pyplot as plt

fruits = ['Apple', 'Banana', 'Mango', 'Orange']
quantities = [40, 30, 20, 10]

plt.pie(quantities, labels=fruits, autopct='%1.1f%%')
plt.title("Fruit Sale Distribution")
plt.show()


๐Ÿ“˜ Summary for Students

TaskSkill Learned
Line ChartPlotting trends
Cleaning Missing DataData preprocessing
Reading ExcelReal-world file handling
Pie/Bar ChartData visualization
Remove DuplicatesData integrity

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *