
Author: Saravana Kumar
-
Matplotlib, Data Cleaning with Pandas, and Excel Integration using Pandas with examples and outputs.
Part 1: Matplotlib (Step-by-Step)
Step 1: Install and Import
pip install matplotlib
import matplotlib.pyplot as plt
Step 2: Line Chart
x = [1, 2, 3, 4]
y = [10, 20, 30, 25]
plt.plot(x, y)
plt.title("Line Chart Example")
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.show()
Used For: Showing progress over time (e.g. sales growth).
Step 3: Bar Chart
subjects = ['Math', 'Science', 'English']
marks = [85, 90, 78]
plt.bar(subjects, marks)
plt.title("Student Marks")
plt.xlabel("Subjects")
plt.ylabel("Marks")
plt.show()
Used For: Comparing items such as subject-wise marks or sales.
Step 4: Pie Chart
fruits = ['Apple', 'Banana', 'Orange']
quantities = [40, 35, 25]
plt.pie(quantities, labels=fruits, autopct='%1.1f%%')
plt.title("Fruit Distribution")
plt.show()
Used For: Showing percentage distribution (e.g. market share).
Step 5: Histogram
ages = [18, 22, 22, 25, 26, 28, 28, 30, 35, 35]
plt.hist(ages, bins=5)
plt.title("Age Group Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
Used For: Seeing how data is spread out (e.g. ages of people).
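To see the numbers behind the histogram bars without drawing anything, the same ages can be counted with NumPy. This is a small sketch: as far as I know, plt.hist uses the same equal-width binning as np.histogram, so the counts below should match the bar heights. The names counts and edges are chosen here for illustration.

```python
import numpy as np

ages = [18, 22, 22, 25, 26, 28, 28, 30, 35, 35]

# Split the range min(ages)..max(ages) into 5 equal-width bins
# and count how many ages fall into each one.
counts, edges = np.histogram(ages, bins=5)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {n} people")
```

The counts always add up to the number of values, which is a quick way to check that nothing was left out.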
Step 6: Scatter Plot
hours = [1, 2, 3, 4, 5]
marks = [40, 50, 65, 75, 85]
plt.scatter(hours, marks)
plt.title("Study Time vs Marks")
plt.xlabel("Hours Studied")
plt.ylabel("Marks")
plt.show()
Used For: Showing the relationship between two things (e.g. effort vs result).
Part 2: Data Cleaning with Pandas (Step-by-Step)
Step 1: Import Pandas
import pandas as pd
Step 2: Check Missing Values
df = pd.read_csv("students.csv")
print(df.isnull())        # Shows True/False for every cell
print(df.isnull().sum())  # Shows total missing values per column
Step 3: Drop Missing Rows
df_clean = df.dropna()
Removes rows with any missing value.
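Dropping every row that has any gap can throw away a lot of data. A minimal sketch, using made-up sample data, of how dropna() can be limited to the columns that really matter through its subset parameter:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in different columns.
df = pd.DataFrame({
    "Name": ["John", "Sara", None, "Anna"],
    "Age": [21, np.nan, 20, 22],
    "Marks": [85, 78, 90, np.nan],
})

all_complete = df.dropna()               # keeps only fully filled rows
has_marks = df.dropna(subset=["Marks"])  # only Marks must be present

print(len(all_complete), "fully complete row(s)")
print(len(has_marks), "row(s) with marks")
```

Here only one row is complete in every column, but three rows still have usable marks.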
Step 4: Fill Missing Values
# Option A: fill every missing value with 0
df = df.fillna(0)
# Option B: fill one column with its average (use this instead of Option A for that column)
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())
Step 5: Remove Duplicate Rows
df = df.drop_duplicates()
Removes repeated rows in the data.
Step 6: Change Data Type
df['Age'] = df['Age'].astype(int)
Converts float or string values to int. This raises an error if the column still has missing values, so fill or drop them first.
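Real files often contain blanks or text in a number column, and a direct astype(int) fails on them. A hedged sketch of one way to handle that with pd.to_numeric; the sample values and the choice of 0 for a missing age are assumptions made only for this example.

```python
import pandas as pd

# Hypothetical messy column: numbers stored as text, a blank, and a typo.
df = pd.DataFrame({"Age": ["21", "20", None, "abc"]})

# Turn anything that is not a number into NaN instead of raising an error.
age = pd.to_numeric(df["Age"], errors="coerce")

# Decide what a missing age should become (0 here), then convert to int.
df["Age"] = age.fillna(0).astype(int)

print(df["Age"].tolist())
```

In a real report you might prefer to drop those rows or fill with the average instead of 0.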
Step 7: Rename Columns
df.rename(columns={'Full Name': 'Name'}, inplace=True)
Step 8: Clean Strings
df['Name'] = df['Name'].str.strip().str.title()
Removes unwanted spaces and formats names consistently.
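String cleaning and duplicate removal work best together: rows that look different only because of spaces or capital letters are not detected as duplicates until the text is cleaned. A small sketch with made-up names:

```python
import pandas as pd

# Hypothetical data: the first two rows are the same student typed differently.
df = pd.DataFrame({
    "Name": ["  john ", "JOHN", "sara  "],
    "Marks": [85, 85, 78],
})

before = int(df.duplicated().sum())  # raw rows all look different

df["Name"] = df["Name"].str.strip().str.title()
df = df.drop_duplicates()

print("Duplicates found before cleaning:", before)
print(df["Name"].tolist())
```

This is why Step 8 is usually worth running before Step 5 on real data.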
Part 3: Excel Integration with Pandas (Step-by-Step)
Step 1: Install Required Library
pip install openpyxl
(openpyxl is needed for Excel support)
Step 2: Read Excel File
df = pd.read_excel("students.xlsx")
Loads an Excel file into Pandas.
Step 3: Read Specific Sheet
df = pd.read_excel("students.xlsx", sheet_name='Marks')
Reads only one sheet, chosen by name.
Step 4: Write to Excel
df.to_excel("output.xlsx", index=False)
Saves a DataFrame to an Excel file.
Step 5: Save Multiple Sheets
# df1 and df2 are DataFrames you have already created
with pd.ExcelWriter("multi_sheet.xlsx") as writer:
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')
Saves multiple reports in one Excel file.
Real-Life Use Cases
Feature | Real-Life Use
Line Chart | Daily/Monthly Sales Growth
Bar Chart | Compare performance
Pie Chart | Show percentage of expenses
Excel Reading | Read business reports or logs
Data Cleaning | Fix incomplete or wrong entries
Interview Questions & Answers (Matplotlib + Pandas Data Cleaning + Excel Integration) + Hands-on Practice Tasks for Students
Section 1: Interview Questions & Answers
1. What is Matplotlib?
Answer: Matplotlib is a Python library used to create visualizations like line charts, bar graphs, pie charts, histograms, and scatter plots.
2. How do you create a bar chart in Matplotlib?
Answer: You use the bar() function:
import matplotlib.pyplot as plt
plt.bar(['A', 'B'], [10, 20])
plt.show()
3. What is the use of plt.show()?
Answer: plt.show() displays the graph or plot, in a new window when run as a script or inline in a notebook.
4. What is the difference between plot() and scatter()?
Answer: plot() is used for line charts (connected data). scatter() is for individual data points (used to find patterns).
5. What is Pandas?
Answer: Pandas is a Python library used to store and analyze data in table-like formats using DataFrames.
6. How do you handle missing data in Pandas?
Answer: You can:
- Use dropna() to remove missing rows.
- Use fillna() to fill missing values.
7. How do you find missing values in a DataFrame?
Answer: Use:
df.isnull()
df.isnull().sum()
8. How to remove duplicate values in a dataset?
Answer: Use df.drop_duplicates().
9. How do you read and write Excel files in Pandas?
Answer:
- Read: pd.read_excel("file.xlsx")
- Write: df.to_excel("output.xlsx", index=False)
10. What is the use of ExcelWriter in Pandas?
Answer: It allows saving multiple DataFrames into one Excel file with multiple sheets.
Section 2: Hands-on Examples (Student Practice)
1. Create a Line Chart for Weekly Sales
import matplotlib.pyplot as plt

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
sales = [100, 120, 90, 150, 130]
plt.plot(days, sales)
plt.title("Weekly Sales")
plt.xlabel("Days")
plt.ylabel("Sales")
plt.show()
2. Clean Student Data (CSV)
CSV Example:
Name,Age,Marks
John,21,85
Sara,,78
,20,90
Anna,22,
John,21,85
Python Code:
import pandas as pd

df = pd.read_csv("students.csv")

# Step 1: Show missing values
print(df.isnull().sum())

# Step 2: Fill missing values (assign the result back to the column)
df['Name'] = df['Name'].fillna('Unknown')
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())

# Step 3: Remove duplicates
df = df.drop_duplicates()
print(df)
3. Read Excel and Show Subject-wise Marks
import pandas as pd

df = pd.read_excel("marks.xlsx", sheet_name="Sheet1")
print("Average marks per subject:")
print(df.mean(numeric_only=True))  # skip text columns such as names
4. Save Cleaned Data to Excel
df.to_excel("cleaned_students.xlsx", index=False)
5. Practice Task: Fruit Pie Chart
Create a pie chart for this data:
Fruit | Quantity
Apple | 40
Banana | 30
Mango | 20
Orange | 10
Code:
import matplotlib.pyplot as plt

fruits = ['Apple', 'Banana', 'Mango', 'Orange']
quantities = [40, 30, 20, 10]
plt.pie(quantities, labels=fruits, autopct='%1.1f%%')
plt.title("Fruit Sale Distribution")
plt.show()
Summary for Students
Task | Skill Learned
Line Chart | Plotting trends
Cleaning Missing Data | Data preprocessing
Reading Excel | Real-world file handling
Pie/Bar Chart | Data visualization
Remove Duplicates | Data integrity
-
Pandas Complete Guide (Simple English + Real-Life Examples)
What is Pandas?
Pandas is a Python library used to work with tabular data like Excel sheets or database tables. It helps with:
- Reading & writing data
- Analyzing & cleaning data
- Performing calculations on rows and columns
Step 1: Installing and Importing Pandas
pip install pandas
import pandas as pd
pd is a short name (alias) we use for pandas.
Step 2: Creating a DataFrame (Table-like data)
data = {
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78]
}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Marks
0   Arun     85
1  Priya     90
2  Kumar     78
Real-Life Use: Represent student marks or sales reports as a table.
Step 3: Reading Data from Files
df = pd.read_csv("students.csv")
Real-Life Use: Read data saved as CSV, including files exported from Excel or Google Sheets. For Excel files themselves, use pd.read_excel().
Step 4: Basic Info About Data
print(df.head())     # First 5 rows
print(df.tail())     # Last 5 rows
print(df.shape)      # (rows, columns)
print(df.columns)    # List of column names
df.info()            # Column details (prints by itself, no print() needed)
Step 5: Selecting Columns and Rows
print(df['Name'])             # Select one column
print(df[['Name', 'Marks']])  # Select multiple columns
print(df.iloc[0])             # First row (by position)
print(df.loc[1])              # Row with index label 1
Real-Life Use: Get details of a student by ID or name.
Step 6: Filtering Data (Conditional Selection)
print(df[df['Marks'] > 80])
Output:
    Name  Marks
0   Arun     85
1  Priya     90
Real-Life Use: Find students who passed or scored more than 80.
Step 7: Adding and Modifying Columns
df['Result'] = df['Marks'] >= 80   # True for 80 or more, False otherwise
print(df)
Real-Life Use: Add a pass/fail status based on marks.
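The comparison above stores True/False values. If the report should show the words "Pass" and "Fail" instead, one possible approach (a sketch, not the only way) is np.where, which picks one of two values for each row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Arun", "Priya", "Kumar"], "Marks": [85, 90, 78]})

# "Pass" where the condition is True, "Fail" where it is False.
df["Result"] = np.where(df["Marks"] >= 80, "Pass", "Fail")

print(df)
```

The pass mark of 80 is taken from the step above; change it to whatever rule your data needs.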
Step 8: Sorting Data
print(df.sort_values('Marks'))                   # Lowest first
print(df.sort_values('Marks', ascending=False))  # Highest first
Real-Life Use: Rank top scorers or sort products by price.
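For ranking top scorers, sorting is one option; pandas also has nlargest() and rank(). A short sketch using the same three students (the Rank column name is made up for this example):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Arun", "Priya", "Kumar"], "Marks": [85, 90, 78]})

# The two highest scorers, already sorted from best to second best.
top_two = df.nlargest(2, "Marks")

# Rank 1 = highest marks; method="min" gives tied students the same rank.
df["Rank"] = df["Marks"].rank(ascending=False, method="min").astype(int)

print(top_two)
print(df)
```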
Step 9: Grouping and Aggregating
group = df.groupby('Result')['Marks'].mean()   # average Marks for each Result value
print(group)
Real-Life Use: Find average marks of passed vs failed students.
Step 10: Handling Missing Data
df.isnull()    # Check missing values
df.dropna()    # Remove rows with missing values
df.fillna(0)   # Replace missing values with 0
Real-Life Use: Fill missing prices or names in sales data.
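A common beginner mistake is expecting dropna() or fillna() to change the table by themselves. They return a new DataFrame, so the result has to be stored. A minimal sketch with made-up sales data, which also shows filling each column with a value that suits it:

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing price and one missing item name.
df = pd.DataFrame({"Item": ["Pen", "Book", None], "Price": [10.0, np.nan, 5.0]})

df.fillna(0)                                  # result is thrown away
still_missing = int(df.isnull().sum().sum())  # df itself is unchanged

# Store the result, and use a different fill value per column.
filled = df.fillna({"Item": "Unknown", "Price": df["Price"].mean()})

print("Missing values left in df:", still_missing)
print(filled)
```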
Step 11: Exporting Data to File
df.to_csv("updated_data.csv", index=False)
Real-Life Use: Save cleaned or updated student/sales report.
Step 12: Useful Pandas Functions
df.describe()        # Summary (mean, std, min, max)
df['Marks'].max()    # Maximum marks
df['Marks'].min()    # Minimum marks
df['Marks'].mean()   # Average marks
df['Marks'].sum()    # Total marks
Real-Life Use: Get summary of any report like sales or performance.
Interview Q&A (Pandas)
1. What is Pandas in Python?
Answer: A library for data manipulation and analysis using tables (DataFrames).
2. What is a DataFrame?
Answer: A 2D table with rows and columns (like Excel).
3. How do you read data in Pandas?
pd.read_csv('filename.csv')
4. How to filter rows where salary > 50000?
df[df['Salary'] > 50000]
5. How to handle missing values?
df.dropna()
df.fillna(0)
6. How to group data?
df.groupby('Department').mean(numeric_only=True)
7. Difference between loc[] and iloc[]?
loc[]: Uses label (row name/index)
iloc[]: Uses position (row number)
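The difference between label and position is easiest to see when the index is not 0, 1, 2. A small sketch, assuming roll numbers (101, 102, 103) are used as the index:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Arun", "Priya", "Kumar"], "Marks": [85, 90, 78]},
    index=[101, 102, 103],  # hypothetical roll numbers used as row labels
)

by_label = df.loc[102]    # row whose label is 102
by_position = df.iloc[0]  # first row, whatever its label is

print(by_label["Name"], by_position["Name"])
```

With this index, df.loc[0] would raise a KeyError because no row is labelled 0.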
8. How to sort data?
df.sort_values('ColumnName')
9. How to add a new column?
df['NewCol'] = value
10. Real-Life Example?
Track student marks, employee salary, monthly sales, attendance, survey data, etc.
Summary
Feature | Pandas | Real-Life Use
DataFrame | Table structure | Marks, Sales, Reports
CSV I/O | Read/Write files | Load/Save reports
Filter rows | df[df['Age'] > 25] | Filter based on condition
Grouping | groupby() | Average sales per month
Cleaning data | fillna() | Fill missing entries
-
NumPy Complete Reference Guide (with Real-Life Examples & Interview Q&A)
Introduction to NumPy
What is NumPy?
NumPy (Numerical Python) is a powerful Python library used for numerical computations. It provides support for arrays, matrices, and many mathematical functions.
Why use NumPy?
- Faster than Python lists
- Supports multi-dimensional arrays
- Optimized mathematical functions
- Useful for data science, ML, and scientific computing
Installation
pip install numpy
Importing NumPy
import numpy as np
Real-Life Example:
You can analyze thousands of sales records, do calculations, and generate statistics quickly using NumPy arrays.
Creating Arrays
arr1 = np.array([1, 2, 3])           # 1D array
arr2 = np.array([[1, 2], [3, 4]])    # 2D array
Array Attributes
print(arr2.shape)   # (2, 2)
print(arr2.ndim)    # 2
print(arr2.dtype)   # int32 or int64, depending on the platform
Real-Life Example:
Use 2D arrays to represent Excel-like tables (e.g., student marks, sales data).
Indexing and Slicing
arr = np.array([10, 20, 30, 40])
print(arr[1:3])     # Output: [20 30]
arr2 = np.array([[1, 2], [3, 4]])
print(arr2[1, 0])   # Output: 3
Real-Life Example:
Get a student’s marks from a 2D array of student results.
Array Operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)   # [5 7 9]
print(a * b)   # [ 4 10 18]
Broadcasting
a = np.array([1, 2, 3])
print(a + 10)  # [11 12 13]
Real-Life Example:
Apply discount or tax to a list of prices using broadcasting.
Array Functions
arr = np.array([1, 2, 3, 4])
print(np.sum(arr))    # 10
print(np.mean(arr))   # 2.5
print(np.max(arr))    # 4
Real-Life Example:
Calculate total or average marks of students.
Reshaping and Flattening
arr = np.array([[1, 2], [3, 4]])
print(arr.reshape(4, 1))   # 4 rows, 1 column
print(arr.flatten())       # [1 2 3 4]
Real-Life Example:
Convert a 2D image matrix into a 1D array for machine learning input.
Stacking and Splitting
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.vstack((a, b)))   # Stack vertically: 4 rows, 2 columns
print(np.hstack((a, b)))   # Stack horizontally: 2 rows, 4 columns
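Splitting is the reverse of stacking. A short sketch, using the same two arrays, of how np.vsplit and np.hsplit cut a stacked array back into equal parts (the variable names are chosen for this example):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack((a, b))  # shape (4, 2)
stacked_cols = np.hstack((a, b))  # shape (2, 4)

# Split into 2 equal parts: by rows (vsplit) or by columns (hsplit).
top, bottom = np.vsplit(stacked_rows, 2)
left, right = np.hsplit(stacked_cols, 2)

print(top)
print(right)
```

Both functions need the array to divide evenly into the requested number of parts.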
Random Numbers
np.random.seed(0)
print(np.random.randint(1, 10, size=(2, 3)))   # 2x3 array of whole numbers from 1 to 9
Real-Life Example:
Create random roll numbers or question orders for an online quiz.
File I/O in NumPy
np.savetxt('data.csv', arr, delimiter=',')
arr_loaded = np.loadtxt('data.csv', delimiter=',')
Real-Life Example:
Save and load data like marks, sales, or sensor values using CSV files.
Interview Q&A with Real-Life Examples
1. What is NumPy? Why is it used?
Answer: A powerful library for numeric computations in Python. Used in data science, ML, and engineering.
Example: Analyze thousands of rows of Excel data in seconds.
2. Difference between Python list and NumPy array?
Feature | List | NumPy Array
Speed | Slow | Fast
Memory | More | Less
Operations | No vector ops | Vector ops
Example: Process pixel data faster with NumPy.
3. What is broadcasting?
Answer: It lets arrays of different but compatible shapes (or an array and a single number) work together in one operation.
Example: Add 10% tax to each product price: arr * 1.10
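A quick sketch of broadcasting in both forms: an array with a single number, and a 2D array with a 1D array. Adding 10% tax means multiplying by 1.10 (adding 10 would add a flat amount instead). The prices, marks, and bonus values are made up for illustration.

```python
import numpy as np

# Array with a single number: the number is applied to every element.
prices = np.array([100.0, 250.0, 40.0])
with_tax = prices * 1.10  # 10% tax on each price

# 2D array with a 1D array: the 1D array is applied to every row.
marks = np.array([[80, 70, 90],
                  [60, 75, 85]])  # shape (2, 3): 2 students, 3 subjects
bonus = np.array([5, 0, 2])       # shape (3,): one bonus per subject
adjusted = marks + bonus

print(with_tax)
print(adjusted)
```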
4. How to create arrays?
Answer: Using np.array, np.zeros, np.ones, np.arange, etc.
Example: Initialize 0 attendance for all students.
5. What is reshape and flatten?
Answer: reshape() changes shape, flatten() converts to 1D.
Example: Convert a 2D image to 1D for model input.
6. Mathematical operations in NumPy?
Answer: Use +, -, *, / between arrays or scalars.
Example: Calculate final bill: price - discount
7. How to calculate statistics?
Answer: Use np.mean, np.median, np.std, etc.
Example: Find average marks of a class.
8. Generating random numbers?
Answer: Use np.random.randint, np.random.rand, etc.
Example: Generate random test scores or sample data.
9. What is axis in NumPy?
Answer: Tells NumPy to operate along rows or columns.
Example: Sum all subjects per student: axis=1
10. File handling in NumPy?
Answer: np.savetxt, np.loadtxt for CSV operations.
Example: Save survey results into a CSV file.
NumPy Step-by-Step with Examples and Outputs
Step 1: Importing NumPy
import numpy as np
Why? This lets us use all the NumPy functions.
Step 2: Creating Arrays
1D Array
a = np.array([1, 2, 3])
print(a)
# Output: [1 2 3]
2D Array
b = np.array([[1, 2], [3, 4]])
print(b)
# Output:
# [[1 2]
#  [3 4]]
3D Array
c = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(c)
Real-Life Use: Store pixel values for an image (3D: height, width, channels).
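To make the image idea concrete, here is a tiny sketch of an image-like 3D array. The size (4 pixels high, 6 wide, 3 colour channels) is made up, and the order height, width, channels is the common convention, not a NumPy rule.

```python
import numpy as np

# A blank 4x6 "image" with 3 colour channels (red, green, blue).
image = np.zeros((4, 6, 3), dtype=np.uint8)

# Colour the top-left pixel pure red.
image[0, 0] = [255, 0, 0]

# Take only the red channel: a 2D array with the same height and width.
red_channel = image[:, :, 0]

print(image.shape)        # (4, 6, 3)
print(red_channel.shape)  # (4, 6)
```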
Step 3: Array Properties
print(b.shape)   # (2, 2)
print(b.ndim)    # 2
print(b.size)    # 4
print(b.dtype)   # int64 (int32 on some platforms)
Real-Life Use: Know the shape of Excel-like data before applying operations.
Step 4: Indexing and Slicing
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])
# Output: [20 30 40]
2D Indexing
arr2 = np.array([[1, 2], [3, 4]])
print(arr2[1, 0])
# Output: 3
Real-Life Use: Access student marks from rows and subjects from columns.
Step 5: Array Operations
a = np.array([10, 20, 30])
b = np.array([1, 2, 3])
print(a + b)   # [11 22 33]
print(a * b)   # [10 40 90]
Real-Life Use: Calculate bill amount = price * quantity
Step 6: Broadcasting
a = np.array([1, 2, 3])
print(a + 5)
# Output: [6 7 8]
Real-Life Use: Apply flat discount or tax to all items at once.
Step 7: Useful NumPy Functions
arr = np.array([10, 20, 30])
print(np.sum(arr))    # 60
print(np.mean(arr))   # 20.0
print(np.min(arr))    # 10
print(np.max(arr))    # 30
print(np.std(arr))    # 8.16...
Real-Life Use: Calculate total, average, and spread of exam scores.
Step 8: Reshape and Flatten
arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped = arr.reshape(3, 2)
print(reshaped)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]]
flat = arr.flatten()
print(flat)
# Output: [1 2 3 4 5 6]
Real-Life Use: Prepare data for machine learning models (flat format).
Step 9: Stack and Split Arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.hstack((a, b)))
# Output:
# [[1 2 5 6]
#  [3 4 7 8]]
print(np.vstack((a, b)))
# Output:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
Real-Life Use: Combine tables or reports horizontally/vertically.
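Step 10: Random Numbers
np.random.seed(0)
print(np.random.randint(1, 10, (2, 3)))
# Output: a 2x3 array of whole numbers from 1 to 9 (10 is not included).
# Because the seed is fixed, every run prints the same numbers.
Real-Life Use: Create random test data or shuffle questions.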
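For the "shuffle questions" use case, one option is NumPy's newer random generator and its permutation() method, which returns a shuffled copy. A minimal sketch; the question labels and the seed value 42 are made up for this example.

```python
import numpy as np

questions = np.array(["Q1", "Q2", "Q3", "Q4", "Q5"])

# A seeded generator gives the same order on every run;
# leave the seed out to get a new order each time.
rng = np.random.default_rng(seed=42)
shuffled = rng.permutation(questions)

print(shuffled)
```

The original questions array is left untouched, so the same list can be shuffled again for the next student.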
Step 11: Conditional Selection
arr = np.array([10, 20, 30, 40])
print(arr[arr > 20])
# Output: [30 40]
Real-Life Use: Filter students who scored above 20 marks.
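Conditions can also be combined, and np.where can turn a condition into labels. A short sketch; the pass mark of 20 follows the step above, and the other names are chosen for illustration.

```python
import numpy as np

marks = np.array([10, 20, 30, 40])

# Combine two conditions with & (and). Each condition needs its own brackets.
middle = marks[(marks > 15) & (marks < 40)]

# Pick a label for every element depending on the condition.
labels = np.where(marks > 20, "Pass", "Fail")

print(middle)
print(labels)
```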
Step 12: Save & Load Files
arr = np.array([[1, 2], [3, 4]])
np.savetxt("data.csv", arr, delimiter=",")
loaded = np.loadtxt("data.csv", delimiter=",")
print(loaded)
Real-Life Use: Save and reload reports, marks, or sales data.
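By default np.savetxt writes numbers in scientific notation and np.loadtxt reads them back as floats, so whole numbers come back as 1.0, 2.0 and so on. A sketch of keeping them as integers with the fmt and dtype parameters; the temporary folder is used only so the example does not leave files behind.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1, 2], [3, 4]])
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# fmt="%d" writes plain whole numbers such as 1,2 instead of 1.0e+00.
np.savetxt(path, arr, delimiter=",", fmt="%d")

# dtype=int reads the values back as integers instead of floats.
loaded = np.loadtxt(path, delimiter=",", dtype=int)

print(loaded)
```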
Step 13: Axis in Functions
arr = np.array([[1, 2], [3, 4]])
print(np.sum(arr, axis=0))   # Column sum: [4 6]
print(np.sum(arr, axis=1))   # Row sum: [3 7]
Real-Life Use: Get total sales per product or per day.
Step 14: Special Arrays
print(np.zeros((2, 3)))   # Array with all zeros
print(np.ones((2, 3)))    # Array with all ones
print(np.eye(3))          # Identity matrix
Real-Life Use: Initialize tables, filters, or neural network layers.
Step 15: Linspace & Arange
print(np.linspace(1, 5, 5))   # [1. 2. 3. 4. 5.]
print(np.arange(1, 10, 2))    # [1 3 5 7 9]
Real-Life Use: Generate time steps or price intervals for charts.
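The two functions answer different questions: linspace asks "how many points?" and includes the end value, while arange asks "how big is each step?" and stops before the end value. A small sketch with made-up ranges:

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, both ends included.
points = np.linspace(0, 1, 5)

# Steps of 0.25 starting at 0; the stop value 1 is not included.
steps = np.arange(0, 1, 0.25)

print(points)  # 5 values, last one is 1.0
print(steps)   # 4 values, last one is 0.75
```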
Final Thoughts
NumPy is essential for data analysis, machine learning, and scientific work in Python. Mastering it gives you speed, accuracy, and power to work with big data.
If you're a beginner, practice real-life examples like:
- Student marks
- Sales data
- Weather sensor logs
Keep experimenting with arrays, slicing, operations, and reshaping to become a NumPy pro!
-
Career-Based Python Library Guide
A career-based list of Python libraries to help you choose what to focus on, based on your career goal. Each section lists the libraries to learn and why they are useful.
1. Data Science / Data Analyst
- NumPy: Fast numerical computations
- Pandas: Table (Excel-style) data analysis
- Matplotlib / Seaborn: Visualize data with graphs
- Scikit-learn: Easy machine learning models
- Statsmodels: Statistical tests, regressions
- Jupyter Notebook: Interactive coding
Why: Data cleaning, visualization, prediction
2. Machine Learning / AI
- All from Data Science PLUS
- TensorFlow: Deep learning by Google
- Keras: Simple interface for TensorFlow
- PyTorch: Deep learning by Facebook (Meta)
- OpenCV: Image recognition and vision
- NLTK / SpaCy: Natural language (text) processing
Why: AI, predictions, image & text classification
3. Web Development
- Flask: Lightweight web framework
- Django: Full-featured web framework
- Jinja2: HTML templates
- SQLAlchemy: Database connection
- WTForms / Django Forms: User input forms
- Requests: Call APIs
Why: Build websites, blogs, admin panels
4. Software Testing / QA
- Unittest: Built-in Python testing
- Pytest: Advanced testing made easy
- Selenium: Automate browser testing
- Behave: BDD (like Cucumber for Python)
- Mock (unittest.mock): Replace real objects with fake ones in tests
Why: Automate test cases for software quality
5. Automation / Scripting
- os, shutil: File and folder automation
- subprocess: Run shell commands
- pyautogui: Control mouse & keyboard
- schedule: Automate time-based tasks
- requests / bs4 (BeautifulSoup): Web scraping
- pandas / openpyxl / csv: Excel automation
Why: Save time by writing repeatable task scripts
6. Desktop App Development
- Tkinter: Built-in GUI toolkit
- PyQt / PySide: Advanced GUI apps
- Kivy: Multi-platform GUI & touch apps
- CustomTkinter: Beautiful UI with dark mode
Why: Make apps with buttons, forms, input boxes
7. Game Development
- Pygame: Simple 2D game creation
- Arcade: Modern 2D games (better visuals)
- PyOpenGL: 3D game basics
- Panda3D: Full 3D engine
Why: Build anything from simple games to large 3D games
8. Cybersecurity / Hacking (Ethical)
- Scapy: Network packet crafting
- Nmap (via Python): Scan devices
- Paramiko: SSH and remote access
- Requests / BeautifulSoup: Info gathering
- Socket: Low-level networking
Why: Create ethical hacking tools, scan systems
9. Bioinformatics / Science
- BioPython: DNA, RNA, Protein data
- SciPy: Scientific math & physics
- NumPy / Pandas: Data processing
- Matplotlib / Seaborn: Graphs & analysis
Why: Process genetic or scientific data easily
Summary Table
Career | Top Libraries
Data Science | NumPy, Pandas, Matplotlib, Scikit-learn
AI / ML | TensorFlow, Keras, PyTorch, OpenCV
Web Development | Flask, Django, SQLAlchemy, Requests
Testing / QA | Pytest, Selenium, Unittest
Automation | os, pyautogui, requests, bs4
Desktop Apps | Tkinter, PyQt, Kivy
Game Dev | Pygame, Arcade, Panda3D
Cybersecurity | Scapy, Paramiko, Socket
Science / Bio | BioPython, SciPy