What is Pandas?
Pandas is a Python library used to work with tabular data like Excel sheets or database tables. It helps with:
- Reading & writing data
- Analyzing & cleaning data
- Performing calculations on rows and columns
Step 1: Installing and Importing Pandas
pip install pandas # Run this once in a terminal
import pandas as pd # Run this at the top of your Python script
`pd` is a short name (alias) we use for pandas.
Step 2: Creating a DataFrame (Table-like data)
data = {
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78]
}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Marks
0   Arun     85
1  Priya     90
2  Kumar     78
Real-Life Use: Represent student marks or sales reports as a table.
Step 3: Reading Data from Files
df = pd.read_csv("students.csv")
Real-Life Use: Load a CSV file, including one exported from Excel or Google Sheets. To read an Excel workbook directly, use pd.read_excel() instead.
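To try read_csv without a file on disk, here is a minimal sketch that reads from an in-memory string. The CSV text is made-up sample data that mirrors the Step 2 table and stands in for a hypothetical students.csv.

```python
import io

import pandas as pd

# Made-up sample contents standing in for a hypothetical students.csv
csv_text = """Name,Marks
Arun,85
Priya,90
Kumar,78
"""

# read_csv accepts a file path or any file-like object, so StringIO works here
students = pd.read_csv(io.StringIO(csv_text))

print(students.shape)          # (3, 2)
print(list(students.columns))  # ['Name', 'Marks']
```

With a real file, pass its path instead of the StringIO object.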
Step 4: Basic Info About Data
print(df.head()) # First 5 rows
print(df.tail()) # Last 5 rows
print(df.shape) # (rows, columns)
print(df.columns) # Column names
df.info() # Column details (prints directly, so no print() needed)
Step 5: Selecting Columns and Rows
print(df['Name']) # Select one column
print(df[['Name', 'Marks']]) # Select multiple columns
print(df.iloc[0]) # First row (by position)
print(df.loc[1]) # Row with index label 1
Real-Life Use: Get details of a student by ID or name.
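A lookup by ID or name could be sketched as follows. The ID column and its values are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103],  # hypothetical student IDs
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78],
})

# Make ID the row label so loc can look a student up by ID
by_id = df.set_index('ID')
priya = by_id.loc[102]
print(priya['Name'], priya['Marks'])  # Priya 90

# Look up by name with a boolean mask on the original table
kumar_rows = df[df['Name'] == 'Kumar']
print(kumar_rows)
```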
Step 6: Filtering Data (Conditional Selection)
print(df[df['Marks'] > 80])
Output:
    Name  Marks
0   Arun     85
1  Priya     90
Real-Life Use: Find students who passed or scored more than 80.
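Filters can also combine several conditions. The sketch below uses & (and) and | (or); each condition needs its own parentheses because these operators bind more tightly than the comparisons. The thresholds are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78],
})

# Both conditions must hold: marks from 80 up to (not including) 90
in_band = df[(df['Marks'] >= 80) & (df['Marks'] < 90)]
print(in_band['Name'].tolist())  # ['Arun']

# Either condition may hold: marks below 80, or the student is Priya
either = df[(df['Marks'] < 80) | (df['Name'] == 'Priya')]
print(either['Name'].tolist())   # ['Priya', 'Kumar']
```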
Step 7: Adding and Modifying Columns
df['Result'] = df['Marks'] >= 80 # True if marks are 80 or above, else False
print(df)
Real-Life Use: Add a pass/fail flag based on marks (True means pass).
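If text labels are preferred over True/False, one possible approach is to map the boolean column to strings. The Status column name is an illustrative choice, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78],
})

# Boolean flag: True when marks are 80 or above
df['Result'] = df['Marks'] >= 80

# Translate the flag into readable labels ('Status' is a made-up column name)
df['Status'] = df['Result'].map({True: 'Pass', False: 'Fail'})

print(df['Status'].tolist())  # ['Pass', 'Pass', 'Fail']
```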
Step 8: Sorting Data
print(df.sort_values('Marks')) # Lowest to highest
print(df.sort_values('Marks', ascending=False)) # Highest to lowest
Real-Life Use: Rank top scorers or sort products by price.
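Ranking top scorers might look like the sketch below, which sorts, adds a rank number, and picks the top two. The Rank column name and the choice of method='min' for ties are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78],
})

# Sort highest first and renumber the rows from 0
ranked = df.sort_values('Marks', ascending=False).reset_index(drop=True)

# Rank 1 is the highest mark; tied marks would share the lowest rank number
ranked['Rank'] = ranked['Marks'].rank(ascending=False, method='min').astype(int)
print(ranked)

# Shortcut for the top N rows by a column
top_two = df.nlargest(2, 'Marks')
print(top_two['Name'].tolist())  # ['Priya', 'Arun']
```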
Step 9: Grouping and Aggregating
group = df.groupby('Result')['Marks'].mean() # Average marks for each Result value
print(group)
Real-Life Use: Find average marks of passed vs failed students.
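As a runnable sketch of the passed-vs-failed average: selecting the Marks column before calling mean() keeps text columns such as Name out of the calculation, which matters because recent pandas versions raise an error when asked to average strings.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78],
})
df['Result'] = df['Marks'] >= 80

# Average marks within each Result group (False = below 80, True = 80 or above)
avg_by_result = df.groupby('Result')['Marks'].mean()

print(avg_by_result.to_dict())  # {False: 78.0, True: 87.5}
```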
Step 10: Handling Missing Data
df.isnull() # True wherever a value is missing
df.dropna() # Returns a copy without the rows that have missing values
df.fillna(0) # Returns a copy with missing values replaced by 0
Real-Life Use: Fill missing prices or names in sales data.
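The sketch below applies these calls to a small made-up sales table. Note that dropna() and fillna() return new DataFrames, so the result has to be assigned to a variable to be kept. Filling prices with the column average is just one possible choice.

```python
import numpy as np
import pandas as pd

# Made-up sales data with one missing price and one missing product name
sales = pd.DataFrame({
    'Product': ['Pen', 'Book', None],
    'Price': [10.0, np.nan, 25.0],
})

# Count missing values per column
print(sales.isnull().sum())

# Keep only rows with no missing values
complete_rows = sales.dropna()

# Fill each column with its own replacement value
filled = sales.fillna({
    'Product': 'Unknown',
    'Price': sales['Price'].mean(),  # mean() skips missing values: 17.5
})
print(filled)
```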
Step 11: Exporting Data to File
df.to_csv("updated_data.csv", index=False) # index=False leaves out the row numbers
Real-Life Use: Save cleaned or updated student/sales report.
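To see what index=False changes, here is a small sketch that writes the CSV to a string instead of a file (to_csv returns the text when no path is given).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arun', 'Priya', 'Kumar'],
    'Marks': [85, 90, 78],
})

# No path given, so to_csv returns the CSV text instead of writing a file
without_index = df.to_csv(index=False)
with_index = df.to_csv()

print(without_index.splitlines()[0])  # Name,Marks
print(with_index.splitlines()[0])     # ,Name,Marks
```

The extra leading column in the second header is the row index, which usually shows up as an unwanted "Unnamed: 0" column when the file is read back.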
Step 12: Useful Pandas Functions
df.describe() # Summary (count, mean, std, min, quartiles, max)
df['Marks'].max() # Maximum marks
df['Marks'].min() # Minimum marks
df['Marks'].mean() # Average marks
df['Marks'].sum() # Total marks
Real-Life Use: Get summary of any report like sales or performance.
Interview Q&A (Pandas)
1. What is Pandas in Python?
Answer: A library for data manipulation and analysis using tables (DataFrames).
2. What is a DataFrame?
Answer: A 2D table with rows and columns (like Excel).
3. How do you read data in Pandas?
pd.read_csv('filename.csv')
4. How to filter rows where salary > 50000?
df[df['Salary'] > 50000]
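5. How to handle missing values?
df.dropna() # Drop rows that have missing values
df.fillna(0) # Or replace missing values with 0
6. How to group data?
df.groupby('Department').mean(numeric_only=True) # Average of the numeric columns per department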
7. Difference between `loc[]` and `iloc[]`?
`loc[]`: Uses labels (row name/index label)
`iloc[]`: Uses integer position (row number, starting at 0)
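The difference is easiest to see when the index labels are not 0, 1, 2. In this sketch the labels 10, 20, 30 are arbitrary values chosen so that labels and positions differ.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Arun', 'Priya', 'Kumar'], 'Marks': [85, 90, 78]},
    index=[10, 20, 30],  # arbitrary labels that differ from positions
)

# Same row reached two ways: by label 20 and by position 1
print(df.loc[20, 'Name'])  # Priya
print(df.iloc[1, 0])       # Priya

# Label slices include the end label; position slices exclude the end position
by_label = df.loc[10:20]     # rows labelled 10 and 20
by_position = df.iloc[0:1]   # only the row at position 0
print(len(by_label), len(by_position))  # 2 1
```

Here df.loc[1] would raise a KeyError, because no row carries the label 1.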
8. How to sort data?
df.sort_values('ColumnName')
9. How to add a new column?
df['NewCol'] = value
10. What are some real-life uses?
Tracking student marks, employee salaries, monthly sales, attendance, survey data, etc.
Summary
Feature | Pandas | Real-Life Use |
---|---|---|
DataFrame | Table structure | Marks, Sales, Reports |
CSV I/O | Read/Write files | Load/Save reports |
Filter rows | df[df['Age']>25] | Filter based on condition |
Grouping | groupby() | Average sales per month |
Cleaning data | dropna(), fillna() | Remove or fill missing entries |