Python Final Lectures

Q- How to print Hello World

print("Hello World")

Variables in python -------

age = 30 # variable names should be intuitive so the code is easy to understand later

print(age)

Note: Shift+Enter is the shortcut to run the current cell

2) '#' starts a comment in Python

Rules for Variables---

A variable name cannot start with a digit, e.g. 1age is invalid
Digits can be used in the middle or at the end of a name, e.g. age1, age2
Special characters are not allowed except _ (underscore), e.g. age_my
Spaces are not allowed in a variable name
Python is case sensitive (age and Age are different variables)
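
The rules above can be checked from Python itself. This is a minimal sketch: the helper name is_valid_variable_name and the list of candidate names are made up for illustration. It relies on str.isidentifier() plus the keyword module, since reserved words such as for also cannot be used as variable names.

```python
import keyword

def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier() checks the character rules; keywords must be excluded separately.
    return name.isidentifier() and not keyword.iskeyword(name)

# Hypothetical candidate names, one for each rule listed above.
candidates = ["1age", "age1", "age_my", "my age", "age-my", "for"]
results = {name: is_valid_variable_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")
```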

Way to define Variable ---

age1,age2 = 30,25

age1 = 30

age2 = 25

age1 = age2 = 30 # assigns the same value (30) to both variables

>> Data type

The data type is the kind of value a variable holds.

Integer = a whole number; age1 and age2 above are integers
Let's check: type(age1) # returns int
Float = a number with a decimal point
Interest = 30.24
type(Interest) # returns float
String = a sequence of characters. Note: anything written inside quotes is a string
Message = "My Name Is Divyanshu"
type(Message) # returns str. Single quotes 'I can use this' and double quotes "also use this" both work; for a multi-line string use '''triple quotes'''
Boolean = has only two values, True and False
Example: data = False # here data is the variable, False is the value, and bool is its data type
type(data) # returns bool

>> Mathematical Operator

Addition = +
num1 = 20
num2 = 37
result = num1 + num2
print(result) # or directly: print(num1 + num2), without storing the result in a new variable
Subtraction = -
num1 = 50
num2 = 20
result = num1 - num2
print(result) # or directly: print(num1 - num2)
Multiplication = *
num1 = 20
num2 = 39
result = num1 * num2
print(result) # or directly: print(num1 * num2)
Integer Division = // # floor division: drops the fractional part
num1 = 20
num2 = 3
result = num1 // num2
print(result) # or directly: print(num1 // num2)
answer = 6, because integer division does not return the fractional part
Float Division = / # always returns a float
num1 = 20
num2 = 3
result = num1 / num2
print(result) # or directly: print(num1 / num2)
answer = 6.666666666666667, because it is float division
Power = **
num1 = 2
num2 = 5
result = num1 ** num2
print(result) # answer = 32
Modulus = % # gives the remainder
num1 = 20
num2 = 3
result = num1 % num2
print(result)
answer = 2, the remainder
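
Integer division and modulus fit together: quotient * divisor + remainder gives back the original number. A small sketch of that relationship, using the same 20 and 3 as above; the built-in divmod() returns both results in one call. The negative example is an added illustration of how floor division rounds down.

```python
num1, num2 = 20, 3

quotient = num1 // num2    # integer (floor) division
remainder = num1 % num2    # modulus
both = divmod(num1, num2)  # (quotient, remainder) in a single call

# The two results always rebuild the original number.
rebuilt = quotient * num2 + remainder
print(quotient, remainder, both, rebuilt)

# With a negative number, // rounds down (towards minus infinity), not towards zero.
neg_quotient, neg_remainder = divmod(-20, 3)
print(neg_quotient, neg_remainder)
```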

>> How to take input from user

age1 = input() # input() always returns a string by default

To get a number, typecast the input:
age1 = int(input())

>>Built-in Functions in Python ----

len(string) = finds the length of a string (number of characters)
string = "Divyanshu Khare" # the space also counts as a character
len(string) # returns 15
ls = a collection of items (list)
Defined with [ ] square brackets
Example: list_name = [item1, item2, item3, ...]
type(ls) # returns the type of the variable (list)
max = finds the largest number in a list
ls = [1,2,4,5,6]
max(ls)
print(max(ls))
min = finds the smallest number in a list
ls = [1,2,3,4,5,6,7]
min(ls)
sum = sum of the numbers
ls = [1,2,3,4,6,7,8]
sum(ls)
len(ls) = gives the length of the list
len(ls)
max(string) = returns the character with the highest ASCII (Unicode) value
string = "Divyanshu"
max(string)
min(string) = returns the character with the lowest ASCII (Unicode) value
string = "Divyanshu"
min(string)
Note: ASCII (American Standard Code for Information Interchange) assigns a number to each character
sorted(ls) = returns a new list sorted in ascending order
sorted(ls, reverse=True) = returns a new list sorted in descending order
round() = rounds off a number
round(number, number of decimal places)
example: round(12356.54645, 2)
12356.55 # this is the result
abs() = returns the absolute (non-negative) value of a number
abs(-23538)
f = format string (f-string)
example:
name = "divyanshu"
age = 30
profession = "Data Science"
introduction = f"{name} is {age} year old professional working as {profession}"
print(introduction)

>> Conditional Statement -------

if else
cibil_score = int(input("Enter Cibil Score:"))
if cibil_score > 600:
    print("You are eligible for a loan")
else:
    print("You are not eligible")
elif = used when we have more than 2 conditions
color = input("Enter color - Red, Green, Yellow: ")
if color == "Red":
    print("Stop")
elif color == "Yellow":
    print("Wait")
else:
    print("Go")
loops (control structure) -- repetition of a task
example:
string = "Data Science"
for i in string:
    print(i)
example:
for i in range(0, 101): # prints 0 to 100
    print(i)
example with a step of 2 (prints the even numbers 0 to 98):
for i in range(0, 100, 2):
    print(i)
example:
ls = [1,2,3,4,5,6,7,8]
for i in ls:
    print(i)
while = keeps running as long as the condition is True
i = 1
while i < 10:
    print("Divyanshu khare")
    i = i + 1
Control the loop
>> Break: stops the loop completely once the requirement is met (no further iterations)
ls = [1,2,3,4,5,6,7,8,9,10]
for i in ls:
    if i == 6:
        print("Yes")
        break
>> Continue: skips the rest of the current iteration only and jumps to the next one
ls = [1,2,3,4,5,6,7,8,9,10]
for i in ls:
    if i == 6:
        print("Yes")
        continue
>> Pass: does nothing; it is a placeholder and does not break anything
ls = [1,2,3,4,5,6,7,8]
for i in ls:
    if i > 0:
        pass # do nothing
    else:
        print("Negative Number")
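
The difference between break and continue is easier to see when the visited values are collected. A minimal sketch using the same list; the names before_break and without_six are made up for illustration.

```python
ls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# break: the loop ends as soon as 6 is reached, so 6 to 10 are never visited.
before_break = []
for i in ls:
    if i == 6:
        break
    before_break.append(i)

# continue: only the iteration for 6 is skipped, the loop then carries on with 7.
without_six = []
for i in ls:
    if i == 6:
        continue
    without_six.append(i)

print(before_break)
print(without_six)
```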

>>> DATA STRUCTURE IN PYTHON [this is not algorithm]

>> 4 Data structures we have

List
tuples
set
dictionary

>> Important Data Type

String operations

Indexing = fetching a single character from the string by its position (index starts at 0)

Example:
string = "String"
string[2] # returns 'r'

Slicing = fetching a sequence of characters (a sub-string) from the given string

Example:
string = "Data Science" # let's say I want to fetch "Data" from this string
string[start_index : end_index + 1] # the end position is excluded, so add 1 to the last index you want
string[0:3 + 1] # returns 'Data'
Example 2: slicing with negative indices (counted from the right)
string = "Data Science"
string[-11:-9] # returns 'at'
Example 3: slicing while skipping every other character (step of 2)
string = "Data Science"
string[5:12:2] # returns 'Sine'
Example 4: reversing the string without giving a start or end index
string = "Data Science"
string[::-1] # with a step of -1 and no indices given, it starts from the last character and goes back to the first
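
To see why these slices return what they do, it helps to print every character next to its positive and negative index. A small sketch along those lines; the variable names are illustrative only.

```python
string = "Data Science"

# Each character with its positive index and its negative index.
for pos, ch in enumerate(string):
    print(pos, pos - len(string), repr(ch))

first_word = string[0:4]       # indices 0, 1, 2, 3 (4 is excluded)
from_right = string[-11:-9]    # same as string[1:3]
every_other = string[5:12:2]   # indices 5, 7, 9, 11
reversed_string = string[::-1] # step -1 walks from the end to the start

print(first_word, from_right, every_other, reversed_string)
```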

In-built function for string
string = "Data"
type(string)

len(string) = length of the string
convert string to lowercase =

string = "DIVYANSHU"
string.lower()

convert string to uppercase =

string = "divyanshu"
string.upper()

capitalize the string =

string = "divyanshu"
new_string = string.capitalize()
print(new_string)
# capitalize() is a built-in string method that converts the first letter to uppercase (and the rest to lowercase). The modified string it returns is stored in new_string.

string.islower() -- checks whether all letters in the string are lowercase

lstring = "DivyanshuJi" # lstring is the variable name
lstring.islower() # returns False

string.isupper() --- checks whether all letters in the string are uppercase

ustring = "this is small"
ustring.isupper() # returns False

string.isdigit() --- checks whether the string contains only digits

numstring = "435345" # numstring is the variable name of the string
numstring.isdigit() # returns True

string.swapcase() -- swaps the case of every letter in the string

string = "this is small case"
string.swapcase()

string.replace("text to find", "replacement text")

string = "Data science"
string.replace("a", "d") # replaces every 'a' in "Data science" with 'd'

string.split() = splits the string at the given separator (by default it splits on whitespace)

string = "Divyanshu@khare"
string.split("@")

>>>>>>>>>>>LIST - a collection of data

ls = [1,2,34,5,6,7,"mango"]
type(ls) # check the data type
len(ls) # check the length of the list
ls[3] # returns the value at index 3
ls[::-1] # returns the list reversed
list concatenation =
ls1 = ["apple", "mango"]
ls2 = ["another"]
ls3 = ls1+ls2
print(ls3)
ls.append("apple") # adds a single element at the end of the list
ls.extend(["grapes", "gyan"]) # adds multiple elements at the end of the list
ls.insert(index_number, "element") # inserts the element at a specific index
ls.index("grapes") # returns the (non-negative) index of the first occurrence of the element
ls.remove("element name") # removes the first occurrence of the element; it cannot remove all occurrences at once
ls.sort() # sorts the list in ascending order, in place (all elements must be comparable, e.g. all numbers)
ls.sort(reverse=True) # sorts in descending order; it changes the existing list
sorted(ls) # returns a new sorted list; the existing list is not changed
ls.pop(index_number) # removes and returns the element at that index
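
The methods above differ in whether they change the list in place or return something new. A short sketch, with a made-up fruits list, that walks through them in order:

```python
fruits = ["mango", "apple", "grapes", "apple"]

fruits.append("kiwi")                # one element at the end
fruits.extend(["banana", "cherry"])  # several elements at the end
fruits.insert(1, "orange")           # at a specific index
fruits.remove("apple")               # only the first "apple" is removed
popped = fruits.pop(0)               # removes index 0 and returns it

snapshot = list(fruits)              # copy of the list before sorting
new_sorted = sorted(fruits)          # new list; fruits is untouched here
sort_result = fruits.sort()          # sorts in place and returns None

print(popped, snapshot, new_sorted, sort_result)
```

Because sort() returns None, writing ls = ls.sort() would lose the list; use sorted(ls) when a new list is wanted.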

>>TUPLES -------------------

Once a tuple is created, its elements cannot be changed
Tuple is immutable
A tuple is a data structure like a list, but it is immutable
We store information in a tuple when we do not want any program to change it, intentionally or unintentionally
A tuple is defined with () parentheses
example: tup = (3,4,6,7,8544,76,34,5)
type(tup)
max(tup) # gives the maximum value of the tuple
min(tup) # gives the minimum value of the tuple
sum(tup) # gives the sum of the tuple
sorted(tup) # returns a sorted list; the tuple itself is unchanged
list >> tuple >> list == typecasting a list to a tuple and a tuple to a list
tup1 = tuple(ls)
tup1
tup.sort() # error: tuples have no sort() method because they are immutable; use sorted(tup) instead
del = removes an element from a list by index
ls = [2,3,5,6,7,4]
del ls[3] # removes the element at index 3; it does not return anything
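
Immutability can be demonstrated directly: assigning to a tuple element raises a TypeError. The sketch below also shows the usual workaround of converting to a list and back. The values are illustrative.

```python
tup = (3, 4, 6, 7)

# Trying to change an element of a tuple raises TypeError.
try:
    tup[0] = 99
    error_message = ""
except TypeError as e:
    error_message = str(e)

# Workaround: convert to a list, change it, convert back to a new tuple.
as_list = list(tup)
as_list[0] = 99
new_tup = tuple(as_list)

# Tuples do not have a sort() method at all.
has_sort = hasattr(tup, "sort")

print(error_message, new_tup, tup, has_sort)
```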

4th lecture

file handling

f = open("data.txt","r") # open() is a built-in Python function that opens a file; "r" means read-only, so the file must already exist. Only the file name is needed when the file is in the same folder. Here I am reading data.txt

# now I want to access the content of the file >>>
>> f.read() # reads the content of the file

With this approach the file stays open until f.close() is called

>> another way to read

with open("data.txt") as f: # the file is closed automatically when the block ends
    content = f.read()
    print(content)

>>read text file from google drive

# URL of the Google Sheet (public link to access the sheet data)

url = 'https://docs.google.com/spreadsheets/d/1CHNr3sioM1p6OvVx4tjNc0sMWGWp76sIBYpQdJ9H40U/edit?usp=drive_link'

# 'requests' is a Python library used to fetch data from the internet (HTTP/HTTPS)

import requests # import the requests module to send HTTP requests

# requests.get(url) sends a GET request to the server and returns the page data in the response

response = requests.get(url) # send a GET request to the given URL and store the response in 'response'

# response.text holds the data received from the server (in HTML format)

print(response.text) # print the content/text returned by the server

response.status_code # check whether the link is accessible or has a permission issue; 200 means we have access

>> handle excel file

from openpyxl import Workbook # import the Workbook class from the openpyxl library (used to create an Excel file)

wb = Workbook() # a new (blank) Excel workbook is created (in memory for now)

ws_new = wb.active # access the workbook's default active sheet and store it in ws_new

ws_new.title = 'DataStudent' # rename the sheet from 'Sheet' to 'DataStudent'

ws_new.append(["Student name", "Grades"]) # write the column headings in the first row

NOTE: .append() is used to add a new row to the Excel sheet.

ws_new.append(["Baba Hunny", 70]) # add the first student's data in the second row

ws_new.append(["Anshika", 50]) # data in the third row

ws_new.append(["Janvi", 50]) # data in the fourth row

wb.save("student_data.xlsx") # save the workbook to disk under the name 'student_data.xlsx'

print("✅ Excel file 'student_data.xlsx' created successfully with sheet 'DataStudent'")

>> how to access the save file/read save file

from openpyxl import load_workbook # import the openpyxl function used to read an Excel file

wb = load_workbook("student_data.xlsx") # open the already existing Excel file

ws_new = wb["DataStudent"] # access the sheet named 'DataStudent'

for row in ws_new.iter_rows(min_row=1, values_only=True): # now the sheet data can be read; min_row=1 because we start from the first row
    print(row)

>> CSV handling (comma-separated values)

import csv

with open("titanic.csv", mode="r") as f:
    reader = csv.reader(f)
    headers = next(reader) # read the first row
    print(headers)
    for i, row in enumerate(reader): # i is the index
        print(i, row)

import csv → Python's built-in module for reading CSV files.

with open("titanic.csv", mode="r") as f: → opens the titanic.csv file in read mode.

csv.reader(f) → reads each line of the file as a list.

next(reader) → pulls out the first row (header) so that it does not appear again in the loop.

enumerate(reader) → gives the index (i) along with each row.

print(i, row) → prints each record together with its sequence number (index).
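
The same reading pattern can be tried without a file on disk. This sketch assumes a tiny made-up CSV held in memory through io.StringIO (the names and ages are hypothetical, not from titanic.csv); start=1 makes the record numbers begin at 1 instead of 0.

```python
import csv
import io

# Hypothetical CSV content standing in for a real file.
sample = "Name,Age\nAsha,29\nRavi,35\n"

with io.StringIO(sample) as f:
    reader = csv.reader(f)
    headers = next(reader)  # first row = column names
    # Every value comes back as a string, including the numbers.
    rows = [(i, row) for i, row in enumerate(reader, start=1)]

print(headers)
for i, row in rows:
    print(i, row)
```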

>> Exception handling
Try-Except Block -

try:
    result = 10/0
    print("This line will not be executed because of the error")
except ZeroDivisionError:
    print("You can't divide by zero")
>> Can we write more than one except block for error handling?
Ans: yes

Example:

try:
    user_input1 = input("Enter Number")
    int_input1 = int(user_input1)
    user_input2 = input("Enter Number")
    int_input2 = int(user_input2)
    print(f"ratio of given numbers {int_input1/int_input2}")
except ZeroDivisionError: # other exception types can be written here as well
    print("You can't divide by zero")
except ValueError:
    print("Enter a valid number")

>> #Try-Except-else Block

try:
    user_input1 = input("Enter Number")
    int_input1 = int(user_input1)
    user_input2 = input("Enter Number")
    int_input2 = int(user_input2)
    num = int_input1/int_input2
    print(f"ratio of given numbers {int_input1/int_input2}")
except ZeroDivisionError: # other exception types can be written here as well
    print("You can't divide by zero")
except ValueError:
    print("Enter a valid number")
else: # this executes only if no exception was raised
    print(f"the ratio of the 2 numbers is {num}")

>> #Try-Except-else-finally block === finally runs in every scenario, whatever happens above

try:
    user_input1 = input("Enter Number")
    int_input1 = int(user_input1)
    user_input2 = input("Enter Number")
    int_input2 = int(user_input2)
    num = int_input1/int_input2
    print(f"ratio of given numbers {int_input1/int_input2}")
except ZeroDivisionError: # other exception types can be written here as well
    print("You can't divide by zero")
except ValueError:
    print("Enter a valid number")
else: # this executes only if no exception was raised
    print(f"the ratio of the 2 numbers is {num}")
finally: # executes in every scenario, whether or not an exception occurred
    print("i will run")
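
To watch which branches run in each case without typing input, the same structure can be wrapped in a function. A minimal sketch; the function name safe_ratio and the log list are made up for illustration.

```python
def safe_ratio(a_text, b_text):
    """Divide two numbers given as text and record which blocks ran."""
    log = []
    result = None
    try:
        result = int(a_text) / int(b_text)
    except ZeroDivisionError:
        log.append("zero")      # second number was 0
    except ValueError:
        log.append("invalid")   # text could not be converted to int
    else:
        log.append("ok")        # runs only when no exception was raised
    finally:
        log.append("finally")   # runs in every case
    return result, log

print(safe_ratio("10", "4"))
print(safe_ratio("10", "0"))
print(safe_ratio("ten", "2"))
```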

>>write a try-except block to handle FileNotFoundError

try:
    with open("AI.txt", "r") as f:
        content = f.read()
        print(content)
except FileNotFoundError:
    print("file not available in this path")

>>RAISE == used when we want to raise a customized exception ourselves: one that the system would not raise, but that our own requirement calls for

def withdraw(balance, amount):
    if balance - amount < 1000:
        raise Exception("Withdrawal denied: Minimum balance of 1000 INR to be maintained")
    remaining = balance - amount
    return remaining
try:
    remaining_amount = withdraw(balance=1000, amount=2000)
    print(f"After transaction remaining balance is {remaining_amount}")
except Exception as e:
    print(f"Transaction failed: {e}")

Meaning of return

In Python, return sends a value out of a function.

That is, the function calculates the result and hands it back to the main program.
def = the keyword that defines a function
withdraw = the function name, chosen by us
(balance, amount) = the parameters (arguments)
return: the function sends this value out → in the try block it is stored in the remaining_amount variable
Try block
    >>withdraw(balance=5000, amount=2000)

This is a function call.
It means we are executing the withdraw function and passing in:
5000 : balance
2000 : amount
remaining_amount
The return value of the function is stored in the remaining_amount variable
That is, the money left after the withdrawal is now in this variable
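
A common refinement of the raise pattern is to define your own exception class, so the caller can catch exactly this error and nothing else. This is a sketch only: the class name MinimumBalanceError and the constant MIN_BALANCE are assumptions added here, not part of the lecture code.

```python
class MinimumBalanceError(Exception):
    """Hypothetical custom exception for a withdrawal that breaks the minimum balance rule."""

MIN_BALANCE = 1000  # assumed minimum balance in INR, taken from the message above

def withdraw(balance, amount):
    remaining = balance - amount
    if remaining < MIN_BALANCE:
        raise MinimumBalanceError(
            f"Withdrawal denied: minimum balance of {MIN_BALANCE} INR must be maintained"
        )
    return remaining

ok = withdraw(balance=5000, amount=2000)  # allowed, 3000 remains

try:
    withdraw(balance=1000, amount=2000)
    outcome = "allowed"
except MinimumBalanceError as e:
    outcome = f"denied: {e}"

print(ok, outcome)
```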
>>Built-in Modules 
#math
#random
#datetime
#os
#sys

>> import math # let's say I want to get a square root
math.sqrt(34) # here we compute the square root of 34

>>import random # module with functions to generate random data
random.random() # generates a random decimal number in the range 0 to 1 (1 excluded)

random.randint(1,100) # random integer from 1 to 100, both included

>>from datetime import datetime # gives the current date and time
datetime.now()

>>import os # for working with files and folders
os.getcwd() # returns the path of the current working directory






numpy = Numerical Python; import NumPy to work with its numeric data structures

import numpy as np # np is a short name (alias); anything could be used in place of np. We write it so that in future we can type np instead of the full name numpy

The data structure NumPy is based on >>>>>>>>>>>

>>Array ---------
import numpy as np
arr = np.array([2,3,4,5,6]) # takes a list as input; arr is the variable name
type(arr)
arr.ndim # checks the dimension of the array. Another way: count the opening brackets at the start of the printed array; that count is the number of dimensions
>> 2 / Multi Dimension Array ------
arr2 = np.array([[1,2,3], [5,46,67]])
arr2.ndim # arr2 is the array's name and ndim gives its number of dimensions
# the printed array starts with 2 opening brackets, so it has 2 dimensions



>> arr2.shape == gives the number of rows and columns of the array
arr2 = np.array([[1,2,3], [5,46,67]])
arr2.ndim
arr2.shape

>> arr2.size ==== gives the total number of elements in the array (rows × columns)
arr2 = np.array([[1,2,3,5], [5,46,67,5]])
arr2.ndim
arr2.shape
arr2.size # number of elements (8 here)

>> arr2.dtype == gives the data type of the elements
>>zeros_array = np.zeros((row_number, column_number))
    import numpy as np
    zeros_array = np.zeros((10,4)) # gives an array of float zeros with the given rows and columns
    print(zeros_array)

>>ones_array = np.ones((row_number, column_number))
    import numpy as np
    ones_array = np.ones((6,4)) # gives an array of float ones with the given rows and columns
    print(ones_array)
>>full_array = np.full((row_number, column_number), fill_value=value_number)
    here full_array is the variable name
    np.full = function that creates an array of the given shape, e.g. 3 rows and 4 columns
    fill_value = this parameter specifies the value to put in every element of the array
Example: 
full_array = np.full((6,4), fill_value = 23)
print(full_array)


>>np.random.rand(number_of_values)
example:
r_array = np.random.rand(5)

np.random.rand = a function from NumPy's random module.
It generates random numbers between 0 and 1.
The argument specifies how many random numbers are needed.

>> np.round(r_array, number_of_decimals)
example:
r_array = np.random.rand(5)
np.round(r_array,2)

>>np.arange(start, end)
example:
arr = np.arange(1,11)
print(arr)

# this NumPy function generates the numbers from 1 to 10, because the end number 11 is not included
 

>> Indexing for a single dimension array
import numpy as np
arr = np.array([1,3,4,8,5])
arr[2] # 2 is the index of the element

>> Slicing for a two dimension array
import numpy as np
arr = np.array([[1,2,3], [5,46,67]])
arr[1:2] # 1 = start, 2 = end (excluded); with a single slice on a 2D array, whole rows are selected

Example: slicing rows and columns of a multidimensional array
# let's suppose I want to slice out 2 3 5 6
# arr2[start_row:end_row+1, start_column:end_column+1]
arr2 = np.array([[1,2,3,4], [4,5,6,9], [7,8,9,10]])
arr2[0:2,1:3]

>> Iteration ----------- just like iteration in a loop
for i in arr2:
    print(i)

>> Joining -- if I have more than one array, how can I merge them
# let's suppose a hospital has 2 wards, general and ICU, whose data must be joined
general = np.array([[98,43,5345], [42,45,32]])
icu = np.array([[98,92,73], [89,42,52]])
np.concatenate((general,icu), axis=1) # side by side: each row gets longer (columns are appended)
# axis=1 → join horizontally (side by side)

2nd Example: merge vertically (rows are appended)
general = np.array([[1,2,3], [5,6,7]])
np.concatenate((general,icu), axis=0) # axis=0 → stack one below the other
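
Checking the shape of the result is a quick way to confirm which direction an axis joins along. A small sketch with the same two ward arrays, assuming NumPy is installed:

```python
import numpy as np

general = np.array([[98, 43, 5345], [42, 45, 32]])
icu = np.array([[98, 92, 73], [89, 42, 52]])

# axis=1: rows stay at 2, columns grow from 3 to 6 (side by side).
side_by_side = np.concatenate((general, icu), axis=1)

# axis=0: rows grow from 2 to 4, columns stay at 3 (one below the other).
stacked = np.concatenate((general, icu), axis=0)

print(side_by_side.shape)
print(stacked.shape)
```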

>> SPLITING
import numpy as np
arr = np.arange(1,11)
arr #run arr
np.array_split(arr,3) # 3 is the number of parts to split into; the parts are as equal as possible (here their sizes are 4, 3, 3)


>> ANOTHER WAY TO SPLIT = WORKS ONLY IF THE ARRAY CAN BE DIVIDED INTO EQUAL PARTS
arr2 = np.arange(1,11)
arr2
np.split(arr2,2) # works only when the array divides equally; otherwise it raises an error

>>ARRAY SORTING
import numpy as np
arr = np.array([1,2,46,7,86])
np.sort(arr) # returns a sorted copy; by default it sorts along the last axis (axis=-1)

# sorting a 2D array along axis 1 (each row is sorted separately)
import numpy as np
arr = np.array([[1,2,46,7,86], [32,4324,543,4324,4324]])
np.sort(arr, axis=1)
arr # the original array is unchanged, because np.sort returns a new array

>> Searching = 
import numpy as np
arr3 = np.array([1,2,46,7,86]) 
np.where(arr3>40) # conditional search: returns the indices of the values that are greater than 40


>> np.nonzero(arr) = returns the indices of the non-zero values; the indices of zeros are left out
import numpy as np
arr = np.arange(1,11)
np.nonzero(arr)


>>Filtering = how to filter the data
import numpy as np
arr = np.array([13,5,64,10,78,10])
arr[arr>7] # returns the elements that are greater than 7


>>Mathematical Operations in Numpy

x=np.array([[2,4], [6,10]])
y = np.array([[12,23], [34,8]])
x+y # element-wise addition of x and y









>> x//y = integer (floor) division
x = np.array([[2,4], [6,10]])
y = np.array([[12,23], [34,8]])
x//y # divides x by y element-wise and drops the fractional part

>> np.divide(x,y) = float division
import numpy as np
x = np.array([[2,4], [6,10]])
y = np.array([[12,23], [34,8]])
np.divide(x,y) # gives element-wise float division










>> np.multiply(x,y) # element-wise multiplication


>>matrix multiplication = rows multiplied by columns # condition: the number of columns of the first array must equal the number of rows of the second array
arr1 = np.array([[2,4],[1,3]])
arr2 = np.array([[3,6],[7,3]])
np.matmul(arr1,arr2) # matmul is the function for matrix multiplication; first element: (2×3) + (4×7) = 6 + 28 = 34
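
The row-times-column rule can be written out by hand with plain lists, which shows where every number in the result comes from. This sketch is only a teaching aid (the function name matmul is reused for a hand-written helper); np.matmul remains the practical choice.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    # Columns of the first matrix must equal rows of the second.
    if len(a[0]) != len(b):
        raise ValueError("columns of first matrix must equal rows of second matrix")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

arr1 = [[2, 4], [1, 3]]
arr2 = [[3, 6], [7, 3]]
product = matmul(arr1, arr2)

# First element: row 0 of arr1 times column 0 of arr2 = (2*3) + (4*7) = 34
print(product)
```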

>>reshape array =
# an array can be reshaped only if the number of elements is the same before and after reshaping
Example: let's suppose I have an array of 14 elements. I can make it a two dimension array of shape (2,7), but not (2,8), because 2×8 = 16 is not 14
arr = np.arange(1,15)
print(arr)
reshape_array = arr.reshape(2,7) # reshape is the method and (2,7) is the new shape; reshape_array is the variable that stores the reshaped array
Why? To change the dimension --- from one dimension to two dimensions

>>another way to reshape
reshape_array = arr.reshape(2,-1) # -1 is a placeholder; NumPy calculates that dimension automatically

>>another way to reshape = from any number of dimensions to 1 dimension
reshape_array = arr.reshape(-1) # converts the whole array into a single dimension (1D array); -1 is the placeholder argument


>>Mathematical Operations
1) temp = np.array([23,32,35,54,54])
print(temp)
np.mean(temp) # calculates the average

2) np.min(temp) # finds the minimum
3) np.std(temp) # standard deviation: how far the values of the array are spread from the mean

4) np.percentile(temp,40) # finds the 40th percentile (the value below which 40% of the data lies); the 50th percentile equals the median
5) np.sum(temp) # sum of all elements
6) np.median(temp) # finds the middle value
7) np.prod(temp) # product of all elements
8) np.cumsum(temp) # cumulative sum
9) np.cumprod(temp) = cumulative product,
i.e. the running product up to each element, shown step by step.
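
What "cumulative" means can be seen with the standard library alone: itertools.accumulate keeps a running total (or running product) over the same values. This is a plain-Python illustration of the idea, not how NumPy computes it internally.

```python
from itertools import accumulate
import operator

temp = [23, 32, 35, 54, 54]

# Running sum: each entry is the sum of all values up to that position.
running_sum = list(accumulate(temp))

# Running product: each entry is the product of all values up to that position.
running_product = list(accumulate(temp, operator.mul))

print(running_sum)
print(running_product)
```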


-------------------PANDAS------------------------------
Pandas is a Python library that helps us store, clean, analyze and manipulate data.
In simple words:

Just as we keep data in rows and columns in Excel,
pandas lets us handle data in Python in the same Excel-like way.
        It helps with reading CSV files
        pandas = panel data

>> Pandas  ================ indexing IN pandas
import pandas as pd # pd is a short name for pandas; you can choose any alias

pd.Series([23,43,54,65], index=["Mon","Tue","Wed","Thu"]) # a Series is basically one column in pandas; the index can be set as required, here with day names

# another way to create a series
s = pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
s["Mon"] # s is the series name and "Mon" is the index label whose value we fetch
Note: This is an example of indexing

>>> SLICING IN SERIES
s[1:3] # goes from position 1 to position 2; the end is excluded, so we write 3 instead of 2

>>Filtering in Series
import pandas as pd
s=pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
s[s>45] # returns the values that are greater than 45
>> SHAPE IN PANDAS
s.shape # gives the shape of the series

>>INDEX OF PANDAS
s.index # gives the index labels of the series

>>MATHEMATICAL OPERATIONS  IN PANDAS
import pandas as pd
s=pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
s*2  #multiply


s+2   #adding 
s/2  #divide




>>Operations on two Series (two-region scenario)
region_a =pd.Series({"Jan":12, "Feb":13, "March": 16,"April":78})
region_b = pd.Series({"Jan":42, "Feb":53, "March": 36,"April":98})
total = region_a+region_b  #we are adding region_a value's with region_b values
total

diff = region_a - region_b #difference between region
diff

multi = region_a * region_b #multiplication here
multi

Note: In a series, the position is not important; the addition will be performed according to the index, and values like “Jan to Jan” will be added together even if I change the sequence.
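
A short sketch of that alignment rule, assuming pandas is installed. region_b is written in a different order on purpose, and region_c is a made-up third series with a label ("May") that region_a does not have.

```python
import pandas as pd

region_a = pd.Series({"Jan": 12, "Feb": 13, "March": 16, "April": 78})
# Same labels as region_a, but in a different order.
region_b = pd.Series({"April": 98, "March": 36, "Feb": 53, "Jan": 42})

# Values are matched by index label, not by position.
total = region_a + region_b

# Hypothetical series sharing only "Jan" with region_a.
region_c = pd.Series({"Jan": 1, "May": 5})
partial = region_a + region_c  # labels missing on one side become NaN

print(total)
print(partial)
```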


>>>> other mathematical operations on a region
region_a = pd.Series({"Jan":12, "Feb":13, "March": 16,"April":78})
region_b = pd.Series({"Jan":42, "Feb":53, "March": 36,"April":98})
region_a.max() # gives the maximum value
region_a.min() # gives the minimum value
region_a.mean() # gives the average value
region_a.sum() # gives the sum of the values for the region
region_a.prod() # gives the product of all values in the Series

>>other functions for pandas
1) apply == applies a function to every value, e.g. to assign a category
ex:
region_a = pd.Series({"Jan":12, "Feb":13, "March": 16,"April":78})
def sales_category(sales):
    if sales > 50:
        return "High Value"
    elif sales > 30:
        return "Moderate"
    else:
        return "Low"
region_a.apply(sales_category)

Notes based on this: def → This keyword is used to define a function in Python.

sales_category → This is the name of the function (you can choose any valid name).

(sales) → This is the parameter (a placeholder for the value that will be passed to the function when it is called).


2)map = map() → “replace or transform” each value of a Series according to rules you give.
ex: 
dept_codes = pd.Series(["HR", "Eng", "Sal", "FIN"])

dept_names = {"HR": "Human Resources",
              "Eng": "Engineering",
              "Sal": "Science",
              "FIN": "Finance"}
dept_codes.map(dept_names)

Explanation:

pd.Series([...]) → creates a Pandas Series (a one-dimensional labeled data array).
dept_names → is a dictionary mapping department codes to their full names.
map() → replaces each value in the Series (dept_codes) with the corresponding value from the dictionary (dept_names).
Note: > the order of the keys in the dictionary does not matter
      > (0,1,2,3) in the output → index number (position)



Question: A data scientist wants to extract only the months where the customer churn rate exceeded 8%. Which approach is correct? Assume churn is a pandas Series.
churn[churn < 8]
churn.where(churn > 8) # this is the correct answer
churn.mask(churn > 8)
churn.clip(upper=8)
Ans: churn = pd.Series([10,8,4,212,14], index=["Jan", "Feb", "Mar", "Apr", "May"])
churn.where(churn > 8)

Explain:
1) Why use .where()?
Because where() keeps the values that satisfy the condition and replaces the others with NaN. Adding .dropna() (or writing churn[churn > 8]) then leaves only the matching months.

2) Churn means the customers of a company leaving or cancelling their service.

-----------------DATA FRAME ---------------------------------

A DataFrame is the pandas library's 2D (two-dimensional) data structure,
like an Excel sheet or table, with rows and columns. Think of it as:

Rows = records / entries

Columns = fields / variables

Example: 
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita"],
    "Age": [25,27,29],
    "City": ["Delhi", "Rganj", "Patna"]
}

df = pd.DataFrame(data)
df
df.to_csv("dat.csv", index = True)
🔹 Step 1: import pandas as pd

This line imports the pandas library.

pandas is a Python library used to handle data in table form (rows & columns).

as pd means that whenever we use a pandas function, we can write the shortcut name "pd" instead.

🔹 Step 2: data = {...}
Here we created a dictionary with 3 keys.
Dictionary: data is a dictionary 🧠
In Python, a dictionary is a data structure that stores data as key-value pairs.
That is:

"Name" → list of names

"Age" → list of ages

"City" → list of cities

Its structure looks like this:
Name : ["divyanshu", "Neha", "Ankita"]
Age  : [25, 27, 29]
City : ["Delhi", "Rganj", "Patna"]

⚠️ Note: string values such as "Delhi" and "Rganj" must be written inside quotes;
without quotes Python raises an error (because it treats them as variable names).

🔹 Step 3: df = pd.DataFrame(data)

This line converts the dictionary into a DataFrame.

A DataFrame is basically an Excel-sheet-like table with rows and columns.

The result looks like this 👇
        Name  Age   City
0  divyanshu   25  Delhi
1       Neha   27  Rganj
2     Ankita   29  Patna

🔹 Summary:
Line → What it does
import pandas as pd → imports the pandas library
data = {...} → stores the data in a dictionary
pd.DataFrame(data) → converts the dictionary into a table (DataFrame)
df → the final DataFrame object, holding the data in rows and columns

4️⃣ df.to_csv("dat.csv", index=True)
👉 This line saves df (the DataFrame) to a CSV file.

"dat.csv" → the file name (it is created in the current working folder, e.g. the Anaconda folder)

index=True → means the row numbers (0,1,2...) are also saved in the file.

🔸 If you write index=False, the row numbers are not written to the CSV.

🔹 What is a CSV file?
A CSV (Comma-Separated Values) file is a simple text file in which the values are separated by commas

>>> how to Save data frame (df) into excel -------------------

import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita"],
    "Age": [25,27,29],
    "City": ["Delhi", "Rganj", "Patna"]
}

df = pd.DataFrame(data)
df
df.to_excel("file_name.xlsx", index = False)
5️⃣ What the to_excel() function does

👉 This function saves the DataFrame in Excel file (.xlsx) format.

That is, your data is now written out as an Excel sheet.
>>> How to read a file created from a DataFrame using pandas?

df = pd.read_csv("file_name")
example:
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita"],
    "Age": [25,27,29],
    "City": ["Delhi", "Rganj", "Patna"]
}

df = pd.DataFrame(data)
df.to_csv("NewCsv.csv", index = False)
df = pd.read_csv("NewCsv.csv")
>>>df.head() == shows 5 rows by default; pass the number of rows you want to see

Explanation: the head() function shows the top rows (first records) of the DataFrame.

Default: df.head() shows the first 5 rows.

df.head(1) → shows only the first row (first record).

>>df.tail(1) == how to see data from the bottom

import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita"],
    "Age": [25,27,29],
    "City": ["Delhi", "Rganj", "Patna"]
}

df = pd.DataFrame(data)
df.to_csv("NewCsv.csv", index = False)
df = pd.read_csv("NewCsv.csv")
df
df.head(1)
df.tail(1) # 1 is the number of rows to show from the bottom

🔹 Meaning:
The tail() function shows the last rows (bottom records) of the DataFrame.

Default: df.tail() shows the last 5 rows.

df.tail(1) → shows only the last row (final record).
>> df.info() == entire metadata
🔹 Meaning:
The info() function shows the structure and basic details of the DataFrame,
such as the column names, their data types, and how many non-null (filled) values each column has.

>> df.describe() == gives a statistical summary of the numerical columns
The describe() function returns a statistical summary of the DataFrame's numerical columns.

For the numeric columns (such as marks, age, salary) it automatically calculates these measures:

count → how many non-null values there are
mean → the average value
std → the standard deviation (how spread out the data is)
min → the smallest value
25%, 50%, 75% → percentiles (quartiles)
max → the largest value
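
A minimal sketch of describe() on a tiny table. The sample data here is made up for illustration; df.shape is printed alongside it to show the row and column counts.

```python
import pandas as pd

# Hypothetical sample data, similar to the tables used in these notes.
df = pd.DataFrame({
    "Name": ["divyanshu", "Neha", "Ankita"],
    "Age": [25, 27, 29],
})

summary = df.describe()            # only the numeric column "Age" is summarised
print(summary)
print(summary.loc["mean", "Age"])  # 27.0
print(df.shape)                    # (3, 2) -> 3 rows, 2 columns
```

The text column "Name" does not appear in the summary because describe() looks only at numeric columns by default.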

>> df.shape == a (rows, columns) tuple: how many rows and columns the DataFrame has (shape is an attribute, so it is written without parentheses)

>> How to select one column from a DataFrame?
df["City"] # returns a Series
Explanation: this pulls out just one column (City) of the DataFrame,

and the output is a pandas Series.

>> df.loc[0:1] == fetch data row-wise

0:1  -----------

→ This is a slice (a range selection) that means:

fetch the rows from row label 0 up to and including 1 (both ends are inclusive).
---------------------
.loc[] is used for label-based indexing.

That is, you select rows by their row labels (the index).

>> df.set_index("City", inplace=True) == with inplace=True the existing df itself is updated instead of a new df being returned
This line makes the "City" column the index, so each row of the DataFrame is now identified by its city name rather than by 0, 1, 2.
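
A small sketch of both ideas: the inclusive label slice with .loc, and looking a row up by city after set_index. It uses the same three-row sample table as above and set_index without inplace, so that both versions can be compared.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["divyanshu", "Neha", "Ankita"],
    "Age": [25, 27, 29],
    "City": ["Delhi", "Rganj", "Patna"],
})

first_two = df.loc[0:1]                  # label slice: rows 0 AND 1 (the end is inclusive)

by_city = df.set_index("City")           # returns a new DataFrame; df is unchanged
neha_age = by_city.loc["Rganj", "Age"]   # look a row up by its city label

print(len(first_two))   # 2
print(neha_age)         # 27
```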



>>> df.iloc[0:1] == position-based indexing

import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}

df = pd.DataFrame(data)
print(df.iloc[0:2])
This tells pandas to show the rows starting at position 0 and stopping before position 2.
In other words:

it shows only the first two rows.
>>> Modifying data operations
df["Age"] = df["Age"]/100
This divides every value in the "Age" column of the DataFrame by 100.
📘 Important Concepts:

df["Age"] → selects the Age column (as a Series)

/100 → divides every value by 100

df["Age"] = ... → assigns the modified values back to the Age column

>> df.rename(columns={"COLUMN_NAME": "CHANGED_COLUMN_NAME"}, inplace=True) == rename a column, i.e. change the column's header

import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}

df = pd.DataFrame(data)
print(df.iloc[0:2])
df["Age"] = df["Age"]/100
df.rename(columns ={"City":"Place"},inplace = True)
df


Explanation:
columns = {...}
→ This is a dictionary in which you map old column names to new column names.
Here "City" is being replaced with "Place".

inplace = True
→ This means the change is applied directly to the original DataFrame (df).
That is, you do not need to assign the result to a new DataFrame.



>>df.drop("Age",axis = 1, inplace = True) = it will drop the column
Ex: 
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}

df = pd.DataFrame(data)
print(df.iloc[0:2]) #index wise view
df["Age"] = df["Age"]/100 # this divides the entire column
df.drop("Age",axis = 1, inplace = True)
df
📘 Explanation:

drop() → used to remove a row or a column from a DataFrame.

"Age" → the name of the column to remove.

axis → tells drop() whether the label is a row or a column:

axis = 0 → rows

axis = 1 → columns

So here a column is being removed.

inplace = True →

This means the change is applied directly to the original DataFrame.

If it were False, drop() would return a new DataFrame with the column removed, and df itself would stay unchanged.
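
The difference between inplace=False (the default) and inplace=True can be checked with a short sketch. The two-row table is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Neha", "Ankita"], "Age": [27, 29]})

without_age = df.drop("Age", axis=1)   # inplace defaults to False
print(list(df.columns))                # ['Name', 'Age']  (df is untouched)
print(list(without_age.columns))       # ['Name']

df.drop("Age", axis=1, inplace=True)   # now df itself is modified
print(list(df.columns))                # ['Name']
```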




















>> df["New_Column_name"] = [1,2,3,4..... values] = create new column
ex: 
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}

df = pd.DataFrame(data)
df["Age"] = df["Age"]/100 # this divides the entire column
df.drop("Age",axis = 1, inplace = True)
df
df["RR"] = ["Yes","No","Yes","No"] # one value per row: the list needs 4 values because df has 4 rows
df


Explanation:
df["RR"]
→ This creates a new column named "RR" in the DataFrame df.
(If "RR" already exists, its values are overwritten.)

= ["Yes","No","Yes","No"]
→ These values are assigned to the rows of that column in order.
That is:

first row: "Yes"

second row: "No"

third row: "Yes"

fourth row: "No"

>>> Filtering Data
df.loc["Delhi", "City"] = 30


 Explanation:

df.loc[ ] → pandas' label-based selector.

It means we access a row and a column by label (name), not by position number.

"Delhi" → this is the row label.

That is, you are targeting the row whose index is "Delhi".

⚠️ This means the "City" column must already have been set as the index of your DataFrame. Once City is the index it is no longer a regular column, so the second label should normally be one of the remaining columns, such as "Age".
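
A minimal sketch of label-based assignment, assuming City has first been made the index. The column updated here is "Age", which is a choice made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["divyanshu", "Neha", "Ankita"],
    "Age": [25, 27, 29],
    "City": ["Delhi", "Rganj", "Patna"],
})
df = df.set_index("City")       # row labels are now the city names

df.loc["Delhi", "Age"] = 30     # row label "Delhi", column label "Age"
print(df.loc["Delhi", "Age"])   # 30
```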
>>> How to filter data with a greater-than condition

df["Age"] > 25
df[df["Age"] > 25]
Explanation:
🔹 1️⃣ df["Age"] > 25

This line does not pull out any data;

instead it builds a Boolean Series (True/False values).
pandas checks the "Age" of every row to see

whether it is greater than 25.
🔹 2️⃣ df[df["Age"] > 25]
This line uses the Boolean Series above

to keep only the rows where it is True.

In other words: "show the rows where Age is greater than 25."

>> Filtering on more than one condition (combine the conditions with & and wrap each one in parentheses) --
df[(df["Age"] > 25) & (df["Stake"] < 1)]
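
A short sketch of the two-condition filter. The rows and the Stake values are hypothetical, chosen so that some rows fall below 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],      # hypothetical rows
    "Age": [22, 27, 29, 48],
    "Stake": [0.5, 0.2, 3.0, 0.9],
})

mask = (df["Age"] > 25) & (df["Stake"] < 1)   # element-wise AND of two Boolean Series
filtered = df[mask]
print(filtered["Name"].tolist())   # ['B', 'D']
```

The parentheses matter: & binds more tightly than > and <, so leaving them out changes what gets compared.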

>>Pandas query() Function — Conditional Filtering Example
import pandas as pd
data = {
    "Name": ["Sumair","Neha", "Dinesh", "Junaid"],
    "Age": [2235,27,29,48],
    "City": ["Delhi", "Chainpur", "Pune", "Saharanpur"]
}

df = pd.DataFrame(data)
df["RR"] = ["Yes","No","Yes","No"] # create a new column: one value per row (4 rows)
df
df["Stake"] = [45,454,124,756]
df.loc["Delhi", "Name"] = 30 # the default index is 0..3, so there is no "Delhi" label yet and .loc adds a new row here
df
df["Age"] > 125
df[df["Age"] > 125]
df.query("Age > 25 and Stake < 100")

🔹 1️⃣ What does .query() do?

.query() is a filtering method that lets you write the condition in an SQL-like style,

that is, you can pass the condition directly as a string such as "Age > 25 and Stake < 100".
It does the same job as this code: df[(df["Age"] > 25) & (df["Stake"] < 100)]
🔹 2️⃣ "Age > 25 and Stake < 100"
There are two conditions here:

Age > 25 → only the rows whose Age is greater than 25

Stake < 100 → and, in addition, the value in the Stake column must be less than 100

and means both conditions must be True.
🔹 3️⃣ Output

This query returns only the rows where

Age is greater than 25 and Stake is less than 100.
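
As a quick check that .query() and the Boolean-mask form give the same rows, here is a sketch with made-up ages (the Stake values are the ones used above).

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sumair", "Neha", "Dinesh", "Junaid"],
    "Age": [35, 27, 29, 48],           # hypothetical ages
    "Stake": [45, 454, 124, 756],
})

via_query = df.query("Age > 25 and Stake < 100")
via_mask = df[(df["Age"] > 25) & (df["Stake"] < 100)]

print(via_query["Name"].tolist())   # ['Sumair']
print(via_query.equals(via_mask))   # True
```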

>>df.to_clipboard() — Copy DataFrame to Clipboard
Explanation:

This function copies the whole DataFrame (df) to the clipboard.

You can then press Ctrl + V to paste the data straight into Excel, Google Sheets or Notepad.
>> df.to_hdf("file_name.h5", key='My_data') – Save DataFrame in HDF5 Format
🧠 Explanation:

This function is used to save (store) a pandas DataFrame in the HDF5 file format (Hierarchical Data Format).

The format helps store large amounts of data efficiently, optionally with compression,

especially when the data is very large (for example, millions of rows).
🔸 The key parameter gives the DataFrame a unique name inside the HDF5 file; it is basically the table name within that file.

🔸 .to_hdf() does not work without a key (it raises an error).

🔸 Several DataFrames can be stored in the same .h5 file under different keys.
🔸 An HDF5 file is stored in binary format, so you cannot open and read it directly as text.
🔸 pandas needs the optional PyTables package (tables) installed to write or read HDF5 files.


>> 📘 Pandas: Reading HDF5 File using pd.read_hdf()

df_hdf = pd.read_hdf("file_name.h5", key='My_data')
Definition:
pd.read_hdf() is a pandas function that reads (loads) a file in HDF5 format and returns it as a DataFrame.

It is used to bring a file saved with .to_hdf() back into memory.

🔹"file_name.h5"

This is the name of the file you want to read.

The .h5 or .hdf5 extension indicates that the file is in HDF5 format.
🔹key='My_data'

This is the unique name or label of the DataFrame stored inside the HDF5 file.

Because one HDF5 file can hold several DataFrames, each one has its own key.
>> Filtering Rows Based on Multiple Values using isin() Method
import pandas as pd

data = {
    "Employee_ID": [101, 102, 103, 104, 105],
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
    "Age": [25, 29, 32, 28, 26],
    "Experience_Years": [2, 5, 7, 3, 4],
    "Monthly_Salary": [50000, 60000, 65000, 55000, 52000]
}
df = pd.DataFrame(data)
df
df[df["Department"].isin(["Sales", "IT"])]



Step-by-Step Explanation:
1️⃣ df – DataFrame

df is the variable that holds the whole dataset (table) created in the code above.

2️⃣ df["Department"]

This selects only the "Department" column.

For the data above, the values are:
0         IT
1         HR
2      Sales
3    Finance
4         IT
3️⃣ isin(["Sales", "IT"])

This checks whether each value in the "Department" column is one of "Sales" or "IT".

The output is a Boolean Series (True/False values):
0     True
1    False
2     True
3    False
4     True
Name: Department, dtype: bool
4️⃣ df[df["Department"].isin(["Sales", "IT"])]

This Boolean Series is now applied to the whole df.

Only the rows where the value is True are shown.

That is, only the employees in the "Sales" and "IT" departments appear.
✅ Final Output:
➡️ Only the rows for the "Sales" and "IT" departments remain after filtering.
📘 Pandas: Excluding Rows Using ~isin() Function

import pandas as pd

data = {
    "Employee_ID": [101, 102, 103, 104, 105],
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
    "Age": [25, 29, 32, 28, 26],
    "Experience_Years": [2, 5, 7, 3, 4],
    "Monthly_Salary": [50000, 60000, 65000, 55000, 52000]
}
df = pd.DataFrame(data)
df
df[~df["Department"].isin(["Sales", "IT"])]

🧠 Step-by-Step Explanation:
1️⃣ df
This is your DataFrame, holding the data for all employees. The steps below use a simplified 4-row table for illustration (it is not the 5-row data from the code above) 👇
Employee_ID  Name       Department  Age  Experience_Years  Monthly_Salary
101          Divyanshu  Sales       25   2                 40000
102          Anshika    HR          27   3                 45000
103          Neha       IT          28   4                 50000
104          Junaid     Finance     30   5                 55000

2️⃣ df["Department"].isin(["Sales", "IT"])
This checks whether the value in the "Department" column is one of "Sales" or "IT".

The output is a Boolean Series 👇
0     True
1    False
2     True
3    False
Name: Department, dtype: bool

3️⃣ ~ (Tilde Operator)

This is a NOT operator (it inverts the values).

True becomes False and False becomes True.

So the output now becomes 👇
0    False
1     True
2    False
3     True
Name: Department, dtype: bool

4️⃣ df[~df["Department"].isin(["Sales", "IT"])]
Now the DataFrame keeps only the rows where the original condition was False,

that is, the employees whose department is neither "Sales" nor "IT" 👇
Employee_ID  Name     Department  Age  Experience_Years  Monthly_Salary
102          Anshika  HR          27   3                 45000
104          Junaid   Finance     30   5                 55000

✅ Final Output:
➡️ This code shows all the rows that are not in the "Sales" or "IT" department.
 Short Summary Table:
Symbol / Function    Meaning
isin()               Checks whether each value is present in the list
~                    Reverses the condition (True → False, False → True)
df[...]              Filters the DataFrame based on a condition




Question: How do we sort data in Python?
Question: Data handling in Python?
Question: Data cleaning in Python?
Question: Handling missing values in Python?
Question: Handling duplicates in Python?


📘 Pandas – Reading a CSV File and Previewing Data (Notes)
➡️ Code:
import pandas as pd
df = pd.read_csv("day.csv")
df.head(2)

📝 Notes – Line-by-Line Explanation

1️⃣ import pandas as pd
This line imports the pandas library and gives it the short name pd, so we don't have to type pandas again and again.

This means we can now call pandas functions with the pd. prefix.

Example: pd.read_csv(), pd.DataFrame() etc.
2️⃣ df = pd.read_csv("day.csv")
This reads the CSV file named day.csv and loads it into a DataFrame called df.
Important Points

read_csv() → function to read CSV files

"day.csv" → filename

df → variable storing the table-like data
3️⃣ df.head(2)
This displays the first 2 rows of the DataFrame.

It helps quickly preview the data and check whether it loaded correctly.
General Rule

df.head() → shows first 5 rows (default)

df.head(2) → shows first 2 rows

df.head(10) → shows first 10 rows
🧾 Pandas – df["season"].count() (Short Notes)
✔️ Code
df["season"].count()

🧠 What is happening in this code?
df["season"] → selects the season column from the DataFrame
.count() → counts how many entries (rows) are present in that column
👉 That is, it tells you how many values in total are filled in for the season column;
it does not count missing (NaN) values.


📌 Example
If the season column holds the values:

[1, 2, NaN, 3, 2]

then .count() returns:

4

because NaN is not counted.
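
The example above can be reproduced with a standalone Series (a sketch; in the notes the column comes from day.csv instead).

```python
import numpy as np
import pandas as pd

season = pd.Series([1, 2, np.nan, 3, 2], name="season")

print(len(season))      # 5 -> total rows, including the missing one
print(season.count())   # 4 -> non-missing values only
```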

🧾 Pandas – df["season"].nunique() (Short Notes)
✔️ Code
df["season"].nunique()
🧠 What is happening in this code?

df["season"] → selects the season column from the DataFrame

.nunique() → returns the number of unique (distinct) values in that column

👉 That is, it checks how many different seasons appear in the season column.
📌 Example
If the season column holds the values:
[1, 2, 2, 3, 3, 3, 4]
then the output is:

4
because the unique values are → 1, 2, 3, 4
🛑 Important:
.nunique() counts only the distinct values
It does not count NaN values,
because dropna=True is the default, i.e. .nunique(dropna=True)

📍 When is it useful?
To find out how many distinct values a category or class column has
Before grouping
During data understanding and EDA

🧾 Pandas – df["season"].unique() (Short Notes)
✔️ Code
df["season"].unique()
🧠 What is happening in this code?

df["season"]

Selects the season column from the DataFrame.
.unique()

Returns all the unique (distinct) values present in that column as an array.


👉 That is, it tells you which distinct values the season column contains.

📌 Example
If the season column holds the values:
[1, 2, 2, 3, 3, 4]

then the output is:
array([1, 2, 3, 4])

🧾 Pandas – df["season"].value_counts() (Short Notes)

✔️ Code

df["season"].value_counts()


🧠 What happens in this code?

df["season"]

Selects the season column from the DataFrame.
.value_counts()

Returns how many times each unique value occurs in that column.

👉 That is, it tells you how often each value is repeated in the column.


📌 Example

Suppose the season column holds the values:

[1, 2, 2, 3, 3, 3, 4]

Then the output is:
3    3   ← value 3 occurs three times
2    2   ← value 2 occurs twice
1    1   ← value 1 occurs once
4    1   ← value 4 occurs once

🛑 Important Points
The output comes in value → count format.
By default the highest count appears at the top (sorted descending).
NaN values are left out by default; pass dropna=False to count them as well.

⭐ One-line Summary
value_counts() returns the frequency of each unique value in a column.
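
A small sketch comparing value_counts() with and without dropna=False, next to nunique(). The Series is made up and deliberately contains one NaN.

```python
import numpy as np
import pandas as pd

season = pd.Series([1, 2, 2, 3, 3, 3, np.nan], name="season")

counts = season.value_counts()                        # NaN is left out by default
counts_with_nan = season.value_counts(dropna=False)   # NaN gets its own entry

print(counts.loc[3.0])              # 3 -> the value 3 occurs three times
print(season.nunique())             # 3 -> distinct non-missing values: 1, 2, 3
print(int(counts_with_nan.sum()))   # 7 -> every row is accounted for
```

The labels show up as 3.0 rather than 3 because a numeric column that contains NaN is stored as float.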

🧾 Pandas – sub_df = df[["season", "temp", "hum"]].sample(10) (Notes)
✔️ Code

sub_df = df[["season", "temp", "hum"]].sample(10)
🧠 What is happening in this line?
1️⃣ df[["season", "temp", "hum"]]
Only three columns are selected from the DataFrame df:

season
temp
hum
So instead of the full DataFrame we take a smaller DataFrame with just these three columns.
2️⃣ .sample(10)
10 random rows are drawn from the selected DataFrame.
Each run of the code can return different random rows.
3️⃣ sub_df = ...
The result is stored in a new variable, sub_df.
sub_df is now a small random-sample DataFrame with only:
10 rows
3 columns
📌 Why is it useful?
To look at sample data from a large dataset quickly.
To do random spot checks during analysis.
To split data in Machine Learning.
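
Because sample() is random, results differ from run to run. A sketch of one way to make the sample repeatable is to pass random_state; the small table below is a made-up stand-in for day.csv.

```python
import pandas as pd

# Hypothetical stand-in for the day.csv columns used above.
df = pd.DataFrame({
    "season": [1, 1, 2, 2, 3, 3, 4, 4],
    "temp": [0.15, 0.18, 0.25, 0.30, 0.60, 0.70, 0.40, 0.35],
    "hum": [45, 60, 55, 50, 65, 70, 52, 48],
})

a = df[["season", "temp", "hum"]].sample(3, random_state=42)
b = df[["season", "temp", "hum"]].sample(3, random_state=42)

print(a.shape)       # (3, 3)
print(a.equals(b))   # True: same seed, same rows
```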
🧾 Pandas – sub_df.sort_values(by="temp") (Notes)
✔️ Code
sub_df.sort_values(by="temp")
🧠 What is happening in this line?
sort_values() is the pandas method that sorts data.
by="temp" means:
👉 sort the DataFrame by the values in the temp column.
By default the sort is in ascending order (smallest to largest).
sort_values() returns a new sorted DataFrame; sub_df itself is unchanged unless you assign the result back.
📌 What will the output look like?
The rows of sub_df are reordered by temperature, for example:
season  temp  hum
2       0.15  45
1       0.18  60
3       0.25  55
⭐ To sort in descending order instead:
sub_df.sort_values(by="temp", ascending=False)

🧾 Pandas – Sorting Using Multiple Columns (sort_values)
✔️ Code
sub_df.sort_values(by = ["season", "temp"])
📌 What is happening?
sort_values() is used to sort a DataFrame.
by = ["season", "temp"] means the sort is based on two columns:
first the season column,
then the temp column within each season.
That is, all rows are first sorted by season, and then the rows inside each season are sorted by temperature.
🔎 Final Summary (Short Notes)
sort_values(by=[...]) → sorting on multiple columns
The first column is sorted first
Then the second column is sorted within each group of equal values
This is called multi-level sorting.
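
A sketch of multi-level sorting on a made-up four-row sub_df. The second call also shows that ascending accepts one flag per column, which goes a little beyond the notes above.

```python
import pandas as pd

sub_df = pd.DataFrame({
    "season": [2, 1, 2, 1],
    "temp": [0.30, 0.18, 0.25, 0.15],
})

both_up = sub_df.sort_values(by=["season", "temp"])
print(both_up["temp"].tolist())   # [0.15, 0.18, 0.25, 0.3]

# season ascending, but temp descending inside each season
mixed = sub_df.sort_values(by=["season", "temp"], ascending=[True, False])
print(mixed["temp"].tolist())     # [0.18, 0.15, 0.3, 0.25]
```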

📌 Topic: Display Option in pandas – display.max_columns
▶ Code
pd.get_option("display.max_columns")

🧠 What are we learning?
This command tells us the maximum number of columns pandas shows at once when it prints a DataFrame.
📝 Meaning
If the output is 20, pandas shows at most 20 columns when a DataFrame is printed.
If there are more columns than that, pandas shows ... in the middle.
⚙ To show everything
pd.set_option("display.max_columns", None)
pandas now shows all columns without hiding any.

📌 Topic: Pandas Display Settings – Setting display.max_columns
▶ Code
pd.set_option("display.max_columns", 50)
1️⃣ What is pd.set_option?
It is the function used to change settings in pandas.
With it we tell pandas how things should be displayed in the output.
2️⃣ What does "display.max_columns" do?
By default pandas shows only a limited number of columns.
If a DataFrame has a great many columns, pandas shows ... in the middle.
With "display.max_columns" we decide the maximum number of columns shown in the output.

3️⃣ What 50 means
Here 50 means:

→ pandas now shows up to 50 columns in the output without hiding any.

4️⃣ Why do we use it?
In big datasets important columns often get hidden.
With this command the columns are clearly visible (up to the limit you set), which makes analysis easier.

✔ Final Result
When a DataFrame with up to 50 columns is printed, in Jupyter Notebook or anywhere else, no ... appears among the columns.

📌 Topic – Pandas Display Option Reset (pd.reset_option)
▶ Code
pd.reset_option("display.max_columns")

1️⃣ What is pd.reset_option?
After changing settings in pandas,

if we want to bring an option back to its default (original) value,

we use pd.reset_option.
It restores that option to pandas' factory default setting.

2️⃣ What was "display.max_columns"?
This option tells pandas

the maximum number of columns to show in DataFrame output.
Earlier we set it like this:
pd.set_option("display.max_columns", 50)


3️⃣ What does this code do now?
pd.reset_option("display.max_columns")
This command brings "display.max_columns" back to its original default value.
That means:
pandas again shows columns according to its default limit
If there are more columns, pandas may show ... again

4️⃣ When is it used?
When:
You have finished testing
You want to see the data in the normal view again
You want to undo custom settings
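
The get, set and reset steps can be tried together in one short sketch. The default is read at run time rather than assumed, since it can differ between environments.

```python
import pandas as pd

default_cols = pd.get_option("display.max_columns")   # whatever the default is here

pd.set_option("display.max_columns", 50)
print(pd.get_option("display.max_columns"))   # 50

pd.reset_option("display.max_columns")
print(pd.get_option("display.max_columns") == default_cols)
```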
Question: How to handle missing data?


📘 Pandas + NumPy: Creating a DataFrame with Missing Values (np.nan)
✔ Code

import numpy as np
import pandas as pd

data = {
    "Patient_ID": [101, 102, 103, 104, 105],
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118],
    "Temperature": [98.4, 99.1, 100.0, 98.7, 99.4],
    "Oxygen_Saturation": [97, 95, 93, 96, 98]
}

df = pd.DataFrame(data)
df


🧠 Line-by-Line Explanation


✔ import numpy as np
The NumPy library is imported.
It is used here to insert np.nan.

np.nan stands for a missing / blank value.

✔ import pandas as pd
The pandas library is imported.
pandas is used for creating and editing DataFrames and for data analysis.


✔ data = { ... }
This is a Python dictionary holding data for hospital patients.
It defines 5 columns:
Patient_ID
Heart_Rate
Blood_Pressure
Temperature
Oxygen_Saturation


✔ Why np.nan?
Missing values have been put into Heart_Rate and Blood_Pressure on purpose.
Real-life datasets often have missing data, so it is important to learn how to handle it.


✔ df = pd.DataFrame(data)
The dictionary is converted into a pandas DataFrame.
A DataFrame is a table-like structure with rows and columns.
We can now run operations on this df.

✔ df
In Jupyter Notebook, writing just df displays the whole DataFrame in the output.

✔ What is np.nan?
np.nan means:
Not a Number (Missing Value)
That is, a place in the dataset where no data is available.

✔ Why is it used?
In real-world datasets it often happens that:

a patient's details were not recorded
some questions in a survey were left blank
a sensor did not send any data
We use np.nan to represent such data.

✔ Where does np.nan come from?
import numpy as np

We can use np.nan only after importing the NumPy library.

✔ After importing, we write it like this:
np.nan


✔ Example
"Heart_Rate": [72, 85, np.nan, 90, 76]

Here the third patient's heart rate is missing, so np.nan is put in its place.

✔ Important Properties
🔹 np.nan is not equal to any number, not even to itself
np.nan == np.nan   # False
🔹 To find missing values:
df.isna()
🔹 To remove missing values:
df.dropna()
🔹 To fill missing values:
df.fillna(value)


⭐ Revision Points
np.nan means a missing / blank value
It comes from NumPy
It is used to represent missing data in real datasets
In an equality check it is not equal to any number
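
Because np.nan never compares equal to anything, == cannot be used to find missing values. A short sketch of the difference, using the Heart_Rate values from above:

```python
import numpy as np
import pandas as pd

print(np.nan == np.nan)   # False: NaN never compares equal, even to itself
print(pd.isna(np.nan))    # True: use isna() to test for a missing value

heart_rate = pd.Series([72, 85, np.nan, 90, 76])
print(heart_rate.isna().tolist())   # [False, False, True, False, False]
```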

📘 Pandas: df.isna() – Missing Values Check
(Notes for Revision)

✔ What does df.isna() do?
df.isna() checks the DataFrame for missing values (np.nan) and returns, for every cell:

True → if the value is missing
False → if the value is present

✔ Syntax
df.isna()


✔ Example
Suppose our DataFrame looks like this:
import numpy as np
import pandas as pd

data = {
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118]
}

df = pd.DataFrame(data)

Now check:
df.isna()


✔ Output (Explanation)
Cell value          isna() result
np.nan (missing)    True
value is present    False

✔ Where is it useful?
df.isna() is helpful for:
Finding out how many missing values the dataset has
Checking which rows/columns are incomplete
Validating the data before cleaning

✔ Count Missing Values
If you also want to count missing values:
df.isna().sum()


⭐ Revision Points
df.isna() detects missing values
The output is a Boolean DataFrame (True / False)
It is the first step of data cleaning and preprocessing
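
A sketch of the usual follow-up checks on the same two-column example: missing values per column, the overall total, and which rows are incomplete.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118],
})

missing_per_column = df.isna().sum()       # True counts as 1, so sum() counts the NaNs
print(missing_per_column["Heart_Rate"])    # 1
print(int(missing_per_column.sum()))       # 2 missing values in total

rows_with_missing = df[df.isna().any(axis=1)]   # rows with at least one NaN
print(rows_with_missing.index.tolist())         # [2, 3]
```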

>>> Now let's handle this situation
(1) Drop the missing data [Note: this is usually not preferable, because it throws data away]
(2) For numerical data such as heart rate, fill the missing values with the mean or median; for categorical data, use the mode
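
The notes below cover mean and median filling for numbers. For the categorical case, a minimal sketch of filling with the mode could look like this; the Blood_Group column is a made-up example, not part of the patient table.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with one missing entry.
df = pd.DataFrame({"Blood_Group": ["A", "B", "A", np.nan, "O"]})

most_common = df["Blood_Group"].mode()[0]   # mode() returns a Series; take its first value
df["Blood_Group"] = df["Blood_Group"].fillna(most_common)

print(most_common)                   # A
print(df["Blood_Group"].tolist())    # ['A', 'B', 'A', 'A', 'O']
```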
 
📘 Pandas – Finding a Column's Mean (Step-by-step explanation)
▶ Code
heart_rate_mean = df["Heart_Rate"].mean()
print(heart_rate_mean)


1️⃣ df

What it is: your pandas DataFrame, i.e. a table of rows × columns.
Why it matters: a DataFrame has several columns, and we run operations on those columns.
Example: df may hold several medical columns, including the patients' heart rate.


2️⃣ df["Heart_Rate"]
What it does: selects the column named Heart_Rate from the DataFrame.
What it returns: a pandas Series (a 1-D labeled array).
Example output (Series form):
0     72.0
1     85.0
2      NaN
3     90.0
4     76.0
Name: Heart_Rate, dtype: float64


3️⃣ .mean()

What it is: the pandas Series method that computes the average (mean) of the column.
What it does internally: adds up all the non-missing numeric values and divides by how many there are.

Formula: (sum of non-NaN values) / (count of non-NaN values)
NaN handling: if the column contains np.nan (missing) values, pandas ignores them (they are not included in the denominator).
Return type: a single numeric value (float).

4️⃣ heart_rate_mean =

What it does: stores the numeric result of .mean() in a variable named heart_rate_mean.
Why it matters: we store the value so that it can be reused or printed later.


5️⃣ print(heart_rate_mean)

What it does: shows the value of heart_rate_mean on the screen/console.
Output example:
80.75

(This value is for the sample above, [72, 85, NaN, 90, 76], i.e. (72+85+90+76)/4 = 80.75)


🔎 Full Flow (all together)
df["Heart_Rate"] → select the column (Series)
.mean() → compute the average of the selected Series (ignoring NaN)
result assign → store it in heart_rate_mean
print(...) → show the result on the console

✅ Short Notes (Quick)
df["Col"].mean() = computes the average of a column.
Missing values (NaN) are ignored automatically.
The output is a float; you get a float even when all the values are integers.




📘 Pandas – Filling Missing Values with the Mean (Step-By-Step Notes)

Code:
import numpy as np
import pandas as pd
data = {
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118]
}
df = pd.DataFrame(data)
heart_rate_mean = df['Heart_Rate'].mean()
df['Heart_Rate'] = df['Heart_Rate'].fillna(heart_rate_mean)
print(df)

 



🧾 1) NumPy Import
import numpy as np

✔ What is happening?

The NumPy library is loaded into Python.
It is given the short name np so that the full name does not have to be typed every time.

❓ Why is it needed?

We use np.nan to represent missing values in the dataset.
NumPy also helps with maths operations.


🧾 2) Pandas Import
import pandas as pd

✔ What is happening?
The pandas library is loaded and given the alias pd.
❓ Why is it needed?

pandas is the main tool here for creating, modifying and analysing DataFrames.


🧾 3) Building the Dataset (Dictionary Format)

data = {
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118]
}
✔ What is happening?
We are building a Python dictionary.
Keys = Column Names
"Heart_Rate"
"Blood_Pressure"
Values = Lists
Each list represents one complete column.
❓ Point to note
np.nan = missing value (empty data or an unavailable value)
Missing values are common in real-world datasets.


🧾 4) Dictionary → DataFrame Conversion
df = pd.DataFrame(data)
✔ What is happening?
The dictionary is converted into a pandas DataFrame.
❓ Result?
A table has been created:

Index  Heart_Rate  Blood_Pressure
0      72          120
1      85          130
2      NaN         125
3      90          NaN
4      76          118

We can now run analysis and cleaning operations on it.


🧾 5) Finding the Mean of the Heart_Rate Column

heart_rate_mean = df['Heart_Rate'].mean()
✔ What is happening?
df['Heart_Rate'] → selects the Heart_Rate column.
.mean() → computes its average.

❓ Missing Value Handling
pandas automatically ignores NaN when it computes the mean.
📌 Example Calculation
Valid values = 72, 85, 90, 76

Sum = 323

Count = 4

Mean = 323 / 4 = 80.75

❓ Where did the result go?
The value is stored in the variable heart_rate_mean.


🧾 6) Filling the Missing Value with the Mean

df['Heart_Rate'] = df['Heart_Rate'].fillna(heart_rate_mean)
✔ What is happening?

.fillna(heart_rate_mean) → puts the mean wherever there was a NaN.
df['Heart_Rate'] = ... → writes the modified column back into the DataFrame, overwriting the old one.

❓ Benefit?
The Heart_Rate column now has no missing values.
This is called Mean Imputation.

📌 Topic: Replacing Missing Values with the Median


➤ Code:

Heart_rate_median = df['Heart_Rate'].median()
df["Heart_Rate"] = df["Heart_Rate"].fillna(Heart_rate_median)
print(df)

📘 Pandas – Handling Missing Data

(Replacing Missing Values in a Column Using Median)


🧾 ✅ Full Code

import numpy as np
import pandas as pd

data = {
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118]
}

df = pd.DataFrame(data)

Heart_rate_median = df['Heart_Rate'].median()
df["Heart_Rate"] = df["Heart_Rate"].fillna(Heart_rate_median)

print(df)
 

📌 Step-by-Step Explanation Notes
🔷 1️⃣ Importing Libraries
import numpy as np
import pandas as pd

👉 Explanation

numpy (np) → for numerical calculations and for representing missing values (np.nan)

pandas (pd) → for creating DataFrames and doing data analysis

🔷 2️⃣ Creating a Dataset
data = {
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118]
}

👉 Explanation

data is a Python dictionary

It has two columns:

"Heart_Rate"

"Blood_Pressure"

np.nan means a missing / not available value

🔷 3️⃣ Creating a DataFrame
df = pd.DataFrame(data)

👉 Explanation

The dictionary is converted into a DataFrame

df now holds the data in a structured, table-like format

🔷 4️⃣ Calculating Median of Heart Rate
Heart_rate_median = df['Heart_Rate'].median()

👉 Explanation

df['Heart_Rate'] → selects this column

.median() → ignores the missing values and
returns the middle (median) value of Heart Rate; for [72, 85, NaN, 90, 76] that is (76 + 85) / 2 = 80.5

The result is stored in the Heart_rate_median variable

🔷 5️⃣ Replacing Missing Values
df["Heart_Rate"] = df["Heart_Rate"].fillna(Heart_rate_median)

👉 Explanation (Very Important)

Two things happen in this line:

✔ Right Side

df["Heart_Rate"].fillna(Heart_rate_median)

Returns a new Series in which the missing values are replaced by the median; df itself is not changed yet

✔ Left Side

df["Heart_Rate"] = ...

Assigns the updated values back to the same column

So the DataFrame is now updated permanently

Note:
If we only called fillna() without the assignment, the result would be temporary and the DataFrame would not change.

🔷 6️⃣ Displaying Final Data
print(df)

👉 Explanation

The final updated DataFrame is printed

The missing Heart Rate value has now been replaced by the median

🧠 Final Summary
Step                    What happens
Create dictionary       The raw data is stored
Convert to DataFrame    We get a table format
Calculate median        Gives the value used to fill the missing entries
fillna() + assignment   The DataFrame is updated permanently
print(df)               The final clean dataset is shown
NOTE: Prefer the median to the mean when the column may contain outliers, because the median is much less affected by extreme values.
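
One common reason for the note above is that a single extreme value pulls the mean much further than the median. A sketch with a made-up faulty reading of 300:

```python
import pandas as pd

heart_rate = pd.Series([72, 76, 85, 90, 300])   # 300 is a hypothetical bad reading

print(heart_rate.mean())     # 124.6 -> dragged upwards by the outlier
print(heart_rate.median())   # 85.0  -> still a typical value
```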


>> How to drop rows with missing data
📘 Pandas – Handling Missing Data
df.dropna(axis=0)
(Removing Rows Containing Missing Values)

🧾 Code
df.dropna(axis=0)

📌 Step-by-Step Explanation Notes

🔷 1️⃣ What does dropna() do?
The dropna() function is used to remove entries with missing (NaN) values from a DataFrame.
🔷 2️⃣ What axis=0 means

axis=0 → operate on rows

That is, any row that contains even one NaN value is removed.
🔷 3️⃣ If the DataFrame looks like this:
Heart   BP
72     120
85     130
NaN    125
90     NaN
76     118

then:
df.dropna(axis=0)
The output is:
Heart   BP
72     120
85     130
76     118

because:
the rows that had missing values (NaN) were deleted.
🔷 4️⃣ Important Point
dropna() does not change the original DataFrame unless we pass:
inplace=True
Example:
df.dropna(axis=0, inplace=True)

Now the DataFrame itself is updated permanently.

⭐ Final Summary
Part            Meaning
dropna()        Function that removes missing data
axis=0          Drops rows that contain NaN
axis=1          Drops columns that contain NaN
inplace=False   Returns a new DataFrame; the original is unchanged
inplace=True    Changes the DataFrame itself permanently
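
A sketch comparing axis=0 and axis=1 on the patient data. A complete Temperature column is included so that something survives the column-wise drop.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118],
    "Temperature": [98.4, 99.1, 100.0, 98.7, 99.4],
})

rows_dropped = df.dropna(axis=0)   # removes rows 2 and 3
cols_dropped = df.dropna(axis=1)   # removes Heart_Rate and Blood_Pressure

print(rows_dropped.index.tolist())   # [0, 1, 4]
print(list(cols_dropped.columns))    # ['Temperature']
print(df.shape)                      # (5, 3) -> the original is unchanged
```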
📘 Pandas – Detecting Duplicate Rows
df.duplicated()
(How to identify duplicate rows)

🧾 Code
df.duplicated()


📌 Step-by-Step Explanation Notes

🔷 1️⃣ What does duplicated() do?

This function scans the whole DataFrame row by row
and reports which rows are duplicates.
The output is a Boolean Series:
Value   Meaning
False   The row is unique, or it is the first occurrence of a repeated row
True    The row repeats an earlier row
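
A small sketch with one repeated row. It also uses drop_duplicates(), which is not covered in the notes above, as the usual next step for removing the repeats.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Neha", "Ravi", "Neha", "Ankita"],
    "Department": ["HR", "IT", "HR", "Sales"],
})

flags = df.duplicated()     # the first "Neha/HR" row is kept as the original
print(flags.tolist())       # [False, False, True, False]

deduped = df.drop_duplicates()   # returns a new DataFrame without the repeats
print(len(deduped))              # 3
```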

📘 Pandas – Merging Two DataFrames
pd.merge(data1, data2, on="customer_id", how="inner")

🧾 Code
merge_df = pd.merge(data1, data2, on="customer_id", how="inner")


🔷 Step-by-Step Explanation
① pd.merge(...)
A pandas function.
It combines two DataFrames on the basis of a common column.
It works like a SQL JOIN.
② data1, data2
These are the two DataFrames being merged.
③ on = "customer_id"
This parameter tells pandas which common column to merge on.
That means:
Both DataFrames must have a customer_id column.
Rows are paired wherever the values in this column match.
④ how = "inner"
This is the join type.
An inner join means:
Only rows whose customer_id appears in both DataFrames end up in the result.
⑤ Where is the output stored?
merge_df = ...
The DataFrame produced by the merge
is saved in a variable named merge_df.
⭐ Final Summary

Part | Meaning
pd.merge() | Joins two DataFrames
data1, data2 | The two datasets being merged
on="customer_id" | The column to join on
how="inner" | Keeps only rows with matching values
merge_df | Output variable

📘 Pandas – Merging Two DataFrames (left_on & right_on)
✅ Code
merged_df = pd.merge(
    data1, 
    data2, 
    left_on="customer_id", 
    right_on="customer_id", 
    how="inner"
)
🔷 Step-by-Step Explanation

① merged_df = ...
The final output of the merge is stored in a new variable, merged_df.
This variable can be printed or analysed later.

② pd.merge(...)
A pandas function that combines two DataFrames.
It behaves like a SQL JOIN.

③ data1, data2
These are the two DataFrames being merged.

④ left_on="customer_id"
Names the column of the first DataFrame (data1) that is used for matching.

⑤ right_on="customer_id"
Names the column of the second DataFrame (data2) that is used for matching.

🔔 Why use left_on and right_on?
They are needed when the key column has a different name in each DataFrame:
DataFrame | Column Name
data1 | customer_id
data2 | cust_id
In this example both names are the same, so on="customer_id" would be enough, but writing left_on and right_on is still allowed.

⑥ how="inner"
Specifies the join type.
With an inner join, when is a row kept in the output?
When its customer_id matches in both DataFrames ✔
A value that exists in only one DataFrame does not appear in the result.

⭐ Final Summary (One Shot Table)
Part | Meaning
pd.merge() | Joins two DataFrames
left_on | Column of the first DataFrame to join on
right_on | Column of the second DataFrame to join on
how="inner" | Only matching rows are kept
merged_df | Final output DataFrame
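
A sketch of the case where left_on and right_on are really needed: the key column is called customer_id in one DataFrame and cust_id in the other. The names and amounts are hypothetical sample values.

```python
import pandas as pd

# Hypothetical data: the key column has a different name in each DataFrame.
data1 = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
data2 = pd.DataFrame({"cust_id": [2, 3, 4], "amount": [500, 750, 120]})

merged_df = pd.merge(
    data1,
    data2,
    left_on="customer_id",   # key column in data1
    right_on="cust_id",      # key column in data2
    how="inner",
)
print(merged_df)  # only customers 2 and 3 appear in both DataFrames
```

Both key columns are kept in the result, so one of them can be dropped afterwards if it is not needed.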
✅ 1️⃣ pd.concat() – What does it do?
concat() joins two or more DataFrames either one below the other (row-wise) or side by side (column-wise).
✔ Syntax
pd.concat([df1, df2], axis=0)

✔ What does axis mean?

axis=0 → stacks rows one below the other (default)
axis=1 → places columns side by side
📘 pd.concat() Example
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1,2,3],
    "Name": ["A","B","C"]
})

df2 = pd.DataFrame({
    "ID": [4,5,6],
    "Name": ["D","E","F"]
})

result = pd.concat([df1, df2], axis=0)
print(result)
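🧠 What happens?

The rows of df1 and df2 are stacked one below the other.

The columns should be the same (where they differ, the missing positions are filled with NaN).

The original index labels are kept, so the result here has index 0, 1, 2, 0, 1, 2; pass ignore_index=True to renumber the rows.

📝 pd.concat() Notes

Works like UNION ALL in SQL (duplicate rows are kept)
Increases the shape of the DataFrame
Joins row-wise by default
With axis=1 it joins column-wise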
🚀 2️⃣ merge() – What does it do?
pd.merge() combines two DataFrames on a common column / key.
SQL has:
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
and merge in pandas works the same way.

📘 Syntax
pd.merge(df1, df2, on="column_name", how="inner")

✔ Options for how:
how | What it does
inner (default) | Keeps only rows whose key appears in both DataFrames
left | Keeps all rows of the left DataFrame
right | Keeps all rows of the right DataFrame
outer | Keeps all rows of both DataFrames
📘 Example
merged_df = pd.merge(df1, df2, on="customer_id", how="inner")

🧠 What is happening?
customer_id is common to both tables
Only rows whose customer_id is the same in both are returned
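
To see the difference between the four how options, the sketch below merges the same two hypothetical DataFrames with each option and compares the number of rows returned.

```python
import pandas as pd

# Hypothetical data: customers 2 and 3 appear in both DataFrames.
df1 = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
df2 = pd.DataFrame({"customer_id": [2, 3, 4], "amount": [500, 750, 120]})

# Number of rows returned by each join type.
row_counts = {
    how: len(pd.merge(df1, df2, on="customer_id", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```

In the left, right and outer results, the cells that have no partner in the other DataFrame are filled with NaN.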
🔗 3️⃣ DataFrame.join() – What does it do?
join() also combines DataFrames, but:
By default it joins on the index
To join on a common column, that column is usually set as the index first
📘 Example
df1.join(df2)
If df1 and df2 share a column name, pass lsuffix/rsuffix to join(); otherwise pandas raises an error about overlapping columns.

🧠 When to use it?
When the index of both DataFrames is meaningful
When you want to put tables side by side quickly
📝 join() Notes
Index-based join by default
Behaves like a SQL LEFT JOIN by default (how="left")
For a column-based join:
df1.set_index("id").join(df2.set_index("id"))
⭐ FINAL COMPARISON TABLE
Operation | Similar to SQL | Join Basis | Output Shape
pd.concat() | UNION ALL | No key needed | Rows or columns are added
merge() | INNER/LEFT/RIGHT/OUTER JOIN | Column (key) based | Depends on matched / unmatched rows
join() | LEFT JOIN | Index based | Columns are added side by side
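
A minimal sketch of the column-based join() pattern from the notes above. The DataFrames and the id values are hypothetical.

```python
import pandas as pd

# Hypothetical data keyed by an "id" column.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "amount": [500, 750, 120]})

# join() matches on the index, so "id" is moved into the index first.
joined = df1.set_index("id").join(df2.set_index("id"))
print(joined)
# id 1 has no match in df2, so its amount is NaN; id 4 is dropped (left join).
```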
📘 Pandas – Pivot Table Code Explanation (Step-By-Step Notes)
🔹 Goal
In this code:
A sales dataset is created
Total Sale is calculated
A pivot table is built to see
how many orders of each product every salesperson has
1️⃣ Importing Pandas Library
import pandas as pd

📌 Explanation:
The pandas library is imported so that it can be used in Python.
pandas is the most important library for data analysis (creating DataFrames, filtering, pivot tables, etc.).
2️⃣ Creating the Dataset
data = {
    "Order_ID": [101,102,103,104,105,106,107,108,109,110],
    "Date": ["2025-01-10","2025-01-11","2025-01-12","2025-01-13","2025-01-14",
             "2025-01-15","2025-01-16","2025-01-17","2025-01-18","2025-01-19"],
    "Region": ["North","South","East","North","West","South","West","East","North","South"],
    "Salesperson": ["Amit","Rohan","Suman","Amit","Neha","Rohan","Neha","Suman","Amit","Rohan"],
    "Product": ["Laptop","Mobile","Laptop","Tablet","Mobile","Laptop","Tablet","Mobile","Mobile","Tablet"],
    "Quantity": [2,5,1,3,4,2,6,3,4,2],
    "Price": [55000,15000,55000,12000,15000,55000,12000,15000,15000,12000],
}

📌 Explanation:
data is a Python dictionary
The dictionary keys act as the column names,
for example "Order_ID" and "Salesperson"
The value of each key is a list that holds the data of that column
3️⃣ Converting Dictionary to DataFrame
df = pd.DataFrame(data)

📌 Explanation:
The dictionary is converted into a pandas DataFrame
A DataFrame is a table structure similar to an Excel sheet
The whole data set is now stored in the variable df
4️⃣ Creating a New Column – Total Sale
df["Total_Sale"] = df["Quantity"] * df["Price"]

📌 Explanation:
A new column, Total_Sale, is created
For every order: Total_Sale = Quantity × Price
Example (order 101):
2 × 55000 = 110000
The DataFrame now has one additional column.
5️⃣ Creating Pivot Table
pivot = pd.pivot_table(
    df,
    values="Order_ID",
    index="Salesperson",
    columns="Product",
    aggfunc="count"
)
🔎 Explanation Line-By-Line

✔ pd.pivot_table()
A pandas function that summarizes data
It groups the data into rows and columns and returns a summary

✔ values="Order_ID"

The column on which the calculation is performed
Here the orders are counted

✔ index="Salesperson"
The rows of the pivot table are grouped by Salesperson
✔ columns="Product"
The products appear as columns:
Laptop
Mobile
Tablet
✔ aggfunc="count"
Calculation method = count
That is:
how many times each salesperson sold each product
Combinations that never occur (for example a salesperson with no Laptop order) show NaN unless fill_value=0 is passed.

6️⃣ Printing Output
print(pivot)
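
The pivot-table steps above can be run end to end as in the sketch below. It uses the Order_ID, Salesperson and Product columns from the data set above; fill_value=0 is an addition that is not in the original code, used here so that missing combinations show 0 instead of NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Order_ID": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    "Salesperson": ["Amit", "Rohan", "Suman", "Amit", "Neha",
                    "Rohan", "Neha", "Suman", "Amit", "Rohan"],
    "Product": ["Laptop", "Mobile", "Laptop", "Tablet", "Mobile",
                "Laptop", "Tablet", "Mobile", "Mobile", "Tablet"],
})

pivot = pd.pivot_table(
    df,
    values="Order_ID",      # column that is counted
    index="Salesperson",    # one row per salesperson
    columns="Product",      # one column per product
    aggfunc="count",
    fill_value=0,           # assumption: show 0 where a combination never occurs
)
print(pivot)
```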

📘 Notes – Laptop Sales Count using Groupby & Filter
🧠 Code
df[df["Product"] == "Laptop"].groupby("Salesperson")["Order_ID"].count()
① df
df is the full DataFrame containing all sales records.
Filtering and grouping are applied to it.
② df["Product"] == "Laptop"
Checks every row to see whether the product is "Laptop".
The output is a Boolean Series:
0    True
1    False
2    True

③ df[df["Product"] == "Laptop"]

Uses the Boolean Series to filter the DataFrame.
Only the rows whose product is "Laptop" remain.
The filtered data looks like this:
Order_ID | Product | Salesperson
101         Laptop      Amit
103         Laptop      Suman
106         Laptop      Rohan

④ .groupby("Salesperson")
Splits the filtered rows into groups, one per salesperson:
Amit  → Amit's Laptop sales
Rohan → Rohan's Laptop sales
Suman → Suman's Laptop sales

NOTE: a groupby needs three things: the column to group by, the column to aggregate, and the aggregation function.
Aggregations we can use: max / min / count / sum / var / mean / median

⑤ ["Order_ID"].count()
🔍 Meaning
Counts the Order_ID values in each group, i.e. how many Laptop orders each salesperson has.
count() is the right aggregation here. Using .sum() on Order_ID would add the ID numbers together (or concatenate them if they were strings), which is not meaningful.
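
A runnable sketch of the filter-then-group chain, using the first six orders of the sales data set above. The second aggregation (sum of Quantity) is an extra illustration of a case where sum() is meaningful.

```python
import pandas as pd

# First six orders from the sales data set.
df = pd.DataFrame({
    "Order_ID": [101, 102, 103, 104, 105, 106],
    "Salesperson": ["Amit", "Rohan", "Suman", "Amit", "Neha", "Rohan"],
    "Product": ["Laptop", "Mobile", "Laptop", "Tablet", "Mobile", "Laptop"],
    "Quantity": [2, 5, 1, 3, 4, 2],
})

laptops = df[df["Product"] == "Laptop"]  # keep only Laptop rows

# Number of Laptop orders per salesperson.
laptop_counts = laptops.groupby("Salesperson")["Order_ID"].count()
# Total Laptop units per salesperson.
laptop_qty = laptops.groupby("Salesperson")["Quantity"].sum()

print(laptop_counts)
print(laptop_qty)
```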
📌 Pandas: Converting to Date Using pd.to_datetime()
(Converts a date column to a proper date format)
🧠 Syntax
pd.to_datetime(df["Date"])


✔ Step-by-Step Explanation
① df["Date"]
Takes the column named Date from the DataFrame df.
The values in this column are stored as text/strings.
Example:
"2025-01-10"
"2025-01-11"

② pd.to_datetime(...)
A built-in pandas function.
Its job is to convert a column of date strings into a proper datetime format,
that is, text → real timestamp, which pandas can treat as a date.

③ What does it return?
It returns a pandas Series in which:
✔ Year
✔ Month
✔ Day
✔ Time (if available)
are parsed properly.
The original column is not changed unless the result is assigned back: df["Date"] = pd.to_datetime(df["Date"])

📊 Example
Input Column
"2025-01-10"
"2025-01-11"
"2025-01-12"

Output after conversion
2025-01-10 00:00:00
2025-01-11 00:00:00
2025-01-12 00:00:00


⭐ Why is it needed?
After converting to datetime format:
✔ Sorting works correctly
✔ Filtering becomes possible (e.g., df[df["Date"] > '2025-01-15'])
✔ Week, month and year can be extracted
✔ Time-based analysis becomes possible (resample(), groupby(), etc.)
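
A small sketch of the conversion followed by a date filter. The three dates are taken from the sales data set above; the Quantity column is included only to give the rows some content.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2025-01-10", "2025-01-16", "2025-01-19"],
    "Quantity": [2, 6, 2],
})

# Assign the result back so the column really becomes datetime.
df["Date"] = pd.to_datetime(df["Date"])
print(df["Date"].dtype)  # a datetime64 dtype instead of object

# Date comparison now works on real timestamps.
recent = df[df["Date"] > "2025-01-15"]
print(recent)
```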
🗓 Pandas: Extracting Year from Date Column
Code:
pd.to_datetime(df["Date"]).dt.year
✔ Step-by-Step Explanation

① df["Date"]

Selects the Date column from the DataFrame df.
The dates in this column are still stored as strings/text.
Example:
"2025-01-10"
"2025-03-12"

② pd.to_datetime(df["Date"])
The pandas function that converts date strings into a proper datetime format.
pandas can now treat them as dates.
Example conversion:
"2025-01-10" → 2025-01-10 00:00:00


③ .dt.year
.dt is the pandas date/time accessor.
.year extracts the year through it.
That is:
2025-01-10 → 2025


🧠 What is the final output?
Suppose the Date column looks like this:
Date
2025-01-10
2024-03-12
2023-08-05
Then the code produces:
Year
2025
2024
2023

⭐ Why is it used?
It makes the following easy:
✔ Year-wise analysis
✔ Year-wise grouping
✔ Time-based filtering
✔ Trend visualization

📌 Short Notes Summary (Exam Style)

pd.to_datetime() → converts the Date column to a real datetime format
.dt → gives access to date-related properties
.year → extracts only the year
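
A minimal sketch of year extraction followed by year-wise grouping. The Sales values and the fourth date are hypothetical, added so that one year has more than one row.

```python
import pandas as pd

# Hypothetical data: two rows fall in 2025.
df = pd.DataFrame({
    "Date": ["2025-01-10", "2024-03-12", "2023-08-05", "2025-06-01"],
    "Sales": [100, 200, 300, 50],
})

# Extract the year into its own column.
df["Year"] = pd.to_datetime(df["Date"]).dt.year

# Year-wise total.
yearly = df.groupby("Year")["Sales"].sum()
print(yearly)
```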

📌 df.sort_index() – What does it do?
sort_index() sorts a pandas DataFrame by its index, in ascending order by default.
⚙ How does it work?
👉 Step 1:
df – your DataFrame.
👉 Step 2:
.sort_index() –
arranges the index (row labels) from smallest to largest.
⭐ Important Points

✔ Default ascending = True

The rows are ordered from top to bottom by index, smallest to largest.

✔ Only the order changes

The values in the columns do not change; each row simply moves together with its index label.
sort_index() returns a new DataFrame; the original is unchanged unless the result is assigned back.

✔ Need descending order?
df.sort_index(ascending=False)
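
A short sketch of sort_index() on a hypothetical DataFrame whose index is deliberately out of order.

```python
import pandas as pd

# Hypothetical data with an unsorted index.
df = pd.DataFrame({"Name": ["C", "A", "B"]}, index=[3, 1, 2])

asc = df.sort_index()                  # index 1, 2, 3
desc = df.sort_index(ascending=False)  # index 3, 2, 1

print(asc)
print(desc)
print(df.index.tolist())  # [3, 1, 2]: the original is unchanged
```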

📘 Notes – pd.date_range()

🧠 What is pd.date_range()?
A pandas function that generates a continuous sequence of dates/times (a DatetimeIndex).
🧾 Code
pd.date_range(start='2020-01-01', periods=20, freq='H')


🧩 Step-by-Step Explanation
✔️ 1️⃣ pd.date_range()

A pandas function
Used to create evenly spaced date/time values

✔️ 2️⃣ start='2020-01-01'

The starting point of the sequence
Default time: 00:00:00
Start timestamp:
2020-01-01 00:00:00


✔️ 3️⃣ periods=20

How many timestamps to generate
A total of 20 values are created

✔️ 4️⃣ freq='H'

Frequency = hourly
A gap of 1 hour between consecutive values

⏱ Expected Output Pattern
2020-01-01 00:00:00
2020-01-01 01:00:00
2020-01-01 02:00:00
...
2020-01-01 19:00:00

Total = 20 timestamps

⭐ Quick Summary Table
Parameter | Meaning
start | Starting date/time
periods | Total number of timestamps
freq | Frequency of intervals

🧠 Common Frequency Codes
Code | Meaning
D | Daily
H | Hourly
W | Weekly
M | Month-end
MS | Month-start
Y | Yearly (year-end)
S | Seconds
Note: newer pandas releases (2.2 onward) deprecate H, S, M and Y in favour of h, s, ME and YE, so a deprecation warning may appear with the codes above.
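
A runnable sketch of the hourly range described above. To stay independent of which string alias a given pandas version prefers, it passes the offset object pd.offsets.Hour() as freq instead of a string; that substitution is a choice made for this sketch, not the code from the lecture.

```python
import pandas as pd

# 20 hourly timestamps starting at midnight on 2020-01-01.
# pd.offsets.Hour() is used in place of the string alias for "hourly".
hourly = pd.date_range(start="2020-01-01", periods=20, freq=pd.offsets.Hour())

print(hourly[0])    # 2020-01-01 00:00:00
print(hourly[-1])   # 2020-01-01 19:00:00
print(len(hourly))  # 20
```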


📘 Python Notes – Adding 2 Months to a Date (Using timedelta)
📌 Code
# shift every date forward by roughly 2 months (60 days)
import pandas as pd
from datetime import datetime, timedelta
df["Date"] = pd.to_datetime(df["Date"])
future_date_after_2mnth = df["Date"] + timedelta(days=60)
print(future_date_after_2mnth)


📝 Step-by-Step Explanation

✔ 1️⃣ from datetime import datetime, timedelta
Imports from the datetime module
timedelta is a class that represents a time difference
With it we can add or subtract days, seconds, hours or weeks

✔ 2️⃣ df["Date"] = pd.to_datetime(df["Date"])

Converts the Date column of the DataFrame from strings to a real date/time format
so that pandas can treat the values as dates
All date operations (add, subtract, extract) become possible only after this step
Example:
"2025-01-10"  →  2025-01-10 00:00:00 (datetime format)


✔ 3️⃣ timedelta(days=60)
Creates a timedelta object representing 60 days
Here we assume:
2 months ≈ 60 days

Note: timedelta cannot add months directly, which is why 60 days is used as an approximation.

✔ 4️⃣ future_date_after_2mnth = df["Date"] + timedelta(days=60)
This line adds 60 days to every date in the DataFrame
Example:
Old Date: 2025-01-10
+60 Days
= 2025-03-11
The result is a new Series containing the shifted dates

✔ 5️⃣ print(future_date_after_2mnth)

Displays the new dates on screen

📌 Final Understanding
Step | What was done
Import | Imported datetime and timedelta
Convert | Converted the Date column to datetime format
Create Difference | Created a 60-day time difference object
Apply | Added 60 days to every date
Print | Displayed the new future dates

⭐ Important Concept
timedelta handles fixed-length units only (weeks, days, hours, minutes, seconds and smaller); it has no months or years
For adding months, this is the better choice:
pd.DateOffset(months=2)
📘 Pandas: Adding 2 Months to a Date Column
df["Date"] = df["Date"] + pd.DateOffset(months=2)
print(df["Date"])

🔍 Step-by-Step Explanation

① df["Date"]
df is our DataFrame.
df["Date"] selects the Date column of the DataFrame.
It holds all the existing dates.
Example:
2025-01-10
2025-01-11
2025-01-12


② pd.DateOffset(months=2)

DateOffset is a pandas class.
It describes how much time to add to a date.
Here months=2 is given, so 2 calendar months are added, not a number of days.
If the date is:
2025-01-10

Then after adding 2 months:
2025-03-10

The day of the month stays the same and only the month moves forward by 2. If that day does not exist in the target month (for example 31 December + 2 months), the last day of the target month is used.

③ df["Date"] = df["Date"] + pd.DateOffset(months=2)
Here we:
add 2 months to the old date column
and write the result back into the Date column (overwrite/update)
That is:
old date → +2 months → new date


④ print(df["Date"])
Prints only the updated Date column.
In the output every date is shifted forward by 2 months.

✔ Why DateOffset Instead of timedelta?
timedelta works in fixed-length units such as days.
DateOffset also works in calendar months and years, so it is more accurate for month-based shifts.
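
The difference between the two approaches can be checked with the sketch below. The second date (2024-12-31) is a hypothetical value chosen to show the month-end case.

```python
from datetime import timedelta

import pandas as pd

dates = pd.Series(pd.to_datetime(["2025-01-10", "2024-12-31"]))

plus_60_days = dates + timedelta(days=60)        # fixed number of days
plus_2_months = dates + pd.DateOffset(months=2)  # calendar months

print(plus_60_days.dt.strftime("%Y-%m-%d").tolist())
# ['2025-03-11', '2025-03-01']
print(plus_2_months.dt.strftime("%Y-%m-%d").tolist())
# ['2025-03-10', '2025-02-28']
```

With 60 days the result drifts away from the original day of the month, while DateOffset keeps the day (or falls back to the last day of a shorter month).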

Data Visualisation ----> Data Analysis ----------> Insights/Patterns

NOTE: we have 3 options for visualisation in Python: pandas, seaborn and matplotlib


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

① import pandas as pd
pandas is imported under the short name pd
It is mainly used for data analysis and DataFrame handling
② import seaborn as sns
seaborn is a higher-level data visualization library built on matplotlib
It is used under the short name sns
③ import matplotlib.pyplot as plt
This is the basic graph plotting library
It is used under the name plt

Line Plot - Trend Analysis Using Pandas (Example: Blood Pressure)
Purpose of Line Plot:

Used for trend analysis of data.
Best suited to visualizing ups and downs, patterns and changes over time.
Example: blood pressure, temperature, sales, stock price.

1. Importing the Libraries (Function & Technique)
import pandas as pd
import matplotlib.pyplot as plt

Explanation:

pandas → library for data manipulation and analysis.
matplotlib.pyplot → library for graphs and visualization.
Technique: import the libraries so that plotting and data handling are ready to use.

2. Creating the Line Plot (Function & Technique)
Code:
df["Blood_Pressure"].plot(kind="line")

Explanation:

.plot() → the built-in plotting function of pandas.
kind="line" → creates a line chart.
Technique: use a line chart for time series or trend analysis.

Result:

X-axis → index (row numbers or time points)
Y-axis → Blood Pressure values
The line shows the trend and the ups and downs

📘 Python Notes – Line Plot for Trend Analysis (Blood Pressure)
📌 Code
df["Blood_Pressure"].plot(
    kind="line",
    figsize=(5,3),
    xlabel="Patient Id",
    ylabel="Blood Pressure",
    title="Trend of Blood Pressure",
    linestyle="--",
    marker="o",
    grid=True
)
📝 Step-by-Step Explanation
✔ 1️⃣ df["Blood_Pressure"]

Selects the Blood_Pressure column
Technique: extract the relevant column that will be plotted

✔ 2️⃣ .plot(kind="line")
.plot() → the built-in plotting function of pandas
kind="line" → creates a line chart
Technique: visualize the trend and the ups and downs
✔ 3️⃣ figsize=(5,3)
Sets the size of the graph (width = 5 inches, height = 3 inches)
Technique: adjust the size to keep the plot readable and compact
✔ 4️⃣ xlabel="Patient Id"
Sets the label of the X-axis
Technique: state clearly what the axis means
✔ 5️⃣ ylabel="Blood Pressure"
Sets the label of the Y-axis
Technique: state clearly what the axis means
✔ 6️⃣ title="Trend of Blood Pressure"
Sets the heading of the graph
Technique: show the summary and purpose of the graph
✔ 7️⃣ linestyle="--"
Draws the line in a dashed style
Technique: make the trend visually distinct and readable
Common Linestyles in Python/Matplotlib:
Style | Description | Example
"-" | Solid line | linestyle='-'
"--" | Dashed line | linestyle='--'
"-." | Dash-dot line | linestyle='-.'
":" | Dotted line | linestyle=':'
✔ 8️⃣ marker="o"
Puts a circle marker on every data point of the line
Technique: highlight the points so the trend is shown clearly
✔ 9️⃣ grid=True
Adds grid lines to the graph
Technique: make the data easier to compare and read
📌 Final Understanding
Step | What was done
Column Select | Selected the Blood_Pressure column
Plot Type | .plot(kind="line") → created a line chart
Figure Size | figsize=(5,3) → set the size of the graph
X-axis Label | xlabel="Patient Id" → labelled the X-axis
Y-axis Label | ylabel="Blood Pressure" → labelled the Y-axis
Title | title="Trend of Blood Pressure" → set the heading of the graph
Line Style | linestyle="--" → dashed line
Marker | marker="o" → circle on every data point
Grid | grid=True → grid lines added
⭐ Important Concept
A line plot makes trends and patterns easy to see
figsize, xlabel, ylabel, title → make the graph informative and presentation-ready
pandas .plot() → ideal for quick plotting




📘 Python Notes – Line Plot using Matplotlib (plt.plot)
📌 Code
import matplotlib.pyplot as plt

# Simple line plot
plt.plot(df["Blood_Pressure"])
plt.show()


📝 Step-by-Step Explanation
✔ 1️⃣ import matplotlib.pyplot as plt
Imports the pyplot module of the Matplotlib library
The plotting functions are then called through the alias plt
Technique: set up the plotting environment in Python

✔ 2️⃣ plt.plot(df["Blood_Pressure"])

plt.plot() → the basic line plotting function of Matplotlib
df["Blood_Pressure"] → the data series passed to the line plot
Technique: create a line chart to visualize a trend or pattern

Example:

Index → 0, 1, 2, …

Blood Pressure → 120, 125, 118, …

Plot → the line shows the ups and downs of the data

✔ 3️⃣ plt.show()

Used to display the graph on screen
Technique: finalize the graph for rendering and visual output


📌 Optional Customizations (with Matplotlib)
plt.figure(figsize=(5,3))
plt.plot(df["Blood_Pressure"], color='red', linestyle='--', marker='o')
plt.title("Blood Pressure Trend")
plt.xlabel("Patient Index")
plt.ylabel("Blood Pressure")
plt.grid(True)
plt.show()
color='red' → sets the colour of the line

linestyle='--' → dashed line

marker='o' → highlights the data points

title, xlabel, ylabel → make the graph informative

grid=True → adds grid lines

📌 Final Understanding

Step | Function/Parameter | Purpose
Import | import matplotlib.pyplot as plt | Sets up the plotting environment
Plot Data | plt.plot(df["Blood_Pressure"]) | Creates the line chart
Display | plt.show() | Shows the graph on screen
Optional | color, linestyle, marker, title, xlabel, ylabel, grid | Make the graph readable and informative

⭐ Important Concept

plt.plot() → the most basic line plotting function of Matplotlib

plt.show() → displays the graph. It is required in a plain Python script; Jupyter notebooks usually render the figure automatically. Always write it last.

For advanced customization use color, linestyle, marker, labels and grid
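
A self-contained sketch of the customized Matplotlib line plot. The five Blood_Pressure values are the first five from the patient data set used later in these notes. The Agg backend is an assumption made so the sketch also runs on a machine without a display; in a notebook or interactive session that line can be removed and plt.show() added at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# First five Blood_Pressure values from the patient data set.
df = pd.DataFrame({"Blood_Pressure": [150, 135, 160, 130, 155]})

plt.figure(figsize=(5, 3))
plt.plot(df["Blood_Pressure"], color="red", linestyle="--", marker="o")
plt.title("Blood Pressure Trend")
plt.xlabel("Patient Index")
plt.ylabel("Blood Pressure")
plt.grid(True)

ax = plt.gca()  # current axes, used here only to inspect the result
print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())
```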



📌 Histogram Plot Using Pandas – df["Treatment_Cost"].plot(kind="hist")
df["Treatment_Cost"].plot(
    kind ="hist",
    figsize = (5,3),
    bins = [0,500,1000,1500,2000],   # optional, but explicit bins are recommended; edges must be in increasing order
    edgecolor = "black"
)

🟥 1️⃣ Purpose of This Graph (Histogram)
✔ Histogram Used For:
Looking at the distribution of numerical data
How much of the data falls in each range?
Which values occur often and which occur rarely?
Is the data skewed (tilted) or close to a normal distribution?
Example:

In hospital data it helps to understand:
How many patients have a low treatment cost?
How many are in the mid range?
How many are high-cost patients?

🟦 2️⃣ Feature Used – .plot(kind="hist")
✔ .plot()
The built-in visualization function of pandas.
✔ kind="hist"
Tells pandas that we want a histogram.

🟩 3️⃣ Parameter Explanation
✔ (A) figsize = (5,3)
Sets the size of the graph window.
Format → (width, height)
✔ (B) bins = [...]
🔷 Meaning:
Bins define the value ranges
Each value falls into one bin → the values in each bin are counted (frequency)
🔷 Without bins:
pandas uses a default number of equal-width bins (10).
🔷 With custom bins:
bins = [0, 500, 1000, 1500, 2000]

This gives the bins:
0 – 500
500 – 1000
1000 – 1500
1500 – 2000
The bin edges must be listed in increasing order. Values outside the first and last edge are not counted.


⭐ Uses of Bins:

Customer segmentation in business analytics
Expense group analysis in finance
Risk grouping in medical data

✔ (C) edgecolor="black"

Gives every bar a black border
The graph becomes easier to read

🟨 4️⃣ Where Is a Histogram Used
✔ Real-life Use Cases
1️⃣ Income distribution
2️⃣ Product price range analysis
3️⃣ Medical treatment cost grouping
4️⃣ Age-group analysis
5️⃣ Student marks distribution
6️⃣ Transaction amount analysis

🟫 5️⃣ What Insight We Get

How many people have low-cost treatment?
How many are mid-range?
How many are high-cost?
How is the data spread: normal, skewed or uneven?

⭐ Final Summary Notes (Short Version)
Topic | Explanation
Chart Type | Histogram (NOTE: the bar height in a histogram is always a count)
Purpose | Check the distribution of numerical data; count per bucket (bin)
Library | Pandas
Function | df["column"].plot(kind="hist")
Bins | Define the value ranges
Edgecolor | Highlights the border of the bars
Use Cases | Cost grouping, marks analysis, income distribution
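
To see what a histogram actually counts, the sketch below applies increasing bin edges to a few hypothetical cost values with np.histogram, which follows the same binning rule as a histogram plot. It also shows that bin edges which are not in increasing order are rejected.

```python
import numpy as np
import pandas as pd

# Hypothetical treatment costs.
costs = pd.Series([250, 480, 900, 1200, 1450, 1800, 2000])
bins = [0, 500, 1000, 1500, 2000]  # edges in increasing order

counts, edges = np.histogram(costs, bins=bins)
print(counts.tolist())  # [2, 1, 2, 2]; the last bin includes its right edge (2000)

# Edges that are not in increasing order raise a ValueError.
bad_bins_error = None
try:
    np.histogram(costs, bins=[0, 5000, 1000, 1500, 2000])
except ValueError as err:
    bad_bins_error = str(err)
print(bad_bins_error)
```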




📘 Python Notes – Histogram Plot using plt.hist() (Matplotlib)

🧠 1️⃣ What This Code Does?
This code draws a histogram of the Cholesterol column, showing the data as a frequency distribution:

plt.hist(df["Cholesterol"], bins = [0,100,200,300], edgecolor = 'black')
plt.show()

🧵 2️⃣ Line-by-Line Explanation


✔ Line 1: plt.hist(...)
plt.hist(df["Cholesterol"], bins = [0,100,200,300], edgecolor = 'black')
🔹 Function Used: plt.hist()
This is a Matplotlib function

Used to draw a histogram

A histogram shows the distribution of continuous numerical data

It counts how many data points fall into each bin
🔧 Parameters Explanation
① df["Cholesterol"]
The column of the dataset that is passed in,
i.e. the column whose histogram is drawn
② bins = [0,100,200,300]
Defines the ranges of the data groups:

0 – 100

100 – 200

200 – 300

Values above 300 fall outside these bins and are not counted, so extend the last edge if the column contains higher values.

📘 Python Notes – Bar Chart using value_counts().plot(kind="bar")

🧠 1️⃣ What This Code Does?
df["Smoking"].value_counts().plot(
    kind = "bar")
plt.show()

This code draws a bar chart of the Smoking column, in which the count of each category (Yes/No, Smoker/Non-Smoker) is shown.
🧵 2️⃣ Step-by-Step Explanation

✔ Step 1: df["Smoking"]
Selects the Smoking column of the DataFrame.
This column holds values such as:
Yes
No
Occasional
etc.
✔ Step 2: value_counts()

df["Smoking"].value_counts()

🔹 What it does?
Counts how many times each value appears.
Example:

Smoking | Count
No | 150
Yes | 50
Occasional | 10
The result is sorted by count, largest first.

✓ Why used?
A bar chart is drawn from categorical frequencies
value_counts() gives exactly that frequency table

✔ Step 3: .plot(kind="bar")
.value_counts().plot(kind="bar")

🔧 Function Used:
The built-in plotting function of pandas
Internally it uses Matplotlib
📝 Why Bar Chart?
A bar chart is used when:
The data is in categories
Those categories need to be compared
📘 Python Notes – Bar Chart using plt.bar() (Matplotlib)

🧠 1️⃣ What This Code Does?
counts = df["Gender"].value_counts()
plt.figure(figsize = (5,4))
plt.bar(counts.index, counts.values)
plt.show()
This code draws a bar chart of the Gender column in which:

X-axis → the different gender categories (Male/Female etc.)

Y-axis → the count of each category


🧵 2️⃣ Line-by-Line Explanation

✔ Line 1: counts = df["Gender"].value_counts()
Counts the frequency of each gender
Example:
Gender | Count
Male | 120
Female | 80
counts.index holds the categories and counts.values holds their counts, in the same order.

✔ Line 2: plt.figure(figsize=(5,4))
plt.figure(figsize = (5,4))

🔹 Meaning
Creates a new figure (plot window)
figsize = (width, height) in inches
🔧 Why used?

To control the size of the graph
The default size is sometimes too small
Example:
(5,4) → Width 5 inches, Height 4 inches

✔ Line 3: plt.bar(...)
plt.bar(counts.index, counts.values)

🔍 Function: plt.bar()

Matplotlib function
Used to draw a bar chart

🔹 Parameter 1: counts.index
The categories of the Gender column
Example:
["Male", "Female"]

These become the X-axis labels.

🔹 Parameter 2: counts.values
The count of each category
These become the bar heights on the Y-axis.

✔ Why not unique() + value_counts()?
df["Gender"].unique() returns the categories in order of first appearance
df["Gender"].value_counts() returns the counts sorted from largest to smallest
The two orders can differ, so pairing them may put a count on the wrong label. Taking both the labels and the heights from the same value_counts() result avoids this.
In short:
"We plot one bar per gender, showing how many times it occurs."
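
The sketch below, on a small hypothetical Gender column, shows why the labels and the heights of a bar chart are best taken from the same value_counts() result: unique() and value_counts() do not necessarily return the categories in the same order.

```python
import pandas as pd

# Hypothetical column: "F" appears first, but "M" is more frequent.
gender = pd.Series(["F", "M", "M", "M"])

print(gender.unique().tolist())  # ['F', 'M']  order of first appearance

counts = gender.value_counts()
print(counts.index.tolist())     # ['M', 'F']  sorted by count, largest first
print(counts.values.tolist())    # [3, 1]

# Labels and heights taken from the same object always line up.
labels = counts.index.tolist()
heights = counts.values.tolist()
```

Passing labels and heights to plt.bar() then gives each bar the correct category.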
📘 Scatter Plot using Pandas Plot Function (Not Matplotlib Directly)
Code:
import pandas as pd
import matplotlib.pyplot as plt

data = {
    "Patient_ID": ["P001","P002","P003","P004","P005","P006","P007","P008","P009","P010",
                   "P011","P012","P013","P014","P015","P016","P017","P018","P019","P020"],

    "Name": ["Rahul Verma","Anita Singh","Vijay Gupta","Meena Sharma","Sachin Patil",
             "Neha Soni","Suresh Yadav","Rita Mehta","Aman Khan","Pooja Rao",
             "Rakesh Tiwari","Divya Jain","Sunil Thakur","Preeti Kaur","Sanjay Das",
             "Ayesha Ali","Rohit Mishra","Suman Lata","Harish Rana","Kiran Solanki"],

    "Age": [45,52,60,38,47,29,63,54,41,33,57,49,42,36,59,31,50,44,61,27],

    "Gender": ["M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F"],

    "Disease": ["Hypertension","Diabetes","Heart Disease","Asthma","Hypertension","Thyroid",
                "Heart Disease","Diabetes","Liver Disease","Migraine","Hypertension","Arthritis",
                "Kidney Stone","PCOD","Heart Disease","Anaemia","Liver Disease","Diabetes",
                "COPD","Thyroid"],

    "Blood_Pressure": [
        150, 135, 160, 130, 155,
        120, 162, 140, 148, 122,
        158, 130, 140, 124, 170,
        118, 145, 138, 160, 119
    ],

    "Heart_Rate": [88,90,110,95,92,78,115,93,85,80,96,88,82,84,118,75,90,92,106,74],

    "Cholesterol": [240,220,290,180,245,170,300,225,260,190,255,210,230,195,315,160,275,215,265,165],

    "Smoking": ["No","No","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","No",
                "Yes","No","Yes","No","Yes","No"],

    "Treatment": ["Medication","Insulin","Bypass Surgery","Nebulization","Medication",
                  "Medication","Angiography","Insulin","Liver Treatment","Pain Management",
                  "Medication","Pain Relief Therapy","Stone Removal","Hormone Therapy",
                  "Stent Surgery","Blood Therapy","Medication","Insulin","Respiratory Support",
                  "Medication"],

    "Cost": [3500,5500,145000,2000,4000,2500,65000,6000,35000,1800,
             4200,3200,28000,4500,98000,2200,29000,5700,32000,2300]
}

df = pd.DataFrame(data)
df.plot(figsize = (5,3),
        kind  = 'scatter',
        x= 'Age',
        y = 'Cholesterol',
        alpha =1)
plt.show()

① import pandas as pd
✔ Brings the pandas library into the program
✔ It is used under the short name pd
✔ It is the base for creating DataFrames, cleaning data and visualizing it

② import matplotlib.pyplot as plt
✔ Imports the plotting module of matplotlib
✔ It is used under the name plt
✔ plt.show() displays the plots on screen (required in a script; notebooks usually render automatically)

🧩 Creating the Data Dictionary

③ data = { … }

This is a Python dictionary.

Inside it:
Keys = column names
Values = lists of values
Every list has 20 values (the information of our 20 patients)

Example:
"Age": [45,52,60,...]
"Cholesterol": [240,220,290,...]
"Name": [...]
"Blood_Pressure": [...]

👉 This dictionary is the raw data from which the DataFrame is built.

🔄 Converting Dictionary → DataFrame

④ df = pd.DataFrame(data)
✔ This line converts the dictionary into a DataFrame (table format)
✔ df now behaves like a proper table
✔ Columns: Age, Gender, Blood_Pressure, Cholesterol etc.
✔ Rows: patient information from P001 to P020

👉 We can now plot, filter and analyse df.
⑤ df.plot(figsize=(5,3), kind="scatter", x="Age", y="Cholesterol", alpha=1)

➡ This plot is created with the pandas plot() function, not with Matplotlib's plt.scatter().
Parameters Explanation:
figsize=(5,3) → size of the plot
kind="scatter" → tells pandas that a scatter plot is wanted
x="Age" → Age on the X-axis
y="Cholesterol" → Cholesterol on the Y-axis
alpha=1 → fully opaque dots (no transparency)
👉 pandas uses Matplotlib internally, but the command is given to pandas.

🖥 Show the plot

⑥ plt.show()
Displays the plot on screen

📘 Step-by-Step Notes: Creating Scatter Plot of Age vs Cholesterol using Matplotlib

1️⃣ Importing the Libraries
import pandas as pd
import matplotlib.pyplot as plt

pandas → used to manage data in a table or spreadsheet format.

matplotlib.pyplot → used to create graphical visualizations (charts/plots) of the data.

2️⃣ Data Create karna
data = {
    "Patient_ID": ["P001","P002", ... ,"P020"],
    "Name": ["Rahul Verma","Anita Singh", ... ,"Kiran Solanki"],
    "Age": [45,52,60,...,27],
    "Gender": ["M","F",...,"F"],
    "Disease": ["Hypertension","Diabetes",...,"Thyroid"],
    "Blood_Pressure": [150,135,...,119],
    "Heart_Rate": [88,90,...,74],
    "Cholesterol": [240,220,...,165],
    "Smoking": ["No","No",...,"No"],
    "Treatment": ["Medication","Insulin",...,"Medication"],
    "Cost": [3500,5500,...,2300]
}


Here a Python dictionary named data is created; its keys are the column names and its values are given as lists.

The table represents the health data of 20 patients.

Meaning of the columns:

Patient_ID → unique ID of each patient

Name → name of the patient

Age → age of the patient

Gender → Male/Female

Disease → diagnosed health issue

Blood_Pressure → systolic BP value

Heart_Rate → beats per minute

Cholesterol → cholesterol level

Smoking → Yes/No

Treatment → treatment type

Cost → cost of the treatment in INR

3️⃣ Converting to a DataFrame
df = pd.DataFrame(data)
pd.DataFrame() → converts a dictionary or list into a table format.

df is now a DataFrame object that can easily be analysed and visualized.

Example view:

Patient_ID	Name	Age	Gender	Disease	...	Cost
P001	Rahul Verma	45	M	Hypertension	...	3500
P002	Anita Singh	52	F	Diabetes	...	5500
...	...	...	...	...	...	...

4️⃣ Making a Scatter Plot
plt.scatter(x=df["Age"], y=df["Cholesterol"])
plt.scatter() → draws a scatter plot, where each row becomes a point on the x-y plane.

Parameters:

x=df["Age"] → Age values go on the x-axis

y=df["Cholesterol"] → Cholesterol values go on the y-axis

The scatter plot lets us visualize the relationship between Age and Cholesterol:

If the points rise from left to right → cholesterol increases with age

If the points are scattered with no clear pattern → the relationship is weak
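
To put a number on "points going up", one option is the Pearson correlation coefficient. This is a sketch added for illustration, not part of the lecture code; it uses the Age and Cholesterol values from the full 20-patient dataset that appears later in these notes, and the variable name r is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [45, 52, 60, 38, 47, 29, 63, 54, 41, 33,
            57, 49, 42, 36, 59, 31, 50, 44, 61, 27],
    "Cholesterol": [240, 220, 290, 180, 245, 170, 300, 225, 260, 190,
                    255, 210, 230, 195, 315, 160, 275, 215, 265, 165],
})

# Pearson correlation:
#   close to +1 → the points rise together
#   close to  0 → no clear linear pattern
#   close to -1 → one value falls as the other rises
r = df["Age"].corr(df["Cholesterol"])
print(round(r, 2))
```

A scatter plot and the correlation value answer the same question; the plot shows the shape, the number summarizes its strength.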

5️⃣ Setting the Figure Size
plt.figure(figsize=(5,3))
plt.figure() → creates the canvas (figure) for the plot

figsize=(5,3) → width=5 inches, height=3 inches

Note: Write this line before plt.scatter(). If it comes after, it opens a new empty figure and the size is not applied to the scatter plot that was already drawn.

6️⃣ Adding Axis Labels
plt.xlabel("Age")
plt.ylabel("Cholesterol")
plt.xlabel() → x-axis label

plt.ylabel() → y-axis label

Labels make the chart readable and meaningful

7️⃣ Showing the Plot
plt.show()
This function displays the plot on screen

In a plain Python script, the plot window does not open without this line (Jupyter notebooks usually render the plot inline even without it)
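
Steps 4 to 7 can be put together in the order that works: figure first, then the scatter, then the labels, then show. This is a sketch that assumes matplotlib and pandas are installed; it uses only the first five patients' Age and Cholesterol values to stay short, and the names fig and ax are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First five patients only (shortened for illustration)
df = pd.DataFrame({
    "Age": [45, 52, 60, 38, 47],
    "Cholesterol": [240, 220, 290, 180, 245],
})

fig = plt.figure(figsize=(5, 3))                # 1. canvas first: 5 x 3 inches
plt.scatter(x=df["Age"], y=df["Cholesterol"])   # 2. points are drawn on that canvas
plt.xlabel("Age")                               # 3. axis labels
plt.ylabel("Cholesterol")
ax = plt.gca()                                  # the axes that were just drawn on

# 4. In a script, finish with plt.show() to open the plot window:
# plt.show()
```

If plt.figure(figsize=(5, 3)) were moved below plt.scatter(), the scatter would stay on a default-sized figure and a second, empty 5 x 3 figure would be created.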

✅ Summary Notes
pandas → stores and manages data as a table

matplotlib.pyplot → visualizes data

dict → defined the data as key-value pairs

pd.DataFrame() → converted the dictionary into a DataFrame

plt.scatter() → drew the scatter plot (Age vs Cholesterol)

plt.figure(figsize=(w,h)) → set the plot size

plt.xlabel()/plt.ylabel() → added axis labels

plt.show() → displayed the plot

📘 What is Matplotlib?

Matplotlib is a Python library used to visualize data as graphs or charts.

In other words: if you have numbers and data, Matplotlib turns them into pictures (plots).

Its pyplot sub-module is the one most commonly used; it provides MATLAB-style plotting functions.

📘 Step-by-Step Notes: Creating a Boxplot for Blood Pressure using Matplotlib
1️⃣ Code Explanation
plt.boxplot(df["Blood_Pressure"])  # draws the boxplot
📌 What is a boxplot?

A boxplot is a graphical representation of how data is distributed.
In other words: it shows the range the data is spread over, where the median lies, and which values are unusual (outliers).

Components of a boxplot:

Box (Rectangle)

Shows the middle 50% of the data.

Bottom edge of the box → Q1 (25th percentile)

Top edge of the box → Q3 (75th percentile)

Height of the box → IQR (Interquartile Range) = Q3 - Q1

Median line

The horizontal line inside the box → median (Q2), the middle value of the data

Whiskers (lines above and below the box)

Extend up and down from the box to the smallest and largest values that are still inside the normal range

The normal range is bounded by two fences:

Lower = Q1 - 1.5*IQR

Upper = Q3 + 1.5*IQR

Outliers (dots outside whiskers)

Values outside the normal range (below Lower or above Upper) → these are outliers

plt.boxplot() → draws the boxplot.
Q1, Q2 (median) and Q3 together split the sorted data into 4 equal parts.
df["Blood_Pressure"] → here we plot the data from the Blood Pressure column.


plt.show()

plt.show() → displays the boxplot on screen.


2️⃣ What does the boxplot tell us?
Median → the central BP value of the patient group
IQR (Box) → the BP range of the middle 50% of patients
>> IQR is Q3 - Q1, the gap between Q3 and Q1
Whiskers → typical minimum and maximum values, excluding outliers
Outliers → unusually high or low BP values that lie beyond the whiskers
💡 Example: A reading of 170 is drawn as an outlier only if it lies above the upper fence Q3 + 1.5*IQR. Being higher than most readings (say, most patients in 120-160) is not enough on its own.
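
The numbers behind the boxplot can be checked by hand. The sketch below uses the 20 Blood_Pressure readings from the dataset in these notes and pandas' default quantile method (linear interpolation); the 1.5*IQR rule matches Matplotlib's default whisker setting. Variable names are illustrative.

```python
import pandas as pd

bp = pd.Series(
    [150, 135, 160, 130, 155, 120, 162, 140, 148, 122,
     158, 130, 140, 124, 170, 118, 145, 138, 160, 119],
    name="Blood_Pressure",
)

q1 = bp.quantile(0.25)       # bottom edge of the box
median = bp.quantile(0.50)   # line inside the box
q3 = bp.quantile(0.75)       # top edge of the box
iqr = q3 - q1                # height of the box

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Outliers: values outside the fences
outliers = bp[(bp < lower_fence) | (bp > upper_fence)]

# Whiskers end at the most extreme values that are still inside the fences
inside = bp[(bp >= lower_fence) & (bp <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(q1, median, q3, iqr)
print(lower_fence, upper_fence)
print(list(outliers))
print(whisker_low, whisker_high)
```

With these 20 readings the fences work out to 87.625 and 196.625, so every value (170 included) sits inside them and the boxplot shows no outlier dots.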

3️⃣ Headline Style (Descriptive)
Boxplot of Blood Pressure using Matplotlib – Shows Quartile Distribution and Outliers
📘 What is the Seaborn library?

Seaborn is a Python data visualization library built on top of matplotlib. It offers a higher-level interface, so clean, styled statistical plots take less code than in plain matplotlib.

In short:

👉 Matplotlib = basic plotting tools

👉 Seaborn = styled, statistics-oriented graphs built on those tools


📌 Why use Seaborn?
With Seaborn we get:

Good-looking graphs with the default settings
Statistical plots (for analysis) that are easy to create
Built-in color palettes, themes and styles
More expressive visualizations with less code
Code: 
import pandas as pd
import seaborn as sns

data = {
    "Patient_ID": ["P001","P002","P003","P004","P005","P006","P007","P008","P009","P010",
                   "P011","P012","P013","P014","P015","P016","P017","P018","P019","P020"],

    "Name": ["Rahul Verma","Anita Singh","Vijay Gupta","Meena Sharma","Sachin Patil",
             "Neha Soni","Suresh Yadav","Rita Mehta","Aman Khan","Pooja Rao",
             "Rakesh Tiwari","Divya Jain","Sunil Thakur","Preeti Kaur","Sanjay Das",
             "Ayesha Ali","Rohit Mishra","Suman Lata","Harish Rana","Kiran Solanki"],

    "Age": [45,52,60,38,47,29,63,54,41,33,57,49,42,36,59,31,50,44,61,27],

    "Gender": ["M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F"],

    "Disease": ["Hypertension","Diabetes","Heart Disease","Asthma","Hypertension","Thyroid",
                "Heart Disease","Diabetes","Liver Disease","Migraine","Hypertension","Arthritis",
                "Kidney Stone","PCOD","Heart Disease","Anaemia","Liver Disease","Diabetes",
                "COPD","Thyroid"],

    "Blood_Pressure": [
        150, 135, 160, 130, 155,
        120, 162, 140, 148, 122,
        158, 130, 140, 124, 170,
        118, 145, 138, 160, 119
    ],

    "Heart_Rate": [88,90,110,95,92,78,115,93,85,80,96,88,82,84,118,75,90,92,106,74],

    "Cholesterol": [240,220,290,180,245,170,300,225,260,190,255,210,230,195,315,160,275,215,265,165],

    "Smoking": ["No","No","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","No",
                "Yes","No","Yes","No","Yes","No"],

    "Treatment": ["Medication","Insulin","Bypass Surgery","Nebulization","Medication",
                  "Medication","Angiography","Insulin","Liver Treatment","Pain Management",
                  "Medication","Pain Relief Therapy","Stone Removal","Hormone Therapy",
                  "Stent Surgery","Blood Therapy","Medication","Insulin","Respiratory Support",
                  "Medication"],

    "Cost": [3500,5500,145000,2000,4000,2500,65000,6000,35000,1800,
             4200,3200,28000,4500,98000,2200,29000,5700,32000,2300]
}

df = pd.DataFrame(data)
sns.lineplot(data=df["Blood_Pressure"])  # row index on the x-axis, Blood_Pressure on the y-axis


Difference between Pandas, Matplotlib, Seaborn
>> Pie chart: available in Matplotlib (plt.pie), no equivalent function in Seaborn
>> Countplot: available in Seaborn (sns.countplot), no single equivalent function in Matplotlib


📘 Seaborn Countplot - Notes (Smoking Example)
✔ What is a countplot?
A countplot is a categorical plot.
It shows the frequency of each category (how many times each value occurs) as a bar chart.
Use a countplot when the data is categorical, such as Male/Female, Smoking Yes/No, or Disease types.
Passing continuous values (such as Blood Pressure) directly to a countplot does not give a meaningful graph, because almost every value can be unique.
Code: 
import pandas as pd
import seaborn as sns

data = {
    "Patient_ID": ["P001","P002","P003","P004","P005","P006","P007","P008","P009","P010",
                   "P011","P012","P013","P014","P015","P016","P017","P018","P019","P020"],

    "Name": ["Rahul Verma","Anita Singh","Vijay Gupta","Meena Sharma","Sachin Patil",
             "Neha Soni","Suresh Yadav","Rita Mehta","Aman Khan","Pooja Rao",
             "Rakesh Tiwari","Divya Jain","Sunil Thakur","Preeti Kaur","Sanjay Das",
             "Ayesha Ali","Rohit Mishra","Suman Lata","Harish Rana","Kiran Solanki"],

    "Age": [45,52,60,38,47,29,63,54,41,33,57,49,42,36,59,31,50,44,61,27],

    "Gender": ["M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F"],

    "Disease": ["Hypertension","Diabetes","Heart Disease","Asthma","Hypertension","Thyroid",
                "Heart Disease","Diabetes","Liver Disease","Migraine","Hypertension","Arthritis",
                "Kidney Stone","PCOD","Heart Disease","Anaemia","Liver Disease","Diabetes",
                "COPD","Thyroid"],

    "Blood_Pressure": [
        150, 135, 160, 130, 155,
        120, 162, 140, 148, 122,
        158, 130, 140, 124, 170,
        118, 145, 138, 160, 119
    ],

    "Heart_Rate": [88,90,110,95,92,78,115,93,85,80,96,88,82,84,118,75,90,92,106,74],

    "Cholesterol": [240,220,290,180,245,170,300,225,260,190,255,210,230,195,315,160,275,215,265,165],

    "Smoking": ["No","No","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","No",
                "Yes","No","Yes","No","Yes","No"],

    "Treatment": ["Medication","Insulin","Bypass Surgery","Nebulization","Medication",
                  "Medication","Angiography","Insulin","Liver Treatment","Pain Management",
                  "Medication","Pain Relief Therapy","Stone Removal","Hormone Therapy",
                  "Stent Surgery","Blood Therapy","Medication","Insulin","Respiratory Support",
                  "Medication"],

    "Cost": [3500,5500,145000,2000,4000,2500,65000,6000,35000,1800,
             4200,3200,28000,4500,98000,2200,29000,5700,32000,2300]
}

df = pd.DataFrame(data)
sns.countplot(data=df, x="Smoking")  # draws the countplot: one bar per Smoking category

📘 When to use a countplot? (Important for notes)
Gender (Male vs Female count)
Smoking (Yes/No)
Disease types
Treatment types
Any other Yes/No or category-based column
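
What a countplot draws can be checked with plain pandas: the bar heights are the same numbers value_counts() returns. The sketch below uses the Smoking and Blood_Pressure columns from the dataset in these notes; smoking_counts is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Smoking": ["No", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No",
                "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "Blood_Pressure": [150, 135, 160, 130, 155, 120, 162, 140, 148, 122,
                       158, 130, 140, 124, 170, 118, 145, 138, 160, 119],
})

# One bar per category; bar height = number of rows in that category
smoking_counts = df["Smoking"].value_counts()
print(smoking_counts)

# Why a continuous column is a poor fit: almost every value gets its own bar
print(df["Smoking"].nunique())          # number of distinct categories
print(df["Blood_Pressure"].nunique())   # number of distinct readings
```

Smoking has only two categories, so the countplot has two readable bars; Blood_Pressure would produce a separate bar for nearly every reading.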
📘 Boxplot vs Histplot (Difference)
✅ 1. Purpose (what does it show?)
📦 Boxplot
Shows a summary of the data:

✔ Minimum

✔ Q1 (25%)

✔ Median (50%)

✔ Q3 (75%)

✔ Maximum

✔ Outliers
The whole spread can be read from a single chart.
📊 Histplot (Histogram)
Shows the distribution of the data: the value range is split into bins, and each bar shows how many values fall in that bin.
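
The difference can be seen in numbers before drawing anything. This sketch computes the five-number summary a boxplot is built from and the bin counts a histogram is built from, using the Age column of the dataset in these notes. The choice of 4 bins is arbitrary and only for illustration.

```python
import numpy as np
import pandas as pd

age = pd.Series(
    [45, 52, 60, 38, 47, 29, 63, 54, 41, 33,
     57, 49, 42, 36, 59, 31, 50, 44, 61, 27],
    name="Age",
)

# Boxplot view: minimum, Q1, median, Q3, maximum
summary = age.quantile([0, 0.25, 0.5, 0.75, 1])
print(summary)

# Histogram view: split the range into bins and count the values in each bin
counts, edges = np.histogram(age, bins=4)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} patients")
```

The boxplot compresses the column into five numbers; the histogram keeps the shape of the distribution but depends on the number of bins chosen.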

📘 1. Line Plot
✔ What is it?

A line plot is used to show a trend.

It is used most often with time-series data.

✔ When to use it?

Sales over time

Temperature over days

Heart rate trend

Stock market movement

✔ Example:

plt.plot(df["Blood_Pressure"])

📘 2. Histplot / Histogram
✔ What is it?

Shows the distribution of numeric data (frequency per range).

It tells how the data is spread across ranges.

✔ When to use it?

Cholesterol distribution

Age distribution

Blood pressure distribution

✔ Example:

plt.hist(df["Age"])

📘 3. Count Plot
✔ What is it?

Shows the count/frequency of each category.

It is a Seaborn function: sns.countplot()

✔ When to use it?

Male vs Female count

Smoker vs Non-smoker

Disease-wise patient count

✔ Example:

sns.countplot(data=df, x="Disease")

📘 4. Scatter Plot
✔ What is it?

Shows the relationship / correlation between two numeric variables.

✔ When to use it?

Age vs Cholesterol

Blood Pressure vs Heart Rate

Cost vs Age

✔ Example:

plt.scatter(df["Age"], df["Cholesterol"])

🟦 What other types of plots are there?

The 15 most important main types are listed here (the numbering continues from the four above):

🔶 5. Boxplot

Shows outliers + quartiles.

🔶 6. Violin Plot

A mix of a boxplot and a distribution curve.

🔶 7. Bar Plot

Compares categories by their values.

🔶 8. Pie Chart

Shows percentage share.

🔶 9. Heatmap

Shows relationship/correlation as colors.
Correlation values range from -1 to +1.

Example: sns.heatmap(df.corr(numeric_only=True))  # numeric_only=True skips text columns such as Name

code: 
import pandas as pd
import seaborn as sns

data = {
    "Patient_ID": ["P001","P002","P003","P004","P005","P006","P007","P008","P009","P010",
                   "P011","P012","P013","P014","P015","P016","P017","P018","P019","P020"],

    "Name": ["Rahul Verma","Anita Singh","Vijay Gupta","Meena Sharma","Sachin Patil",
             "Neha Soni","Suresh Yadav","Rita Mehta","Aman Khan","Pooja Rao",
             "Rakesh Tiwari","Divya Jain","Sunil Thakur","Preeti Kaur","Sanjay Das",
             "Ayesha Ali","Rohit Mishra","Suman Lata","Harish Rana","Kiran Solanki"],

    "Age": [45,52,60,38,47,29,63,54,41,33,57,49,42,36,59,31,50,44,61,27],

    "Gender": ["M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F"],

    "Disease": ["Hypertension","Diabetes","Heart Disease","Asthma","Hypertension","Thyroid",
                "Heart Disease","Diabetes","Liver Disease","Migraine","Hypertension","Arthritis",
                "Kidney Stone","PCOD","Heart Disease","Anaemia","Liver Disease","Diabetes",
                "COPD","Thyroid"],

    "Blood_Pressure": [
        150, 135, 160, 130, 155,
        120, 162, 140, 148, 122,
        158, 130, 140, 124, 170,
        118, 145, 138, 160, 119
    ],

    "Heart_Rate": [88,90,110,95,92,78,115,93,85,80,96,88,82,84,118,75,90,92,106,74],

    "Cholesterol": [240,220,290,180,245,170,300,225,260,190,255,210,230,195,315,160,275,215,265,165],

    "Smoking": ["No","No","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","No",
                "Yes","No","Yes","No","Yes","No"],

    "Treatment": ["Medication","Insulin","Bypass Surgery","Nebulization","Medication",
                  "Medication","Angiography","Insulin","Liver Treatment","Pain Management",
                  "Medication","Pain Relief Therapy","Stone Removal","Hormone Therapy",
                  "Stent Surgery","Blood Therapy","Medication","Insulin","Respiratory Support",
                  "Medication"],

    "Cost": [3500,5500,145000,2000,4000,2500,65000,6000,35000,1800,
             4200,3200,28000,4500,98000,2200,29000,5700,32000,2300]
}

df = pd.DataFrame(data)
sns.heatmap(df[["Age", "Cholesterol"]].corr(), cmap="coolwarm")

📘 Heatmap - Complete Notes (Main Options Explained)
A Seaborn heatmap is a visualization that shows numbers through color intensity.
Basic syntax:
sns.heatmap(data)

It also takes several important parameters.

⭐ 1. annot=True
Writes the value inside each cell of the heatmap.
sns.heatmap(df.corr(numeric_only=True), annot=True)

✔ Useful for: reading exact values easily

⭐ 2. fmt = "d" / ".2f"
Sets the number format of the annotations.
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f")

✔ .2f → 2 decimal places

✔ d → integer (only for integer data, so not for a correlation matrix)

⭐ 3. cmap (Color Map)
Defines the colors of the heatmap.
Code: sns.heatmap(df[["Age", "Cholesterol"]].corr(), cmap="coolwarm")

Common options:
coolwarm
viridis
plasma
magma
Greens, Blues
YlGnBu (Yellow-Green-Blue)
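
The three options can be combined in one call. This sketch assumes seaborn is installed and pandas is version 1.5 or newer (for numeric_only); it uses the numeric columns of the dataset in these notes plus Gender, to show that text columns are skipped. The vmin/vmax arguments are an addition not covered in the notes above.

```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Gender": ["M", "F"] * 10,  # text column, left out of the correlation
    "Age": [45, 52, 60, 38, 47, 29, 63, 54, 41, 33,
            57, 49, 42, 36, 59, 31, 50, 44, 61, 27],
    "Blood_Pressure": [150, 135, 160, 130, 155, 120, 162, 140, 148, 122,
                       158, 130, 140, 124, 170, 118, 145, 138, 160, 119],
    "Heart_Rate": [88, 90, 110, 95, 92, 78, 115, 93, 85, 80,
                   96, 88, 82, 84, 118, 75, 90, 92, 106, 74],
    "Cholesterol": [240, 220, 290, 180, 245, 170, 300, 225, 260, 190,
                    255, 210, 230, 195, 315, 160, 275, 215, 265, 165],
})

# Correlation matrix of the numeric columns only
corr = df.corr(numeric_only=True)

ax = sns.heatmap(
    corr,
    annot=True,        # write the value in each cell
    fmt=".2f",         # two decimal places
    cmap="coolwarm",   # blue for low values, red for high values
    vmin=-1, vmax=1,   # fix the color scale to the full correlation range
)
```

Fixing vmin and vmax keeps the colors comparable between heatmaps: without them, the scale stretches to whatever range the data happens to cover.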



🔶 10. Pairplot

Scatter plots + distributions of all numeric columns in one figure.
Used to look at every pair of columns together;
it works on numeric columns.

👇

📘 Pairplot Using Seaborn - Relationship Between Age & Cholesterol
These notes explain what a pairplot is, how it works, and what the code does.
✅ 1. Importing the Libraries
import pandas as pd
import seaborn as sns
✔ What this means:

pandas (pd) → keeps the data in a table (DataFrame) and manipulates it.

seaborn (sns) → builds advanced data visualizations (charts/plots).

✅ 2. Building the Data Dictionary
data = {
    "Patient_ID": [...],
    "Name": [...],
    "Age": [...],
    ...
}

✔ What this means:

You have built a dictionary structure.

Each key (such as "Age", "Gender") represents a column.

Each value is a list holding the rows of that column.

✅ 3. Converting the Dictionary to a DataFrame
df = pd.DataFrame(data)

✅ 4. Drawing the Pairplot
sns.pairplot(df[["Age", "Cholesterol"]])

📌 What is a pairplot?
A pairplot is a chart that shows, in a single figure and for all selected numeric columns:

👉 scatter plots

👉 histograms

✔ What does it show?
The code selects only 2 columns:
Age
Cholesterol
So the pairplot is a 2x2 grid:
🔹 (1) a histogram of Age (on the diagonal)
🔹 (2) a histogram of Cholesterol (on the diagonal)
🔹 (3) scatter plots of Age vs Cholesterol (the two off-diagonal panels, mirror images of each other)
This makes the relationship visible: whether cholesterol increases with age or not.

📌 Summary (in 1 line)
A pairplot is a combined chart that shows histograms + scatter plots together to help understand the relationships between variables.


🔶 11. KDE Plot

Smooth distribution curve.

🔶 12. Joint Plot

1 scatter plot + 2 histograms combination.

🔶 13. Barh (Horizontal Bar Chart)

Best for long category names.

🔶 14. Area Plot

Like a line plot, but the area under the line is filled.

🔶 15. Swarm Plot

Individual points + category distribution.

Style	Description	Example
`"-"`	Solid line	`linestyle='-'`
`"--"`	Dashed line	`linestyle='--'`
`"-."`	Dash-dot line	`linestyle='-.'`
`":"`	Dotted line	`linestyle=':'`
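
The four line styles from the table can be compared side by side. A minimal sketch assuming matplotlib is installed; the data points and the vertical offset are made up purely so that the lines do not overlap.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
styles = ["-", "--", "-.", ":"]   # solid, dashed, dash-dot, dotted

fig, ax = plt.subplots(figsize=(5, 3))
for offset, style in enumerate(styles):
    # shift each line up by `offset` so the four styles are drawn apart
    y = [value + offset for value in x]
    ax.plot(x, y, linestyle=style, label=f"linestyle='{style}'")
ax.legend()
```

The same linestyle argument also works with plt.plot(), for example plt.plot(df["Blood_Pressure"], linestyle="--").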

Line	What it does
`import pandas as pd`	Imports the pandas library
`data = {...}`	Stores the data in a dictionary
`pd.DataFrame(data)`	Converts the dictionary into a table (DataFrame)
`df`	The final DataFrame object, holding the data in rows and columns

Employee_ID	Name	Department	Age	Experience_Years	Monthly_Salary
101	Divyanshu	Sales	25	2	40000
102	Anshika	HR	27	3	45000
103	Neha	IT	28	4	50000
104	Junaid	Finance	30	5	55000

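Symbol / Function	Meaning
`isin()`	Checks, for each value in the column, whether it is present in the given list
`~`	Reverses the condition (True becomes False, False becomes True)
`df[...]`	Filters the DataFrame, keeping only rows where the condition is True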
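
The three pieces from the table work together as follows. This sketch rebuilds the four-row employee table shown above and filters it on the Department column; the names mask, selected and excluded are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee_ID": [101, 102, 103, 104],
    "Name": ["Divyanshu", "Anshika", "Neha", "Junaid"],
    "Department": ["Sales", "HR", "IT", "Finance"],
    "Age": [25, 27, 28, 30],
    "Experience_Years": [2, 3, 4, 5],
    "Monthly_Salary": [40000, 45000, 50000, 55000],
})

# isin() gives one True/False per row
mask = df["Department"].isin(["Sales", "IT"])

selected = df[mask]    # keep rows where the department is Sales or IT
excluded = df[~mask]   # ~ flips the mask: keep every other row

print(selected["Name"].tolist())
print(excluded["Name"].tolist())
```

Storing the mask in a variable first makes it easy to reuse the same condition for both the included and the excluded rows.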

