
Python Final Lectures

 Q- How to print Hello World

print("Hello World")


Variables in Python -------

age = 30   # variable names should be intuitive so the code is easy to understand later

print(age)


Note: 1) Shift+Enter is the shortcut to run a cell

2) '#' starts a comment in Python

Rules for Variables---

  • A variable name cannot start with a number, e.g. 1age is invalid
  • Numbers can be used in the middle or at the end of a name, e.g. age1, age2
  • Special characters are not allowed, except _ (underscore), e.g. age_my
  • Spaces are not allowed in a variable name
  • Python is case-sensitive (age and Age are different variables)
Ways to define variables ---
age1, age2 = 30, 25   # assign several variables in one line
age1 = 30
age2 = 25
age1 = age2 = 30   # give both variables the same value


>> Data type
A data type describes what kind of value a variable holds.
  • Integer = a whole number; age1 and age2 above are integers
    type(age1)  # returns int
  • Float = a number with a decimal point
    Interest = 30.24
    type(Interest)  # returns float
  • String = a sequence of characters; anything written inside quotes is a string
    Message = "My Name Is Divyanshu"
    type(Message)  # returns str. Single quotes 'I can use this' and double quotes "also use this" both work; for a multiline string use '''triple quotes'''
  • Boolean = has only two values, True and False
    Example:
    data = False  # data is the variable, False is the value, and its type is bool
    type(data)  # returns bool

>> Mathematical Operators
  • Addition = +
    num1 = 20
    num2 = 37
    result = num1 + num2
    print(result)  # or directly: print(num1 + num2), without storing the result in a new variable
  • Subtraction = -
    num1 = 50
    num2 = 20
    result = num1 - num2
    print(result)  # or directly: print(num1 - num2)
  • Multiplication = *
    num1 = 20
    num2 = 39
    result = num1 * num2
    print(result)  # or directly: print(num1 * num2)
  • Integer Division = //   # keeps only the whole-number part
    num1 = 20
    num2 = 3
    result = num1 // num2
    print(result)  # or directly: print(num1 // num2)
    # answer = 6, because // drops the fractional part
  • Float Division = /   # always returns a float
    num1 = 20
    num2 = 3
    result = num1 / num2
    print(result)  # or directly: print(num1 / num2)
    # answer = 6.666666666666667, because / is float division
  • Power = **
    num1 = 2
    num2 = 5
    result = num1 ** num2
    print(result)  # answer = 32
  • Modulus = %   # gives the remainder
    num1 = 20
    num2 = 3
    result = num1 % num2
    print(result)
    # answer = 2, the remainder of 20 divided by 3
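
The link between integer division and modulus can be checked in a few lines. This is a small sketch using the same numbers as above; divmod() is a standard built-in that returns both results at once.

```python
num1 = 20
num2 = 3

quotient = num1 // num2     # whole-number part of the division
remainder = num1 % num2     # what is left over

# divmod() returns (quotient, remainder) in a single call
both = divmod(num1, num2)

print(quotient, remainder, both)

# quotient * divisor + remainder always gives back the original number
print(quotient * num2 + remainder == num1)
```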

>> How to take input from the user

age1 = input()  # input() always returns a string

  • To get a number, typecast the input:
  • age1 = int(input())


>> Built-in Functions in Python ----
  1.    len(string) = returns the number of characters in a string

       string = "Divyanshu Khare"  # the space also counts as a character
       len(string)  # returns 15
  2.    ls = a collection of items (list)

       Defined with [ ] square brackets
       Example: list_name = [item1, item2, item3, ...]
       type(ls)  # returns list
  3.    max = returns the largest number in a list

       ls = [1,2,4,5,6]
       print(max(ls))  # 6
  4.    min = returns the smallest number in a list
       ls = [1,2,3,4,5,6,7]
       min(ls)  # 1
  5.    sum = returns the sum of the numbers
       ls = [1,2,3,4,6,7,8]
       sum(ls)  # 31
  6.    len(ls) = returns the length of a list
       len(ls)
  7.    max(string) = returns the character with the highest ASCII value
       string = "Divyanshu"
       max(string)  # 'y'
  8.    min(string) = returns the character with the lowest ASCII value
       string = "Divyanshu"
       min(string)  # 'D'
    Note: ASCII (American Standard Code for Information Interchange) assigns a number to each character
  9.    sorted(ls) = returns the list sorted in ascending order
  10.    sorted(ls, reverse=True) = returns the list sorted in descending order
  11.    round() = rounds a number

       round(number, number of decimal places)
       example: round(12356.54645, 2)
                       12356.55  # this is the result
  12. abs() = returns the absolute (non-negative) value of a number
    abs(-23538)  # 23538
  13. f = formatted string (f-string)
    example:
     name = "divyanshu"
     age = 30
     profession = "Data Science"
     introduction = f"{name} is a {age} year old professional working in {profession}"
     print(introduction)
                       
       
        
    

>> Conditional Statement -------
  • if else
    cibil_score = int(input("Enter Cibil Score:"))
    if cibil_score > 600:
        print("You are eligible for a loan")
    else:
        print("You are not eligible")



  • elif = used when there are more than 2 conditions
    color = input("Enter color - Red, Green, Yellow: ")
    if color == "Red":
        print("Stop")
    elif color == "Yellow":
        print("Wait")
    else:
        print("Go")
  • loops (control structure)  -- repetition of a task
    example: loop over each character of a string
    string = "Data Science"
    for i in string:
        print(i)
    example: print the numbers 0 to 100
    for i in range(0, 101):
        print(i)
    example with a step of 2: prints the even numbers 0, 2, ..., 98 (to print squares instead, use print(i ** 2))
    for i in range(0, 100, 2):
        print(i)
    example: loop over a list
    ls = [1,2,3,4,5,6,7,8]
    for i in ls:
        print(i)
  • while = keeps running as long as the condition is True
    i = 1
    while i < 10:
        print("Divyanshu khare")
        i = i +1

  • Control the loop
    >> Break: stops the loop completely as soon as the requirement is met
    ls = [1,2,3,4,5,6,7,8,9,10]
    for i in ls:
        if i == 6:
            print("Yes")
            break
    >> Continue: skips the rest of the current iteration only and jumps to the next one
    ls = [1,2,3,4,5,6,7,8,9,10]
    for i in ls:
        if i == 6:
            print("Yes")
            continue
    >> Pass: does nothing; it is a placeholder where a statement is required
    ls = [1,2,3,4,5,6,7,8]
    for i in ls:
        if i > 0:
            pass  # do nothing
        else:
            print("Negative Number")
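
The difference between break and continue is easiest to see by recording which values each loop actually processes. A minimal sketch; the two result lists are names introduced only for this illustration.

```python
ls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

seen_with_break = []
for i in ls:
    if i == 6:
        break                  # leave the loop completely
    seen_with_break.append(i)

seen_with_continue = []
for i in ls:
    if i == 6:
        continue               # skip only this value, then keep looping
    seen_with_continue.append(i)

print(seen_with_break)      # values before 6 only
print(seen_with_continue)   # every value except 6
```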




    >>> DATA STRUCTURES IN PYTHON [this is not about algorithms]


>> We have 4 data structures
  • List
  • Tuple
  • Set
  • Dictionary
>> Important Data Type
  • String operations
    • Indexing = fetching a single character from the string by its position
      • Example:
        string = "String"
        string[2]  # 'r' (indexes start at 0)
    • Slicing = fetching a sequence of characters (a substring) from the given string
      • Example:
        string = "Data Science"  # say I want to fetch "Data" from this string
        string[start index:end index + 1]  # the end index is not included, hence the + 1
        string[0:3 + 1]
      • Example 2: slicing with negative indexes, which count from the right
        string = "Data Science"
        string[-11:-9]  # 'at'


      • Example 3: 👉 slicing while skipping every other character
        string = "Data Science"
        string[5:12:2]  # 'Sine'



      • Example 4: reversing the string without writing a start index
        string = "Data Science"
        string[::-1]  # with a step of -1 and no start index, slicing starts from the last character
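
The four slicing examples can be run together to compare their results. A short sketch; the variable names on the left are chosen only for this illustration.

```python
string = "Data Science"

first_word = string[0:3 + 1]      # indexes 0 to 3; the end index 4 is excluded
from_right = string[-11:-9]       # negative indexes count from the right
every_second = string[5:12:2]     # indexes 5, 7, 9, 11
reversed_string = string[::-1]    # step of -1 walks from the last character

print(first_word)
print(from_right)
print(every_second)
print(reversed_string)
```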
    • Built-in functions for strings
      string = "Data"
      type(string)

  • len(string) = length of the string
  • convert a string to lowercase =
    • string = "DIVYANSHU"
      string.lower()
  • convert a string to uppercase =
    • string  = "divyanshu"
      string.upper()
  • capitalize a string
    • string = "divyanshu"
      new_string = string.capitalize()
      print(new_string)
    • # capitalize() is a built-in Python method that converts the first letter of the string to uppercase (and the rest to lowercase). The modified string it returns is stored in new_string.
  • string.islower()  -- returns True only if all letters in the string are lowercase
    • lstring = "DivyanshuJi"   # lstring is the variable name
      lstring.islower()  # False
  • string.isupper() --- returns True only if all letters in the string are uppercase
    • ustring = "this is small"
      ustring.isupper()  # False
  • string.isdigit() --- checks whether the string contains only digits
    • numstring = "435345"  # numstring is the variable name of the string
      numstring.isdigit()  # True
  • string.swapcase()  -- swaps the case of every letter in the string
    • string = "this is small case"
      string.swapcase()
  • string.replace("text to find", "replacement text")
    • string = "Data science"
      string.replace("a", "d")  # replaces every 'a' in "Data science" with 'd'

  • string.split()  == splits the string at the given separator (by default it splits on whitespace)
    • string = "Divyanshu@khare"
      string.split("@")
>>>>>>>>>>>LIST  - a collection of items
  • ls = [1,2,34,5,6,7,"mango"]
    type(ls)  # check the data type
  • len(ls)  = # length of the list
  • ls[3]  = # returns the item at index 3
  • ls[::-1] = # reversed list
  • list concatenation =
    ls1 = ["apple", "mango"]
    ls2 = ["another"]
    ls3 = ls1+ls2
    print(ls3)
  • ls.append("apple") = # adds a single element to the end of the list
  • ls.extend(["grapes", "gyan"]) = # adds multiple elements to the end of the list
  • ls.insert(index number,"element")  = # adds the element at a specific index
  • ls.index("grapes") = # returns the index of the first occurrence of the element
  • ls.remove("element name") = # removes only the first occurrence of the element; it cannot remove all occurrences at once
  • ls.sort() = # sorts the list in ascending order; it changes the existing list
  • ls.sort(reverse=True) = # sorts the list in descending order; it changes the existing list
  • sorted(ls)  # returns a new sorted list; it does not change the existing list
  • ls.pop(index number) = # removes and returns the element at that index
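
Two pairs in the list above are easy to mix up: sort() versus sorted(), and append() versus extend(). The sketch below shows the difference; the list contents are illustrative values, not data from the lecture.

```python
ls = [5, 2, 9, 1]

new_list = sorted(ls)     # returns a new sorted list; ls itself is unchanged
unchanged = list(ls)      # copy of ls taken before the in-place sort
result = ls.sort()        # sorts ls in place and returns None

fruits = ["apple"]
fruits.append("mango")                # adds one element
fruits.extend(["grapes", "banana"])   # adds each element of the given list

print(new_list, unchanged, ls, result)
print(fruits)
```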

>>TUPLES -------------------
  • Once a tuple is created, its elements cannot be changed
  • A tuple is immutable
  • A tuple is a data structure like a list, except that it is immutable
  • Use a tuple to store information that no program should change, intentionally or unintentionally
  • A tuple is defined with ( ) parentheses
    example: tup = (3,4,6,7,8544,76,34,5)
             type(tup)
  • max(tup)  = # returns the maximum value of the tuple
  • min(tup)  = # returns the minimum value of the tuple
  • sum(tup) = # returns the sum of the tuple
  • sorted(tup)  # returns a sorted list
  • list >> tuple >> list == typecasting a list to a tuple and a tuple to a list
    tup1 = tuple(ls)
    tup1
  • tup.sort()  # not available: tuples have no sort() method (AttributeError); use sorted(tup) instead
  • del   = # removes an element from a list by index
    ls = [2,3,5,6,7,4]
    del ls[3]  # does not return anything
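
Immutability can be seen directly by trying to assign to a tuple element. A minimal sketch that also shows the list >> tuple round trip as the usual workaround when a changed copy is needed.

```python
tup = (3, 4, 6, 7)

try:
    tup[0] = 100              # tuples do not allow item assignment
except TypeError as e:
    message = str(e)
    print("Cannot change a tuple:", message)

# Workaround: convert to a list, change it, convert back to a new tuple
temp_list = list(tup)
temp_list[0] = 100
tup2 = tuple(temp_list)

print(tup)    # the original tuple is untouched
print(tup2)
```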



















4th lecture

File handling

f = open("data.txt","r")  # open() is a built-in Python function that opens a file. "r" means read-only, so the file must already exist (nothing is created in "r" mode). A bare file name works when the file is in the same folder. Here I am reading data.txt

# now I want to access the content of the file >>>
                        >> f.read()  # returns the content of the file
  With this approach the file stays open until f.close() is called

>> another way to read
with open("data.txt") as f:  # the file is closed automatically when the block ends
    content = f.read()
    print(content)
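
To try the with-block without needing an existing data.txt, the sketch below first writes a small file into a temporary folder and then reads it back. The folder, the file content, and the variable names are assumptions made for this illustration.

```python
import os
import tempfile

# Create a throwaway folder so the example does not depend on an existing file
path = os.path.join(tempfile.mkdtemp(), "data.txt")

with open(path, "w") as f:       # "w" creates the file (or overwrites it)
    f.write("hello from data.txt\n")

with open(path, "r") as f:       # "r" only reads; the file must exist
    content = f.read()

closed_after_with = f.closed     # True: the with-block closed the file

print(content)
print(closed_after_with)
```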







>> read a file from Google Drive
# URL of the Google Sheet (public link to access the sheet data)
url = 'https://docs.google.com/spreadsheets/d/1CHNr3sioM1p6OvVx4tjNc0sMWGWp76sIBYpQdJ9H40U/edit?usp=drive_link'

# 'requests' is a Python library used to fetch data from the internet (HTTP/HTTPS)
import requests   # Importing the requests module to send HTTP requests

# requests.get(url) sends a GET request to the server and returns the page data in the response
response = requests.get(url)   # Sending a GET request to the given URL and storing the response in 'response'

# response.text holds the data received from the server (here in HTML format)
print(response.text)   # Printing the content/text returned by the server

response.status_code   # Checks whether the link is accessible or has a permission issue; 200 means we have access

>> handle Excel files

from openpyxl import Workbook   # import the Workbook class from the openpyxl library (used to create Excel files)

wb = Workbook()                 # creates a new blank Excel workbook (in memory for now)

ws_new = wb.active              # gets the workbook's default active sheet and stores it in ws_new

ws_new.title = 'DataStudent'    # renames the sheet from 'Sheet' to 'DataStudent'

ws_new.append(["Student name", "Grades"])   # writes the column headings in the first row
# NOTE: .append() adds a new row to the Excel sheet
ws_new.append(["Baba Hunny", 70])           # second row: the first student's data
ws_new.append(["Anshika", 50])              # third row
ws_new.append(["Janvi", 50])                # fourth row

wb.save("student_data.xlsx")                # saves the workbook to disk as 'student_data.xlsx'

print("✅ Excel file 'student_data.xlsx' created successfully with sheet 'DataStudent'")



>> how to read the saved file
from openpyxl import load_workbook   # import the openpyxl function used to read an Excel file

wb = load_workbook("student_data.xlsx")   # opens the already existing Excel file

ws_new = wb["DataStudent"]     # accesses the sheet named 'DataStudent'
for row in ws_new.iter_rows(min_row=1, values_only=True):  # reads the sheet row by row; min_row=1 starts from the first row
    print(row)

>>  CSV handling (comma-separated values)
import csv
with open("titanic.csv", mode="r") as f:
    reader = csv.reader(f)
    headers = next(reader)  # reads the first row (the header)
    print(headers)
    for i, row in enumerate(reader):  # i is the index
        print(i, row)

  • import csv → Python's built-in module for reading CSV files.

  • with open("titanic.csv", mode="r") as f: → opens the titanic.csv file in read mode.

  • csv.reader(f) → reads each line of the file as a list.

  • next(reader) → takes out the first row (the header) so that it does not appear again in the loop.

  • enumerate(reader) → gives each row together with its index (i).

  • print(i, row) → prints each record with its sequence number (index).
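
The same steps can be tried without a file on disk by reading from an in-memory text buffer. In this sketch io.StringIO stands in for the opened file, and the two columns and rows are invented sample data, not the contents of titanic.csv.

```python
import csv
import io

# In-memory stand-in for an opened CSV file (sample data is made up)
sample = io.StringIO("name,age\nAsha,30\nRavi,25\n")

reader = csv.reader(sample)
headers = next(reader)            # first row = header, removed from the loop

rows = []
for i, row in enumerate(reader):  # i starts at 0 for the first data row
    print(i, row)
    rows.append((i, row))

print(headers)
```

Note that csv.reader returns every value as a string, so "30" would still need int() before doing arithmetic.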



  • >> Exception handling
    Try-Except Block -

    try:

        result = 10/0
        print("This line will not be executed because of the error")
    except ZeroDivisionError:
        print("You can't divide by zero")


  • >> Can we have more than one except block for error handling?
    Ans: yes
    Example:
    try:
        user_input1 = input("Enter Number")
        int_input1 = int(user_input1)
        user_input2 = input("Enter Number")
        int_input2 = int(user_input2)
        print(f"ratio of given numbers {int_input1/int_input2}")
    except ZeroDivisionError:   # any other exception class can be handled here in the same way
        print("You can't divide by zero")
    except ValueError:
        print("Enter a valid number")

    >> # Try-Except-else Block
    try:
        user_input1 = input("Enter Number")
        int_input1 = int(user_input1)
        user_input2 = input("Enter Number")
        int_input2 = int(user_input2)
        num = int_input1/int_input2
    except ZeroDivisionError:   # any other exception class can be handled here in the same way
        print("You can't divide by zero")
    except ValueError:
        print("Enter a valid number")
    else:  # runs only if no exception was raised
        print(f"the ratio of the 2 numbers is {num}")

    >> # Try-Except-else-finally block === finally runs in every scenario, whatever happens above
    try:
        user_input1 = input("Enter Number")
        int_input1 = int(user_input1)
        user_input2 = input("Enter Number")
        int_input2 = int(user_input2)
        num = int_input1/int_input2
    except ZeroDivisionError:   # any other exception class can be handled here in the same way
        print("You can't divide by zero")
    except ValueError:
        print("Enter a valid number")
    else:  # runs only if no exception was raised
        print(f"the ratio of the 2 numbers is {num}")
    finally:  # runs in every scenario, whether or not an exception occurred
        print("I will run")

    >> Write a try-except block to handle FileNotFoundError
    try:
        with open("AI.txt", "r") as f:
            content = f.read()
            print(content)
    except FileNotFoundError:
        print("File not available at this path")


    >>RAISE  == used to raise a custom exception: one the system would not raise by itself, but that I want to raise according to my own requirement

    def withdraw(balance, amount):
        if balance - amount < 1000:
            raise Exception("Withdrawal denied: Minimum balance of 1000 INR to be maintained")
        remaining = balance - amount
        return remaining
    try:
        remaining_amount = withdraw(balance=1000, amount=2000)
        print(f"After transaction remaining balance is {remaining_amount}")
    except Exception as e:
        print(f"Transaction failed: {e}")
        

      What return means

      • In Python, return sends a value out of a function

      • That is, the function calculates a result and hands it back to the main program

      • def = the keyword that defines a function

      • withdraw = the name of the function; you can choose your own

      • (balance, amount) = the parameters

      • return: the function sends this value out → in the try block it is stored in the remaining_amount variable

      • try block
            >>withdraw(balance=5000, amount=2000)

        • This is a function call.

        • It means we are executing the withdraw function with:

        • 5000 : balance

        • 2000 : amount

      • remaining_amount
        • the function's return value is stored in the remaining_amount variable
        • i.e. the money left after the withdrawal is now in this variable
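
A common refinement of the raise example is to define a dedicated exception class, so the except block catches only this specific failure. The class name InsufficientBalanceError and the minimum parameter are hypothetical names introduced for this sketch; they are not part of the lecture code.

```python
class InsufficientBalanceError(Exception):
    """Hypothetical custom exception for a withdrawal that breaks the minimum balance."""


def withdraw(balance, amount, minimum=1000):
    remaining = balance - amount
    if remaining < minimum:
        raise InsufficientBalanceError(
            f"Withdrawal denied: minimum balance of {minimum} INR must be maintained"
        )
    return remaining              # value handed back to the caller


try:
    ok = withdraw(balance=5000, amount=2000)   # allowed: 3000 is left
    print(f"Remaining balance: {ok}")
    withdraw(balance=1000, amount=2000)        # raises the custom exception
except InsufficientBalanceError as e:
    failure = str(e)
    print(f"Transaction failed: {failure}")
```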
    >>Built-in Modules 
    #math
    #random
    #datetime
    #os
    #sys

    >> import math  # say I want a square root
    math.sqrt(34)   # returns the square root of 34

    >>import random  # module with functions to generate random data
    random.random()  # generates a random decimal number in the range 0 to 1 (1 not included)

    random.randint(1,100)  # random integer from 1 to 100, both included

    >>from datetime import datetime  # gives the date and time
    datetime.now()

    >>import os  # for working with files and folders
    os.getcwd()  # returns the current working directory






    numpy = Numerical Python; import it to work with numeric data structures


    import numpy as np  = # np is a short name (alias); any name can be used in place of np. With the alias we can write np instead of typing numpy in full every time

    The data structure NumPy is based on >>>>>>>>>>>

    >>Array ---------
    import numpy as np
    arr = np.array([2,3,4,5,6])  # takes a list as input; arr is the variable name
    type(arr)
    arr.ndim  # returns the number of dimensions; another way to check is to count the opening brackets at the start of the printed array
    >> 2 / Multi Dimension Array ------
    arr2 = np.array([[1,2,3], [5,46,67]])
    arr2.ndim  # arr2 is the array's name and ndim returns its number of dimensions

     when printed, this array shows 2 opening brackets, so it has 2 dimensions



    >> arr2.shape  == # returns the number of rows and columns of the array
    arr2 = np.array([[1,2,3], [5,46,67]])
    arr2.ndim
    arr2.shape

    >> arr2.size ==== returns the total number of elements in the array (rows × columns)
    arr2 = np.array([[1,2,3,5], [5,46,67,5]])
    arr2.ndim
    arr2.shape
    arr2.size #no of elements
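
The three attributes are related: for a 2-D array, size is the number of rows times the number of columns. A quick check using the same array as above (assuming NumPy is installed):

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 5], [5, 46, 67, 5]])

dims = arr2.ndim            # number of dimensions
rows, cols = arr2.shape     # shape is a tuple: (rows, columns)
total = arr2.size           # total number of elements

print(dims, rows, cols, total)
print(total == rows * cols)
```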

    >> arr2.dtype  # returns the data type of the array's elements
    >>zeros_array = np.zeros((row_number, column_number))
        import numpy as np
        zeros_array = np.zeros((10,4))  # returns an array of float 0s with the given rows and columns
        print(zeros_array)

    >>ones_array = np.ones((row_number, column_number))
        import numpy as np
        ones_array = np.ones((6,4))  # returns an array of float 1s with the given rows and columns
        print(ones_array)
    >>full_array = np.full((row_number, column_number), fill_value = value_number)
        here full_array is the variable name
        np.full = the function; it creates an array of the given shape, e.g. 3 rows and 4 columns
        fill_value = this parameter sets the value that every element of the array is filled with
    Example: 
    full_array = np.full((6,4), fill_value = 23)
    print(full_array)


    >>np.random.rand(number_of_values)
    example:
    r_array=np.random.rand(5)

    np.random.rand = a function of NumPy's random module.
    It generates random numbers between 0 and 1.
    (number_of_values) tells it how many random numbers to generate.

    >> np.round(r_array, number_of_decimals)
    example:
    r_array=np.random.rand(5)
    np.round(r_array,2)

    >>np.arange(start,end)
    example:
    arr = np.arange(1,11)
    print(arr)

    # this NumPy function generates the numbers from 1 to 10, because the end value 11 is not included
     

    >> Indexing for a single dimension array
    import numpy as np
    arr = np.array([1,3,4,8,5])
    arr[2]  # 2 is the index; returns 4

    >> Slicing for a two dimension array
    import numpy as np
    arr = np.array([[1,2,3], [5,46,67]])
    arr[1:2]  # 1 = start, 2 = end (not included); with a single slice, whole rows are selected

    Example: slicing rows and columns of a multidimensional array
    # say I want to slice out 2 3 5 6
    # arr2[start_row:end_row+1, start_column:end_column+1]
    arr2 = np.array([[1,2,3,4], [4,5,6,9], [7,8,9,10]])
    arr2[0:2,1:3]

    >> Iteration ----------- just like iterating in a loop
    for i in arr2:
        print(i)  # each i is one row of arr2

    >> Joining  -- how to merge more than one array
    # suppose a hospital has 2 wards, general and ICU, whose data has to be joined
    general = np.array([[98,43,5345], [42,45,32]])
    icu = np.array([[98,92,73], [89,42,52]])
    np.concatenate((general,icu),axis = 1)  # joins side by side: the rows stay the same and columns are added
    # axis=1 → join horizontally (side by side)


    2nd Example: stack one array below the other
    general = np.array([[1,2,3], [5,6,7]])
    np.concatenate((general,icu), axis = 0)  # axis=0 → join vertically: rows are added
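
Comparing the shapes of the two results makes the meaning of axis concrete. A short sketch with the same ward arrays (assuming NumPy is installed); side_by_side and stacked are names chosen for this illustration.

```python
import numpy as np

general = np.array([[98, 43, 5345], [42, 45, 32]])
icu = np.array([[98, 92, 73], [89, 42, 52]])

side_by_side = np.concatenate((general, icu), axis=1)  # columns are added
stacked = np.concatenate((general, icu), axis=0)       # rows are added

print(side_by_side.shape)   # 2 rows, 6 columns
print(stacked.shape)        # 4 rows, 3 columns
```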

    >> SPLITTING
    import numpy as np
    arr = np.arange(1,11)
    arr  # run arr
    np.array_split(arr,3)  # 3 is the number of parts; array_split works even when the array cannot be divided equally, so the parts may differ in size


    >> ANOTHER WAY TO SPLIT = WORKS ONLY IF THE ARRAY CAN BE DIVIDED INTO EQUAL PARTS
    arr2 = np.arange(1,11)
    arr2
    np.split(arr2,2)  # works only when an equal division is possible
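
The difference between the two functions shows up when the array length is not divisible by the number of parts. A small sketch (assuming NumPy is installed) that records the part sizes and catches the error np.split raises.

```python
import numpy as np

arr = np.arange(1, 11)               # 10 elements

parts = np.array_split(arr, 3)       # uneven split is allowed
part_sizes = [len(p) for p in parts]

halves = np.split(arr, 2)            # 10 divides evenly by 2
half_sizes = [len(h) for h in halves]

try:
    np.split(arr, 3)                 # 10 does not divide evenly by 3
    split_failed = False
except ValueError:
    split_failed = True

print(part_sizes, half_sizes, split_failed)
```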

    >>ARRAY SORTING
    import numpy as np
    arr = np.array([1,2,46,7,86])
    np.sort(arr)  # returns a sorted copy; by default it sorts along the last axis (axis=-1)

    # sorting a 2-D array within each row
    import numpy as np
    arr = np.array([[1,2,46,7,86], [32,4324,543,4324,4324]])
    np.sort(arr, axis = 1)  # axis=1 sorts the values inside each row
    arr  # unchanged, because np.sort returns a new array

    >> Searching = 
    import numpy as np
    arr3 = np.array([1,2,46,7,86]) 
    np.where(arr3>40)  # conditional search; returns the indexes of the values greater than 40


    >> np.nonzero(arr) = returns the indexes of the non-zero values, i.e. it skips the indexes where the value is 0
    import numpy as np
    arr = np.arange(1,11)
    np.nonzero(arr)


    >>Filtering = how to filter the data
    import numpy as np
    arr = np.array([13,5,64,10,78,10])
    arr[arr>7]  # returns the elements that are greater than 7


    >>Mathematical Operations in Numpy

    x=np.array([[2,4], [6,10]])
    y = np.array([[12,23], [34,8]])
    x+y  # element-wise addition of x and y









    >> x//y = integer division
    x=np.array([[2,4], [6,10]])
    y = np.array([[12,23], [34,8]])
    x//y  # element-wise integer (floor) division of x by y

    >> np.divide(x,y) = float division
    import numpy as np
    x=np.array([[2,4], [6,10]])
    y = np.array([[12,23], [34,8]])
    np.divide(x,y)  # element-wise float division










    >> np.multiply(x,y)  # element-wise multiplication


    >>matrix multiplication = rows multiplied by columns  # condition: the number of columns of the first array must equal the number of rows of the second array
    arr1 = np.array([[2,4],[1,3]])
    arr2 = np.array([[3,6],[7,3]])
    np.matmul(arr1, arr2)  # matmul is the function for matrix multiplication; first element: (2×3) + (4×7) = 6 + 28 = 34

    >>reshape array =
    # an array can be reshaped only if its size (number of elements) is the same before and after reshaping
    Example: suppose I have a 1-D array with 14 elements. I can turn it into a two dimension array of shape (2,7), but not (2,8), because that would need 16 elements
    arr = np.arange(1,15)
    print(arr)
    reshape_array = arr.reshape(2,7)  # reshape is the function and (2,7) is the new shape; reshape_array is the variable where the reshaped array is stored
    Why? To change the dimension --- from one dimension to 2 dimensions

    >>another way to reshape
    reshape_array = arr.reshape(2,-1)  # -1 is a placeholder: NumPy calculates that dimension automatically

    >>another way to reshape = from any number of dimensions to 1 dimension
    reshape_array = arr.reshape(-1)  # converts the whole array into a single dimension (1-D array); -1 is passed as the argument and acts as a placeholder
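
The reshape rules can be verified in one run: an explicit shape, the -1 placeholder, flattening, and a shape that does not fit. A minimal sketch (assuming NumPy is installed); the variable names are chosen for this illustration.

```python
import numpy as np

arr = np.arange(1, 15)              # 14 elements

explicit = arr.reshape(2, 7)        # 2 * 7 = 14, so this fits
automatic = arr.reshape(2, -1)      # NumPy works out the 7 by itself
flat = explicit.reshape(-1)         # back to one dimension

try:
    arr.reshape(2, 8)               # would need 16 elements
    bad_shape_failed = False
except ValueError:
    bad_shape_failed = True

print(explicit.shape, automatic.shape, flat.shape, bad_shape_failed)
```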


    >>Mathematical Operations
    1) temp = np.array([23,32,35,54,54])
    print(temp)
    np.mean(temp)  # calculates the average

    2) np.min(temp)  # finds the minimum
    3) np.std(temp)  # standard deviation: how far the values of the array are spread out from the mean

    4) np.percentile(temp,40)  # the value below which 40% of the data falls; the 50th percentile equals the median
    5) np.sum(temp)  # sum
    6) np.median(temp)  # the middle value
    7) np.prod(temp)  # product of all elements
    8) np.cumsum(temp)  # cumulative sum











    9) np.cumprod(temp) = cumulative product,
    i.e. the running product up to each element, shown step by step.


    -------------------PANDAS------------------------------

    Pandas is a Python library that helps us store, clean, analyze and manipulate data.

     In simple words:

    Just as we keep data in rows and columns in Excel,
    Pandas lets us handle data in Python in the same Excel-like way.

            It helps with reading CSV files

            pandas = panel data


    >> Pandas  ================ indexing in pandas

    import pandas as pd  # pd is a short name for pandas; you can choose your own

    pd.Series([23,43,54,65], index = ["Mon","Tue","Wed","Thu"])  # a Series is basically one column in pandas; index lets you set your own labels, e.g. "Tue" is the label of the second value


    # another way to create a Series

    s=pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
    s["Mon"]  # s is the Series name and "Mon" is the index label whose value is fetched

    Note: This is an example of indexing

    >>> SLICING IN SERIES
    s[1:3]  # goes from position 1 to position 2; the end is not included, so 3 is written instead of 2 (s.iloc[1:3] is the explicit positional form)


    >>Filtering in Series

    import pandas as pd
    s=pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
    s[s>45]  # returns the values that are greater than 45

    >> SHAPE IN PANDAS
    s.shape  # returns the shape of the Series


    >>INDEX OF PANDAS
    s.index  # returns the index labels of the Series


    >>MATHEMATICAL OPERATIONS  IN PANDAS

    import pandas as pd

    s=pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})

    s*2  #multiply



    s+2   #adding 

    s/2  #divide





    >>Operations on a 2-region scenario
    region_a =pd.Series({"Jan":12, "Feb":13, "March": 16,"April":78})
    region_b = pd.Series({"Jan":42, "Feb":53, "March": 36,"April":98})
    total = region_a+region_b  # adds region_a's values to region_b's values
    total

    diff = region_a - region_b  # difference between the regions
    diff

    multi = region_a * region_b  # multiplication
    multi

    Note: In a Series, position does not matter: the operation is performed according to the index, so "Jan" is added to "Jan" even if the order of the entries changes.
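
Index alignment can be checked by writing the second Series in a different order. A short sketch (assuming pandas is installed) with the same values as above; only the order of region_b's entries is changed.

```python
import pandas as pd

region_a = pd.Series({"Jan": 12, "Feb": 13, "March": 16, "April": 78})
# Same values as before, deliberately written in reverse order
region_b = pd.Series({"April": 98, "March": 36, "Feb": 53, "Jan": 42})

total = region_a + region_b      # matched by label, not by position

print(total["Jan"])              # 12 + 42
print(total["April"])            # 78 + 98
```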


    >>>> other mathematical operations on a region
    region_a =pd.Series({"Jan":12, "Feb":13, "March": 16,"April":78})
    region_b = pd.Series({"Jan":42, "Feb":53, "March": 36,"April":98})
    region_a.max()  # returns the maximum value
    region_a.min()  # returns the minimum value
    region_a.mean()  # returns the average value
    region_a.sum()  # returns the sum of the region's values
    region_a.prod()  # returns the product of all values in the Series

    >>other functions in pandas
    1) apply == runs a function on every value of the Series and returns the results
    ex:
    region_a =pd.Series({"Jan":12, "Feb":13, "March": 16,"April":78})
    def sales_category(sales):
        if sales > 30:
            return "High Value"
        elif sales < 50:
            return "Moderate"
        else:   # never reached: any value that is not > 30 is already < 50
            return "High"
    region_a.apply(sales_category)

    Notes based on this: def → This keyword is used to define a function in Python.

    sales_category → This is the name of the function (you can choose any valid name).

    (sales) → This is the parameter (a placeholder for the value that will be passed to the function when it is called).


    2)map = map() → “replace or transform” each value of a Series according to rules you give.
    ex: 
        dept_codes = pd.Series(["HR", "Eng", "Sal", "FIN"])

        dept_names = {"HR": "Human Resources",
                     "Eng": "Engineering",
                     "Sal": "Science",
                     "FIN": "Finance"}
        dept_codes.map(dept_names)

    Explanation:

    pd.Series([...]) → creates a Pandas Series (a one-dimensional labeled data array).
    dept_names → is a dictionary mapping department codes to their full names.
    map() → replaces each value in the Series (dept_codes) with the corresponding value from the dictionary (dept_names).
    Note: > the order of the keys in the dictionary does not matter here
          > (0,1,2,3) → index number (position) of each value in the Series



    A data scientist wants to extract only the months where the customer churn rate exceeded 8%. Which approach is correct? Assume churn is a pandas Series.
    • churn[churn < 8]
    • churn.where(churn > 8)  # this is the correct answer
    • churn.mask(churn > 8)
    • churn.clip(upper = 8)
    Ans: churn = pd.Series([10,8,4,212,14], index = ["Jan", "Feb", "Mar", "Apr", "May"])
    churn.where(churn > 8)

    Explanation:

     1) Why use .where()?

    Because where() keeps only the values that satisfy the condition and replaces the others with NaN.


    2) What churn means:

    customers leaving a company or cancelling its service.
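
Because where() leaves NaN in place of the non-matching months, it is worth comparing it with plain boolean indexing, which drops those rows entirely. A sketch assuming pandas is installed, using the quiz values with five month labels (one per value).

```python
import pandas as pd

churn = pd.Series([10, 8, 4, 212, 14],
                  index=["Jan", "Feb", "Mar", "Apr", "May"])

kept = churn.where(churn > 8)    # same length; non-matching months become NaN
dropped = kept.dropna()          # removes the NaN rows afterwards
direct = churn[churn > 8]        # boolean indexing keeps only matching months

print(kept)
print(list(dropped.index))
print(list(direct.index))
```

Note that the result of where() is converted to floats, since NaN cannot be stored in an integer Series.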


    -----------------DATA FRAME ---------------------------------


    A DataFrame is the pandas library's 2D (two-dimensional) data structure,
    like an Excel sheet or a table, with rows and columns. Think of it as:
    • Rows = records / entries

    • Columns = fields / variables

    General form:
    pd.DataFrame(data)

    Example: 
    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita"],
        "Age": [25,27,29],
        "City": ["Delhi", "Rganj", "Patna"]
    }

    df = pd.DataFrame(data)
    df
    df.to_csv("dat.csv", index = True)

    🔹 Step 1: import pandas as pd

    • This line imports the pandas library.

    • pandas is a Python library used to handle data in table form (rows & columns).

    • as pd means that whenever we use a pandas function, we can write the short name "pd" instead.


    🔹 Step 2: data = {...}

    Here we have created a dictionary with 3 keys.

    data is a dictionary 🧠

    In Python, a dictionary is a data structure that stores data as key-value pairs.

    That is:

    1. "Name" → list of names

    2. "Age" → list of ages

    3. "City" → list of cities

    Its structure looks like this:

    Name : ["divyanshu", "Neha", "Ankita"] Age : [25, 27, 29] City : ["Delhi", "Rganj", "Patna"]

    ⚠️ Note: string values such as "Delhi" and "Rganj" must be written in quotes;
    otherwise Python raises an error (because it treats them as variable names).


    🔹 Step 3: df = pd.DataFrame(data)

    • This line converts the dictionary into a DataFrame.

    • A DataFrame is basically an Excel-like table with rows and columns.

    The result looks like this 👇

       Name       Age  City
    0  divyanshu  25   Delhi
    1  Neha       27   Rganj
    2  Ankita     29   Patna

    🔹 Summary:

    Line                   What it does
    import pandas as pd    Imports the pandas library
    data = {...}           Stores the data in a dictionary
    pd.DataFrame(data)     Converts the dictionary into a table (DataFrame)
    df                     The final DataFrame object, holding the data in rows and columns

    4️⃣ df.to_csv("dat.csv", index=True)

    👉 This line saves df (the DataFrame) to a CSV file.

    • "dat.csv" → the file name (the file is created in the current working directory, for example the folder your Anaconda/Jupyter notebook runs from)

    • index=True → also save the row numbers (0,1,2...) in the file.

    🔸 If you write index=False, the row numbers are not written to the CSV.


    🔹 What is a CSV file?

    CSV (Comma-Separated Values) is a simple text file in which the values are separated by commas.
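A small sketch of the index=True / index=False difference. It relies on to_csv() returning the CSV text when no file name is passed, so nothing is written to disk; the two-row DataFrame is just a sample.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["divyanshu", "Neha"], "Age": [25, 27]})

# With no file path, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv(index=True)
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```

With index=True the header line starts with an empty field, because the index column has no name.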


    >>> How to save a DataFrame (df) to Excel -------------------


    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita"],
        "Age": [25,27,29],
        "City": ["Delhi", "Rganj", "Patna"]
    }

    df = pd.DataFrame(data)
    df
    df.to_excel("file_name.xlsx", index = False)

    5️⃣ What does the to_excel() function do?

    👉 This function saves the DataFrame as an Excel file (.xlsx).
    The data is written out as an Excel sheet. Writing .xlsx files needs an Excel engine such as openpyxl to be installed.

    >>> How to read a file that was created from a DataFrame using pandas?


    df = pd.read_csv("file_name.csv")
    Example:

    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita"],
        "Age": [25,27,29],
        "City": ["Delhi", "Rganj", "Patna"]
    }

    df = pd.DataFrame(data)
    df.to_csv("NewCsv.csv", index = False)  # write the DataFrame to a CSV file
    df = pd.read_csv("NewCsv.csv")          # read the same file back into a DataFrame

    >>> df.head() == shows the first 5 rows by default; pass the number of rows you want to see


    Explanation: The head() function shows the top rows (first records) of a DataFrame.

    • Default: df.head() shows the first 5 rows.

    • df.head(1) → shows only the first row (first record).








    >> df.tail(1) == how to see rows from the bottom

    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita"],
        "Age": [25,27,29],
        "City": ["Delhi", "Rganj", "Patna"]
    }

    df = pd.DataFrame(data)
    df.to_csv("NewCsv.csv", index = False)
    df = pd.read_csv("NewCsv.csv")
    df
    df.head(1)
    df.tail(1) #1 is the number of rows to show from the bottom

    🔹 Meaning:

    The tail() function shows the last rows (bottom records) of a DataFrame.

    • Default: df.tail() shows the last 5 rows.

    • df.tail(1) → shows only the last row (final record).

    >> df.info() == summary of the DataFrame's metadata

    🔹 Meaning:

    The info() function reports the structure and basic details of a DataFrame,
    such as the column names, their data types, and how many non-null (filled) values each column has.


    >>df.describe() == gives a statistical summary of the numerical columns

    The describe() function returns a statistical summary of a DataFrame's numerical columns.
    For the numeric columns (such as marks, age, salary), it automatically
    calculates important measures such as:

    • count → number of non-null values

    • mean → average value

    • std → standard deviation (how spread out the data is)

    • min → smallest value

    • 25%, 50%, 75% → percentiles (quartiles)

    • max → largest value

    >>df.shape  ==== how many rows and columns the DataFrame has, as a (rows, columns) tuple
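A quick sketch that runs info(), describe() and shape on the small three-row DataFrame used earlier in these notes, so the three can be compared side by side.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["divyanshu", "Neha", "Ankita"],
    "Age": [25, 27, 29],
    "City": ["Delhi", "Rganj", "Patna"],
})

df.info()                 # prints column names, dtypes and non-null counts
summary = df.describe()   # statistics for the numeric column(s) only
print(summary)
print(df.shape)           # (rows, columns)
```

Note that shape is an attribute, so it is written without parentheses, while info() and describe() are methods.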

    >> How to select one column from a DataFrame?
    df["City"] #series 
    Explain: This extracts just one column (City) from the DataFrame,
    and the output is a pandas Series.

    >> df.loc[0:1] = fetch data row-wise

    0:1  -----------
    → This is a slice (range selection). It means:
    return the rows from index label 0 through 1 (inclusive).
    ---------------------
    .loc[] is used for label-based indexing.

    That is, you select rows based on their row labels (index).

    >>df.set_index("City", inplace = True) = if inplace = True, the existing df itself is updated instead of a new df being returned
    This line makes the "City" column the index, so each row in the DataFrame is now identified by its city name rather than by 0, 1, 2.



    >>> df.iloc[0:1] = #position-based indexing

    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
        "Age": [25,27,29,48],
        "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
    }

    df = pd.DataFrame(data)
    print(df.iloc[0:2])

    This tells pandas to show the rows starting at position 0 and stopping before position 2.

    In other words,
    it shows only the first two rows.
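A short sketch of the one difference that is easy to miss: a loc slice includes its end label, while an iloc slice excludes its end position. The sample rows are taken from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["divyanshu", "Neha", "Ankita", "Junaid"],
    "Age": [25, 27, 29, 48],
})

by_label = df.loc[0:1]       # label slice: end label 1 is included -> 2 rows
by_position = df.iloc[0:1]   # position slice: end position 1 is excluded -> 1 row

print(by_label)
print(by_position)
```

With the default 0, 1, 2... index the labels and positions look identical, which is why the two are often confused.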

    >>>Modifying data operations

    df["Age"] = df["Age"]/100  
    This divides every value in the DataFrame's "Age" column by 100.

    📘 Important Concepts:

    • df["Age"] → selects the Age column (as a Series)

    • /100 → divides every value by 100

    • df["Age"] = ... → assigns the modified values back to the Age column


    >>df.rename(columns = {"COLUMN_NAME":"CHANGED_COLUMN_NAME"}, inplace = True) = rename a column (change the column header)

    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
        "Age": [25,27,29,48],
        "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
    }

    df = pd.DataFrame(data)
    print(df.iloc[0:2])
    df["Age"] = df["Age"]/100
    df.rename(columns ={"City":"Place"},inplace = True)
    df


    Explain: 
    columns = {...}
    → This is a dictionary in which you define the old column name and the new column name.
    Here "City" is replaced by "Place".

    inplace = True
    → This means the change is applied directly to the original DataFrame (df).
    No new DataFrame needs to be created.
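A minimal sketch of rename() without inplace=True, to show that it then returns a new DataFrame and leaves the original alone. The two-row data is a sample.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Neha", "Ankita"], "City": ["Rganj", "Patna"]})

renamed = df.rename(columns={"City": "Place"})   # returns a new DataFrame

print(list(df.columns))        # original is unchanged
print(list(renamed.columns))
```

Assigning the result to a new variable is an alternative to inplace=True when the original should be kept.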



    >>df.drop("Age",axis = 1, inplace = True) = it will drop the column
    Ex: 
    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
        "Age": [25,27,29,48],
        "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
    }

    df = pd.DataFrame(data)
    print(df.iloc[0:2]) #index wise view
    df["Age"] = df["Age"]/100 #this divides the entire column
    df.drop("Age",axis = 1, inplace = True)
    df

    📘 Explanation:

    1. drop() → used to remove a row or column from a DataFrame.

    2. "Age" → names the column to remove.

    3. axis = 1

      • axis = 0 → refers to rows

      • axis = 1 → refers to columns
        so here a column is being removed.

    4. inplace = True

      • This means the change is applied directly to the original DataFrame.

      • With inplace = False (the default), drop() returns a new DataFrame and leaves df unchanged, so the result must be assigned to a variable to keep it.
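A small sketch of drop() with and without inplace, using a sample two-column DataFrame. It also uses the columns= keyword, which is an equivalent way of saying axis=1.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Neha", "Ankita"], "Age": [27, 29]})

without_age = df.drop("Age", axis=1)    # default inplace=False: df is untouched
print(list(df.columns))
print(list(without_age.columns))

df.drop(columns="Age", inplace=True)    # columns="Age" is equivalent to axis=1
print(list(df.columns))
```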





















    >> df["New_Column_name"] = [value1, value2, ...] = create a new column
    Ex: 
    import pandas as pd
    data = {
        "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
        "Age": [25,27,29,48],
        "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
    }

    df = pd.DataFrame(data)
    df["Age"] = df["Age"]/100 #this divides the entire column
    df.drop("Age",axis = 1, inplace = True)
    df
    df["RR"] = ["Yes","No","Yes","No"] #the list needs one value per row; df has 4 rows, so 4 values
    df


    Explanation:
    df["RR"]
    → This creates a new column named "RR" in the DataFrame df.
    (If "RR" already exists, its values are overwritten.)

    = ["Yes","No","Yes","No"]
    → These values are assigned to the column row by row:

    first row: "Yes"

    second row: "No"

    third row: "Yes"

    fourth row: "No"



    >>> Selecting and updating data by label
    df.loc["Delhi", "Age"] = 30


     Explanation:

    1. df.loc[ ] → the label-based selector in pandas.
      It accesses a row and column by label (name), not by position number.

    2. "Delhi" → the row label.
      You are targeting the row whose index is "Delhi".
      ⚠️ This requires the "City" column to have been set as the index first, for example with df.set_index("City", inplace = True). If no row is labelled "Delhi", pandas adds a new row with that label instead.

    3. "Age" → the column label. The cell at row "Delhi", column "Age" is set to 30.
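A sketch of label-based assignment with loc, assuming City has been set as the index first. The "Pune" row is a made-up label, added only to show what happens when the label does not exist yet.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["divyanshu", "Neha", "Ankita"],
    "Age": [25, 27, 29],
    "City": ["Delhi", "Rganj", "Patna"],
})
df.set_index("City", inplace=True)   # rows are now labelled by city

df.loc["Delhi", "Age"] = 30          # existing label: updates one cell
df.loc["Pune", "Age"] = 40           # missing label: a new row "Pune" is added
print(df)
```

The new "Pune" row has NaN in every column that was not assigned, so a typo in a label silently creates an extra row rather than raising an error.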

    >>> How to filter data with a greater-than condition

    df["Age"] > 25
    df[df["Age"] > 25]

    Explanation:

    🔹 1️⃣ df["Age"] > 25 

    This line does not extract any data.
    Instead, it builds a Boolean Series (True/False values).

    pandas checks the "Age" of every row to see
    whether it is greater than 25.

    🔹 2️⃣ df[df["Age"] > 25]

    This line uses the Boolean Series above
    to keep only the rows where it is True.

    In other words: "show the rows where Age is greater than 25."



    >> Filtering on multiple conditions --
    df[(df["Age"] > 25) & (df["Stake"] < 100)]  # wrap each condition in parentheses and combine with & (and) or | (or)

    >>Pandas query() Function — Conditional Filtering Example

    import pandas as pd
    data = {
        "Name": ["Sumair","Neha", "Dinesh", "Junaid"],
        "Age": [2235,27,29,48],
        "City": ["Delhi", "Chainpur", "Pune", "Saharanpur"]
    }

    df = pd.DataFrame(data)
    df["RR"] = ["Yes","No","Yes","No"] #create a new column; the list needs one value per row (4 rows here)
    df
    df["Stake"] = [45,454,124,756]
    df.loc["Delhi", "Name"] = 30 #"Delhi" is not an index label here, so this adds a new row labelled "Delhi"
    df
    df["Age"] > 125
    df[df["Age"] > 125]
    df.query("Age > 25 and Stake < 100")

    🔹 1️⃣ What does .query() do?

    .query() is a filtering method that lets you write an SQL-style condition.
    You can pass the condition directly inside a string, such as "Age > 25 and Stake < 100".

    It does the same job as this code: df[(df["Age"] > 25) & (df["Stake"] < 100)]

    🔹 2️⃣ "Age > 25 and Stake < 100"

    There are two conditions here:

    • Age > 25 → only the rows whose Age is greater than 25

    • Stake < 100 → and, in addition, the Stake column value must be less than 100

    and means both conditions must be True.

    🔹 3️⃣ Output

    This query returns only the rows where
    Age is greater than 25 and Stake is less than 100.
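A minimal sketch checking that query() and the boolean-mask form select the same rows. The ages here are assumed sample values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sumair", "Neha", "Dinesh", "Junaid"],
    "Age": [35, 27, 29, 48],           # assumed sample ages
    "Stake": [45, 454, 124, 756],
})

by_query = df.query("Age > 25 and Stake < 100")
by_mask = df[(df["Age"] > 25) & (df["Stake"] < 100)]

print(by_query)
print(by_query.equals(by_mask))
```

query() is usually easier to read for long conditions, while the mask form works with any Python expression.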


    >>df.to_clipboard() : Copy DataFrame to Clipboard

    Explanation:

    This function copies the entire DataFrame (df) to the clipboard.
    You can then press Ctrl + V to paste the data directly into Excel, Google Sheets, or Notepad.

    >> df.to_hdf("file_name.h5", key='My_data') : Save DataFrame in HDF5 Format

    🧠 Explanation:

    This function saves (stores) a pandas DataFrame in the HDF5 file format (Hierarchical Data Format).
    The format helps store large amounts of data efficiently, with optional compression,
    especially when the data is very large (for example, millions of rows).

    🔸 The key parameter gives the DataFrame a unique name inside the HDF5 file. It is basically the table name within that file.
    🔸 .to_hdf() does not work without a key (it raises an error).
    🔸 Multiple DataFrames can be stored in the same .h5 file under different keys.

    🔸 An HDF5 file is stored in binary format, so you cannot open and read it directly in a text editor.



    📘 Pandas: Reading HDF5 File using pd.read_hdf()


    df_hdf = pd.read_hdf("file_name.h5", key='My_data')

    Definition:

    pd.read_hdf() is a pandas function that reads (loads) a file in HDF5 format and returns it as a DataFrame.
    It is used to bring a file saved with .to_hdf() back into memory.


    🔹"file_name.h5"

    This is the name of the file you want to read.
    The .h5 or .hdf5 extension indicates that the file is in HDF5 format.

    🔹key='My_data'

    This is the unique name or label of the DataFrame stored inside the HDF5 file.
    Because one HDF5 file can hold multiple DataFrames, each one has its own key.

    >> Filtering Rows Based on Multiple Values using isin() Method

    import pandas as pd

    data = {
        "Employee_ID": [101, 102, 103, 104, 105],
        "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
        "Department": ["IT", "HR", "Sales", "Finance", "IT"],
        "Age": [25, 29, 32, 28, 26],
        "Experience_Years": [2, 5, 7, 3, 4],
        "Monthly_Salary": [50000, 60000, 65000, 55000, 52000]
    }
    df = pd.DataFrame(data)
    df
    df[df["Department"].isin(["Sales", "IT"])]



    Step-by-Step Explanation:

    1️⃣ df : DataFrame

    • df is the variable that holds the whole dataset (table), here the five employees defined above.

    2️⃣ df["Department"]

    • This selects only the "Department" column.

    • The output is:

      0         IT
      1         HR
      2      Sales
      3    Finance
      4         IT

    3️⃣ isin(["Sales", "IT"])

    • This checks whether each value in the "Department" column is "Sales" or "IT".

    • It returns a Boolean Series (True/False values):

      0     True
      1    False
      2     True
      3    False
      4     True

    4️⃣ df[df["Department"].isin(["Sales", "IT"])]

    • This Boolean Series is now applied to the whole df.

    • Only the rows where the value is True are shown.

    • So only the employees in the "Sales" and "IT" departments appear.

    Final Output:

    ➡️ Only the rows for the "Sales" and "IT" departments remain after filtering.
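The steps above can be sketched with the mask kept in its own variable, so the intermediate Boolean Series is visible. Only two columns of the sample data are used to keep it short.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
})

mask = df["Department"].isin(["Sales", "IT"])   # Boolean Series, one entry per row
selected = df[mask]                             # keeps only the rows where mask is True

print(mask.tolist())
print(selected)
```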

    📘 Pandas: Excluding Rows Using ~isin() Function


    import pandas as pd

    data = {
        "Employee_ID": [101, 102, 103, 104, 105],
        "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
        "Department": ["IT", "HR", "Sales", "Finance", "IT"],
        "Age": [25, 29, 32, 28, 26],
        "Experience_Years": [2, 5, 7, 3, 4],
        "Monthly_Salary": [50000, 60000, 65000, 55000, 52000]
    }
    df = pd.DataFrame(data)
    df
    df[~df["Department"].isin(["Sales", "IT"])]

    🧠 Step-by-Step Explanation:

    1️⃣ df

    This is the DataFrame holding the employee data. The walkthrough below uses a smaller four-row table for illustration 👇

    Employee_ID  Name       Department  Age  Experience_Years  Monthly_Salary
    101          Divyanshu  Sales       25   2                 40000
    102          Anshika    HR          27   3                 45000
    103          Neha       IT          28   4                 50000
    104          Junaid     Finance     30   5                 55000

    2️⃣ df["Department"].isin(["Sales", "IT"])

    This checks whether each value in the "Department" column is "Sales" or "IT".
    The output is a Boolean Series 👇

    0     True
    1    False
    2     True
    3    False
    Name: Department, dtype: bool

    3️⃣ ~ (Tilde Operator)

    • This is a NOT operator (it inverts each value).

    • True becomes False and False becomes True.

    So the output now becomes 👇

    0    False
    1     True
    2    False
    3     True
    Name: Department, dtype: bool

    4️⃣ df[~df["Department"].isin(["Sales", "IT"])]

    Now the DataFrame keeps only the rows where the original condition was False,
    that is, the employees whose department is neither "Sales" nor "IT" 👇

    Employee_ID  Name     Department  Age  Experience_Years  Monthly_Salary
    102          Anshika  HR          27   3                 45000
    104          Junaid   Finance     30   5                 55000


    Final Output:

    ➡️ This code shows all rows that are not in the "Sales" or "IT" department.

    Short Summary Table:

    Symbol / Function   Meaning
    isin()              Checks whether each value is present in the list
    ~                   Inverts the condition (True ↔ False)
    df[...]             Filters the DataFrame based on the condition
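A short sketch of the exclusion pattern on the five-employee sample from the code above (two columns only).

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
})

in_list = df["Department"].isin(["Sales", "IT"])
excluded = df[~in_list]          # ~ flips True/False, so matching rows are dropped

print(excluded)
```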




    Question: How do we sort data in Python?
    Question: Data handling in Python?
    Question: Data cleaning in Python?
    Question: Handling missing values in Python?
    Question: Handling duplicates in Python?


    📘 Pandas – Reading a CSV File and Previewing Data (Notes)

    ➡️ Code:

    import pandas as pd
    df = pd.read_csv("day.csv")
    df.head(2)

    📝 Notes – Line-by-Line Explanation


    1️⃣ import pandas as pd

    This line imports the pandas library and gives it the short name pd, so we don't have to type pandas again and again.

    This means:
    We can now call pandas functions with the pd. prefix.
    Example: pd.read_csv(), pd.DataFrame() etc.

    2️⃣ df = pd.read_csv("day.csv")

    This reads the CSV file named day.csv and loads it into a DataFrame called df.

    Important Points

    • read_csv() → function to read CSV files

    • "day.csv" → filename

    • df → variable storing the table-like data

    3️⃣ df.head(2)

    This displays the first 2 rows of the DataFrame.
    It helps quickly preview the data and check whether it loaded correctly.

    General Rule

    • df.head() → shows first 5 rows (default)   

    • df.head(2) → shows first 2 rows

    • df.head(10) → shows first 10 rows

    🧾 Pandas – df["season"].count() (Short Notes)

    ✔️ Code

    df["season"].count()

    🧠 What is happening in this code?
    df["season"] → selects the season column from the DataFrame
    .count() → counts how many entries (rows) are present in that column
    👉 That is, it tells you how many values in total are filled in the season column.
    It does not count missing (NaN) values.


    📌 Example
    If the season column holds the values:

    [1, 2, NaN, 3, 2]

    then .count() returns:

    4

    because NaN is not counted.

    🧾 Pandas – df["season"].nunique() (Short Notes)

    ✔️ Code

    df["season"].nunique()

    🧠 What is happening in this code?

    • df["season"] → selects the season column from the DataFrame

    • .nunique() → returns how many unique (distinct) values the column contains

    👉 That is, it checks how many different seasons appear in the season column.

    📌 Example

    If the season column holds the values:

    [1, 2, 2, 3, 3, 3, 4]

    then the output is:

    4

    because the unique values are → 1, 2, 3, 4

    🛑 Important:
    .nunique() counts only unique values
    It does not count NaN values
    dropna=True is the default, i.e. .nunique(dropna=True)

    📍 When is it useful?
    To find how many distinct values a category or class column has
    Before grouping
    During data understanding and EDA

    🧾 Pandas – df["season"].unique() (Short Notes)

    ✔️ Code

    df["season"].unique()

    🧠 What is happening in this code?

    • df["season"]
      Selects the season column from the DataFrame.

    • .unique()
      Returns all the unique (distinct) values present in that column as an array.

    👉 That is, it tells you which distinct values appear in the season column.


    📌 Example

    If the season column holds the values:

    [1, 2, 2, 3, 3, 4]

    then the output is:

    array([1, 2, 3, 4])

    🧾 Pandas – df["season"].value_counts() (Short Notes)

    ✔️ Code

    df["season"].value_counts()

    🧠 What happens in this code?

    • df["season"]
      Selects the season column from the DataFrame.

    • .value_counts()
      Returns how many times each unique value occurs in that column.

    👉 That is, it tells you which value is repeated how many times in the column.


    📌 Example

    Suppose the season column holds the values:

    [1, 2, 2, 3, 3, 3, 4]

    Then the output is:

    3    3   ← value 3 appears three times
    2    2   ← value 2 appears twice
    1    1   ← value 1 appears once
    4    1   ← value 4 appears once

    🛑 Important Points

    • The output comes in value → count format.

    • By default, the highest count appears at the top (sorted descending).

    • Missing/NaN values can also be counted if you pass dropna=False.


    ⭐ One-line Summary

    value_counts() returns the frequency of each unique value in a column.
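A small sketch putting count(), nunique(), unique() and value_counts() side by side on one Series. The values are a made-up sample with one NaN added, so the NaN handling of each method is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical season values with one missing entry
season = pd.Series([1, 2, 2, 3, 3, 3, 4, np.nan])

print(season.count())                      # non-missing entries
print(season.nunique())                    # number of distinct non-missing values
print(season.unique())                     # the distinct values themselves (NaN included)
print(season.value_counts())               # frequency of each value, NaN excluded
print(season.value_counts(dropna=False))   # same, with NaN counted as well
```

One detail worth remembering: unique() keeps NaN in its result, while nunique() skips it by default.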


    🧾 Pandas – sub_df = df[["season", "temp", "hum"]].sample(10) (Notes)

    ✔️ Code

    sub_df = df[["season", "temp", "hum"]].sample(10)

    🧠 What is happening in this line?
    1️⃣ df[["season", "temp", "hum"]]
    Only three columns are selected from the DataFrame df:

    season
    temp
    hum

    So instead of the whole DataFrame, we get a smaller DataFrame with just these three columns.

    2️⃣ .sample(10)

    • 10 random rows are picked from the selected DataFrame.

    • Each run of the code can return different random rows.

    3️⃣ sub_df = ...

    • The result is stored in a new variable, sub_df.

    • sub_df is now a small random-sample DataFrame with only:

      • 10 rows

      • 3 columns

    📌 Why is it useful?

    • To quickly look at sample data from a large dataset.

    • For random testing during analysis.

    • For data splitting in Machine Learning.
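A sketch of sample() with a fixed random_state, so the same rows come back on every run. The DataFrame below is a made-up stand-in for day.csv (the windspeed column is an assumption, added only so there is a column to leave out).

```python
import pandas as pd

# Hypothetical stand-in for the day.csv data used in these notes
df = pd.DataFrame({
    "season": [1, 2, 3, 4] * 5,
    "temp": [i / 20 for i in range(20)],
    "hum": [40 + i for i in range(20)],
    "windspeed": [0.1] * 20,
})

sub_df = df[["season", "temp", "hum"]].sample(10, random_state=42)
again = df[["season", "temp", "hum"]].sample(10, random_state=42)

print(sub_df.shape)
print(sub_df.equals(again))   # same seed, same rows
```

Without random_state the sample changes on every run, which makes results hard to reproduce.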

    🧾 Pandas – sub_df.sort_values(by="temp") (Notes)

    ✔️ Code

    sub_df.sort_values(by="temp")

    🧠 What is happening in this line?

    • sort_values() is the pandas method that sorts data.

    • by="temp" means:

      👉 Sort the DataFrame by the values in the temp column.

    • By default, sorting is in ascending order (smallest to largest).

    📌 What does the output look like?

    • The rows of sub_df are rearranged by temperature value, for example:

      season  temp  hum
      2       0.15  45
      1       0.18  60
      3       0.25  55

    ⭐ To sort in descending order:

    sub_df.sort_values(by="temp", ascending=False)

    🧾 Pandas – Sorting Using Multiple Columns (sort_values)

    ✔️ Code

    sub_df.sort_values(by = ["season", "temp"])

    📌 What is happening?

    • sort_values() is used to sort a DataFrame.

    • by = ["season", "temp"] means the sorting is based on two columns:

      1. first the season column

      2. then the temp column within each season

    • That is, all rows are first sorted by season, and then the temperature is sorted inside each season.

      🔎 Final Summary (Short Notes)

      • sort_values([...]) → sorting on multiple columns

      • The first column is sorted first

      • Then the second column is sorted within each group

      This is called multi-level sorting.
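A small sketch of multi-level sorting with a different direction per column, using made-up values. Passing a list to ascending gives one flag per column in by.

```python
import pandas as pd

sub_df = pd.DataFrame({
    "season": [2, 1, 2, 1],
    "temp": [0.30, 0.18, 0.15, 0.25],
})

# season ascending, then temp descending within each season
ordered = sub_df.sort_values(by=["season", "temp"], ascending=[True, False])
print(ordered)
```

sort_values() returns a new DataFrame, so sub_df itself keeps its original row order.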


    📌 Topic: Display Option in Pandas – display.max_columns

    ▶ Code

    pd.get_option("display.max_columns")

    🧠 What are we learning?

    This command tells us how many columns pandas shows at once when printing a DataFrame.

    📝 Meaning

    • If the output is 20, pandas shows at most 20 columns when a DataFrame is printed.

    • If there are more columns than that, pandas shows ... in the middle.

    ⚙ If you want to see everything

    pd.set_option("display.max_columns", None)
    Now pandas shows all columns without hiding any.

    📌 Topic: Pandas Display Settings – Setting display.max_columns

    ▶ Code

    pd.set_option("display.max_columns", 50)

    1️⃣ What is pd.set_option?

    • It is the pandas function used to change settings.

    • We use it to tell pandas how things should be displayed in the output.


    2️⃣ What does "display.max_columns" do?

    • By default, pandas shows only a limited number of columns.

    • If a DataFrame has too many columns, it shows ... in the middle.

    • "display.max_columns" sets the maximum number of columns shown in the output.


    3️⃣ What 50 means

    • Here 50 means:
      → pandas now shows up to 50 columns in the output without hiding any.


    4️⃣ Why use it?

    • In large datasets, important columns often get hidden.

    • With this command all the columns are clearly visible, which makes analysis easier.


    ✔ Final Result

    Now, in Jupyter Notebook or anywhere else, printing a DataFrame with up to 50 columns will not show ... anywhere.


    📌 Topic – Pandas Display Option Reset (pd.reset_option)

    ▶ Code

    pd.reset_option("display.max_columns")

    1️⃣ What is pd.reset_option?

    • After changing settings in pandas,
      if we want to bring an option back to its default (original) value,
      we use pd.reset_option.

    • It restores that option to the pandas factory default setting.


    2️⃣ What was "display.max_columns"?

    • This option tells pandas
      the maximum number of columns to show in DataFrame output.

    • We had set it earlier:

      pd.set_option("display.max_columns", 50)

    3️⃣ What does this code do now?

    pd.reset_option("display.max_columns")
    • This command brings "display.max_columns" back to its original default value.

    • That means:

      • pandas shows columns according to its default limit again

      • pandas may show ... again when there are too many columns


    4️⃣ When is it used?

    • When:

      • testing is finished

      • you want to see the data in the normal view

      • you want to undo custom settings
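A sketch of the get / set / reset cycle in one place. It also shows pd.option_context, which is not covered in these notes; it applies a setting only inside a with-block and undoes it automatically afterwards.

```python
import pandas as pd

default_cols = pd.get_option("display.max_columns")   # remember the starting value

pd.set_option("display.max_columns", 50)
changed = pd.get_option("display.max_columns")

pd.reset_option("display.max_columns")                 # back to the pandas default
restored = pd.get_option("display.max_columns")

# option_context applies a setting only inside the with-block
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")
after = pd.get_option("display.max_columns")

print(default_cols, changed, restored, inside, after)
```

The default value printed here can differ between a terminal and a notebook, so no fixed number is assumed.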

    Question: How to handle missing data?


    📘 Pandas + NumPy: Creating a DataFrame with Missing Values (np.nan)

    Code

    import numpy as np
    import pandas as pd

    data = {
        "Patient_ID": [101, 102, 103, 104, 105],
        "Heart_Rate": [72, 85, np.nan, 90, 76],
        "Blood_Pressure": [120, 130, 125, np.nan, 118],
        "Temperature": [98.4, 99.1, 100.0, 98.7, 99.4],
        "Oxygen_Saturation": [97, 95, 93, 96, 98]
    }
    df = pd.DataFrame(data)
    df

    🧠 Line-by-Line Explanation


    import numpy as np

    • The NumPy library is imported.

    • It is used here to insert np.nan.
      np.nan means a missing / blank value.


    import pandas as pd

    • The pandas library is imported.

    • pandas is used for creating DataFrames, editing them, and data analysis.


    data = { ... }

    • This is a Python dictionary holding data about hospital patients.

    • It defines 5 columns:

      1. Patient_ID

      2. Heart_Rate

      3. Blood_Pressure

      4. Temperature

      5. Oxygen_Saturation


    Why np.nan?

    • Missing values have been deliberately placed in Heart_Rate and Blood_Pressure.

    • Real-life datasets often contain missing data, so it is important to learn how to handle it.


    df = pd.DataFrame(data)

    • The dictionary is converted into a pandas DataFrame.

    • A DataFrame is a table-like structure with rows and columns.

    • We can now run operations on this df.


    df

    • In Jupyter Notebook, writing just df displays the whole DataFrame in the output.


    What is np.nan?

    np.nan means:

    Not a Number (Missing Value)

    That is, a place in the dataset where no data is available.


    ✔ Why is it used?

    In real-world datasets it often happens that:

    • a patient's information was not recorded

    • some questions in a survey were left blank

    • a sensor never sent its data

    We use np.nan to represent such data.


    ✔ Where does np.nan come from?

    import numpy as np

    np.nan can be used only after the NumPy library has been imported.


    ✔ After importing, we write it like this:

    np.nan

    ✔ Example

    "Heart_Rate": [72, 85, np.nan, 90, 76]

    Here the third patient's heart rate is missing, so np.nan is put in its place.


    ✔ Important Properties

    🔹 np.nan is not equal to any number, not even to itself

    np.nan == np.nan # False

    🔹 To find missing values:

    df.isna()

    🔹 To remove missing values:

    df.dropna()

    🔹 To fill missing values:

    df.fillna(value)

    ⭐ Revision Points

    • np.nan means a missing / blank value

    • It comes from NumPy

    • It is used to represent missing data in real datasets

    • In an equality check it is not equal to any number
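A minimal sketch of the np.nan properties listed above. Because == cannot detect NaN, it uses np.isnan() and pd.isna() for the check; the heart-rate list is the sample from these notes.

```python
import numpy as np
import pandas as pd

print(np.nan == np.nan)     # NaN never compares equal, even to itself
print(np.isnan(np.nan))     # use a dedicated check instead
print(pd.isna(np.nan))

heart_rate = pd.Series([72, 85, np.nan, 90, 76])
print(heart_rate.dtype)     # a NaN forces the integer list to float64
```

The dtype change explains why whole numbers such as 72 show up as 72.0 once a column contains a missing value.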

    📘 Pandas: df.isna() – Missing Values Check

    (Notes for Revision)


    What does df.isna() do?

    df.isna() checks for missing values (np.nan) in the DataFrame and returns, for every cell:

    • True → if the value is missing

    • False → if the value is present


    ✔ Syntax

    df.isna()

    ✔ Example

    Suppose our DataFrame looks like this:

    import numpy as np
    import pandas as pd

    data = {
        "Heart_Rate": [72, 85, np.nan, 90, 76],
        "Blood_Pressure": [120, 130, 125, np.nan, 118]
    }
    df = pd.DataFrame(data)

    Now check:

    df.isna()

    ✔ Output (Explanation)

    Cell content                 Value missing?
    The cell holds np.nan        True
    The cell holds a value       False

    ✔ Where is it useful?

    df.isna() is helpful for:

    • finding out how many missing values the dataset has

    • checking which rows/columns are incomplete

    • validating the data before cleaning


    ✔ Count Missing Values

    If you also want to count missing values:

    df.isna().sum()

    ⭐ Revision Points

    • df.isna() detects missing values

    • The output is a Boolean DataFrame (True / False)

    • It is the first step of data cleaning and preprocessing
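A short sketch of two common follow-ups to isna(): counting the missing values per column, and pulling out the rows that contain at least one gap. The data is the two-column sample from above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118],
})

missing_per_column = df.isna().sum()          # True counts as 1 when summed
rows_with_gaps = df[df.isna().any(axis=1)]    # rows with at least one missing value

print(missing_per_column)
print(rows_with_gaps)
```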


    >>> Now let's handle this situation
    (1) Drop the missing data [Note: this is usually not preferable, because it discards information]
    (2) For numerical data such as heart rate, fill the missing values with the mean or median; for categorical data, use the mode
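A sketch of both options on a tiny made-up dataset. The "Ward" column is a hypothetical categorical column added only to show the mode; the median is used for the numeric column here, and the mean works the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical patient data; the "Ward" column is an assumed categorical example
df = pd.DataFrame({
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Ward": ["A", "B", "A", None, "A"],
})

dropped = df.dropna()    # option 1: discard incomplete rows

filled = df.copy()       # option 2: fill in the gaps
filled["Heart_Rate"] = filled["Heart_Rate"].fillna(filled["Heart_Rate"].median())
filled["Ward"] = filled["Ward"].fillna(filled["Ward"].mode()[0])   # mode() returns a Series

print(dropped)
print(filled)
```

Dropping loses two of the five rows here, which is why filling is usually preferred on small datasets.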

    📘 Pandas: Finding a Column Mean (Step-by-step explanation)

    ▶ Code

    heart_rate_mean = df["Heart_Rate"].mean()
    print(heart_rate_mean)

    1️⃣ df

    • What it is: your pandas DataFrame, i.e. a table of rows × columns.

    • Why it matters: a DataFrame has separate columns, and we run operations on those columns.

    • Example: df may contain several medical columns, including the patients' heart rate.


    2️⃣ df["Heart_Rate"]

    • What it does: selects the column named Heart_Rate from the DataFrame.

    • What it returns: a pandas Series (a 1-D labeled array).

    • Example output (Series form):

      0    72.0
      1    85.0
      2     NaN
      3    90.0
      4    76.0
      Name: Heart_Rate, dtype: float64

    3️⃣ .mean()

    • What it is: the pandas Series method that calculates the average (mean) of the column.

    • What it does internally: adds up all the non-missing numeric values and divides by how many there are.

      • Formula: (sum of non-NaN values) / (count of non-NaN values)

    • NaN handling: if the column contains np.nan (missing) values, pandas ignores them (they are not included in the denominator).

    • Return type: a single numeric value (float).


    4️⃣ heart_rate_mean =

    • What it does: stores the numeric result from .mean() in a variable named heart_rate_mean.

    • Why it matters: we store the value so it can be reused or printed later.


    5️⃣ print(heart_rate_mean)

    • What it does: shows the value of heart_rate_mean on the screen/console.

    • Output example:

      80.75

      (This is the value for the sample above, [72,85,NaN,90,76], i.e. (72+85+90+76)/4 = 80.75)


    🔎 Full Flow (all together)

    1. df["Heart_Rate"] → select the column (Series)

    2. .mean() → compute the average of the selected Series (ignoring NaN)

    3. result assign → store it in heart_rate_mean

    4. print(...) → show the result on the console


    ✅ Short Notes (Quick)

    • df["Col"].mean() = calculates the average of a column.

    • Missing values (NaN) are ignored automatically.

    • The output is a float; even if all the values are integers, the result is still a float.





    📘 Pandas – Filling Missing Values with the Mean (Step-By-Step Notes)



    Code:

    import numpy as np

    import pandas as pd

    data = {

    "Heart_Rate": [72, 85, np.nan, 90, 76],

    "Blood_Pressure": [120, 130, 125, np.nan, 118]

    }

    df = pd.DataFrame(data)

    heart_rate_mean = df['Heart_Rate'].mean()

    df['Heart_Rate'] = df['Heart_Rate'].fillna(heart_rate_mean)

    print(df)



    🧾 1) NumPy Import

    import numpy as np

    ✔ What is happening?
    • The NumPy library is loaded into Python.
    • It is given the short name np so the full name does not have to be typed every time.

    ❓ Why is it needed?
    • np.nan is used to represent missing values in the dataset.
    • NumPy also helps with maths operations.


    🧾 2) Pandas Import

    import pandas as pd

    ✔ What is happening?
    • The Pandas library is loaded and given the alias pd.

    ❓ Why is it needed?
    • Pandas is the main tool for creating, modifying, and analysing DataFrames.


    🧾 3) Building the Dataset (Dictionary Format)

    data = {
        "Heart_Rate": [72, 85, np.nan, 90, 76],
        "Blood_Pressure": [120, 130, 125, np.nan, 118]
    }

    ✔ What is happening?
    • A Python dictionary is created.
    • Keys = column names
      • "Heart_Rate"
      • "Blood_Pressure"
    • Values = lists
      • Each list represents one complete column.

    ❓ Point to note
    • np.nan = missing value (empty or unavailable data)
    • Missing values are common in real-world datasets.


    🧾 4) Dictionary → DataFrame Conversion

    df = pd.DataFrame(data)

    ✔ What is happening?
    • The dictionary is converted into a Pandas DataFrame.

    ❓ Result?
    • A table is created:

    Index | Heart_Rate | Blood_Pressure
    0 | 72 | 120
    1 | 85 | 130
    2 | NaN | 125
    3 | 90 | NaN
    4 | 76 | 118

    Analysis and cleaning operations can now be run on it.


    🧾 5) Computing the Mean of the Heart_Rate Column

    heart_rate_mean = df['Heart_Rate'].mean()

    ✔ What is happening?
    • df['Heart_Rate'] → selects the Heart_Rate column.
    • .mean() → computes its average.

    ❓ Missing Value Handling

    Pandas ignores NaN automatically when computing the mean.

    📌 Example Calculation

    Valid values = 72, 85, 90, 76
    Sum = 323
    Count = 4
    Mean = 80.75

    ❓ Where did the result go?
    • The value is stored in the variable heart_rate_mean.


    🧾 6) Filling the Missing Value with the Mean

    df['Heart_Rate'] = df['Heart_Rate'].fillna(heart_rate_mean)

    ✔ What is happening?
    • .fillna(heart_rate_mean) → puts the mean wherever there was a NaN.
    • df['Heart_Rate'] = ... → writes the modified column back into the DataFrame, overwriting the old one.

    ❓ Benefit?
    • The Heart_Rate column no longer has any missing values.
    • This is called Mean Imputation.


    📌 Topic: Replacing Missing Values with the Median


    Code:

    Heart_rate_median = df['Heart_Rate'].median()
    df["Heart_Rate"] = df["Heart_Rate"].fillna(Heart_rate_median)
    print(df)

    📘 Pandas – Handling Missing Data

    (Replacing Missing Values in a Column Using Median)


    🧾 ✅ Full Code

    import numpy as np
    import pandas as pd

    data = {
        "Heart_Rate": [72, 85, np.nan, 90, 76],
        "Blood_Pressure": [120, 130, 125, np.nan, 118]
    }
    df = pd.DataFrame(data)
    Heart_rate_median = df['Heart_Rate'].median()
    df["Heart_Rate"] = df["Heart_Rate"].fillna(Heart_rate_median)
    print(df)


    📌 Step-by-Step Explanation Notes

    🔷 1️⃣ Importing Libraries

    import numpy as np
    import pandas as pd

    👉 Explanation
    • numpy (np) → for numerical calculations and for representing missing values (np.nan)
    • pandas (pd) → for creating DataFrames and doing data analysis

    🔷 2️⃣ Creating a Dataset

    data = {
        "Heart_Rate": [72, 85, np.nan, 90, 76],
        "Blood_Pressure": [120, 130, 125, np.nan, 118]
    }

    👉 Explanation
    • data is a Python dictionary
    • It has two columns: "Heart_Rate" and "Blood_Pressure"
    • np.nan means a Missing / Not Available value

    🔷 3️⃣ Creating a DataFrame

    df = pd.DataFrame(data)

    👉 Explanation
    • The dictionary is converted into a DataFrame
    • df is now structured data in a table-like format

    🔷 4️⃣ Calculating Median of Heart Rate

    Heart_rate_median = df['Heart_Rate'].median()

    👉 Explanation
    • df['Heart_Rate'] → selects this column
    • .median() → ignores missing values and returns the middle value of Heart Rate (here, the middle of 72, 76, 85, 90 is 80.5)
    • The result is stored in the variable Heart_rate_median

    🔷 5️⃣ Replacing Missing Values

    df["Heart_Rate"] = df["Heart_Rate"].fillna(Heart_rate_median)

    👉 Explanation (Very Important)
    Two things happen in this line:
    ✔ Right Side: df["Heart_Rate"].fillna(Heart_rate_median) returns a new Series in which missing values are replaced by the median; the DataFrame itself is not changed yet
    ✔ Left Side: df["Heart_Rate"] = ... assigns the updated values back to the same column, so the DataFrame is now permanently updated

    Note: if we only called fillna() without assigning the result, the output would be temporary and the DataFrame would not change.

    🔷 6️⃣ Displaying Final Data

    print(df)

    👉 Explanation
    • The final updated DataFrame is printed
    • The missing Heart Rate value has now been replaced by the median

    🧠 Final Summary

    Step | What happens
    Create dictionary | Raw data is stored
    Convert to DataFrame | Table format is obtained
    Calculate median | Used to fill the missing values
    fillna() + assignment | DataFrame is permanently updated
    print(df) | Final clean dataset is shown

    NOTE: We will prefer the median over the mean, because the median is less affected by outliers.
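
    A minimal sketch of why the median is often the safer fill value. The sixth reading (300) is an assumed data-entry error added only for illustration; it is not part of the lecture dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: 300 is an assumed outlier, not from the original data.
heart_rate = pd.Series([72, 85, np.nan, 90, 76, 300])

mean_value = heart_rate.mean()      # pulled upward by the outlier
median_value = heart_rate.median()  # middle of the sorted non-NaN values

# fillna returns a new Series; heart_rate itself is left unchanged.
filled_with_mean = heart_rate.fillna(mean_value)
filled_with_median = heart_rate.fillna(median_value)

print("mean:", mean_value, "median:", median_value)
```

    With the outlier present, the mean-filled value lands far above every normal reading, while the median-filled value stays inside the typical range.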


    >> How to drop

    📘 Pandas – Handling Missing Data

    df.dropna(axis=0)

    (Removing Rows Containing Missing Values)


    🧾 Code

    df.dropna(axis=0)

    📌 Step-by-Step Explanation Notes


    🔷 1️⃣ What does dropna() do?

    • dropna() removes entries that contain missing (NaN) values from a DataFrame.

    🔷 2️⃣ Meaning of axis=0

    • axis=0 → operate on rows
      i.e. any row that contains even one NaN value is removed.

    🔷 3️⃣ If the DataFrame looks like this:

      Heart | BP
      72 | 120
      85 | 130
      NaN | 125
      90 | NaN
      76 | 118

      then:

      df.dropna(axis=0)

      gives this output:

      Heart | BP
      72 | 120
      85 | 130
      76 | 118

      because:

    • the rows that had missing values (NaN) were deleted.

    🔷 4️⃣ Important Point

    dropna() does not change the original DataFrame unless we pass:

    inplace=True

    Example:

    df.dropna(axis=0, inplace=True)

    Now the DataFrame is permanently updated. (Assigning the result back, df = df.dropna(axis=0), has the same effect.)


    ⭐ Final Summary

    Part | Meaning
    dropna() | Function that removes missing data
    axis=0 | Removes rows
    axis=1 | Removes columns
    inplace=False | Only returns the result; df is unchanged
    inplace=True | Changes the DataFrame permanently
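
    The points above can be sketched on the same small dataset used earlier. The subset= argument is an extra option of dropna() not covered in the notes; it is shown here only as an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Heart_Rate": [72, 85, np.nan, 90, 76],
    "Blood_Pressure": [120, 130, 125, np.nan, 118],
})

rows_dropped = df.dropna(axis=0)   # removes rows 2 and 3 (each has a NaN)
cols_dropped = df.dropna(axis=1)   # both columns contain a NaN, so none survive
hr_only = df.dropna(subset=["Heart_Rate"])  # only checks Heart_Rate for NaN

print(rows_dropped)
print("original rows still present:", len(df))  # df itself was not modified
```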

    📘 Pandas – Detecting Duplicate Rows

    df.duplicated()

    (How to identify duplicate rows)


    🧾 Code

    df.duplicated()

    📌 Step-by-Step Explanation Notes


    🔷 1️⃣ What does duplicated() do?

    • It scans the whole DataFrame row by row
    • and reports which rows are duplicates of an earlier row (by default the first occurrence is not marked)
    • The output is a Boolean Series:

    Value | Meaning
    False | Row is unique (or the first occurrence)
    True | Row is a duplicate
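
    A small sketch of the Boolean Series described above. The DataFrame and its column names are made up for illustration; drop_duplicates() is the companion method that removes the rows duplicated() marks as True.

```python
import pandas as pd

# Hypothetical data: rows 2 and 4 repeat rows 1 and 0 exactly.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 1],
    "city": ["Delhi", "Pune", "Pune", "Agra", "Delhi"],
})

mask = df.duplicated()          # first occurrence -> False, later repeats -> True
n_duplicates = int(mask.sum())  # True counts as 1
deduped = df.drop_duplicates()  # keeps only the first occurrence of each row

print(mask.tolist())
print(deduped)
```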

    📘 Pandas – Merging Two DataFrames

    pd.merge(data1, data2, on="customer_id", how="inner")


    🧾 Code

    merge_df = pd.merge(data1, data2, on="customer_id", how="inner")

    🔷 Step-by-Step Explanation

    pd.merge(...)

    • A Pandas function
    • Joins two DataFrames on a common column
    • Works like a SQL JOIN

    data1, data2

    • These are the two DataFrames being merged

    on = "customer_id"

    • This parameter says which common column to merge on

    Meaning:

    • Both DataFrames must have a customer_id column
    • Rows are paired up based on matching values in this column

    how = "inner"

    • This is the join type
    • inner join means:
      • only rows whose key matches in both DataFrames appear in the result

    merge_df = ...

    • The final DataFrame produced by the merge
    • is saved in the variable merge_df

    ⭐ Final Summary

    Part | Meaning
    pd.merge() | Joins two DataFrames
    data1, data2 | The two datasets being merged
    on="customer_id" | Column to join on
    how="inner" | Only rows with matching values are kept
    merge_df | Output variable

    📘 Pandas – Merging Two DataFrames (left_on & right_on)

    ✅ Code

    merged_df = pd.merge(
        data1,
        data2,
        left_on="customer_id",
        right_on="customer_id",
        how="inner"
    )

    🔷 Step-by-Step Explanation


    merged_df = ...

    • The final output of the merge is stored in a new variable, merged_df.
    • This variable can be printed or analysed later.


    pd.merge(...)

    • The Pandas function that joins two DataFrames
    • Behaves like a SQL JOIN.


    data1, data2

    • These are the two DataFrames being merged.


    left_on="customer_id"

    • Says which column of the first DataFrame (data1) is used for matching.


    right_on="customer_id"

    • Says which column of the second DataFrame (data2) is used for matching.


    🔔 Why use left_on and right_on?

    • They are needed when the key column has different names in the two DataFrames:

    DataFrame | Column Name
    data1 | customer_id
    data2 | cust_id

    In this example both names are the same; writing left_on/right_on is still allowed, but on="customer_id" would be simpler.


    how="inner"

    • Gives the join type.
    • inner join means:

    A row is kept in the output | When?
    Yes | customer_id matches in both DataFrames

    If a value exists in only one DataFrame → it does not appear in the result.


    ⭐ Final Summary (One Shot Table)

    Part | Meaning
    pd.merge() | Joins two DataFrames
    left_on | Column of the first DF to join on
    right_on | Column of the second DF to join on
    how="inner" | Only matching rows are kept
    merged_df | Final output DataFrame
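
    A minimal sketch of the case where left_on / right_on are actually required, i.e. the key has different names. The names, amounts and the cust_id column are hypothetical sample data.

```python
import pandas as pd

# Hypothetical tables: the key is customer_id on the left and cust_id on the right.
data1 = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
data2 = pd.DataFrame({"cust_id": [2, 3, 4], "amount": [500, 750, 900]})

merged_df = pd.merge(
    data1, data2,
    left_on="customer_id",   # key column in data1
    right_on="cust_id",      # key column in data2
    how="inner",             # keep only keys present in both
)

print(merged_df)
```

    Both key columns are kept in the result, so one of them can be dropped afterwards if it is not needed.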

    1️⃣ pd.concat() – What does it do?

    concat() is used to join two or more DataFrames either one below the other (row-wise) or side by side (column-wise).

    ✔ Syntax

    pd.concat([df1, df2], axis=0)

    ✔ What does axis mean?

    • axis=0 → appends rows underneath (default)
    • axis=1 → appends columns at the side

    📘 pd.concat() Example

    import pandas as pd

    df1 = pd.DataFrame({
        "ID": [1, 2, 3],
        "Name": ["A", "B", "C"]
    })
    df2 = pd.DataFrame({
        "ID": [4, 5, 6],
        "Name": ["D", "E", "F"]
    })
    result = pd.concat([df1, df2], axis=0)
    print(result)

    🧠 What happens?

    • The rows of df1 and df2 are stacked one below the other.
    • The columns should be the same (if they differ, the missing positions are filled with NaN).


    📝 pd.concat() Notes

    • Works like "UNION ALL" in SQL (duplicate rows are kept)
    • The result has more rows (axis=0) or more columns (axis=1) than each input
    • Joins row-wise by default
    • With axis=1 it joins column-wise
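
    A short sketch of the NaN behaviour mentioned above when the columns differ. The City column is hypothetical, and ignore_index=True is an extra option (not used in the notes) that renumbers the rows 0..n-1 instead of repeating each input's index.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2], "Name": ["A", "B"]})
df2 = pd.DataFrame({"ID": [3, 4], "City": ["Pune", "Agra"]})  # no Name column

# Row-wise stack; columns missing in one input are filled with NaN.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

print(stacked)
```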

    🚀 2️⃣ merge() – What does it do?

    pd.merge() joins two DataFrames on a common column / key.

    Just as SQL has:

    • INNER JOIN
    • LEFT JOIN
    • RIGHT JOIN
    • FULL JOIN

    Pandas merge supports the same kinds of joins.


    📘 Syntax

    pd.merge(df1, df2, on="column_name", how="inner")

    ✔ Options for how:

    how | What it does
    inner (default) | Keeps only rows whose key is in both DataFrames
    left | Keeps all rows of the left DF
    right | Keeps all rows of the right DF
    outer | Keeps all rows of both DFs

    📘 Example

    merged_df = pd.merge(df1, df2, on="customer_id", how="inner")

    🧠 What is happening?

    • customer_id is common to both tables
    • Only rows whose customer_id appears in both tables are returned
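
    To compare the four how options side by side, here is a small sketch that counts the result rows for each join type. The two tables are hypothetical: keys 2 and 3 are shared, key 1 exists only on the left and key 4 only on the right.

```python
import pandas as pd

df1 = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
df2 = pd.DataFrame({"customer_id": [2, 3, 4], "amount": [500, 750, 900]})

# Number of rows produced by each join type.
row_counts = {
    how: len(pd.merge(df1, df2, on="customer_id", how=how))
    for how in ["inner", "left", "right", "outer"]
}

print(row_counts)
```

    Rows that have no partner (for left, right and outer) get NaN in the columns coming from the other table.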

    🔗 3️⃣ DataFrame.join() – What does it do?

    join() also combines DataFrames, but:

    • By default it joins on the index
    • To join on a common column, that column is first set as the index

    📘 Example

    df1.join(df2)

    🧠 When to use it?

    • When the index is meaningful in both DataFrames
    • When you want to put tables side by side quickly

    📝 join() Notes

    • Index-based join by default
    • Behaves like a SQL LEFT JOIN by default
    • For a column-based join:

      df1.set_index("id").join(df2.set_index("id"))
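
    The column-based join pattern above can be sketched as follows. The id, name and score values are hypothetical sample data.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# Move the key into the index on both sides, then join on the index.
# join() is a left join by default: every row of df1 is kept.
joined = df1.set_index("id").join(df2.set_index("id"))

print(joined)
```

    id 1 has no match in df2, so its score is NaN; id 4 exists only in df2, so it does not appear.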

    ⭐ FINAL COMPARISON TABLE

    Operation | Similar to SQL | Join Basis | Output Shape
    pd.concat() | UNION ALL | No key needed | Rows/columns are added
    merge() | INNER/LEFT/RIGHT/OUTER JOIN | Column (key) based | Depends on matched / unmatched rows
    join() | LEFT JOIN | Index based | Column-wise merge

    📘 Pandas – Pivot Table Code Explanation (Step-By-Step Notes)

    🔹 Goal

    In this code:

    • A sales dataset is created
    • Total Sale is calculated
    • A Pivot Table is built to see how many orders of each product every salesperson handled

    1️⃣ Importing Pandas Library

    import pandas as pd

    📌 Explanation:

    • The Pandas library is imported so it can be used in Python.
    • Pandas is the key library for data analysis (building DataFrames, filtering, pivot tables, etc.).

    2️⃣ Creating the Dataset

    data = {
        "Order_ID": [101,102,103,104,105,106,107,108,109,110],
        "Date": ["2025-01-10","2025-01-11","2025-01-12","2025-01-13","2025-01-14",
                 "2025-01-15","2025-01-16","2025-01-17","2025-01-18","2025-01-19"],
        "Region": ["North","South","East","North","West","South","West","East","North","South"],
        "Salesperson": ["Amit","Rohan","Suman","Amit","Neha","Rohan","Neha","Suman","Amit","Rohan"],
        "Product": ["Laptop","Mobile","Laptop","Tablet","Mobile","Laptop","Tablet","Mobile","Mobile","Tablet"],
        "Quantity": [2,5,1,3,4,2,6,3,4,2],
        "Price": [55000,15000,55000,12000,15000,55000,12000,15000,15000,12000],
    }

    📌 Explanation:

    • data is a Python dictionary
    • The dictionary keys are the columns, e.g. "Order_ID", "Salesperson"
    • The value of each key is a list holding that column's data

    3️⃣ Converting Dictionary to DataFrame

    df = pd.DataFrame(data)

    📌 Explanation:

    • The dictionary is converted into a Pandas DataFrame
    • A DataFrame is a table structure, similar to an Excel sheet
    • The whole dataset is stored in the variable df

    4️⃣ Creating a New Column – Total Sale

    df["Total_Sale"] = df["Quantity"] * df["Price"]

    📌 Explanation:

    • A new column Total_Sale is created
    • Calculation for each order: Total_Sale = Quantity × Price
    • Example:
      2 × 55000 = 110000

    The DataFrame now has one more column.

    5️⃣ Creating Pivot Table

    pivot = pd.pivot_table(
        df,
        values="Order_ID",
        index="Salesperson",
        columns="Product",
        aggfunc="count"
    )

    🔎 Explanation Line-By-Line

    pd.pivot_table()

    • The Pandas function that summarises data
    • Groups the data into rows and columns and returns a summary

    values="Order_ID"

    • The column the calculation is run on
    • Here the orders are being counted

    index="Salesperson"

    • The rows of the pivot table are grouped by Salesperson

    columns="Product"

    • The products appear as columns
      • Laptop
      • Mobile
      • Tablet

    aggfunc="count"

    • Calculation method = Count
    • Meaning:
      how many times each salesperson sold each product (combinations with no orders show NaN)



    6️⃣ Printing Output

    print(pivot)
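
    A runnable sketch of the pivot table above, using only the three columns it needs from the lecture dataset. fill_value=0 is an extra option (not in the original code) that shows 0 instead of NaN for a salesperson/product pair with no orders.

```python
import pandas as pd

df = pd.DataFrame({
    "Order_ID": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    "Salesperson": ["Amit", "Rohan", "Suman", "Amit", "Neha",
                    "Rohan", "Neha", "Suman", "Amit", "Rohan"],
    "Product": ["Laptop", "Mobile", "Laptop", "Tablet", "Mobile",
                "Laptop", "Tablet", "Mobile", "Mobile", "Tablet"],
})

pivot = pd.pivot_table(
    df,
    values="Order_ID",     # column being counted
    index="Salesperson",   # one row per salesperson
    columns="Product",     # one column per product
    aggfunc="count",
    fill_value=0,          # assumption: show 0 instead of NaN for empty cells
)

print(pivot)
```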

    📘 Notes – Laptop Sales Count using Groupby & Filter

    🧠 Code

    df[df["Product"] == "Laptop"].groupby("Salesperson")["Order_ID"].count()

    df

    • df is our full DataFrame containing all sales records.
    • The filtering and grouping are done on it.

    df["Product"] == "Laptop"

    • Checks every row to see whether the product is "Laptop".
    • The output is a Boolean Series:

    0     True
    1    False
    2     True

    df[df["Product"] == "Laptop"]

    • Uses the Boolean Series to filter the DataFrame.
    • Only the rows whose product is "Laptop" remain.

    The filtered data looks something like this:

    Order_ID | Product | Salesperson
    101 | Laptop | Amit
    103 | Laptop | Suman
    106 | Laptop | Rohan

    .groupby("Salesperson")

    • Splits the filtered rows into groups, one per Salesperson:

    Amit → his Laptop sales
    Rohan → his Laptop sales
    Suman → her Laptop sales

    NOTE: a groupby needs three things: the column to group by, the column to aggregate, and the aggregation function.
    Aggregations we can use = max/min/count/sum/var/mean/median

    ["Order_ID"].count()

    🔍 Meaning

    • Counts the Order_ID values in each Salesperson group, i.e. the number of laptop orders per salesperson
    • count() skips missing (NaN) values
    • Using .sum() on Order_ID instead would add the IDs together if they are numeric, or concatenate them if they are strings, so the result would not be meaningful
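
    A small sketch of the filter → groupby → aggregate chain described above, on a five-row extract of the lecture dataset. Counting Order_ID gives the number of orders; summing Quantity is shown as an example of an aggregation where sum is meaningful.

```python
import pandas as pd

df = pd.DataFrame({
    "Order_ID": [101, 102, 103, 104, 106],
    "Salesperson": ["Amit", "Rohan", "Suman", "Amit", "Rohan"],
    "Product": ["Laptop", "Mobile", "Laptop", "Tablet", "Laptop"],
    "Quantity": [2, 5, 1, 3, 2],
})

laptops = df[df["Product"] == "Laptop"]   # Boolean filter

# Number of laptop orders per salesperson.
orders_per_person = laptops.groupby("Salesperson")["Order_ID"].count()

# Number of laptop units per salesperson (sum makes sense for Quantity).
units_per_person = laptops.groupby("Salesperson")["Quantity"].sum()

print(orders_per_person)
print(units_per_person)
```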

    📌 Pandas: Converting to Date Using pd.to_datetime()

    (For converting a Date column into a proper date format)

    🧠 Syntax

    pd.to_datetime(df["Date"])

    ✔ Step-by-Step Explanation

    df["Date"]

    • Takes the column named Date from the DataFrame df
    • The values in this column are in text/string format.
    • Example:

      "2025-01-10"
      "2025-01-11"

    pd.to_datetime(...)

    • A built-in Pandas function.
    • Its job is to convert a column of date strings into a proper datetime format.

    In other words text → real timestamp, which Python can treat as a date.


    ③ What does it return?

    It returns a Pandas Series in which:

    ✔ Year
    ✔ Month
    ✔ Day
    ✔ Time (if available)

    are properly parsed. The original column is not changed unless the result is assigned back, e.g. df["Date"] = pd.to_datetime(df["Date"]).


    📊 Example

    Input Column

    "2025-01-10"
    "2025-01-11"
    "2025-01-12"

    Output after conversion

    2025-01-10 00:00:00
    2025-01-11 00:00:00
    2025-01-12 00:00:00

    ⭐ Why is it needed?

    Because after converting to datetime format:

    ✔ Sorting works correctly
    ✔ Filtering is possible (e.g., df[df["Date"] > '2025-01-15'])
    ✔ Week, Month, Year can be extracted
    ✔ Time-based analysis can be done (resample(), groupby(), etc.)

    🗓 Pandas: Extracting Year from Date Column

    Code:

    pd.to_datetime(df["Date"]).dt.year

    ✔ Step-by-Step Explanation


    df["Date"]

    • Selects the Date column from the DataFrame df.
    • The dates in this column are still in string/text format.
    • Example:

      "2025-01-10"
      "2025-03-12"

    pd.to_datetime(df["Date"])

    • The Pandas function that converts string dates into a proper datetime format.
    • Pandas can now treat them as dates.

    Example conversion:

    "2025-01-10" → 2025-01-10 00:00:00

    .dt.year

    • .dt is the Pandas date/time accessor.
    • .year extracts the year from it.

    Meaning:

    2025-01-10 → 2025

    🧠 What is the final output?

    Suppose the Date column looks like this:

    Date
    2025-01-10
    2024-03-12
    2023-08-05

    Then the code produces:

    Year
    2025
    2024
    2023

    ⭐ Why is it used?

    It makes these easy:

    ✔ Year-wise Analysis
    ✔ Year-wise Grouping
    ✔ Time-based filtering
    ✔ Trend visualization


    📌 Short Notes Summary (Exam Style)

    • pd.to_datetime() → converts a Date column into real datetime format
    • .dt → gives access to date-related properties
    • .year → extracts only the year
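
    A minimal sketch tying together conversion, year extraction, filtering and year-wise grouping. The Sales column and its values are hypothetical; the three dates are the ones from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2025-01-10", "2024-03-12", "2023-08-05"],
    "Sales": [100, 200, 300],          # hypothetical values
})

df["Date"] = pd.to_datetime(df["Date"])   # string -> datetime64, assigned back
df["Year"] = df["Date"].dt.year           # integer year per row

recent = df[df["Date"] > "2024-01-01"]    # date filter works after conversion
sales_by_year = df.groupby("Year")["Sales"].sum()

print(df)
print(sales_by_year)
```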

    📌 df.sort_index() – What does it do?

    sort_index() sorts a Pandas DataFrame by its index, in ascending order.

    How does it work?

    👉 Step 1:

    df – your DataFrame.

    👉 Step 2:

    .sort_index()
    Arranges the index (row labels) from smallest → largest.

    ⭐ Important Points

    ✔ Default ascending = True

    i.e. the index is sorted from smallest to largest, top to bottom.

    ✔ Sorting is by index only

    The values in the columns do not change; only the order of the rows changes. The sorted DataFrame is returned as a new object.

    ✔ Need descending order?

    df.sort_index(ascending=False)
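
    A tiny sketch of both sort directions, on a hypothetical DataFrame whose index is deliberately out of order.

```python
import pandas as pd

# Hypothetical data with an unsorted index.
df = pd.DataFrame({"value": [30, 10, 20]}, index=[3, 1, 2])

asc = df.sort_index()                   # index 1, 2, 3
desc = df.sort_index(ascending=False)   # index 3, 2, 1

print(asc)
print(desc)
```

    Each value stays attached to its own index label; df itself keeps its original order because sort_index() returns a new DataFrame.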

    📘 Notes – pd.date_range()


    🧠 What is pd.date_range()?

    A Pandas function that generates a continuous sequence of dates/times.

    🧾 Code

    pd.date_range(start='2020-01-01', periods=20, freq='H')

    🧩 Step-by-Step Explanation

    ✔️ 1️⃣ pd.date_range()

    • A Pandas function
    • Used to create continuous date/time values


    ✔️ 2️⃣ start='2020-01-01'

    • The starting point of the sequence
    • Default time: 00:00:00

    Start timestamp:

    2020-01-01 00:00:00

    ✔️ 3️⃣ periods=20

    • How many timestamps to generate
    • 20 values are created in total


    ✔️ 4️⃣ freq='H'

    • Frequency = Hourly
    • A gap of 1 hour between consecutive values
    • Note: from pandas 2.2 the uppercase 'H' alias is deprecated; use lowercase 'h' on newer versions


    Expected Output Pattern

    2020-01-01 00:00:00
    2020-01-01 01:00:00
    2020-01-01 02:00:00
    ...
    2020-01-01 19:00:00

    Total = 20 timestamps


    Quick Summary Table

    Parameter | Meaning
    start | Starting date/time
    periods | Total number of timestamps
    freq | Frequency of intervals

    🧠 Common Frequency Codes

    Code | Meaning
    D | Daily
    H | Hourly
    W | Weekly
    M | Month-end
    MS | Month-start
    Y | Yearly (year-end)
    S | Seconds

    Note: in pandas 2.2 and later several of these aliases are deprecated and renamed: H → h, S → s, M → ME, Y → YE.
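
    A short sketch of the hourly range from the example. It uses the lowercase 'h' alias on the assumption that a recent pandas version is installed; the daily range is added only as a second illustration.

```python
import pandas as pd

# 20 hourly timestamps starting at midnight on 2020-01-01.
hourly = pd.date_range(start="2020-01-01", periods=20, freq="h")

# 5 daily timestamps from the same start, for comparison.
daily = pd.date_range(start="2020-01-01", periods=5, freq="D")

print(hourly[0], "->", hourly[-1])
print(daily)
```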


    📘 Python Notes – Adding 2 Months in Date (Using timedelta)

    📌 Code

    # shift every date forward by about 2 months
    from datetime import datetime, timedelta
    df["Date"] = pd.to_datetime(df["Date"])
    future_date_after_2mnth = df["Date"] + timedelta(days=60)
    print(future_date_after_2mnth)

    📝 Step-by-Step Explanation


    1️⃣ from datetime import datetime, timedelta

    • Imports from the datetime module
    • timedelta is a class that represents a time difference
    • With it we can add or subtract days, seconds, hours, weeks


    2️⃣ df["Date"] = pd.to_datetime(df["Date"])

    • Converts the DataFrame's Date column from string to a real date/time format
    • So that Python can treat it as a date
    • All date operations (add, subtract, extract) become possible after this step

    Example:

    "2025-01-10" → 2025-01-10 00:00:00 (datetime format)

    3️⃣ timedelta(days=60)

    • Creates a timedelta object that represents 60 days
    • Here we are assuming:

      2 months ≈ 60 days

    Note: timedelta cannot add months directly, which is why "60 days" is used here.


    4️⃣ future_date_after_2mnth = df["Date"] + timedelta(days=60)

    • This line adds 60 days to every date in the DataFrame
    • Example:

      Old Date: 2025-01-10, +60 days = 2025-03-11
    • The result is a new Series holding the updated future dates


    5️⃣ print(future_date_after_2mnth)

    • Shows the new dates on the screen


    📌 Final Understanding

    Step | What was done
    Import | Imported datetime & timedelta
    Convert | Converted the Date column to datetime format
    Create Difference | Created a time-difference object of 60 days
    Apply | Added 60 days to every date
    Print | Displayed the new future dates

    Important Concept

    • timedelta works with fixed-length units only: weeks, days, hours, minutes, seconds (and smaller); it has no months or years
    • To add months, this is the better option:

      pd.DateOffset(months=2)

    📘 Pandas: Adding 2 Months to a Date Column

    df["Date"] = df["Date"] + pd.DateOffset(months=2)
    print(df["Date"])

    🔍 Step-by-Step Explanation


    df["Date"]

    • df is our DataFrame.
    • df["Date"] selects the Date column of the DataFrame.
    • This column holds all the existing dates (already converted to datetime).

    Example:

    2025-01-10
    2025-01-11
    2025-01-12

    pd.DateOffset(months=2)

    • DateOffset is a pandas class.
    • It describes how much time to add to a date.
    • Here months=2 is given, meaning 2 calendar months are added, not a number of days.

    If the date is:

    2025-01-10

    Then after adding 2 months:

    2025-03-10

    The day of the month stays the same and only the month moves forward by 2. (If the target month is shorter, the day is clipped to its last day, e.g. 31 December + 2 months = 28 February.)


    df["Date"] = df["Date"] + pd.DateOffset(months=2)

    • Here we:
      • add 2 months to the old date column
      • and overwrite (update) the Date column with the result
    • Meaning:

      old date + 2 months → new date

    print(df["Date"])

    • Prints only the updated Date column.
    • Every date in the output is moved forward by 2 months.


    ✔ Why DateOffset Instead of timedelta?

    • timedelta works in fixed units such as days.
    • DateOffset also works in months/years, so the result follows the calendar instead of a fixed day count.
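
    A side-by-side sketch of the two approaches. The second date (2025-12-31) is a hypothetical extra value chosen to show what happens at a month end.

```python
from datetime import timedelta

import pandas as pd

dates = pd.Series(pd.to_datetime(["2025-01-10", "2025-12-31"]))

plus_60_days = dates + timedelta(days=60)         # fixed 60-day shift
plus_2_months = dates + pd.DateOffset(months=2)   # calendar-aware shift

print(pd.DataFrame({
    "original": dates,
    "plus_60_days": plus_60_days,
    "plus_2_months": plus_2_months,
}))
```

    For 2025-01-10 the two results differ by one day; for 2025-12-31 the DateOffset result is clipped to the last day of February, while the 60-day shift runs into March.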


    Data Visualisation ----> Data Analysis ----------> Insights/Patterns


    NOTE: we have 3 options for visualisation in Python (the three libraries imported below)


    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    import pandas as pd

    • Imports pandas and gives it the short name pd
    • Mainly used for data analysis and DataFrame handling

    import seaborn as sns

    • seaborn is an advanced data visualisation library
    • It is used under the short name sns

    import matplotlib.pyplot as plt

    • This is the basic graph plotting library
    • It is used under the name plt


    Line Plot - Trend Analysis Using Pandas (Example: Blood Pressure)

    Purpose of Line Plot:

    • Used for trend analysis in data.
    • Best for visualising ups and downs, patterns, and changes over time.
    • Example: Blood pressure, temperature, sales, stock price.

    1. Importing the Libraries (Function & Technique)

    import pandas as pd
    import matplotlib.pyplot as plt

    Explanation:

    • pandas → library for data manipulation and analysis.
    • matplotlib.pyplot → library for graphs and visualisation.
    • Technique: import the libraries so that plotting and data handling are ready to use.


    2. Creating the Line Plot (Function & Technique)

    Code:
    df["Blood_Pressure"].plot(kind="line")

    Explanation:

    • .plot() → Pandas' built-in plotting function.
    • kind="line" → creates a line chart.
    • Technique: use a line chart for time series or trend analysis.

    Result:

    • X-axis → Index (row numbers or time points)
    • Y-axis → Blood Pressure values
    • The line shows the trend and the ups and downs


    📘 Python Notes – Line Plot for Trend Analysis (Blood Pressure)

    📌 Code

    df["Blood_Pressure"].plot(
        kind="line",
        figsize=(5,3),
        xlabel="Patient Id",
        ylabel="Blood Pressure",
        title="Trend of Blood Pressure",
        linestyle="--",
        marker="o",
        grid=True
    )

    📝 Step-by-Step Explanation

    ✔ 1️⃣ df["Blood_Pressure"]

    • Selects the Blood_Pressure column
    • Technique: extract the relevant column that will be plotted

    ✔ 2️⃣ .plot(kind="line")

    • .plot() → Pandas' built-in plotting function
    • kind="line" → creates a line chart
    • Technique: visualise the trend and the ups and downs

    ✔ 3️⃣ figsize=(5,3)

    • Sets the size of the graph (width=5 inch, height=3 inch)
    • Technique: adjust the size to keep the plot readable and compact

    ✔ 4️⃣ xlabel="Patient Id"

    • Sets the label of the X-axis
    • Technique: state clearly what the axis means

    ✔ 5️⃣ ylabel="Blood Pressure"

    • Sets the label of the Y-axis
    • Technique: state clearly what the axis means

    ✔ 6️⃣ title="Trend of Blood Pressure"

    • Sets the heading of the graph
    • Technique: show the summary and purpose of the graph

    ✔ 7️⃣ linestyle="--"

    • Gives the line a dashed style
    • Technique: make the trend visually distinct and readable
    • Common linestyles in Python/Matplotlib:

      Style | Description | Example
      "-" | Solid line | linestyle='-'
      "--" | Dashed line | linestyle='--'
      "-." | Dash-dot line | linestyle='-.'
      ":" | Dotted line | linestyle=':'

    ✔ 8️⃣ marker="o"

    • Puts a circle marker on every data point of the line
    • Technique: highlight the points so the trend is clearly visible

    ✔ 9️⃣ grid=True

    • Adds grid lines to the graph
    • Technique: makes the data easier to compare and read

    📌 Final Understanding

    Step | What was done
    Column Select | Selected the Blood_Pressure column
    Plot Type | .plot(kind="line") → created a line chart
    Figure Size | figsize=(5,3) → set the graph size
    X-axis Label | xlabel="Patient Id" → labelled the X-axis
    Y-axis Label | ylabel="Blood Pressure" → labelled the Y-axis
    Title | title="Trend of Blood Pressure" → set the graph heading
    Line Style | linestyle="--" → dashed line
    Marker | marker="o" → circle on each data point
    Grid | grid=True → added grid lines

    Important Concept

    • A line plot makes trends and patterns easy to see
    • figsize, xlabel, ylabel, title → make the graph informative and presentation-ready
    • Pandas .plot() → ideal for quick plotting





    📘 Python Notes – Line Plot using Matplotlib (plt.plot)

    📌 Code

    import matplotlib.pyplot as plt

    # Simple line plot
    plt.plot(df["Blood_Pressure"])
    plt.show()

    📝 Step-by-Step Explanation

    ✔ 1️⃣ import matplotlib.pyplot as plt

    • Imports the pyplot module of the Matplotlib library
    • The plotting functions are then called through the alias plt
    • Technique: set up the plotting environment in Python


    ✔ 2️⃣ plt.plot(df["Blood_Pressure"])

    • plt.plot() → Matplotlib's basic line plotting function
    • df["Blood_Pressure"] → the data series passed in for the line plot
    • Technique: create a line chart to visualise a trend or pattern

    Example:
    Index → 0, 1, 2, …
    Blood Pressure → 120, 125, 118, …
    Plot → the line shows the ups and downs of the data


    ✔ 3️⃣ plt.show()

    • Used to display the graph on the screen
    • Technique: finalise the graph so it is rendered as visual output


    📌 Optional Customizations (with Matplotlib)

    plt.figure(figsize=(5,3))
    plt.plot(df["Blood_Pressure"], color='red', linestyle='--', marker='o')
    plt.title("Blood Pressure Trend")
    plt.xlabel("Patient Index")
    plt.ylabel("Blood Pressure")
    plt.grid(True)
    plt.show()

    • color='red' → sets the line colour
    • linestyle='--' → dashed line
    • marker='o' → highlights the data points
    • title, xlabel, ylabel → make the graph informative
    • plt.grid(True) → adds grid lines

    📌 Final Understanding

    Step | Function/Parameter | Purpose
    Import | import matplotlib.pyplot as plt | Set up the plotting environment
    Plot Data | plt.plot(df["Blood_Pressure"]) | Create the line chart
    Display | plt.show() | Show the graph on screen
    Optional | color, linestyle, marker, title, xlabel, ylabel, grid | Make the graph readable and informative

    ⭐ Important Concept

    • plt.plot() → Matplotlib's most basic line plotting function
    • plt.show() → displays the graph; it is required when running as a script and is always written last
    • For advanced customisation, color, linestyle, marker, labels and grid can be used


    📌 Histogram Plot Using Pandas– df["Treatment_Cost"].plot(kind="hist")

    df["Treatment_Cost"].plot( kind ="hist", figsize = (5,3), bins = [0,5000,1000,1500,2000], #iske bina bhi graph banega but we should use this edgecolor = "black" )

    🟥 1️⃣ Yeh Graph Ka Purpose (Histogram)

    ✔ Histogram Used For:

    • Numerical data ka distribution dekhne ke liye

    • Data kis range me kitna spread hai?

    • Kaunsi values ज्यादा होती हैं और kaunsi कम?

    • Data skewed (tilted) hai ya normal distribution ke jaisa hai?

    Example:
    Hospital data me yeh samajhna:

    • Kitne patients low treatment cost wale hain?

    • Mid range me kitne hain?

    • High cost patients kitne hain?


    🟦 2️⃣ Feature Used – .plot(kind="hist")

    .plot()

    Pandas ka built-in visualization function.

    kind="hist"

    Tells pandas that we want a Histogram.


    🟩 3️⃣ Parameter Explanation

    ✔ (A) figsize = (5,3)

    • Graph window ka size set karta hai.

    • Format → (width, height)

    ✔ (B) bins = [...]

    🔷 Meaning:

    • Bins define karte hain value ranges

    • Har bin me ghar data girta hai → frequency count hota hai

    🔷 Without bins:

    • Pandas automatically bin size choose karega.

    🔷 With custom bins:

    bins = [0, 5000, 1000, 1500, 2000]

    Matlab bins:

    • 0 – 5000

    • 5000 – 1000

    • 1000 – 1500

    • 1500 – 2000

    ⭐ Bins Ka Use:

    • Business analytics me customer segmentation

    • Finance me expense group analysis

    • Medical data me risk grouping


    ✔ (C) edgecolor="black"

    • Har bar ke border ko black color deta hai

    • Graph aur readable ho jata hai


    🟨 4️⃣ Histogram Kaha Use Hota Hai

    ✔ Real-life Use Cases

    1️⃣ Income distribution
    2️⃣ Product price range analysis
    3️⃣ Medical treatment cost grouping
    4️⃣ Age-group analysis
    5️⃣ Student marks distribution
    6️⃣ Transaction amount analysis


    🟫 5️⃣ What Insight We Get

    • Kitne log low-cost treatment wale?

    • Kitne mid-range wale?

    • Kitne high-cost wale?

    • Data ka spread kaisa hai—normal, skewed or uneven?


    ⭐ Final Summary Notes (Short Version)

    Topic | Explanation
    Chart Type | Histogram (note: a histogram shows only counts)
    Purpose | Check the distribution of numerical data; count per bucket (bin)
    Library | Pandas
    Function | df["column"].plot(kind="hist")
    Bins | Define the value ranges
    Edgecolor | Highlights the border of each bar
    Use Cases | Cost grouping, marks analysis, income distribution



    📘 Python Notes – Histogram Plot using plt.hist() (Matplotlib)


    🧠 1️⃣ What This Code Does?

    This code draws a histogram of the Cholesterol column, showing the data as a frequency distribution:


    plt.hist(df["Cholesterol"], bins = [0,100,200,300], edgecolor = 'black')
    plt.show()


    🧵 2️⃣ Line-by-Line Explanation


    Line 1: plt.hist(...)

    plt.hist(df["Cholesterol"], bins = [0,100,200,300], edgecolor = 'black')

    🔹 Function Used: plt.hist()

    • This is a Matplotlib function
    • Used to draw a Histogram
    • Histogram shows distribution of continuous numerical data
    • Counts how many data points fall into each bin

    🔧 Parameters Explanation

    df["Cholesterol"]

    • The dataset column passed to the function

    • This is the column whose histogram is drawn

    bins = [0,100,200,300]

    • Defines the ranges of the data groups:

      • 0 – 100

      • 100 – 200

      • 200 – 300

    • Values outside these edges (for example, a cholesterol reading above 300) are not counted in any bin
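
A quick numeric check of these bins, as a sketch. The cholesterol values are copied from the 20-patient dataset that appears later in these notes, and np.histogram is an assumption standing in for the counting that a histogram performs. It shows that a reading above the last edge (315 here) is left out of every bin.

```python
import numpy as np

# Cholesterol column from the 20-patient dataset used later in these notes.
cholesterol = [240, 220, 290, 180, 245, 170, 300, 225, 260, 190,
               255, 210, 230, 195, 315, 160, 275, 215, 265, 165]

counts, edges = np.histogram(cholesterol, bins=[0, 100, 200, 300])

# Values larger than the last edge fall outside every bin.
not_counted = [value for value in cholesterol if value > edges[-1]]

print(counts)       # frequency of the bins 0-100, 100-200, 200-300
print(not_counted)  # readings that no bar represents
```

If every reading should appear in the chart, the last edge has to be at least as large as the maximum value in the column.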

    📘 Python Notes – Bar Chart using value_counts().plot(kind="bar")


    🧠 1️⃣ What This Code Does?

    df["Smoking"].value_counts().plot(kind="bar")
    plt.show()

    This code draws a bar chart of the Smoking column, showing the count of each category (Yes/No, Smoker/Non-Smoker) in the graph.

    🧵 2️⃣ Step-by-Step Explanation


    Step 1: df["Smoking"]

    • Selects the Smoking column of the DataFrame.

    • This column holds values such as:

      • Yes

      • No

      • Occasional

      • etc.

    Step 2: value_counts()

    df["Smoking"].value_counts()

    🔹 What it does?

    • Counts how many times each value appears.

    • Example:

    Smoking | Count
    Yes | 50
    No | 150
    Sometimes | 10

    ✓ Why used?

    • A bar chart is built from categorical frequencies

    • value_counts() provides that frequency table

    Step 3: .plot(kind="bar")

    .value_counts().plot(kind="bar")

    🔧 Function Used:

    • Pandas' built-in plotting function

    • It uses Matplotlib internally

    📝 Why Bar Chart?

    A bar chart is used when:

    • The data is in categories

    • You want to compare those categories
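
The frequency table behind the bar chart can be inspected directly. A minimal sketch, using the Smoking values from the 20-patient dataset that appears later in these notes; the variable names are chosen only for this illustration.

```python
import pandas as pd

# Smoking column copied from the 20-patient dataset used later in these notes.
smoking = pd.Series(
    ["No", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No",
     "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    name="Smoking",
)

# One row per category, sorted from most frequent to least frequent.
frequency_table = smoking.value_counts()
print(frequency_table)

# The index supplies the bar labels and the values supply the bar heights.
labels = list(frequency_table.index)
heights = list(frequency_table.values)
```

Calling .plot(kind="bar") on frequency_table draws one bar per label with the matching height.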

    📘 Python Notes – Bar Chart using plt.bar() (Matplotlib)


    🧠 1️⃣ What This Code Does?

    plt.figure(figsize=(5, 4))
    counts = df["Gender"].value_counts()
    plt.bar(counts.index, counts.values)
    plt.show()

    This code draws a bar chart of the Gender column, where:

    • X-axis → the different gender categories (Male/Female etc.)

    • Y-axis → the count of each category


    🧵 2️⃣ Line-by-Line Explanation


    Line 1: plt.figure(figsize=(5,4))

    plt.figure(figsize=(5, 4))

    🔹 Meaning

    • Creates a new figure (plot window)

    • figsize = (width, height) in inches

    🔧 Why used?

    • To control the size of the graph

    • The default size is sometimes too small

    Example:

    • (5,4) → Width 5 inches, Height 4 inches

    Line 2: counts = df["Gender"].value_counts()

    • Counts the frequency of each gender

    • Example:

    Gender | Count
    Male | 120
    Female | 80

    • counts.index holds the categories and counts.values holds their counts, in the same order

    Line 3: plt.bar(...)

    plt.bar(counts.index, counts.values)

    🔍 Function: plt.bar()

    • Matplotlib function

    • Used to draw a Bar Chart


    🔹 Parameter 1: counts.index

    • The unique categories in the Gender column

    • Example:

    ["Male", "Female"]

    These become the X-axis labels.


    🔹 Parameter 2: counts.values

    • The count of each category

    These become the Y-axis values (bar heights).


    ✔ Why not df["Gender"].unique() + value_counts()?

    • unique() returns the categories in order of first appearance

    • value_counts() returns the counts sorted from most to least frequent

    • The two orders can differ, so a bar could get the wrong label; taking both labels and heights from the same value_counts() result keeps them aligned

    In short:

    “We are plotting how many times each unique gender occurs.”
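
One thing to watch when pairing unique() with value_counts() is that the two results are not guaranteed to be in the same order. The sketch below uses a small made-up Gender column (hypothetical values, not from the dataset) to show the mismatch and one way to keep labels and counts aligned.

```python
import pandas as pd

# Hypothetical Gender column in which the first value seen is not the most frequent one.
gender = pd.Series(["F", "M", "M", "M", "F"])

labels_from_unique = list(gender.unique())  # order of first appearance
counts = gender.value_counts()              # sorted by frequency, largest first

print(labels_from_unique)    # labels in appearance order
print(list(counts.index))    # labels in frequency order
print(list(counts.values))   # heights in frequency order

# Taking labels and heights from the same object keeps them aligned.
aligned = dict(zip(counts.index, counts.values))
```

Passing counts.index and counts.values to plt.bar() therefore labels every bar with its own category.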

    📘 Scatter Plot using Pandas Plot Function (Not Matplotlib Directly)

    Code:

    import pandas as pd
    import matplotlib.pyplot as plt

    data = {
        "Patient_ID": ["P001","P002","P003","P004","P005","P006","P007","P008","P009","P010",
                       "P011","P012","P013","P014","P015","P016","P017","P018","P019","P020"],
        "Name": ["Rahul Verma","Anita Singh","Vijay Gupta","Meena Sharma","Sachin Patil",
                 "Neha Soni","Suresh Yadav","Rita Mehta","Aman Khan","Pooja Rao",
                 "Rakesh Tiwari","Divya Jain","Sunil Thakur","Preeti Kaur","Sanjay Das",
                 "Ayesha Ali","Rohit Mishra","Suman Lata","Harish Rana","Kiran Solanki"],
        "Age": [45,52,60,38,47,29,63,54,41,33,57,49,42,36,59,31,50,44,61,27],
        "Gender": ["M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F","M","F"],
        "Disease": ["Hypertension","Diabetes","Heart Disease","Asthma","Hypertension","Thyroid",
                    "Heart Disease","Diabetes","Liver Disease","Migraine","Hypertension","Arthritis",
                    "Kidney Stone","PCOD","Heart Disease","Anaemia","Liver Disease","Diabetes",
                    "COPD","Thyroid"],
        "Blood_Pressure": [150,135,160,130,155,120,162,140,148,122,
                           158,130,140,124,170,118,145,138,160,119],
        "Heart_Rate": [88,90,110,95,92,78,115,93,85,80,96,88,82,84,118,75,90,92,106,74],
        "Cholesterol": [240,220,290,180,245,170,300,225,260,190,255,210,230,195,315,160,275,215,265,165],
        "Smoking": ["No","No","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","No",
                    "Yes","No","Yes","No","Yes","No"],
        "Treatment": ["Medication","Insulin","Bypass Surgery","Nebulization","Medication",
                      "Medication","Angiography","Insulin","Liver Treatment","Pain Management",
                      "Medication","Pain Relief Therapy","Stone Removal","Hormone Therapy",
                      "Stent Surgery","Blood Therapy","Medication","Insulin","Respiratory Support",
                      "Medication"],
        "Cost": [3500,5500,145000,2000,4000,2500,65000,6000,35000,1800,
                 4200,3200,28000,4500,98000,2200,29000,5700,32000,2300],
    }

    df = pd.DataFrame(data)

    df.plot(figsize=(5, 3), kind="scatter", x="Age", y="Cholesterol", alpha=1)
    plt.show()

    ① import pandas as pd

    ✔ Brings the pandas library into the program
    ✔ It is used under the short name pd
    ✔ It is the base for creating DataFrames, cleaning data, and visualizing it



    ② import matplotlib.pyplot as plt

    ✔ Imports matplotlib's plotting module
    ✔ It is used under the name plt
    ✔ plt.show() displays the plots on screen (required in a script; Jupyter notebooks usually show plots automatically)



    🧩 Creating the Data Dictionary

    ③ data = { … }

    This is a Python dictionary.
    Inside it:

    • Keys = column names

    • Values = list of values

    • Each list has 20 values (the information for our 20 patients)

    Example:

    • "Age": [45,52,60,...]

    • "Cholesterol": [240,220,290,...]

    • "Name": [...]

    • "Blood_Pressure": [...]

    👉 This whole dictionary serves as the raw data from which the DataFrame is built later.


    🔄 Converting the Dictionary → DataFrame

    ④ df = pd.DataFrame(data)

    ✔ This line converts the dictionary into a DataFrame (table format)
    ✔ df now behaves like a proper table
    ✔ Columns: Age, Gender, Blood_Pressure, Cholesterol etc.
    ✔ Rows: the information for patients P001 through P020

    👉 We can now plot, filter, and analyse this df.

    ⑤ df.plot(figsize=(5,3), kind="scatter", x="Age", y="Cholesterol", alpha=1)

    ➡ We are making this plot with Pandas' plot() function,
    not with Matplotlib's plt.scatter().

    Parameters Explanation:

    • figsize=(5,3) → size of the plot

    • kind="scatter" → tells Pandas that we want a scatter plot

    • x="Age" → Age on the X-axis

    • y="Cholesterol" → Cholesterol on the Y-axis

    • alpha=1 → dots fully opaque (no transparency)

    👉 Pandas uses Matplotlib internally, but the command is given to Pandas.


    🖥 Show the plot

    ⑥ plt.show()

    • Displays the plot on screen
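
The scatter plot shows the Age vs Cholesterol relationship visually; the same relationship can also be summarised as a single number. A minimal sketch, assuming the Pearson correlation from pandas' Series.corr() is an acceptable summary here. The two columns are copied from the dataset above, and df_small is a name used only for this illustration.

```python
import pandas as pd

# Age and Cholesterol columns copied from the 20-patient dataset above.
df_small = pd.DataFrame({
    "Age": [45, 52, 60, 38, 47, 29, 63, 54, 41, 33,
            57, 49, 42, 36, 59, 31, 50, 44, 61, 27],
    "Cholesterol": [240, 220, 290, 180, 245, 170, 300, 225, 260, 190,
                    255, 210, 230, 195, 315, 160, 275, 215, 265, 165],
})

# Pearson correlation: +1 is a perfect rising line, 0 means no linear relation,
# and -1 is a perfect falling line.
r = df_small["Age"].corr(df_small["Cholesterol"])
print(round(r, 2))
```

A value close to +1 matches a scatter plot whose points rise from left to right.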


    📘 Step-by-Step Notes: Creating Scatter Plot of Age vs Cholesterol using Matplotlib


    1️⃣ Importing the Libraries

    import pandas as pd
    import matplotlib.pyplot as plt

    • pandas → used to manage data in a table or spreadsheet format.

    • matplotlib.pyplot → used to build graphical visualizations (charts/plots) of data.

    Note: The module name is matplotlib.pyplot. A misspelling such as matplotlit.pyplot fails with ModuleNotFoundError.

    2️⃣ Creating the Data

    data = {
        "Patient_ID": ["P001","P002", ... ,"P020"],
        "Name": ["Rahul Verma","Anita Singh", ... ,"Kiran Solanki"],
        "Age": [45,52,60,...,27],
        "Gender": ["M","F",...,"F"],
        "Disease": ["Hypertension","Diabetes",...,"Thyroid"],
        "Blood_Pressure": [150,135,...,119],
        "Heart_Rate": [88,90,...,74],
        "Cholesterol": [240,220,...,165],
        "Smoking": ["No","No",...,"No"],
        "Treatment": ["Medication","Insulin",...,"Medication"],
        "Cost": [3500,5500,...,2300]
    }

    Here we create a Python dictionary named data, in which the column names are the keys and their values are given as lists. It represents the health data of 20 patients as a table.

    Meaning of the columns:

    • Patient_ID → unique ID for each patient

    • Name → patient's name

    • Age → patient's age

    • Gender → Male/Female

    • Disease → diagnosed health issue

    • Blood_Pressure → systolic BP value

    • Heart_Rate → beats per minute

    • Cholesterol → cholesterol level

    • Smoking → Yes/No

    • Treatment → treatment type

    • Cost → cost of the treatment in INR

    3️⃣ Converting to a DataFrame

    df = pd.DataFrame(data)

    • pd.DataFrame() → converts a dictionary or list into table format.

    • df is now a DataFrame object that we can easily analyze and visualize.

    Example view:

    Patient_ID | Name | Age | Gender | Disease | ... | Cost
    P001 | Rahul Verma | 45 | M | Hypertension | ... | 3500
    P002 | Anita Singh | 52 | F | Diabetes | ... | 5500
    ... | ... | ... | ... | ... | ... | ...

    4️⃣ Setting the Figure Size

    plt.figure(figsize=(5,3))

    • plt.figure() → creates the canvas for the plot

    • figsize=(5,3) → width = 5 inches, height = 3 inches

    Note: Write this line before plt.scatter(). If it comes after, it opens a new empty figure and the scatter plot keeps the default size.

    5️⃣ Drawing the Scatter Plot

    plt.scatter(x=df["Age"], y=df["Cholesterol"])

    • plt.scatter() → draws a scatter plot, in which points are shown on the x-y plane.

    Parameters:

    • x=df["Age"] → Age values are plotted on the x-axis

    • y=df["Cholesterol"] → Cholesterol values are plotted on the y-axis

    With the scatter plot we can visualize the relation between Age and Cholesterol:

    • If the points rise from left to right → cholesterol increases as age increases

    • If the points are scattered with no pattern → the relation is not strong

    6️⃣ Adding Axis Labels

    plt.xlabel("Age")
    plt.ylabel("Cholesterol")

    • plt.xlabel() → x-axis label

    • plt.ylabel() → y-axis label

    • Labels make the chart readable and meaningful

    7️⃣ Showing the Plot

    plt.show()

    • This function displays the plot on screen

    • Without this line, a script does not display the plot (Jupyter notebooks usually render it automatically)

    ✅ Summary Notes

    • pandas → to store and manage data in a table

    • matplotlib.pyplot → to visualize data

    • dict → defined the data as key-value pairs

    • pd.DataFrame() → converted the dictionary into a DataFrame

    • plt.figure(figsize=(w,h)) → set the size of the plot

    • plt.scatter() → drew the scatter plot (Age vs Cholesterol)

    • plt.xlabel()/plt.ylabel() → added the axis labels

    • plt.show() → displayed the plot
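
The order of the calls matters because the figure has to exist before anything is drawn on it. As a sketch of one way to make that order hard to get wrong, the function below uses plt.subplots() (a swapped-in alternative to calling plt.figure() and plt.scatter() separately) so that the size, the points, and the labels all belong to the same axes. The function name and the five-row sample are hypothetical, chosen only for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def draw_age_cholesterol_scatter(df):
    """Draw Age vs Cholesterol on a 5x3 inch figure and return the Axes."""
    # Create the figure and its axes first, so the size applies to this plot.
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.scatter(df["Age"], df["Cholesterol"])
    ax.set_xlabel("Age")
    ax.set_ylabel("Cholesterol")
    return ax


# First five patients of the dataset, enough to try the function.
sample = pd.DataFrame({
    "Age": [45, 52, 60, 38, 47],
    "Cholesterol": [240, 220, 290, 180, 245],
})
ax = draw_age_cholesterol_scatter(sample)
# Call plt.show() here when running this as a script.
```

Returning the Axes object makes it possible to check or adjust the labels and size afterwards.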

    📘 What is Matplotlib?

    Matplotlib is a Python library used to visualize data as graphs or charts.

    In other words: if you have numbers and data, matplotlib is used to show them as pictures (plots).

    Its sub-module pyplot is the one most commonly used; it provides MATLAB-style plotting functions.

    📘 Step-by-Step Notes: Creating Boxplot for Blood Pressure using Matplotlib

    1️⃣ Code Explanation

    plt.boxplot(df["Blood_Pressure"]) #this is code

      📌 What is a Boxplot?

      • A boxplot is a graphical representation that shows the distribution of data.

      • In other words: it shows the range over which the data is spread, what the median is, and which values are unusual (outliers).

      Components of a boxplot:

      • Box (rectangle) → shows the middle 50% of the data
        • Bottom edge of the box → Q1 (25th percentile)
        • Top edge of the box → Q3 (75th percentile)
        • Height of the box → IQR (Interquartile Range) = Q3 – Q1

      • Median line → the horizontal line inside the box is the median (Q2), the middle value of the data

      • Whiskers (lines above and below the box) → extend to the lowest and highest data values that still lie within these limits:
        Lower limit = Q1 – 1.5*IQR
        Upper limit = Q3 + 1.5*IQR

      • Outliers (dots outside the whiskers) → values that lie beyond these limits

    • plt.boxplot() → draws the boxplot.

    • What does a boxplot show?

      1. Median (Q2) → the middle value of the data

      2. Quartiles (Q1, Q3) → together with the median, they divide the data into 4 equal parts

        • Q1 → 25th percentile

        • Q3 → 75th percentile

      3. Interquartile Range (IQR) → Q3 - Q1

      4. Whiskers → minimum and maximum values (excluding outliers)

      5. Outliers → unusual values that lie outside the normal range

    • df["Blood_Pressure"] → here we plot the data of the Blood Pressure column.


    plt.show()
    • plt.show() → displays the boxplot on screen.


    2️⃣ What Does the Boxplot Tell Us?

    • Median → the central BP value of the patient group

    • IQR (Box) → the BP range of the middle 50% of patients; IQR = Q3 - Q1, the gap between Q1 and Q3

    • Whiskers → the typical minimum and maximum values

      Meaning of the whiskers

      In a boxplot, the whiskers are the lines that extend above and below the box. They show the typical minimum and maximum values of the data, excluding outliers.

    • Outliers → unusually high or low BP values that lie beyond the whiskers

    💡 Example: If most patients' BP readings are clustered closely together and one reading lies beyond the upper limit (Q3 + 1.5*IQR), that reading is drawn as a separate outlier point. A reading of 170 is an outlier only if it is above that limit.
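
The numbers that a boxplot draws can be computed by hand. The sketch below does this for the Blood_Pressure column of the 20-patient dataset, assuming the 1.5*IQR rule described above (this is also Matplotlib's default whisker setting, whis=1.5) and NumPy's default percentile method. The variable names are chosen only for this illustration.

```python
import numpy as np

# Blood_Pressure column from the 20-patient dataset.
blood_pressure = [150, 135, 160, 130, 155, 120, 162, 140, 148, 122,
                  158, 130, 140, 124, 170, 118, 145, 138, 160, 119]

q1, median, q3 = np.percentile(blood_pressure, [25, 50, 75])
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Outliers are the readings beyond the limits.
outliers = [bp for bp in blood_pressure if bp < lower_limit or bp > upper_limit]

# Whiskers stop at the most extreme readings that are still inside the limits.
inside = [bp for bp in blood_pressure if lower_limit <= bp <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3, iqr)
print(lower_limit, upper_limit)
print(outliers, whisker_low, whisker_high)
```

For this dataset the highest reading, 170, is still below the upper limit, so the boxplot draws no outlier points and the upper whisker ends at 170.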


    3️⃣ Headline Style (Descriptive)

    Boxplot of Blood Pressure using Matplotlib – Shows Quartile Distribution and Outliers

    📘 What is the Seaborn Library?

    Seaborn is a Python data visualization library built on top of matplotlib. It makes clean, styled, statistical plots easier to produce than plain matplotlib.

    In short:
    👉 Matplotlib = Basic plotting tools
    👉 Seaborn = Stylish, professional, beautiful graphs


    📌 Why is Seaborn used?

    With Seaborn we can:

    • Get good-looking graphs from the default styling

    • Draw statistical plots (for analysis) easily

    • Use built-in color palettes, themes, and styles

    • Get more powerful visualizations with less code

    Code:
    import pandas as pd
    import seaborn as sns

    # data is the same 20-patient dictionary defined in the pandas scatter plot section above
    df = pd.DataFrame(data)

    sns.lineplot(data=df["Blood_Pressure"])


    Difference between Pandas, Matplotlib, Seaborn
    >> Pie chart is available in Matplotlib (plt.pie), not in Seaborn
    >> Countplot is available in Seaborn (sns.countplot), not in Matplotlib


    📘 Seaborn Countplot – Notes (Smoking Example)

    What is a Countplot?

    • A countplot is a categorical plot.

    • It shows the frequency of each category (how many times each value occurs) as a bar chart.

    • A countplot is used when the data is categorical, such as Male/Female, Smoking Yes/No, Disease Types.

    • Putting continuous values (such as Blood Pressure) directly into a countplot does not give a meaningful graph, because nearly every value can be unique.

    Code:
    import pandas as pd
    import seaborn as sns

    # data is the same 20-patient dictionary defined in the pandas scatter plot section above
    df = pd.DataFrame(data)

    sns.countplot(data=df, x="Smoking")  # this is the code to create the countplot

    📘 When to Use a Countplot? (Important for notes)

    • Gender

    • Smoking (Yes/No)

    • Disease Types

    • Treatment Types

    • Male vs Female count

    • Yes/No based columns
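
A continuous column can still be counted once it has been grouped into categories. The sketch below uses pandas' pd.cut() for that grouping on the Blood_Pressure values of the 20-patient dataset. The cut-off values and the group labels are hypothetical, chosen only to illustrate the idea, and are not clinical guidance.

```python
import pandas as pd

# Blood_Pressure column from the 20-patient dataset.
blood_pressure = pd.Series(
    [150, 135, 160, 130, 155, 120, 162, 140, 148, 122,
     158, 130, 140, 124, 170, 118, 145, 138, 160, 119]
)

# Hypothetical cut-offs and labels, for illustration only.
# Each group includes its upper edge: (0, 120], (120, 140], (140, 200].
bp_group = pd.cut(
    blood_pressure,
    bins=[0, 120, 140, 200],
    labels=["up to 120", "121-140", "above 140"],
)

# sort=False keeps the groups in bin order instead of sorting by frequency.
group_counts = bp_group.value_counts(sort=False)
print(group_counts)
```

The grouped column is categorical, so it suits a countplot in the same way as Gender or Smoking.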

    📘 Boxplot vs Histplot (Difference)

    1. Purpose (What does it show?)

    📦 Boxplot

    • Shows a summary of the data:
      ✔ Minimum
      ✔ Q1 (25%)
      ✔ Median (50%)
      ✔ Q3 (75%)
      ✔ Maximum
      ✔ Outliers

    • The whole spread can be understood from a single chart.

    📊 Histplot (Histogram)

    • Shows the distribution of the data (how many values fall in which range).

    • It tells how many values (the frequency) fall in each range.


    📘 1. Line Plot

    ✔ What is it?
    • A line plot is used to show a trend.
    • It is used most often with time-series data.

    ✔ When is it used?
    • Sales over time
    • Temperature over days
    • Heart rate trend
    • Stock market movement

    ✔ Example: plt.plot(df["Blood_Pressure"])

    📘 2. Histplot / Histogram

    ✔ What is it?
    • Shows the distribution (range-wise frequency) of numeric data.
    • It tells how much of the data falls in each range.

    ✔ When is it used?
    • Cholesterol distribution
    • Age distribution
    • Blood pressure distribution

    ✔ Example: plt.hist(df["Age"])

    📘 3. Count Plot

    ✔ What is it?
    • Shows the count/frequency of categories.
    • It is a Seaborn function: sns.countplot()

    ✔ When is it used?
    • Male vs Female count
    • Smoker vs Non-smoker
    • Disease-wise patient count

    ✔ Example: sns.countplot(data=df, x="Disease")

    📘 4. Scatter Plot

    ✔ What is it?
    • Shows the relationship / correlation between two numeric variables.

    ✔ When is it used?
    • Age vs Cholesterol
    • Blood Pressure vs Heart Rate
    • Cost vs Age

    ✔ Example: plt.scatter(df["Age"], df["Cholesterol"])

    🟦 What other types of plots are there? The remaining important types, numbered up to 15 in these notes, are listed below:

    🔶 5. Boxplot
    Shows outliers + quartiles.

    🔶 6. Violin Plot
    A mix of boxplot + distribution.

    🔶 7. Bar Plot
    Comparison of categories (with values).

    🔶 8. Pie Chart
    Shows percentage share.

    🔶 9. Heatmap
    Shows relationship/correlation in color form.
    Correlation ranges from -1 to +1. Example: sns.heatmap(df.corr(numeric_only=True))

    Code:
    import pandas as pd
    import seaborn as sns

    # data is the same 20-patient dictionary defined in the pandas scatter plot section above
    df = pd.DataFrame(data)

    sns.heatmap(df[["Age", "Cholesterol"]].corr(), cmap="coolwarm")

    📘 Heatmap: Complete Notes (All Options Explained)

    A Seaborn heatmap is a powerful visualization that shows numbers through color intensity.

    Basic syntax:

    sns.heatmap(data)

    It also has several important parameters.


    1. annot=True

    Writes the values on the heatmap.

    sns.heatmap(df.corr(numeric_only=True), annot=True)

    ✔ Useful for: making the data easy to read


    2. fmt = "d" / ".2f"

    In which format are the numbers shown?

    sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f")

    .2f → 2 decimal places
    d → integer (only for integer data; correlation values are decimals, so use .2f for them)


    3. cmap (Color Map)

    Defines the colors of the heatmap.

    Code: sns.heatmap(df[["Age", "Cholesterol"]].corr(), cmap="coolwarm")

    Common options:

    • coolwarm

    • viridis

    • plasma

    • magma

    • Greens, Blues

    • YlGnBu (Yellow-Green-Blue)
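
The table that a correlation heatmap colors is an ordinary DataFrame and can be printed first. A minimal sketch, assuming pandas 1.5 or newer (where corr() accepts numeric_only). The five rows are the first five patients of the dataset, and df_small is a name used only for this illustration.

```python
import pandas as pd

# A few columns from the patient dataset, including one text column.
df_small = pd.DataFrame({
    "Gender": ["M", "F", "M", "F", "M"],
    "Age": [45, 52, 60, 38, 47],
    "Blood_Pressure": [150, 135, 160, 130, 155],
    "Cholesterol": [240, 220, 290, 180, 245],
})

# numeric_only=True skips text columns such as Gender.
corr_matrix = df_small.corr(numeric_only=True)
print(corr_matrix.round(2))
```

Every cell lies between -1 and +1, the diagonal is 1 because each column is perfectly correlated with itself, and passing corr_matrix to sns.heatmap() colors these same numbers.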

    🔶 10. Pairplot
    Scatterplots of all numeric columns + their distributions, all together. It is for viewing a whole group of numeric columns at once.


    📘 Pairplot Using Seaborn: Relationship Between Age & Cholesterol

    These notes explain what a pairplot is, how it works, and what the code does.

    1. Importing the Libraries

    import pandas as pd
    import seaborn as sns

    ✔ Meaning:
    • pandas (pd) → to hold data in a table (DataFrame) and manipulate it.
    • seaborn (sns) → to build advanced data visualizations (charts/plots).

    2. Creating the Data Dictionary

    data = { "Patient_ID": [...], "Name": [...], "Age": [...], ... }

    ✔ Meaning:
    • You have built a dictionary structure.
    • Each key (such as "Age", "Gender") represents a column.
    • Each value is a list holding the rows of that column.

    3. Converting the Dictionary to a DataFrame

    df = pd.DataFrame(data)

    4. Drawing the Pairplot

    sns.pairplot(df[["Age", "Cholesterol"]])

    📌 What is a Pairplot?

    A pairplot is a chart that shows:

    • in a single figure

    • for all selected numeric columns
      👉 scatter plots
      👉 histograms

    ✔ What does it show?

    The code uses only 2 columns:

    • Age

    • Cholesterol

    So the pairplot is a 2 × 2 grid:

    🔹 (1) a histogram of Age (on the diagonal)

    🔹 (2) a histogram of Cholesterol (on the diagonal)

    🔹 (3) scatter plots of Age vs Cholesterol (the two off-diagonal panels show the same pair with the axes swapped)

    This makes the relationship visible: whether cholesterol rises with age or not.


    📌 Summary (In One Line)

    A pairplot is a combined chart that shows histograms + scatterplots together to help understand the relationships between variables.


    🔶 11. KDE Plot
    Smooth distribution curve.

    🔶 12. Joint Plot
    1 scatter plot + 2 histograms combined.

    🔶 13. Barh (Horizontal Bar Chart)
    Best for long category names.

    🔶 14. Area Plot
    Like a line plot, but with the area filled.

    🔶 15. Swarm Plot
    Individual points + category distribution.
