Q- How to print Hello World
print("Hello World")
Variables in Python -------
age = 30 # variable names should be intuitive so the code is easy to understand later
print(age)
Note: Shift+Enter is the shortcut to run the current command (cell)
Note: '#' is used to write a comment in Python
Rules for Variables---
- A variable name cannot start with a number, like - 1age
- Numbers can be used in the middle or at the end of a name, like - age1 age2
- Special characters are not allowed except _ (underscore), like - age_my
- Spaces are not allowed in a variable name
- Python is case sensitive
- Integer = a whole number; variables like age1 to age3 hold integers
type(age1) # returns int (assuming age1 holds a whole number such as 30)
- Float = decimal values
Interest = 30.24
type(Interest) # returns float
- String = a message is a sequence of characters and its type is str. Note: anything written inside quotes "" is a string
Message = "My Name Is Divyanshu"
type(Message) # returns str
# any quote style works: 'I can use this', "also use this"; for a multiline string use '''triple quotes'''
- Boolean = only 2 values are available here: True and False
data = False # here data is the variable and False is the value; its type is bool
type(data) # returns bool
- Addition = +
num1 = 20
num2 = 37
result = num1 + num2
print(result) # another way, directly: print(num1 + num2), without storing the result in a new variable
- Subtraction = -
num1 = 50
num2 = 20
result = num1 - num2
print(result) # or directly: print(num1 - num2)
- Multiplication = *
num1 = 20
num2 = 39
result = num1 * num2
print(result) # or directly: print(num1 * num2)
- Integer Division = // # drops the fractional part
num1 = 20
num2 = 3
result = num1 // num2
print(result) # or directly: print(num1 // num2)
# answer = 6, because integer division does not give a float value
- Float Division = / # always gives a float
num1 = 20
num2 = 3
result = num1 / num2
print(result) # or directly: print(num1 / num2)
# answer = 6.666666666666667, because it is float division
- Power = **
num1 = 2
num2 = 5
result = num1 ** num2
print(result) # answer = 32
- Modulus = % # gives the remainder
num1 = 20
num2 = 3
result = num1 % num2
print(result)
# answer = 2, the remainder
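The division operators above can be checked side by side. This is only a small sketch; `divmod` is a built-in that these notes do not cover, shown here because it returns the integer quotient and the remainder together.

```python
num1 = 20
num2 = 3

quotient = num1 // num2      # integer division drops the fractional part
remainder = num1 % num2      # modulus gives what is left over
as_float = num1 / num2       # float division always returns a float

print(quotient, remainder)   # 6 2
print(type(as_float))        # <class 'float'>

# divmod returns both results at once
print(divmod(num1, num2))    # (6, 2)

# integer division and modulus always fit together like this
print(quotient * num2 + remainder == num1)  # True
```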
- input() always returns a string, so to make age1 an integer we have to typecast it
- age1 = int(input())
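Why the typecast matters can be seen without typing anything. A minimal sketch: the variable `typed` is only an illustration that stands in for what `input()` would return.

```python
typed = "30"            # what input() would return if the user types 30
print(type(typed))      # <class 'str'>

age1 = int(typed)       # typecast the text to an integer
print(type(age1))       # <class 'int'>
print(age1 + 1)         # 31

# text that is not a whole number cannot be converted
try:
    int("thirty")
except ValueError as error:
    print("Cannot convert:", error)
```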
- len(string) = finds the length of a string (number of characters)
string = "Divyanshu Khare" # the space also counts as a character
len(string) # gives the length of the string: 15
- ls = collection of items (list)
Defined by [ ] square brackets
Example: list_name = [item1, item2, item3, ...]
type(ls) # gives the type of the variable: list
- max = finds the maximum number in a list
ls = [1,2,4,5,6]
print(max(ls))
- min = finds the minimum number in a list
ls = [1,2,3,4,5,6,7]
min(ls)
- sum = sum of the numbers
ls = [1,2,3,4,6,7,8]
sum(ls)
- len(ls) = gives the length of the list
len(ls)
- max(string) = gives the character with the maximum ASCII value
string = "Divyanshu"
max(string)
- min(string) = gives the character with the minimum ASCII value
string = "Divyanshu"
min(string)
Note: ASCII (American Standard Code for Information Interchange) assigns a number to each character
- sorted(ls) = returns the list sorted in ascending order
- sorted(ls, reverse = True) = returns the list sorted in descending order
- round() = rounds off a number
round(number, number of decimal places)
example: round(12356.54645, 2)
12356.55 # this is the result
- abs() = gives the absolute (non-negative) value of any number
abs(-23538)
- f = format string (f-string)
example:
name = "divyanshu"
age = 30
profession = "Data Science"
introduction = f"{name} is {age} year old professional working as {profession}"
print(introduction)
- if else
cibil_score = int(input("Enter Cibil Score:"))
if cibil_score > 600:
    print("You are eligible for a loan")
else:
    print("You are not eligible")
- elif = used when we have more than 2 conditions
color = input("Enter color - Red, Green, Yellow: ")
if color == "Red":
    print("Stop")
elif color == "Yellow":
    print("Wait")
else:
    print("Go")
- loops (control structure) -- repetition of a task
example:
string = "Data Science"
for i in string:
    print(i)
example:
for i in range(0, 101): # prints 0 to 100
    print(i)
example with a step of 2 (prints even numbers; use print(i ** 2) to print squares instead):
for i in range(0, 100, 2):
    print(i)
example:
ls = [1,2,3,4,5,6,7,8]
for i in ls:
    print(i)
- while = keeps running as long as the condition is True
i = 1
while i < 10:
    print("Divyanshu khare")
    i = i + 1
- Control the loop
>> Break: stops the whole loop once the requirement is met (stop iteration)
ls = [1,2,3,4,5,6,7,8,9,10]
for i in ls:
    if i == 6:
        print("Yes")
        break
>> Continue: stops only that particular iteration and jumps to the next iteration
ls = [1,2,3,4,5,6,7,8,9,10]
for i in ls:
    if i == 6:
        print("Yes")
        continue
>> Pass: does nothing; it is a placeholder and does not break anything
ls = [1,2,3,4,5,6,7,8]
for i in ls:
    if i > 0:
        pass # do nothing
    else:
        print("Negative Number")
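The difference between break and continue is easier to see when the visited values are collected. A minimal sketch; the list names `seen_with_break` and `seen_with_continue` are made up for this illustration.

```python
ls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

seen_with_break = []
for i in ls:
    if i == 6:
        break               # leaves the loop completely
    seen_with_break.append(i)

seen_with_continue = []
for i in ls:
    if i == 6:
        continue            # skips only this iteration
    seen_with_continue.append(i)

print(seen_with_break)      # [1, 2, 3, 4, 5]
print(seen_with_continue)   # [1, 2, 3, 4, 5, 7, 8, 9, 10]
```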
>>> DATA STRUCTURE IN PYTHON [this is not algorithm]
- List
- tuples
- set
- dictionary
- String operations
- Indexing = process of fetching a single character from the collection
- Example:
string = "String"
string[2] # 'r' (index starts at 0)
- Slicing = process of fetching a sequence of characters, i.e. a sub-string of the given string
- Example:
string = "Data Science" # let's say I want to fetch "Data" from this string
string[start index : end index + 1] # general form; the end position itself is excluded
string[0:3 + 1] # 'Data'
- Example 2: slicing with negative indexes (counted from the right)
string = "Data Science"
string[-11:-9] # 'at'
- Example 3: slicing while skipping 1 character each time
string = "Data Science"
string[5:12:2] # 'Sine'
- Example 4: print in reverse without giving the first index
string = "Data Science"
string[::-1] # 'ecneicS ataD'; with a negative step and no start index, slicing starts from the last character
- In-built functions for strings
string = "Data"
type(string)
- len(string) = length of the string
- convert string into lowercase =
string = "DIVYANSHU"
string.lower()
- convert string into uppercase =
string = "divyanshu"
string.upper()
- capitalize the string =
string = "divyanshu"
new_string = string.capitalize()
print(new_string)
# capitalize() is a built-in Python method that changes the first letter of the string to a capital (uppercase) letter and lowercases the rest
# the new (modified) string that capitalize() returns is stored in new_string
- lstring.islower() -- checks whether all the letters in the string are lowercase
lstring = "DivyanshuJi" # lstring is just a variable name
lstring.islower() # False
- string.isupper() --- checks whether all the letters in the string are uppercase
ustring = "this is small"
ustring.isupper() # False
- string.isdigit() --- checks whether the string contains only digits
numstring = "435345" # numstring is just the variable name of the string
numstring.isdigit() # True
- string.swapcase() -- swaps the case of the given string
string = "this is small case"
string.swapcase()
- string.replace("text to change", "replacement text")
string = "Data science"
string.replace("a", "d") # replaces every 'a' in "Data science" with 'd'
- string.split() == splits the string at the given separator (by default it splits on whitespace)
string = "Divyanshu@khare"
string.split("@")
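The slicing rules and string methods above can be tried together on one string. This is just a small sketch; the variable names `first_word`, `reversed_text` and `every_second` are chosen for the illustration.

```python
string = "Data Science"

first_word = string[0:4]        # characters at index 0 to 3
reversed_text = string[::-1]    # negative step walks from the end
every_second = string[5:12:2]   # index 5, 7, 9, 11

print(first_word)                        # Data
print(reversed_text)                     # ecneicS ataD
print(every_second)                      # Sine
print(first_word.upper())                # DATA
print(string.replace("a", "d"))          # Ddtd Science
print(string.split())                    # ['Data', 'Science']
print("Divyanshu@khare".split("@"))      # ['Divyanshu', 'khare']
print("435345".isdigit(), "43a".isdigit())  # True False
```

Strings are immutable, so every method here returns a new string and leaves `string` itself unchanged.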
- ls = [1,2,34,5,6,7,"mango"]
type(ls) # to check the data type
- len(ls) = # checks the length of the list
- ls[3] = # gives the value stored at this index
- ls[::-1] = # reversed list
- list concatenation =
ls1 = ["apple", "mango"]
ls2 = ["another"]
ls3 = ls1 + ls2
print(ls3)
- ls.append("apple") = # adds a single element at the end of the list
- ls.extend(["grapes", "gyan"]) = # adds multiple elements at the end of the list
- ls.insert(index number, "element") = # adds the element at a specific index
- ls.index("grapes") = # gives the (non-negative) index of the first occurrence of the element
- ls.remove("element name") = # removes the first occurrence of the element by value; it cannot remove all occurrences at once
- ls.sort() = # sorts the list in ascending order; it changes the existing list
- ls.sort(reverse=True) = # sorts the list in descending order; it changes the existing list
- sorted(ls) # returns a new sorted list; it does not change the existing list
- ls.pop(index number) = # removes and returns the element at that index
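A short run-through of the list methods above, including the difference between `sorted(ls)` and `ls.sort()`. The fruit names are only sample values.

```python
ls = ["mango", "apple"]

ls.append("grapes")              # one element at the end
ls.extend(["kiwi", "apple"])     # several elements at the end
ls.insert(1, "banana")           # element at index 1
print(ls)   # ['mango', 'banana', 'apple', 'grapes', 'kiwi', 'apple']

ls.remove("apple")               # removes only the first "apple"
print(ls)   # ['mango', 'banana', 'grapes', 'kiwi', 'apple']

removed = ls.pop(0)              # removes and returns the element at index 0
print(removed)                   # mango

new_list = sorted(ls)            # new list, ls is unchanged
print(new_list)                  # ['apple', 'banana', 'grapes', 'kiwi']
print(ls)                        # ['banana', 'grapes', 'kiwi', 'apple']

ls.sort(reverse=True)            # changes ls itself
print(ls)                        # ['kiwi', 'grapes', 'banana', 'apple']
```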
- Tuple: once it has been created, its elements cannot be changed
- A tuple is immutable
- A tuple is a data structure like a list, but it is immutable
- In a tuple we store information that no program should be able to change, intentionally or unintentionally
- A tuple is defined with () round brackets
example: tup = (3,4,6,7,8544,76,34,5)
type(tup)
- max(tup) = # gives the maximum value of the tuple
- min(tup) = # gives the minimum value of the tuple
- sum(tup) = # gives the sum of the tuple
- sorted(tup) # returns a sorted list; the tuple itself is unchanged
- list >> tuple >> list == typecasting list to tuple and tuple to list
tup1 = tuple(ls)
tup1
- tup.sort() # not available: a tuple has no sort() method, so this raises AttributeError; use sorted(tup)
- del = # removes an element from a list by index
ls = [2,3,5,6,7,4]
del ls[3] # it does not return anything
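Immutability can be checked directly. A minimal sketch that tries to change a tuple, then goes through a list (the tuple >> list >> tuple typecast from the notes above) to build a changed copy.

```python
tup = (3, 4, 6, 7)

try:
    tup[0] = 99                  # tuples do not allow item assignment
except TypeError as error:
    print("Not allowed:", error)

# to change the data, convert to a list, edit, and convert back
ls = list(tup)
del ls[3]                        # remove the element at index 3
ls.append(10)
tup = tuple(ls)

print(tup)                       # (3, 4, 6, 10)
print(sorted(tup, reverse=True)) # [10, 6, 4, 3]
```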
File handling
>> f.read() # reads (accesses) the contents of an opened file
.append() is used to add a new row to an Excel sheet.
import csv → Python's built-in module for reading CSV files.
with open("titanic.csv", mode="r") as f: → opens the titanic.csv file in read mode.
csv.reader(f) → reads each line of the file as a list.
next(reader) → takes out the first row (the header) so that it does not come up again in the loop.
enumerate(reader) → gives the index (i) along with each row.
print(i, row) → prints each record together with its sequence number (index).
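The CSV steps listed above fit together as follows. This is only a sketch: it first writes a tiny file named `sample.csv` (a made-up stand-in for titanic.csv, with made-up rows) into a temporary folder so that it can run anywhere.

```python
import csv
import os
import tempfile

# create a tiny CSV so the example runs without titanic.csv
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, mode="w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])      # header row
    writer.writerow(["Neha", "27"])
    writer.writerow(["Ankita", "29"])

rows = []
with open(path, mode="r", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)                 # take the header out of the loop
    for i, row in enumerate(reader):
        print(i, row)                     # index and record
        rows.append(row)

print(header)    # ['Name', 'Age']
```

Note that `csv.reader` returns every value as a string, so "27" would still need `int()` before doing arithmetic on it.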
>> Exception handling
Try-Except Block -
try:
    result = 10/0
    print("This line will not be executed because of the error")
except ZeroDivisionError:
    print("You can't divide by zero")
Ans: yes
>> RAISE == used when we want to raise a customized exception, one that the system would not raise on its own but that we want to raise according to our own requirement
def withdraw(balance, amount):
    if balance - amount < 1000:
        raise Exception("Withdrawal denied: Minimum balance of 1000 INR to be maintained")
    else:
        remaining = balance - amount
        return remaining
try:
    remaining_amount = withdraw(balance=1000, amount=2000)
    print(f"After transaction remaining balance is {remaining_amount}")
except Exception as e:
    print(f"Transaction failed: {e}")
What return means
- In Python, the job of return is to send a value out of a function
- That is, the function calculates a result and gives it back to the main program
def = the keyword that defines a function
withdraw = the name of the function, chosen by us
(balance, amount) = the parameters (arguments)
return: the function sends this value out → in the try block it is stored in the remaining_amount variable
Try block
>> withdraw(balance=5000, amount=2000)
- This is a function call.
It means we are executing the withdraw function and passing in:
5000 : balance
2000 : amount
- remaining_amount
- the return value of the function is stored in the remaining_amount variable
- i.e. the money left after the withdrawal is now in this variable
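Instead of the generic Exception, a dedicated exception class can be raised, which lets the except block catch only this one problem. This is a sketch, not part of the original notes: the class name `MinimumBalanceError` and the `minimum` parameter are hypothetical names introduced for the illustration.

```python
class MinimumBalanceError(Exception):
    """Hypothetical custom exception used only in this example."""


def withdraw(balance, amount, minimum=1000):
    remaining = balance - amount
    if remaining < minimum:
        # raise our own exception because Python would not raise one here
        raise MinimumBalanceError(
            f"Minimum balance of {minimum} INR must be maintained"
        )
    return remaining                 # value sent back to the caller


print(withdraw(balance=5000, amount=2000))   # 3000

try:
    withdraw(balance=1000, amount=2000)
except MinimumBalanceError as e:
    print(f"Transaction failed: {e}")
```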
>> Built-in Modules # math # random # datetime # os # sys
>> import math # let's say I want to get a square root
math.sqrt(34) # here we take the square root of 34
>> import random # this module has functions to generate random data
random.random() # generates a random decimal number from 0 up to (but not including) 1
random.randint(1,100) # random whole number from 1 to 100, both included
>> from datetime import datetime # gives the date and time
datetime.now()
>> import os # for working with files and folders
os.getcwd() # gives the path of the current working folder
NumPy = Numerical Python; import it to use its numeric data structures
import numpy as np = # np is a short name (alias); anything else could be used in place of np. We write it so that in future, whenever we need numpy, we can type np instead of the full name
The data structure NumPy is based on >>>>>>>>>>>
>> Array ---------
import numpy as np
arr = np.array([2,3,4,5,6]) # takes a list as input; arr is just the variable name
type(arr)
arr.ndim # checks the dimension of the array; another way: the number of closing brackets at the end equals the number of dimensions
>> 2D / Multi-Dimension Array ------
arr2 = np.array([[1,2,3], [5,46,67]])
arr2.ndim # arr2 is the array's name and ndim checks its dimension
>> arr2.shape == # gives the rows and columns of the array
arr2 = np.array([[1,2,3], [5,46,67]])
arr2.ndim
arr2.shape
>> arr2.size ==== gives the total number of elements in the array (rows x columns)
arr2 = np.array([[1,2,3,5], [5,46,67,5]])
arr2.ndim
arr2.shape
arr2.size # number of elements: 8
>> arr2.dtype # data type of the elements
>> zeros_array = np.zeros((row_number, column_number))
import numpy as np
zeros_array = np.zeros((10,4)) # gives float 0s in the given number of rows and columns
print(zeros_array)
>> ones_array = np.ones((row_number, column_number))
import numpy as np
ones_array = np.ones((6,4)) # gives float 1s in the given number of rows and columns
print(ones_array)
>> full_array = np.full((row_number, column_number), fill_value = value_number)
here full_array is the variable name
np.full = the function name; it fills the whole array, e.g. row number 3, column number 4
fill_value = this parameter says what every element of the array is filled with
Example:
full_array = np.full((6,4), fill_value = 23)
print(full_array)
>> np.random.rand(dimension_number)
example:
r_array = np.random.rand(5)
np.random.rand = a function from NumPy's random module. It generates random numbers between 0 and 1. (dimension_number) says how many random numbers are wanted.
>> np.round(r_array, number_of_decimal_places)
example:
r_array = np.random.rand(5)
np.round(r_array, 2)
>> np.arange(start, end)
example:
arr = np.arange(1,11)
print(arr)
# this NumPy function generates the numbers from 1 to 10, because the end number 11 is not counted
>> Indexing for single dimension array
import numpy as np
arr = np.array([1,3,4,8,5])
arr[2] # 2 is the index into the array
>> Slicing for double dimension array
import numpy as np
arr = np.array([[1,2,3], [5,46,67]])
arr[1:2] # 1 = start, 2 = end (excluded); a single slice like this works on the first dimension (rows)
Example: this is for a multidimensional array
# slicing in a multidimensional array; let's suppose I want to slice out 2 3 5 6
# arr2[start_row:end_row+1, start_column:end_column+1]
arr2 = np.array([[1,2,3,4], [4,5,6,9], [7,8,9,10]])
arr2[0:2, 1:3]
>> Iteration ----------- just like iteration in a loop
for i in arr2:
    print(i)
>> Joining -- how to merge when I have more than 1 array
# let's suppose a hospital has 2 wards, general and ICU, whose data has to be joined
general = np.array([[98,43,5345], [42,45,32]])
icu = np.array([[98,92,73], [89,42,52]])
np.concatenate((general, icu), axis = 1) # side by side (the columns of icu are added to the right)
2nd Example: one below the other
general = np.array([[1,2,3], [5,6,7]])
np.concatenate((general, icu), axis = 0) # the rows of icu are added below
>> SPLITTING
import numpy as np
arr = np.arange(1,11)
arr # run arr
np.array_split(arr, 3) # 3 is the number of parts we want; array_split also works when the array cannot be divided equally (the parts may then differ in size)
arr2 = np.arange(1,11)
arr2
np.split(arr2, 2) # works only when an equal division is possible
>> ARRAY SORTING
import numpy as np
arr = np.array([1,2,46,7,86])
np.sort(arr) # returns a sorted copy of the array; by default axis is -1 (the last axis)
# sorting within each row (along axis 1)
import numpy as np
arr = np.array([[1,2,46,7,86], [32,4324,543,4324,4324]])
np.sort(arr, axis = 1)
arr # the original array is unchanged
>> Searching =
import numpy as np
arr3 = np.array([1,2,46,7,86])
np.where(arr3 > 40) # conditional search; gives the indexes of the values that are greater than 40
>> np.nonzero(arr) = returns the indexes of the values that are non-zero; it does not give the index of any 0
import numpy as np
arr = np.arange(1,11)
np.nonzero(arr)
>> Filtering = how to filter the data
import numpy as np
arr = np.array([13,5,64,10,78,10])
arr[arr > 7] # returns the elements that are greater than 7
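Searching and filtering both start from the same Boolean condition. A minimal sketch (it assumes NumPy is installed; the names `mask`, `positions` and `values` are chosen for the illustration) showing that `np.where` gives positions while the mask used as an index gives the values.

```python
import numpy as np

arr = np.array([13, 5, 64, 10, 78, 10])

mask = arr > 7                 # Boolean array, one True/False per element
positions = np.where(mask)[0]  # indexes where the condition holds
values = arr[mask]             # the elements themselves

print(positions.tolist())      # [0, 2, 3, 4, 5]
print(values.tolist())         # [13, 64, 10, 78, 10]
```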
>> Mathematical Operations in NumPy
x = np.array([[2,4], [6,10]])
y = np.array([[12,23], [34,8]])
x + y # ======= adds x and y element by element
>> x//y = integer division
x = np.array([[2,4], [6,10]])
y = np.array([[12,23], [34,8]])
x // y # ======= divides x by y element by element (integer division)
>> np.divide(x,y) = float division
import numpy as np
x = np.array([[2,4], [6,10]])
y = np.array([[12,23], [34,8]])
np.divide(x, y) # gives float division
>> np.multiply(x,y) # multiplies element by element
>> matrix multiplication = rows multiplied by columns # condition: the number of columns of the first array must be equal to the number of rows of the second array
arr1 = np.array([[2,4],[1,3]])
arr2 = np.array([[3,6],[7,3]])
np.matmul(arr1, arr2) # matmul is the function for matrix multiplication; first element: (2×3) + (4×7) = 6 + 28 = 34
>> reshape array = # an array can be reshaped only if the size before and after reshaping is the same
Example: let's suppose I have a 1-dimensional array of 14 elements; I can make it a two-dimensional array of shape (2,7), but (2,8) is not possible
arr = np.arange(1,15)
print(arr)
reshape_array = arr.reshape(2,7) # reshape is the function and the new shape is 2 by 7; reshape_array is the variable where I store the reshaped array
Why? To change the dimension --- from 1 dimension to 2 dimensions
>> another way to reshape
reshape_array = arr.reshape(2,-1) # -1 is only a placeholder; NumPy calculates that dimension automatically
>> another way to reshape = e.g. from 3 dimensions to 1 dimension
reshape_array = arr.reshape(-1) # converts the whole array into a single dimension (1D array); -1 is the placeholder argument
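The size rule for reshaping can be verified quickly. A small sketch, assuming NumPy is installed; the names `two_rows` and `flat` are only illustrative.

```python
import numpy as np

arr = np.arange(1, 15)          # 14 elements: 1 to 14

two_rows = arr.reshape(2, -1)   # NumPy works out that -1 must be 7
print(two_rows.shape)           # (2, 7)

flat = two_rows.reshape(-1)     # back to one dimension
print(flat.shape)               # (14,)

try:
    arr.reshape(2, 8)           # 2 * 8 = 16 does not match 14 elements
except ValueError as error:
    print("Cannot reshape:", error)
```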
>> Mathematical Operations
1) temp = np.array([23,32,35,54,54])
print(temp)
np.mean(temp) # calculates the average
2) np.min(temp) # finds the minimum
3) np.std(temp) # standard deviation: how far the array's values are spread from the average (mean)
4) np.percentile(temp, 40) # finds the value below which 40% of the data falls; np.percentile(temp, 50) gives the same value as the median
5) np.sum(temp) # sum
6) np.median(temp) # finds the middle value
7) np.prod(temp) # product of all elements
8) np.cumsum(temp) # cumulative sum
9) np.cumprod(temp) = cumulative product,
i.e. it shows the running product up to each element, step by step.
-------------------PANDAS------------------------------
Pandas is a Python library that helps us store, clean, analyze and manipulate (change) data.
In simple words:
just as we keep data in rows and columns in Excel,
Pandas lets us handle data in Python in the same Excel-like way.
It helps with reading CSV files
pandas = panel data
>> Pandas ================ indexing in pandas
import pandas as pd # pd is a short name for pandas; you can choose your own
pd.Series([23,43,54,65], index = ["Mon","Tue","Wed","Thu"]) # a Series is basically one column in pandas; the index can be set as required, e.g. "Tue" labels the second value
# another way to create a series
s = pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
s["Mon"] # s is the series name and Mon is the index whose value we are fetching
Note: This is an example of indexing
>>> SLICING IN SERIES
s[1:3] # goes from position 1 to position 2; the end is excluded (+1 rule), so 3 is written instead of 2
>>Filtering in Series
import pandas as pd
s=pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
s[s>45] # gives the values that are greater than 45
>> SHAPE IN PANDAS
s.shape # gives the shape of the series
>> INDEX OF PANDAS
s.index # gives the index values of the series
>>MATHEMATICAL OPERATIONS IN PANDAS
import pandas as pd
s=pd.Series({"Mon":23, "Tue":45, "Wed": 54, "Thu":65})
s*2 #multiply
s+2 #adding
s/2 #divide
>> Operations based on a 2-region scenario
region_a = pd.Series({"Jan":12, "Feb":13, "March": 16, "April":78})
region_b = pd.Series({"Jan":42, "Feb":53, "March": 36, "April":98})
total = region_a + region_b # adds region_a's values to region_b's values
total
diff = region_a - region_b # difference between the regions
diff
multi = region_a * region_b # multiplication
multi
Note: In a series, the position is not important; the addition will be performed according to the index, and values like “Jan to Jan” will be added together even if I change the sequence.
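The note above can be checked by writing the second Series in a different order. A minimal sketch, assuming pandas is installed; it reuses the region values from these notes.

```python
import pandas as pd

region_a = pd.Series({"Jan": 12, "Feb": 13, "March": 16, "April": 78})
# same months as region_a, but written in a different order
region_b = pd.Series({"April": 98, "March": 36, "Feb": 53, "Jan": 42})

total = region_a + region_b     # values are matched by index label
print(total["Jan"])             # 54  (12 + 42)
print(total["April"])           # 176 (78 + 98)
```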
>>>> other mathematical operations based on region
region_a = pd.Series({"Jan":12, "Feb":13, "March": 16, "April":78})
region_b = pd.Series({"Jan":42, "Feb":53, "March": 36, "April":98})
region_a.max() # gives the maximum value
region_a.min() # gives the minimum value
region_a.mean() # gives the average value
region_a.sum() # gives the sum of the values for the region
region_a.prod() # gives the product of all values in the Series
>> other functions for pandas
1) apply == applies a function to every value, e.g. to assign a category to each value
ex:
def sales_category(sales): # the thresholds and labels are illustrative
    if sales > 50:
        return "High"
    elif sales > 30:
        return "Moderate"
    else:
        return "Low"
region_a.apply(sales_category)
Notes based on this:
def → This keyword is used to define a function in Python.
sales_category → This is the name of the function (you can choose any valid name).
(sales) → This is the parameter (a placeholder for the value that will be passed to the function when it is called).
2) map = map() → "replace or transform" each value of a Series according to rules you give.
ex:
dept_codes = pd.Series(["HR", "Eng", "Sal", "FIN"])
dept_names = {"HR": "Human Resources", "Eng": "Engineering", "Sal": "Science", "FIN": "Finance"}
dept_codes.map(dept_names)
Explanation:
pd.Series([...]) → creates a Pandas Series (a one-dimensional labeled data array).
dept_names → is a dictionary mapping department codes to their full names.
map() → replaces each value in the Series (dept_codes) with the corresponding value from the dictionary (dept_names).
Note: > the order of the dictionary entries does not matter here > (0,1,2,3) → index number (position)
A data scientist wants to extract only the months where the customer churn rate exceeded 8%. Which approach is correct? Assume churn is a pandas Series.
- churn[churn < 8]
- churn.where(churn > 8) # this is the correct answer
- churn.mask(churn > 8)
- churn.clip(upper = 8)
Ans:
churn = pd.Series([10,8,4,212,14], index = ["Jan", "Feb", "Mar", "Apr", "May"])
churn.where(churn > 8)
Explain: 1) Why use .where()?
Because where() keeps the values that satisfy the condition and replaces the others with NaN (all months stay in the result; add .dropna() to keep only the matching months).
2) What churn means:
customers leaving a company or cancelling its service.
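How `.where()` behaves compared with plain Boolean indexing can be seen on the churn values from the answer above. This is a sketch, assuming pandas is installed; it uses five month labels to match the five values, and the names `kept_shape`, `only_matches` and `by_mask` are made up for the illustration.

```python
import pandas as pd

churn = pd.Series([10, 8, 4, 212, 14],
                  index=["Jan", "Feb", "Mar", "Apr", "May"])

kept_shape = churn.where(churn > 8)     # same length, NaN where the condition fails
only_matches = kept_shape.dropna()      # drops the NaN months
by_mask = churn[churn > 8]              # Boolean indexing keeps only matching months

print(len(kept_shape))                  # 5
print(list(only_matches.index))         # ['Jan', 'Apr', 'May']
print(list(by_mask.index))              # ['Jan', 'Apr', 'May']
```

Because of the NaN values, the result of `.where()` is stored as floats, while Boolean indexing keeps the original integer values.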
-----------------DATA FRAME ---------------------------------
A DataFrame is a 2D (two-dimensional) data structure of the Pandas library,
like an Excel sheet or a table, with rows and columns. Think of it like this:
- Rows = records / entries
- Columns = fields / variables
Example:
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita"],
    "Age": [25,27,29],
    "City": ["Delhi", "Rganj", "Patna"]
}
df = pd.DataFrame(data)
df
df.to_csv("dat.csv", index = True)
🔹 Step 1: import pandas as pd
- This line imports the Pandas library.
- pandas is a Python library used to handle data in table form (rows & columns).
- as pd means that whenever we use a pandas function we can write it with the shortcut name "pd".
🔹 Step 2: data = {...}
Here we created a dictionary with 3 keys:
Dictionary:
data is a dictionary 🧠
In Python, a dictionary is a data structure that stores data as key-value pairs.
That is:
- "Name" → list of names
- "Age" → list of ages
- "City" → list of cities
⚠️ Note: Text values such as "Delhi" and "Rganj" must be written inside quotes;
without quotes Python gives an error (because it treats them as variable names).
🔹 Step 3: df = pd.DataFrame(data)
- This line converts the dictionary into a DataFrame.
- A DataFrame is basically a table like an Excel sheet, with rows and columns.
The result looks like this 👇
   Name       Age  City
0  divyanshu  25   Delhi
1  Neha       27   Rganj
2  Ankita     29   Patna
🔹 Summary:
Line → What it does
import pandas as pd → imports the Pandas library
data = {...} → stores the data in a dictionary
pd.DataFrame(data) → converts the dictionary into a table (DataFrame)
df → the final DataFrame object, which holds the data in rows and columns
4️⃣ df.to_csv("dat.csv", index=True)
👉 This line saves df (the DataFrame) to a CSV file.
- "dat.csv" → the file name (it is created in the current working folder, e.g. the Anaconda folder)
- index=True → means the row numbers (0,1,2...) are saved in the file too.
🔸 If you write index=False, the row numbers do not go into the CSV.
🔹 What is a CSV file?
CSV (Comma-Separated Values) is a simple text file in which the values are separated by commas
>>> How to save a DataFrame (df) to Excel -------------------
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita"],
    "Age": [25,27,29],
    "City": ["Delhi", "Rganj", "Patna"]
}
df = pd.DataFrame(data)
df
df.to_excel("file_name.xlsx", index = False)
5️⃣ What the to_excel() function does
👉 This function saves the DataFrame in Excel file (.xlsx) format.
That is, your data is now written out in the form of an Excel sheet.
>>> How to read a file created from a DataFrame using pandas?
df = pd.read_csv("file_name")
example:
import pandas as pd
data = {
"Name": ["divyanshu","Neha", "Ankita"],
"Age": [25,27,29],
"City": ["Delhi", "Rganj", "Patna"]
}
df = pd.DataFrame(data)
df.to_csv("NewCsv.csv", index = False)
df = pd.read_csv("NewCsv.csv")
>>> df.head() == by default it shows 5 rows; pass the number of rows you want to see
Explanation: the head() function shows the top rows (first records) of the DataFrame.
- Default: if you write df.head() → it shows the first 5 rows.
- df.head(1) → shows only the first row (first record).
>> df.tail(1) == how to see data from the bottom
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita"],
    "Age": [25,27,29],
    "City": ["Delhi", "Rganj", "Patna"]
}
df = pd.DataFrame(data)
df.to_csv("NewCsv.csv", index = False)
df = pd.read_csv("NewCsv.csv")
df
df.head(1)
df.tail(1) # 1 is the number of rows counted from the bottom
🔹 Meaning:
The tail() function shows the last rows (bottom records) of the DataFrame.
- Default: if you write df.tail() → it shows the last 5 rows.
- df.tail(1) → shows only the last row (final record).
>> df.info() == the entire metadata
🔹 Meaning:
The info() function shows the structure and basic details of the DataFrame,
such as the column names, their data types, and how many non-null (filled) values each column has.
>> df.describe() == gives a complete statistical summary
The describe() function gives a statistical summary of the DataFrame's numerical columns.
For columns that hold numbers (like marks, age, salary etc.) it automatically
calculates important measures such as:
- count → how many values there are
- mean → average value
- std → standard deviation (how spread out the data is)
- min → smallest value
- 25%, 50%, 75% → percentiles (quartiles)
- max → largest value
>> df.shape ==== how many rows and columns we have in the DataFrame
>> How to select one column of a DataFrame?
df["City"] # Series
Explain: this extracts just one column (City) of the DataFrame,
and its output is a Pandas Series.
>> df.loc[0:1] = fetches data row wise
0:1 -----------
→ this is a slice (range selection), which means:
take the rows from row index 0 up to 1 (inclusive).
---------------------
.loc[] is used for label-based indexing,
i.e. you can select rows based on their row labels (index).
>> df.set_index("City", inplace = True) = if inplace = True, the existing df itself gets updated rather than a new df being returned
This line makes the "City" column the index, so each row of the DataFrame is now identified by its city name rather than by 0, 1, 2.

>>> df.iloc[0:1] = # position-based indexing
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}
df = pd.DataFrame(data)
print(df.iloc[0:2])
This tells Pandas to show the rows starting at 0 and stopping before 2.
That is,
it shows only the first two rows.
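The difference between label-based .loc and position-based .iloc shows up when the same slice is used with both. A minimal sketch, assuming pandas is installed; `by_city` is just an illustrative name, and set_index is used here without inplace so that df stays unchanged.

```python
import pandas as pd

data = {
    "Name": ["divyanshu", "Neha", "Ankita", "Junaid"],
    "Age": [25, 27, 29, 48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"],
}
df = pd.DataFrame(data)

print(len(df.loc[0:1]))    # 2 -> labels 0 and 1, end label included
print(len(df.iloc[0:1]))   # 1 -> position 0 only, end position excluded

by_city = df.set_index("City")           # returns a new DataFrame
print(by_city.loc["Patna", "Name"])      # Ankita (row chosen by label)
print(by_city.iloc[2]["Name"])           # Ankita (row chosen by position)
```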
>>> Modifying data operations
df["Age"] = df["Age"]/100
This divides every value of the "Age" column of the DataFrame by 100.
📘 Important Concepts:
- df["Age"] → selects the Age column (as a Series)
- /100 → divides every value by 100
- df["Age"] = ... → assigns the modified values back to the Age column
>> df.rename(columns = {"COLUMN_NAME": "CHANGED_COLUMN_NAME"}, inplace = True) = renames a column (the header of the column)
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}
df = pd.DataFrame(data)
print(df.iloc[0:2])
df["Age"] = df["Age"]/100
df.rename(columns = {"City": "Place"}, inplace = True)
df
Explain: columns = {...} → this is a dictionary in which you define the old column names and the new column names. Here "City" is being replaced with "Place".
inplace = True → means the change is applied directly to the original DataFrame (df), so there is no need to create a new DataFrame.
>> df.drop("Age", axis = 1, inplace = True) = drops the column
Ex:
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}
df = pd.DataFrame(data)
print(df.iloc[0:2]) # position-based view
df["Age"] = df["Age"]/100 # divides the entire column
df.drop("Age", axis = 1, inplace = True)
df
📘 Explanation:
- drop() → used to remove a row or column from the DataFrame.
- "Age" → says which column to remove.
- axis = 1 →
  - axis = 0 → is for rows
  - axis = 1 → is for columns
  so here a column is being removed.
- inplace = True →
  - means the change is applied directly to the original DataFrame.
  - if it were False, drop() would return a new DataFrame and the original would stay unchanged.
>> df["New_Column_name"] = [1,2,3,4..... values] = creates a new column
ex:
import pandas as pd
data = {
    "Name": ["divyanshu","Neha", "Ankita", "Junaid"],
    "Age": [25,27,29,48],
    "City": ["Delhi", "Rganj", "Patna", "Saharanpur"]
}
df = pd.DataFrame(data)
df["Age"] = df["Age"]/100 # divides the entire column
df.drop("Age", axis = 1, inplace = True)
df
df["RR"] = ["Yes","No","Yes","No"] # the list needs one value per row; we have 4 rows, so 4 values
df
Explanation:
df["RR"] → means: create a new column named "RR" in the DataFrame df. (If "RR" already exists, its values are updated.)
= ["Yes","No","Yes","No"] → these values are assigned to each row of that column. That is:
"Yes" in the first row
"No" in the second
"Yes" in the third
"No" in the fourth
>>> Filtering Data
df.loc["Delhi", "City"] = 30
Explanation:
- df.loc[ ] → is Pandas' label-based selector.
It means we access a row and column by label (name), not by index number.
- "Delhi" → this is the row label,
i.e. you are targeting the row whose index is "Delhi".
⚠️ This means the "City" column must have been set as the index of your DataFrame (for example with df.set_index("City", inplace = True)).
>>> How to filter data greater than a value
df["Age"] > 25
df[df["Age"] > 25]
Explanation:
🔹 1️⃣ df["Age"] > 25
This line does not extract any data;
instead it builds a Boolean Series (True/False values).
That is, Pandas checks the "Age" of every row to see
whether it is greater than 25 or not.
🔹 2️⃣ df[df["Age"] > 25]
This line uses the Boolean Series above
to filter the rows where it is True.
That is: "show the rows where Age is greater than 25".
>> Another way to filter -- combining two conditions
df[(df["Age"] > 25) & (df["Stake"] < 1)]
>> Pandas query() Function - Conditional Filtering Example
import pandas as pd
data = {
    "Name": ["Sumair","Neha", "Dinesh", "Junaid"],
    "Age": [2235,27,29,48],
    "City": ["Delhi", "Chainpur", "Pune", "Saharanpur"]
}
df = pd.DataFrame(data)
# df["RR"] = ["Yes","No","Yes","No"] # a new column can be created this way; it needs one value per row
df
df["Stake"] = [45,454,124,756]
df.loc["Delhi", "Name"] = 30
df
df["Age"] > 125
df[df["Age"] > 125]
df.query("Age > 25 and Stake < 100")
🔹 1️⃣ What does .query() do?
.query() is a filtering method that lets you write an SQL-style condition,
i.e. you can give a condition like "Age > 25 and Stake < 100" directly inside a string.
It does the same job as this code: df[(df["Age"] > 25) & (df["Stake"] < 100)]
🔹 2️⃣ "Age > 25 and Stake < 100"
There are two conditions here:
- Age > 25 → only the rows whose Age is more than 25
- Stake < 100 → and, along with that, the value of the Stake column is less than 100
and means both conditions must be True.
🔹 3️⃣ Output
This query returns only the rows where
Age is greater than 25 and Stake is less than 100.
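That .query() and a Boolean mask select the same rows can be confirmed directly. A minimal sketch, assuming pandas is installed; the Age and Stake values below are made-up sample numbers, not the ones from the example above.

```python
import pandas as pd

# illustrative sample data (values are assumptions for this sketch)
df = pd.DataFrame({
    "Name": ["Sumair", "Neha", "Dinesh", "Junaid"],
    "Age": [35, 27, 29, 48],
    "Stake": [45, 454, 80, 756],
})

by_mask = df[(df["Age"] > 25) & (df["Stake"] < 100)]   # Boolean mask with &
by_query = df.query("Age > 25 and Stake < 100")        # same condition as a string

print(list(by_query["Name"]))        # ['Sumair', 'Dinesh']
print(by_mask.equals(by_query))      # True
```

With a mask, each condition needs its own parentheses and `&`; inside the query string the plain word `and` is enough.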
>>df.to_clipboard() — Copy DataFrame to Clipboard
Explanation:
यह function पूरे DataFrame (df) को clipboard में copy कर देता है।
मतलब — आप इस data को Ctrl + V दबाकर सीधे Excel, Google Sheets या Notepad में paste कर सकते हैं।
>> df.to_hdf("file_name.h5", key='My_data'): Save DataFrame in HDF5 Format
🧠 Explanation:
This function saves (stores) a Pandas DataFrame in the HDF5 file format (Hierarchical Data Format).
The format stores large amounts of data efficiently, with optional compression,
which matters most when the data is very large (for example, millions of rows).
🔸 The key parameter gives the DataFrame a unique name inside the HDF5 file; it is basically the table name within that file.
🔸 .to_hdf() does not work without a key (it raises an error).
🔸 Multiple DataFrames can be stored in a single .h5 file under different keys.
🔸 An HDF5 file is stored in binary format, so you cannot open and read it directly like a text file.
>> 📘 Pandas: Reading HDF5 File using pd.read_hdf()
df_hdf = pd.read_hdf("file_name.h5", key='My_data')
Definition:
pd.read_hdf() is a Pandas function that reads (loads) an HDF5 file and returns it as a DataFrame.
It is used to bring a file saved with .to_hdf() back into memory.
🔹 read_hdf()
The Pandas function for the HDF5 format: it opens the file, extracts the data, and returns it as a DataFrame.
🔹 "file_name.h5"
This is the name of the file you want to read.
The .h5 or .hdf5 extension indicates that the file is in HDF5 format.
🔹 key='My_data'
This is the unique name or label of the DataFrame stored inside the HDF5 file.
Because one HDF5 file can hold multiple DataFrames, each one has its own key.
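A hedged sketch of a save-and-load round trip with two keys in one file. It assumes the optional PyTables package (tables) is installed, since Pandas relies on it for HDF5; the table names and values are hypothetical, and the file is written to a temporary folder so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical tables for illustration.
employees = pd.DataFrame({"Employee_ID": [101, 102], "Age": [25, 29]})
salaries = pd.DataFrame({"Employee_ID": [101, 102], "Monthly_Salary": [50000, 60000]})

restored = None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file_name.h5")
    try:
        # Two DataFrames stored in the same file under different keys.
        employees.to_hdf(path, key="employees")
        salaries.to_hdf(path, key="salaries")
        # The key selects which stored DataFrame to load.
        restored = pd.read_hdf(path, key="salaries")
    except ImportError:
        # Pandas raises ImportError when PyTables is missing.
        print("PyTables is not installed; skipping the HDF5 round trip")

print(restored)
```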
>> Filtering Rows Based on Multiple Values using isin() Method
import pandas as pd
data = {
    "Employee_ID": [101, 102, 103, 104, 105],
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
    "Age": [25, 29, 32, 28, 26],
    "Experience_Years": [2, 5, 7, 3, 4],
    "Monthly_Salary": [50000, 60000, 65000, 55000, 52000],
}
df = pd.DataFrame(data)
df
df[df["Department"].isin(["Sales", "IT"])]
Step-by-Step Explanation:
1️⃣ df: the DataFrame
- df is the variable that stores the whole dataset (table).
- In this example it is the five-row employee table created above.
2️⃣ df["Department"]
- This selects only the "Department" column.
- The output is a Series with the values IT, HR, Sales, Finance, IT.
3️⃣ isin(["Sales", "IT"])
- This checks whether each value in the "Department" column is "Sales" or "IT".
- The output is a Boolean Series (True/False values): True, False, True, False, True.
4️⃣ df[df["Department"].isin(["Sales", "IT"])]
- The Boolean Series is now applied to the whole df.
- Only the rows where the value is True are shown.
- So only the employees in the "Sales" and "IT" departments appear.
✅ Final Output:
➡️ Only the rows for the "Sales" and "IT" departments remain after filtering.
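The four steps can be sketched as one runnable script, using a trimmed version of the employee table from this section. The variable names dept_mask and selected are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee_ID": [101, 102, 103, 104, 105],
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
})

# Steps 2 and 3: select the column and test each value against the list.
dept_mask = df["Department"].isin(["Sales", "IT"])
print(dept_mask.tolist())

# Step 4: keep only the rows where the mask is True.
selected = df[dept_mask]
print(selected)
```

isin() is a compact replacement for chaining several == comparisons with |.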
📘 Pandas: Excluding Rows Using ~isin()
import pandas as pd
data = {
    "Employee_ID": [101, 102, 103, 104, 105],
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
    "Age": [25, 29, 32, 28, 26],
    "Experience_Years": [2, 5, 7, 3, 4],
    "Monthly_Salary": [50000, 60000, 65000, 55000, 52000],
}
df = pd.DataFrame(data)
df
df[~df["Department"].isin(["Sales", "IT"])]
🧠 Step-by-Step Explanation:
1️⃣ df
This is your DataFrame, which holds the data for all employees. The walkthrough uses the illustrative values below, which differ from the data in the code above 👇
Employee_ID  Name       Department  Age  Experience_Years  Monthly_Salary
101          Divyanshu  Sales       25   2                 40000
102          Anshika    HR          27   3                 45000
103          Neha       IT          28   4                 50000
104          Junaid     Finance     30   5                 55000
2️⃣ df["Department"].isin(["Sales", "IT"])
This checks whether the value in the "Department" column is "Sales" or "IT".
The output is a Boolean Series: True, False, True, False for the four rows above.
3️⃣ ~ (Tilde Operator)
- This is a NOT operator (it inverts the condition).
- True becomes False and False becomes True.
So for the four example rows the output becomes False, True, False, True.
4️⃣ df[~df["Department"].isin(["Sales", "IT"])]
Now the DataFrame keeps only the rows where the original condition was False,
that is, the employees whose department is neither "Sales" nor "IT" 👇
Employee_ID  Name     Department  Age  Experience_Years  Monthly_Salary
102          Anshika  HR          27   3                 45000
104          Junaid   Finance     30   5                 55000
✅ Final Output:
➡️ This code shows all rows whose department is neither "Sales" nor "IT".
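A small sketch of the inversion, using a trimmed version of the employee table from the code in this section (not the illustrative walkthrough values). The names in_list and excluded are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee_ID": [101, 102, 103, 104, 105],
    "Name": ["Divyanshu", "Ankita", "Junaid", "Neha", "Ravi"],
    "Department": ["IT", "HR", "Sales", "Finance", "IT"],
})

in_list = df["Department"].isin(["Sales", "IT"])

# ~ flips every True/False value in the Boolean Series.
excluded = df[~in_list]
print(excluded)

# The matching and non-matching rows together cover the whole table.
print(len(df[in_list]) + len(excluded) == len(df))
```

Python's `not` keyword does not work on a whole Series; `~` is the element-wise version.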
Short Summary Table:
Symbol / Function   Meaning
isin()              Checks if a value is present in a list
~                   Reverses the condition (True → False)
df[...]             Filters the DataFrame based on a condition
Question: How do we sort data in Python?
Question: How do we handle data in Python?
Question: How do we clean data in Python?
Question: How do we handle missing values in Python?
Question: How do we handle duplicates in Python?
📘 Pandas – Reading a CSV File and Previewing Data (Notes)
➡️ Code:
import pandas as pd
df = pd.read_csv("day.csv")
df.head(2)
📝 Notes – Line-by-Line Explanation
1️⃣ import pandas as pd
This line imports the Pandas library and gives it the short name pd, so we don't have to type pandas again and again.
From here on, Pandas functions are called with the pd. prefix.
Example: pd.read_csv(), pd.DataFrame() etc.
2️⃣ df = pd.read_csv("day.csv")
This reads the CSV file named day.csv and loads it into a DataFrame called df.
Important Points
- read_csv() → function to read CSV files
- "day.csv" → filename
- df → variable storing the table-like data
3️⃣ df.head(2)
This displays the first 2 rows of the DataFrame.
It helps quickly preview the data and check whether it loaded correctly.
General Rule
- df.head() → shows first 5 rows (default)
- df.head(2) → shows first 2 rows
- df.head(10) → shows first 10 rows
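To try these calls without a day.csv file on disk, here is a sketch that reads CSV text from memory instead. The column names and values are invented for illustration and are not the contents of day.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content; read_csv accepts a file path or a file-like object.
csv_text = """day,season,rentals
1,1,120
2,1,135
3,2,210
4,2,198
5,3,305
6,4,150
"""

df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)   # first 2 rows
default_rows = df.head() # first 5 rows by default

print(first_two)
print(df.shape)
```

With a real file, replace io.StringIO(csv_text) with the path, for example "day.csv".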
🧾 Pandas – df["season"].nunique() (Short Notes)
✔️ Code
df["season"].nunique()
🧠 What is happening in this code?
- df["season"] → selects the season column from the DataFrame
- .nunique() → returns how many unique (distinct) values that column contains
👉 So it checks how many different seasons appear in the season column.
📌 Example
For instance, if the season column contained only the values 1, 2, 3 and 4, each repeated many times,
the output would be 4.
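Because count() and nunique() are easy to mix up, here is a small comparison on a made-up season column that includes one missing value. The data is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical season codes, with one missing entry.
df = pd.DataFrame({"season": [1, 1, 2, 3, 3, 4, None]})

distinct_seasons = df["season"].nunique()                  # distinct values, missing ignored
filled_rows = df["season"].count()                         # non-missing entries
distinct_with_missing = df["season"].nunique(dropna=False) # counts the missing value too

print(distinct_seasons, filled_rows, distinct_with_missing)
```

Use nunique() to ask "how many different seasons are there?" and count() to ask "how many rows have a season filled in?".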