n = 3
df = pd.DataFrame(data=np.arange(0,n**2,1,dtype=np.int16).reshape((n,n)))#, columns=["a","b","c"])
df

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 3 | 4 | 5 |
| 2 | 6 | 7 | 8 |
n = 3
df2 = pd.DataFrame(data=np.arange(0,n**2,1,dtype=np.int16).reshape((n,n)), columns=[1,3,0], index=[2,0,1])
df2

| | 1 | 3 | 0 |
|---|---|---|---|
| 2 | 0 | 1 | 2 |
| 0 | 3 | 4 | 5 |
| 1 | 6 | 7 | 8 |
The pandas method reindex selects the requested index/column labels that already exist and fills in the ones that do not (with NaN by default, or with a chosen fill_value).
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 5 | 3 | 0 | 4 |
| 1 | 8 | 6 | 0 | 7 |
| 2 | 2 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 |
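The call that produced the table above is not shown. A minimal sketch that reproduces the same values, assuming labels 0 to 3 were requested on both axes with fill_value=0 (the name df2_full is only illustrative):

```python
import numpy as np
import pandas as pd

n = 3
values = np.arange(0, n**2, 1, dtype=np.int16).reshape((n, n))
df2 = pd.DataFrame(data=values, columns=[1, 3, 0], index=[2, 0, 1])

# Request labels 0..3 on both axes. Existing rows/columns are reordered;
# row 3 and column 2 do not exist in df2, so they are filled with 0.
# (df2_full is a hypothetical name, not taken from the original notebook.)
df2_full = df2.reindex(index=range(4), columns=range(4), fill_value=0)
print(df2_full)
```

Without fill_value, the missing row and column would hold NaN instead of 0.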
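Arithmetic between two DataFrames aligns them on row and column labels; any position that is missing from either DataFrame becomes NaN in the result.
The pandas method add takes a fill_value argument, which lets us choose what to substitute for the missing elements.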
Other operators:
radd; sub, rsub; div, rdiv; floordiv, rfloordiv; mul, rmul; pow, rpow

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 5.0 | 4.0 | 2.0 | 5.0 |
| 1 | 11.0 | 10.0 | 5.0 | 8.0 |
| 2 | 8.0 | 7.0 | 8.0 | 2.0 |
| 3 | 1.0 | 1.0 | 1.0 | 1.0 |
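The values in the table above are consistent with adding df to the reindexed 4x4 version of df2 using fill_value=1. The following sketch rests on that assumption; the names df2_full, plain_sum, filled_sum and reversed_sum are illustrative only:

```python
import numpy as np
import pandas as pd

n = 3
values = np.arange(0, n**2, 1, dtype=np.int16).reshape((n, n))
df = pd.DataFrame(data=values)
df2 = pd.DataFrame(data=values, columns=[1, 3, 0], index=[2, 0, 1])
df2_full = df2.reindex(index=range(4), columns=range(4), fill_value=0)

# The + operator aligns on labels; df has no row 3 and no column 3,
# so those positions come out as NaN.
plain_sum = df + df2_full

# add() with fill_value replaces a value missing from one operand by 1
# before the addition takes place.
filled_sum = df.add(df2_full, fill_value=1)

# The r-prefixed variants swap the operands: df.radd(x) computes x + df.
reversed_sum = df.radd(df2_full, fill_value=1)

print(plain_sum)
print(filled_sum)
```

A position missing from both operands still yields NaN, even when fill_value is given.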
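Broadcasting a Series against a DataFrame matches the Series index to the column labels by default, so the operation is applied row by row. To match on the row index instead, the axis has to be passed explicitly.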
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0.000000 | 1.0 | 2.000000 |
| 1 | 0.750000 | 1.0 | 1.250000 |
| 2 | 0.857143 | 1.0 | 1.142857 |
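The table above matches what dividing df by its own column 1 along the row index would give. A sketch under that assumption (the names col, by_columns and by_rows are illustrative):

```python
import numpy as np
import pandas as pd

n = 3
df = pd.DataFrame(data=np.arange(0, n**2, 1, dtype=np.int16).reshape((n, n)))

col = df[1]  # Series with values 1, 4, 7 indexed by the row labels 0, 1, 2

# Default (axis="columns"): the Series index is matched against the column
# labels, so column j is divided by col[j].
by_columns = df.div(col)

# axis=0 (or axis="index"): the Series index is matched against the row
# labels, so every row is divided by its own value in column 1.
by_rows = df.div(col, axis=0)

print(by_rows)
```

With axis=0, column 1 of the result is all ones, as in the table.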
The opposite is true for apply: selecting axis="columns" (or axis=1) passes each row to the function, so a summation runs across the columns and returns one value per row.
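A short sketch of the axis argument in apply, using np.sum as the function (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

n = 3
df = pd.DataFrame(data=np.arange(0, n**2, 1, dtype=np.int16).reshape((n, n)))

# axis="columns": the function receives one row at a time and the sum
# runs across the columns, giving one value per row.
row_sums = df.apply(np.sum, axis="columns")

# axis="index" (the default): the function receives one column at a time,
# giving one value per column.
col_sums = df.apply(np.sum, axis="index")

print(row_sums.tolist())  # [3, 12, 21]
print(col_sums.tolist())  # [9, 12, 15]
```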
The applied function can also return a Series, in which case apply assembles the results into a DataFrame. This is probably how describe and info work.
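As an illustration, the hypothetical helper min_max below returns a Series for each column, and apply stacks those Series into a small summary table. This only mimics the layout of a describe-style result; it makes no claim about how describe is implemented:

```python
import numpy as np
import pandas as pd

n = 3
df = pd.DataFrame(data=np.arange(0, n**2, 1, dtype=np.int16).reshape((n, n)))

def min_max(col):
    # Hypothetical helper: returns two labelled statistics for one column.
    return pd.Series({"min": col.min(), "max": col.max()})

# Each column produces a Series; the Series labels become the row index
# of the resulting DataFrame.
summary = df.apply(min_max)
print(summary)
```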
The element-wise “apply” for DataFrames is called applymap (renamed to DataFrame.map in pandas 2.1, where applymap is deprecated). It is the equivalent of map in Series.
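A minimal sketch of the element-wise variants. The version check is an assumption added so the example runs on both older and newer pandas releases; square is an illustrative function:

```python
import numpy as np
import pandas as pd

n = 3
df = pd.DataFrame(data=np.arange(0, n**2, 1, dtype=np.int16).reshape((n, n)))

def square(x):
    return x ** 2

# Series.map applies the function to every element of one column.
first_col_squared = df[0].map(square)

# DataFrame.map exists from pandas 2.1 on; older versions only offer applymap.
if hasattr(pd.DataFrame, "map"):
    squared = df.map(square)
else:
    squared = df.applymap(square)

print(squared)
```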
Creating from cut
nums = np.random.randint(0,9,20, dtype = np.int8)
bins = range(0,10,2)
cat_var = pd.cut(nums, bins)
cat_var
[(2, 4], (2, 4], (4, 6], (2, 4], (6, 8], ..., (6, 8], (2, 4], (0, 2], (6, 8], (6, 8]]
Length: 20
Categories (4, interval[int64, right]): [(0, 2] < (2, 4] < (4, 6] < (6, 8]]
array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True])
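Because the intervals are closed on the right, a value of 0 (which np.random.randint(0, 9, 20) can produce) falls outside the first bin (0, 2] and is left without a category. A small deterministic sketch, with hand-picked values chosen only for illustration:

```python
import pandas as pd

values = [0, 1, 2, 3, 8]   # illustrative values, not from the original run
bins = range(0, 10, 2)     # edges 0, 2, 4, 6, 8

# Right-closed intervals: 0 is outside (0, 2] and becomes NaN (code -1).
cats = pd.cut(values, bins)
print(cats.codes.tolist())      # [-1, 0, 0, 1, 3]

# include_lowest=True extends the first interval to include the left edge.
cats_low = pd.cut(values, bins, include_lowest=True)
print(cats_low.codes.tolist())  # [0, 0, 0, 1, 3]
```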