[Python] Writing csv files and reading them (simplified)
What are CSV files?
CSV stands for "Comma-Separated Values", and that's exactly what it is: a text file containing values separated by commas or by another separator such as the semicolon (";"). Microsoft Excel, for example, uses semicolons to separate the values under some regional settings.
For example, when we open such a file in a text editor like Notepad, we see a format like:
Mercedes;AMG GT;117000
VW;Golf R;41000
BMW;X3;44000
Knowing that, we can work with CSV files easily, since splitting the data into its values is only string manipulation.
To follow along with this tutorial, copy and paste the three lines above into a Notepad file and name it "file.csv".
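If you prefer to create the sample file from Python instead of Notepad, a minimal sketch could look like this. The variable name "sample_lines" is only illustrative; the file name "file.csv" is the one used throughout this tutorial.

```python
# Sample data from above; "sample_lines" is just an illustrative name.
sample_lines = [
    "Mercedes;AMG GT;117000",
    "VW;Golf R;41000",
    "BMW;X3;44000",
]

# Write each record on its own line, ending with a line break.
with open("file.csv", "w") as file:
    for line in sample_lines:
        file.write(line + "\n")
```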
Reading CSV files
Let's write some code that opens a CSV file and reads its content:
with open('file.csv', "r") as file:
    rows = []
    for line in file:
        rows.append(line)

print(rows)
The output is as follows:
['Mercedes;AMG GT;117000\n', 'VW;Golf R;41000\n', 'BMW;X3;44000\n']
As you can see, we have a list in which each line of the file is one element of type "string".
First, let's remove the line breaks from each string with the string method "strip()":
.strip("\n")
The code looks like this:
with open('file.csv', "r") as file:
    rows = []
    for line in file:
        rows.append(line.strip("\n"))
And the output:
['Mercedes;AMG GT;117000', 'VW;Golf R;41000', 'BMW;X3;44000']
All the line breaks are removed.
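To see what "strip()" does and does not remove, here is a small sketch. The example strings are made up for illustration; "rstrip()" is a related string method that only looks at the end of the string.

```python
line = "Mercedes;AMG GT;117000\n"

# strip("\n") removes line breaks only at the start and end of the string.
print(repr(line.strip("\n")))   # 'Mercedes;AMG GT;117000'

# A line break in the middle of a string is left untouched.
text = "first\nsecond\n"
print(repr(text.strip("\n")))   # 'first\nsecond'

# rstrip("\n") only looks at the end of the string, which is where
# the line break sits when reading a file line by line.
print(repr(line.rstrip("\n")))  # 'Mercedes;AMG GT;117000'
```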
Now we need to remove the separators (";") as well.
At the same time, we will create a new list out of each line, so that every value that was separated by a semicolon becomes an element of this new list.
In the end, we will have a list that contains each line as another list.
This is where the separators come into play: they tell us where to split the string so that each value can be stored as an element of the new list.
To do so, we can use the string method "split()":
.split(";")
Now the code looks as follows:
with open('file.csv', "r") as file:
    rows = []
    for line in file:
        rows.append(line.strip("\n").split(";"))
And the output:
[['Mercedes', 'AMG GT', '117000'], ['VW', 'Golf R', '41000'], ['BMW', 'X3', '44000']]
Here we have a so-called "nested list": a list whose elements are lists themselves.
But how does it work exactly?
Let's go through each line of code:
rows = []
Here we create an empty list.
for line in file:
Here we read the file line by line.
rows.append(line.strip("\n").split(";"))
And here, all the "magic" happens...
With "rows.append", we append each processed line as a new element to the list "rows".
With "line.strip("\n")", we remove the line break at the end of the line. "strip()" only removes characters at the beginning and end of a string, never in the middle.
"strip()" does not change the string it was called on (in our case "line"); it returns a new string.
The method "split()" creates a new list and searches the string it was called on for the passed separator (in our case ";"). Each piece of text between two separators becomes an element of the new list, and the separators themselves are dropped. If the separator occurs more than once, the string is split at every occurrence.
So "Mercedes;AMG GT;117000" will become "Mercedes", "AMG GT" and "117000". These three elements are added to the newly created list, and this list in turn is added to the list "rows".
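The two steps can also be tried on a single string without opening a file. The helper name "parse_line" below is hypothetical and only used for this sketch:

```python
def parse_line(line, separator=";"):
    """Remove the surrounding line breaks and split the line into its values."""
    return line.strip("\n").split(separator)


print(parse_line("Mercedes;AMG GT;117000\n"))
# ['Mercedes', 'AMG GT', '117000']

# An empty field between two separators becomes an empty string.
print(parse_line("VW;;41000\n"))
# ['VW', '', '41000']
```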
Now we can access each line easily by its index:
print(rows[0])
Output:
['Mercedes', 'AMG GT', '117000']
And if we want to access an element within each sublist, we simply specify a second index:
print(rows[0][0])
Output:
Mercedes
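Keep in mind that every value read this way is still a string, including the prices. As a sketch, assuming the third column always holds a whole number, the rows could be sorted by price like this (the name "rows_by_price" is just illustrative):

```python
rows = [
    ["Mercedes", "AMG GT", "117000"],
    ["VW", "Golf R", "41000"],
    ["BMW", "X3", "44000"],
]

# The price is the third value of each row (index 2) and is still a string.
print(type(rows[0][2]))  # <class 'str'>

# Sort by price, converting the string to an integer for the comparison.
# Comparing the raw strings would put "117000" before "41000".
rows_by_price = sorted(rows, key=lambda row: int(row[2]))

for row in rows_by_price:
    print(row[0], row[1], int(row[2]))
```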
Writing CSV files
Writing a CSV file is as easy as reading one. The only thing we have to make sure of is that we keep the format.
So let's write some code that writes a csv file:
with open("file.csv", "w") as file:
    for y in range(0, len(rows)):
        for x in range(0, len(rows[y])):
            if x == len(rows[y])-1:
                file.write(rows[y][x])
            else:
                file.write(rows[y][x] + ";")
        file.write("\n")
When we run this code and open the file in a text editor like Notepad, we can see the exact same output:
Mercedes;AMG GT;117000
VW;Golf R;41000
BMW;X3;44000
But what exactly does this code do?
Let's step through each line:
with open("file.csv", "w") as file:
Here we open the file in write mode. Note that write mode replaces any existing content of the file.
for y in range(0, len(rows)):
We step through each element (row) in "rows". The function "len(rows)" returns the number of elements in the list "rows".
for x in range(0, len(rows[y])):
We step through each value in the current sublist "rows[y]". "len(rows[y])" returns the number of elements in that sublist.
if x == len(rows[y])-1:
If it is the last entry on this line...
file.write(rows[y][x])
... write the content of cell "[y][x]" to the file.
else:
Otherwise...
file.write(rows[y][x] + ";")
... write the content of cell "[y][x]" followed by a semicolon to the file.
file.write("\n")
After each line, we have to write a line break into the file; otherwise, all entries would end up on one long line.
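The nested loops above can also be written more compactly with the string method "join()", which places the separator between the values but not after the last one. This is only an alternative sketch that produces the same file content; the "rows" list is repeated here so the example runs on its own.

```python
rows = [
    ["Mercedes", "AMG GT", "117000"],
    ["VW", "Golf R", "41000"],
    ["BMW", "X3", "44000"],
]

with open("file.csv", "w") as file:
    for row in rows:
        # join() puts ";" between the values, so no check for the
        # last entry of the line is needed.
        file.write(";".join(row) + "\n")
```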
And that's all the magic! That wasn't too hard, was it? :)
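For comparison, Python's standard library also ships a "csv" module that takes care of splitting and joining the values. The following is only a sketch of a write-and-read round trip with a semicolon as the delimiter; it is not what this tutorial uses, and the name "rows_read" is illustrative.

```python
import csv

rows = [
    ["Mercedes", "AMG GT", "117000"],
    ["VW", "Golf R", "41000"],
    ["BMW", "X3", "44000"],
]

# newline="" lets the csv module control the line endings itself.
with open("file.csv", "w", newline="") as file:
    writer = csv.writer(file, delimiter=";")
    writer.writerows(rows)

with open("file.csv", "r", newline="") as file:
    # Each row comes back as a list of strings.
    rows_read = list(csv.reader(file, delimiter=";"))

print(rows_read)
```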
If you have any questions, feel free to leave a comment.
Have a nice day!
Your Rednib