Reading and writing files in Python: reading files with read_csv()
1. Several ways to read files in Python
- read_csv: loads delimited data from a file, URL, or file-like object. The default delimiter is a comma.
- read_table: loads delimited data from a file, URL, or file-like object. The default delimiter is a tab ("\t").
- read_fwf: reads data in fixed-width column format (i.e. with no delimiters).
- read_clipboard: reads data from the clipboard; a clipboard version of read_table. Useful when converting web pages to tables.
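Of these, only read_fwf is not demonstrated below. A minimal sketch, using an in-memory StringIO stand-in for a real file and made-up column positions:

```python
from io import StringIO

import pandas as pd

# Hypothetical fixed-width data: each column occupies a fixed
# range of character positions rather than being delimited.
data = StringIO(
    "id  name  score\n"
    "1   foo   10.5\n"
    "2   bar   20.1\n"
)

# colspecs lists the (start, end) character positions of each column;
# widths=[4, 6, 5] would be an equivalent specification here.
df = pd.read_fwf(data, colspecs=[(0, 4), (4, 10), (10, 15)])
print(df)
```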
2. A simple demonstration of reading files
Code:
import pandas as pd

df=pd.read_csv("D:/project/python_instruct/test_data1.csv")
print("CSV file read with read_csv:", df)
df=pd.read_table("D:/project/python_instruct/test_data1.csv", sep=",")
print("CSV file read with read_table:", df)
df=pd.read_csv("D:/project/python_instruct/test_data2.csv", header=None)
print("Headerless CSV file read with read_csv:", df)
df=pd.read_csv("D:/project/python_instruct/test_data2.csv", names=["a", "b", "c", "d", "message"])
print("CSV file with custom header read with read_csv:", df)
names=["a", "b", "c", "d", "message"]
df=pd.read_csv("D:/project/python_instruct/test_data2.csv", names=names, index_col="message")
print("read_csv with an explicit index column:", df)
parsed=pd.read_csv("D:/project/python_instruct/test_data3.csv", index_col=["key1", "key2"])
print("read_csv building a hierarchical index from multiple columns:")
print(parsed)
print(list(open("D:/project/python_instruct/test_data1.txt")))
result=pd.read_table("D:/project/python_instruct/test_data1.txt", sep="\s+")
print("read_table with a regular-expression separator:")
print(result)
Output:
CSV file read with read_csv: a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo
CSV file read with read_table: a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo
Headerless CSV file read with read_csv: 0 1 2 3 4
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo
CSV file with custom header read with read_csv: a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo
read_csv with an explicit index column: a b c d
message
hello 1 2 3 4
world 5 6 7 8
foo 9 10 11 12
read_csv building a hierarchical index from multiple columns:
value1 value2
key1 key2
one a 1 2
b 3 4
c 5 6
d 7 8
two a 9 10
b 11 12
c 13 14
d 15 16
[" A B C\n", "aaa -0.26 -0.1 -0.4\n", "bbb -0.92 -0.4 -0.7\n", "ccc -0.34 -0.5 -0.8\n", "ddd -0.78 -0.3 -0.2"]
read_table with a regular-expression separator:
A B C
aaa -0.26 -0.1 -0.4
bbb -0.92 -0.4 -0.7
ccc -0.34 -0.5 -0.8
ddd -0.78 -0.3 -0.2
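If test_data1.txt is not at hand, the regex-separator behavior can be reproduced with an in-memory copy of the listing above. Note the detail that makes this example work: the header row has one fewer field than the data rows, so pandas infers that the first column is the index.

```python
from io import StringIO

import pandas as pd

# Reproduces the whitespace-delimited file from the listing above;
# the header row has only three fields, the data rows have four.
text = (
    "     A     B     C\n"
    "aaa -0.26 -0.1 -0.4\n"
    "bbb -0.92 -0.4 -0.7\n"
    "ccc -0.34 -0.5 -0.8\n"
    "ddd -0.78 -0.3 -0.2\n"
)

# sep=r"\s+" treats any run of whitespace as a delimiter; because the
# header is one field short, the first column becomes the row index.
result = pd.read_table(StringIO(text), sep=r"\s+")
print(result)
```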
3. Reading large datasets in chunks
First, the code:
result=pd.read_csv("D:/project/python_instruct/weibo_network.txt")
print("Raw file:", result)
Output:
Traceback (most recent call last):
File "<ipython-input-5-6eb71b2a5e94>", line 1, in <module>
runfile("D:/project/python_instruct/Test.py", wdir="D:/project/python_instruct")
File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, "exec"), namespace)
File "D:/project/python_instruct/Test.py", line 75, in <module>
result=pd.read_csv("D:/project/python_instruct/weibo_network.txt")
File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 325, in _read
return parser.read()
File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 815, in read
ret = self._engine.read(nrows)
File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1314, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)
File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)
File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)
File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)
File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)
CParserError: Error tokenizing data. C error: out of memory
The dataset turns out to be too large to fit in memory. We can read just a few rows to take a look, e.g. the first 10:
result=pd.read_csv("D:/project/python_instruct/weibo_network.txt", nrows=10)
print("Reading only a few rows:")
print(result)
Output:
1787443 413503687
0 0 296 3 1 10 1 12 1 13 1 14 1 16 ...
1 1 271 8 1 17 1 22 1 31 0 34 1 6742...
2 2 158 0 0 5 1 10 1 11 1 13 1 16 0...
3 3 413 0 1 5 1 194 1 354 1 3462 1 8...
4 4 142 1 0 5 1 7 1 11 1 14 1 18 1...
5 5 272 2 1 3 1 4 1 12 1 13 1 14 1...
6 6 59 9 1 13 1 46991 0 66930 0 85672...
7 7 131 4 1 11 1 20 1 24 1 26 0 30 ...
8 8 326 0 0 1 1 12 1 13 1 17 1 19 1...
9 9 12 0 0 6 1 10 1 13 1 18 0 466527...
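nrows only previews the file. To actually process a dataset this size, pass chunksize, which makes read_csv return an iterator of DataFrames so that only one chunk is held in memory at a time. A minimal sketch, using a small in-memory stand-in (with invented key/value columns) in place of weibo_network.txt:

```python
from io import StringIO

import pandas as pd

# Stand-in for a large file; a real run would pass the path to
# weibo_network.txt instead of this StringIO object.
data = StringIO("key,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

# chunksize=4 yields DataFrames of at most 4 rows each, so an
# aggregate can be accumulated without loading the whole file.
total = 0
for chunk in pd.read_csv(data, chunksize=4):
    total += chunk["value"].sum()
print(total)  # prints 90, i.e. 2*(0+1+...+9)
```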