Rumenke AI Startup Platform (I get you started, you help me take off)

Reading and writing files in Python: reading files with read_csv()

Created: 2017-03-11

Reading files with read_csv()

1. Several pandas functions for reading files

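  • read_csv: loads delimited data from a file, URL, or file-like object. The default separator is a comma.
  • read_table: loads delimited data from a file, URL, or file-like object. The default separator is a tab ("\t").
  • read_fwf: reads data in fixed-width column format (that is, with no separators at all).
  • read_clipboard: reads data from the clipboard and can be thought of as a clipboard version of read_table. It is useful for turning tables on web pages into DataFrames.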
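This sketch makes the default separators listed above concrete. It parses small tables from in-memory strings through io.StringIO, so no files are needed. The sample data and the widths=[4, 6, 5] passed to read_fwf are invented for this illustration. read_clipboard is left out because its result depends on whatever is currently on the system clipboard.

```python
import io

import pandas as pd

# The same small table, comma- and tab-separated (sample data invented for this sketch)
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
tsv_text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# A fixed-width table: no separator, each field occupies a fixed number of characters
fwf_text = (
    "id  name  score\n"
    "1   foo      90\n"
    "2   bar      85\n"
)

# read_csv: comma is the default separator
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table: tab is the default separator
df_tsv = pd.read_table(io.StringIO(tsv_text))

# read_fwf: widths=[4, 6, 5] means chars 0-3 -> id, 4-9 -> name, 10-14 -> score;
# surrounding padding spaces are stripped from each field
df_fwf = pd.read_fwf(io.StringIO(fwf_text), widths=[4, 6, 5])

print(df_csv.equals(df_tsv))  # True: both parse to the same table
print(df_fwf)
```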

2. Simple examples of reading files

Code:

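import pandas as pd

# read_csv uses a comma as the default separator
df = pd.read_csv("D:/project/python_instruct/test_data1.csv")
print("CSV file read with read_csv:", df)
# read_table defaults to a tab separator, so pass sep="," explicitly for a CSV file
df = pd.read_table("D:/project/python_instruct/test_data1.csv", sep=",")
print("CSV file read with read_table:", df)

# header=None: the file has no header row, so pandas numbers the columns 0, 1, 2, ...
df = pd.read_csv("D:/project/python_instruct/test_data2.csv", header=None)
print("CSV file without a header row, read with read_csv:", df)
# names=[...]: supply the column names yourself
df = pd.read_csv("D:/project/python_instruct/test_data2.csv", names=["a", "b", "c", "d", "message"])
print("CSV file with custom column names, read with read_csv:", df)

names = ["a", "b", "c", "d", "message"]
# index_col="message": use the message column as the row index
df = pd.read_csv("D:/project/python_instruct/test_data2.csv", names=names, index_col="message")
print("read_csv with a specified index column:", df)

# Passing a list to index_col builds a hierarchical (MultiIndex) row index
parsed = pd.read_csv("D:/project/python_instruct/test_data3.csv", index_col=["key1", "key2"])
print("read_csv building a hierarchical index from multiple columns:")
print(parsed)

# Show the raw lines of the whitespace-separated text file (the with block closes it)
with open("D:/project/python_instruct/test_data1.txt") as f:
    print(list(f))
# sep=r"\s+" is a regular expression matching one or more whitespace characters
result = pd.read_table("D:/project/python_instruct/test_data1.txt", sep=r"\s+")
print("read_table using a regular expression as the separator:")
print(result)

Output:

CSV file read with read_csv:    a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
CSV file read with read_table:    a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
CSV file without a header row, read with read_csv:    0   1   2   3      4
0  1   2   3   4  hello
1  5   6   7   8  world
2  9  10  11  12    foo
CSV file with custom column names, read with read_csv:    a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
read_csv with a specified index column:          a   b   c   d
message               
hello    1   2   3   4
world    5   6   7   8
foo      9  10  11  12
read_csv building a hierarchical index from multiple columns:
           value1  value2
key1 key2                
one  a          1       2
     b          3       4
     c          5       6
     d          7       8
two  a          9      10
     b         11      12
     c         13      14
     d         15      16
['      A     B    C \n', 'aaa -0.26 -0.1 -0.4\n', 'bbb -0.92 -0.4 -0.7\n', 'ccc -0.34 -0.5 -0.8\n', 'ddd -0.78 -0.3 -0.2']
read_table using a regular expression as the separator:
        A    B    C
aaa -0.26 -0.1 -0.4
bbb -0.92 -0.4 -0.7
ccc -0.34 -0.5 -0.8
ddd -0.78 -0.3 -0.2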
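The options used in section 2 can also be tried without the files on drive D:. This sketch feeds short in-memory strings (io.StringIO) to read_csv and read_table. The strings only imitate the shape of test_data2.csv, test_data3.csv and test_data1.txt and are not their real contents.

The last part illustrates a detail visible in the regex example's output. When the header line holds one fewer name than each data row has fields, pandas uses the first field of each row as the index. That is why aaa, bbb, ... appear as row labels.

```python
import io

import pandas as pd

# Hypothetical data shaped like test_data2.csv (no header row)
no_header = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# header=None: columns are numbered 0..4
df_numbered = pd.read_csv(io.StringIO(no_header), header=None)

# names=[...] supplies column labels; index_col turns "message" into the row index
names = ["a", "b", "c", "d", "message"]
df_indexed = pd.read_csv(io.StringIO(no_header), names=names, index_col="message")

# Hypothetical data shaped like test_data3.csv: two key columns become a MultiIndex
keyed = "key1,key2,value1,value2\none,a,1,2\none,b,3,4\ntwo,a,9,10\ntwo,b,11,12\n"
parsed = pd.read_csv(io.StringIO(keyed), index_col=["key1", "key2"])

# Whitespace-separated text whose header has one fewer name than each data row has fields
ragged = "    A     B    C\naaa -0.26 -0.1 -0.4\nbbb -0.92 -0.4 -0.7\n"
result = pd.read_table(io.StringIO(ragged), sep=r"\s+")

print(df_numbered.columns.tolist())        # [0, 1, 2, 3, 4]
print(df_indexed.loc["world", "b"])        # 6
print(parsed.loc[("two", "b"), "value1"])  # 11
print(result.index.tolist())               # ['aaa', 'bbb']
```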

3. Reading large datasets in chunks

Let's start with the code:

# Try to load the whole file at once
result = pd.read_csv(r"D:\project\python_instruct\weibo_network.txt")
print("Original file:", result)

Output:

Traceback (most recent call last):

  File "<ipython-input-5-6eb71b2a5e94>", line 1, in <module>
    runfile("D:/project/python_instruct/Test.py", wdir="D:/project/python_instruct")

  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, "exec"), namespace)

  File "D:/project/python_instruct/Test.py", line 75, in <module>
    reslt=pd.read_csv("D:\project\python_instruct\weibo_network.txt")

  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 325, in _read
    return parser.read()

  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 815, in read
    ret = self._engine.read(nrows)

  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1314, in read
    data = self._reader.read(nrows)

  File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)

  File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)

  File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)

  File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)

  File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)

CParserError: Error tokenizing data. C error: out of memory

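The error shows that the dataset is too large to fit in memory. We can read just a few rows to get a look at it, for example the first 10: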

# nrows=10: parse only the first 10 data rows (after the header line)
result = pd.read_csv(r"D:\project\python_instruct\weibo_network.txt", nrows=10)
print("Reading only a few rows:")
print(result)

Output:

                                  1787443	413503687
0  0	296	3	1	10	1	12	1	13	1	14	1	16	...
1  1	271	8	1	17	1	22	1	31	0	34	1	6742...
2  2	158	0	0	5	1	10	1	11	1	13	1	16	0...
3  3	413	0	1	5	1	194	1	354	1	3462	1	8...
4  4	142	1	0	5	1	7	1	11	1	14	1	18	1...
5  5	272	2	1	3	1	4	1	12	1	13	1	14	1...
6  6	59	9	1	13	1	46991	0	66930	0	85672...
7  7	131	4	1	11	1	20	1	24	1	26	0	30	...
8  8	326	0	0	1	1	12	1	13	1	17	1	19	1...
9  9	12	0	0	6	1	10	1	13	1	18	0	466527...
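This section is about reading in chunks, but the example above only samples the first rows with nrows. Below is a minimal sketch of actual chunked reading. It uses read_csv's chunksize option and also iterator=True with get_chunk.

The sketch runs on a small made-up in-memory table; the user and followers columns are hypothetical. With a real file such as weibo_network.txt you would pass its path instead.

Note also that the nrows output above shows each whole line, tabs included, in a single column. That suggests the file is tab-delimited rather than comma-separated, so the default separator may not suit it.

```python
import io

import pandas as pd

# Hypothetical stand-in for a file too large to load at once
text = "user,followers\n" + "".join(f"u{i},{i}\n" for i in range(10))

# chunksize=4: read_csv returns an iterator of DataFrames with at most 4 rows each
reader = pd.read_csv(io.StringIO(text), chunksize=4)
chunk_sizes = []
row_count = 0
follower_total = 0
for chunk in reader:
    # Only the current chunk is held in memory; keep running totals across chunks
    chunk_sizes.append(len(chunk))
    row_count += len(chunk)
    follower_total += chunk["followers"].sum()

print(chunk_sizes)                # [4, 4, 2]
print(row_count, follower_total)  # 10 45

# iterator=True: pull chunks of any size on demand with get_chunk
reader = pd.read_csv(io.StringIO(text), iterator=True)
first_three = reader.get_chunk(3)
reader.close()
print(first_three["user"].tolist())  # ['u0', 'u1', 'u2']
```

Because each chunk is discarded after it has been aggregated, peak memory is roughly bounded by the chunk size rather than the file size. The trade-off is that the computation has to be expressible as a running aggregate over the chunks.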