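Today I was working out how to split the posts in a WordPress XML export so that each post becomes its own file, ready to publish on Hexo. Why Python? Its popularity needs no explanation, and I have also been learning basic file handling lately. Of course, this is only the first step of a long march: the script just pulls out the content between each post's opening and closing tags, and more processing is still needed afterwards.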
    import re


    def fileParse():
        inputfile = input('Input SourceFile:')
        keyword = input('Slice Keyword:')
        outfilename = input('Outfilename:')

        # Read the whole export once, decoding it as UTF-8.
        with open(inputfile, 'r', encoding='utf-8') as fp:
            lines = fp.readlines()

        # Record the index of every line that matches the keyword (treated as a regex).
        number = [i for i, eachLine in enumerate(lines) if re.search(keyword, eachLine)]

        # Each output file holds the lines between two consecutive matches.
        # The slice skips the matching line itself and the line just before the
        # next match; anything after the last match is not written.
        for i in range(len(number) - 1):
            start = number[i]
            end = number[i + 1]
            destLines = lines[start + 1:end - 1]
            with open(outfilename + str(i) + '.txt', 'w', encoding='utf-8') as fp_w:
                for key in destLines:
                    # Replace non-breaking spaces with ordinary spaces.
                    fp_w.write(key.replace('\xa0', ' '))


    if __name__ == "__main__":
        fileParse()
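To try the slicing idea without typing answers at the prompts, here is a minimal sketch of a variant that works on a list of lines already in memory. It is not the script above: it pairs an opening marker with a closing marker instead of using one keyword, so the last post is kept as well. The function name `split_between`, the sample lines, and the assumption that each post sits between `<item>` and `</item>` lines of their own are all made up for illustration.

```python
import re


def split_between(lines, open_pat, close_pat):
    """Return one list of lines per open/close marker pair, markers excluded."""
    segments = []
    current = None  # None while we are outside a segment
    for line in lines:
        if current is None:
            # Outside a segment: wait for an opening marker.
            if re.search(open_pat, line):
                current = []
        elif re.search(close_pat, line):
            # Closing marker: the segment is complete.
            segments.append(current)
            current = None
        else:
            current.append(line)
    return segments


# Hypothetical sample standing in for a real export file.
sample = [
    "<channel>\n",
    "<item>\n",
    "<title>First</title>\n",
    "</item>\n",
    "<item>\n",
    "<title>Second</title>\n",
    "body\xa0text\n",
    "</item>\n",
    "</channel>\n",
]

posts = split_between(sample, r"<item>", r"</item>")
for index, post in enumerate(posts):
    print(index, [line.replace("\xa0", " ") for line in post])
```

With a real export, `sample` could be replaced by the result of `readlines()` on the file, and each entry of `posts` written to its own output file in the same way as the script does.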