Performance bulk-loading data from an XML file to MySQL
Should an import of 80GB's of XML data into MySQL take more than 5 days to complete?
I'm currently importing an XML file that is roughly 80GB in size; the code I'm using is in this gist. While everything is working properly, it has been running for almost 5 straight days and it's not even close to being done...
The average table size is roughly:
Data size: 4.5 GB
Index size: 3.2 GB
Avg. row length: 245 bytes
Number of rows: 20,000,000
Let me know if more info is needed!
Server Specs:
Note this is a Linode VPS:
Intel Xeon L5520, quad core, 2.27 GHz, 4 GB total RAM
XML Sample
https://gist.github.com/2510267
Thanks!
After researching this matter further, this seems to be about average; I found this answer, which describes ways to improve the import rate.
Accepted answer
One thing which will help a great deal is to commit less frequently, rather than once per row. I would suggest starting with one commit per several hundred rows, and tuning from there.
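As a minimal sketch of the batching idea, assuming a DB-API connection (e.g. MySQLdb or mysql.connector) and a hypothetical items table; the real schema in the gist will differ:

```python
def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows, committing once per batch_size rows instead of per row.

    conn is any DB-API connection; rows is an iterable of value tuples.
    The table and column names here are illustrative placeholders.
    """
    cur = conn.cursor()
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO items (id, name) VALUES (%s, %s)", row)
        pending += 1
        if pending >= batch_size:
            conn.commit()   # pay the transaction/fsync cost once per batch
            pending = 0
    if pending:
        conn.commit()       # flush the final partial batch
```

A batch size of a few hundred is only a starting point; too small wastes round trips on commits, too large grows the transaction log, so it is worth measuring a few values.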
Also, the thing you're doing right now where you do an existence check -- dump that; it's greatly increasing the number of queries you need to run. Instead, use ON DUPLICATE KEY UPDATE (a MySQL extension, not standards-compliant) to make a duplicate INSERT automatically do the right thing.

Finally, consider building your tool to convert from XML into a textual form suitable for use with the mysqlimport tool, and using that bulk loader instead. This will cleanly separate the time needed for XML parsing from the time needed for database ingestion, and also speed the database import itself by using tools designed for the purpose (rather than INSERT or UPDATE commands, mysqlimport uses the specialized LOAD DATA INFILE statement).
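A sketch of that last suggestion: stream the XML with xml.etree.ElementTree.iterparse (which keeps memory flat even on an 80 GB file) and write a tab-separated file for mysqlimport. The element name, attribute names, and table name below are hypothetical; the gist's actual schema will differ.

```python
import csv
import xml.etree.ElementTree as ET

def xml_to_tsv(xml_path, tsv_path):
    """Stream <row> elements from the XML and write one TSV line each.

    Element and column names are placeholders; adapt them to the real schema.
    """
    with open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        # iterparse yields elements as they complete, so the whole tree
        # is never held in memory at once
        for event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "row":
                writer.writerow([elem.get("id"), elem.get("name")])
                elem.clear()  # free the element once it has been written

# mysqlimport derives the table name from the file's base name, so a file
# named items.txt loads into the items table, e.g.:
#   mysqlimport --local --fields-terminated-by='\t' mydb /tmp/items.txt
```

Because LOAD DATA INFILE (which mysqlimport wraps) bypasses per-row statement parsing, this path is typically much faster than issuing individual INSERTs, and the conversion step can be re-run or parallelized independently of the load.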