Python/IPython strange non-reproducible list index out of range error

I have recently been learning some Python and how to apply it to my work. I have written a couple of scripts successfully, but I am having an issue I just cannot figure out.

I am opening a file with ~4000 lines, with two tab-separated columns per line. When reading the input file, I get an IndexError saying that the list index is out of range. However, while I get the error every time, it doesn't happen on the same line every time (as in, it is thrown on a different line each run!). So, for some reason, it generally works but then (seemingly) randomly fails.

As I literally only started learning Python last week, I am stumped. I have looked around for the same problem, but not found anything similar. Furthermore I don't know if this is a problem that is language specific or IPython specific. Any help would be greatly appreciated!

input = open("count.txt", "r")
changelist = []
listtosort = []
second = str()

output = open("output.txt", "w")

for each in input:
    splits = each.split("\t")
    changelist = list(splits[0])
    second = int(splits[1])

    print second

    if changelist[7] == ";":
        changelist.insert(6, "000")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)

    elif changelist[8] == ";":
        changelist.insert(6, "00")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)

    elif changelist[9] == ";":
        changelist.insert(6, "0")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)

    else:
        #output.write(str("".join(changelist)))
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)

output.close()

The error

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/a/Desktop/sharedfolder/ipytest/individ.ins.count.test/<ipython-input-87-32f9b0a1951b> in <module>()
     57     splits = each.split("\t")
     58     changelist = list(splits[0])
---> 59     second = int(splits[1])
     60 
     61     print second

IndexError: list index out of range

Input:

ID=cds0;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C  50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C  36

Desired output:

ID=cds0000;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr  12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C  50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C  36
2023-03-28 20:03

Accepted answer

The reason you're getting the IndexError is that your input file is apparently not entirely tab delimited. When a line contains no tab, each.split("\t") returns a list with a single element, so there is nothing at splits[1] when you attempt to access it.
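To check whether that is really what is happening, you could scan the file for lines that do not split into exactly two fields. The following is only a minimal sketch: the helper name find_malformed and the sample lines are made up for illustration, and it assumes a well-formed line has exactly one tab.

```python
def find_malformed(lines, sep="\t", expected=2):
    """Return (line_number, raw_line) pairs for lines that do not
    split into `expected` fields on `sep`."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Drop the trailing newline so it does not end up in the last field.
        fields = line.rstrip("\n").split(sep)
        if len(fields) != expected:
            bad.append((number, line))
    return bad


# Hypothetical sample data: only the first line is tab delimited.
sample = [
    "ID=cds0;Name=NP_414542.1\t12\n",
    "ID=cds1000;Name=NP_415538.1 50\n",  # space instead of a tab
    "\n",                                # blank line
]

for number, line in find_malformed(sample):
    print("line {}: {!r}".format(number, line))
```

Because the function accepts any iterable of strings, you can pass it the open file object for count.txt directly. Printing the offending lines with repr makes stray spaces and blank lines visible.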

Your code could use some refactoring. First of all, you're repeating yourself with the if-checks, which is unnecessary. I threw the following together to demonstrate how you could refactor your code to be a little more Pythonic and DRY. Note that it just pads the cds0 to 7 characters, which is probably not what you want. I can't guarantee it'll work with your dataset, but I'm hoping it might help you understand how to do things differently.

    to_sort = []
    # We can open two files using the with statement. This will also handle
    # closing the files for us, when we exit the block.
    with open("count.txt", "r") as inp, open("output.txt", "w") as out:
        for each in inp:
            # Split at ';' so you won't have to worry about whether or not
            # the file is tab delimited.
            changed = each.split(";")

            # Get the value you want. This is called unpacking.
            # The value before '=' will always be 'ID', so we don't really care about it.
            # _ is generally used as a variable name when the value is discarded.
            _, value = changed[0].split("=")

            # 0-pad the desired value to 7 characters. Python string formatting
            # makes this very easy. '<' left-aligns the value, so the zeros are
            # added on the right. This will replace the current value in the list.
            changed[0] = "ID={:0<7}".format(value)

            # Join the changed list with the original separator
            # and append it to the sort list.
            to_sort.append(";".join(changed))

        # Write the results to the file all at once. Your test data already
        # provided the newlines, you can just write it out as it is.
        out.writelines(to_sort)

    # Do what else you need to do. Maybe to_sort.sort()?

You'll notice that this reduces your code to 8 lines but does essentially the same job, does not repeat itself and is pretty easy to understand.
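If what you actually want is for the number after cds to be zero-padded on the left (so cds0 becomes cds0000 and cds1000 stays as it is, as in your desired output), something like the following might be closer. This is a sketch under assumptions: the width of 4 is inferred from your desired output and from the "000"/"00"/"0" inserts in your code, and the names ID_PATTERN and pad_cds_id are made up for illustration.

```python
import re

# Matches the leading 'ID=cds' prefix and captures the digits that follow it.
ID_PATTERN = re.compile(r"^ID=cds(\d+)")


def pad_cds_id(line, width=4):
    """Zero-pad the digits after 'ID=cds' on the left to `width` characters.
    Lines that do not start with 'ID=cds<digits>' are returned unchanged."""
    return ID_PATTERN.sub(
        lambda m: "ID=cds" + m.group(1).zfill(width), line, count=1
    )


print(pad_cds_id("ID=cds0;Name=NP_414542.1\t12"))
print(pad_cds_id("ID=cds1000;Name=NP_415538.1\t50"))
```

str.zfill pads with zeros on the left and leaves strings that are already long enough untouched, so no if/elif chain on the position of the ';' is needed.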

Please read PEP 8 and the Zen of Python, and go through the official tutorial.
