Hudson jobs missing after a crash!? Restore them from the ashes

aurelien.pupier

This morning, the VM hosting the master of our Hudson instance froze. I "powered it off" and restarted it. Ah, Hudson came back up and the list of jobs appeared. No, oh no, not all the jobs! Some of them had vanished. Noooooo...

Wait, don't panic, this blog post provides an explanation of how to raise Hudson jobs from their ashes!

How to find the issue

On the main Hudson page, go to Manage Hudson --> System Log --> All Hudson logs

You will find this kind of error:

SEVERE: Failed Loading job XXX-JOBNAME-XXX
hudson.util.IOException2: /var/lib/hudson/jobs/XXX-JOBNAME-XXX/nextBuildNumber doesn't contain a number
    at hudson.model.Job.onLoad(Job.java:369)
    at hudson.model.AbstractProject.onLoad(AbstractProject.java:342)
    at hudson.model.BaseBuildableProject.onLoad(BaseBuildableProject.java:102)
    at hudson.model.Items.load(Items.java:117)
    at hudson.model.Hudson$13.run(Hudson.java:2368)
    at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:146)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:259)
    at hudson.model.Hudson$4.runTask(Hudson.java:698)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:187)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:94)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:493)
    at java.lang.Integer.parseInt(Integer.java:514)
    at hudson.model.Job.onLoad(Job.java:366)
    ... 12 more

In fact, the nextBuildNumber file mentioned in the stack trace is empty.
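
If many jobs are affected, reading the log entry by entry gets slow, so the same check can be scripted. Here is a minimal Python sketch, assuming the jobs directory is laid out as in the trace above (one folder per job, each with a nextBuildNumber file). The function name find_broken_jobs is made up for this example and is not part of Hudson.

```python
from pathlib import Path


def find_broken_jobs(jobs_dir):
    """Return the names of jobs whose nextBuildNumber file does not hold an integer."""
    broken = []
    for counter in sorted(Path(jobs_dir).glob("*/nextBuildNumber")):
        # errors="replace" keeps the scan going even if a crash left garbage bytes.
        text = counter.read_text(errors="replace").strip()
        try:
            int(text)
        except ValueError:
            # Empty or non-numeric content: this is the case that makes Hudson skip the job.
            broken.append(counter.parent.name)
    return broken
```

Calling find_broken_jobs("/var/lib/hudson/jobs") should list the same jobs that show up as "Failed Loading job" in the log. Adjust the path if your Hudson home is elsewhere.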

How to restore jobs

It is really simple (but can be really tedious if you have a lot of missing jobs).

For each job mentioned in the logs, put a number in its nextBuildNumber file.

Be sure to put a number higher than the number of the job's last build.
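
To make this less tedious, the fix can be sketched in Python as well. This is only a sketch under a big assumption: that each job's builds folder contains entries named after the build numbers (as symlinks or directories). That may not be true for your Hudson version, in which case the function gives up and you have to pick the number by hand. The names highest_build_number and repair_counter are hypothetical helpers, not Hudson APIs.

```python
import re
from pathlib import Path


def highest_build_number(job_dir):
    """Largest purely numeric entry name under <job_dir>/builds, or None if there is none."""
    builds = Path(job_dir) / "builds"
    if not builds.is_dir():
        return None
    numbers = [int(p.name) for p in builds.iterdir() if re.fullmatch(r"[0-9]+", p.name)]
    return max(numbers) if numbers else None


def repair_counter(job_dir):
    """Rewrite a broken nextBuildNumber; return the value written, or None if nothing was done."""
    counter = Path(job_dir) / "nextBuildNumber"
    current = counter.read_text(errors="replace").strip() if counter.exists() else ""
    try:
        int(current)
        return None  # the file already holds a number, leave it alone
    except ValueError:
        pass
    last = highest_build_number(job_dir)
    if last is None:
        return None  # no numeric build entries found (assumption did not hold), fix by hand
    value = last + 1  # strictly higher than the last build
    counter.write_text(f"{value}\n")
    return value
```

Back up the jobs folder before running anything like this. Hudson will probably need to reload its configuration from disk, or be restarted, before the repaired jobs show up again.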

Potential improvements

It would be nice to have the job visible in the job list even if its nextBuildNumber file is corrupted. Perhaps we could put a warning decorator on the job, and also allow setting an arbitrary value in this file from the UI.

I suppose the files were corrupted because the jobs were running when the VM crashed. So, it seems that this file is empty when a job is running. Could we prevent it from being "empty" for too long, and so minimize the opportunity for corruption in the event of a crash?
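
One common way to avoid such a window is to never truncate the file in place: write the new value to a temporary file next to it, then rename it over the old one. The sketch below shows the idea in Python. It is not how Hudson writes this file (Hudson is written in Java, and I have not checked its code); write_counter_atomically is a made-up name for illustration.

```python
import contextlib
import os
import tempfile


def write_counter_atomically(path, value):
    """Replace the file at `path` with `value` without ever leaving it empty."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".nextBuildNumber.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(f"{value}\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # readers see either the old or the new content
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

With this pattern, a crash in the middle of the update should leave either the old number or the new one on disk, rather than an empty file.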

Add your 2 cents!

To continue this discussion, come on over to the Eclipse forum and share your ideas. I opened a topic just for this :-)

Note: you might also have issues with fingerprints, but it shouldn't be a blocking point. See this topic on the Eclipse forum for more details.
