Pipeline Production Problems
As you all should know by now, 9 computers are involved in the pipeline
processing.
There are 3 Windows machines collecting data, one per telescope.
There are 6 Linux processors reducing the data: two dual-processor
machines and two single-processor machines.
From time to time one of the Windows machines hangs. They are running
Windows 98. A reboot usually fixes the offending computer, but sometimes
the Linux end also hangs.
The way I have set up the pipeline, when the night's data taking is
completed I start 6 programs, one each for the early evening and late
evening data from each telescope.
Each program copies the images over the ethernet to a local work area, and
then starts processing.
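
A minimal sketch of the six-job layout described above, assuming
hypothetical telescope labels (tel1, tel2, tel3) and half-night labels
(early, late) that do not appear in the original setup. It only
enumerates the jobs; the actual copy and reduction commands are not
shown here.

```python
from itertools import product

# Hypothetical labels: the post does not name the telescopes or the halves.
TELESCOPES = ["tel1", "tel2", "tel3"]
HALVES = ["early", "late"]


def build_jobs(telescopes=TELESCOPES, halves=HALVES):
    """Return one (telescope, half) pair per telescope and half-night."""
    return list(product(telescopes, halves))


def describe(job):
    """Describe what a single pipeline program would do for this job."""
    telescope, half = job
    return f"copy {half} evening images from {telescope}, then reduce them"


if __name__ == "__main__":
    for job in build_jobs():
        print(describe(job))
```

With three telescopes and two halves this gives the six programs, so a
hang on one Windows machine affects only the two jobs that read from it.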
From time to time one of the programs hangs. This is almost always due to
the Windows machine hanging. A reboot fixes the Windows machine, but then
sometimes the Linux end is hung also. Attempts to communicate with the
formerly offending device get a "device busy" error.
I can fix this by rebooting the Linux computer. However, on the dual
processor machines one program might still be running OK, so I am forced
to wait for the working half to complete before restarting the hung pipeline.
Is there a way to get communication with a Windows 98 computer restarted
without rebooting the Linux machine?
I have looked at fstab, mtab and they are OK. Where else should I look?
Closing the offending window on the Linux machine does not help. The
network itself is working, because I can communicate with the other
computers on the network. It is just that I can no longer communicate with
the computer where Windows hung, even though it is now OK.
Any suggestions would be appreciated. My present plan is to set up a spare
computer which will have all the pipelines available. I can then restart
the Windows machine and move the hung pipeline job to the spare computer.
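
One way the spare-computer plan might be automated is sketched below. It
is only an illustration under stated assumptions: the function names
share_responds and pick_host, the host names, and the 5 second timeout
are all hypothetical, and the check is assumed to run on the Linux
machine that normally handles the job. The directory listing is done in
a child process so that a stuck connection cannot hang the caller.

```python
import subprocess
import sys
import time


def share_responds(path, timeout=5.0):
    """Return True if listing `path` succeeds within `timeout` seconds."""
    # List the directory in a child process so a hung mount blocks the
    # child, not this script.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = child.poll()
        if code is not None:
            return code == 0  # finished in time: success only on exit code 0
        time.sleep(0.1)
    # Best effort only: we deliberately do not wait for the child, since a
    # process blocked on a hung mount may not exit promptly.
    child.kill()
    return False


def pick_host(path, usual_host, spare_host, timeout=5.0):
    """Keep the job on its usual host if the share answers, else use the spare."""
    return usual_host if share_responds(path, timeout) else spare_host
```

For example, pick_host("/mnt/tel1", "reduce1", "spare") would return
"spare" when the (hypothetical) mount point /mnt/tel1 does not answer in
time. This does not clear the "device busy" state on the original
machine; it only avoids waiting on it.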
Tom Droege