*shudder* i hate almost losing a server

Posted: Thu Jul 26, 2007 9:08 am
by SOAPboy
Some of you may know, I'm a network admin for a casino.

I'm on graves atm. It's a total snoozefest normally, but tonight, not so much.

First, I get a call about some computer crashing, no biggie. Can't be fixed, needs replaced, yadda yadda.

I get back to our server room, and our server that runs the entire casino floor is beeping like crazy.

Panic mode.

Turns out, it's just an HD going out. Company will be here tomorrow.


But jesus christ. Talk about scary shit.

Posted: Thu Jul 26, 2007 9:17 am
by AmIdYfReAk
yeah, I hate it when that happens...

is it mirroring or?

Posted: Thu Jul 26, 2007 9:23 am
by SOAPboy
Yeah, mirroring. It's all good. It's just that the beeping it makes is the same sound it would make if it was going to go down.

It's hooked up to our "outside" company, so they knew about it. They just failed to mention it to me. :olo:

Posted: Thu Jul 26, 2007 10:10 am
by Nightshade
Can't replace a HDD? SUPAR ADMIN!

Posted: Thu Jul 26, 2007 10:55 am
by Doombrain
i think he's talking about a blade, not a hd?

Posted: Thu Jul 26, 2007 11:28 am
by SOAPboy
Doombrain wrote:i think he's talking about a blade, not a hd?
HD in one of the many racks of HDs, so yes, technically.
It's an odd setup. Not like normal rack servers, but similar.
Nightshade wrote:Can't replace a HDD? SUPAR ADMIN!
There's a reason we pay the big money for these servers, bud. So we don't have to stock spare HDs everywhere.

It's about time something finally went out on it tho. It's not had a single reboot, nor a second of downtime, in 3 years? Prolly longer. Konami + Linux is an amazing beast. Fucking thing never goes down. HD goes out, big whoop, some dude shows up that day (or next morning in this case), plops another one in, and that's that.

Our Windows servers, on the other hand. Lmfao. We're lucky to see 1 month on those pieces of shit.


Mind spelling errors and shit. It's 6am -_-

Posted: Thu Jul 26, 2007 11:37 am
by Nightshade
Yeah, how's that having no spares thing working out for you?

Posted: Thu Jul 26, 2007 11:51 am
by +JuggerNaut+
nay0k wrote:Yeah, how's that being a gay retard working out for you?
not here.

Posted: Thu Jul 26, 2007 11:57 am
by SOAPboy
Nightshade wrote:Yeah, how's that having no spares thing working out for you?
Great.

We have spares for other servers.


You just need to understand the industry I'm in. Casinos aren't office buildings, and these servers aren't something "we" need to be fooling with. Hence the hiring of companies to do it for us.

It's not "Bob's law firm" here. It's a fucking casino. >_<


Now let's assume for 1/2 a second that it actually goes tits up.
I make 1 phone call, and someone's here within the hour. I'd be running damage control for all the morons freaking out on the floor :P

Posted: Thu Jul 26, 2007 12:04 pm
by Grudge
great

Posted: Thu Jul 26, 2007 12:37 pm
by Nightshade
SOAPboy wrote:
Nightshade wrote:Yeah, how's that having no spares thing working out for you?
Great.

We have spares for other servers.


You just need to understand the industry I'm in. Casinos aren't office buildings, and these servers aren't something "we" need to be fooling with. Hence the hiring of companies to do it for us.

It's not "Bob's law firm" here. It's a fucking casino. >_<


Now let's assume for 1/2 a second that it actually goes tits up.
I make 1 phone call, and someone's here within the hour. I'd be running damage control for all the morons freaking out on the floor :P
I guess I'm not really understanding your position then. Are the servers your responsibility? Is it just that one that's not?

Posted: Thu Jul 26, 2007 1:11 pm
by SOAPboy
Nightshade wrote:
SOAPboy wrote:
Nightshade wrote:Yeah, how's that having no spares thing working out for you?
Great.

We have spares for other servers.


You just need to understand the industry I'm in. Casinos aren't office buildings, and these servers aren't something "we" need to be fooling with. Hence the hiring of companies to do it for us.

It's not "Bob's law firm" here. It's a fucking casino. >_<


Now let's assume for 1/2 a second that it actually goes tits up.
I make 1 phone call, and someone's here within the hour. I'd be running damage control for all the morons freaking out on the floor :P
I guess I'm not really understanding your position then. Are the servers your responsibility? Is it just that one that's not?
All of them are, to an extent.

That "one" is just, well, something we don't generally fuck with due to its uptime and its importance to the firm as a whole.

If we have to do shutdowns, then yes, we have to screw with it.


All the other servers tho, I'm responsible for.

Posted: Thu Jul 26, 2007 1:22 pm
by Qr7
why the fuck does the whole thing run on one machine? that's a terrible design.

Posted: Thu Jul 26, 2007 1:31 pm
by Giraffe }{unter
lol try being mid day working for a company with a 1 - 2 million a day shipping goal and having the entire data center shut down without notice.

On any normal day this would not be an issue. We have a 100KVA ups running our data center backed by 2 generators.

A generator is only as good as the person who maintains it. Turns out our Maint department forgot to lubricate the transfer switch. The generators were running but not feeding power. They tinkered with it for a while then came in and said you have about 5 minutes to shu....

Poof, lights go out, servers go down. "I guess it was less than 5."

Damage to the UPS from being drained caused it to fail the following day when we took another hit, taking down our data center yet again.

Talk about stress!

Nevermind

Posted: Thu Jul 26, 2007 1:32 pm
by LawL
I'm far too hetero for this thread. Carry on.

Posted: Thu Jul 26, 2007 1:42 pm
by Fender
Giraffe }{unter wrote:On any normal day this would not be an issue. We have a 100KVA ups running our data center backed by 2 generators.
Our entire building is powered by a massive flywheel. We have a rather large diesel engine that spins the flywheel if we lose power from the electric company. Our UPS are used to bridge the short gap between loss of power and the diesel start up.

Posted: Thu Jul 26, 2007 2:37 pm
by SOAPboy
Giraffe }{unter wrote:lol try being mid day working for a company with a 1 - 2 million a day shipping goal and having the entire data center shut down without notice.

On any normal day this would not be an issue. We have a 100KVA ups running our data center backed by 2 generators.

A generator is only as good as the person who maintains it. Turns out our Maint department forgot to lubricate the transfer switch. The generators were running but not feeding power. They tinkered with it for a while then came in and said you have about 5 minutes to shu....

Poof, lights go out, servers go down. "I guess it was less than 5."

Damage to the UPS from being drained caused it to fail the following day when we took another hit, taking down our data center yet again.

Talk about stress!

Nevermind
lmfao I hear ya on UPS bullshit.
Some crazy-ass amount of money for 15 min of extra uptime. And if the generator doesn't kick on? Casino's shut the fuck down. And god forbid our AC for the server room actually comes back up when we kick over to generator power :olo:


Qr7 wrote:why the fuck does the whole thing run on one machine? that's a terrible design.
You evidently don't understand how casino floors work.

1 server runs the slot machines and any other "ticket" in and out setup. Example: those silly virtual blackjack tables.

It's very simple. It makes sense, and there ARE failsafe setups involved.

There's a reason the thing runs 24-some-odd hard drives for very little data.

Look into gaming systems. It's simple stuff. Now if we were a huge Las Vegas-size casino, we wouldn't be on 1 server. And here in the very near future we're doubling everything.

Posted: Thu Jul 26, 2007 4:03 pm
by Giraffe }{unter
Fender wrote:
Giraffe }{unter wrote:On any normal day this would not be an issue. We have a 100KVA ups running our data center backed by 2 generators.
Our entire building is powered by a massive flywheel. We have a rather large diesel engine that spins the flywheel if we lose power from the electric company. Our UPS are used to bridge the short gap between loss of power and the diesel start up.
That's the same situation: our UPS was drained because the generator didn't transfer once it came up to speed. On a normal day the UPS should take a 1-minute hit, then regulate the generator feed if necessary.

We're switching to individual rack UPS systems now. Each UPS will be able to power down its servers in the event of generator failure. 10 guys shutting down 70+ servers properly in 5 minutes is just not possible.
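
A rough sketch of the arithmetic behind that, plus the kind of rule a rack UPS setup could apply on its own. The function names, the 60-second safety margin, and the runtime numbers in the example are all assumptions for illustration, not any vendor's actual shutdown logic.

```python
import math


def seconds_per_server(servers: int, staff: int, window_s: float) -> float:
    """Time each person gets per server if the machines are split evenly."""
    servers_each = math.ceil(servers / staff)  # worst-off person's share
    return window_s / servers_each


def should_start_shutdown(on_battery: bool,
                          runtime_left_s: float,
                          shutdown_time_s: float,
                          margin_s: float = 60.0) -> bool:
    """Hypothetical rule: begin a clean shutdown while there is still
    enough battery runtime left to finish it, plus a safety margin."""
    if not on_battery:
        return False
    return runtime_left_s <= shutdown_time_s + margin_s


if __name__ == "__main__":
    # 70 servers, 10 people, 5 minutes: well under a minute per machine.
    print(round(seconds_per_server(70, 10, 5 * 60), 1))
    # On battery with 240 s left and a 200 s shutdown: time to start.
    print(should_start_shutdown(True, 240, 200))
```

With 70 servers and 10 people, each person has roughly 43 seconds per machine, which is why handing the decision to the UPS itself is appealing.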

Posted: Thu Jul 26, 2007 4:25 pm
by SOAPboy
Giraffe }{unter wrote:
Fender wrote:
Giraffe }{unter wrote:On any normal day this would not be an issue. We have a 100KVA ups running our data center backed by 2 generators.
Our entire building is powered by a massive flywheel. We have a rather large diesel engine that spins the flywheel if we lose power from the electric company. Our UPS are used to bridge the short gap between loss of power and the diesel start up.
That's the same situation: our UPS was drained because the generator didn't transfer once it came up to speed. On a normal day the UPS should take a 1-minute hit, then regulate the generator feed if necessary.

We're switching to individual rack UPS systems now. Each UPS will be able to power down its servers in the event of generator failure. 10 guys shutting down 70+ servers properly in 5 minutes is just not possible.
Thank christ we're only running 8ish. It's not too bad to shut those down with 1-3 people in 15 min.

And I really would like to look into other UPS solutions. This big heap of metal in the center of our room is obnoxious.

Posted: Thu Jul 26, 2007 6:08 pm
by ^misantropia^
Qr7 wrote:why the fuck does the whole thing run on one machine? that's a terrible design.
I didn't read the whole thread (yet) but to answer your post: heaps of applications don't scale beyond a single-machine setup. People* who think it's just a matter of hooking up a couple more servers are plain and demonstrably wrong.

* In my professional life this often equates to "managers" or "sales reps".

Posted: Thu Jul 26, 2007 6:12 pm
by Foo
Giraffe }{unter wrote:We're switching to individual rack UPS systems now. Each UPS will be able to power down its servers in the event of generator failure. 10 guys shutting down 70+ servers properly in 5 minutes is just not possible.
We have 67 servers, 2 or 3 guys if we're lucky, a single unmanaged UPS with no alert generating system. Oh and no emergency downing plan.

We're so fucked.

Posted: Thu Jul 26, 2007 6:59 pm
by duffman91
SOAPboy wrote:Some of you may know, I'm a network admin for a casino.

I'm on graves atm. It's a total snoozefest normally, but tonight, not so much.

First, I get a call about some computer crashing, no biggie. Can't be fixed, needs replaced, yadda yadda.

I get back to our server room, and our server that runs the entire casino floor is beeping like crazy.

Panic mode.

Turns out, it's just an HD going out. Company will be here tomorrow.


But jesus christ. Talk about scary shit.
What city and casino group do you work for that has such a weak infrastructure for a casino floor?

Vegas has several AS/400 servers with multiple backbones that control the entirety of each gaming resort group's casinos. On top of this, all System i servers have support contracts with IBM. The standard is for the IBM tech to show up at the data center before anybody notices anything went down.

Just curious....

Posted: Thu Jul 26, 2007 7:14 pm
by Qr7
^misantropia^ wrote:
Qr7 wrote:why the fuck does the whole thing run on one machine? that's a terrible design.
I didn't read the whole thread (yet) but to answer your post: heaps of applications don't scale beyond a single-machine setup. People* who think it's just a matter of hooking up a couple more servers are plain and demonstrably wrong.

* In my professional life this often equates to "managers" or "sales reps".
and heaps of applications are shit. if you have mission-critical applications such as this, you should design them to scale across multiple machines, or at least to have a failover.

Where I work, if we lost power in a DC, another DC would pick right up. I can go unplug a whole rack and no one would really care. At any point we can have up to 5% of our machines down and no one really cares.
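
To put a number on the 5% figure, here is a minimal sketch of the check. The fleet size (2000 machines) and rack size (40 machines) are made-up values for illustration, and the function names are hypothetical.

```python
def max_machines_down(total: int, max_down_percent: int = 5) -> int:
    """Largest whole number of machines that can be down within the budget."""
    return total * max_down_percent // 100


def fleet_within_tolerance(total: int, down: int, max_down_percent: int = 5) -> bool:
    """True if the machines currently down fit inside the allowed percentage.

    Integer math is used so the boundary case (exactly 5%) is not
    affected by floating-point rounding.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return down * 100 <= total * max_down_percent


if __name__ == "__main__":
    fleet = 2000   # assumed fleet size
    rack = 40      # assumed machines per rack
    print(max_machines_down(fleet))             # budget of machines that may be down
    print(fleet_within_tolerance(fleet, rack))  # one unplugged rack
```

Under those assumed numbers, one unplugged rack uses up less than half of the budget, which matches the "no one would really care" claim.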

Hardware is cheap, and building parallel communication into an application is getting easier and easier. So don't give me this 'apps don't scale'.

Do you think your bank runs on 1 server? lol.

Posted: Thu Jul 26, 2007 7:18 pm
by Nightshade
lol, Queer7's upset again. :olo:

Posted: Thu Jul 26, 2007 7:32 pm
by Qr7
Nightshade wrote:lol, Queer7's upset again. :olo:
once again you add nothing to the conversation and resort to attacks.