---+ Temporary Cluster Outage

---++ Description

It is 10:56. Ganglia reports that only spg00 is up. The front panel of SPRAID is blinking. Condor, however, reports the following:
<pre>
[mdias@sprace mdias]$ ssh spgrid '. /OSG/setup.sh ;condor_status'

Name          OpSys Arch  State     Activity LoadAv   Mem  ActvtyTime

vm1@node01.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node01.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:43
vm1@node02.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node02.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:41
vm1@node03.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node03.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:39
vm1@node04.gr LINUX INTEL Unclaimed Idle      0.000   500  0+00:24:57
vm2@node04.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:41
vm1@node05.gr LINUX INTEL Unclaimed Idle      0.000   500  0+21:49:34
vm2@node05.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:05
vm1@node06.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node06.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:39
vm1@node07.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node07.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:40
vm1@node08.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node08.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:37
vm1@node09.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:39:57
vm2@node09.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:05
vm1@node10.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node10.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:40
vm1@node11.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:05
vm2@node11.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:36
vm1@node12.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:05
vm2@node12.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:39
vm1@node13.gr LINUX INTEL Unclaimed Idle      0.000   500  0+05:44:37
vm2@node13.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:05
vm1@node14.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node14.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:40
vm1@node15.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:25:04
vm2@node15.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:25:35
vm1@node16.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node16.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:35
vm1@node17.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node17.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:39
vm1@node18.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:35:04
vm2@node18.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:35:37
vm1@node21.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:30:05
vm2@node21.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:30:41
vm1@node22.gr LINUX INTEL Unclaimed Idle      0.000   500  0+01:30:04
vm2@node22.gr LINUX INTEL Unclaimed Idle      0.000   500  1+01:30:38
vm1@node23.gr LINUX INTEL Unclaimed Idle      0.000  1003  0+01:30:04
vm2@node23.gr LINUX INTEL Unclaimed Idle      0.000  1003  1+01:30:35
vm1@spgrid.if LINUX INTEL Unclaimed Idle      1.000  1003  0+02:10:04
vm2@spgrid.if LINUX INTEL Unclaimed Idle     10.460  1003  1+02:10:53

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX    44     0       0        44       0          0        0

               Total    44     0       0        44       0          0        0
</pre>

The nodes respond to ping:
<pre>
[root@sprace:root]# ping node38
PING node38.cluster (192.168.1.38) from 192.168.1.200 : 56(84) bytes of data.
64 bytes from node38.cluster (192.168.1.38): icmp_seq=1 ttl=64 time=0.193 ms
64 bytes from node38.cluster (192.168.1.38): icmp_seq=2 ttl=64 time=0.190 ms

--- node38.cluster ping statistics ---
2 packets transmitted, 2 received, 0% loss, time 999ms
rtt min/avg/max/mdev = 0.190/0.191/0.193/0.013 ms
</pre>
but logging in to them via ssh does not work.
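
A node that answers ping while refusing ssh logins can be told apart from a healthy one by probing the ssh port directly. The following is only a minimal sketch, not something that was run during this incident: the function name =tcp_port_open=, the default port 22 and the 3-second timeout are assumptions made for illustration.

```python
import socket


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    A host can answer ICMP echo (ping) and still fail this check,
    for example when sshd is not accepting connections.
    Port 22 and the timeout are illustrative defaults.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(hosts, port=22):
    """Print one line per host saying whether the port accepted a connection."""
    for host in hosts:
        state = "open" if tcp_port_open(host, port) else "unreachable"
        print(f"{host}: port {port} {state}")
```

Calling =report(["node38"])= from the head node would print a single line for the node pinged above. Note that a successful TCP handshake only shows that something is listening on the port; a login can still hang later, so this narrows the problem down rather than diagnosing it.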
On spraid:
<pre>
[mdias@spraid mdias]$ df
Filesystem           1K-blocks      Used  Available Use% Mounted on
/dev/sda2              2063536    599940    1358772  31% /
none                   1027720         0    1027720   0% /dev/shm
/dev/sda7              1035660     34728     948324   4% /tmp
/dev/sda5             10317828   2196156    7597556  23% /usr
/dev/sda8             15346304   1444488   13122264  10% /usr/local
/dev/sda6              2063504    413860    1544824  22% /var
/dev/sdb1           1833096736  92955700 1647025088   6% /raid0
/dev/sdc1           1833096736 963934088  776046700  56% /raid1
/dev/sdd1           1730092600 264919452 1377289568  17% /raid2
/dev/sde1           1730092600 225076752 1417132268  14% /raid3
/dev/sdf1           1730092600 208326584 1433882436  13% /raid4
/dev/sdg1           1730092600 220788532 1421420488  14% /raid5
spdc00:/pnfsdoors       400000     80000     284000  22% /pnfs/if.usp.br
[mdias@spraid mdias]$ free
             total       used       free     shared    buffers     cached
Mem:       2055440    2038452      16988          0     581612    1175224
-/+ buffers/cache:     281616    1773824
Swap:      4192956      12724    4180232
</pre>

We are also down in PhEDEx: http://cms-project-phedex.web.cern.ch/cms-project-phedex/cgi-bin/browser.

---++ Updates

11:05. I did not touch anything, and we are OK again in PhEDEx and in Ganglia, but the cluster is extremely unstable, with some nodes dropping out from time to time. The logs are OK. spg00 is "pingable" but has reached peak utilization. I am logged in on spraid:
<pre>
[mdias@spraid mdias]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             2.0G  586M  1.3G  31% /
none                 1004M     0 1004M   0% /dev/shm
/dev/sda7            1012M   34M  927M   4% /tmp
/dev/sda5             9.9G  2.1G  7.3G  23% /usr
/dev/sda8              15G  1.4G   13G  10% /usr/local
/dev/sda6             2.0G  405M  1.5G  22% /var
/dev/sdb1             1.8T   89G  1.6T   6% /raid0
/dev/sdc1             1.8T  920G  741G  56% /raid1
/dev/sdd1             1.7T  250G  1.3T  16% /raid2
/dev/sde1             1.7T  212G  1.4T  14% /raid3
/dev/sdf1             1.7T  198G  1.4T  13% /raid4
/dev/sdg1             1.7T  210G  1.4T  14% /raid5
spdc00:/pnfsdoors     391M   79M  278M  22% /pnfs/if.usp.br
[mdias@spraid mdias]$ free
             total       used       free     shared    buffers     cached
Mem:       2055440    2038660      16780          0     579184    1174588
-/+ buffers/cache:     284888    1770552
Swap:      4192956      12684    4180272
</pre>
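
When reading the =free= output above, note that the =Mem:= row counts buffers and page cache as "used". The =-/+ buffers/cache= row is simply =used - buffers - cached=, which is why spraid shows only about 17 MB free while applications hold roughly 14% of RAM. The sketch below reproduces that arithmetic; the helper names =parse_free= and =used_by_applications= are hypothetical, and the parser assumes the older =free= layout shown on this page (with separate =buffers= and =cached= columns).

```python
def parse_free(text):
    """Extract the 'Mem:' row (values in kB) from old-style `free` output.

    Assumes the column order shown on this page:
    total, used, free, shared, buffers, cached.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            keys = ("total", "used", "free", "shared", "buffers", "cached")
            return dict(zip(keys, map(int, fields[1:7])))
    raise ValueError("no 'Mem:' row found")


def used_by_applications(mem):
    """Memory held by processes, excluding reclaimable buffers and cache."""
    return mem["used"] - mem["buffers"] - mem["cached"]


# First `free` sample from spraid, copied from the Description section.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:       2055440    2038452      16988          0     581612    1175224
-/+ buffers/cache:     281616    1773824
Swap:      4192956      12724    4180232
"""

mem = parse_free(SAMPLE)
apps = used_by_applications(mem)
print(apps)  # 281616, matching the '-/+ buffers/cache' row
print(f"{100 * apps / mem['total']:.1f}% of RAM used by applications")
```

The same calculation on the 11:05 sample gives 284888 kB, again matching the =-/+ buffers/cache= row, so on these numbers alone memory on spraid does not look exhausted.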
Topic revision: r1 - 2006-09-28 - MarcoAndreFerreiraDias
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.