<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-146451251633146323</id><updated>2011-10-11T15:06:53.801-07:00</updated><title type='text'>3 CUDA  Z-Machines</title><subtitle type='html'>cudak1 &amp;amp; 2: mini-supercomputers (2008/2009, ~$4.5k US ea.). some assembly required!&lt;p align="right"&gt;
&lt;b&gt;720 cores&lt;/b&gt; on 3 gtx280 nVidia cards yield a measured 1.5 TFLOPs (theor. 3 TFLOPs) and send data to or from cpu at &lt;br&gt;7-14 GBytes/s, and 383 GBytes/s internally.
&lt;br&gt;
&lt;p&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;p align="right"&gt; 
run three N=30700-body systems or 3 incompressible Navier-Stokes fluids on 2048x2048 grids, as smooth as pre-recorded videos (10-24fps)&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>12</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-731817685117032555</id><published>2009-03-17T13:33:00.000-07:00</published><updated>2009-03-17T14:55:15.647-07:00</updated><title type='text'>a welcome to two baby brothers</title><content type='html'>&lt;p&gt;&lt;br /&gt;well, I am definitely not given to moderation. Actually, now that I've spent all the money I am :-) &lt;br /&gt;&lt;br /&gt;two new machines: &lt;br /&gt;&lt;br /&gt;* * *&lt;br /&gt;&lt;br /&gt;&lt;b&gt; cudak2&lt;/b&gt;:&lt;br /&gt;&lt;br /&gt;very similar to cudak1 described below, you wouldn't tell them apart from outside, so I'm not posting pictures today. &lt;br /&gt;the same overall profile of a quiet, small, personal supercomputer. an office machine on which to learn cuda and develop applications, running 64-bit linux fedora 10 and cuda, as well as doing everything else you expect of a desktop workstation. 
(I crammed like 100 audio cd's into my rhythmbox files.) &lt;br /&gt;&lt;br /&gt;major differences: X58 architecture with 4-core (8-thread) Nehalem Intel chip (2.68 GHz, I believe). only slightly overclocked gpus: 720 cores on 3 nVidia GeForce GTX280 H2O (as opposed to H2OC in original cudak1). Different communication bandwidths via PCI-e bus on a P6T6 workstation board by Asus. it has a multiplexer that prevents the permanent degradation of the link to the middle graphics card seen in the 790i architecture. That doesn't mean you can get 3 x (5-6)GB/s flow concurrently to all 3 cards, but maybe that's an unlikely request in practice. cpu simply cannot handle 3 cards at max speed. at least all three cards are now on a more level playing field. they reach the same peak bandwidth of 5.8 GB/s, which Asus calls "true 3-way SLI". whatever. these motherboards are rare though, I understand. &lt;br /&gt;&lt;p&gt;&lt;br /&gt;the new Intel cpu also includes its memory controllers, offloading tasks from the previously overworked or at least overheating northbridge. and that means: finally no noisy, microscopic northbridge fan on the P6T6 mobo.&lt;br /&gt;&lt;br /&gt;a nice surprise: you open up a system monitor and there are 8 separate cpus reported and graphed (twice the number of actual cores on Nehalem thanks to hyperthreading).&lt;br /&gt;&lt;br /&gt;at some point I may describe one technical mod which I made to thermally stabilize the motherboard. I reversed all the Zalman's fans and am now blowing the hot air from the radiator outside, as God intended it. higher coolant and card temps (still comfortably low vis-a-vis specs) but, importantly, lower component and air temps inside the box --&gt; no thermal hang-ups of the motherboard. &lt;br /&gt;&lt;br /&gt;* * * &lt;br /&gt;&lt;br /&gt;&lt;b&gt;cudak3&lt;/b&gt;: &lt;br /&gt;&lt;br /&gt;this one is a step-brother, not a twin brother of cudak1 a.k.a. the Z-machine. 
It's an air-cooled monster &lt;br /&gt;in a Thermaltake Armour full-tower case, with watercooling applied to the Nehalem cpu, but with 3 aircooled GTX295 dual-gpu cards. So 6 gpus this time, not 3, although each a little slower (main clock ~579 MHz, as opposed to the overclocked H2OC @ 680 MHz). That translates to a theoretical peak performance of 5+ TFLOPs. Benchmarks heat gpus up to 91 C, and the system becomes a bit noisy. &lt;br /&gt;oh well... the noise is a fair price to pay for a theoretical performance of a small campus-scale computing center. (I made a small mechanical modification to improve air flow, preemptively. So far I could not crash this system thermally, but I haven't tried extra hard, "just" 6 or 8 benchmarks running at the same time..) &lt;br /&gt;&lt;p&gt;&lt;br /&gt;total system cost of cudak2 and 3 was on the order of $5.5k CAD each; that is currently something like $4.4k USD. more info later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-731817685117032555?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/731817685117032555/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2009/03/two-new-brothers.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/731817685117032555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/731817685117032555'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2009/03/two-new-brothers.html' title='a welcome to two baby brothers'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-3949601277419630771</id><published>2009-03-17T13:20:00.000-07:00</published><updated>2009-03-17T15:06:11.712-07:00</updated><title type='text'>a big can of Ooops.</title><content type='html'>the Z-machine described previously (cudak1) had an accident after just 2 months of duty.&lt;br /&gt;its flow meter, that little paddle wheel which sends signals to the controlling pcb logic of the Zalman, got stuck unobserved at night. the cooling system panicked and switched itself off instead of pumping even more coolant. if you wonder what could happen next.. do the words Chernobyl &amp; tsunami mean anything to you? Zalman apparently handles emergencies badly and uses badly designed, faulty flow meters. what a shame, a very nice box as I said. in an emergency, it should absolutely switch off and protect the computer, not just itself. &lt;br /&gt;&lt;br /&gt;anyway, I won't spend time describing this accident. my doctor says it's bad for my blood pressure. a reconfigured system is running again. limping a bit but alive. I cleaned the !@#$&amp;^% flow meter. 
&lt;br /&gt;:-(&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-3949601277419630771?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/3949601277419630771/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2009/03/burning-rubber-wrong-way.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3949601277419630771'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3949601277419630771'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2009/03/burning-rubber-wrong-way.html' title='a big can of Ooops.'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-8892055931993511743</id><published>2008-12-21T04:37:00.000-08:00</published><updated>2009-03-17T14:25:38.763-07:00</updated><title type='text'>the road ahead</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_6hF-AKajt5o/SWE3_Sct9MI/AAAAAAAAAog/zp5RkRTl6Cw/s1600-h/dec08085a2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 242px;" src="http://2.bp.blogspot.com/_6hF-AKajt5o/SWE3_Sct9MI/AAAAAAAAAog/zp5RkRTl6Cw/s400/dec08085a2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5287568997933249730" /&gt;&lt;/a&gt;&lt;br 
/&gt;my plans for using the ZMachine for &lt;a href="http://planets.utsc.utoronto.ca/%7Epawel/planets/iau202.pdf"&gt;scientific work&lt;/a&gt;  include:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt; porting some serious hydrodynamics to cuda. maybe PPM or symmetric high-order Kurganov schemes. it's not clear if I should use fortran bindings or switch to c/c++. (fortran would be soo cool and easy :-) &lt;/li&gt;&lt;br /&gt;&lt;li&gt; running 3D, variable resolution &lt;a href="http://planets.utsc.utoronto.ca/%7Epawel/rio.pdf"&gt;simulations&lt;/a&gt; of disk-planet interaction. studying the &lt;a href="http://www.popsci.com/military-aviation-space/article/2008-03/birth-planet"&gt;formation of extrasolar planets.&lt;/a&gt; &lt;/li&gt;&lt;br /&gt;&lt;li&gt; learning openGL and using it for visualization, especially of 3D flow lines.&lt;/li&gt;&lt;br /&gt;&lt;li&gt; creating particle codes to study &lt;a href="http://rst.gsfc.nasa.gov/Sect20/dustdisks.jpg"&gt;dust disks in extrasolar planetary systems &lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt; etc. &lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAIbIhqUhI/AAAAAAAAAn0/kXxtkUey-bY/s1600-h/dec08-067ac2.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAIbIhqUhI/AAAAAAAAAn0/kXxtkUey-bY/s200/dec08-067ac2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282731625143751186" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;some of the challenges:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt; porting some serious hydrodynamics to cuda. :-)  &lt;/li&gt;&lt;br /&gt;&lt;li&gt; using those 16kB of shared mem skillfully &lt;/li&gt;&lt;br /&gt;&lt;li&gt; using single precision as much as possible: double precision isn't a strong suit of cuda. 
&lt;/li&gt;&lt;br /&gt;&lt;li&gt; multi-gpu computing &lt;/li&gt;&lt;br /&gt;&lt;li&gt; clustering with MPI (no urge to do it right now! actually may be a bad idea altogether) &lt;/li&gt;&lt;br /&gt;&lt;li&gt; prevailing over the ruthless Amdahl's law &lt;/li&gt; &lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;cudak_1 (as I now tend to call the machine, cudak = eccentric/geek in Polish) may soon have siblings cudak_2 &amp; cudak_3, using i7 nehalems and asus P6T6 x58 mo-bo's. I'm currently (Jan 09) waiting for parts, incl. three evga gtx 295.&lt;br /&gt;watch this space, it should be interesting &amp; v. challenging - 6 gpus in one cudak (read: heterogeneous computing).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-8892055931993511743?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/8892055931993511743/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/patrzac-w-przod.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/8892055931993511743'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/8892055931993511743'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/patrzac-w-przod.html' title='the road ahead'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_6hF-AKajt5o/SWE3_Sct9MI/AAAAAAAAAog/zp5RkRTl6Cw/s72-c/dec08085a2.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-6747668283286964125</id><published>2008-12-21T04:35:00.002-08:00</published><updated>2009-01-08T08:35:39.040-08:00</updated><title type='text'>so far so good</title><content type='html'>the ZMachine is a hardware success! (excuse my enthusiasm - this is the first computer I've built completely from scratch.) In a midtower with a 790i mini-atx format motherboard, I have 3 powerful, o.c'd gtx280 GPUs (680 MHz base clock vs. 602 MHz stock). they pack 720 cores, waiting to be used in parallel. the quad-core cpu keeps the gpus busy and isn't choking, handling the workload ok. &lt;br /&gt;&lt;br /&gt;hydraulic assembly was fun, and it turns out you CAN achieve zero leaks ;-) Zalman is a bit cramped inside,  but after several hours you can figure out how to route all those power connectors and tubes. it's really a high-end box made of 4-5 mm aluminum plates and doors. it looks and sounds good.&lt;br /&gt; &lt;br /&gt;Zalman's 3 l/min pump (max rating, in practice half that value with all those waterblocks) is doing fine. in fact, when run on automatic setting, the cooling system never goes beyond the minimum fan rpm (1000) and flow rate (~1 l/min), since the cpu and gpus aren't able to heat the coolant to more than about 46 C (I think Zalman alarms and does strange things like shutting off above 60 C). This is one of the reasons the system (if not running cuda) is very quiet. Max load, producing up to ~1000 W of heat, causes my northbridge fan to go to audible levels, since the south- and northbridge aren't watercooled and easily heat up above 80 C.  &lt;br /&gt;&lt;br /&gt;the new x58 chipset (for i7 nehalem processors, socket LGA1366) will not have that problem. 
nevertheless, I'm waiting for updated waterblock mounts (some are available already) and a larger number of PCIe lanes on those new architectures. the PCIe bandwidth seems to be a big problem with X58/i7. for all I know, they don't yet offer a 3 x16 PCIe configuration like my evga mobo does: degrading just one of the three slots to the 1.0 standard, equivalent to a 2.0 x8 slot.&lt;br /&gt;the X58 manufacturers are apparently under no pressure to widen PCIe, since the SLI/crossfire takes over from the pci express bus the duty of gluing the cards together. but why would they go back and reduce the PCI throughput as compared to 790i? beats me, unless this is a limitation of the QPI sections on the cpu right now.&lt;br /&gt;&lt;br /&gt;you may notice that I did not mention SLI. Unfortunately, SLI and CUDA aren't friends yet :-( But somebody wanting to use 3-way SLI could do it in the Zalman LQ1000 box. despite occasional thermal crash problems I'm rather happy with the system tests so far. those problems arise only when the machine is overloaded with computations (multiple large simulations per gpu), and are likely due to very busy north/southbridge on a 790i motherboard (this should not be a factor on newer i7/socket LGA1366 boards, as mentioned - you won't have a fan on the northbridge). &lt;br /&gt;&lt;br /&gt;the ZMachine is certainly pushing the limits, trying to be small, powerful and quiet at the same time. it seems that it will handle scientific simulations very well, as they tend to be similar to the simple fluid and particle codes included as examples in the sdk. by the way, those examples are really useful. if I wanted to run the kind of test I described on my clusters I'd need dozens to hundreds of cpus. I can't give you an estimate of speedup yet, I only know&lt;br /&gt;that running the sdk's examples in emulated mode, where you compile with flags forcing execution on cpu, is not a fair comparison, because the programs would in reality be run very differently on cpus. 
&lt;br /&gt;&lt;br /&gt;* * *&lt;br /&gt;&lt;br /&gt;of course, since nvidia is doubling the graphics cards' transistors and performance so rapidly, and the first to appear are the air cooled cards, there is a valid question as to water vs. air cooling. in a server room, Tesla rackmounted machines may be the best choice (not necessarily the cheapest). I myself may soon build an air cooled production-run machine to be kept away from people's desks, based on the upcoming gtx 295 gpu. it will certainly be much cheaper, about half-priced compared with the existing gtx 280 cards. [about 1.66 of my cards are needed to match the planned performance of one new dual 295, so I had to pay $850 (CAD) * 1.66 ~ $1400 (CAD) for the performance I will be able to buy for under $500 (US) or $600 (CAN).] but two 295s will be much louder than my current setup! moreover, gpus are only 40% or so of my system's price..&lt;br /&gt;&lt;br /&gt;if the situation with mobo's continues for more than a year, CUDA will make sense on those fast x16 slots only, each of which will have to be shared by two gpus (decreasing communication bandwidth we have now?). we will have max 4 gpus;  power consumption and air cooling will restrict the clock speeds (e.g. ~580 MHz on the upcoming gtx295 vs. 680 MHz on my 280s). until something changes radically regarding the PCIe bus designs, or cooling/overclocking of gpus, in the near future we're going to have a slow progress in CUDA hardware, as the 4 new gpus on 2 cards will not be much faster than the current 3 water-cooled gpus. 
&lt;br /&gt;&lt;br /&gt;so enjoy the moment and, in theory, get onto the list of 500 fastest supercomputers (ending at some 10 TFLOPs performance) for just $20k, right now!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-6747668283286964125?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/6747668283286964125/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/so-far-so-good.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/6747668283286964125'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/6747668283286964125'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/so-far-so-good.html' title='so far so good'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-9095898059358719605</id><published>2008-12-21T04:35:00.001-08:00</published><updated>2009-01-21T12:58:20.559-08:00</updated><title type='text'>testing, testing, 123...</title><content type='html'>CUDA SDK provides a nice set of 20+ examples, some of which can also serve as benchmarks as constant heavy loads (run in many copies on a given gpu). &lt;br /&gt;&lt;br /&gt;useful options: add --device=0 to command line in order to run most example programs on the gpu 0 (main one), --device=1 on gpu nr. 
1 (slot nr 1, the furthest from cpu on my mobo), and so on.&lt;br /&gt;&lt;h4&gt; bandwidth tests &lt;/h4&gt;&lt;br /&gt;~/cuda/bin/linux/release$ bandwidthTest --device={0,1,2,all} --memory={pageable,pinned}&lt;br /&gt;&lt;br /&gt;very curiously, I once got this funny, misconfigured slot 3:&lt;br /&gt;---------------&lt;br /&gt;the size of data packet transferred is 33MB, about a minimum for best efficiency of transfers.&lt;br /&gt; Host to Device Bandwidth for pinned memory&lt;br /&gt;gpu 0: from cpu 5.2 GB/s  PCIe: 1x16 (2.0) theor: &lt;8GB/s&lt;br /&gt;gpu 1: from cpu 5.7 GB/s  PCIe: 1x16 (2.0) theor: &lt;8GB/s&lt;br /&gt;gpu 2: from cpu 0.79 GB/s PCIe: 1x4 (1.0)* theor: &lt;1GB/s&lt;br /&gt;--&lt;br /&gt;* = as shown by nvidia-settings utility&lt;br /&gt;--------------&lt;br /&gt;inside the cards, the bandwidth is always as it should be: 128 GB/s or so!&lt;br /&gt;&lt;br /&gt;the bandwidth between cpu and gpus in the slot _3 is intriguing. I have yet to understand why my third slot was configured as PCIe x4, not x16 during this test!&lt;br /&gt;&lt;br /&gt;anyway, normal readings are now better. to achieve them, I have overclocked the PCI buses: slot 1,2 and 3, from automatic setting of 100MHz to, correspondingly, 115, 115, and 120 MHz, and the SPP-MCI comm speed from automatic 200 MHz to 240 MHz. I changed the latency timer of PCI to 100 from 128 CLK. 
&lt;br /&gt;&lt;br /&gt;-------------&lt;br /&gt;the size of data packet transferred is 33MB, about a minimum for best efficiency of transfers.&lt;br /&gt; Host to Device Bandwidth for pinned memory&lt;br /&gt;gpu 0: from cpu 5.874 GB/s, to cpu 5.875;  PCIe: 1x16 (2.0)  theor: &lt;8GB/s&lt;br /&gt;gpu 1: from cpu 5.875 GB/s, to cpu 5.876;  PCIe: 1x16 (2.0)  theor: &lt;8GB/s&lt;br /&gt;gpu 2: from cpu 2.075 GB/s, to cpu 2.017;  PCIe: 1x16 (1.0)* theor: &lt;4GB/s&lt;br /&gt;______________&lt;br /&gt;gpu 0-2 cumulatively: from cpu 13.8 GB/s, to cpu 13.8 GB/s, internally 383 GB/s.&lt;br /&gt;--&lt;br /&gt;* = as shown by nvidia-settings utility&lt;br /&gt;--------------&lt;br /&gt;I must stress that, unless the benchmark is cheating, the throughput of the 3 cards is additive, i.e., they transfer data w/o mutual interference. the bandwidthTest shows cumulative throughput of 13.8 GB/s each way. [edit: yes, I think the benchmark is cheating.. but I wasn't able to run a concurrent bandwidth test with 3 cards.. :-( maybe the cpu can't handle 14 GB/s concurrently.]&lt;br /&gt;&lt;br /&gt;it is revealing to compare EVGA 790i nForce SLI FTW with the X58 motherboards from ASUS (P6T X58 deluxe) and Gigabyte (GA-EX58-extreme or UD5). The latter have 3 physical x16 slots just like the EVGA but unlike it, do not provide enough bandwidth to use them simultaneously: you can only use full (PCIe 2.0) x16 throughput on the first two slots, and if you want to have a 3rd card, the second slot goes to x8. Thus, according to their documentation, they are WORSE than the EVGA board/chipset/cpu. well, it's sad. The ASUS Striker II Extreme board is a socket 775 board similar to my EVGA and from the documentation it seems to have very similar PCIe capabilities: two slots at full (2.0) x16 speed, one middle slot at (1.0) x16 speed. &lt;br /&gt;&lt;br /&gt;&lt;h4&gt; thermal tests &lt;/h4&gt;&lt;br /&gt;the tests described above run ok. 
however, these workloads and PCI clock settings produce conditions close to a  thermal instability of the motherboard.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SWEH5a8U3TI/AAAAAAAAAoI/t1kqHW1MwUo/s1600-h/Screenshot-1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 250px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SWEH5a8U3TI/AAAAAAAAAoI/t1kqHW1MwUo/s400/Screenshot-1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5287516120575958322" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;temperatures depend a lot on applications. I ran a fluidsGL simulation of 'stable fluids'.&lt;br /&gt;It's a Fourier-based, incompressible, implicit hydrocode, in which I modified the source code to have a much larger array of 2048^2 cells and a much lower viscosity coefficient (which doesn't affect the speed). iterations on this enlarged grid using calls to cufft FFT library yielded the following stabilized temp's: T ~69/58 C (gpu0, chip/card), 66/55 C (gpu1), 64/54 C (gpu2).&lt;br /&gt;&lt;br /&gt;Zalman system was showing 43/35 C (coolant/box temperature).&lt;br /&gt;&lt;br /&gt;it is quite easy to thermally destabilize the motherboard by increasing the number of tasks run on each gpu from one to a few. the temperatures cited above are close to a maximum for long-term runs. everything depends on the type of application, of course, for instance running many small fluid grids instead of an equivalent one large grid tends to increase the demand on SPP/MCP and raise temps. ideally, I should be running just one big simulation per gpu, or even one simulation on 3 gpus, no intensive output to monitor via gpu0, so I should(?) be fine. 
&lt;br /&gt;&lt;br /&gt;I guess the thermal issues will stay with us for the foreseeable future, whether we have 65nm, 55nm or, one day, 25nm technology, since we're always going to push the performance to the limit.&lt;br /&gt;&lt;br /&gt;&lt;h4&gt; computing power tests &lt;/h4&gt;&lt;br /&gt;the SDK 2.1 tests were done successfully. &lt;br /&gt;&lt;br /&gt;fluidsGL test was running at 17 fps on one gpu, or ~10 fps in 3 copies on 3 gpus, at resolution (2K)^2. at the standard resolution of 512^2, the frame rate was 266fps on one gpu, or 180+180+90 (=450fps combined on 3 gpus). let me comment on this. the algorithm, in addition to lots of interpolation (to perform the advection step) also does at least two fft transforms of the 2-D, multi-variable data. 200 timesteps per second is thus equivalent to almost 10^3 fft transforms per second on a 512^2 array. in several seconds, the simulated fluid can travel across the computational grid. admittedly different, but not necessarily more computationally intensive, CFD hydrocodes run on a cpu would easily take a full coffee break to accomplish this.&lt;br /&gt;&lt;br /&gt;nbody test showed that there is no problem with having a sustained combined processing power (e.g., sum of FLOPs in many concurrently running nbody calculations) equal to 1.53 TFLOP. Theoretical sum is closer to 3 TFLOPs. a 30000 particle N-body system oscillates, turns and evolves on the screen in a matter of seconds. (rendered by openGL in a game-like way, somewhat nicer than what we normally do in science :-). btw, somewhere here I said that Zalman cooling system never goes into high gear. well, that's no longer true, in this 3 x nbody test it did go to 1400rpm fans and 1.5 l/min flow rate. but check out the frame rate of those N-body simulations: about 20fps each, which means smooth real-time video during which each simulation computes N^2 ~ 10^9 gravitational interactions per frame. 
to add to the insanity of this calculation, by dragging &lt;br /&gt;the mouse over the screen you can turn the simulated 3D objects in space to get a different view, whether the simulation is running or pausing. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_6hF-AKajt5o/SWJ0AtNzrBI/AAAAAAAAAow/xiSaKlw1vyE/s1600-h/Screenshot-8.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 250px;" src="http://2.bp.blogspot.com/_6hF-AKajt5o/SWJ0AtNzrBI/AAAAAAAAAow/xiSaKlw1vyE/s400/Screenshot-8.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5287916467972713490" /&gt;&lt;/a&gt;&lt;br /&gt;I have to find the actual top FLOPs in my own applications this winter or spring (2009). I believe they may be higher than 1.53TFLOP, because I will try not to give device=0 so much graphics to display as do the examples, forcing high frame rates. this will help Zalman remove heat from the motherboard chips such as northbridge/southbridge.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-9095898059358719605?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/9095898059358719605/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/retrospekcja.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/9095898059358719605'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/9095898059358719605'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/retrospekcja.html' title='testing, 
testing, 123...'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_6hF-AKajt5o/SWEH5a8U3TI/AAAAAAAAAoI/t1kqHW1MwUo/s72-c/Screenshot-1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-3950685892059543700</id><published>2008-12-21T04:35:00.000-08:00</published><updated>2009-01-06T10:19:06.877-08:00</updated><title type='text'>it's alive!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAIRIFaKaI/AAAAAAAAAns/DsZl63FEnjQ/s1600-h/dec08-027a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 155px; height: 200px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAIRIFaKaI/AAAAAAAAAns/DsZl63FEnjQ/s200/dec08-027a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282731453226559906" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_6hF-AKajt5o/SVAIICgLW5I/AAAAAAAAAnk/WTcEZRazMjc/s1600-h/dec08-026a.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://2.bp.blogspot.com/_6hF-AKajt5o/SVAIICgLW5I/AAAAAAAAAnk/WTcEZRazMjc/s200/dec08-026a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282731297109400466" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAHud6D0KI/AAAAAAAAAnc/K1RwjD7bz2U/s1600-h/dec08-048a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 
200px; height: 150px;" src="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAHud6D0KI/AAAAAAAAAnc/K1RwjD7bz2U/s200/dec08-048a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282730857789116578" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_6hF-AKajt5o/SVAHm0FgGuI/AAAAAAAAAnU/ZN_tQNAd-bU/s1600-h/dec08-047a.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 152px;" src="http://2.bp.blogspot.com/_6hF-AKajt5o/SVAHm0FgGuI/AAAAAAAAAnU/ZN_tQNAd-bU/s200/dec08-047a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282730726303734498" /&gt;&lt;/a&gt;&lt;br /&gt;perhaps the most important issue with a 3-SLI-like configuration is thermal stability. not of the cards which, as both the Zalman and the nvidia utility nvidia-settings show, are relatively cool (under maximum strain they heat up to the mid-70s C, when air-cooled versions would go to 100 C). &lt;br /&gt;likewise, there is never a danger of overheating the cpu, if watercooled.&lt;br /&gt;rather, the issue is that radiators supposedly work better when drawing air into the box. &lt;br /&gt;in addition, that's the natural direction to support the power unit and the 12cm box fan, both of which blow air out the back of the box. so the radiator-WARMED air is blown into the box, not outside. this of course heats and cools the motherboard simultaneously: it warms the board as a whole but efficiently cools its small chips, which heat up during any intensive use. unfortunately but predictably, at maximum computational load, especially if the system is burdened with running multiple programs per physical card, something has to give. most probably the southbridge or northbridge overheats and the computer hangs. 
I wish there was some solution to this that allows the big fan to cool the radiator to the outside, while still providing a good flow of air near the motherboard..&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAHeR65T9I/AAAAAAAAAnM/oRP47fW69AA/s1600-h/dec08-067a.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAHeR65T9I/AAAAAAAAAnM/oRP47fW69AA/s200/dec08-067a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282730579693490130" /&gt;&lt;/a&gt;&lt;br /&gt;however, many combinations of the workload, fully utilizing all the resources (gpu, cpu, memory) are stable. let's take a closer look...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-3950685892059543700?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/3950685892059543700/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/test-test-test.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3950685892059543700'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3950685892059543700'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/test-test-test.html' title='it&apos;s alive!'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_6hF-AKajt5o/SVAIRIFaKaI/AAAAAAAAAns/DsZl63FEnjQ/s72-c/dec08-027a.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-7832719025059085725</id><published>2008-12-21T04:34:00.001-08:00</published><updated>2009-01-06T10:16:45.003-08:00</updated><title type='text'>the three-card game</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAF8HIs83I/AAAAAAAAAms/XLDzSvNYa3Q/s1600-h/dec08-042a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 150px; height: 200px;" src="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAF8HIs83I/AAAAAAAAAms/XLDzSvNYa3Q/s200/dec08-042a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282728893171430258" /&gt;&lt;/a&gt;&lt;br /&gt;already on the 2-card machine, I played with the os installation. initially, I installed a 32-bit RHEL 5.2 from media, only to change my mind and go for the 64-bit fedora10, which should give me a slight performance edge. &lt;br&gt;&lt;br /&gt;&lt;br /&gt;I had a nasty day or two without the X windows/desktop, since the nvidia driver version 180.06 and then 180.16 did NOT install ok. whoever wrote that script seems not to have known a magic spell I finally found on nvidia forum, you know, something like: &lt;br /&gt;"/usr/bin/nvidia-xconfig -a".  a spell without which your computer only outputs 24 lines of text ;-). half a day, and a bunch of rpm's later I started feeling at home in my tcsh and gnome desktop. I installed the newest sdk for cuda without incident. &lt;br /&gt;&lt;br /&gt;then the real fun began. the physical re-installation of the water-cooled cards. &lt;br /&gt;my first leak... :-( well, all those of you who used the extremely short barbs in the BFG gtx280 H2OC kit, meant for SLI installation, know what I mean. 
the supplied short pieces of clear elastic vinyl tubing don't tightly fit around the barbs, the white plastic clamps aren't really working and, besides, the short barbs are a bit too wide and catch on the aluminum card backplate painted in black, when you try to mount them on the card. all those mechanical/hydraulic issues &lt;br /&gt;can be solved by abandoning those too-short and too-wide barb fittings and using the 3/8 inch fittings, which are smoother and better quality anyway. just one thing: you have to shorten them so that the cards are close together. I put the cards in, measured the distance and cut off pieces of four 3/8" barbs with a carbon (diamond?) disk attached to an electric drill. &lt;br /&gt;I smoothed the edges so they won't cut the hoses and everything started looking good again (although I'm missing the home depot's garden hose a lot! :-) &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAGfwPPgJI/AAAAAAAAAm8/MhoVdysn-PQ/s1600-h/dec08-081a.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAGfwPPgJI/AAAAAAAAAm8/MhoVdysn-PQ/s200/dec08-081a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282729505500135570" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAGTYYdoWI/AAAAAAAAAm0/KK74X3C-yeM/s1600-h/dec08-079aa.jpg"&gt;&lt;img style="float:left; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAGTYYdoWI/AAAAAAAAAm0/KK74X3C-yeM/s200/dec08-079aa.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282729292937929058" /&gt;&lt;/a&gt;&lt;br /&gt;the LQ1000 case is very compact, like all mid-towers. 
for instance, I must warn those of you who would like to use EVGA water-cooled 280s that this box DOES NOT WORK with them. they're too high (from PCIe bracket on mobo to the top of the card). you have to use BFG gtx280 H2OC cards. I was just lucky to have gotten them in my first iteration. they almost touch the big fan casing, so you have to route the cooling hoses over the low sectors of the cards. there is enough tubing in the Zalman kit for several waterblocks. &lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_6hF-AKajt5o/SVAF0Oa7kzI/AAAAAAAAAmk/s2ogJWWhuQs/s1600-h/dec08-080aa.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://2.bp.blogspot.com/_6hF-AKajt5o/SVAF0Oa7kzI/AAAAAAAAAmk/s2ogJWWhuQs/s200/dec08-080aa.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282728757687980850" /&gt;&lt;/a&gt;&lt;br /&gt;those hoses will likely touch the 24cm fan casing &lt;br /&gt;which is ok. the small door covering the disks will touch their sata cabling. it's all barely doable. if you install the lowest/furthest x16 card, you'll always worry a lot about the sharply bent hose coming down toward the bottom plate of the case. but it will all work! I bent and even clamped the hose in a working system with my fingers and the flow rate diminished noticeably only when I used so much force that I thought I'd stop the flow completely. but it kept going and Zalman's flow rate alarm wasn't even triggered. 
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAFe3PbSwI/AAAAAAAAAmU/D4TQIOyH3l0/s1600-h/dec08-087a.jpg"&gt;&lt;img style="float:right; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAFe3PbSwI/AAAAAAAAAmU/D4TQIOyH3l0/s200/dec08-087a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282728390688459522" /&gt;&lt;/a&gt;&lt;br /&gt;in this picture you can see that I removed the white plastic clamps between the graphics cards (cf. previous pix), and installed the home-depot clamps on the shortened 3/8" barbs. no leaks now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-7832719025059085725?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/7832719025059085725/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/gra-w-dwie-karty.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/7832719025059085725'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/7832719025059085725'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/gra-w-dwie-karty.html' title='the three-card game'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://3.bp.blogspot.com/_6hF-AKajt5o/SVAF8HIs83I/AAAAAAAAAms/XLDzSvNYa3Q/s72-c/dec08-042a.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-3267908412648475470</id><published>2008-12-21T04:34:00.000-08:00</published><updated>2009-01-06T10:12:44.215-08:00</updated><title type='text'>the two-card game</title><content type='html'>Let's take a look at the first configuration I built, with two gtx280 gpus. I actually bought three cards, but got scared about the thermal limitations of my Zalman cooler (which are like those of Zalman XT external cooler): nominally only 500 W heat removed.&lt;br /&gt;well, theoretically I had 3 x 236W of heat just from the 3 cards! so my system, on paper, was limited to 2 cards...  but I figured that all the heat doesn't go into the coolant.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAEjFrIwLI/AAAAAAAAAmE/skLdSYki4d8/s1600-h/dec08-024a.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 240px;" src="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAEjFrIwLI/AAAAAAAAAmE/skLdSYki4d8/s320/dec08-024a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282727363770630322" /&gt;&lt;/a&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAEHxLpB3I/AAAAAAAAAl0/vxz1P0Rm-Vg/s1600-h/dec08-022a.jpg"&gt;&lt;img style="float:right; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 240px; height: 320px;" src="http://3.bp.blogspot.com/_6hF-AKajt5o/SVAEHxLpB3I/AAAAAAAAAl0/vxz1P0Rm-Vg/s320/dec08-022a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282726894413350770" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://4.bp.blogspot.com/_6hF-AKajt5o/SVAD3-I2NAI/AAAAAAAAAls/f233_Ma2tGA/s1600-h/dec08-044a.jpg"&gt;&lt;img style="float:left; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 240px; height: 320px;" src="http://4.bp.blogspot.com/_6hF-AKajt5o/SVAD3-I2NAI/AAAAAAAAAls/f233_Ma2tGA/s320/dec08-044a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282726623013385218" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;the de-gassing of the cooling system has to be done with all the waterblocks below the pump. &lt;br /&gt;no leaks were found. if you use Zalman, remember not to give up (like someone on some forum did) &lt;br /&gt;after a few beeps and disconnections of the pump. this is normal behavior. before most of the air is gone you'll have to restart the system up to ten times. I recommend turning the waterblocks as much as possible during this; it helps the air escape.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I like this shot, it shows perhaps the first supercomputer built from pieces of nylon-reinforced garden hose from home depot :-) the wide spacing between the cards is due to the location of the 2 fastest PCIe x16 slots.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVADmYNF7_I/AAAAAAAAAlk/eB6YS1Dbo7A/s1600-h/dec08-020a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 320px; height: 240px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVADmYNF7_I/AAAAAAAAAlk/eB6YS1Dbo7A/s320/dec08-020a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282726320772870130" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-3267908412648475470?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://cuda-z-machiny.blogspot.com/feeds/3267908412648475470/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/gra-w-trzy-karty.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3267908412648475470'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3267908412648475470'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/gra-w-trzy-karty.html' title='the two-card game'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_6hF-AKajt5o/SVAEjFrIwLI/AAAAAAAAAmE/skLdSYki4d8/s72-c/dec08-024a.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-548632811820707277</id><published>2008-12-21T04:33:00.000-08:00</published><updated>2009-01-06T17:26:05.572-08:00</updated><title type='text'>intro &amp; specs</title><content type='html'>the system I'm going to describe is my first step on a new path into shared-memory parallel computing. in other words, here you won't find anything about how to improve your highscores and frame rates in Crysis 3.14 or  Left 4 Dead 7.0. :-(&lt;br /&gt;&lt;br /&gt;my Z-Machine (when I find a good name for it, I'll edit out ZMachine :-) was designed with these objectives in mind:&lt;br /&gt;&lt;br /&gt;1. max power: max number of cards in one box, &lt;br /&gt;&lt;br /&gt;2. 
well interconnected: cards sitting on pci express bus (x16 if possible) and not dependent on the relatively very slow gigabit ethernet switches (which aren't very broadband; in PCIe terminology they are x1 or x2 devices!) &lt;br /&gt;&lt;br /&gt;3. quiet operation for office, not server room setting&lt;br /&gt;&lt;br /&gt;* * *&lt;br /&gt;&lt;br /&gt;points 1 and 3 suggested water cooling, and when I started reading up on that subject, I was amazed that a little-known box called LQ1000 from the respected manufacturer Zalman has a nice cooling system integrated inside the box. although a bit expensive, it looks great (wine-colored gauges remind one of a bmw dashboard :-)&lt;br /&gt;&lt;br /&gt;btw, the box looks like so&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAEw9gbI6I/AAAAAAAAAmM/RcPGA07wVVQ/s1600-h/dec08-042a.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 240px; height: 320px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SVAEw9gbI6I/AAAAAAAAAmM/RcPGA07wVVQ/s320/dec08-042a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282727602096382882" /&gt;&lt;/a&gt;&lt;br /&gt;and not like this &lt;br /&gt;&lt;a href="http://www.bit-tech.net/news/2007/06/11/zalman_makes_watercooled_case__lq1000/1"&gt;&lt;br /&gt;prototype &lt;/a&gt; from a 2007 trade show.&lt;br /&gt;&lt;br /&gt;* * *&lt;br /&gt;&lt;br /&gt;next: which cards? nvidia GeForce gtx280 was my choice (240 cores!). what's interesting, water cooled cards by BFG and EVGA are factory overclocked. great! &lt;br /&gt;I considered the newer versions of gtx260 with 216 cores but the price-performance calculus preferred gtx280. [I looked at the price and performance of the whole computer, not just one card!]&lt;br /&gt;&lt;br /&gt;next: the motherboard and cpu. well.. 
that was kind of unimportant if my hopes as to the gpus were&lt;br /&gt;right (-: so i settled on a run-of-the-mill quad-core intel processor...&lt;br /&gt;&lt;br /&gt;* * * &lt;br /&gt;&lt;br /&gt;it took me the last week of Nov 2008 to (over)design my machine while scanning the world for the following components (prices are approximate, in CAD):&lt;br /&gt;&lt;br /&gt;&lt;b&gt;ZMachine:&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;box and cooling: Zalman "ZMachine" LQ1000, with included cpu waterblock &amp; whole cooling system in a midtower. $800&lt;/li&gt;&lt;br /&gt;&lt;li&gt;cards: 3 x BFG GeForce gtx 280 H2OC, factory-overclocked setting, 680 MHz main clock - $686 each at bestdirect.ca&lt;/li&gt;&lt;br /&gt;&lt;li&gt;motherboard: EVGA nForce 790i SLI FTW - $350 [FSB clock 1350 MHz, +15% overclocked PCIe.&lt;br /&gt;Good mobo, except for a tiny northbridge radiator fan, which becomes loud when nb is getting a workout by cuda applications. however, at the end of 2008 there simply were no better boards. I could (and maybe would) have opted for ASUS Striker II Extreme or a Gigabyte board with i7 Nehalem cpu (socket LGA1366), but then I would have the wrong Zalman cpu waterblock bracket, and the really insufficient PCIe throughput, about which later..]&lt;/li&gt;&lt;br /&gt;&lt;li&gt;CPU: intel quad-core at 2.83GHz (Q9550) - $300(?)&lt;/li&gt;&lt;br /&gt;&lt;li&gt;RAM: 2x2GB SLI-ready DDR3 1800  $ 440&lt;/li&gt;&lt;br /&gt;&lt;li&gt;PSU: Toughpower Thermaltake 1200W [has the required two modular +12V connectors to each of the three gtx280 cards, and is very quiet. pay close attention to the number of available power connectors if you construct a 3-way SLI!] - $430&lt;/li&gt;&lt;br /&gt;&lt;li&gt;2 x 1TB Spinpoint harddisks from Samsung (quiet) - $240 (both) [I have a backup partition 250GB on the second drive, still don't know what I gain :-) since if the 1st disk crashes, the second is not automatically bootable... 
well I'll sort it out later]&lt;/li&gt;&lt;br /&gt;&lt;li&gt;1 dvd-rom $29 [nice, quiet], kbd/mouse $10 ea.[spent too little? both already failing :-]&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Samsung SyncMaster 2443BW, 24" 1920x1200 monitor. $350 [I like it, pivots around 2 axes, adjustable height. great contrast etc]&lt;/li&gt;&lt;br /&gt;&lt;li&gt;OS: Fedora 10 , x86_64, driver: nvidia 180.16 - $0 [I downloaded and installed 6-7 GB over the net without any physical media in one night]&lt;/li&gt;&lt;br /&gt;&lt;li&gt; CUDA v. 2.1 beta. [installs &amp; works fine; I skipped compilation of those few examples that require some extra libraries] &lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-548632811820707277?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/548632811820707277/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/skladniki-i-przepis.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/548632811820707277'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/548632811820707277'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/skladniki-i-przepis.html' title='intro &amp; specs'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_6hF-AKajt5o/SVAEw9gbI6I/AAAAAAAAAmM/RcPGA07wVVQ/s72-c/dec08-042a.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-4209881048829577868</id><published>2008-12-21T04:31:00.000-08:00</published><updated>2009-01-21T12:54:45.257-08:00</updated><title type='text'>GPU &gt;&gt; CPU</title><content type='html'>in early 2007, nVidia opened up the gates to a paradise. a free if not entirely open-source project called CUDA made its debut. it's general-purpose computing on graphics devices, utilizing the massively parallel architecture of today's GPUs, or graphics processing units. in effect, all the recent nvidia cards became capable of carrying out parallel computational tasks. their raw power now typically exceeds that of a CPU by a factor of ~10^2.  &lt;br /&gt;&lt;br /&gt;CUDA is best explained in &lt;a href="http://en.wikipedia.org/wiki/CUDA "&gt; this wiki page&lt;/a&gt;. &lt;br /&gt;It is nicely illustrated using real-life applications in the &lt;br /&gt;&lt;a href="http://www.nvidia.com/cuda"&gt;nVidia CUDA Zone&lt;/a&gt;. Typical speedups w.r.t. cpu are 5-100.&lt;br /&gt;&lt;br /&gt;mythbusters were hired to illustrate the power of parallel processing and constructed a cute, monstrous &lt;br /&gt;&lt;a href="http://www.youtube.com/watch?v=ZrJeYFxpUyQ&amp;annotation_id=annotation_764898&amp;feature=iv"&gt;parallel paint gun&lt;/a&gt; to paint mona lisa in less than 80 ms. it's fun to watch the monster and its 1024 paintballs flying  slow-mo to meet their final destination: canvas. &lt;br /&gt;&lt;br /&gt;without much exaggeration one can say that gpus could only be ignored so long - as soon as there's a solution that speeds up your program 100 times, you have no choice but to change to that track, no matter how comfortable your old one was. &lt;br /&gt;&lt;br /&gt;for me, the old path was clusters and MPI. 
I started 10 yrs ago with a cluster of sun ultra5 workstations, and later continued with a cluster of custom built pc's in rack mounts. &lt;br /&gt;&lt;a href="http://planets.utsc.utoronto.ca/~pawel/hydra/index.htm"&gt;hydra cluster, 5 GFLOPs&lt;/a&gt;&lt;br /&gt;&lt;a href="http://planets.utsc.utoronto.ca/~pawel/binaries.html"&gt;sample application of hydra&lt;/a&gt;&lt;br /&gt;&lt;a href="http://planets.utsc.utoronto.ca/~pawel/antares/index.htm"&gt;ANTARES cluster, 61-144 GFLOPs&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;MPI is a language (more precisely, a protocol and the libraries that implement it) for exchange of data between nodes of a cluster. physical exchange was facilitated by commodity gigabit ethernet switches that became affordable about that time. &lt;br /&gt;&lt;p&gt;&lt;br /&gt;clusters were great, and essentially most of today's supercomputers are built like that: farms of dozens to tens of thousands of machines hooked up by relatively slow interconnects. distributed memory and distributed processing power. which is ok for some problems, like hydrodynamics w/o radiation transfer or self-gravity in astrophysics, or frame-by-frame movie rendering and postproduction in a studio. &lt;br /&gt;&lt;br /&gt;so clusters were great, but not trouble-free. you had to wait (hours or days, depending on how ambitious your computation was!) for the requested number of processors on some big national supercomputer to be allocated to your simulation. or you could decide to build your own little cluster, if you had money, place and time for it. that made more sense to many, and could cost your grant agency 'only' $20k or so, unless you really needed smp (a shared memory machine; then you had to multiply the cost 5-10 times.) &lt;br /&gt;&lt;br /&gt;you and your associates could only run a system of a few dozen nodes at best. 
beyond that magical number, frequent individual component breakdowns, software upgrades, and so on, needed to be taken care of by a professional sysadmin or technician (which you could not afford; so you used &lt;br /&gt;your nodes praying they don't fail, and did not repair those that eventually did.)&lt;br /&gt;&lt;br /&gt;on a small cluster, scientific long-term simulations could in practice be done in 2D but rarely in 3D, unless you were very lucky with your problem and/or very patient.. &lt;br /&gt;&lt;br /&gt;* * * &lt;br /&gt;&lt;br /&gt;let's skip to 2007 then. why is a gpu a hundred times more powerful than a cpu? &lt;br /&gt;today, both are capable of parallel computation, since  they have multi-core structure. &lt;br /&gt;each gpu core is far less advanced on the control/vectorization side and a bit slower. &lt;br /&gt;but your ~$350 4-core intel processor is no match for 216-240 cores of a ~$350 nvidia gpu, on the newest G200 card series (GeForce gtx260, gtx280, and from january 2009 also a 2-gpu card gtx295 with 480 cores). ASSUMING YOU CAN harness the combined power of those gpu cores...&lt;br /&gt;&lt;br /&gt;so here's the challenge: to build and program a massively parallel system (with hundreds of computing nodes or cores) that is a bit more environmentally friendly than the old clusters: much less noise, much less total electrical power used, and finally much much more bang for the buck. 
&lt;br /&gt;and that means: a supercomputer in a single computer case, performing thousands of GFLOPs!&lt;br /&gt;perhaps a thousand times the number of operations you could perform 10 yrs ago.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-4209881048829577868?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/4209881048829577868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/krzywe-sie-rozchodza.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/4209881048829577868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/4209881048829577868'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/krzywe-sie-rozchodza.html' title='GPU &gt;&gt; CPU'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-3752261530967465847</id><published>2008-12-21T04:30:00.000-08:00</published><updated>2009-01-05T11:24:51.895-08:00</updated><title type='text'>why cuda</title><content type='html'>why CUDA? the answer is simple: in my mother tongue (Polish) cuda means... 
"miracles".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-3752261530967465847?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/3752261530967465847/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/dlaczego-cuda.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3752261530967465847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/3752261530967465847'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/dlaczego-cuda.html' title='why cuda'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-146451251633146323.post-4032150405620582760</id><published>2008-12-21T04:03:00.000-08:00</published><updated>2009-01-03T21:52:34.251-08:00</updated><title type='text'>why at all</title><content type='html'>I guess it's the envy...my teenage son built a little studio and I wanted to make some mess too&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_6hF-AKajt5o/SU4yNQjdT9I/AAAAAAAAAlM/tJIYdMQMRoc/s1600-h/dec08-012ac.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://3.bp.blogspot.com/_6hF-AKajt5o/SU4yNQjdT9I/AAAAAAAAAlM/tJIYdMQMRoc/s200/dec08-012ac.jpg" border="0" 
alt=""id="BLOGGER_PHOTO_ID_5282214616315547602" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6hF-AKajt5o/SU40wJW8tDI/AAAAAAAAAlc/MAUk0-XGBWs/s1600-h/dec08-017a.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_6hF-AKajt5o/SU40wJW8tDI/AAAAAAAAAlc/MAUk0-XGBWs/s200/dec08-017a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282217414702707762" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/br&gt;&lt;br /&gt;&lt;br&gt;&lt;br /&gt;so I set up a little computer/hydraulic workshop in my library&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_6hF-AKajt5o/SU4y2NVX-8I/AAAAAAAAAlU/hD_yEy-lGpU/s1600-h/dec08-074ac.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://4.bp.blogspot.com/_6hF-AKajt5o/SU4y2NVX-8I/AAAAAAAAAlU/hD_yEy-lGpU/s200/dec08-074ac.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5282215319825808322" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/146451251633146323-4032150405620582760?l=cuda-z-machiny.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cuda-z-machiny.blogspot.com/feeds/4032150405620582760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/pierwsze-cuda-sa-najtrudniejsze.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/146451251633146323/posts/default/4032150405620582760'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/146451251633146323/posts/default/4032150405620582760'/><link rel='alternate' type='text/html' href='http://cuda-z-machiny.blogspot.com/2008/12/pierwsze-cuda-sa-najtrudniejsze.html' title='why at all'/><author><name>you-know-who (ykw)</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://bp1.blogger.com/_6hF-AKajt5o/SFuTR71l0fI/AAAAAAAAAVI/EKd221MlmCQ/S220/IMG_0104-c.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_6hF-AKajt5o/SU4yNQjdT9I/AAAAAAAAAlM/tJIYdMQMRoc/s72-c/dec08-012ac.jpg' height='72' width='72'/><thr:total>0</thr:total></entry></feed>
