CSE 260 - Parallel Computation

------------------------------------------------------------------------

Latest Announcements

Suggested reading (but we won't discuss it): The MicroGrid (from Andrew Chien's Concurrent Systems Architecture Group).

Update on step 4 of the project and the last assignment (due THURSDAY)! I've just heard from the SDSC consultants that if you want to compile an OpenMP C program, you can use the compiler /usr/local/apps/KAP/guide39/bin/guidec

The last assignment (in place of BOTH step 4 of the project and the previous "last assignment") is to do one of the following by the last class. I actually think the first would be the most educational, especially if you've never written a Fortran program.

* OpenMP version of the project in Fortran, using a triangular sheet of metal instead of the square one. You needn't bother with optimizing the code, but you should try different scheduling options (static and dynamic, with different chunk sizes, so that you get the effect of both block and block-cyclic distributions).
  Incidentally, I recommend using at least 50 timesteps, so that any startup costs (like allocating and initializing the arrays) get amortized over many timesteps. The goal is to compute the parallel efficiency (speedup divided by the number of processors) of the various strategies on small, medium, and large problem sizes. Here are some hints on using OpenMP on the Sun Ultra.

* A second mini-project (1-2 page writeup; no class presentation). The goal is to teach me something. (There's lots I don't know, so it shouldn't be hard.)

* Pthreads version of the project. You're on your own in learning Pthreads.

* Explore the results of the project. Specifically, try making the constant (currently .1) larger (e.g. .2, .5, 1.0, ...) and looking at the output. To "look at" the output, I suggest you find some convenient visualization package you can run on your workstation, and ship the data to yourself so you can run it locally. The goal is to have an animation of multiple timesteps. To keep the data size and animation speed reasonable, you'll want to use a relatively small problem size (perhaps 32x32). But to see anything interesting, you may need to run many timesteps - perhaps 10,000. There's no need to feed all the timesteps to the visualizer - you can try outputting every 5th or 10th timestep, or whatever.
------------------------------------------------------------------------

Class Notes

* Class 1 in PowerPoint or pdf format.
* Class 2 in PowerPoint or pdf format.
* Programming Parallel Computers class in PowerPoint or pdf format.
* PDE's for Dummies class in PowerPoint or pdf format.
* Parallel Performance class in PowerPoint or pdf format.
* Model of Parallel Computers classes in PowerPoint or pdf format.
* Performance Programming class in PowerPoint or pdf format.
* Benchmarks and Applications class in PowerPoint or pdf format.
* Quizzlet 3 answers.
------------------------------------------------------------------------

Assignments

There will be a multi-part project involving writing and tuning a relatively simple parallel program. We'll explore improving single-node performance, and writing code using both shared-memory and distributed address space paradigms. A description of the project in PowerPoint or pdf format. Part 3 is due November 15.

Information courtesy of Sunjeev Sikand: to get pthreads to assign one thread per processor, you need to declare them as system threads. Putting the following into your C program should work:

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    pthread_create(&threadid[i], &attr, start_routine, arg);

OpenMP reference manual for C. And for Fortran.

Here is information (adapted from Kathy Yelick's class) on MPI. Also available in .pdf format.

Here is some information on profiling and timing programs. Also available in .pdf format.

Mini-projects

Each student should do two mini-projects during the term.
Here are some of the completed projects:

* Sunjeev Sikand's comparison of a single processor of the Cray T90 vs. the IBM Blue Horizon.
* Xiaofeng Gao's mini-study on how fast Google's computers might run Linpack.
* Erdem Kurul's mini-study comparing two models of 1-D wave propagation.
* Yang Yu's mini-study on Java Grande, in .ppt or .pdf.
* Qian Peng's mini-study on SETI@home.
* Deepa Veerappan's slides on Deep Blue.
* Angela Molnar on research projects on C language extensions for parallelism.
* John Kerwin on CC compiler parallelization options.
* Jessica Chiang on bioinformatics and parallel computing.
* Xiaofeng Gao's report on recent Gordon Bell prize winners.

------------------------------------------------------------------------

Course information

CSE260 is an overview of parallel hardware, algorithms, models, and software.
Topics include parallel computer architectures, a survey of commercially available multiprocessors, parallel algorithm paradigms and complexity, parallel programming languages, environments and tools, and an introduction to scientific applications that are often run on supercomputers.

Instructor: Larry Carter.
Class times: Tuesdays and Thursdays, 9:35-10:55, Room 2209 Warren Lecture Hall.
Office hours: Mondays and Wednesdays, 10:00-11:00, or by appointment (or drop by). My office is AP&M 4101.

------------------------------------------------------------------------

Related material

The UltraSPARC User's Manual for the processor in the Sun Enterprise 10000 used in our project.

Ian Foster's on-line textbook, Designing and Building Parallel Programs.

A listing of some supercomputers.

An overview of research into using object-oriented languages and tools for parallel computation, compiled by Dennis Gannon. Note that this is from 1995, and so (for instance) doesn't mention anything about High-Performance Java efforts (such as Jalapeño and Titanium).
See Angela Molnar's mini-project on what has happened with some of these efforts.

Slides used in a tutorial on single-processor optimization, in PostScript or PDF format.