This work was done at: Army Research Laboratory, Attn: AMSRL-CI-AD, APG, MD 21005-5067. Email: phil@sd.wareonearth.com
PostScript versions of this paper can be found here.
This system allows any X application to be used from within a virtual space, without rewriting any software. In doing so, the richness of virtual environments is immediately enhanced, allowing the use of applications for the graphing of functions, image display, keyboard control of software, and the use of audio teleconferencing software and shared white boards. Any number of such servers can exist and be manipulated inside of the virtual environment. Use of transparency as a background color allows text windows to float in space or be used as heads-up displays.
The transition to virtual reality interaction, while extending the expressive power of both the computer and user, has usually involved leaving our traditional means of interaction behind. Completely new applications are being constructed for use in virtual environments, creating a split between those things that can be done in virtual reality and those that must be done via traditional means. It would be advantageous if all of our existing software could be used from within the virtual environment. This would save the user from having to jump back and forth between virtual reality and a traditional computer screen, and would save a great deal of software from having to be rewritten. This paper presents one system for accomplishing this goal.
The transition to new human-computer interaction paradigms has happened before. First we went from batch processing to dumb terminals with text based interaction. Systems such as ``curses'' and Tektronix style plotting extended text based applications. Finally, window systems allowed graphical output and graphical user interfaces (GUI) to be developed. As each new system emerged, the ability to use the previous style of interaction from within it has been provided: dumb terminals still allow batch jobs to be submitted; window systems provide dumb terminal emulators, e.g. X11 provides xterm, to allow non--GUI applications to be run.
The system described here allows X11 to be run from within a virtual environment. In much the same way that multiple xterms can be run on an X server, multiple X servers can be run inside of a virtual world. The implementation of the system is described, hardware and software issues that currently affect its utility are discussed, and ideas for future directions are explored.
The X11 Sample Server provided by the X Consortium is a highly portable software implementation of an X server. A great deal of the work required of an X server is implemented in a hardware independent fashion. There is a device dependent layer which includes the linkages to specific hardware platforms. The first step in this project involved porting the sample server to a pseudo hardware platform, i.e. making it function completely independently of specific hardware.
There are several papers that address porting the X sample
server\cite{angebranndt-1988-portinglayer,angebranndt-1988-portingstrategies,rosenthal-1990-godzilla}.
All of these are recommended reading, but anyone who undertakes such
a port quickly realizes that many of the details regarding available
functions within each layer are left as an exercise for the reader.
The approach taken here was to create a server called Xmem
which allocates a block of memory for its display frame buffer, and
accepts pseudo device input events from a fifo queue. A Unix SIGUSR1 signal
can be sent to make it dump the contents of its display memory to a file,
so that it can be processed or displayed elsewhere. More flexible is the option
to have it allocate its display frame buffer in a shared memory segment,
so that it can be accessed directly by other programs. An example of
starting this server is:
Xmem -shared :1
This starts an X server for display number 1, and places its memory
frame buffer into a shared memory segment. Following the usual X
convention, it listens to port 6000 + display_number for
client connections, as well as to the Unix domain socket of
/tmp/.X11-unix/X1. It also creates an input
fifo called /tmp/.X11-unix/X1-events where external programs
can send pseudo keyboard and mouse input. Xmem can manage
multiple memory screens, which can be either monochrome or color.
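The naming scheme described above can be collected into a small helper. The following is a minimal sketch in Python and is not part of Xmem; the names `XmemEndpoints` and `xmem_endpoints` are hypothetical, and only the port rule and the two path patterns are taken from the text.

```python
from dataclasses import dataclass

X_BASE_PORT = 6000              # usual X convention: port 6000 + display number
SOCKET_DIR = "/tmp/.X11-unix"   # directory named in the text


@dataclass(frozen=True)
class XmemEndpoints:
    """Where clients and input programs reach one Xmem display (hypothetical type)."""
    tcp_port: int
    unix_socket: str
    event_fifo: str


def xmem_endpoints(display_number: int) -> XmemEndpoints:
    """Derive the TCP port, Unix domain socket, and input fifo for a display."""
    if display_number < 0:
        raise ValueError("display number must be non-negative")
    return XmemEndpoints(
        tcp_port=X_BASE_PORT + display_number,
        unix_socket=f"{SOCKET_DIR}/X{display_number}",
        event_fifo=f"{SOCKET_DIR}/X{display_number}-events",
    )


if __name__ == "__main__":
    # Endpoints for the server started with "Xmem -shared :1".
    print(xmem_endpoints(1))
```

For display number 1 this yields port 6001, the socket /tmp/.X11-unix/X1, and the fifo /tmp/.X11-unix/X1-events, matching the example above.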
There seems to be little convention in choosing shared memory segment numbers. Xmem uses 100 * display_number + screen_number as the segment number. The first few bytes of this shared memory segment contain the width and height of the display, and whether it is color or monochrome. Following that is the actual display memory.
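As an illustration, the segment numbering rule and the idea of a small header ahead of the pixels might be sketched as follows in Python. The key formula is the one given above. The header layout is an assumption made only for this example (three 32-bit integers: width, height, and a color flag); the text does not specify the actual field sizes or order, and the function names are hypothetical.

```python
import struct

# ASSUMED header layout for illustration only: width, height, color flag,
# each a 32-bit integer. The real Xmem layout is not given in the text.
HEADER_FORMAT = "=iii"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def xmem_shm_key(display_number: int, screen_number: int) -> int:
    """Segment number used by Xmem: 100 * display_number + screen_number."""
    if not 0 <= screen_number < 100:
        # With 100 or more screens, keys of adjacent displays would collide.
        raise ValueError("screen number must be in the range 0..99")
    return 100 * display_number + screen_number


def parse_header(segment: bytes):
    """Split a segment into (width, height, is_color) using the assumed layout."""
    width, height, color_flag = struct.unpack_from(HEADER_FORMAT, segment, 0)
    return width, height, bool(color_flag)


if __name__ == "__main__":
    print(xmem_shm_key(1, 0))
    fake_segment = struct.pack(HEADER_FORMAT, 512, 512, 1) + bytes(16)
    print(parse_header(fake_segment), "pixel data starts at byte", HEADER_SIZE)
```

A reader program would attach the segment with this key through the System V shared memory interface, read the header, and treat the remaining bytes as display memory.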
Once Xmem is running, a window manager can be started on it, and any number of clients can connect to it in the usual X fashion. Keyboard and mouse input can be passed to Xmem down the Xn-events fifo, using the messages described in Table~\ref{tab:events}. All of these pseudo input device messages are ASCII, and terminated by a newline character. Any number of programs can write to this same fifo.
Message    Input Event
+a         Key `a' down
-a         Key `a' up
;a         Key `a' press (down and up)
<1         Mouse 1 down
>2         Mouse 2 up
m2         Mouse 2 press (down and up)
=x y       Mouse move to (x, y)

Table: Xmem pseudo input device messages.
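A program feeding input to Xmem only needs to format these ASCII lines and write them to the fifo. The sketch below, in Python, shows one way this might look; the helper names are hypothetical, and it assumes that the mouse move message is the character `=` followed by decimal x and y separated by a single space, which is how the message list reads.

```python
def key_down(ch: str) -> str:
    """Key down message, e.g. '+a'."""
    return f"+{ch}\n"


def key_up(ch: str) -> str:
    """Key up message, e.g. '-a'."""
    return f"-{ch}\n"


def key_press(ch: str) -> str:
    """Key press (down and up) message, e.g. ';a'."""
    return f";{ch}\n"


def mouse_down(button: int) -> str:
    return f"<{button}\n"


def mouse_up(button: int) -> str:
    return f">{button}\n"


def mouse_press(button: int) -> str:
    return f"m{button}\n"


def mouse_move(x: int, y: int) -> str:
    """Mouse move message; assumed format '=x y' with decimal integers."""
    return f"={x} {y}\n"


def send(fifo_path: str, messages) -> None:
    """Write newline-terminated messages to the Xmem input fifo.

    Opening a fifo for writing blocks until a reader (the server) has it open.
    """
    with open(fifo_path, "w", encoding="ascii") as fifo:
        for message in messages:
            fifo.write(message)


if __name__ == "__main__":
    # Move to (100, 200), click button 1, then type the letter 'a'.
    script = [mouse_move(100, 200), mouse_press(1), key_press("a")]
    print("".join(script), end="")
    # send("/tmp/.X11-unix/X1-events", script)  # requires a running Xmem
```

Because every message is a single short line, several independent programs can write to the same fifo.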
Figure: X Server Running within a Window.
The result is a complete X server, running entirely within a window residing on another X server. This server within a window gets its own window manager and its own client applications. Since this server display window is nothing more than an X client program, the entire thing can be moved and iconified as with any other application. Indeed, this entire process can be repeated from within this X server window, causing yet another layer of X servers within X servers.
To better understand the process taking place, consider the issue of tracking the mouse with the X cursor. The following sequence of events takes place:
One factor that would improve performance is switching to a binary format for the pseudo input device messages. If the -debug option is given to Xmem, all input messages are printed. In this case, the time spent in printf makes fast sequences of events such as mouse tracking quite sluggish. Similar performance is probably still being lost by having to run scanf on the input x,y values. At present, however, response is good enough not to worry about this issue.
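To make the trade-off concrete, the following Python sketch contrasts parsing the current ASCII mouse move message with decoding a fixed-size binary record. The binary layout (one opcode byte followed by two unsigned 16-bit little-endian coordinates) is purely hypothetical; it is not an Xmem format, only one plausible shape such a message could take.

```python
import struct

# HYPOTHETICAL binary record: opcode byte, then x and y as unsigned 16-bit
# little-endian integers. Fixed size, so no text scanning is needed.
MOVE_RECORD = struct.Struct("<cHH")


def encode_move(x: int, y: int) -> bytes:
    return MOVE_RECORD.pack(b"=", x, y)


def decode_move(record: bytes):
    opcode, x, y = MOVE_RECORD.unpack(record)
    if opcode != b"=":
        raise ValueError("not a mouse move record")
    return x, y


def parse_ascii_move(line: str):
    """Parse the existing ASCII form, assumed to be '=x y' plus a newline."""
    if not line.startswith("="):
        raise ValueError("not a mouse move message")
    x_text, y_text = line[1:].split()
    return int(x_text), int(y_text)


if __name__ == "__main__":
    print(parse_ascii_move("=300 40\n"))
    print(decode_move(encode_move(300, 40)), MOVE_RECORD.size, "bytes per record")
```

The binary form trades the readability and easy scripting of the ASCII messages for a constant-size record that can be read without any text conversion.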
The ability to run an X server within an X window is itself interesting. It allows the testing of changes to server code, or experimentation with new window managers, each without having to shut down the work environment. It also provides an alternative to the ``rooms'' idea of multiple desktops\cite{henderson-1986-rooms}: here an entire X server full of applications can be saved as an icon and revisited by deiconifying it.
The approach used here was to use the 2-D array of pixels that make up the X display as a texture, and map that texture onto a polygon (or polygons) in the virtual world. The existence of hardware accelerated textures makes this possible. For general text based applications, a minimum X display size of 512 by 512 pixels seems reasonable, resulting in 256K pixels in the texture (though in some cases 256 by 256 might be adequate). For reasonable interaction we would like to be able to redraw this texture 10 or more times per second. Every time the X display contents change, the texture would have to be modified or redefined.
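The arithmetic behind these figures is simple enough to check directly. The Python sketch below computes the pixel count and the data rate implied by redrawing the texture ten times per second; the value of four bytes per pixel (RGBA) is an assumption for illustration and is not stated in the text.

```python
def texture_pixels(width: int, height: int) -> int:
    """Number of texels in a width by height texture."""
    return width * height


def upload_bytes_per_second(width: int, height: int,
                            bytes_per_pixel: int, updates_per_second: int) -> int:
    """Data rate needed to redefine the whole texture at the given rate."""
    return texture_pixels(width, height) * bytes_per_pixel * updates_per_second


if __name__ == "__main__":
    print(texture_pixels(512, 512))                   # 262144, i.e. 256K pixels
    # ASSUMPTION: 4 bytes per pixel (RGBA); the paper does not give a depth.
    print(upload_bytes_per_second(512, 512, 4, 10))   # bytes per second at 10 Hz
```

Under that assumption, a 512 by 512 display redefined ten times per second requires about 10 MB of texture data per second.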
Silicon Graphics IRIS 4D workstations were used to test this texture mapping idea. A polygon is created in a GL program, and the shared memory X display is used as a texture and applied to the polygon. The user can then walk around in the virtual world and see the X display(s). These X displays could appear in many different forms: as large screen televisions mounted to the wall of a room, as tablets that can be picked up and carried around, as see-through screens floating in space or following the user as a heads-up display (HUD), or something very unusual such as an X server on each panel of a soccer ball.
The Silicon Graphics GL library allows several texture features to be controlled. For example, the user can define textures with or without alpha components. Alpha controls the blending of textures with the background, and is the means for creating transparent X displays. Another important factor is the method used to scale up, or filter down, texture elements that are smaller or larger than their drawn size.
Without filtering, a texture drawn smaller than its native size will become aliased. Text displays rapidly become unreadable when this happens. A high quality filtering operation, such as trilinearly interpolated MIP maps (``multum in parvo'' image pyramids)\cite{williams-1983-pyramidal}, provides the most legible output. When drawing textures larger than their native size, they can either be interpolated up, or pixels can be replicated by choosing the nearest neighbors. Experience showed that interpolation works best for opaque displays, but nearest neighbor looked better for transparent displays since it avoids the blurring at the edges of every drawn element.
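The difference between the two magnification methods can be seen on a single row of pixels. The Python sketch below is a simplified illustration, not the GL implementation: it enlarges a one-dimensional row by an integer factor, once by replicating the nearest source pixel and once by linear interpolation between neighbors. The function names and the sampling convention are choices made for this example.

```python
def upscale_nearest(row, factor):
    """Replicate each source pixel 'factor' times (nearest neighbor)."""
    return [row[i // factor] for i in range(len(row) * factor)]


def upscale_linear(row, factor):
    """Linearly interpolate between adjacent source pixels."""
    n = len(row)
    out = []
    for i in range(n * factor):
        pos = i / factor                  # position in source pixel units
        left = int(pos)
        right = min(left + 1, n - 1)      # clamp at the right edge
        t = pos - left
        out.append((1 - t) * row[left] + t * row[right])
    return out


if __name__ == "__main__":
    edge = [0, 255]                       # a sharp edge, e.g. text on background
    print(upscale_nearest(edge, 2))       # edge stays sharp
    print(upscale_linear(edge, 2))        # an intermediate value appears at the edge
```

Interpolation introduces intermediate values at every edge. On a transparent display those intermediate values show up as a blurred halo around each drawn element, which is consistent with nearest neighbor looking better in that case.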
Overall, mixed results were achieved. For one, it takes almost one second to redefine a 512 by 512 texture on the IRIS. Once defined, it can be redrawn at high speed, but interaction with the X server is limited by this one second turnaround time whenever the X display memory changes. On current hardware, texture memory is also limited, intended more for small replicated patterns than for full screen sized images. Low-end machines such as the Indigo produce full resolution, trilinearly filtered images, but at the expense of very slow rendering speed, since these machines have no hardware acceleration for texturing. High-end machines such as the VGX, while providing very high speed redraws of the texture, limit MIP mapped textures to a maximum size of 128 by 128, which is inadequate for text applications. One can resort to tiling many small textures over the face of the server polygon, but this may result in texture swapping. A RealityEngine has not yet been tried, but it too has limited texture memory (4 MB). In all of these cases, changing the contents of a texture takes nearly a second, which is far too long for highly interactive use.
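The tiling workaround amounts to cutting the display into sub-rectangles no larger than the hardware limit. A minimal Python sketch of that bookkeeping follows; the function name and the (x, y, width, height) tuple layout are hypothetical, and the sketch only computes the tile rectangles, it does not touch any graphics API.

```python
def tile_rects(width: int, height: int, tile: int = 128):
    """Split a width by height display into tiles of at most tile by tile pixels.

    Returns (x, y, tile_width, tile_height) tuples in row-major order.
    """
    rects = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            rects.append((x, y, min(tile, width - x), min(tile, height - y)))
    return rects


if __name__ == "__main__":
    tiles = tile_rects(512, 512)
    print(len(tiles), "tiles; first", tiles[0], "last", tiles[-1])
```

A 512 by 512 display needs sixteen 128 by 128 tiles, each of which would be defined as its own texture and mapped onto the matching sixteenth of the server polygon.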
Figure: Linkage between X Server and Virtual Space.
The single most important thing for which the user depends on fast feedback when using X is the pointer location. Without this, it is impossible to know which button you are going to push, or which window you will be typing in. Because of this, and due to the one second turnaround time with the server, it is planned to handle a pointer symbol directly within the virtual world. This pointer is drawn on top of the X display polygon in the appropriate location. Using such a pointer, the user can have immediate interactive feedback of the pointer location. All pointer movements are still sent to the X server, but we no longer depend on X server output to see the pointer location.
An obvious analogy to the mouse is the position of the user, or some part of the user such as their hand. This 3-D position can be projected onto the plane of the current X display, and the corresponding 2-D coordinates used as the pointer position. Mouse button presses can come from collisions between the user and the server polygon. The system in which this was tested has full object--object collision detection. When an object, either the user or a thrown object, strikes a server polygon, a Button 1 press and release is sent to the X server. This is usually the ``select'' operation and is all that is needed to run some GUI applications.
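The projection step can be sketched with a few lines of vector arithmetic. The Python example below is an illustration under stated assumptions, not the code used in the tested system: the display polygon is taken to be a rectangle described by its top-left corner and two perpendicular edge vectors (one along the width, one down the height), and all names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def project_to_display(point, corner, u_edge, v_edge, width, height):
    """Project a 3-D point onto the display rectangle; return pixel (x, y).

    corner is the top-left corner of the rectangle, u_edge spans its width,
    and v_edge spans its height (assumed perpendicular to u_edge).
    Returns None when the projection falls outside the rectangle.
    """
    d = sub(point, corner)
    s = dot(d, u_edge) / dot(u_edge, u_edge)   # 0..1 across the width
    t = dot(d, v_edge) / dot(v_edge, v_edge)   # 0..1 down the height
    if not (0.0 <= s <= 1.0 and 0.0 <= t <= 1.0):
        return None
    return (min(int(s * width), width - 1), min(int(t * height), height - 1))


if __name__ == "__main__":
    # A 2 by 1 unit display; the user's hand is 3 units in front of its center.
    print(project_to_display((1.0, 0.5, 3.0), (0, 0, 0), (2, 0, 0), (0, 1, 0), 512, 512))
```

The resulting pixel coordinates are what would be sent to the X server as a pointer move; the distance from the plane is simply discarded by the projection.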
Far more flexible than the above is to give the user a 3-D mouse-like device, such as the Ascension Bird, the Flying Mouse, etc. The user then orients the hand held mouse to move the X pointer, and has mouse buttons directly available to them. This is the preferred input strategy for anything beyond trivial applications. Feedback can be provided by drawing a line from the user's hand in the virtual space, in the direction they are pointing, analogous to a laser pointer.
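The laser pointer analogy corresponds to intersecting a ray from the hand with the plane of the display. The following Python sketch shows a standard ray and plane intersection under the same assumptions as a rectangular display described by a top-left corner and two perpendicular edge vectors; it is an illustration with hypothetical names, not the implementation used in the system.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hit_pixel(hand, direction, corner, u_edge, v_edge, width, height, eps=1e-9):
    """Return the pixel (x, y) where the pointing ray meets the display, or None."""
    normal = cross(u_edge, v_edge)
    denom = dot(direction, normal)
    if abs(denom) < eps:
        return None                        # ray is parallel to the display plane
    t = dot(sub(corner, hand), normal) / denom
    if t < 0:
        return None                        # display is behind the hand
    hit = (hand[0] + t * direction[0],
           hand[1] + t * direction[1],
           hand[2] + t * direction[2])
    d = sub(hit, corner)
    s = dot(d, u_edge) / dot(u_edge, u_edge)
    r = dot(d, v_edge) / dot(v_edge, v_edge)
    if not (0.0 <= s <= 1.0 and 0.0 <= r <= 1.0):
        return None                        # ray misses the rectangle
    return (min(int(s * width), width - 1), min(int(r * height), height - 1))


if __name__ == "__main__":
    # Hand 5 units in front of a 2 by 1 display, pointing straight at its center.
    print(ray_hit_pixel((1.0, 0.5, 5.0), (0.0, 0.0, -1.0),
                        (0, 0, 0), (2, 0, 0), (0, 1, 0), 512, 512))
```

The same hit point can serve both purposes mentioned above: its pixel coordinates drive the X pointer, and the segment from the hand to the hit point is the line drawn as visual feedback.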
Keyboard input is more difficult to handle than a mouse. There have yet to be virtual reality input devices that can substitute for the general utility of a keyboard. While there is a definite trend toward making all software usable without a keyboard, there are many things that are still best done by typing (at least for those who are typists). In the present system, when the user needs to type on a keyboard, an actual workstation keyboard is used, and the events are passed to the selected server as discussed above.
Figure: X Server in a Virtual Environment.
Virtual environment applications all need to do some things in common. They need to display 3-D graphical output, and get control information through some mechanism, either traditional 2-D or new 3-D input devices. They may also want services such as 3-D audio output. To keep from writing support for these services into each and every application, several virtual reality ``operating systems'' have been proposed (e.g. VEOS and dVS). These operating systems provide an interface to peripherals and graphics display services.
Most virtual environment systems to date have been single applications, perhaps implemented as multiple communicating processes, but still with a single overall application running at one time. Some systems allow different applications to be selected, stopping one and starting another, much the way that many microcomputer operating systems allow the user to toggle between different processes, but only run one at a time. As virtual environment systems mature, the need to support multiple independent and simultaneously active processes will become more important.
While implementing the system described in this paper, it became apparent that a model like the X11 server was not a bad one for a general virtual environment system. The X server deals with many of the issues needed for independent processes to coexist while sharing one set of hardware, as is the case for a user in a multi-application virtual environment. Applications (clients) connect to the virtual world server, which allocates them some of the shared 3-D space. They indicate which input events they are interested in, and ask the server to carry out 3-D drawing commands. Conventional 2-D windows would be a special case for such a server. The pros and cons of operating system models versus extended window system models for virtual environments are being explored.
A more ambitious project would be to capture the ``output'' of the X server at a higher level than the pixel display memory. In this way, actual drawing commands could be implemented inside of the virtual space, rather than bringing over the resulting pixels. An exciting candidate for this type of linkage is the PHIGS Extension to X (PEX)\cite{thomas-1989-pex,gaskins-1992-phigs}. It should be possible to take the 3-D geometric information passed to the server via PEX, and actually display it as 3-D geometry within the virtual environment. The same could apply to other 3-D graphics extensions to X.
If graphics workstations were extended to allow rendering into non-orthogonal viewports, several tricks could be performed. In this paper, we are using textures as a way of placing a rendered image into a projected space. If such a warping were available at the end of the graphics pipeline, as an alternative to the 2-D viewport scale and clip, graphics could be rendered directly onto the 3-D display ``polygon.'' The same projection could allow mirrors to be implemented (render the scene as seen by the mirror into the mirror polygon), and things such as live video within a window to be placed into a 3-D environment.
At least three developments would help make this system more useful. Foremost is to increase the server display update rate, either by hardware improvements in the speed at which texture contents can be changed, or by a new approach allowing direct rendering into non-orthogonal view windows. Second, the resolution of most head-mounted display devices is far lower than that of workstations\cite{deering-1992-hires}, making it difficult to read text information from within virtual environments. Finally, improvements in input devices, such as 3-D mice and head tracking techniques, would make control strategies for 3-D ``desktops'' easier to implement.