I'm going to try and express an understanding that is forming in my brain as a convergence of several interests I've expressed through the years.
Recently I've mentioned some first impressions from Vision Science: From Photons to Phenomenology. I noted that its models of visual perception seem to mirror the way a modern graphics pipeline is structured. Reading further in, that impression remains. It's not entirely surprising - I can't imagine the designers and engineers who built up the modern pipeline were ignorant of the theories and practices of computer vision, which is one of the three disciplines covered by this book (the other two are psychology and neurophysiology). For example, the explanation for the importance of effects like diffuse shading in a computer graphics guide I have read recently, OpenGL SuperBible, mirrors the one in Vision Science.
Much earlier in the year, I mentioned that I'd gotten into folklore the previous year, and specifically Edward T. Hall's The Hidden Dimension, which discussed how people from different backgrounds perceive and make use of space differently when they interact with each other, as well as how different cultures and genres express spatial information visually.
Even before that, I reflected on a talk I gave years ago about virtual reality and various types of immersion. I expressed an idiosyncratic and limited version of rejecting the immersive fallacy, the notion that the more video games are able to reproduce the sensory impressions that a real experience would provide, the more players will enjoy and be taken in by the game world.
To this let me add something I've not discussed, mostly because I don't have as much personal experience with it yet: physically-based rendering (PBR for short). This is a set of techniques on the graphics side of video games, film, and other visual media, meant to put sensory reproduction into practice by approaching a close physical simulation of the interaction of light with various real materials. This encourages moving towards computationally expensive techniques such as ray tracing to generate the visuals, and practically expensive techniques such as photogrammetry to extract light interaction information from real materials. That information then has to be stored and used when rendering, which tends to increase the memory and processing requirements further.
If you happen to be thinking about the next game or digital movie you're going to create, I think it's worth taking a step back and asking: what is it that rendering is supposed to do for a video game or a movie? Its main task is to provide information to a person, a player or viewer, about a virtual world; or possibly to pass along a mood. Perspective 3D rendering of a view of an environment and the objects within it is one way of doing this, but it is not necessarily the best. Even in modern games and realistic visualizations, other forms are used. For example, map applications usually provide orthographic projections rather than perspective projections, because that is a better way of providing regional information in graphical form. Even in realistic modern 3D video games, a lot of information is still provided through UI and menus. What are those? Stylistic, non-perspective descriptions of information, laid over the 3D view instead of being placed within it. And yet, after a bit of a learning curve, they are a very effective way to present information, better than if a perspective-rendered substitute were required.
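To make the difference between the two projections concrete, here is a minimal sketch of both applied to the same points. It assumes a camera at the origin looking down the positive z axis; the function names and the focal_length and scale parameters are made up for illustration and don't come from any particular graphics API.

```python
def project_perspective(point, focal_length=1.0):
    """Pinhole projection: screen position shrinks as distance from the viewer grows."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def project_orthographic(point, scale=1.0):
    """Parallel projection: depth is dropped, so screen position ignores distance."""
    x, y, _z = point
    return (scale * x, scale * y)


# Two points that differ only in how far they are from the viewer.
near = (2.0, 1.0, 4.0)
far = (2.0, 1.0, 8.0)

print(project_perspective(near))   # (0.5, 0.25)
print(project_perspective(far))    # (0.25, 0.125)
print(project_orthographic(near))  # (2.0, 1.0)
print(project_orthographic(far))   # (2.0, 1.0)
```

Under the perspective projection the two points land in different places on screen, which is what tells a viewer about depth. Under the orthographic projection they coincide, which is what keeps distances across a map comparable wherever you look.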
Making use of these non-perspective, non-simulative representations was a recurring problem for me when working on virtual reality applications. Since control of the visual field is given over to the computer, it needs to provide something close to what your own visual field provides, in terms of reactivity and the organization of information. Entirely stylized overlays, like subtitles placed at a constant position in the visual image, work pretty well when watching a movie on a stationary display, but in virtual reality they can be jarring and even lead to simulation sickness. This results in solutions such as placing them inside artificial objects in the environment, which leads to challenges such as making sure that they are not obscured by other elements in the environment.
But it is possible to think about this differently. Instead of starting with the idea of a three-dimensional world built to be presented through a perspective projection (with at most a parallel orthographic projection for an automap), it should be possible to create a more sophisticated logical framework connecting them, and decide how to turn this into visual data later.
Lateral thinking about how to render still exists in contemporary game design. Take shadows, for example. Shadows require a bit of work to render, compared to just ignoring them. But as Vision Science explains, shadows are an important tool for the visual system to extract at least two pieces of information about objects in the environment: their detailed spatial structure, and how far away they are from the viewer. However, these two uses of shadows don't have to be rendered the same way, and sometimes it's best not to. If you've played any modern third-person 3D platformer, you will have noticed that yes, there's naturalistic shadowcasting that is consistent with the lighting conditions, but there's also usually a small shadow right under your character, regardless of the direction of the light. That's because when light is coming from directly above, the vertical distance between the object and its shadow gives you very good information as to where the object is in 3D space, and the shadow also tells you where the object would land if it were dropped. That's essential when you want your in-game avatar to perform complicated jumps, a basic problem of the platformer genre.
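The two kinds of shadow can be sketched as two different ways of picking a spot on the ground. This is only an illustration under simplifying assumptions: the ground is a flat plane at y = 0, the light is directional, and the names cast_shadow and drop_shadow are mine rather than any engine's.

```python
def cast_shadow(position, light_dir):
    """Light-consistent shadow: follow the light ray from the object down to the ground (y = 0)."""
    x, y, z = position
    dx, dy, dz = light_dir
    if dy >= 0:
        raise ValueError("light must point downward (dy < 0) to reach the ground")
    t = -y / dy  # how far along the ray the ground plane is hit
    return (x + t * dx, 0.0, z + t * dz)


def drop_shadow(position):
    """Platformer-style shadow: always straight below the object, whatever the lighting."""
    x, _y, z = position
    return (x, 0.0, z)


character = (3.0, 2.0, 5.0)       # two units above the ground, mid-jump
slanted_light = (1.0, -1.0, 0.0)  # light travelling down and sideways

print(cast_shadow(character, slanted_light))  # (5.0, 0.0, 5.0)
print(drop_shadow(character))                 # (3.0, 0.0, 5.0)
```

With the slanted light, the naturalistic shadow ends up two units away from the point the character is actually above, so it is a poor guide for landing a jump. The drop shadow marks that point exactly, and the two functions agree only when the light points straight down.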
Instead of thinking of shadows as a physical effect that must be simulated as realistically as possible to create an immersive scene, you can think of shadows as a way of conveying information to the player. That is a creative choice. An ideal version of what I would call "perception-oriented graphics" would let you make decisions like this for the overall display of the system - whether you want to display a perspective projection of the space or an orthographic overview - as well as for individual objects or phenomena - like the question of whether to use light-reactive or drop-down shadows. Maybe some day I will have the opportunity to work on something like this.