Search Results

Search found 10 results on 1 pages for 'travisg'.

Page 1/1 | 1

Implementing algorithms via compute shaders vs. pipeline shaders

- by TravisG

With the availability of compute shaders for both DirectX and OpenGL it's now possible to implement many algorithms without going through the rasterization pipeline and instead use general purpose computing on the GPU to solve the problem. For some algorithms this seems to become the intuitive canonical solution because they're inherently not rasterization based, and rasterization-based shaders seemed to be a workaround to harness GPU power (simple example: creating a noise texture. No quad needs to be rasterized here). Given an algorithm that can be implemented both ways, are there general (potential) performance benefits over using compute shaders vs. going the normal route? Are there drawbacks that we should watch out for (for example, is there some kind of unusual overhead to switching from/to compute shaders at runtime)? Are there perhaps other benefits or drawbacks to consider when choosing between the two?

Read the article
Discovering path through unknown territory

- by TravisG

Let's say all the AI knows about it's surroundings is a pixel-map that it has which clearly shows walkable terrain and obstacles. I want the AI to be able to traverse this terrain until it finds an exit point. There are some restrictions: There is always a way to the exit in the entire map that the AI walks around in, but there may be dead ends. The path to the exit is always pretty random, meaning that if you stand at crossroads, nothing indicates which direction would be the right one to go. It doesn't matter if the AI reaches a dead end, but it has to be able walk back out of it to a previously not inspected location and continue its search there. Initially, the AI starts out knowing only the starting area of the whole map. As it walks around, new points will be added to the pixel-map as the AI corresponding to the AIs range of sight (think of it like the AI is clearing the fog of war) The problem is in 2D space. All I have is the pixel map. There are no paths in the pixel map which are "too narrow". The AI fits through everything. It shouldn't be a brute force solution. E.g. it would be possible to simply find a path to each pixel in the pixel map that is yet undiscovered (with A*, for example), which will lead to the AI discovering new pixels. This could be repeated until the end is reached. The path doesn't have to be the shortest path (this is impossible without knowing the entire map beforehand), but when movements within the visible area are calculated, the shortest and from a human standpoint most logical path should be taken (e.g. if you can see a way out of your room into a hallway, you would obviously go there instead of exploring the corner of your current room). What kind of approaches to solve this problem are there?

Read the article
Spherical harmonics lighting - what does it accomplish?

- by TravisG

From my understanding, spherical harmonics are sometimes used to approximate certain aspects of lighting (depending on the application). For example, it seems like you can approximate the diffuse lighting cause by a directional light source on a surface point, or parts of it, by calculating the SH coefficients for all bands you're using (for whatever accuracy you desire) in the direction of the surface normal and scaling it with whatever you need to scale it with (e.g. light colored intensity, dot(n,l),etc.). What I don't understand yet is what this is supposed to accomplish. What are the actual advantages of doing it this way as opposed to evaluating the diffuse BRDF the normal way. Do you save calculations somewhere? Is there some additional information contained in the SH representation that you can't get out of the scalar results of the normal evaluation?

Read the article
Flickering when accessing texture by offset

- by TravisG

I have this simple compute shader that basically just takes the input from one image and writes it to another. Both images are 128/128/128 in size and glDispatchCompute is called with (128/8,128/8,128/8). The source images are cleared to 0 before this compute shader is executed, so no undefined values should be floating around in there. (I have the appropriate memory barrier on the C++ side set before the 3D texture is accessed). This version works fine: #version 430 layout (location = 0, rgba16f) uniform image3D ping; layout (location = 1, rgba16f) uniform image3D pong; layout (local_size_x = 8, local_size_y = 8, local_size_z = 8) in; void main() { ivec3 sampleCoord = gl_GlobalInvocationID.xyz; imageStore(pong, imageLoad(ping,sampleCoord)); } Reading values from pong shows that it's just a copy, as intended. However, when I load data from ping with an offset: #version 430 layout (location = 0, rgba16f) uniform image3D ping; layout (location = 1, rgba16f) uniform image3D pong; layout (local_size_x = 8, local_size_y = 8, local_size_z = 8) in; void main() { ivec3 sampleCoord = gl_GlobalInvocationID.xyz; imageStore(pong, imageLoad(ping,sampleCoord+ivec3(1,0,0))); } The data that is written to pong seems to depend on the order of execution of the threads within the work groups, which makes no sense to me. When reading from the pong texture, visible flickering occurs in some spots on the texture. What am I doing wrong here?

Read the article
Efficiently separating Read/Compute/Write steps for concurrent processing of entities in Entity/Component systems

- by TravisG

Setup I have an entity-component architecture where Entities can have a set of attributes (which are pure data with no behavior) and there exist systems that run the entity logic which act on that data. Essentially, in somewhat pseudo-code: Entity { id; map<id_type, Attribute> attributes; } System { update(); vector<Entity> entities; } A system that just moves along all entities at a constant rate might be MovementSystem extends System { update() { for each entity in entities position = entity.attributes["position"]; position += vec3(1,1,1); } } Essentially, I'm trying to parallelise update() as efficiently as possible. This can be done by running entire systems in parallel, or by giving each update() of one system a couple of components so different threads can execute the update of the same system, but for a different subset of entities registered with that system. Problem In reality, these systems sometimes require that entities interact(/read/write data from/to) each other, sometimes within the same system (e.g. an AI system that reads state from other entities surrounding the current processed entity), but sometimes between different systems that depend on each other (i.e. a movement system that requires data from a system that processes user input). Now, when trying to parallelize the update phases of entity/component systems, the phases in which data (components/attributes) from Entities are read and used to compute something, and the phase where the modified data is written back to entities need to be separated in order to avoid data races. Otherwise the only way (not taking into account just "critical section"ing everything) to avoid them is to serialize parts of the update process that depend on other parts. This seems ugly. To me it would seem more elegant to be able to (ideally) have all processing running in parallel, where a system may read data from all entities as it wishes, but doesn't write modifications to that data back until some later point. The fact that this is even possible is based on the assumption that modification write-backs are usually very small in complexity, and don't require much performance, whereas computations are very expensive (relatively). So the overhead added by a delayed-write phase might be evened out by more efficient updating of entities (by having threads work more % of the time instead of waiting). A concrete example of this might be a system that updates physics. The system needs to both read and write a lot of data to and from entities. Optimally, there would be a system in place where all available threads update a subset of all entities registered with the physics system. In the case of the physics system this isn't trivially possible because of race conditions. So without a workaround, we would have to find other systems to run in parallel (which don't modify the same data as the physics system), other wise the remaining threads are waiting and wasting time. However, that has disadvantages Practically, the L3 cache is pretty much always better utilized when updating a large system with multiple threads, as opposed to multiple systems at once, which all act on different sets of data. Finding and assembling other systems to run in parallel can be extremely time consuming to design well enough to optimize performance. Sometimes, it might even not be possible at all because a system just depends on data that is touched by all other systems. Solution? In my thinking, a possible solution would be a system where reading/updating and writing of data is separated, so that in one expensive phase, systems only read data and compute what they need to compute, and then in a separate, performance-wise cheap, write phase, attributes of entities that needed to be modified are finally written back to the entities. The Question How might such a system be implemented to achieve optimal performance, as well as making programmer life easier? What are the implementation details of such a system and what might have to be changed in the existing EC-architecture to accommodate this solution?

Read the article
Efficiently rendering to 3D texture

- by TravisG

I have an existing depth texture and some other color textures, and want to process the information in them by rendering to a 3D texture (based on the depth contained in the depth texture, i.e. a point at (x/y) in the depth texture will be rendered to (x/y/texture(depth,uv)) in the 3D texture). Simply doing one manual draw call for each slice of the 3D texture (via glFramebufferTextureLayer) is terribly slow, since I don't know beforehand to what slice of the 3D texture a given texel from one of the color textures or the depth texture belongs. This means the entire process is effectively for each slice for each texel in depth texture process color textures and render to slice So I have to sample the depth texture completely per each slice, and I also have to go through the processing (at least until to discard;) for all texels in it. It would be much faster if I could rearrange the process to for each texel in depth texture figure out what slice it should end up in process color textures and render to slice Is this possible? If so, how? What I'm actually trying to do: the color textures contain lighting information (as seen from light view, it's a reflective shadow map). I want to accumulate that information in the 3D texture and then later use it to light the scene. More specifically I'm trying to implement Cryteks Light Propagation Volumes algorithm.

Read the article
Marching squares: Finding multiple contours within one source field?

- by TravisG

Principally, this is a follow-up-question to a problem from a few weeks ago, even though this is about the algorithm in general without application to my actual problem. The algorithm basically searches through all lines in the picture, starting from the top left of it, until it finds a pixel that is a border. In pseudo-C++: int start = 0; for(int i=0; i<amount_of_pixels; ++i) { if(pixels[i] == border) { start = i; break; } } When it finds one, it starts the marching squares algorithm and finds the contour to whatever object the pixel belongs to. Let's say I have something like this: Where everything except the color white is a border. And have found the contour points of the first blob: For the general algorithm it's over. It found a contour and has done its job. How can I move on to the other two blobs to find their contours as well?

Read the article
"Unclutter" units in RTS game

- by TravisG

For intentional reasons, certain units in the game I'm currently programming don't have any collision detection and response among each other. This enables them to clutter right on top of each other. This is a wanted behavior, since there will be situations in the game when the player does want them to stack like that. However, I want to make the process of uncluttering them easy for the player, so that they just have to press a hotkey or click some button on the screen and have the units disperse just enough so it's easy to select a group of them with the mouse (if they stand on top of each other one mouseclick selects all units). How could I do this without running a brute force N^2 nearest neighbor search on all units?

Read the article
Generating and rendering not point-like particles on GPU

- by TravisG

Specifically I'm talking about particles as seen (for example) in the UE4 dev video here. They're not just points and seem to have a nice shape to them that seems to follow their movement. Is it possible to create these kinds of particles (efficiently) completely on the GPU (perhaps through something like motion? Or is the only (or most efficient) way to just create a small particle texture and render small quads for each particle?

Read the article
Spherical harmonics lighting interpolation

- by TravisG

I want to use hardware filtering to smooth out colors in texels of a texture when I'm accessing texels at coordinates that are not directly at the center of the texel, the catch being that the texels store 2 bands of spherical harmonics coefficients (=4 coefficients), not RGBA intensity values. Can I just use hardware filtering like that (GL_LINEAR with and without mip mapping) without any considerations? In other terms: If I were to first convert the coefficients back to intensity representations, than manually interpolate between two intensities, would the resulting intensity be the same as if I interpolated between the coefficient vectors directly and then converted the interpolated result to intensities?

Read the article

Developer IT

travisg - Developer IT

Search Results

Search found 10 results on 1 pages for 'travisg'.

Page 1/1 | 1

Implementing algorithms via compute shaders vs. pipeline shaders

- by TravisG

Read the article

Discovering path through unknown territory

- by TravisG

Read the article

Spherical harmonics lighting - what does it accomplish?

- by TravisG

Read the article

Flickering when accessing texture by offset

- by TravisG

Read the article

Efficiently separating Read/Compute/Write steps for concurrent processing of entities in Entity/Component systems

- by TravisG

Read the article

Efficiently rendering to 3D texture

- by TravisG

Read the article

Marching squares: Finding multiple contours within one source field?

- by TravisG

Read the article

"Unclutter" units in RTS game

- by TravisG

Read the article

Generating and rendering not point-like particles on GPU

- by TravisG

Read the article

Spherical harmonics lighting interpolation

- by TravisG

Read the article

1