Neutron Scattering Visualization

By: James Shaddix
Date: 11/14/2019
Email: jimmy.shaddix2.0@gmail.com

Task

As of the date shown above, I work for Dr. Kate Ross on a condensed matter physics team at Colorado State University. As part of my work, I was asked to create a data visualization tool for a Neutron Scattering Experiment conducted by the graduate student Gavin Hester. The plan is to build this tool using Python and the Dash framework, which integrates Python's Plotly API with the Flask web hosting library.

  • I was asked to create a heat map that can be scanned over, so that viewers can interactively look at different cross sections of the heat map.

What Is This Notebook For?

  • In order to start building this application, I first need to prototype some plots in Plotly. Then, I can build the Dash visualization application that makes use of these plots.
  • There is an additional problem: I was given relatively little information about how the data is presented, so before I start prototyping plots, I will first investigate the data file that I will be using. I was told that the file contains points that form a grid with associated intensity values from a Neutron Scattering experiment, but I was not informed how exactly this information is laid out in the file.

Note:

In this notebook I have created some Plotly plots. As a result, in order to run this notebook, you may need to install the plotly extension for JupyterLab or Jupyter Notebook (depending on which program you are using).
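For JupyterLab, installation typically looks something like the following (the extension package name has changed across JupyterLab versions, so check the one matching your install):

```shell
# Install the Plotly renderer extension for JupyterLab
# (package name is version-dependent; this is the one commonly used around 2019)
jupyter labextension install @jupyterlab/plotly-extension
```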

Configuring The Notebook

Imports

In [1]:
# General Processing
import pandas as pd
import numpy as np

# Visualization
import plotly.graph_objs as go
import plotly.express as px
import plotly.io as pio
import matplotlib.pyplot as plt

# Standard Library
from typing import List

Setting Display Options

In [2]:
%matplotlib inline
np.set_printoptions(precision=2, linewidth=150)

The code below allows the user to create a display object for showing multiple pandas DataFrames next to one another.

In [3]:
class display(object):
    """Display the HTML representation of multiple objects.

    Arguments are the *names* of variables (as strings), which are
    looked up with eval() when the object is rendered.
    """
    template = """<div style="float: left; padding: 10px;">
    <p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
    </div>"""
    def __init__(self, *args: str):
        self.args = args
        
    def _repr_html_(self):
        return '\n'.join(self.template.format(a, pd.DataFrame(eval(a))._repr_html_())
                         for a in self.args)
    
    def __repr__(self):
        return '\n\n'.join(a + '\n' + repr(eval(a))
                           for a in self.args)

Inspect The Data

Define the data file to analyze

In [4]:
data_file = "../data/1K0Slice_Integratedpm0p1.csv"

Read in the Data

In [5]:
df = pd.read_csv(data_file,names=["x","y","z"])

Size of the Data

In [6]:
rows, columns = df.shape
print("Data Rows:   ", rows)
print("Data Columns:", columns)
Data Rows:    36461
Data Columns: 3

Inspect Elements

In [7]:
first_five_elements = df.head()
last_five_elements = df.tail()
display("first_five_elements", "last_five_elements")
Out[7]:

first_five_elements

x y z
0 -1.7929 0.002324 10.0610
1 -1.7851 0.002324 10.3220
2 -1.7748 0.002324 9.8859
3 -1.7652 0.002324 9.6727
4 -1.7552 0.002324 9.5339

last_five_elements

x y z
36456 1.7640 0.99742 0.0
36457 1.7738 0.99742 0.0
36458 1.7837 0.99742 0.0
36459 1.7932 0.99742 0.0
36460 1.8024 0.99742 0.0

Get the unique element counts

In [8]:
for key in df:
    print(f"unique {key} count:", df[key].unique().size)
unique x count: 361
unique y count: 101
unique z count: 26894
  • From the information I was given, and based on how few unique values there are in x and y, I am guessing that the x and y values form a grid of points and the z values represent the intensities in the Neutron Scattering data.

Let's Check!

Plotting Raw Data

I am going to make some plots of the raw data to see if I can make sense of it.

x-data

In [9]:
plt.scatter(range(len(df.x)), df.x, s=1, marker='o');
plt.title("X-Data")
plt.ylabel("x-value")
plt.xlabel("x-index")
plt.show()
  • There is clearly a grid of coordinates; let's plot a smaller range to see if we can make sense of this.
In [10]:
plt.scatter(range(2000),df.x[:2000],s=1,marker='o');
plt.xlabel("x-index")
plt.ylabel("x-value")
plt.title("X-Data (First 2k Values)")
plt.show()
  • This plot is much more illuminating. As can be seen in the plot, there is a pattern of linearly spaced x coordinates that gets repeated roughly every 300 elements. This interval likely repeats every 361 elements, because that's how many unique x values I found earlier!

LETS CHECK!

If this is true, then every grouping of 361 elements will be the same as the first grouping of 361 elements.

In [11]:
same_elements = True
for i,val in enumerate(df.x):
    if val != df.x[i % 361]:
        same_elements = False
print(same_elements)
True
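The element-wise loop above can also be written as a single vectorized comparison by reshaping the column into blocks. A minimal sketch on synthetic data shaped like this file (the `np.tile` grid here is an assumption standing in for the real CSV):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 101 repetitions of 361 x values.
xs = np.tile(np.linspace(-1.7929, 1.8024, 361), 101)
df = pd.DataFrame({"x": xs})

# Reshape into one row per block of 361; every row should equal the first.
blocks = np.asarray(df.x).reshape(-1, 361)
same_elements = bool((blocks == blocks[0]).all())
print(same_elements)  # True
```

This avoids the Python-level loop entirely and short-circuits nothing, but on ~36k rows the difference is negligible; the main benefit is readability.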

Now I will inspect the range of the x coordinates.

In [12]:
print(f"Range(x) = [{df.x.unique().min()},{df.x.unique().max()}]")
Range(x) = [-1.7929,1.8024]

y-data

In [13]:
plt.scatter(range(len(df.y)), df.y, s=1, marker='o');
plt.xlabel("y-index")
plt.ylabel("y-value")
plt.title("Y-Data")
plt.show()
  • If you look closely, you can see that the y values seem to have a stepwise pattern, but let's make a smaller plot to verify this.
In [14]:
plt.scatter(range(2000), df.y[:2000], s=1, marker='o');
plt.xlabel("y-index")
plt.ylabel("y-value")
plt.title("Y-Data (First 2k Values)")
plt.show()
  • Sure enough, y seems to stay constant for roughly every 300 values. It appears that in our data, a particular y value is chosen and then roughly 300 unique x values are swept before moving on to the next y.

Let's Check!

In [15]:
all_counts = [] # consecutive counts associated with each unique element
curr = df.y[0]  # current unique element
curr_count = 0  # count of the number of consecutive times I have seen `curr`
for val in df.y:
    if curr == val:
        curr_count += 1
    else:
        curr = val
        all_counts.append(curr_count)
        curr_count = 1
all_counts.append(curr_count) # don't forget the final run

# Check that every unique value showed up the same number of consecutive times
print("Did all unique consecutive elements show up the same number of times?", len(set(all_counts)) == 1)
print("How many times did each unique element show up consecutively?", all_counts.pop())
print("How many times did each unique element show up consecutively?", all_counts.pop())
Did all unique consecutive elements show up the same number of times? True
How many times did each unique element show up consecutively? 361
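The run-length counting above can also be expressed with `itertools.groupby`, which groups consecutive equal values directly. A sketch on a hypothetical y column built to mimic this dataset (101 unique values, each repeated 361 times):

```python
from itertools import groupby

# Hypothetical y column: each of 101 unique values repeated 361 times.
ys = [v for v in range(101) for _ in range(361)]

# groupby yields one group per run of consecutive equal values.
run_lengths = [len(list(group)) for _, group in groupby(ys)]
print(len(set(run_lengths)) == 1)  # True
print(run_lengths[0])              # 361
```

Unlike the manual loop, `groupby` naturally includes the final run without any post-loop bookkeeping.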

Now I will inspect the range of the y coordinates.

In [16]:
print(f"Range(y) = [{df.y.unique().min()},{df.y.unique().max()}]")
Range(y) = [0.0023244,0.9974200000000001]

x vs. y data

In [17]:
%matplotlib inline
plt.scatter(df.y,
            df.x, 
            s=5,
            c=np.arange(len(df.x)) // len(df.x.unique()), 
            marker='o');
plt.colorbar()
plt.xlabel("y-data")
plt.ylabel("x-data")
plt.title("X-Data Vs. Y-Data");
  • The scatter plot above switches to a different color every 361 elements. This plot is further indication that the x and y values form a grid for the z intensity data.

z-data

In [18]:
plt.scatter(range(len(df.z)), df.z,s=1)
plt.xlabel("Z index")
plt.ylabel("z-data")
plt.title("z-Data");
  • This data seems rather odd at first glance. I asked Danielle Harris (another graduate student who also works on Neutron Scattering Experiments) about what was going on here, and she explained that this is normal behavior: the strange values we are observing come from the elastic line, which is washing out the rest of the data.

Conclusion

Now that I feel comfortable saying that the x and y values are grid coordinates for the intensity values that are represented by z, let's go ahead and make some plots!

Plots

Scatter

To get a better grasp on the data, I will go ahead and make a scatter plot.

In [19]:
data = [go.Scatter3d(x=df.x, y=df.y, z=df.z,
            mode="markers",
            marker = dict(
                color = '#FFBAD2',
                line = dict(width = 0.01)
            )
        )]
fig = go.Figure(data)
fig.show()
  • This scatter plot gives us an interesting view of the elastic line. From this data, we can see that the elastic line occurs near small y coordinates.

Handling The Elastic Line

I need to come up with a way of handling the elastic line so that it does not wash out all of the data. To fix this issue, I will set a maximum value in the z data, so that the elastic line appears as a single color.

  • First, let's look at the average values across the y coordinates to see if it is obvious where the elastic line is in the data.
In [20]:
avg_z_by_y = np.array(df.z).reshape((-1, 361)).mean(axis=1) # one row per unique y (361 x points each)
#plt.plot(avg_z_by_y);
data = go.Scatter(y=avg_z_by_y)
layout = go.Layout(
    xaxis=go.layout.XAxis(title="Y Index"),
    yaxis=go.layout.YAxis(title="Average Z"),
    title=go.layout.Title(text="Average Z value for each Y value")
)
fig = go.Figure(data=data, layout=layout)
fig.show()
  • When you zoom in near the bottom-left corner of the plot, you can see that the elastic line does not seem to affect the data after the 22nd y index.
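The reshape trick only works because the rows are perfectly ordered; a `groupby` on the y column gives the same per-y averages without depending on row order. A minimal sketch on hypothetical data (3 unique y values, 4 x points each, standing in for the real 101 x 361 grid):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 3 unique y values, 4 x points each.
df = pd.DataFrame({
    "y": np.repeat([0.1, 0.2, 0.3], 4),
    "z": np.arange(12, dtype=float),
})

# Average z for each unique y, independent of row order.
avg_z_by_y = df.groupby("y")["z"].mean()
print(avg_z_by_y.values)  # [1.5 5.5 9.5]
```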

I will now create a histogram of the z data that occurs after the 22nd y value.

In [21]:
# 22nd yth value
y_no_elastic = df.y.unique()[21]

# Grab z values where we aren't considering values in the elastic line
z_no_elastic = df[df.y > y_no_elastic].z
df_z_no_elastic = pd.DataFrame({"z":np.array(z_no_elastic)})

# plot
px.histogram(df_z_no_elastic, x="z", title="Intensity Distribution Without Elastic Line")
  • As we can see in the data, there are values smaller than zero. I am not sure why the measurement system reported these, because negative values do not make sense for intensity data.
    • NOTE: I asked Gavin Hester about this, and he informed me that it is a result of the background being over-subtracted from the data.
  • It's also important to note that there are outliers in the data. I will pick a z value to cap the data that comes well before these outliers, so as not to wash out the rest of the heatmap that I will be making.
  • There is a very large bin near zero, which indicates that the detection mechanism is picking up a larger space than necessary to capture the data.

I will now recreate the same histogram without the data in the large bin that occurs near zero, to get a better view of the rest of the data.

In [22]:
dz = df_z_no_elastic.copy()
dz = dz.mask((dz.z >= 0) & (dz.z <= (100*(10**-6))))
In [23]:
px.histogram(dz, x="z", title="Intensity Distribution Without Elastic Line And without data z=[0-100u]")
  • If you zoom in on the bulk of the data, you will find that it appears to take a Gaussian shape.

I will now create a cumulative histogram that depicts what percentage of the data is less than a given z value.

In [24]:
px.histogram(dz, x="z",nbins=350,cumulative=True, histnorm='percent', title="Cumulative Intensity Distribution Without Elastic Line And without data z=[0-100u]")
  • After hovering over some of the values in the graph, I found that z=0.025 is larger than 99.92% of the data (not including the elastic line or the zero line) and appears to be smaller than most of the outliers, so I will take it as a good maximum value to set when I create the heat map.
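Rather than reading the cutoff off the cumulative plot by hand, the same boundary could be computed directly with `np.quantile`. A sketch on a hypothetical intensity sample (the normal distribution here is an assumption standing in for the cleaned z data):

```python
import numpy as np

# Hypothetical intensity sample standing in for the cleaned z data.
rng = np.random.default_rng(0)
z = rng.normal(loc=0.01, scale=0.005, size=10_000)

# Value below which 99.92% of the data falls.
cutoff = np.quantile(z, 0.9992)
print(f"{(z <= cutoff).mean():.4%} of values are at or below {cutoff:.4f}")
```

This makes the cap reproducible if the data file changes, instead of a hand-picked constant.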

Creating a New Z value to test things out

In [25]:
# creating a copy of the z data
new_z = df.z.copy()

# adding a minimum value
min_val = 0
new_z.loc[new_z < min_val] = min_val

# adding a maximum value
max_val = 0.025
new_z.loc[new_z > max_val] = max_val

# printing results
largest_values  = np.array(new_z.sort_values(ascending=False).head(5))
smallest_values = np.array(new_z.sort_values(ascending=True).head(5))
pd.DataFrame({"Largest Z Values":largest_values, "Smallest Z Values":smallest_values})
Out[25]:
Largest Z Values Smallest Z Values
0 0.025 0.0
1 0.025 0.0
2 0.025 0.0
3 0.025 0.0
4 0.025 0.0
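The two masking assignments above can also be collapsed into a single call to pandas' `clip`, which bounds a Series to a range in one step. A minimal sketch on a hypothetical z column:

```python
import pandas as pd

# Hypothetical z column containing a negative value and an outlier.
z = pd.Series([-0.5, 0.0, 0.01, 0.02, 0.8])

# Bound the data to [0, 0.025] in one call.
new_z = z.clip(lower=0, upper=0.025)
print(new_z.tolist())  # [0.0, 0.0, 0.01, 0.02, 0.025]
```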

Heat Map

Below, I have defined a colorscale to use for the heatmap.

In [26]:
colorscale = [
    [0,   'rgb(0, 0, 0)'],      # black
    [0.1, 'rgb(153, 51, 255)'], # purple
    [0.2, 'rgb(51, 51, 255)'],  # blue
    [0.3, 'rgb(51, 153, 255)'], # light blue
    [0.4, 'rgb(51, 255, 255)'], # teal
    [0.5, 'rgb(51, 255, 153)'], # light green
    [0.6, 'rgb(51, 255, 51)'],  # green
    [0.7, 'rgb(153, 255, 51)'], # yellow green
    [0.8, 'rgb(255, 255, 51)'], # yellow
    [0.9, 'rgb(255, 153, 51)'], # orange
    [1,   'rgb(255, 51, 51)'],  # red
]

I will now create the heatmap. I will also throw in an example line that I will be using to scan over the image.

In [27]:
heat_trace = go.Heatmap(
    z=np.array(new_z).reshape(-1, len(df.x.unique())), # one row per unique y value

    colorscale=colorscale,
    showscale=True #showing colorbar
)

line_trace = go.Scatter(x=[20,20],y=[0,len(df.y.unique())-1], marker={"color":"red"})
data=[heat_trace,line_trace]
heat_fig = go.Figure(data)

# Update The Figure
heat_fig.update_layout(
    #margin=go.layout.Margin(
    #    l=70,
    #    r=50,
    #    b=50,
    #    t=150,
    #    pad=40
    #),
    #paper_bgcolor="LightSteelBlue",
    title=go.layout.Title(text="Neutron Scattering Heatmap", xref="paper", x=0.5),
    yaxis=go.layout.YAxis(
        title="Y Axis Text",
    ),
    xaxis=go.layout.XAxis(
        title="X Axis Text",
    )
)

heat_fig.show()
  • The heatmap came out just fine! There are still some styling issues to handle, but this is good enough to start using as a template for the web visualization I will be building. You can see the finished product here
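For the scanning behavior in the eventual Dash app, extracting a cross section under the red line amounts to selecting one column of the reshaped grid. A minimal sketch on a hypothetical grid shaped like this dataset (101 y rows by 361 x columns; `np.arange` stands in for the real intensities):

```python
import numpy as np

# Hypothetical grid shaped like this dataset: 101 y rows x 361 x columns.
grid = np.arange(101 * 361, dtype=float).reshape(101, 361)

# A vertical scan line at x index 20 selects one column: the z values for
# every y at that x, i.e. the cross section the app would display.
x_index = 20
cross_section = grid[:, x_index]
print(cross_section.shape)  # (101,)
```

In the app, `x_index` would come from the slider or click position, and `cross_section` would feed a simple line plot next to the heatmap.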