How many stars are there in the night sky? A hundred? A thousand? Ten thousand? A hundred thousand? We could just go to Wikipedia, but that’d be lame. We could also go out tonight (provided the weather is nice) and try to count. It would be an exercise in patience, and ultimately, in futility. Mad respect to those past astronomers who catalogued the sky with little more than their eyes and a piece of paper. Maybe we could select a small part of the sky, count the stars, then extrapolate from there. But no, we won’t do any of that today.
Today, we will look at the results of the Hipparcos survey, one of the first large-scale surveys of our galactic neighborhood. More than 118’000 stars were observed over extended periods of time, and their brightness and parallax were measured. Follow-up studies have blown this initial survey out of the water, with up to billions (!) of objects in the night sky sampled, but that is too much data to go over here. For now, we will simply obtain, parse, and analyze the data from the more humble Hipparcos project, and try to answer the initial question: how many stars are there in the night sky?
The Hipparcos Survey
Hipparcos, short for High Precision Parallax Collecting Satellite and also a play on the name of the ancient Greek astronomer Hipparchus, was a mission by the European Space Agency with the goal of mapping the motion, brightness, and parallax of more than a hundred thousand stars. The satellite launched in 1989, and for 4 years observed the 118’000-something predefined objects. The majority of the stars to be observed were selected based on their apparent magnitude: practically all stars potentially visible to the naked eye were included in that input list.
It took another 4 years to comb through the observational data, and compile a catalogue of the positions, velocities, as well as distances of the studied stars, in addition to a plethora of photometric data. The catalogue can be accessed here (in a plain text format), with an explanatory document here. As you can easily see, it is a mess.
Wrestling the Data into a DataFrame
Let us first inspect the data file:
The columns are separated by the | character, and most relevant data appears to be in its own column. The exception here is the position of the star, which is given by two values (right ascension, or RAdeg, and declination, or DEdeg, both measured in degrees) in a single column. Not shown in the image above, but also present, are the standard deviations of all astrometric data. Likewise, photometric quantities are included for most stars; however, those won’t be relevant for us now.
We thus import the relevant libraries (pandas, numpy, and pyplot, notably):
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
And start with reading the input data as a CSV file. We specify the delimiter to be |, skip the first 11 rows as well as the last one, and tell pandas to ignore the file’s own header line (our own labels are used instead). Next, we want the parser to only care about specific columns. The list of columns passed to the usecols argument and the list of labels passed to the names argument should make clear what data we are interested in:
def parse_input_file(filename):
    data = pd.read_csv(filename, delimiter='|', engine='python',
                       skiprows=11, skipfooter=1, index_col=0, header=0,
                       usecols=[1, 4, 7, 9, 10, 11, 12, 13, 14, 15, 16],
                       names=['index', 'magnitude', 'raw position', 'parallax',
                              'proper motion alpha', 'proper motion delta',
                              'error alpha', 'error delta', 'error parallax',
                              'error motion alpha', 'error motion delta'])

    # convert position string into two floats for the two position coordinates
    data = parse_raw_position(data)

    # remove rows with NaN values (data missing)
    data.dropna(inplace=True)

    # convert all milliarcsecond values to degrees
    data = convert_to_degrees(data)

    # reinterpret magnitude values as floats
    data['magnitude'] = data['magnitude'].astype(float)

    return data
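To see how these read_csv options interact, here is a minimal sketch on a made-up, pipe-delimited sample. The sample rows, the shortened column list, and the name toy are assumptions for illustration only; they do not reproduce the layout of the real catalogue file.

```python
import io

import pandas as pd

# Hypothetical miniature file: one header line followed by two data rows.
sample = io.StringIO(
    "id|mag|position|plx\n"
    "1|9.10|0.0009 +1.0890|3.54\n"
    "2|9.27|0.0038 -19.4988|21.90\n"
)

# header=0 combined with names replaces the file's own header line
# with our labels; index_col=0 turns the first column into the index.
toy = pd.read_csv(
    sample,
    delimiter='|',
    header=0,
    index_col=0,
    names=['index', 'magnitude', 'raw position', 'parallax'],
)

print(toy)
```

The position column is still a single string at this point, which is exactly the issue handled next.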
Once the data is read in, we need to take care of a few issues. First, the position of each star is determined by two numbers present in the same column. We need to take the 'raw position' column and apply the string split function to each entry. We specify expand=True in order to split the original column into several new ones. The values are to be interpreted as floating point numbers instead of strings. Finally, the original column is dropped:
def parse_raw_position(data):
    data[['alpha', 'delta']] = (
        data['raw position']
        .str.split(expand=True)
        .astype(float)
    )

    # drop original column
    data.drop('raw position', axis=1, inplace=True)

    return data
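The effect of the split is easiest to see on a tiny example. This is only a sketch: the two position strings are illustrative values, and the names toy and parts are my own.

```python
import pandas as pd

# Hypothetical two-row frame with both coordinates stored in one string.
toy = pd.DataFrame(
    {'raw position': ['0.0009 +1.0890', '217.4489 -62.6814']}
)

# Split on whitespace into two columns, then cast the strings to floats.
parts = toy['raw position'].str.split(expand=True).astype(float)
parts.columns = ['alpha', 'delta']

print(parts)
```

Without expand=True, the split would return a single column of lists rather than two separate columns.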
Next, we convert all input provided in milliarcseconds (mas) to degrees. The relevant columns are divided by 1000 to go from milliarcseconds to arcseconds, and then by another factor of 3600 to obtain the values in degrees:
def convert_to_degrees(data):
    columns = ['parallax', 'proper motion alpha', 'proper motion delta',
               'error alpha', 'error delta', 'error parallax',
               'error motion alpha', 'error motion delta']

    # divide all milliarcsecond values by 3600 and 1000 to convert to degrees
    for column in columns:
        data[column] = data[column].astype(float) / 3600000.0

    return data
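As a quick check of the conversion factor, here is a small standalone sketch. The constant and function names are my own and not part of the parsing code above.

```python
# 1000 mas per arcsecond, 3600 arcseconds per degree
MAS_PER_DEGREE = 1000.0 * 3600.0


def mas_to_degrees(value_mas):
    """Convert an angle from milliarcseconds to degrees."""
    return value_mas / MAS_PER_DEGREE


print(mas_to_degrees(3_600_000.0))  # exactly one degree
print(mas_to_degrees(1.0))          # a single milliarcsecond in degrees
```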
Now that all data entries are of the correct type and have consistent units, we can visually check out the catalogue by looking at some plots:
data = parse_input_file('data.txt')

plt.hist2d(data['alpha'], data['delta'], bins=125, cmap='Blues')
plt.xlabel('ascension (°)')
plt.ylabel('declination (°)')
plt.xticks([0, 60, 120, 180, 240, 300, 360])
plt.yticks([-90, -60, -30, 0, 30, 60, 90])
plt.show()
Interestingly, the stars are not distributed uniformly across the sky. First of all, it is important to realize that the position of stars in the sky is given by two coordinates: right ascension and declination. These are spherical angular coordinates, and thus cannot be straightforwardly mapped to a rectangular plot. Hence, the star density in the very north and south (top and bottom, respectively) seems to be lower, but this is just an artifact of the coordinate system used.
What is more interesting are the two ‘snakes’ of higher density weaving their way through the plot. The ‘U’ shaped one corresponds, presumably, to the Milky Way, the sideways projection of our Galaxy that appears as a bright band of stars stretching across the night sky. As the Earth’s rotation axis (defining the North and the South poles) is not perpendicular to the Galactic disc, the straight band is warped into a wavy pattern in a ‘geocentric’ coordinate system. However, it is not clear to me what the second, less apparent shape corresponds to. I already contacted an astronomer friend of mine, and hope to be able to update you soon on the origin of this second ‘snake’.
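The coordinate artifact can be put into numbers. On a sphere, the fraction of the sky between two declinations is proportional to the difference of their sines, so equally tall declination bands cover much less area near the poles than near the equator. The sketch below assumes nothing about the catalogue itself; the function name is my own.

```python
import numpy as np


def sky_fraction(dec_min_deg, dec_max_deg):
    """Fraction of the whole celestial sphere between two declinations."""
    lo, hi = np.radians([dec_min_deg, dec_max_deg])
    return (np.sin(hi) - np.sin(lo)) / 2.0


print(sky_fraction(0, 30))   # 30-degree band next to the equator
print(sky_fraction(60, 90))  # 30-degree band next to the pole
```

Both bands take up the same height in the rectangular plot, yet the polar one covers only a small fraction of the area of the equatorial one, so even a perfectly uniform sky would look emptier at the top and bottom.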
Distance
I mentioned the position of the stars (in the night sky), as well as their distance. However, so far we have only seen parallaxes. What is this parallax? And how do we find the distance of a star from it?
Parallax is, in plain words, the phenomenon that objects appear to be in different places when observed from different vantage points. Hold an upright finger in front of your face, and close one eye. Then switch eyes without moving the finger: it appears to move with respect to the distant background. This effect can be used to measure astronomical distances. As the Earth orbits the Sun, it moves in a nearly circular orbit of radius 1 AU (around 150 million kilometers). Hence the same observation done half a year later is carried out from a vantage point some 2 AU away. The following graphics illustrate this:
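The parallax p is the angle under which the 1 AU radius of the Earth’s orbit is seen from the star. The distance d (in AU) of a star can then be obtained from the parallax simply as:

d = 1 / tan(p)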
In Python, we can calculate it as follows:
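def calculate_distance(data):
    # only clearly positive parallaxes are usable; all other rows end up as NaN
    valid = data.loc[data['parallax'] > 0.0000001, 'parallax']

    # degrees -> radians, then 1 AU = 1.58125e-5 ly gives the distance in lightyears
    data['distance'] = 1.58125e-5 / np.tan(np.pi / 180.0 * valid)

    return data

data = calculate_distance(data)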
Note that we convert the parallax from degrees into radians, then multiply by a conversion factor (1 AU expressed in lightyears) leading to a distance measured in lightyears. We just need to be careful about negative parallaxes, which are, as the program supervisors note, a consequence of measurement errors; those rows are left without a distance. Let’s look at the distribution of the stars sampled in the Hipparcos survey as a function of distance:
plt.hist(data['distance'], bins=100)
plt.xlabel('distance (ly)')
plt.ylabel('count')
plt.ylim([0, 10000])
plt.show()
The closest star is indeed Proxima Centauri (catalogue number 70890), the faint companion of the Alpha Centauri system, located, well, where it is supposed to be located, at a distance of 4.22 lightyears:
data.iloc[data['distance'].argmin()]

magnitude              1.101000e+01
parallax               2.145361e-04
proper motion alpha   -1.048789e-03
proper motion delta    2.133778e-04
error alpha            3.638889e-07
error delta            4.194444e-07
error parallax         6.722222e-07
error motion alpha     4.222222e-07
error motion delta     5.055556e-07
alpha                  2.174489e+02
delta                 -6.268135e+01
distance               4.223016e+00
Name: 70890, dtype: float64
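The distance in that row can be verified by hand from its parallax. The following is a minimal sketch using the same conversion factor as above; the constant and function names are my own.

```python
import numpy as np

AU_IN_LIGHTYEARS = 1.58125e-5  # same conversion factor as used above


def distance_ly(parallax_deg):
    """Distance in lightyears from a parallax angle given in degrees."""
    return AU_IN_LIGHTYEARS / np.tan(np.radians(parallax_deg))


parallax_deg = 2.145361e-04  # parallax of the closest star, from the row above
print(distance_ly(parallax_deg))

# For such tiny angles tan(p) is practically equal to p, so the distance
# is simply inversely proportional to the parallax.
print(AU_IN_LIGHTYEARS / np.radians(parallax_deg))
```

Both lines should agree with the 4.22 lightyears in the distance column to well within the measurement error.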
Visible Stars
The brightness of stars is measured in magnitudes. A magnitude is a number that describes how much brighter or dimmer a given object is with respect to an agreed upon reference. It is a logarithmic scale, and is set up such that a difference of 5 magnitudes corresponds to a factor of 100 in the brightness of the objects. Moreover, somewhat confusingly, brighter objects have a smaller magnitude. And so, in order to filter only the visible stars, all we need to do is to isolate those with a magnitude of around 6.5 and less. A magnitude of 6.5 is on the edge of being seen by a human eye under perfect conditions, and is thus chosen as the cutoff:
def filter_visible(data, cutoff_magnitude):
    visible = data[data['magnitude'] < cutoff_magnitude]
    return visible

visible = filter_visible(data, cutoff_magnitude=6.5)
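The magnitude scale described above can be made concrete with a few lines. This is a sketch of the standard relation (5 magnitudes correspond to a factor of 100 in brightness); the function name is my own.

```python
def brightness_ratio(magnitude_a, magnitude_b):
    """How many times brighter object a is than object b."""
    return 100.0 ** ((magnitude_b - magnitude_a) / 5.0)


# A star of magnitude 1.5 versus one right at the naked-eye limit.
print(brightness_ratio(1.5, 6.5))

# The naked-eye limit versus a star of magnitude 11.01.
print(brightness_ratio(6.5, 11.01))
```

The first ratio is a factor of 100 by construction. The second shows that a star of magnitude 11.01, like the closest star found earlier, is dozens of times too faint to be seen without a telescope.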
We can now plot a subset of the night sky. How about the rectangle of right ascension between roughly 9h20m and 14h40m (140° to 220°), and declination between 40 and 70 degrees? Recognize the pattern of stars?
brightness = np.power(10, -(visible['magnitude'] - visible['magnitude'].min()) / 10)
brightness = (1 - brightness).to_numpy()

plt.scatter(visible['alpha'], visible['delta'], c=brightness, s=brightness * 4,
            cmap='Greys', marker='.', vmin=0.5, vmax=1.0)
plt.xlim([220, 140])
plt.ylim([40, 70])
plt.xlabel('ascension (°)')
plt.ylabel('declination (°)')
ax = plt.gca()
ax.set_facecolor((0.0, 0.0, 0.0))
plt.show()
So, How Many Stars?
Let’s look at the number of rows of the visible DataFrame:
visible.info()

Int64Index: 8785 entries, 25 to 118322
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   magnitude            8785 non-null   float64
 1   parallax             8785 non-null   float64
 2   proper motion alpha  8785 non-null   float64
 3   proper motion delta  8785 non-null   float64
 4   error alpha          8785 non-null   float64
 5   error delta          8785 non-null   float64
 6   error parallax       8785 non-null   float64
 7   error motion alpha   8785 non-null   float64
 8   error motion delta   8785 non-null   float64
 9   alpha                8785 non-null   float64
 10  delta                8785 non-null   float64
dtypes: float64(11)
memory usage: 823.6 KB
8785 entries. But keep in mind, the data span the whole sky, half of which is hidden by our planet at any given point in time. This means that there are around 4400 stars visible on a good night. Of course, when the conditions for observation are worse, the number is smaller (often by orders of magnitude in cities). On the other hand, I only considered stars whose average magnitude is below the visibility limit. At any given time, there will be many variable stars which may be visible at maximum brightness despite being too dim to observe consistently throughout the year.
Conclusion
That’s it! 4400 stars visible in the night sky. The snake and bears can go to sleep now.
There’s much more data present in the Hipparcos catalogue. For example, the movement of each star was measured, giving us its speed and direction as it travels across the night sky. Obviously, this motion is tiny, measured in thousandths of an arcsecond per year. But over countless millennia, the motion becomes apparent. Next time, we will try to simulate what the night sky will look like in the distant future!
Until next time!