Stack Overflow AI

This project was written in Python as part of the Machine Learning Club and an independent study at UW-Platteville. It uses a machine learning model called a Transformer to generate Stack Overflow answers. Google originally introduced the Transformer for translating between languages; here the model "translates" questions into answers. What makes the model work so well is self-attention, with a mask applied to the output sequence that is fed back in as input. Self-attention, which the Transformer brought to prominence in 2017, is a technique for associating context with sequenced data, and it is becoming the new standard for machine learning models that handle sequential data.
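To make the idea of masked self-attention concrete, here is a minimal sketch in plain Python. It is not the project's TensorFlow code: it assumes the queries, keys, and values are the token vectors themselves (no learned projection matrices), and the name `masked_self_attention` is made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_self_attention(x):
    """Scaled dot-product self-attention with a look-ahead mask.

    x is a list of token vectors. Assumption of this sketch: queries,
    keys, and values are all x itself, with no learned projections.
    """
    d = len(x[0])
    weights, outputs = [], []
    for i, query in enumerate(x):
        # The mask: position i may only attend to positions 0..i.
        scores = [
            sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Future positions get an attention weight of exactly zero.
        row = softmax(scores) + [0.0] * (len(x) - i - 1)
        weights.append(row)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(row, x)) for c in range(d)])
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = masked_self_attention(tokens)
```

Each row of `weights` sums to 1 and is zero for every later position, so the first token can only attend to itself while the last token can draw on the whole sequence.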

To achieve this, Python was used with TensorFlow to build the Transformer model, and Keras supplied some of the Transformer layers. The training dataset of Stack Overflow questions and answers was pulled from Google BigQuery. TensorFlow Datasets was used to tokenize the data and create the datasets for training.
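As a rough illustration of what preparing question/answer pairs for a sequence-to-sequence model can look like, the sketch below builds one training example in plain Python. It makes several assumptions that are not from the project: a whitespace tokenizer stands in for the TensorFlow Datasets tokenizer, and the `<start>`/`<end>` markers, the helper names, and the sample pair are all hypothetical.

```python
START, END = "<start>", "<end>"  # hypothetical marker tokens

def build_vocab(texts):
    """Map each whitespace-separated token to an integer id (0 is padding)."""
    vocab = {"<pad>": 0, START: 1, END: 2}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab):
    """Turn text into token ids wrapped in start and end markers."""
    return [vocab[START]] + [vocab[t] for t in text.split()] + [vocab[END]]

def make_example(question, answer, vocab):
    """Build encoder input, decoder input, and target for one Q/A pair."""
    q = encode(question, vocab)
    a = encode(answer, vocab)
    # The decoder sees the answer shifted right by one token and is
    # trained to predict the next token at every position.
    return {"encoder_input": q, "decoder_input": a[:-1], "target": a[1:]}

# Made-up sample pair, not taken from the project's dataset.
pairs = [("how to sort a list", "use sorted ( x )")]
vocab = build_vocab(text for pair in pairs for text in pair)
example = make_example(*pairs[0], vocab)
```

The one-token shift between `decoder_input` and `target` is the usual way to train a decoder to predict the next token of the answer from the tokens before it.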

A more detailed report can be found here.

All source code can be found here.

Example

Question

I have a list of datasets.

dfList <- list(df1,df2,df3)

Each dataset looks like this.

apples, oranges
1,      2
NA,     4

I want to programmatically change each dataframe's NAs to 0s. How do I do this?

My code so far...

lapply(
  X = dfList,
  FUN = cbind,
  is.na = FALSE
)

Predicted Answer

You can use dplyr to get the index of the element in the list:

df$list <- data.frame(x=1:10) 
df$list <- data.frame(x=1:10) 
df$list <- data.frame(x=1:10) 
df$list <- data.frame(x=1:10) 
df$list <- data.frame(x=x,y=10) 
df$list <- data.frame(x=x,y=x=y,y=y=y) 
df$list[x=x:10]